
Wednesday, July 23, 2014

Varying Only the Import in Dart


Tonight, I explore a new kind of Dart refactoring: varying the import statement.

This occurs in the benchmarking code for the Visitor Pattern as part of my research for the future Design Patterns in Dart. I have three different approaches that I would like to compare. After refactoring and refactoring and refactoring, I am finally down to very slim scripts that only vary by the implementation code being imported (and the name of the benchmark):
$ diff -u tool/benchmark.dart tool/benchmark_single_dispatch_iteration.dart
--- tool/benchmark.dart 2014-07-23 23:11:55.933361972 -0400
+++ tool/benchmark_single_dispatch_iteration.dart       2014-07-23 23:12:45.545363182 -0400
@@ -2,12 +2,12 @@
 
 import 'package:dpid_benchmarking/pattern_benchmark.dart';
 
-import 'package:visitor_code/visitor.dart';
+import 'package:visitor_code/alt/single_dispatch_iteration/visitor.dart';
 
 main(List<String> args) {
   BenchmarkRunner.main(
     args,
-    "Classic Visitor Pattern",
+    "Nodes iterate w/ single dispatch",
     _setup,
     _run
   );
The BenchmarkRunner.main() method signature is a little ugly, but aside from that minor quibble I feel pretty good about this. Except…

There are two dozen lines of code that follow this that are exactly identical in each of the three benchmark scripts. The setup and the actual code that is executed are 100% duplicated across the three files. It looks something like:
var visitor, nodes;

_setup() {
  visitor = new PricingVisitor();

  nodes = new InventoryCollection([mobile(), tablet(), laptop()]);
  // Add more sub-nodes to complexify the structure...
}

_run(loopSize) {
  // nodes used
  // visitor used
  // (nodes accept visitor a bunch of times)
}
I am creating a closure over visitor and nodes so that they can be shared between the setup and run functions when they are executed by BenchmarkRunner.

What is important to me in this case is that the PricingVisitor and InventoryCollection classes have the same names, but are defined differently by the three different imported libraries in the three different scripts that I have.

This is almost certainly unique to benchmarking considerations, but still, how can I move this duplicated code out into a single file that can be shared? Dart parts will not work because the file containing the common setup and run code would have to be the main file and only one of the three implementations could be part of it. Conditional imports do not work in Dart.

Unfortunately, I am more or less stumped on a good way to do this. Dart is not meant to be used like this (I'm not sure any language is). That said, I can come up with something that works. I have to use the Factory Method pattern to create the visitor and the node structure (I also have to pull in makers for the nodes within the structure). In the end, the overhead does not seem to save much in the way of lines of code:
import '_setup.dart' as Setup;

main(List<String> args) {
  Setup.visitorMaker = ()=> new PricingVisitor();
  Setup.inventoryCollectionMaker = (items) => new InventoryCollection(items);
  Setup.mobile = mobile;
  Setup.tablet = tablet;
  Setup.laptop = laptop;
  Setup.app = app;

  BenchmarkRunner.main(
    args,
    "Classic Visitor Pattern",
    Setup.setup,
    Setup.run
  );
}
What I do gain is the ability to keep the test setup and running logic in one place: _setup.dart. And in there, I simply define top-level variables to support the Setup.XXX setters:
library visitor_setup;

var visitorMaker;
var inventoryCollectionMaker;
var mobile;
var tablet;
var laptop;
var app;

var visitor, nodes;

setup() {
  visitor = visitorMaker();
  // ...
}
// ...
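For completeness, the elided portions of _setup.dart might end up looking something like the following sketch. The accept() call and the loop body are my assumptions, based on the "nodes accept visitor a bunch of times" comment in the original scripts:
setup() {
  visitor = visitorMaker();
  nodes = inventoryCollectionMaker([mobile(), tablet(), laptop()]);
  // Add more sub-nodes (app(), more mobiles, etc.) to complexify the structure...
}

run(loopSize) {
  // Assumed: the node structure accepts the visitor once per loop iteration.
  for (var i = 0; i < loopSize; i++) {
    nodes.accept(visitor);
  }
}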
I am not satisfied with that, but it does have the advantage of keeping the common logic in one place. At the risk of rationalizing this too much, I note that the 6 Setup assignments are only needed because I am using 4 node types to intentionally create complexity in the data structure being tested.

I will leave this as “good enough” for the time being. Using this on other design patterns will ultimately decide if this approach is usable. So I will pick back up with that tomorrow.



Day #131

Tuesday, July 22, 2014

Internal Dart Packages for Organizing Codebases


This may very well apply only to me…

I would like to re-use some shell and Dart benchmarking code. I will not be duplicating code so I have to find a working solution. The problem with which I am faced is that I am not working on a single Dart package, but dozens—one for each pattern that will be covered in Design Patterns in Dart. Each package has its own internal Dart Pub structure, complete with pubspec.yaml and the usual subdirectories:
$ tree -L 2
.
├── factory_method
│   ├── build
│   ├── lib
│   ├── packages
│   ├── pubspec.lock
│   ├── pubspec.yaml
│   ├── tool
│   └── web
└── visitor
    ├── bin
    ├── lib
    ├── packages
    ├── pubspec.lock
    ├── pubspec.yaml
    └── tool
The code that I would like to share currently exists only in the Visitor Pattern's tool subdirectory. I suppose that I could create another top-level directory like helpers and then import the common code into visitor, factory_method and the still-to-be-written directories. That seems like a recipe for an unmaintainable codebase—I will wind up with deeply nested relative imports (e.g. import '../../../../helpers/benchmark.dart') strewn about. And the coding gods help me should I ever want to rename things.

Instead, I think that I will create a top-level packages directory to hold my common benchmarking code, as well as any other common code that I might want to use. As the name suggests, I can create this as an actual Dart Pub package, but instead of publishing it to pub.dartlang.org, I can keep it local:
$ tree -L 2
.
├── factory_method
│   └── ...
├── packages
│   └── benchmarking
└── visitor
    └── ...
I am probably going to regret this, but I name the package's subdirectory with the relatively brief “benchmarking,” while naming the package itself dpid_benchmarking in pubspec.yaml. The idea here is to save a few keystrokes on the directory name, but ensure that my local package names do not conflict with any that might need to be used as dependencies. So in packages/benchmarking, I create a pubspec.yaml for my local-only package:
name: dpid_benchmarking
dev_dependencies:
  args: any
  benchmark_harness: any
There is nothing fancy there—it reads like any other package specification in Dart, which is nice.

The first bit of common code that I would like to pull in is not Dart. Rather it is the common _benchmark.sh Bash code from last night. There is nothing about Dart packages that prevents packaging code from other languages, a fact that I exploit here:
$ git mv visitor/tool/_benchmark.sh packages/benchmarking/lib
I use the lib subdirectory in the package because that is the only location that is readily shared by Dart packages.

To use _benchmark.sh from the visitor code samples, I now need to declare that the visitor package depends on my local-only package. This can be done in pubspec.yaml with a dependency. Since this is benchmarking code, it is not a regular dependency. Rather it is a development dependency. And, since this is a local-only package, I have to specify a path attribute for my development dependency:
name: visitor_code
dev_dependencies:
  dpid_benchmarking:
    path: ../packages/benchmarking
I suffer a single relative path in my pubspec.yaml because Pub rewards me with a common, non-relative path after pub install. Installing this local-only package creates a dpid_benchmarking symbolic link in the visitor package's packages directory, pointing back to the top-level packages/benchmarking/lib:
$ pwd
/home/chris/repos/design-patterns-in-dart/visitor
$ ls -l packages 
lrwxrwxrwx 1 chris chris 31 Jul 22 21:35 dpid_benchmarking -> ../../packages/benchmarking/lib
lrwxrwxrwx 1 chris chris  6 Jul 22 21:35 visitor_code -> ../lib
Especially useful here is that Dart Pub creates this packages directory in all of the standard pub subdirectories like tool:
$ cd tool
$ pwd
/home/chris/repos/design-patterns-in-dart/visitor/tool
$ ls -l packages/
lrwxrwxrwx 1 chris chris 31 Jul 22 21:35 dpid_benchmarking -> ../../packages/benchmarking/lib
lrwxrwxrwx 1 chris chris  6 Jul 22 21:35 visitor_code -> ../lib
This is wonderfully useful in my visitor pattern's benchmark.sh Bash script. Instead of sourcing _benchmark.sh from the current tool directory, I change the script to source it from the local-only package:
#!/bin/bash

source ./packages/dpid_benchmarking/_benchmark.sh

BENCHMARK_SCRIPTS=(
    tool/benchmark.dart
    tool/benchmark_single_dispatch_iteration.dart
    tool/benchmark_visitor_traverse.dart
)

_run_benchmarks $1
The symbolic links from Dart Pub take care of the rest. Nice!

Of course, Pub is the Dart package manager, so it works with Dart code as well. I move some obvious candidates for common benchmarking from visitor/tool/src into the local-only package:
$ git mv visitor/tool/src/config.dart packages/benchmarking/lib/
$ git mv visitor/tool/src/score_emitters.dart packages/benchmarking/lib/
It is then a simple matter of changing the import statements to use the local-only package:
import 'package:dpid_benchmarking/config.dart';
import 'package:dpid_benchmarking/score_emitters.dart';
import 'package:visitor_code/visitor.dart';

main(List<String> args) {
  // ...
}
Dart takes care of the rest!

Best of all, I rinse and repeat in the rest of my design patterns. I add the dpid_benchmarking local-only package as a development dependency, run pub install, then make use of this common code to ensure that I have beautiful data to back up some beautiful design patterns.
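For example, the pubspec.yaml in the factory_method package (whose package name I am guessing at here) would point at the same local path:
name: factory_method_code
dev_dependencies:
  dpid_benchmarking:
    path: ../packages/benchmarking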

That is an all-around win thanks to Dart's Pub package manager. Yay!


Day #130

Monday, July 21, 2014

Refactoring Bash Scripts


I'll be honest here: I'm a pretty terrible Bash script coder. I find the man page too overwhelming to really get better at it. If I need to do something in Bash—even something simple like conditional statements—I grep through /etc/init.d scripts or fall back to the Google machine.

But tonight, there is no hiding. I have two Bash scripts (actually, just shell scripts at this point) that do nearly the same thing: benchmark.sh and benchmark_js.sh. Both perform a series of benchmarking runs of code for Design Patterns in Dart, the idea being that it might be useful to have actual numbers to back up some of the approaches that I include in the book. But, since this is Dart, it makes sense to benchmark both on the Dart VM and on a JavaScript VM (the latter because most Dart code will be compiled with dart2js). The two benchmark shell scripts are therefore responsible for running and generating summary results for Dart and JavaScript.

The problem with two scripts is twofold. First, I need to keep them in sync—any change made to one needs to go into the other. Second, if I want to generalize this for any design pattern, I have hard-coded way too much in both scripts. To the refactoring machine, Robin!

To get an idea where to start, I diff the two scripts. I have been working fairly hard to keep the two scripts in sync, so there are only two places that differ. The JavaScript version includes a section that compiles the Dart benchmarks into JavaScript:
$ diff -u1 tool/benchmark.sh tool/benchmark_js.sh 
--- tool/benchmark.sh   2014-07-21 22:32:44.047778498 -0400
+++ tool/benchmark_js.sh        2014-07-21 20:54:20.803634500 -0400
@@ -11,2 +11,16 @@
 
+# Compile
+wrapper='function dartMainRunner(main, args) { main(process.argv.slice(2)); }';
+dart2js -o tool/benchmark.dart.js \
+           tool/benchmark.dart
+echo $wrapper >> tool/benchmark.dart.js
+
+dart2js -o tool/benchmark_single_dispatch_iteration.dart.js \
+           tool/benchmark_single_dispatch_iteration.dart
+echo $wrapper >> tool/benchmark_single_dispatch_iteration.dart.js
+
+dart2js -o tool/benchmark_visitor_traverse.dart.js \
+           tool/benchmark_visitor_traverse.dart
+echo $wrapper >> tool/benchmark_visitor_traverse.dart.js
+
...
The other difference is actually running the benchmarks—the JavaScript version needs to run through node.js:
$ diff -u1 tool/benchmark.sh tool/benchmark_js.sh 
--- tool/benchmark.sh   2014-07-21 22:32:44.047778498 -0400
+++ tool/benchmark_js.sh        2014-07-21 20:54:20.803634500 -0400
...
@@ -15,7 +29,7 @@
 do
-    ./tool/benchmark.dart --loop-size=$X \
+    node ./tool/benchmark.dart.js --loop-size=$X \
         | tee -a $RESULTS_FILE
-    ./tool/benchmark_single_dispatch_iteration.dart --loop-size=$X \
+    node ./tool/benchmark_single_dispatch_iteration.dart.js --loop-size=$X \
         | tee -a $RESULTS_FILE
-    ./tool/benchmark_visitor_traverse.dart --loop-size=$X \
+    node ./tool/benchmark_visitor_traverse.dart.js --loop-size=$X \
         | tee -a $RESULTS_FILE
For refactoring purposes, I start with the latter difference. The former is a specialization that can be performed in a single conditional. The latter involves both uses of the script.

That will suffice for an initial strategy; what about tactics? Glancing at the Dart benchmark.sh script, I see that I have a structure that looks like:
#!/bin/bash

RESULTS_FILE=tmp/benchmark_loop_runs.tsv
SUMMARY_FILE=tmp/benchmark_summary.tsv
LOOP_SIZES="10 100 1000 10000 100000"

# Initialize artifact directory
...

# Individual benchmark runs of different implementations
...

# Summarize results
...

# Visualization ready
...
Apparently I have been quite fastidious about commenting the code because those code section comments are actually there. They look like a nice first pass at a series of functions. Also of note here is that the script starts by setting some global settings, which seems like a good idea even after refactoring—I can use these and others to specify output filenames, benchmark scripts, and whether or not to use the JavaScript VM.

But first things first, extracting the code in each of those comment sections out into functions. I make a top-level function that will invoke all four functions-from-comment-sections:
_run_benchmarks () {
    initialize
    run_benchmarks
    summarize
    all_done
}
Then I create each function as:
# Initialize artifact directory
initialize () {
  # ...
}

# Individual benchmark runs of different implementations
run_benchmarks () {
  # ...
}

# Summarize results
summarize () {
  # ...
}

# Visualization ready
all_done () {
  # ...
}
Since each of those sections is relying on top-level global variables, this just works™ without any additional work from me.

Now for some actual refactoring. One of the goals here is to be able to use this same script not only for JavaScript and Dart benchmarking of the same pattern, but also for different patterns. That means I need to stop hard-coding the scripts inside the new run_benchmarks function:
run_benchmarks () {
    echo "Running benchmarks..."
    for X in 10 100 # 1000 10000 100000
    do
      ./tool/benchmark.dart --loop-size=$X \
          | tee -a $RESULTS_FILE
      ./tool/benchmark_single_dispatch_iteration.dart --loop-size=$X \
          | tee -a $RESULTS_FILE
      ./tool/benchmark_visitor_traverse.dart --loop-size=$X \
          | tee -a $RESULTS_FILE
    done
    echo "Done. Results stored in $RESULTS_FILE."
}
The only thing that is different between those three implementation benchmarks is the name of the benchmark file. So a list of files in a global variable that could be looped over is my next step:
BENCHMARK_SCRIPTS=(
    tool/benchmark.dart
    tool/benchmark_single_dispatch_iteration.dart
    tool/benchmark_visitor_traverse.dart
)
Up until this point, I think I could get away with regular Bourne shell scripting, but lists like this are only available in Bash. With that, I can change run_benchmarks to:
run_benchmarks () {
    echo "Running benchmarks..."
    for X in 10 100 1000 10000 100000
    do
        for script in ${BENCHMARK_SCRIPTS[*]}
        do
            ./$script --loop-size=$X | tee -a $results_file
        done
    done
    echo "Done. Results stored in $results_file."
}
At this point, I would like to get a feel for what the common part of the script is and what specialized changes are needed for each new pattern benchmark. So I move all of my new functions out into a _benchmark.sh script that can be “sourced” by the specialized code, which is now reduced to:
#!/bin/bash

source ./tool/_benchmark.sh

BENCHMARK_SCRIPTS=(
    tool/benchmark.dart
    tool/benchmark_single_dispatch_iteration.dart
    tool/benchmark_visitor_traverse.dart
)

RESULTS_FILE=benchmark_loop_runs.tsv
SUMMARY_FILE=benchmark_summary.tsv

_run_benchmarks
That is pretty nice. I can easily see how I would use this for other patterns—for each implementation being benchmarked, I would simply add it to the list of BENCHMARK_SCRIPTS.

Now that I see what is left of the code, I realize that the RESULTS_FILE and SUMMARY_FILE variables were only varied to keep from overwriting artifacts of the different VM runs. The basename for each is the same in the two original scripts—the JavaScript artifacts simply append a _js suffix. That is the kind of thing that can be moved into my new common _benchmark.sh file:
#...
results_basename="benchmark_loop_runs"
summary_basename="benchmark_summary"
results_file=""
summary_file=""
type="dart"
# ...
initialize () {
    if [ "$type" = "js" ]; then
        results_file=$tmpdir/${results_basename}_js.tsv
        summary_file=$tmpdir/${summary_basename}_js.tsv
    else
        results_file=$tmpdir/${results_basename}.tsv
        summary_file=$tmpdir/${summary_basename}.tsv
    fi
    # ...
}
If _run_benchmarks is invoked with type set to "js", then the value of the results_file variable will include _js in the basename. If "js" is not supplied, then the old basename will be used.

To obtain that information from the command-line, I read the first argument supplied when the script is called ($1) and send it to _run_benchmarks:
_run_benchmarks $1
I can then add a new (very simple) parse_options function to see if this script is running the Dart or JavaScript benchmarks:
_run_benchmarks () {
    parse_options $1
    initialize
    run_benchmarks
    summarize
    all_done
}

parse_options () {
  if [ "$1" = "-js" -o "$1" = "--javascript" ]
  then
      type="js"
  fi
}
# ...
The $1 inside _run_benchmarks is not the same as the $1 from the command-line. Inside a function, $1 refers to the first argument supplied to it.
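A tiny, stand-alone illustration of that distinction (not from the benchmarking code):
#!/bin/bash
greet () {
    echo "Hello, $1"   # $1 here is the first argument passed to greet
}
greet "$1"             # forward the script's own first argument to the function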

I can also make use of this to pull in the last piece of refactoring—the compiling that I decided to save until later. Well, now it is later and I can compile as needed when type has been parsed from the command line to be set to "js":
_run_benchmarks () {
    parse_options $1
    initialize
    if [ "$type" = "dart" ]; then
        run_benchmarks
    else
        compile
        run_benchmarks_js
    fi
    summarize
    all_done
}
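The compile and run_benchmarks_js functions are not shown in this post, but based on the original benchmark_js.sh they would presumably look something like this sketch:
# Sketch only (assumed, based on the earlier diff): compile each benchmark
# script to JavaScript, append the dartMainRunner wrapper, then run the
# compiled versions under node.
compile () {
    local wrapper='function dartMainRunner(main, args) { main(process.argv.slice(2)); }'
    for script in ${BENCHMARK_SCRIPTS[*]}
    do
        dart2js -o $script.js $script
        echo $wrapper >> $script.js
    done
}

run_benchmarks_js () {
    echo "Running benchmarks..."
    for X in 10 100 1000 10000 100000
    do
        for script in ${BENCHMARK_SCRIPTS[*]}
        do
            node ./$script.js --loop-size=$X | tee -a $results_file
        done
    done
    echo "Done. Results stored in $results_file."
}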
With that, I have everything I need to keep the specialized versions of the benchmarking script small:
#!/bin/bash

source ./tool/_benchmark.sh

BENCHMARK_SCRIPTS=(
    tool/benchmark.dart
    tool/benchmark_single_dispatch_iteration.dart
    tool/benchmark_visitor_traverse.dart
)

_run_benchmarks $1
No doubt there is much upon which I could improve if I wanted to get really proficient at Bash scripting, but I am reasonably happy with this. It ought to suit me nicely when I switch to other patterns. Which I will try next. Tomorrow.



Day #129

Sunday, July 20, 2014

Reading Files from STDIN in Dart


I need to read from STDIN on the command-line. Not just a single line, which seems to be well understood in Dart, but from an entire file redirected into a Dart script.

My benchmarking work has a life of its own now with several (sometimes competing) requirements. One of those requirements is that the code that actually produces the raw benchmark data cannot use dart:io for writing to files (because it is also compiled to JavaScript, which does not support dart:io). Given that, my benchmark results are sent to STDOUT and redirected to artifact files:
    ./tool/benchmark.dart --loop-size=$X \
        | tee -a $RESULTS_FILE
Or, the JavaScript version:
    node ./tool/benchmark.dart.js --loop-size=$X \
        | tee -a $RESULTS_FILE
That works great, mostly because Dart's print() sends its data to STDOUT regardless of whether it is being run in the Dart VM or in Node.js after being put through dart2js.

What I need now is a quick way to summarize the raw results that can work with either the Dart or JavaScript results. Up until now, I have hard-coded the current filename inside the summary script which is then run as:
./tool/summarize_results.dart > $SUMMARY_FILE
The problem with that is twofold: I should not hard-code a filename like that, and the artifact filenames should all be managed in the same location.

I could send the filename into ./tool/summarize_results.dart as an argument. That would, in fact, require a minimal amount of change. But I have a slight preference to be consistent with shell file redirection—if STDOUT output is being redirected to a file, then my preference is to (at least support) reading a file redirected back into the script via STDIN.

Fortunately, this is fairly easy with dart:io's stdin top-level getter. It actually turns out to be very similar to the single-line-of-keyboard-input solutions that are already out there:
_readTotals() {
  var lines = [];

  var line;
  while ((line = stdin.readLineSync()) != null) {
    var fields = line.split('\t');
    lines.add({
      'name': fields[0],
      'score': fields[1],
      'loopSize': fields[2],
      'averageScore': fields[3]
    });
  }

  return lines;
}
The only real “trick” here is the null check at the end of the STDIN input stream. Pleasantly, that was easy to figure out since it is similar to many other languages' implementations of STDIN processing.
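With that in place, the summary script can presumably be run with the results file redirected in and the output redirected out, something along the lines of:
./tool/summarize_results.dart < $RESULTS_FILE > $SUMMARY_FILE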



Day #128

Saturday, July 19, 2014

Command Line Arguments with Node.js and Dart2js


How on earth did it come to this?

I need for my compiled Dart to respond to command line arguments when compiled to JavaScript and run under Node.js. I think that sentence reads as proper English.

It all seemed so simple at first. I want to benchmark code for Design Patterns in Dart. I may use benchmarks for every pattern, I may use them for none—either way I would like the ability to quickly get benchmarks so that I can compare different approaches that I might take to patterns. So I would like to benchmark.

So I benchmark my Dart code. I added features to command-line Dart scripts so that they can read command line switches to vary the loop size (which seems to make a difference). Then I try to compile this code to JavaScript to be run under node.js. That actually works. Except for command line options to vary the loop size.

And so here I am. I need to figure out how to get node.js running dart2js code to accept command-line (or environment) options.

Last night I found that the usual main() entry point simply does not see command line arguments when compiled via dart2js and run with node. The print() of the command line args is an empty List even when options are supplied:
main (List<String> args) {
  print(args);
  // ...
}
With the Dart VM, I see command line options:
$ dart ./tool/benchmark.dart --loop-size=666
[--loop-size=666]
And, with the very nice args package, I can even process them into useful things. But when run with node.js, I only get a blank list:
$ node ./tool/benchmark.dart.js --loop-size=666
[]
I had thought to use some of the process environment options, but they are all part of the dart:io library, which is unsupported by dart2js. Even initializing a string from the environment does not work:
const String LOOP_SIZE = const String.fromEnvironment("LOOP_SIZE", defaultValue: "10");
No matter how I tried to set that environment variable, my compiled JavaScript would only use the default value.

So am I stuck?

The answer turns out to be embedded in the dart2js output itself. At the very top of the generated file are a couple of hints, including:
// dartMainRunner(main, args):
//    if this function is defined, the Dart [main] method will not be invoked
//    directly. Instead, a closure that will invoke [main], and its arguments
//    [args] is passed to [dartMainRunner].
It turns out that I can use this dartMainRunner() function to grab command line arguments the node.js way so that they can be supplied to the compiled Dart. In fact, I need almost no code to do it:
function dartMainRunner(main, args) {
  // Under node, process.argv begins with the node executable and the script
  // path; slice(2) passes only the real arguments along to the Dart main().
  main(process.argv.slice(2));
}
I slice off the first two command line arguments, supplying the main() callback with the rest. In the case of my node.js code, the first two arguments are the node executable and the script name:
$ node ./tool/benchmark.dart.js --loop-size=666
With those chomped off, I am only supplying the remainder of the command line arguments—the switches that control how my benchmarks will run.

This is a bit of a hassle in my Dart benchmarks since I will need to add that wrapper code to each implementation every time that I make a change. I probably ought to put all of this into a Makefile (or equivalent). For now, however, I simply make it part of my benchmarking shell script:
#...
# Compile
wrapper='function dartMainRunner(main, args) { main(process.argv.slice(2)); }';
dart2js -o tool/benchmark.dart.js \
           tool/benchmark.dart
echo $wrapper >> tool/benchmark.dart.js

# More compiling then actual benchmarks ...
And that works!

Thanks to my already-in-place gnuplot graphing solution, I can even plot my three implementations of the Visitor Pattern compiled from Dart into JavaScript. The number of microseconds a single run takes, relative to the number of loops, winds up looking like this:
[graph: microseconds per run vs. loop size for the three implementations under node.js]
That is a little different than last night's Dart numbers, but the bottom line seems to be that it does not matter too much which implementation I choose. At least for this pattern. Regardless, I am grateful to be able to get these numbers quickly now—even from node.js.


Day #127

Friday, July 18, 2014

Close, But Not Quite Visualizing dart2js Benchmark Comparisons


Thanks to benchmark_harness and gnuplot I can make quick, pretty, and useful visualizations of the performance of different code implementations:
[graph: benchmark comparison of the different implementations]
The actual numbers are almost inconsequential at this point. I need to know that I can make and visualize them in order to know that I can choose the right solution for inclusion in Design Patterns in Dart. I can investigate the numbers later—for now I only need to know that I can generate them.

And I think that I finally have that sorted out. I think that I have eliminated inappropriate comparisons, loops, types and just plain silly mistakes. To be sure, the code could be (and probably will need to be) better. But, it works. More importantly, it can be run through a single tool/benchmark.sh script:
#!/bin/sh

# Initialize artifact directory
mkdir -p tmp
cat /dev/null > tmp/benchmark_loop_runs.tsv
cat /dev/null > tmp/benchmark_summary.tsv

# Individual benchmark runs of different implementations
echo "Running benchmarks..."
for X in 10 100 1000 10000 100000
do
    ./tool/benchmark.dart --loop-size=$X
    ./tool/benchmark_single_dispatch_iteration.dart --loop-size=$X
    ./tool/benchmark_visitor_traverse.dart --loop-size=$X
done
echo "Done. Results stored in tmp/benchmark_loop_runs.tsv."

# Summarize results
echo "Building summary..."
./tool/summarize_results.dart
echo "Done. Results stored in tmp/benchmark_summary.tsv."

# Visualization ready
echo ""
echo "To view in gnuplot, run tool/visualize.sh."
I am a little worried about the need for artifacts, but on some level they are needed—the VM has to be freshly started with each run for accurate numbers. To retain the data between runs—and between the individual runs and the summary—artifact files will need to store that information. I will worry about that another day.

My concern today is that I am benchmarking everything on the Dart VM. For the foreseeable future, however, the Dart VM will not be the primary runtime environment for Dart code. Instead, most Dart code will be compiled to JavaScript via dart2js. I cannot very well recommend a solution that works great in Dart, but fails miserably in JavaScript.

So how do I get these numbers and visualizations in JavaScript?

Well, I already know how to benchmark dart2js. I just compile to JavaScript and run with Node.js. Easy-peasey, right?
$ dart2js -o tool/benchmark.dart.js \
>            tool/benchmark.dart
tool/src/score_emitters.dart:3:8:
Error: Library not found 'dart:io'.
import 'dart:io';
       ^^^^^^^^^
tool/src/score_emitters.dart:39:18:
Warning: Cannot resolve 'File'.
  var file = new File(LOOP_RESULTS_FILE);
                 ^^^^
tool/src/score_emitters.dart:40:23:
Warning: Cannot resolve 'FileMode'.
  file.openSync(mode: FileMode.APPEND)
                      ^^^^^^^^
Error: Compilation failed.
Arrrgh. All of my careful File code is for naught if I want to try this out in JavaScript. On the bright side, this is why you try a solution in a bunch of different environments before generalizing.

This turns out to have a fairly easy solution thanks to tee. I change the score emitter to simply print to STDOUT in my Dart code:
recordTsvTotal(name, results, loopSize, numberOfRuns) {
  var averageScore = results.fold(0, (prev, element) => prev + element) /
    numberOfRuns;

  var tsv =
    '${name}\t'
    '${averageScore.toStringAsPrecision(4)}\t'
    '${loopSize}\t'
    '${(averageScore/loopSize).toStringAsPrecision(4)}';

  print(tsv);
}
Then I make benchmark.sh responsible for writing the tab-separated data to the appropriate file:
RESULTS_FILE=tmp/benchmark_loop_runs.tsv
# ...
dart2js -o tool/benchmark.dart.js \
           tool/benchmark.dart
# ...
for X in 10 100 1000 10000 100000
do
    ./tool/benchmark.dart --loop-size=$X | tee -a $RESULTS_FILE
    # ...
done
That works great—for the default case:
$ node tool/benchmark.dart.js
Classic Visitor Pattern 6465    10      646.5
Unfortunately, the args package does not work with Node.js. Supplying a different loop size still produces results for a loop size of 10:
$ node tool/benchmark.dart.js --loop-size=100
Classic Visitor Pattern 6543    10      654.3
My initial attempt at solving this is String.fromEnvironment():
const String LOOP_SIZE = const String.fromEnvironment("LOOP_SIZE", defaultValue: "10");

class Config {
  int loopSize, numberOfRuns;
  Config(List<String> args) {
    var conf = _parser.parse(args);
    loopSize = int.parse(LOOP_SIZE);
    // ...
  }
  // ...
}
But that does not work. When I run the script, I still get a loop size of 10:
$ LOOP_SIZE=100 node tool/benchmark.dart.js
Classic Visitor Pattern 6160    10      616.0
Stumped there, I call it a night.

I would have preferred to get all the way to a proper visualization, but the switch to tee for building the tab-separated artifact is already a win. The filename is now specified in a single place (the overall shell script) instead of two (the shell script and the Dart code). Hopefully I can figure out some way to read Node.js command line arguments from compiled Dart. Tomorrow.


Day #126

Thursday, July 17, 2014

Gnuplot Dart Benchmarks


I am embracing the idea of benchmarking the solutions that I add to Design Patterns in Dart. This early in the research, I am unsure if I will have a strong need for benchmarks. What I do know is that if I cannot make benchmarking easy, then I will almost certainly not use them.

So tonight, I take a quick look at the last piece of my puzzle: visualization. After last night, I have a solution that stores the average run time of a single benchmark run against the number of loops used to gather that data. The CSV output looks something like:
Loop_Size, Classic_Visitor_Pattern, Nodes_iterate_w/_single_dispatch, Visitor_Traverses
10, 122.4, 110.6, 117.5, 
100, 118.5, 112.4, 117.6, 
1000, 120.4, 112.4, 116.8, 
10000, 148.6, 148.2, 117.9, 
100000, 148.1, 148.7, 118.5, 
I can paste that data into a spreadsheet to produce nice-looking graphs like:
[spreadsheet graph of the benchmark summary data]
But let's face it, if I have to open a spreadsheet every time that I want to look at my data, I am never going to do it. So unless I can find some way to quickly visualize that data, all of this would be for naught. Enter gnuplot.

The preferred data format for gnuplot seems to be tab-separated values, so I rework the row output in my Dart benchmark reporter to output tabs instead of commas:
    // ...
    var row = '${loopSize}\t';
    implementations.forEach((implementation){
      var rec = records.firstWhere((r)=> r['name'] == implementation);
      row += '${rec["averageScore"]}\t';
    });
    // ...
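The reporter code around that excerpt is elided, but a rough sketch of the whole TSV summary—the function and parameter names are my own, and the record fields and space-to-underscore conversion are assumed from the samples above—might look like:
emitTsvSummary(List loopSizes, List implementations, List records) {
  // Header row: loop size column, then one column per implementation, so that
  // gnuplot's "title columnheader" picks up the legend labels.
  print('Loop_Size\t' +
      implementations.map((name) => name.replaceAll(' ', '_')).join('\t'));

  loopSizes.forEach((loopSize) {
    var row = '${loopSize}\t';
    implementations.forEach((implementation) {
      var rec = records.firstWhere((r) =>
          r['name'] == implementation && r['loopSize'] == loopSize);
      row += '${rec["averageScore"]}\t';
    });
    print(row);
  });
}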
The graph type that I want in this case is a clustered histogram built from that data. In gnuplot parlance, that translates as:
set style data histogram
set style histogram clustered
I fiddle a bit with styles (and cull a few from Google / Stack Overflow) to settle on these settings:
set style fill solid border
set xtics rotate out
set key tmargin
The fill style makes the bars a little easier to see, and the other two make the axis labels and legend easier to read. Finally, I plot the data as:
plot for [COL=2:4] 'bar.tsv' using COL:xticlabels(1) title columnheader
Which results in:
[gnuplot clustered histogram of the benchmark results]
That is quite nice for just a little bit of work. I do, however, note that this graph suffers from a pretty serious flaw—the same flaw from which my spreadsheet graph suffered unnoticed by me until now. The y-axis does not start at zero, which greatly exaggerates the discrepancies between my three implementations. To start the y-axis at zero, I use:
set yrange [0:]
Now my graph looks like:
[the same histogram with the y-axis starting at zero]
Now that I see the “jump” in values at the 10k loop size from that perspective, it does not seem nearly as worrisome.

If I put that all in a gnuplot script file:
# set terminal png size 1024,768
# set output  "bar.png"
set style data histogram
set style histogram clustered
set style fill solid border
set xtics rotate out
set key tmargin
set yrange [0:]
plot for [COL=2:4] 'bar.tsv' using COL:xticlabels(1) title columnheader
pause -1 "Hit any key to continue"
Then I can quickly see my benchmarking results whenever I need to. No excuses.


Day #125