Tuesday, July 15, 2014

Constant Constructors and Benchmark Scoring


I really appreciated the graph from last night's benchmarking post:

[Graph from last night's post: benchmark results for the three Visitor implementations across loop sizes.]

It makes it much more obvious where things change than just looking at a table of numbers:

Loop Size | Classic | Nodes Traverse (single dispatch) | Visitor Traverses
10        | 131.6   | 122.7                            | 121.4
100       | 125.9   | 113.6                            | 120.15
1000      | 123.8   | 122.8                            | 120.65
10000     | 157.0   | 153.25                           | 120.45
100000    | 158.05  | 154.65                           | 121.05

Building the actual graph was a pain. If there is pain involved, it is quite likely that I am not going to make use of what I just described as very useful. So, before moving on to other learning territory, let's see if I can ease the pain.

The pain involved manually copying the CSV benchmark results for three different implementations of the Visitor Pattern in Dart. The benchmarking code is available in the tool subdirectory of the publicly available visitor book code for the forthcoming Design Patterns in Dart (last night's SHA1 was 14dea8561a).

To make the graph, I pasted the CSV into Google Docs spreadsheets, which is where the work (and pain) really started. The pain involved:
  1. adding a column dividing the time by the number of loops
  2. adding another sheet that averaged the numbers in different runs to create the table above
  3. generating the graph with the correct labels and headers
Step #1 should be easy; I probably should have done that in the Dart benchmarking code in the first place. That I did not is one of the hazards of plowing ahead with a solution without thinking.

Step #2 was particularly hard because I had to manually edit the range of cells to be averaged and the cell that identified the number of loops. Copying and pasting always got the wrong column and/or row. In a 5×3 table, that is 15 edits of 3 values. There is no way I am doing that again by hand.

Step #3 may very well need to remain in Google spreadsheets. I know of no Dart packages that can manipulate Google Sheets (really?), nor are there any Dart packages that I can find that can programmatically build pretty graphs in PNG form. I am more than happy to be corrected on either point!

Anyhow, let's see what I can do about steps #1 and #2. Ugh... strike that. Let's see what I can do about the “easy” #1. In theory, #1 ought to be easy—all I need to do is get the loopSize from my benchmark's main() entry point down to the benchmark class:
main(List args) {
  // Determine loopSize from command-line args...

  _setup();
  for (var i=0; i<numberOfRuns; i++) {
    VisitorBenchmark.main();
  }
}
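The argument handling is elided in that comment. A minimal sketch of one way it could look, assuming the loop size arrives as the first positional argument (the real script may well do this differently):
main(List<String> args) {
  // Hypothetical: default to 10 loops when no argument is supplied.
  var loopSize = args.isEmpty ? 10 : int.parse(args.first);
  print('Running with loopSize: $loopSize');
}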
If VisitorBenchmark knows the loopSize it can emit a score that includes the actual run time of the benchmark divided by loopSize:
class LoopScorer implements ScoreEmitter {
  final int numLoops;
  const LoopScorer(this.numLoops);

  void emit(String testName, double value) {
    print(
      '$testName (RunTime in µs), '
      '${value.toStringAsPrecision(4)}, '
      '${numLoops}, '
      '${(value/numLoops).toStringAsPrecision(4)}'
    );
  }
}
Ah, my old friend, the constant constructor. Wait, not "friend" in this case: the loopSize comes in from the command line, so there is no way it will ever be a compile-time constant. So there is no way to create a constant scorer.
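To make the objection concrete, this is a sketch (not code from the repository) of what I would like to write and why the analyzer rejects it:
class VisitorBenchmark extends BenchmarkBase {
  // Will not compile: the argument to a const constructor invocation
  // must itself be a compile-time constant, and loopSize only exists
  // at runtime.
  const VisitorBenchmark(int loopSize) :
    super(NAME, emitter: const LoopScorer(loopSize));
  // ...
}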

So I do something bad instead: storing the score in a global and making the loop responsible for printing the results:
double lastScore;
class GloballyPersistingScoreEmitter implements ScoreEmitter {
  const GloballyPersistingScoreEmitter();
  void emit(String testName, double value) {
    lastScore = value;
  }
}
The benchmark then needs to use this scorer:
class VisitorBenchmark extends BenchmarkBase {
  const VisitorBenchmark() :
    super(NAME, emitter: const GloballyPersistingScoreEmitter());
  // ...
}
And finally, my loop reports the results:
main(List args) {
  // ...
  for (var i=0; i<numberOfRuns; i++) {
    VisitorBenchmark.main();
    print(
      '${NAME} (RunTime in µs), '
      '${lastScore.toStringAsPrecision(4)}, '
      '${loopSize}, '
      '${(lastScore/loopSize).toStringAsPrecision(4)}'
    );
  }
}
The results are a nice set of CSV values for inclusion in a spreadsheet:
$ ./tool/benchmark.dart
Classic Visitor Pattern (RunTime in µs), 1205, 10, 120.5
Classic Visitor Pattern (RunTime in µs), 1213, 10, 121.3
Classic Visitor Pattern (RunTime in µs), 1192, 10, 119.2
I can understand why the benchmark harness relies on compile-time constants for the benchmarker and scorer. No matter how many times I invoke the const constructor, no new memory is allocated; every invocation yields the same canonical instance. In other words, the benchmark harness will not be responsible for triggering garbage collection. At the same time, it sure makes reporting anything more than the simplest information a pain.
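That canonicalization is easy to see in isolation. A quick sketch with a stand-in class (any class with a const constructor and only final fields behaves the same way):
class Emitter {
  // Stand-in for the real score emitter: const constructor, no mutable state.
  const Emitter();
}

main() {
  // Every const invocation yields the same canonical instance...
  print(identical(const Emitter(), const Emitter())); // true
  // ...while new allocates a fresh object each time.
  print(identical(new Emitter(), new Emitter()));     // false
}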

Update: I do not need the scoring emitter after all. I can get away with returning the results of measure() directly from main() in my benchmark class:
class VisitorBenchmark extends BenchmarkBase {
  const VisitorBenchmark() : super(NAME);
  static double main() => new VisitorBenchmark().measure();
  // ...
}
Since I am no longer calling the benchmarker's report() method, the score emitter is not invoked. The script's main entry point can then report to its heart's content using this return value:
  // ...
  for (var i=0; i<numberOfRuns; i++) {
    double score = VisitorBenchmark.main();
    print(
      '${NAME} (RunTime in µs), '
      '${score.toStringAsPrecision(4)}, '
      '${loopSize}, '
      '${(score/loopSize).toStringAsPrecision(4)}'
    );
  }
  // ...
And no global values.


Day #123
