Monday, July 14, 2014

Benchmarking at Different Run Length


It seems that my benchmarks may be hitting a garbage collection limit somewhere along the way. Or something.

I continue to benchmark three (slightly) different implementations of the Visitor Pattern in Dart. The actual goal is reasonable assurance that I am suggesting the most performant version of each pattern in the forthcoming Design Patterns in Dart. Toward that end, I am seeking out as many considerations involved in benchmarking patterns as possible (and I have found a number). At this point, the actual numbers are not as important as the process. Still, the numbers do suggest considerations of which I need to be aware.

The source code for this is located in the public facing code repository for the book.

I have noted that the time-per-run seems dependent on the number of loops used in the benchmark. Normally, the benchmark_harness package will run as many loops as necessary to fill a 2 second run. In order to get the numbers to be relatively stable between runs, I have been experimenting with increasing the number of loops in the benchmark. The results seem relatively consistent for limited numbers of loops in my benchmark code:
class VisitorBenchmark extends BenchmarkBase {
  // ...
  void run() {
    for (var i=0; i<10; i++) {
      visitor.totalPrice = 0.0;
      nodes.accept(visitor);
    }
  }
}
But, if I change the comparison from 10 to 1,000 to 100,000, then strange things happen to my numbers.

To test this out in detail, I add command line options to my benchmarks (using the nice args Dart package), write a shell script to exercise them:
#!/bin/sh

for X in 10 100 1000 10000 100000
do
    echo ''
    echo '=='
    echo "Loop size: $X"
    ./tool/benchmark.dart --loop-size=$X
    echo '--'
    ./tool/benchmark_single_dispatch_iteration.dart --loop-size=$X
    echo '--'
    ./tool/benchmark_visitor_traverse.dart --loop-size=$X
done
What I find is (numbers are microseconds for a single run):
Loop SizeClassicNodes Traverse (single dispatch)Visitor Traverses
10131.6122.7121.4
100125.9113.6120.15
1000123.8122.8120.65
10000157.0153.25120.45
100000158.05154.65121.05

Or, in graphical form:



I am unsure if that is really garbage collection, but something reliably affects the two implementations in which the node structure is responsible for traversing itself. And somehow making the visitor responsible for traversing the node structure (even with double dispatching) is not affected.

It is worth noting that the last two loop sizes push the benchmark run to and past the 2 second built-in lower limit of benchmark_harness. This may be a coincidence, especially since this does not seem to affect all three implementations.


Day #122

No comments:

Post a Comment