japh(r) by Chris Strom: Benchmarking at Different Run Length

Monday, July 14, 2014

Benchmarking at Different Run Length

It seems that my benchmarks may be hitting a garbage collection limit somewhere along the way. Or something.

I continue to benchmark three (slightly) different implementations of the Visitor Pattern in Dart. The actual goal is reasonable assurance that I am suggesting the most performant version of each pattern in the forthcoming Design Patterns in Dart. Toward that end, I am seeking out as many considerations involved in benchmarking patterns as possible (and I have found a number). At this point, the actual numbers are not as important as the process. Still, the numbers do suggest considerations of which I need to be aware.

The source code for this is located in the public facing code repository for the book.

I have noted that the time-per-run seems dependent on the number of loops used in the benchmark. Normally, the benchmark_harness package will run as many loops as necessary to fill a 2 second run. In order to get the numbers to be relatively stable between runs, I have been experimenting with increasing the number of loops in the benchmark. The results seem relatively consistent for limited numbers of loops in my benchmark code:

class VisitorBenchmark extends BenchmarkBase {
  // ...
  void run() {
    for (var i=0; i<10; i++) {
      visitor.totalPrice = 0.0;
      nodes.accept(visitor);
    }
  }
}

But, if I change the comparison from 10 to 1,000 to 100,000, then strange things happen to my numbers.

To test this out in detail, I add command line options to my benchmarks (using the nice args Dart package), write a shell script to exercise them:

#!/bin/sh

for X in 10 100 1000 10000 100000
do
    echo ''
    echo '=='
    echo "Loop size: $X"
    ./tool/benchmark.dart --loop-size=$X
    echo '--'
    ./tool/benchmark_single_dispatch_iteration.dart --loop-size=$X
    echo '--'
    ./tool/benchmark_visitor_traverse.dart --loop-size=$X
done

What I find is (numbers are microseconds for a single run):

Loop Size	Classic	Nodes Traverse (single dispatch)	Visitor Traverses
10	131.6	122.7	121.4
100	125.9	113.6	120.15
1000	123.8	122.8	120.65
10000	157.0	153.25	120.45
100000	158.05	154.65	121.05

Or, in graphical form:

I am unsure if that is really garbage collection, but something reliably affects the two implementations in which the node structure is responsible for traversing itself. And somehow making the visitor responsible for traversing the node structure (even with double dispatching) is not affected.

It is worth noting that the last two loop sizes push the benchmark run to and past the 2 second built-in lower limit of benchmark_harness. This may be a coincidence, especially since this does not seem to affect all three implementations.

Day #122

‹prev | My Chain | next›

japh(r) by Chris Strom

Monday, July 14, 2014

Benchmarking at Different Run Length

No comments:

Post a Comment