Friday, July 11, 2014

Crazy Warm Dart Benchmarks


After last night, I am happy to have a new benchmark reporter that does not report 16 digits of precision when the times only vary after the first 3 digits:
$ ./tool/benchmark.dart; \
      echo '--'; \
        ./tool/benchmark_single_dispatch_iteration.dart; \
      echo '--'; \
        ./tool/benchmark_visitor_traverse.dart
Classic Visitor Pattern (RunTime): 1409 µs.
Classic Visitor Pattern (RunTime): 1375 µs.
Classic Visitor Pattern (RunTime): 1374 µs.
--
Nodes iterate w/ single dispatch (RunTime): 1311 µs.
Nodes iterate w/ single dispatch (RunTime): 1276 µs.
Nodes iterate w/ single dispatch (RunTime): 1290 µs.
--
Visitor Traverses (RunTime): 1388 µs.
Visitor Traverses (RunTime): 1350 µs.
Visitor Traverses (RunTime): 1362 µs.
But I am still not satisfied that I have a solid handle on benchmarking different approaches to patterns for Design Patterns in Dart.
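That reporter is the ProperPrecisionScoreEmitter that shows up in the benchmark classes below. I never show its code in this post, but a minimal sketch might look something like the following. The ScoreEmitter interface comes from the benchmark_harness package (declared locally here so the example stands alone); the choice of 4 significant digits is an assumption based on the output above, not necessarily my actual implementation:

```dart
// Sketch only: ScoreEmitter is declared locally so this example is
// self-contained; in real code it comes from package:benchmark_harness.
abstract class ScoreEmitter {
  void emit(String testName, double value);
}

// Emits scores rounded to 4 significant digits instead of the full
// 16-digit double precision.
class ProperPrecisionScoreEmitter implements ScoreEmitter {
  const ProperPrecisionScoreEmitter();

  void emit(String testName, double value) {
    print('$testName (RunTime): ${value.toStringAsPrecision(4)} µs.');
  }
}

void main() {
  const ProperPrecisionScoreEmitter()
    .emit('Classic Visitor Pattern', 1409.2387554213);
  // → Classic Visitor Pattern (RunTime): 1409 µs.
}
```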

I can live with small deviations in reported times, but the numbers that I get now indicate that the “warm-up” time for my benchmarks is woefully insufficient. I currently have three variations of the Visitor Pattern that I am benchmarking. For each of the three, I run my benchmarks three times. And, for every group, the first run takes the longest while the second two runs are very close to each other.

The idea of a warm-up run is not some brilliant insight on my part. It is baked right into Dart's benchmark_harness package:
part of benchmark_harness;

class BenchmarkBase {
  // ...
  // Runs a short version of the benchmark. By default invokes [run] once.
  void warmup() {
    run();
  }
  // ...
}
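For reference, the harness measures by invoking the benchmark repeatedly until a minimum wall-clock time has elapsed, then averaging. A rough, standalone sketch of that measureFor() logic (the real thing is a static method on BenchmarkBase, which will matter shortly, and may differ in the details):

```dart
// Sketch of the harness's measureFor() logic: call f() until at least
// [timeMinimum] milliseconds have elapsed, then report the mean time
// per call in microseconds. The real version lives as a static method
// on BenchmarkBase; this copy is just for illustration.
double measureFor(Function f, int timeMinimum) {
  var iterations = 0;
  var watch = new Stopwatch()..start();
  while (watch.elapsedMilliseconds < timeMinimum) {
    f();
    iterations++;
  }
  return watch.elapsedMicroseconds / iterations;
}

void main() {
  var score = measureFor(() {
    var total = 0.0;
    for (var i = 0; i < 1000; i++) {
      total += i * 1.5;
    }
  }, 100);
  print('Busy loop (RunTime): ${score.toStringAsPrecision(4)} µs.');
}
```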
One might be tempted to think the problem obvious: only running the benchmark run() method once is woefully inadequate. Except that the run() method in each of my three benchmark variations runs the actual code ONE MILLION times:
class VisitorBenchmark extends BenchmarkBase {
  const VisitorBenchmark() :
    super(
      "Classic Visitor Pattern",
      emitter: const ProperPrecisionScoreEmitter()
    );

  static void main() { new VisitorBenchmark().report(); }

  void run() {
    // One million iterations. (In Dart, ^ is bitwise XOR, not
    // exponentiation, so the count has to be written out.)
    for (var i = 0; i < 1000000; i++) {
      visitor.totalPrice = 0.0;
      nodes.accept(visitor);
    }
  }
}
OK. OK. It's more fun to say ONE MILLION than it is effective. But what might be effective?

The warmup() method of BenchmarkBase seems like it might fit my needs:
class VisitorBenchmark extends BenchmarkBase {
  const VisitorBenchmark() :
    super(
      "Classic Visitor Pattern",
      emitter: const ProperPrecisionScoreEmitter()
    );
  static void main() { new VisitorBenchmark().report(); }

  void warmup() {
    // One hundred million iterations (again written out, since ^ is
    // bitwise XOR in Dart).
    for (var i = 0; i < 100000000; i++) {
      visitor.totalPrice = 0.0;
      nodes.accept(visitor);
    }
  }

  void run() { /* ... */  }
}
But even at one hundred million warmup iterations, this has little effect (the stinking Visitor Pattern is just too darn efficient).

I need the warmup to run for even longer. For that, I think I'll override BenchmarkBase's measure() method:
class VisitorBenchmark extends BenchmarkBase {
  const VisitorBenchmark() :
    super(
      "Classic Visitor Pattern",
      emitter: const ProperPrecisionScoreEmitter()
    );
  // ...
  double measure() {
    setup();
    // Warmup for at least 100ms. Discard result.
    measureFor(() { this.warmup(); }, 100);
    // Run the benchmark for at least 2000ms.
    double result = measureFor(() { this.exercise(); }, 2*1000);
    teardown();
    return result;
  }
}
That will not work as-is because the measureFor() method invoked by measure() is a static method of BenchmarkBase:
$ ./tool/benchmark.dart
Unhandled exception:
Class 'VisitorBenchmark' has no instance method 'measureFor'.

NoSuchMethodError: method not found: 'measureFor'
Receiver: Instance of 'VisitorBenchmark'
Arguments: [Closure: () => dynamic, 10000]
#0      Object.noSuchMethod (dart:core-patch/object_patch.dart:45)
#1      VisitorBenchmark.measure (file:///home/chris/repos/design-patterns-in-dart/visitor/tool/benchmark.dart:44:15)
#2      BenchmarkBase.report (package:benchmark_harness/src/benchmark_base.dart:65:31)
#3      VisitorBenchmark.main (file:///home/chris/repos/design-patterns-in-dart/visitor/tool/benchmark.dart:35:53)
#4      main (file:///home/chris/repos/design-patterns-in-dart/visitor/tool/benchmark.dart:10:24)
So I have to manually resolve it:
class VisitorBenchmark extends BenchmarkBase {
  const VisitorBenchmark() :
    super(
      "Classic Visitor Pattern",
      emitter: const ProperPrecisionScoreEmitter()
    );
  // ...
  double measure() {
    setup();
    // Warm up for at least 10 seconds. Discard result.
    BenchmarkBase.measureFor(() { this.warmup(); }, 10*1000);
    // Run the benchmark for at least 2000ms.
    double result = BenchmarkBase.measureFor(() { this.exercise(); }, 2*1000);
    teardown();
    return result;
  }
}
While I am at it, I call BenchmarkBase.measureFor() with 10 seconds' worth of warm-up instead of the base 100ms. With that, I get more consistent numbers for each of my benchmark runs:
$ ./tool/benchmark.dart
Classic Visitor Pattern (RunTime): 204.5 µs.
Classic Visitor Pattern (RunTime): 203.7 µs.
Classic Visitor Pattern (RunTime): 203.7 µs.
$ ./tool/benchmark.dart
Classic Visitor Pattern (RunTime): 201.8 µs.
Classic Visitor Pattern (RunTime): 203.2 µs.
Classic Visitor Pattern (RunTime): 201.4 µs.
That is all well and good, but it may be overkill. Even with the original numbers, the comparison between the different approaches would still be valid for the first run. Even though the numbers are elevated in the first run, they are consistently elevated. All that I need to determine which, if any, approaches should be discounted for performance reasons is a consistent comparison. And, for the different approaches to Visitor, the difference is so small that any would work just as well as the others.

Still, it was good to dig into Dart's benchmark harness some more. I have a better handle on how it works—and how I might tailor it should I need to.


Day #119
