Thursday, July 24, 2014

Logarithmic Scales: For When the Winner is That Big


Up tonight: whatever breaks when I try my all-encompassing Design Patterns in Dart benchmarking suite on a different pattern. I have been building it out on the Visitor Pattern code and I think it is finally ready (more or less). So let's see what happens when I pull it back into the code for the Factory Method Pattern.

Note: the code is located at https://github.com/eee-c/design-patterns-in-dart. The current HEAD points to bec37fd41e.

What I found in the Visitor Pattern code was that the benchmark numbers depended on the number of times a particular pattern was run through a loop. This could be specific to patterns that work on node structures like the Visitor Pattern. But for all I know, different approaches like storing a list of Factories in a Map might also be influenced by the number of times through a loop. There is only one way to find out…
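To make the loop-size question concrete, here is a minimal Dart sketch of the kind of measurement involved. It assumes a toy product hierarchy: Product, Widget, Gadget, WidgetCreator and the factories map are hypothetical names, not the classes in the repository. It contrasts a classic subclass Factory Method with a Map of factory closures. It times them with Stopwatch from dart:core rather than benchmark_harness so that it runs without any packages. The real suite's warm-up and reporting are not reproduced here.

```dart
// Hypothetical product hierarchy used only for this sketch.
abstract class Product {
  String get name;
}

class Widget implements Product {
  @override
  String get name => 'widget';
}

class Gadget implements Product {
  @override
  String get name => 'gadget';
}

// Classic Factory Method: a subclass overrides the factory method.
abstract class Creator {
  Product createProduct();
}

class WidgetCreator extends Creator {
  @override
  Product createProduct() => Widget();
}

// Alternative (hypothetical): look up a factory closure by key in a Map.
final Map<String, Product Function()> factories = {
  'widget': () => Widget(),
  'gadget': () => Gadget(),
};

/// Runs [body] [loopSize] times and returns the mean microseconds per run.
double microsPerRun(int loopSize, void Function() body) {
  final stopwatch = Stopwatch()..start();
  for (var i = 0; i < loopSize; i++) {
    body();
  }
  stopwatch.stop();
  return stopwatch.elapsedMicroseconds / loopSize;
}

void main() {
  final creator = WidgetCreator();
  // Same work at different loop sizes: does the per-run cost change?
  for (final loopSize in [10, 100, 1000, 10000]) {
    final classic = microsPerRun(loopSize, () => creator.createProduct());
    final viaMap = microsPerRun(loopSize, () => factories['widget']!());
    print('$loopSize\t$classic\t$viaMap');
  }
}
```

Each output line is a loop size followed by the mean microseconds per run for the two approaches. At small loop counts these numbers are likely dominated by timer resolution and JIT warm-up, which is one plausible way loop size can skew a naive benchmark.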

I start with a little code re-organization. I think I have settled on storing the preferred solution directly in the lib directory of the particular pattern being explored while alternatives go in lib/alt. After that, I create a separate benchmark script for each approach in the tool directory:
$ tree tool
tool
├── benchmark.sh
├── benchmark.dart
├── benchmark_map_of_factories.dart
├── benchmark_mirrors.dart
└── packages -> ../packages
I also create a mostly-configuration benchmark.sh Bash script that pulls in common code to run each of those Dart scripts. And it all works pretty brilliantly, except for the two errors per script that cause each benchmark to crash:
Unhandled exception:
FileSystemException: Cannot open file, path = '/home/chris/repos/design-patterns-in-dart/factory_method/tool/packages/args/args.dart' (OS Error: No s)
#0      _rootHandleUncaughtError.<anonymous closure>. (dart:async/zone.dart:713)
...
The solution is fairly simple, though it does expose yet another gap in my thinking—this time regarding Dart Pub packages. Before moving the benchmarking code into a common package location, it was a development dependency for the Visitor Pattern code:
name: visitor_code
dev_dependencies:
  args: any
  benchmark_harness: any
Those are no longer needed directly by the pattern code, since the common benchmarking code now parses command-line arguments. But args is clearly not being pulled into this factory method "application." The factory method application depends on the common dpid_benchmarking package:
name: factory_method_code
dependencies:
  browser: any
dev_dependencies:
  dpid_benchmarking:
    path: ../packages/benchmarking
And dpid_benchmarking does depend on args:
name: dpid_benchmarking
dev_dependencies:
  args: any
  benchmark_harness: any
So why doesn't args get pulled into the application as well?

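The answer is simple, really. The args package is a development dependency of dpid_benchmarking. In other words, it will not get installed when dpid_benchmarking is used by other applications or packages (like my factory method code). As a development dependency, it is only installed when working directly with the package, that is, when developing it in isolation. The fix follows directly: args and benchmark_harness belong under dependencies rather than dev_dependencies in dpid_benchmarking's pubspec.yaml, so that Pub installs them for any package that depends on dpid_benchmarking.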

This is not completely obvious. I had not given it much thought, but somewhere in the back of my mind I expected my application's development dependency on dpid_benchmarking to pull in dpid_benchmarking's development dependencies as well.

Now that I have exposed this flawed thinking, I can correct it: development dependencies are for single-package development only. Hopefully I will remember that.
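The rule can be modeled in a few lines of Dart. This is a deliberately simplified, hypothetical sketch, not Pub's actual resolver: it ignores versions and sources, and only encodes that dev_dependencies count for the root package alone, while packages pulled in transitively contribute only their regular dependencies. The Package class, installedFor function, and beforeFix/afterFix maps are illustrative names; the package names mirror the pubspec files above.

```dart
// Hypothetical, simplified model of which packages Pub installs for a root
// package. It is NOT Pub's resolver: versions and sources are ignored.
class Package {
  final List<String> dependencies;
  final List<String> devDependencies;

  const Package({
    this.dependencies = const [],
    this.devDependencies = const [],
  });
}

/// Returns every package installed when working on [root].
Set<String> installedFor(String root, Map<String, Package> registry) {
  final rootPackage = registry[root]!;
  // Only the root package contributes its dev_dependencies.
  final pending = <String>[
    ...rootPackage.dependencies,
    ...rootPackage.devDependencies,
  ];
  final installed = <String>{};
  while (pending.isNotEmpty) {
    final name = pending.removeLast();
    if (!installed.add(name)) continue; // already visited
    // Transitive packages contribute their regular dependencies only.
    pending.addAll(registry[name]?.dependencies ?? const <String>[]);
  }
  return installed;
}

// Layout from this post: args and benchmark_harness are dev_dependencies
// of dpid_benchmarking.
final beforeFix = <String, Package>{
  'factory_method_code': const Package(
    dependencies: ['browser'],
    devDependencies: ['dpid_benchmarking'],
  ),
  'dpid_benchmarking': const Package(
    devDependencies: ['args', 'benchmark_harness'],
  ),
};

// Same layout with args and benchmark_harness as regular dependencies.
final afterFix = <String, Package>{
  ...beforeFix,
  'dpid_benchmarking': const Package(
    dependencies: ['args', 'benchmark_harness'],
  ),
};

void main() {
  print(installedFor('factory_method_code', beforeFix));
  print(installedFor('factory_method_code', afterFix));
}
```

With the layout from this post, args never reaches factory_method_code. With args and benchmark_harness listed as regular dependencies of dpid_benchmarking, the model installs both.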

That issue aside, everything else just works. I get nice graphs that suggest the number of times a Factory Method implementation is used has no impact on its performance:



It may be a little hard to see, but there is a bar for the "classic" subclass implementation of the Factory Method pattern; it is just really small compared to the other two approaches. To make it visible, I tell gnuplot to plot the y axis logarithmically:
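# Clustered bar chart: one cluster per loop size, one bar per implementation
set style data histogram
set style histogram clustered
set style fill solid border
set xtics rotate out
set key tmargin
set xlabel "Loop Size"
set ylabel "µs per run"

# Log-scale the y axis so the much smaller bar stays visible
set logscale y

# Column 1 holds the loop sizes (x tic labels); columns 2-4 hold one approach
# each, titled from the header row. `filename` is not set in this script and
# must be defined beforehand (for example with gnuplot -e "filename='...'").
plot for [COL=2:4] filename using COL:xticlabels(1) title columnheader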
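Why the log axis helps, as a short worked relation. On a linear axis a bar's height is proportional to its value, so a bar 1000 times smaller than its neighbor (an illustrative ratio, not one of the measured results) is only 0.1% as tall. With `set logscale y`, whose default base is 10, gnuplot positions values by their logarithm, so the gap between two bars depends only on their ratio:

```latex
\begin{align*}
  \text{linear axis:}\quad & \frac{h_1}{h_2} = \frac{y_1}{y_2} \\
  \text{log axis:}\quad & \log_{10} y_2 - \log_{10} y_1 = \log_{10}\frac{y_2}{y_1} \\
  \text{e.g. } \frac{y_2}{y_1} = 1000:\quad & \frac{h_1}{h_2} = 10^{-3},
    \qquad \log_{10} 1000 = 3
\end{align*}
```

A thousandfold gap becomes three decades on the axis, which keeps both bars readable on one chart.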
Plotting with the logarithmic y axis gives me:



Any way you look at it, the classic subclass approach is the clear performance winner. The other approaches may have maintainability advantages, but those would have to be significant to justify choosing them over the subclass approach. The same holds even when compiled to JavaScript:



I think that will close out my initial research into object-oriented design patterns for the book. I may investigate some concurrency patterns tomorrow. Or it may be time to switch back to Patterns in Polymer.


Day #132
