Monday, October 26, 2009

"Newbie" Feedback

I recently had the privilege to supply the challenge for the second ever Ruby Programming Challenge For Newbies. I thought it pretty cool that the challenge provoked 40+ "newbies" to submit responses. As one might expect, there was some rough Ruby, but very few butcherings of the language.

In the spirit of the code review, I tried to provide constructive feedback to all participants. Following is a summary of some common suggestions that I had to offer...

Bang Methods

Method parameters are mutable in Ruby. If you modify the value of the parameter inside a method, the original value changes. This can lead to all sorts of unexpected side-effects. There are some exceptions to this, but it is best to take no chances and treat it as a rule.

This can be a bit frightening at first, but Ruby has a convention of adding a bang (!) to the end of a method name when side-effects are expected. Many of the core methods follow this convention. For instance consider the difference between map and map!:
a = [1, 2, 3]

# Double every value in the array
>> a.map{|x| x * 2}
=> [2, 4, 6]

# The original value has not changed
>> a
=> [1, 2, 3]

# Double every value in the array with a bang method
>> a.map!{|x| x * 2}
=> [2, 4, 6]

# The original value is changed
>> a
=> [2, 4, 6]
The problem requested a solution with a method signature of average_time_of_day(times), where the times parameter was list of strings (e.g. %w{6:41am 6:51am 7:01am}). Since the method does not end with a bang, it should not modify the input (e.g. by calling a bang method like map!). A third of the submissions modified the times array inside that method—or worse, in other methods that were called by average_time_of_day.

Since average_time_of_day was the entirety of the challenge, modifying the input was not a huge problem. Still, not a good habit to start.

Enumerable

Since the challenge involved averaging times, most solutions involved summing up these times. A typical solution looked something like this:
sum = 0
i = 0
while i < times.length
time = times[i]
# parse / manipulate the time
sum = sum + time
i = i + 1
end
It is nice that Ruby provides syntax like for and while, making the transition from other languages so easy. There is a cleaner way to accomplish this in Ruby:
sum = 0
times.each do |time|
# parse / manipulate the time
parsed_time = #....
sum = sum + parsed_time
end
Eschewing while in favor of an each block eliminates the need for the tracking variable, i. With the assignment and computation of the tracking variable eliminated, there is less to distract from the actual intent of the code. Less code makes intent clearer.

The only thing still getting in the way of intent is the accumulator variable, sum. It is being initialized outside of the each block and manipulated inside the block. If you think of a block like a function, then this is akin to modifying a global variable inside a function.

Ruby has an inject method (also known as reduce and fold in Ruby 1.8.7+) to handle this:
sum = times.inject(0) do |accumulator, time|
# Manipulate the time somehow
accumulator + time
end
An inject starts with a seed value for an accumulator (sometimes called a memo). As inject iterates over each value, the last value in the block is used to seed the accumulator the next iteration. The end result of the iteration is the accumulation of each of these values. In the case of the code snippet above, it is the summation of the times.

But wait, there's more! It is possible to compact this even further...

Chaining

If you are not using bang methods, then you can chain iterators together. Let's say your solution involved mapping time strings to time objects and then sorting them, this can be accomplished via:
def average_time_of_day(times)
sorted_times = times.
map { |t| Time.parse(t) }.
sort
# Do more stuff ...
end
The sorted_times local variable now contains, well... sorted times. The time strings were mapped to actual time objects, which could then be sorted. That's just lovely!

Going back to the accumulator example, it is possible to chain together iterator to make intent crystal clear:
sum = times.
map { |str| Time.parse(str) }.
inject { |sum, t| sum + t }
It is possible to achieve even greater clarity through symbol to proc shorthand:
class String
def to_time
Time.parse(self)
end
end

def average_time_of_day(times)
sum = times.map(&:to_time).inject(&:+)
# Calculate the average ...
end
The theory behind symbol-to-proc is probably a bit beyond "newbie" scope, but still worth seeing to give an idea how close to the domain Ruby allows you to get. In that last code snippet, I am taking the times strings, mapping them to Time instances (with a monkey patch to String) and adding them all together. Intent does not get much clearer than that.

Read the documentation on Ruby's Enumerable module. Even seasoned Rubyists can benefit from re-reading it regularly.

Test Harness

Some will claim that the Ruby community is too "test obsessed". There are very good reasons to embrace TATFT / BDD, etc. I could write for months on why this is a good idea. In fact I did. But this is not one of those times.

Using a test harness for an algorithm like average_time_of_day makes too much sense not to use it. Many of the submissions were clearly iterating toward a solution. The author got it working for the simple case, then tried to get it working crossing midnight—only to break the simple case. Many authors never noticed that the simple case was no longer working and submitted half working solutions.

As soon as you have a simple solution working, write a test (if not sooner!):
def average_time_of_day(times)
# incredibly simplistic solution
end

require 'test/unit'

class TestAverageTimeOfDay < Test::Unit::TestCase
def test_simple_times
assert_equal('6:51am', average_time_of_day(%w{6:41am 6:51am 7:01am}))
end
end
If you had saved this file as average_time_of_day.rb, you could execute this test after each change to the average_time_of_day method thusly:
jaynestown% ruby average_time_of_day.rb
Loaded suite average_time_of_day
Started
.

1 test, 1 assertion, 0 failures, 0 errors
Ruthlessly running this command as the average_time_of_day method evolves will tell you immediately when something is broken. The sooner you know something is wrong, the sooner you can address it.

Don't test because a segment of the Ruby community forces you to, do it because it is so damn easy there is no reason not do.

Conclusion

That about sums up my feedback to the "newbies". I really did enjoy providing the challenge and very much appreciated so many quality solutions. I hope most of you continue your learning and I look forward to seeing some of your names driving the future of Ruby!

18 comments:

  1. Great post!

    I think blocks are also difficult to understand.

    ReplyDelete
  2. "Don't test because a segment of the Ruby community forces you to, do it because it is so damn easy there is no reason not do."

    As a ruby newbie who's just started playing with test/unit, this seems to be a perfectly valid statement.

    Thanks for the feedback and the challenge.

    ReplyDelete
  3. Automate things. Write test and have them executed in the background everytime you save a file. What might seem like more work in the beginning, turns out to be less work in the end.

    ReplyDelete
  4. Felipe: Dunno if this helps or not, but I didn't really grok blocks until I coded the equivalent in prototype.js (which mimics much of Ruby in Javascript).

    Consider the "doubler" mapping in Ruby: [1, 2, 3].map{ |x| x * 2 }

    The equivalent in prototype.js: [1, 2, 3].map(function(x) { return x * 2 })

    Looking at it this way, I realized that Ruby blocks are just anonymous functions with less verbosity.

    There are some semantic differences between blocks and methods in Ruby, but 90% of the time, you can think of them as equivalent. Of course, you can do some interesting stuff with the other 10%, but that's something to look forward to :)

    ReplyDelete
  5. I love the comment you made above about how close Ruby allows you to get to the domain of the problem and clearly express the intent.

    The newbie challenge was a very interesting problem and one that I actually spent three hours on just trying to do the math than actually coding.

    Once I finally had the math figured out, the Ruby came almost effortlessly. I spent about 15 minutes coding and testing. But the remarkable thing is that the solution decidedly took the exact shape of the hand-written worksheet where I solved the problem mathematically. Powerful stuff!

    ReplyDelete
  6. I can't get the tests to run without placing them in a class which is a subclass of Test::Unit::TestCase. Is it possible to run test/unit tests without doing so? Could you provide a gist? Thanks!

    Also, test_simple_times is missing a paren.

    ReplyDelete
  7. nstielau - Thanks! Two copy & paste errors in three lines of code is not a good ratio :-/

    You are correct, the test cases do need to be a sub-class of Test::Unit::TestCase (corrected above). Not sure how I missed that, but thanks for pointing it out!

    ReplyDelete
  8. In the symbol-to-proc blog posting you mentioned, there is a comment that the symbol-to-proc method was over 3x slower than the {|i| i.to_time} version. I don't know if this is still the case (2009 vs 2006), but its worth considering.

    Isn't symbol-to-proc essentially syntactic sugar, or is there a more substantial / less obvious benefit that might outweigh the slower performance?

    ReplyDelete
  9. Pete - There is no overhead for symbol-to-proc... at least not in Ruby 1.8.7+. The compiler remembers the initial symbol-to-proc conversion and uses it in subsequent iterations.

    So you can have clearer intent without worry of performance penalty.

    ReplyDelete
  10. Hi Chris,
    Great document - a must read for all newbies. After reading the inject, chaining and symbol-to-proc -- I do consider myself a newbie again ! :D

    ReplyDelete
  11. There is a subtle issue with your time summing up: in case the array is empty you get nil and not 0 (omitting the map here):

    irb(main):006:0> [].inject(&:+)
    => nil

    You rather want this form:

    irb(main):007:0> [].inject(0, &:+)
    => 0

    ReplyDelete
  12. I don't believe the remark about adding a ! to average_time_of_day is quite correct. To begin with, that particular method shouldn't have any side effects anyway. If you really want to manipulate an object passed into a method, call #dup on it first. The second thing is that adding a ! to methods that change data only really applies to situations where there is also a method without ! that works on/returns a copy. There are exceptions, of course, supposedly for "dangerous" situations. But suffixing ! to every method that changes data would be a bit excessive.

    Excellent advice about testing that method. Using TDD on functions makes it so easy, I always feel it's practically cheating.

    ReplyDelete
  13. Mark -- I agree. I would not suggest adding a bang to average_time_of_day. I think it more appropriate to leave it side-effect free and not use any bang methods inside it. The rest of your point is well taken.

    ReplyDelete
  14. In the interest of correcting mistakes... In Ruby parameters are NOT passed by reference. They are passed by value. You are confusing the concepts of immutability and pass by reference.

    In pass by reference, the formal parameter becomes an *alias* for the actual argument in the method body and so any changes made inside the method will be reflected afterwards.

    Simple proof of Ruby's not pass-by-reference-ness:

    irb(main):018:0> def changeme(a)
    irb(main):019:1> a = 10
    irb(main):020:1> end
    => nil
    irb(main):021:0> b = 20
    => 20
    irb(main):022:0> changeme(b)
    => 10
    irb(main):023:0> b
    => 20

    Under pass-by-reference semantics, b's value would be 10. But in Ruby it clearly isn't, it's still 20.

    However, Ruby *is* an impure or mutable language (unlike Haskell, Erlang, etc..) and so does have side effects. This means that if a method body is given a copy of a reference to an object, and makes a change to that object, that too will be reflected after the method call. It is for these types of methods that Ruby uses the bang operator.

    The reasons this is confusing to many people is that in Ruby everything is an object. But it's still pass by value, you're just passing around references to objects. Subtle, but different. (In fact, it's the exact same function passing semantics as Java minus Java's primitive types).

    ReplyDelete
  15. El Popa - Agreed. You are completely right. I've updated the post accordingly. Thanks!

    ReplyDelete
  16. hey Chris,

    I was wondering about performance when you chain things like x.map(blah).inject(blah).
    I know it is really practical and cool to use and nice to read but aren't we adding up lots of avoidable iterations(comparing with using a simple while loop) slowing down the code?

    ReplyDelete
  17. Nice post... Now I wish I had done the exercise before reading the post to see how close we were :)

    ReplyDelete
  18. # A method that generates a side effect:
    def changeme!(a)
    puts a.object_id
    a[3] = "yep"
    end

    b = {}
    b.object_id # ==>12860680
    changeme!(b)
    12860680
    b # ==>{3=>"yep"}

    # A method with no side effect:
    def changeme(a)
    puts a.object_id
    a = "rudolph"
    puts a.object_id
    end
    ==>nil
    b = {1=>2}
    ==>{1=>2}
    b.object_id
    ==>12535460
    changeme(b)
    12535460
    12459832
    ==>nil
    b
    ==>{1=>2}
    b.object_id
    ==>12535460

    ReplyDelete