Sunday, September 12, 2010

It Turns Out You Can Read the Node.js Source Code


While working with the upcoming v0.5 branch of fab.js, I came across what seemed to be a bug (or undocumented regression) in node.js. Specifically, url.parse stopped parsing URLs without protocol schemes:
node> require('url').parse('//localhost:8124/foo')
{ href: '//localhost:8124/foo'
, pathname: '//localhost:8124/foo'
}
I worked around the issue by prepending "http:" to the URL and, doing the right thing, submitted a pull request to Jed.

Case closed. Or so I thought.

Jed closed the issue, not by accepting my patch, but by adding a third parameter to url.parse:
node> require('url').parse('//localhost:8124/foo', false, true)
{ href: '//localhost:8124/foo'
, slashes: true
, host: 'localhost:8124'
, port: '8124'
, hostname: 'localhost'
, pathname: '/foo'
}
The second parameter is documented, but what is that third parameter? Maybe a better question is: how did Jed know about it? I googled for any information on url.parse and looked through the changelog... and saw nothing.

At this point, I could let it go and move on with my exploration of fab.js, but I have to know. How did Jed know? I'll ask if I cannot figure it out myself, but first, I do something I have never done before: ack through the node.js source.

I surely should have done this before, but as far as I know, node.js is a bunch of C code wrapped around the v8 engine (I am content to let my C skills wither). But this is my problem—"as far as I know".

In reality, there is plenty of JavaScript in the node.js source:
cstrom@whitefall:~/src/node-v0.2.1$ ls lib
assert.js dns.js http.js readline.js url.js
buffer.js events.js net.js repl.js utils.js
child_process.js file.js path.js string_decoder.js
crypto.js freelist.js posix.js sys.js
dgram.js fs.js querystring.js tcp.js
Damn. I cannot believe that I have spent this much time trying to improve my JavaScript skills without even bothering to check if there was something in node.js that might help! Pretty dumb. I must make a mental note to always look through framework source code in the future, even if I suspect that I do not care for the language.

For now, I get started with my node.js exploration by reading through url.js. In there I find the answer to the new behavior:
function urlParse (url, parseQueryString, slashesDenoteHost) {

  // ...

  // figure out if it's got a host
  // user@server is *always* interpreted as a hostname, and url
  // resolution will treat //foo/bar as host=foo,path=bar because that's
  // how the browser resolves relative URLs.
  if (slashesDenoteHost || proto || rest.match(/^\/\/[^@\/]+@[^@\/]+/)) {
    var slashes = rest.substr(0, 2) === "//";
    if (slashes && !(proto && hostlessProtocol[proto])) {
      rest = rest.substr(2);
      out.slashes = true;
    }
  }

  // ...
};
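To convince myself of how that condition behaves, here is a simplified, standalone sketch of just the check. It is not node's code: the function name stripsLeadingSlashes and the contents of the hostlessProtocol table are assumptions made up for illustration. Only the shape of the condition comes from the excerpt above.

```javascript
// Hypothetical stand-in for the hostlessProtocol table in url.js.
// The real contents may differ; this entry is only for illustration.
const hostlessProtocol = { 'javascript:': true };

// Returns true when the leading "//" would be stripped and the next
// token treated as a host. Mirrors the condition in the excerpt, nothing more.
function stripsLeadingSlashes(rest, proto, slashesDenoteHost) {
  // "//user@server" is always treated as a host, even with no protocol.
  const looksLikeAuth = /^\/\/[^@\/]+@[^@\/]+/.test(rest);

  if (slashesDenoteHost || proto || looksLikeAuth) {
    const hasSlashes = rest.startsWith('//');
    const hostless = Boolean(proto && hostlessProtocol[proto]);
    return hasSlashes && !hostless;
  }
  return false;
}

// No protocol, no flag: the slashes are left alone.
console.log(stripsLeadingSlashes('//localhost:8124/foo', undefined, false)); // false
// Same input with the flag set: the slashes now denote a host.
console.log(stripsLeadingSlashes('//localhost:8124/foo', undefined, true));  // true
```

With no protocol and no user@ portion, the third argument is the only thing that gets a bare //localhost:8124/foo past the outer if.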
The commit that introduced the new behavior indicates that the change was made so that url.parse behaves as expected in the most common use cases. The nice thing is that the behavior of url.parse(), when the third argument is true, does not change between node 0.2.0 and 0.2.1. Thus, I can update my table from yesterday:
node 0.2.0:

node> require('url').parse('/foo')
{ href: '/foo', pathname: '/foo' }

node> require('url').parse('//localhost:8124/foo')
{ href: '//localhost:8124/foo'
, slashes: true
, host: 'localhost:8124'
, port: '8124'
, hostname: 'localhost'
, pathname: '/foo'
}

node> require('url').parse('http://localhost:8124/foo')
{ href: 'http://localhost:8124/foo'
, protocol: 'http:'
, slashes: true
, host: 'localhost:8124'
, port: '8124'
, hostname: 'localhost'
, pathname: '/foo'
}

node> require('url').parse('//localhost:8124/foo', false, true)
{ href: '//localhost:8124/foo'
, slashes: true
, host: 'localhost:8124'
, port: '8124'
, hostname: 'localhost'
, pathname: '/foo'
}

node 0.2.1:

node> require('url').parse('/foo')
{ href: '/foo', pathname: '/foo' }

node> require('url').parse('//localhost:8124/foo')
{ href: '//localhost:8124/foo'
, pathname: '//localhost:8124/foo'
}

node> require('url').parse('http://localhost:8124/foo')
{ href: 'http://localhost:8124/foo'
, protocol: 'http:'
, slashes: true
, host: 'localhost:8124'
, port: '8124'
, hostname: 'localhost'
, pathname: '/foo'
}

node> require('url').parse('//localhost:8124/foo', false, true)
{ href: '//localhost:8124/foo'
, slashes: true
, host: 'localhost:8124'
, port: '8124'
, hostname: 'localhost'
, pathname: '/foo'
}
And that is why Jed wisely opted to make use of the third argument instead of my patch.
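For comparison, here is a minimal sketch that puts the two approaches side by side. It assumes a node version in which url.parse accepts the third argument (0.2.1 or later); the variable names are mine and are not taken from fab.js.

```javascript
const url = require('url');

const input = '//localhost:8124/foo';

// Default call: with no scheme, the whole string is treated as a path.
const plain = url.parse(input);

// The workaround from my patch: prepend a scheme before parsing.
// This works, but the result now claims a protocol the input never had.
const prefixed = url.parse('http:' + input);

// Jed's approach: pass true as the third argument (slashesDenoteHost)
// so the leading "//" marks a host without inventing a protocol.
const withFlag = url.parse(input, false, true);

console.log(plain.pathname);                      // '//localhost:8124/foo'
console.log(prefixed.protocol, prefixed.host);    // 'http:' 'localhost:8124'
console.log(withFlag.host, withFlag.pathname);    // 'localhost:8124' '/foo'
```

Both approaches recover the host, but only the third argument leaves the original URL, and its href, untouched.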

Good stuff. I am extremely disappointed in myself for never even bothering to look through the node.js source code. There is a wealth of learning in there that I so far ignored—for no good reason. Happily, I know about it now, so I can start making up for lost time.


Day #224
