Saturday, September 11, 2010

URL Parsing in Node.js 0.2.0/0.2.1


I nearly finished my investigation into the new hotness that will be v0.5 fab.js last night. Up tonight, I plan to investigate the built-in HTML templating tools in the upcoming fab.js.

My initial thought on the new templating is that it could be very useful for small pages, but might get painful for more involved stuff. I have a medium sized HTML page in my (fab) game that ought to be a good starting point.

Unfortunately, when I try to access the server, I get nothing but 404s. Geez. I thought this worked the last time I was working with it. As far as git status is concerned, I have not made any changes. Could it be the new version (0.2.1) of node.js that I installed last night?

Ugh. Looks like there will be no investigation of HTML in fab.js tonight.

I begin troubleshooting with curl:
cstrom@whitefall:~$ curl -i http://localhost:4011/javascript/raphael.js
HTTP/1.1 404 Not Found
Content-Type: text/html
Connection: keep-alive
Transfer-Encoding: chunked

<p>ENOENT, No such file or directory './html//localhost:4011/javascript/raphael.js.html'</p>
Whoa. That is odd. It should be looking for just the raphael.js file, but it looks like it is trying to find /localhost:4011/javascript/raphael.js.html—everything after the HTTP protocol.

I add some spy code to see what the request URL is:
          console.log(inspect(head));
Checking the backend, I find:
{ method: 'GET'
, headers:
{ 'user-agent': 'curl/7.19.7 (i486-pc-linux-gnu) libcurl/7.19.7 OpenSSL/0.9.8k zlib/1.2.3.3 libidn/1.15'
, host: 'localhost:4011'
, accept: '*/*'
}
, url:
{ href: '//localhost:4011/javascript/raphael.js'
, pathname: '//localhost:4011//raphael.js'
, capture: []
}
}
Pathname is "//localhost:4011//raphael.js"? What is going on there?

I find it hard to believe that a regression in node.js could have occurred, but I will start there. I set up the basic hello world app with some console debug statements to see what the URL and pathname are:
#!/usr/bin/env node

var http = require('http');

var inspect = require("sys").inspect;

http.createServer(function (request, response) {
  console.log(request.url);
  console.log(require('url').parse(request.url).pathname);

  response.writeHead(200, {'Content-Type': 'text/plain'});
  response.end('Hello World\n');
}).listen(8124);

console.log('Server running at http://127.0.0.1:8124/');
I then access http://localhost:8124/foo via curl and see the following in the server output:
cstrom@whitefall:~/tmp$ ./test.js
Server running at http://127.0.0.1:8124/
/foo
/foo
OK. That seems to work just fine.

Looking back in the fab.js code, I see that the head value is being set thusly:
{
  method: req.method,
  headers: req.headers,
  url: url.parse( "//" + req.headers.host + req.url )
}
Hunh. That seems like a strange URL to ask url.parse to parse. It also looks very much like the bogus pathname that I am seeing. I do believe that I am on to something here. I log the same expression in my test node.js server:
http.createServer(function (request, response) {
  console.log(request.url);
  console.log(require('url').parse(request.url).pathname);
  console.log(require('url').parse( "//" + request.headers.host + request.url ));

  response.writeHead(200, {'Content-Type': 'text/plain'});
  response.end('Hello World\n');
}).listen(8124);
... and, sure enough, I get that odd-looking URL:
{ href: '//localhost:8124/foo'
, pathname: '//localhost:8124/foo'
}
Switching back to node 0.2.0, I see this instead:
{ href: '//localhost:8124/foo'
, slashes: true
, host: 'localhost:8124'
, port: '8124'
, hostname: 'localhost'
, pathname: '/foo'
}
Aha! So something has changed between node 0.2.0 and 0.2.1: URLs that start with "//" and carry no protocol scheme are no longer split into host and pathname.
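To make the comparison repeatable, a small script along these lines could be run under each node version. This is only a sketch: the `samples` list and the `pathnames` helper are names made up for illustration, and the only library call used is `url.parse`, the same one shown above.

```javascript
// Sketch: report the pathname that url.parse extracts for each URL form.
// `samples` and `pathnames` are illustrative names, not part of node or fab.js.
var url = require('url');

var samples = [
  '/foo',                        // path only
  '//localhost:8124/foo',        // no scheme, the form fab.js builds
  'http://localhost:8124/foo'    // full URL with an HTTP scheme
];

function pathnames(list) {
  return list.map(function (input) {
    return { input: input, pathname: url.parse(input).pathname };
  });
}

pathnames(samples).forEach(function (result) {
  console.log(result.input + ' -> ' + result.pathname);
});
```

Running the same file under both versions makes it easy to spot which of the three forms changed.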

A side-by-side comparison:
node 0.2.0:
node> require('url').parse('/foo')
{ href: '/foo', pathname: '/foo' }

node> require('url').parse('//localhost:8124/foo')
{ href: '//localhost:8124/foo'
, slashes: true
, host: 'localhost:8124'
, port: '8124'
, hostname: 'localhost'
, pathname: '/foo'
}

node> require('url').parse('http://localhost:8124/foo')
{ href: 'http://localhost:8124/foo'
, protocol: 'http:'
, slashes: true
, host: 'localhost:8124'
, port: '8124'
, hostname: 'localhost'
, pathname: '/foo'
}

node 0.2.1:
node> require('url').parse('/foo')
{ href: '/foo', pathname: '/foo' }

node> require('url').parse('//localhost:8124/foo')
{ href: '//localhost:8124/foo'
, pathname: '//localhost:8124/foo'
}

node> require('url').parse('http://localhost:8124/foo')
{ href: 'http://localhost:8124/foo'
, protocol: 'http:'
, slashes: true
, host: 'localhost:8124'
, port: '8124'
, hostname: 'localhost'
, pathname: '/foo'
}
I would hazard a guess that Jed opted for the odd URL so as not to presume a protocol scheme. Since that works fine with node.js 0.2.0, but not 0.2.1, this no longer seems to be a viable option. Since the behavior with an HTTP scheme is identical between the two versions, I opt for that in my fork of fab.js v0.5:
{
  method: req.method,
  headers: req.headers,
  url: url.parse( "http://" + req.headers.host + req.url )
}
And just like that, my (fab) game is working with node.js 0.2.1.
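For reference, here is a minimal, standalone sketch of the two ways to get a sane pathname back. The names `buildHead`, `parseSchemeless` and `fakeReq` are hypothetical and do not come from fab.js. The second function keeps the scheme-less URL and relies on the third argument to `url.parse`, which a commenter below suggests as the way to get the old behavior; I am assuming it works that way in 0.2.1 on the strength of that comment.

```javascript
// Sketch only: buildHead, parseSchemeless and fakeReq are made-up names.
var url = require('url');

// Option 1: prepend an explicit HTTP scheme, as in my fork.
function buildHead(req) {
  return {
    method: req.method,
    headers: req.headers,
    url: url.parse('http://' + req.headers.host + req.url)
  };
}

// Option 2: keep the scheme-less form, but pass true as the third
// argument so that the text after a leading "//" is treated as the host.
function parseSchemeless(req) {
  return url.parse('//' + req.headers.host + req.url, false, true);
}

// A stand-in for an incoming request, carrying only the fields used above.
var fakeReq = {
  method: 'GET',
  headers: { host: 'localhost:4011' },
  url: '/javascript/raphael.js'
};

console.log(buildHead(fakeReq).url.pathname);
console.log(parseSchemeless(fakeReq).pathname);
```

The first option presumes HTTP, which is exactly what the original code seemed to be avoiding. The second does not, at the cost of leaning on a less obvious argument.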

OK, tomorrow I will fiddle with the HTML (fab) apps.


Day #223

2 comments:

  1. I rather prefer the 0.2.0 way. It's been used to great effect in Ruby apps, and browsers interpret it as "same protocol, different everything else."

    I might have to file a bug.

  2. it looks like this behavior is intentional:

    http://github.com/ry/node/commit/0e311717b5bbec7f3fbf0b06592bfa19cc11becc

    i'm not sure i agree, but you can use the old method by using url.parse( url, false, true ).
