Thursday, May 19, 2011

It Even Works in Python


I have effectively hit a dead end trying to get Ruby to decompress multiple SPDY / zlib packets. It works fine in node.js, but I cannot translate that knowledge into working Ruby code. So today, I will try working through Python code.

First up, I verify that the Python example code still works (it is somewhat old). I copy the packet data from my node-spdy work into a Python REPL and use the Decompressor class from the example code:
>>> d = Decompressor("optionsgetheadpostputdeletetraceacceptaccept-charsetaccept-encodingaccept-languageauthorizationexpectfromhostif-modified-sinceif-matchif-none-matchif-rangeif-unmodifiedsincemax-forwardsproxy-authorizationrangerefererteuser-agent100101200201202203204205206300301302303304305306307400401402403404405406407408409410411412413414415416417500501502503504505accept-rangesageetaglocationproxy-authenticatepublicretry-afterservervarywarningwww-authenticateallowcontent-basecontent-encodingcache-controlconnectiondatetrailertransfer-encodingupgradeviawarningcontent-languagecontent-lengthcontent-locationcontent-md5content-rangecontent-typeetagexpireslast-modifiedset-cookieMondayTuesdayWednesdayThursdayFridaySaturdaySundayJanFebMarAprMayJunJulAugSepOctNovDecchunkedtext/htmlimage/pngimage/jpgimage/gifapplication/xmlapplication/xhtmltext/plainpublicmax-agecharset=iso-8859-1utf-8gzipdeflateHTTP/1.1statusversionurl\x00")
>>> d1 = [0x38,0xea,0xdf,0xa2,0x51,0xb2,0x62,0xe0,0x62,0x60,0x83,0xa4,0x17,0x06,0x7b,0xb8,0x0b,0x75,0x30,0x2c,0xd6,0xae,0x40,0x17,0xcd,0xcd,0xb1,0x2e,0xb4,
0x35,0xd0,0xb3,0xd4,0xd1,0xd2,0xd7,0x02,0xb3,0x2c,0x18,0xf8,0x50,0x73,0x2c,0x83,0x9c,0x67,0xb0,0x3f,0xd4,0x3d,0x3a,0x60,0x07,0x81,0xd5,0x99,0xeb,0x40,0xd4,
0x1b,0x33,0xf0,0xa3,0xe5,0x69,0x06,0x41,0x90,0x8b,0x75,0xa0,0x4e,0xd6,0x29,0x4e,0x49,0xce,0x80,0xab,0x81,0x25,0x03,0x06,0xbe,0xd4,0x3c,0xdd,0xd0,0x60,0x9d,
0xd4,0x3c,0xa8,0xa5,0x2c,0xa0,0x3c,0xce,0xc0,0x0f,0x4a,0x08,0x39,0x20,0xa6,0x15,0x30,0xe3,0x19,0x18,0x30,0xb0,0xe5,0x02,0x0b,0x97,0xfc,0x14,0x06,0x66,0x77,
0xd7,0x10,0x06,0xb6,0x62,0x60,0x7a,0xcc,0x4d,0x65,0x60,0xcd,0x28,0x29,0x29,0x28,0x66,0x60,0x06,0x79,0x9c,0x51,0x9f,0x81,0x0b,0x91,0x5b,0x19,0xd2,0x7d,0xf3,
0xab,0x32,0x73,0x72,0x12,0xf5,0x4d,0xf5,0x0c,0x14,0x34,0x00,0x8a,0x30,0x34,0xb4,0x56,0xf0,0xc9,0xcc,0x2b,0xad,0x50,0xa8,0xb0,0x30,0x8b,0x37,0x33,0xd1,0x54,
0x70,0x04,0x7a,0x3e,0x35,0x3c,0x35,0xc9,0x3b,0xb3,0x44,0xdf,0xd4,0xd8,0x44,0xcf,0x18,0xa8,0xcc,0xdb,0x23,0xc4,0xd7,0x47,0x47,0x21,0x27,0x33,0x3b,0x55,0xc1,
0x3d,0x35,0x39,0x3b,0x5f,0x53,0xc1,0x39,0x03,0x58,0xec,0xa4,0xea,0x1b,0x1a,0xe9,0x01,0x7d,0x6a,0x62,0x04,0x52,0x16,0x9c,0x98,0x96,0x58,0x94,0x09,0xd5,0xc4,
0xc0,0x0e,0x0d,0x7c,0x06,0x0e,0x58,0x9c,0x00,0x00,0x00,0x00,0xff,0xff]
>>> struct.pack('BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB', *d1)
"8\xea\xdf\xa2Q\xb2b\xe0b`\x83\xa4\x17\x06{\xb8\x0bu0,\xd6\xae@\x17\xcd\xcd\xb1.\xb45\xd0\xb3\xd4\xd1\xd2\xd7\x02\xb3,\x18\xf8Ps,\x83\x9cg\xb0?\xd4=:`\x07\x81\xd5\x99\xeb@\xd4\x1b3\xf0\xa3\xe5i\x06A\x90\x8bu\xa0N\xd6)NI\xce\x80\xab\x81%\x03\x06\xbe\xd4<\xdd\xd0`\x9d\xd4<\xa8\xa5,\xa0<\xce\xc0\x0fJ\x089 \xa6\x150\xe3\x19\x180\xb0\xe5\x02\x0b\x97\xfc\x14\x06fw\xd7\x10\x06\xb6b`z\xccMe`\xcd())(f`\x06y\x9cQ\x9f\x81\x0b\x91[\x19\xd2}\xf3\xab2sr\x12\xf5M\xf5\x0c\x144\x00\x8a04\xb4V\xf0\xc9\xcc+\xadP\xa8\xb00\x8b73\xd1Tp\x04z>5<5\xc9;\xb3D\xdf\xd4\xd8D\xcf\x18\xa8\xcc\xdb#\xc4\xd7GG!'3;U\xc1=59;_S\xc19\x03X\xec\xa4\xea\x1b\x1a\xe9\x01}jb\x04R\x16\x9c\x98\x96X\x94\t\xd5\xc4\xc0\x0e\r|\x06\x0eX\x9c\x00\x00\x00\x00\xff\xff"
>>> d(struct.pack('BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB', *d1))
'\x00\n\x00\x06accept\x00?text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\x00\x0eaccept-charset\x00\x1eISO-8859-1,utf-8;q=0.7,*;q=0.3\x00\x0faccept-encoding\x00\x11gzip,deflate,sdch\x00\x0faccept-language\x00\x0een-US,en;q=0.8\x00\x04host\x00\x0flocalhost:10000\x00\x06method\x00\x03GET\x00\x06scheme\x00\x05https\x00\x03url\x00\x01/\x00\nuser-agent\x00gMozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.30 (KHTML, like Gecko) Chrome/12.0.742.30 Safari/534.30\x00\x07version\x00\x08HTTP/1.1'
>>> d2 = [0x42,0x8a,0x02,0x66,0x60,0x60,0x0e,0xad,0x60,0xe4,0xd1,0x4f,0x4b,0x2c,0xcb,0x04,0x66,0x33,0x3d,0x20,0x31,0x58,0x42,0x14,0x00,0x00,0x00,0xff,0xff]
>>> struct.pack('BBBBBBBBBBBBBBBBBBBBBBBBBBBBB', *d2)
'B\x8a\x02f``\x0e\xad`\xe4\xd1OK,\xcb\x04f3= 1XB\x14\x00\x00\x00\xff\xff'
>>> d(struct.pack('BBBBBBBBBBBBBBBBBBBBBBBBBBBBB', *d2))
'\x00\n\x00\x06accept\x00\x03*/*\x00\x0eaccept-charset\x00\x1eISO-8859-1,utf-8;q=0.7,*;q=0.3\x00\x0faccept-encoding\x00\x11gzip,deflate,sdch\x00\x0faccept-language\x00\x0een-US,en;q=0.8\x00\x04host\x00\x0flocalhost:10000\x00\x06method\x00\x03GET\x00\x06scheme\x00\x05https\x00\x03url\x00\x0c/favicon.ico\x00\nuser-agent\x00gMozilla/5.0 (X11; Linux x86_64) AppleWebKit/534.30 (KHTML, like Gecko) Chrome/12.0.742.30 Safari/534.30\x00\x07version\x00\x08HTTP/1.1'
OK, so now I can decompress two SPDY packets in node.js and Python. Grrr... Ruby: you are making me so mad.
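For comparison, the same two-packet pattern can be sketched with nothing but the standard library's zlib module. This is not the ctypes Decompressor used above. It is a minimal Python 3 round trip, assuming a short made-up dictionary (`DICTIONARY`) and made-up header payloads, with hypothetical helper names `make_packets` and `read_packets`. The point it illustrates is that every packet after the first only makes sense on the same inflate stream that read the first one.

```python
import zlib

# Hypothetical stand-in for the SPDY header dictionary (the real one is the
# long string passed to Decompressor in the session above).
DICTIONARY = b"acceptaccept-charsetaccept-encodinghostmethodurlversionHTTP/1.1"


def make_packets(messages, dictionary):
    """Compress every message on ONE deflate stream, sync-flushing after each."""
    comp = zlib.compressobj(zdict=dictionary)
    return [comp.compress(m) + comp.flush(zlib.Z_SYNC_FLUSH) for m in messages]


def read_packets(packets, dictionary):
    """Inflate the packets, in order, on ONE inflate stream."""
    decomp = zlib.decompressobj(zdict=dictionary)
    return [decomp.decompress(p) for p in packets]


# Made-up payloads, loosely shaped like the two requests above.
messages = [
    b"accept: text/html\x00url: /",
    b"accept: */*\x00url: /favicon.ico",
]
packets = make_packets(messages, DICTIONARY)
headers = read_packets(packets, DICTIONARY)
```

Each packet produced this way ends in the bytes 00 00 ff ff, the same sync-flush marker that closes both d1 and d2 above.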

Working through the Python code to see if I can learn anything, I start with the object initialization:
    def __init__(self, dictionary=None):
        self.dictionary = dictionary
        self.st = _z_stream()
        err = _zlib.inflateInit2_(C.byref(self.st), 15, ZLIB_VERSION, C.sizeof(self.st))

        assert err == Z_OK, err # FIXME: more specific error
The most important bits in there are:
  • it stores the dictionary in an instance attribute
  • it builds a Z_STREAM
  • it initializes the Z_STREAM for inflate
The only difference here from what I have seen elsewhere is the use of inflateInit2 (instead of inflateInit). I cannot imagine this makes much of a difference, but it is something to try as I work on getting this running in Ruby.
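As far as I understand zlib.h, inflateInit ends up using the default window size of 15 bits, which is the value passed explicitly to inflateInit2 here. A quick way to sanity-check that idea from Python 3 is with the standard zlib module, on the assumption that `zlib.decompressobj()` with no arguments roughly corresponds to inflateInit and `zlib.decompressobj(15)` to inflateInit2 with windowBits=15. The payload is made up for illustration.

```python
import zlib

payload = b"host: localhost:10000"
compressed = zlib.compress(payload)

# Default window size, roughly what inflateInit would give.
default_stream = zlib.decompressobj()
# Explicit 15-bit window, as in the inflateInit2_ call above.
explicit_stream = zlib.decompressobj(15)

from_default = default_stream.decompress(compressed)
from_explicit = explicit_stream.decompress(compressed)
```

If both calls return the same bytes, the choice between the two init functions is unlikely to be what breaks the second packet.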

The actual decompression takes place in the __call__ method:
    def __call__(self, input):
        outbuf = C.create_string_buffer(CHUNK)
        self.st.avail_in = len(input)
        self.st.next_in = C.cast(C.c_char_p(input), C.POINTER(C.c_ubyte))
        self.st.avail_out = CHUNK
        self.st.next_out = C.cast(outbuf, C.POINTER(C.c_ubyte))
        err = _zlib.inflate(C.byref(self.st), Z_SYNC_FLUSH)

        if err == Z_NEED_DICT:
            assert self.dictionary, "no dictionary provided" # FIXME: more specific error
            dict_id = _zlib.adler32(
                0L,
                C.cast(C.c_char_p(self.dictionary), C.POINTER(C.c_ubyte)),
                len(self.dictionary)
            )
            # assert dict_id == self.st.adler, 'incorrect dictionary (%s != %s)' % (dict_id, self.st.adler)
            err = _zlib.inflateSetDictionary(
                C.byref(self.st),
                C.cast(C.c_char_p(self.dictionary), C.POINTER(C.c_ubyte)),
                len(self.dictionary)
            )
            assert err == Z_OK, err # FIXME: more specific error
            err = _zlib.inflate(C.byref(self.st), Z_SYNC_FLUSH)
        if err in [Z_OK, Z_STREAM_END]:
            return outbuf[:CHUNK-self.st.avail_out]
        else:
            raise AssertionError, err # FIXME: more specific error
The bulk of the method involves setting the dictionary, but the important stuff is at the beginning. Each time the decompressor is called, as in these cases:
>>> d(struct.pack('B…', *d1))
>>> d(struct.pack('B…', *d2))
...then each of the following occurs:
  • a new output buffer is initialized
  • Z_STREAM's avail_in is set to the length of the data being decompressed
  • Z_STREAM's next_in is set to a pointer to the data itself
  • Z_STREAM's avail_out is set to an arbitrary CHUNK size
  • Z_STREAM's next_out is set to a pointer to the output buffer
  • the stream is inflated (with the dictionary, if necessary)
  • the data in the output buffer is returned
As best I can tell, this is exactly what I am doing in Ruby land.
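The bullet points above can be sketched procedurally with ctypes. This is a Python 3 approximation, not the original Python 2 example code. The names `ZStream`, `new_stream`, `as_ubytes` and `inflate_packet`, the 16 KB `CHUNK`, the dictionary and the payloads are all assumptions made for illustration. It also assumes a shared zlib that `ctypes.util.find_library` can locate (with a Linux-only fallback name), and it uses the standard zlib module only to manufacture two test packets.

```python
import ctypes as C
import ctypes.util
import zlib  # stdlib module, used here only to produce test packets

# Assumption: a shared zlib is installed; "libz.so.1" is a Linux-only fallback.
_zlib = C.CDLL(ctypes.util.find_library("z") or "libz.so.1")

Z_OK, Z_STREAM_END, Z_NEED_DICT = 0, 1, 2   # return codes from zlib.h
Z_SYNC_FLUSH = 2
CHUNK = 16 * 1024                            # arbitrary output buffer size


class ZStream(C.Structure):
    """Mirror of the z_stream struct declared in zlib.h."""
    _fields_ = [
        ("next_in", C.POINTER(C.c_ubyte)), ("avail_in", C.c_uint),
        ("total_in", C.c_ulong),
        ("next_out", C.POINTER(C.c_ubyte)), ("avail_out", C.c_uint),
        ("total_out", C.c_ulong),
        ("msg", C.c_char_p), ("state", C.c_void_p),
        ("zalloc", C.c_void_p), ("zfree", C.c_void_p), ("opaque", C.c_void_p),
        ("data_type", C.c_int), ("adler", C.c_ulong), ("reserved", C.c_ulong),
    ]


_zlib.zlibVersion.restype = C.c_char_p
_zlib.inflateInit2_.argtypes = [C.POINTER(ZStream), C.c_int, C.c_char_p, C.c_int]
_zlib.inflate.argtypes = [C.POINTER(ZStream), C.c_int]
_zlib.inflateSetDictionary.argtypes = [
    C.POINTER(ZStream), C.POINTER(C.c_ubyte), C.c_uint]


def new_stream():
    """Build a zeroed z_stream and initialize it for inflate (15-bit window)."""
    st = ZStream()
    err = _zlib.inflateInit2_(C.byref(st), 15, _zlib.zlibVersion(), C.sizeof(st))
    if err != Z_OK:
        raise RuntimeError("inflateInit2_ failed: %d" % err)
    return st


def as_ubytes(data):
    """Copy bytes into a ctypes array; return (array, pointer to its start)."""
    buf = (C.c_ubyte * len(data)).from_buffer_copy(data)
    return buf, C.cast(buf, C.POINTER(C.c_ubyte))


def inflate_packet(st, packet, dictionary):
    """Inflate one sync-flushed packet on an existing stream."""
    inbuf, st.next_in = as_ubytes(packet)   # inbuf stays alive until we return
    st.avail_in = len(packet)
    outbuf = C.create_string_buffer(CHUNK)  # fresh output buffer per call
    st.next_out = C.cast(outbuf, C.POINTER(C.c_ubyte))
    st.avail_out = CHUNK
    err = _zlib.inflate(C.byref(st), Z_SYNC_FLUSH)
    if err == Z_NEED_DICT:                  # expected on the first packet only
        dictbuf, dictptr = as_ubytes(dictionary)
        err = _zlib.inflateSetDictionary(C.byref(st), dictptr, len(dictionary))
        if err != Z_OK:
            raise RuntimeError("inflateSetDictionary failed: %d" % err)
        err = _zlib.inflate(C.byref(st), Z_SYNC_FLUSH)
    if err not in (Z_OK, Z_STREAM_END):
        raise RuntimeError("inflate failed: %d" % err)
    return outbuf.raw[:CHUNK - st.avail_out]


# Made-up dictionary and payloads, compressed on one shared deflate stream.
DICTIONARY = b"acceptaccept-encodinghostmethodurlversionHTTP/1.1"
first_msg, second_msg = b"url: /", b"url: /favicon.ico"
comp = zlib.compressobj(zdict=DICTIONARY)
first_packet = comp.compress(first_msg) + comp.flush(zlib.Z_SYNC_FLUSH)
second_packet = comp.compress(second_msg) + comp.flush(zlib.Z_SYNC_FLUSH)

stream = new_stream()
first_out = inflate_packet(stream, first_packet, DICTIONARY)
second_out = inflate_packet(stream, second_packet, DICTIONARY)
```

The detail worth noticing is that the same `stream` object is passed to both calls and only next_in, avail_in, next_out and avail_out are reset between packets. The input array is also kept referenced for the whole call, since zlib reads through the raw pointer.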

Next I break that down into a pure Python script without the object stuff:



That Python code looks very, very similar to my Ruby/FFI code:



They look so similar that I do not see any difference in overall approach. Both hit the above bullet points. Clearly, the syntax that I use for the Ruby version is OK; otherwise the first packet would not work. But something just does not translate when I set the next_in pointer to the next packet's data in Ruby land.

At this point, I am ready to put Ruby on the back burner in favor of node.js. I will continue to try to resolve this in my spare time, but I need to make progress in my understanding of SPDY, not just Ruby + FFI idiosyncrasies.

(full gist for ruby & python code)


Day #25
