Yesterday, I began my efforts to re-use epub files when generating mobi in git-scribe. The hope is that this will produce better results and DRY up the toolchain a bit.
The `do_mobi` method currently generates the epub first (if the epub was previously generated, `do_epub` will return immediately) and then decorates the epub for mobi use:

    def do_mobi
      return true if @done['mobi']

      do_epub

      info "GENERATING MOBI"

      decorate_epub_for_mobi

      cmd = "kindlegen -verbose book_for_mobi.epub -o book.mobi"
      return false unless ex(cmd)

      @done['mobi'] = true
    end

Last night, I was able to produce the Table of Contents file, `toc.html`, that the Kindle uses for the Table of Contents (and to determine the start page). This is accomplished via the `add_epub_toc` method in `decorate_epub_for_mobi`:

    def decorate_epub_for_mobi
      add_epub_etype
      add_epub_toc
      zip_epub_for_mobi
    end

Currently, `add_epub_toc` only generates the `toc.html` file, but that is not sufficient for mobi. I also need to modify the Open Packaging Format (OPF) file to include the table of contents. So I add a call to the new `add_html_toc_to_opf`:

    def add_epub_toc
      build_html_toc
      add_html_toc_to_opf
    end

The OPF is an XML file with three different sections that need to be updated to include `toc.html`. The three sections are:

- `<manifest>`: the actual contents of the epub/mobi file. It associates an ID with each file / URL.
- `<spine>`: describes the order in which the documents are read (and which are optional).
- `<guide>`: points to "meta" documents (the table of contents, the cover, etc.).
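
For orientation, here is a minimal Ruby sketch of where those three sections sit in a `content.opf`. The sample document is hypothetical and heavily trimmed (a real OPF file also carries a `<metadata>` section and more attributes). Only the `ncxtoc` id comes from the code in this post; the other ids and hrefs, and the `opf_section` helper, are invented for illustration.

```ruby
# A hypothetical, heavily trimmed content.opf, before toc.html is added.
# Every id and href except "ncxtoc" is an assumption made for this sketch.
SAMPLE_OPF = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <package xmlns="http://www.idpf.org/2007/opf" version="2.0">
    <manifest>
      <item id="ncxtoc" media-type="application/x-dtbncx+xml" href="toc.ncx"/>
      <item id="ch01" media-type="application/xhtml+xml" href="ch01.html"/>
    </manifest>
    <spine toc="ncxtoc">
      <itemref idref="ch01"/>
    </spine>
    <guide>
      <reference type="cover" title="Cover" href="cover.html"/>
    </guide>
  </package>
XML

# Illustration-only helper: return the raw text between <name ...> and
# </name>, or nil when the section is absent.
def opf_section(opf, name)
  opf[/<#{name}\b[^>]*>(.*?)<\/#{name}>/m, 1]
end

# Print the contents of each section that needs a toc.html entry.
%w[manifest spine guide].each do |name|
  puts "#{name}:"
  puts opf_section(SAMPLE_OPF, name)
end
```

The manifest declares every file, the spine lists manifest ids in reading order, and the guide labels special documents, which is why `toc.html` has to appear in all three.
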

The `add_html_toc_to_opf` method adds `toc.html` to each of the three sections:

    def add_html_toc_to_opf
      Dir.chdir('book.epub.d/OEBPS') do
        opf = File.read('content.opf')

        opf = add_html_toc_to_opf_manifest(opf)
        opf = add_html_toc_to_opf_spine(opf)
        opf = add_html_toc_to_opf_guide(opf)

        File.open('content.opf', 'w') do |f|
          f.puts opf
        end
      end
    end

Each of those three sections follows a similar pattern: replace a known element with the same element plus `toc.html`. Adding `toc.html` to the `<manifest>` section of the OPF works like this:

    def add_html_toc_to_opf_manifest(opf)
      opf.sub(/<item id="ncxtoc".+?>/) { |s|
        s + "\n" +
          %q| <item id="htmltoc" | +
          %q|media-type="application/xhtml+xml" | +
          %q|href="toc.html"/>|
      }
    end

And, happily, that works! After regenerating the mobi version of The SPDY Book and replacing the previous version on my Kindle, I have a Table of Contents, a correct start page, and pretty decent formatting. There are still a few details in need of cleaning up, but this approach seems promising.
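
The spine and guide versions are not shown above. A minimal sketch of how they might follow the same "known element plus `toc.html`" pattern is below. The choice of anchor elements (the opening `<spine ...>` and `<guide>` tags), the position of the new entries, and the title text are all assumptions rather than the code actually in git-scribe. The sketch also assumes the OPF already contains a non-empty `<guide>` element. The `htmltoc` id matches the one added to the manifest.

```ruby
# Hypothetical counterparts to add_html_toc_to_opf_manifest. The anchor
# elements and the inserted attributes are assumptions for illustration.

# Insert an itemref right after the opening <spine ...> tag so that
# toc.html comes first in the reading order.
def add_html_toc_to_opf_spine(opf)
  opf.sub(/<spine\b[^>]*>/) { |s|
    s + "\n" + %q|  <itemref idref="htmltoc"/>|
  }
end

# Insert a guide reference of type "toc" right after the opening <guide> tag.
# Assumes <guide> is present and is not written as a self-closing element.
def add_html_toc_to_opf_guide(opf)
  opf.sub(/<guide\b[^>]*>/) { |s|
    s + "\n" +
      %q|  <reference type="toc" title="Table of Contents" href="toc.html"/>|
  }
end

# Invented fragment used only to exercise the two methods.
OPF_FRAGMENT = <<~XML
  <spine toc="ncxtoc">
    <itemref idref="ch01"/>
  </spine>
  <guide>
    <reference type="cover" title="Cover" href="cover.html"/>
  </guide>
XML

puts add_html_toc_to_opf_guide(add_html_toc_to_opf_spine(OPF_FRAGMENT))
```

Because `String#sub` returns the string unchanged when the pattern does not match, a missing `<spine>` or `<guide>` fails silently here. A sturdier version would probably check for that.
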
At some point, I will certainly need to clean up this code. The fact that so many methods share the same name prefix screams for a module or class to encapsulate the functionality. I also completely lack testing of any kind. The problem, of course, is that I do not know my target state. I have to keep guessing, trying it on the Kindle and then correcting my mistakes. That is all well and good for spiking towards understanding, but it will not hold up as a robust implementation going forward.
But all that is for another day.
Day #111