16 December 2017

Hacking Jupyter Notebook HTML for Blogger

One thing Blogger sucks at is formatting code and mathematics. This is something Jupyter notebooks excel at. What I want is a way to turn a Jupyter notebook into a blog post in Blogger so I can poke around with math and programming, document it, and then post my findings where no one actually reads them (but they could!). Fortunately, Blogger does offer a way to edit HTML directly, which means it is possible to modify an HTML export from a Jupyter notebook, then cut and paste that into the HTML editor in Blogger. Or, as a fact-finding mission, paste the entirety of the HTML export into Blogger, see what happens, then edit and iterate.

This is what I did for my post on iterative vs. closed-form Fibonacci calculations. I learned a lot hacking CSS for that post, but one thing I failed to do was try to embed the Jupyter notebook HTML within text written in Blogger. Say, this post.

Here's what happens when you try to embed Jupyter notebook generated HTML inside a Blogger post with text already in it:

That ain't right.

Looking at the HTML source directly, there are some obvious things we can remove:

  • <!DOCTYPE html>
  • <html> and </html> 
  • <head> and </head>, though we probably want to keep the <script> tags inside <head>. The <title> tag and everything in between it and its closing tag can go, though. <meta> can probably go, too.
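
The removals listed above can be sketched in a few lines of Python. This is a minimal sketch, not the script linked from this post: the function name strip_document_wrappers is made up for illustration, and using regular expressions on HTML is itself a bodge that assumes the export is reasonably tidy. It leaves <script>, <style>, and <body> alone.

```python
import re

def strip_document_wrappers(html):
    """Remove the doctype, the <html>/<head> wrapper tags, <title>, and <meta>.

    Everything else, including <script> and <style> blocks, stays in place.
    Hypothetical helper for illustration only.
    """
    flags = re.IGNORECASE | re.DOTALL
    # <!DOCTYPE html>
    html = re.sub(r"<!DOCTYPE[^>]*>", "", html, flags=flags)
    # <title> ... </title>, including the text in between
    html = re.sub(r"<title\b[^>]*>.*?</title>", "", html, flags=flags)
    # <meta ...> tags (they have no closing tag)
    html = re.sub(r"<meta\b[^>]*>", "", html, flags=flags)
    # Only the opening and closing <html> and <head> tags; their contents stay
    html = re.sub(r"</?(?:html|head)\b[^>]*>", "", html, flags=flags)
    return html.strip()
```
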
Doing that, this is what you get:

Obvious edits.


Didn't really help. Our blog format right margin is being ignored. Since it's a formatting issue, searching the inline CSS for 'width' seemed like a prudent thing to do.

There are a lot of 'width' values in the CSS. Most are percentages of something, so they can be safely ignored. The ones that grabbed my attention were min-width values in pixels in at-rules, such as "@media (min-width: 768px) {...}". If you remove everything within the parentheses on statements like this, the formatting becomes:

Eureka!
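
The media-query edit can be sketched like this. It is a literal reading of "remove everything within the parentheses", so "@media (min-width: 768px)" becomes "@media ()". The function names are hypothetical, and the pattern only covers a lone min-width condition in pixels; compound queries are left untouched.

```python
import re

# Matches a lone pixel-based condition such as "@media (min-width: 768px)".
MIN_WIDTH_PX = re.compile(r"(@media\s*)\(\s*min-width\s*:\s*\d+px\s*\)")

def find_min_width_queries(css):
    """List every matching at-rule header, handy for eyeballing before editing."""
    return [match.group(0) for match in MIN_WIDTH_PX.finditer(css)]

def empty_min_width_conditions(css):
    """Blank out the condition inside the parentheses, keeping the rule body."""
    return MIN_WIDTH_PX.sub(r"\1()", css)
```

Running find_min_width_queries first gives a list to check by hand, which seems prudent before letting a regex loose on somebody else's stylesheet.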


Now we're getting somewhere. The Jupyter notebook HTML separates itself from the blog post proper, but that might not be a bad thing. However, if I want to get rid of it, the outermost <div> tag is the most likely candidate for removal. That would be <div tabindex="-1" id="notebook" class="border-box-sizing"> and, presumably, the last </div> in the HTML...

...Only, that doesn't work. I forwent the screen capture because it looks like the previous one, and I got to use the word "forwent" in a sentence. Twice.

Pruning that much HTML should make the notebook embeddable between "standard" Blogger markup, too:

Looks well behaved.


At any rate, knowing this is good enough to start automating the pruning of Jupyter notebook-generated HTML. Then I can play in Jupyter notebook, save the notebook to HTML, push the HTML through my script, then copy and paste it into a Blogger post.
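
As a rough outline of that workflow (not the actual script linked from this post), the pruning can be treated as a list of small text-to-text steps applied in order to the exported file. The names drop_doctype, prune, and prune_file, and the file names in the usage comment, are all made up for illustration.

```python
import re
from pathlib import Path

def drop_doctype(html):
    """One example pruning step: remove <!DOCTYPE ...>."""
    return re.sub(r"<!DOCTYPE[^>]*>", "", html, flags=re.IGNORECASE)

def prune(html, steps):
    """Apply each pruning step in order and return the result."""
    for step in steps:
        html = step(html)
    return html

def prune_file(src, dst, steps):
    """Read the exported HTML, prune it, and write the paste-ready version."""
    text = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(prune(text, steps), encoding="utf-8")

# Hypothetical usage, after exporting with: jupyter nbconvert --to html notebook.ipynb
#   prune_file("notebook.html", "notebook.blogger.html", [drop_doctype, str.strip])
```

Keeping each edit as its own step makes it easy to switch one off when it turns out not to help, which has already happened once in this post.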

I believe the British call this a bodge. The act of creating a bodge is bodging. And the code for this bodge can be found on GitHub, here.

A bodge, according to Merriam-Webster, is also "about half a peck".

A peck is a quarter bushel.

And a bushel, according to Merriam-Webster, is "any of various units of dry capacity".

Is it any wonder most of the world switched to SI units?