Compiling Python to JS

Aug 24, 2022

Way back when, I wrote a blog post about why we're using Python in this book. I still think it's a good choice, with weighty advantages over Java or Rust or JavaScript or any number of other languages we might have chosen. But one thing I've always regretted a bit about choosing Python is that there was no way to incorporate the code into the book itself, since there isn't a great way to run Python in the browser. This became especially painful when Chris and I started working on interactive widgets: running the actual browser code as part of the widget would eliminate any chance of confusion or bugs that prevent the widget from faithfully representing the browser.[1]

So one of the crazy projects I've been working on for the last year or so, alongside writing and editing the book, is a compiler from Python to JavaScript. You can see the results of this in the few interactive widgets we've already developed, or even more forcefully in the runnable, client-side web browser included at the bottom of Chapters 2–7.[2]

How to compile

My background is in programming languages, and when I'm not writing about web browsers I work on topics like compilers, static analysis, and verification. Hell, I teach the University of Utah compilers course. So, though Chris thought compiling from Python to JavaScript was madness, I felt like it could be a nice hack.

Moreover, it's a lot more fun to do this from Python than most languages. See, the Python standard library exposes the Python parser as the ast module. Because it's provided by the language itself, I knew the parse trees it produced would always be correct, and the ASTs are reasonably structured and easy to work with. I'd worked with ast before, when working on code outlines for each chapter.

So the plan was pretty simple: read the Python source code for a given chapter; parse it with ast; and then recursively traverse the resulting parse tree, converting each node to a string containing JavaScript source code. Something like this:

    tree = ast.parse(python_file.read(), python_file.name)
    js = compile_module(tree)

The compile_module function works recursively, using a bunch of helper functions for the different kinds of Python syntax: compile_module calls compile_stmt which calls compile_expr which calls things like compile_op, compile_lhs, and compile_method. Each of these functions takes a parse tree as input and produces a string (of JavaScript) as output.
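To make the recursion concrete, here is a minimal sketch that handles only a few node types. The function names mirror the ones used in this post (compile_stmt is the statement compiler), but the bodies are illustrative assumptions, not the actual compiler, and the output skips details like variable declarations.

```python
import ast

def compile_expr(tree):
    # Each branch turns one kind of Python expression node into JavaScript text.
    if (isinstance(tree, ast.Constant)
            and isinstance(tree.value, (int, float))
            and not isinstance(tree.value, bool)):
        return repr(tree.value)
    elif isinstance(tree, ast.Name):
        return tree.id
    elif isinstance(tree, ast.BinOp) and isinstance(tree.op, ast.Add):
        return "(" + compile_expr(tree.left) + " + " + compile_expr(tree.right) + ")"
    raise NotImplementedError(ast.dump(tree))

def compile_stmt(tree):
    # Statements recurse into compile_expr for their sub-expressions.
    if isinstance(tree, ast.Assign) and len(tree.targets) == 1:
        # Simplification: no `let`/`const` declaration is emitted.
        return compile_expr(tree.targets[0]) + " = " + compile_expr(tree.value) + ";"
    elif isinstance(tree, ast.Expr):
        return compile_expr(tree.value) + ";"
    raise NotImplementedError(ast.dump(tree))

def compile_module(tree):
    # A module is just a list of statements, compiled one per line.
    return "\n".join(compile_stmt(stmt) for stmt in tree.body)

print(compile_module(ast.parse("x = 1 + y\nx + 2")))
# x = (1 + y);
# (x + 2);
```

Anything the sketch doesn't recognize raises NotImplementedError, which is a convenient way to discover which syntax still needs a case.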

Renaming methods

One thing the compiler has to do is rename methods from Python to their JavaScript equivalent. For example, in Python you call s.lower() to convert a string to lower-case; in JavaScript that's s.toLowerCase(). Or, in Python you can copy an array with a.copy(), while in JavaScript you misuse the slice call, as in a.slice(). The same happens for top-level functions like parseInt for int or console.log for print. Sometimes these are trickier, like compiling Python's isalnum method to a regex test in JavaScript.

There are also some more complicated differences that we can handle through the same "renaming" mechanism. For example, Python distinguishes between "bytes" and "strings", and has encode and decode methods that convert between them. That's good, but in this book it isn't a focus, and in fact we only ever use the utf8 encoding. Since JavaScript doesn't have a separate bytes type, I just reuse strings, meaning that encode and decode compile to no-ops.

In my compiler, this logic is handled by the compile_method and compile_function functions, which include special cases for each method like this. Of course, this would be a pain to do for general-purpose Python code, but luckily we've already been trying to avoid relying on too many obscure Python features, so the list of exceptions isn't too long.
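A rough sketch of how such renaming could work is below. The lookup tables and function bodies are assumptions made for illustration (the real compiler's special cases may be organized quite differently); only the renamings themselves come from the examples above.

```python
import ast
import json

# Hypothetical rename tables covering only the examples mentioned in this post.
RENAME_METHODS = {"lower": "toLowerCase", "copy": "slice"}
NOOP_METHODS = {"encode", "decode"}   # bytes and strings are merged in JS
RENAME_FUNCTIONS = {"int": "parseInt", "print": "console.log"}

def compile_expr(tree):
    if isinstance(tree, ast.Name):
        return tree.id
    elif isinstance(tree, ast.Constant) and isinstance(tree.value, str):
        return json.dumps(tree.value)
    elif isinstance(tree, ast.Call):
        args = ", ".join(compile_expr(arg) for arg in tree.args)
        if isinstance(tree.func, ast.Attribute):
            return compile_method(tree.func.value, tree.func.attr, args)
        elif isinstance(tree.func, ast.Name):
            return compile_function(tree.func.id, args)
    raise NotImplementedError(ast.dump(tree))

def compile_method(receiver, name, args):
    base = compile_expr(receiver)
    if name in NOOP_METHODS:
        return base  # encode/decode compile to just the receiver
    return base + "." + RENAME_METHODS.get(name, name) + "(" + args + ")"

def compile_function(name, args):
    return RENAME_FUNCTIONS.get(name, name) + "(" + args + ")"

def compile_source(src):
    # Helper for trying things out: compile a single Python expression.
    return compile_expr(ast.parse(src, mode="eval").body)

print(compile_source("s.lower()"))            # s.toLowerCase()
print(compile_source("print(int(x))"))        # console.log(parseInt(x))
print(compile_source("body.encode('utf8')"))  # body
```

Methods that need more than a rename, like isalnum, would need their own branch in compile_method rather than a table entry.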

Dropping features

The key to making the compiler project doable is to carefully choose which differences between Python and JavaScript to handle. A lot of the basic syntactic constructs, like functions, classes, loops, and conditionals are pretty similar between JavaScript and Python. And for some constructs that were just too tricky to compile to JavaScript, I always had the option of modifying the *browser* to not use that construct, or to use it in a more limited way.

For example, here's how the compiler handles Python for loops:

    assert not tree.orelse
    ctx2 = Context("for", ctx)
    lhs = compile_lhs(tree.target, ctx2)
    rhs = compile_expr(tree.iter, ctx)
    body = "\n".join([
        compile_stmt(line, indent=indent + INDENT, ctx=ctx2)
        for line in tree.body])
    fstline = " "*indent + f"for (let {lhs} of {rhs}) {{\n"
    return fstline + body + "\n" + " " * indent + "}"

Going through this line by line, I first check that this for loop doesn't have an else clause, which is a Python feature without a clear JavaScript analog.[3] That's not used in any of the chapters we've managed to compile, so I don't need to handle it.

Next, I create a Context object, which handles scoping.

Third, I compile the left-hand side and right-hand side of the iterator using compile_lhs and compile_expr, plus all of the lines in the loop body using compile_stmt. (The indent argument indents the code, and the ctx argument tracks scoping.) And finally I put all those pieces together into a JavaScript for-of loop.

This is simple and works for iterating over, say, an array. But it would be wrong in a lot of circumstances. For example, in Python, iterating over a dictionary means iterating over its keys, but the for-of loop doesn't do that in JavaScript.[4] That's not a problem only because Chris and I have committed to not iterating over a dictionary in our browser. In this case, that's an easy commitment to make, because for i in dict.keys() is more readable anyway, but in other places this is a tougher standard to stick to.

Unavoidable differences

Other times, there's no choice but to handle differences directly. The most painful of these is truth testing. In Python, an empty array, dictionary, or similar structure is treated as a "false" value in tests, and our browser makes use of this fairly often. But in JavaScript, those things are all "true"! This is really annoying to debug since nothing crashes: control just goes the wrong way. And just rewriting our code not to do this wouldn't work, because constantly writing if len(a) instead of if a looked odd and would raise questions for readers.

So in this case, we needed to add some special runtime support. There's a truthy function that our compiler requires, and our compiler wraps every boolean test in a call to truthy. This means that something like if a compiles to if (truthy(a)) and (annoyingly) something more complex like if a and b compiles to:

if (truthy(truthy(a) && truthy(b)))

That's definitely ugly, and I could probably do more to skip truthy calls where they're not necessary (like for the result of a boolean and) but luckily I kept the number of these "runtime support" methods to a minimum: there's truthy for booleans, pysplit and pyrsplit for mocking Python's split and rsplit methods, and comparator for sorting by a key. Just about everything else is handled by the compiler, or by commitments not to use certain features.
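On the compiler side, the wrapping might look something like the sketch below. It is an assumption about the structure rather than the real code: compile_test and compile_if_header are hypothetical names, and only variables and and/or are handled.

```python
import ast

def compile_expr(tree):
    if isinstance(tree, ast.Name):
        return tree.id
    elif isinstance(tree, ast.BoolOp):
        # Every operand of and/or is itself a boolean test, so wrap each one.
        op = "&&" if isinstance(tree.op, ast.And) else "||"
        return (" " + op + " ").join(compile_test(v) for v in tree.values)
    raise NotImplementedError(ast.dump(tree))

def compile_test(tree):
    # Wrap unconditionally; this is what produces the redundant outer call.
    return "truthy(" + compile_expr(tree) + ")"

def compile_if_header(src):
    # Hypothetical helper: compile only the condition of an if statement.
    stmt = ast.parse(src).body[0]
    assert isinstance(stmt, ast.If)
    return "if (" + compile_test(stmt.test) + ")"

print(compile_if_header("if a: pass"))        # if (truthy(a))
print(compile_if_header("if a and b: pass"))  # if (truthy(truthy(a) && truthy(b)))
```

Skipping the outer call would mean having compile_test check whether the expression is a BoolOp before wrapping it.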

User help

Well, except that in some cases, there's something that's too hard to add to the compiler, too convenient to avoid, and not possible to mock with a helper function. One great example of that is Python's in operator. In Python, you can use in for strings, lists, dictionaries, or any number of other data structures. But there's no equivalent general-purpose element-of operator in JavaScript. For lists and strings, you can use indexOf (which returns -1 if the element is missing), but for objects (which I use to represent dictionaries) there's no indexOf method and you have to do something else. But there's no way for my compiler to tell whether a given use of in is working with a dictionary or a list or a string.[5]

So for this, and a few other things, the compiler requests help from the user in the form of a "hint". I wanted to keep these out of the Python source, so the hints are stored in a hints file in JSON form; here are some example hints from Chapter 7:

    {"code": "'href' in node.attributes", "type": "dict"},
    {"code": "'href' in elt.attributes", "type": "dict"},
    {"code": "('TextLayout(x={}, y={}, width={}, height={}, ' + 'node={}, word={})').format(self.x, self.y, self.width, self.height, self.node, self.word)", "js": "''"}

The first two hints point to code that uses an in operator; in these hints, the type field indicates the type of the container. The third hint points to code with a complicated format call[6] and replaces it with a simple empty string, which isn't really the right compilation, but doesn't matter because this code happens to be there for debugging.
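Here is a minimal sketch of how a type hint could steer the compilation of in. Several things are assumptions: hints are matched by comparing the code field against ast.unparse of the node (which needs Python 3.9+), dictionaries compile to JavaScript's in operator on objects, the "c in word" hint is made up for the example, and "js" hints are not handled.

```python
import ast
import json

# The first hint is from Chapter 7; the second is hypothetical.
HINTS = json.loads("""[
  {"code": "'href' in node.attributes", "type": "dict"},
  {"code": "c in word", "type": "str"}
]""")

def find_hint(tree):
    # Assumed matching rule: the hint's code equals the node's source text.
    text = ast.unparse(tree)
    for hint in HINTS:
        if hint["code"] == text:
            return hint
    return None

def compile_expr(tree):
    if isinstance(tree, ast.Name):
        return tree.id
    elif isinstance(tree, ast.Attribute):
        return compile_expr(tree.value) + "." + tree.attr
    elif isinstance(tree, ast.Constant) and isinstance(tree.value, str):
        return json.dumps(tree.value)
    elif (isinstance(tree, ast.Compare) and len(tree.ops) == 1
            and isinstance(tree.ops[0], ast.In)):
        return compile_in(tree)
    raise NotImplementedError(ast.dump(tree))

def compile_in(tree):
    hint = find_hint(tree)
    if hint is None:
        raise ValueError("Could not find type key for `" + ast.unparse(tree) + "`")
    item = compile_expr(tree.left)
    container = compile_expr(tree.comparators[0])
    if hint["type"] == "dict":
        # Objects stand in for dictionaries, so test for the key directly.
        return "(" + item + " in " + container + ")"
    # Lists and strings both support indexOf, which returns -1 when missing.
    return "(" + container + ".indexOf(" + item + ") !== -1)"

def compile_source(src):
    return compile_expr(ast.parse(src, mode="eval").body)

print(compile_source("'href' in node.attributes"))  # ("href" in node.attributes)
print(compile_source("c in word"))                  # (word.indexOf(c) !== -1)
```

Matching on source text keeps the hints file readable, at the cost of needing a new hint whenever the expression is rewritten.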

The hints mechanism would be a huge annoyance if we needed to write a lot of them, but so far we have chapters 2–7 compiling with only 19 total hints (some of which are duplicates), which doesn't feel too bad. Plus, I've rigged the compiler to suggest hints when it finds something it can't compile, like this (from Chapter 4):

    Could not find type key for `'=' in attrpair`
      Hint: {"line": 68, "code": "'=' in attrpair", "type": "???"}
    Could not find type key for `tag in self.SELF_CLOSING_TAGS`
      Hint: {"line": 99, "code": "tag in self.SELF_CLOSING_TAGS", "type": "???"}
    Could not find type key for `tag in self.HEAD_TAGS`
      Hint: {"line": 120, "code": "tag in self.HEAD_TAGS", "type": "???"}
    Could not find type key for `tag in self.HEAD_TAGS`
      Hint: {"line": 120, "code": "tag in self.HEAD_TAGS", "type": "???"}
    Could not find type key for `tag not in ['/head'] + self.HEAD_TAGS`
      Hint: {"line": 125, "code": "tag not in ['/head'] + self.HEAD_TAGS", "type": "???"}

I can just copy those to a file and fill in the question marks to write the relevant hint.
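Producing suggestions in this shape takes little code. The sketch below is an assumption about how it could be done, not the compiler's real logic: missing_hints is a hypothetical name, and for simplicity it reports every in and not in it finds instead of only those that lack a hint.

```python
import ast
import json

def missing_hints(source):
    # Walk the whole parse tree looking for `in` / `not in` comparisons.
    messages = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Compare) and any(
                isinstance(op, (ast.In, ast.NotIn)) for op in node.ops):
            code = ast.unparse(node)  # needs Python 3.9+
            hint = {"line": node.lineno, "code": code, "type": "???"}
            messages.append(
                "Could not find type key for `" + code + "`\n"
                "  Hint: " + json.dumps(hint))
    return messages

for message in missing_hints("x = 1\nif '=' in attrpair:\n    pass\n"):
    print(message)
# Could not find type key for `'=' in attrpair`
#   Hint: {"line": 2, "code": "'=' in attrpair", "type": "???"}
```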

Conclusion

The compiler is on GitHub, same as the book. If you just want a sense of what it does, the test suite is written in a readable, literate style and should give you a good idea of what's going on.

All told, compiling Python to JavaScript is a crazy idea and it's not clear that doing it has saved time over just writing a JavaScript version of the browser. But it's been a lot of fun, and it works fairly well now. Plus, it does give me quite a bit of confidence that the JavaScript has the same behavior as the Python, and it means I don't have to think about the JavaScript version when I change things in the book.

Plus, those little widgets are fun to play with and let readers check what the expected behavior of the browser is, which probably helps with debugging.

[1] With enough care, we could have avoided issues if the text of the book was frozen, but I can't imagine how we would have prevented bugs as we changed the book.

[2] We hope to eventually have these for the whole book.

[3] You'd use a flag variable and a condition after the loop.

[4] It iterates over key-value pairs for a Map and doesn't work at all over objects, which are often used as maps in JavaScript.

[5] I thought about making my compiler track types, but it seemed like way too much work.

[6] Simple format calls are supported by the compiler, but here instead of a constant string the code uses the concatenation of two constant strings to avoid a long line, and that's not supported.

Web Browser Engineering Blog
