Python Networking from JavaScript

Sep 09, 2022

As part of my ongoing project to run the WBE browser in your browser, by compiling it from Python to JavaScript, I need to somehow hook up the WBE browser’s networking code to use browser APIs. This post describes how I did it.

How the browser uses sockets

As a quick reminder, the WBE browser makes network requests by opening sockets, writing HTTP requests to them, and parsing the resulting HTTP responses. The sockets-using part of the code looks like this:

s = socket.socket(
    family=socket.AF_INET,
    type=socket.SOCK_STREAM,
    proto=socket.IPPROTO_TCP,
)
s.connect((host, port))

if scheme == "https":
    ctx = ssl.create_default_context()
    s = ctx.wrap_socket(s, server_hostname=host)

s.send(...)

response = s.makefile("r", encoding="utf8", newline="\r\n")
statusline = response.readline()

This works great in Python, which offers easy access to the sockets API provided, ultimately, by the operating system. But it doesn’t work in the browser, which doesn’t provide a sockets API at all. Instead, browser networking is done almost exclusively via fetch or related APIs, which differ from sockets in a lot of ways.

Importantly, since the goal is to compile the browser from Python to JavaScript, I don’t want to make many (ideally, any) modifications on the Python side. For example, I don’t want the browser to support multiple “networking backends”, nor do I want to switch to a higher-level “request” API on the Python side. As much as possible, I want the same code running in both Python and JavaScript; otherwise readers who check their Python implementation against the JavaScript one would have to wonder where differences in behavior come from.

So the solution I came up with is to create JavaScript objects, also called socket and ssl, that offer the same methods as the Python libraries of those names but forward all requests to browser APIs. That way, the JavaScript code can faithfully call

let s = (socket.socket({"family": socket.AF_INET, "type": socket.SOCK_STREAM, "proto": socket.IPPROTO_TCP}));
(s.connect([host, port]));

if (truthy((scheme === "https"))) {
    ctx = (ssl.create_default_context());
    s = (ctx.wrap_socket(s, {"server_hostname": host}));
}

(s.send(...));

let response = s.makefile("r", {"encoding": "utf8", "newline": "\r\n"});
let statusline = (response.readline());

and yet this will result in calls to fetch and related APIs. As with the compiler itself, this requires understanding the limited way we are actually using the sockets API in the browser and coming up with some hacks that make those uses work in the browser.

A Mock socket

The most important API provided by the Python socket module is the socket class, and it’s pretty straightforward to define a class like that in JavaScript. Of course, this class won’t wrap an actual socket (no such thing exists in the browser), but it will have a similar API. For example, you can construct it:

class socket {
    constructor(params) {
        console.assert(params.family == "inet", "socket family must be inet")
        console.assert(params.type == "stream", "socket type must be stream")
        console.assert(params.proto == "tcp", "socket proto must be tcp")
    }
}

Note that the actual parameter values don’t matter (since we’re not actually creating a socket) and instead they’re just checked against the expected values. This code, by the way, is real—you can find the full thing in rt.js in the WBE repository.

Next, the connect and send methods also don’t actually have to “do” anything; they just record their arguments so we know what request we’re supposed to make later:

class socket {
    connect(pair) {
        let [host, port] = pair;
        this.host = host;
        this.port = port;
        this.input = "";
        this.closed = true;
        this.scheme = "http";
    }

    send(text) {
        this.input += text;
    }

    close() {
        this.closed = true;
    }
}

This means that by the time the compiled browser calls makefile, this fake socket object has recorded the host name, port, and HTTP request, and is ready to produce a result.
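The ssl half of the shim is not shown above. One plausible shape, assuming wrap_socket only needs to flip the recorded scheme to "https", is sketched below. The FakeSocket and fakeSsl names are hypothetical, and the real rt.js may well differ.

```javascript
// Hypothetical stand-in for the fake socket: it only records what it is told.
class FakeSocket {
    connect([host, port]) {
        this.host = host;
        this.port = port;
        this.input = "";
        this.scheme = "http";
    }

    send(text) {
        this.input += text;
    }
}

// Assumed shape of the fake ssl module. No TLS happens here, because the
// browser handles encryption itself once fetch is given an https:// URL.
const fakeSsl = {
    create_default_context() {
        return {
            wrap_socket(s, params) {
                console.assert(params.server_hostname === s.host,
                    "server_hostname should match the connected host");
                s.scheme = "https";
                return s;
            },
        };
    },
};

let s = new FakeSocket();
s.connect(["browser.engineering", 443]);
s = fakeSsl.create_default_context().wrap_socket(s, { server_hostname: "browser.engineering" });
s.send("GET /http.html HTTP/1.0\r\n\r\n");
console.log(s.scheme + "://" + s.host); // https://browser.engineering
```

Returning the same object from wrap_socket keeps the calling code identical to the Python version, where the wrapped socket replaces the original.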

Parsing the Request

One big difference between fetch and sockets is that, when using fetch, you don’t need to write out raw HTTP requests. This is basically always a good thing (HTTP syntax has some sharp corners) but for our use case it’s a bit of a disaster. After all, to call fetch, we need a URL, a method type, and possibly POST data. But the only way the compiled browser passes us that information is through the s.send(...) call.

The only solution I could find is to make our fake socket object parse the HTTP request to pull out the path and method name:

let [line1] = this.input.split("\r\n", 1);
let [method, path, protocol] = line1.split(" ");
this.url = this.scheme + "://" + this.host + path;

I do this in the makefile method, which the compiled browser calls after calling send and before doing anything else, though I suppose it would have worked just as well to do it in send. Note that this is not a foolproof HTTP parser, but it is good enough for anything the compiled browser might do, so it’s good enough for my purposes.

The HTTP request sent by the compiled browser also has headers and sometimes a POST body. For now let’s ignore those, and go on to send the request.
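Should the headers and body be needed later, the same splitting approach extends to them. Here is a minimal sketch; parseRequest is a hypothetical helper rather than code from rt.js, and like the parser above it trusts its input.

```javascript
// Hypothetical helper: split a raw HTTP request into its parts.
// It trusts its input and is not a general-purpose HTTP parser.
function parseRequest(input) {
    // The head (request line plus headers) ends at the first blank line.
    const split = input.indexOf("\r\n\r\n");
    const head = split === -1 ? input : input.slice(0, split);
    const body = split === -1 ? "" : input.slice(split + 4);

    const [line1, ...headerLines] = head.split("\r\n");
    const [method, path, protocol] = line1.split(" ");

    const headers = {};
    for (const line of headerLines) {
        if (line === "") continue;
        const colon = line.indexOf(":");
        // Header names are case-insensitive, so normalize them.
        headers[line.slice(0, colon).toLowerCase()] = line.slice(colon + 1).trim();
    }
    return { method, path, protocol, headers, body };
}

const req = parseRequest(
    "POST /submit HTTP/1.0\r\n" +
    "Host: browser.engineering\r\n" +
    "Content-Length: 7\r\n" +
    "\r\n" +
    "name=hi");
console.log(req.method, req.path, req.headers["content-length"], req.body);
// POST /submit 7 name=hi
```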

Producing a Response

Now that we know what URL the browser is requesting, we can request it through the browser using fetch. That looks like this:

let response = await fetch(path);

I’ll get back to the “await” part in a second, but for now note that the “response” defined here is an already-parsed response object (status, headers, and body), while the compiled browser is expecting a complete HTTP response as text. So we need to turn the fetch response object back into a full HTTP response:

this.output = "HTTP/1.0 " + response.status +
    " " + response.statusText + "\r\n";
for (let [header, value] of response.headers.entries()) {
    this.output += header + ": " + value + "\r\n";
}
this.output += "\r\n";
this.output += await response.text();
this.closed = false;
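To see the resulting text without making a network request, the same serialization can be written as a standalone function. This is only a sketch: serializeResponse is a hypothetical name, and it takes the status, status text, header pairs, and body as plain arguments instead of a real fetch response.

```javascript
// Hypothetical helper: rebuild HTTP response text from the pieces that a
// fetch Response exposes (status, statusText, header pairs, body text).
function serializeResponse(status, statusText, headers, body) {
    let out = "HTTP/1.0 " + status + " " + statusText + "\r\n";
    for (const [header, value] of headers) {
        out += header + ": " + value + "\r\n";
    }
    out += "\r\n"; // A blank line separates the headers from the body.
    return out + body;
}

const text = serializeResponse(
    200, "OK", [["content-type", "text/html"]], "<p>Hello</p>");
console.log(JSON.stringify(text));
// "HTTP/1.0 200 OK\r\ncontent-type: text/html\r\n\r\n<p>Hello</p>"
```

With a real response, the header argument could be response.headers directly, since a Headers object iterates as name and value pairs.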

Now the output field stores the HTTP response that we want the compiled browser to read. But annoyingly, we can’t just return it as a string. Instead, the browser wants to read it line by line with readline.

So we need to fake that too. To do so, I store a cursor idx into the HTTP response, and increment it by one line every time readline is called:

class socket {
    async makefile(mode, params) {
        // …
        this.idx = 0;
        return this;
    }

    readline() {
        console.assert(!this.closed, "Attempt to read from a closed socket");
        let nl = this.output.indexOf("\r\n", this.idx);
        if (nl === -1) nl = this.output.length - 2;
        let line = this.output.substring(this.idx, nl + 2);
        this.idx = nl + 2;
        return line;
    }

    read() {
        console.assert(!this.closed, "Attempt to read from a closed socket");
        let rest = this.output.substring(this.idx);
        this.idx = this.output.length;
        return rest;
    }
}

Now successive calls to readline will return each line of the response, one by one, until they reach the end, exactly as the compiled browser expects.
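The cursor logic is easy to try in isolation. The sketch below pulls it out into a hypothetical LineReader class (not part of rt.js) that reads from a fixed string, so the behavior on a small response can be checked by hand.

```javascript
// Hypothetical stand-alone version of the cursor-based reader.
class LineReader {
    constructor(output) {
        this.output = output;
        this.idx = 0;
    }

    // Return the next line including its "\r\n", or whatever is left
    // if no terminator remains ("" once everything has been read).
    readline() {
        const nl = this.output.indexOf("\r\n", this.idx);
        const end = nl === -1 ? this.output.length : nl + 2;
        const line = this.output.substring(this.idx, end);
        this.idx = end;
        return line;
    }

    // Return everything after the cursor and move the cursor to the end.
    read() {
        const rest = this.output.substring(this.idx);
        this.idx = this.output.length;
        return rest;
    }
}

const reader = new LineReader("HTTP/1.0 200 OK\r\nserver: demo\r\n\r\n<p>Hi</p>");
const lines = [reader.readline(), reader.readline(), reader.readline()];
const bodyText = reader.read();
console.log(lines.map((l) => JSON.stringify(l)).join(" "), JSON.stringify(bodyText));
```

The three readline calls yield the status line, the one header, and the blank separator line; read then returns the body in one piece.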

By the way, some of you may be wondering why I went through this effort of packaging the browser’s fetch result into a string, and then reading it line by line, instead of creating an array of lines right away. I considered that as an alternative, but doing so meant that if an HTTP status message or header value ever contained a newline, you’d see different results in Python and JavaScript,[1] and I couldn’t rule out that I’d want to demonstrate that at some point.

Async/await

One issue I papered over above is that Python’s socket library uses blocking APIs like send and makefile, while the web’s fetch and related APIs don’t block; they return Promises. In some cases this isn’t a problem—the fake socket’s send method doesn’t do anything anyway—but for makefile, which actually makes a network request, it’s a problem.

My solution to this is JavaScript’s support for async/await. This basically lets you write code that looks like it’s blocking, but internally it’s all done with Promises. For example, this code:

this.output += "\r\n";
this.output += await response.text();
this.closed = false;

is run as if it were instead:

this.output += "\r\n";
return response.text().then((res) => {
    this.output += res;
    this.closed = false;
});

The simple way of putting it is that async/await lets you write code that looks like it’s using a blocking API, but have it automatically be transformed into using callbacks instead.

In JavaScript, the await keyword can only be used inside a function declared async, and these functions always return a Promise, so basically you want to call them with await as well. This means that using async somewhere can “infect” its callers and cause you to rewrite more and more of your code using async/await.

This is an annoying state of affairs sometimes referred to as “function colors”.[2]

Luckily, in this case I happen to control all of the JavaScript code, and can easily write it a different way by modifying the compiler. Basically, I modified my Python to JavaScript compiler to output not “normal” JavaScript code, but maximally async JavaScript code, where every function is defined to be async, and every function call (except to built-in functions) uses await.

This way, the compiled JavaScript code calls await s.makefile(...) instead of just s.makefile(...), and so looks a lot like the blocking Python code, but can still use the underlying non-blocking fetch API.
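To make the “maximally async” idea concrete, here is a hand-written sketch of what such output could look like. The request and show functions and the StubSocket class are hypothetical stand-ins for compiled browser code, and the actual compiler output may differ in its details.

```javascript
// Stub with the same blocking-looking surface as the fake socket.
// makefile is async because a real version would await fetch.
class StubSocket {
    async makefile(mode, params) {
        // Stand-in for a network round trip.
        await new Promise((resolve) => setTimeout(resolve, 10));
        this.lines = ["HTTP/1.0 200 OK\r\n"];
        return this;
    }

    // Not async: awaiting a plain value simply yields that value,
    // so the caller can await every call without checking.
    readline() {
        return this.lines.shift() ?? "";
    }
}

// Every user-defined function is async, and every call to one is awaited,
// so the body reads top to bottom like the blocking Python original.
async function request(s) {
    let response = (await s.makefile("r", {"encoding": "utf8", "newline": "\r\n"}));
    let statusline = (await response.readline());
    let [version, status, explanation] = statusline.split(" ", 3);
    return status;
}

async function show(s) {
    let status = (await request(s));
    return "status: " + status;
}

show(new StubSocket()).then((msg) => console.log(msg)); // status: 200
```

Only the outermost call needs a then; everything below it is written as if the calls were blocking.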

Edge cases with async/await

Once I hit upon the idea of compiling all of the Python code to async/await JavaScript code, I knew I was close to getting network calls working in the compiled browser. But there were a few annoyances left to resolve.

One is that JavaScript class constructors can’t be async, which is a problem any time a constructor calls another function. For example, in Chapter 3, the Layout class has this in its constructor:

class Layout:
    def __init__(self, tokens):
        # ...

        for tok in tokens:
            self.token(tok)
        self.flush()

This would compile into something like this in JavaScript:

class Layout {
    constructor(tokens) {
        // ...
        for (let tok of tokens) {
            (await this.token(tok));
        }
        (await this.flush());
    }
}

The issue is that we can’t use await in the constructor since it’s not declared async, and we can’t declare a constructor async because that’s not allowed.

My solution to this is to not actually use JavaScript constructors when compiling. Instead, the constructor code is moved to an async init method which returns this, and to construct a Layout object, instead of calling new Layout(tokens), you call await (new Layout().init(tokens)).
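A minimal sketch of that pattern follows. The bodies of token and flush, and the display_list and flushed fields, are made up for illustration; only the init method returning this reflects the approach described above.

```javascript
class Layout {
    // No constructor body: the work moves to init, which is allowed to await.
    async init(tokens) {
        this.display_list = [];
        for (let tok of tokens) {
            (await this.token(tok));
        }
        (await this.flush());
        return this; // so that awaiting init(...) yields the object itself
    }

    // Hypothetical method bodies, just enough to observe the calls.
    async token(tok) {
        this.display_list.push(tok);
    }

    async flush() {
        this.flushed = true;
    }
}

async function main() {
    let layout = await (new Layout().init(["Hello", "world"]));
    return layout;
}

main().then((layout) => console.log(layout.display_list, layout.flushed));
```

The object exists as soon as new Layout() runs, but it should not be used until init has resolved, which is why the two are always written together.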

A kind-of similar issue happened when compiling Python list comprehensions. I wanted to compile a list comprehension like [f(x) for x in l if g(x)] into calls to JavaScript’s map and filter methods, like this:

l.filter((x) => g(x)).map((x) => f(x))

However, this won’t work with async/await, because passing an async function to map and filter won’t do the right thing. With map, my compiler instead uses Promise.all, like this:[3]

await Promise.all(l.map(async (x) => await f(x)))

This is a little tricky, but the map call converts the list l into a list of promises, which Promise.all converts back into a single promise containing a list.

Filter doesn’t have a similar trick, unfortunately, so here I just had to write a helper function:

async function asyncfilter(fn, arr) {
    let out = [];
    for (var i = 0; i < arr.length; i++) {
        if (await fn(arr[i])) {
            out.push(arr[i]);
        }
    }
    return out;
}

Because these “async versions” of map and filter are so wordy, I actually only output them if I detect that the list comprehension needs to use await; otherwise I use the shorter map/filter form.
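Putting the two pieces together, a comprehension like [f(x) for x in l if g(x)] could be evaluated as in the sketch below. The f and g here are hypothetical async stand-ins, and comprehension is just a wrapper for the demonstration, not something the compiler emits.

```javascript
async function asyncfilter(fn, arr) {
    let out = [];
    for (let i = 0; i < arr.length; i++) {
        if (await fn(arr[i])) out.push(arr[i]);
    }
    return out;
}

// Hypothetical async stand-ins for g and f in [f(x) for x in l if g(x)].
async function g(x) { return x % 2 === 0; }
async function f(x) { return x * 10; }

async function comprehension(l) {
    // Filter first, one element at a time, then map all survivors at once.
    let kept = await asyncfilter(async (x) => await g(x), l);
    return await Promise.all(kept.map(async (x) => await f(x)));
}

// Why a plain filter fails: an async callback returns a Promise, and every
// Promise is truthy, so nothing is ever filtered out.
let wrong = [1, 2, 3, 4].filter(async (x) => await g(x));
console.log(wrong.length); // 4

comprehension([1, 2, 3, 4]).then((result) => console.log(result)); // [ 20, 40 ]
```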

The Same-Origin Policy

One last, stark way in which the browser networking APIs are very different from the Python socket API is that, in general, JavaScript code is not allowed to just connect to and read content from arbitrary web pages. (For security reasons, as covered in Chapter 10.)

Luckily, the main web pages I use as an example in the book are the book itself, and if I host the compiled browser on the book website, there’s no security issue with it accessing itself.

In later chapters, these security restrictions become more of a problem. For example, in Chapter 7, we add the capability to click on links, and the book has some links to external websites. If you click on one of those in the compiled browser, it won’t work due to these security restrictions.

In principle, it might be possible to fix this. For example, I could host what is basically an open proxy on the book domain to tunnel requests from the compiled browser to other pages on the internet. But that seems dangerous (I hear open proxies are bad) and I think readers will be understanding of browser security restrictions, since after all the book itself covers them. So for now, I’ve restricted my fake sockets to only allow URLs on the browser.engineering host name.
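A host restriction like this can be a few lines. The following is a sketch of one way to write the check, not the code from rt.js; isAllowedUrl and ALLOWED_HOST are hypothetical names.

```javascript
// Hypothetical check: only URLs on the book's own host may be fetched.
const ALLOWED_HOST = "browser.engineering";

function isAllowedUrl(url) {
    try {
        // Compare the parsed host name exactly, rather than searching the
        // string, so look-alike hosts are rejected.
        return new URL(url).hostname === ALLOWED_HOST;
    } catch (e) {
        return false; // not an absolute URL, so refuse it
    }
}

console.log(isAllowedUrl("https://browser.engineering/http.html")); // true
console.log(isAllowedUrl("https://example.org/")); // false
console.log(isAllowedUrl("https://browser.engineering.evil.example/")); // false
```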

Conclusion

If you want to see how all of this works at run time, please do head over to any of the first few chapters of the book,[4] scroll to the bottom, and play with the embedded browser widget. You’ll see that you can browse around the browser.engineering web page. If you’ve been reading the book and have your own Python browser, you should see the widget’s close resemblance to it. And if you want to read the compiler or this fake socket code, do check out the repo.

[1] That is, Python would mis-parse it, while JavaScript would inexplicably parse things correctly.

[2] To me it is reminiscent of writing Haskell code and finding you have to move more and more code to happen inside of a monad. Async/await is a close cousin of continuation-passing style, which basically means writing all of your code, from the start, in the free monad.

[3] In this case, defining the inline function is unnecessary, but my compiler isn’t smart enough to know that.

[4] As of this writing, Chapters 2–10 have associated widgets.

Web Browser Engineering Blog