Python Networking from JavaScript
As part of my ongoing project to run the WBE browser in your browser, by compiling it from Python to JavaScript, I need to somehow hook up the WBE browser’s networking code to use browser APIs. This post describes how I did it.
How the browser uses sockets
As a quick reminder, the WBE browser makes network requests by opening sockets, writing HTTP requests to them, and parsing the resulting HTTP responses. The sockets-using part of the code looks like this:
s = socket.socket(
    family=socket.AF_INET,
    type=socket.SOCK_STREAM,
    proto=socket.IPPROTO_TCP,
)
s.connect((host, port))
if scheme == "https":
    ctx = ssl.create_default_context()
    s = ctx.wrap_socket(s, server_hostname=host)
s.send(...)
response = s.makefile("r", encoding="utf8", newline="\r\n")
statusline = response.readline()
This works great in Python, which offers easy access to the sockets API provided, ultimately, by the operating system. But it doesn’t work in the browser, which doesn’t provide a sockets API at all. Instead, browser networking is done almost exclusively via fetch or related APIs, which differ from sockets in a lot of ways.
Importantly, since the goal is to compile the browser from Python to JavaScript, I don’t want to make too many—ideally, any—modifications on the Python side. For example, I don’t want the browser to support multiple “networking backends”, nor do I want to switch to a higher-level “request” API on the Python side, because as much as possible I want the same code running in both Python and JavaScript, lest readers check their Python implementation against the JavaScript one and have to wonder where differences in behavior come from.
So the solution I came up with is to create JavaScript objects, also called socket and ssl, that offer the same methods as the Python libraries of those names but forward all requests to browser APIs. That way, the JavaScript code can faithfully call
let s = (socket.socket({"family": socket.AF_INET, "type": socket.SOCK_STREAM, "proto": socket.IPPROTO_TCP}));
(s.connect([host, port]));
if (truthy((scheme === "https"))) {
    ctx = (ssl.create_default_context());
    s = (ctx.wrap_socket(s, {"server_hostname": host}));
}
(s.send(...));
let response = s.makefile("r", {"encoding": "utf8", "newline": "\r\n"});
let statusline = (response.readline());
and yet this will result in calls to fetch and related APIs. As with the compiler itself, this requires understanding the limited way we are actually using the sockets API in the browser and coming up with some hacks that make those uses work in the browser.
A Mock socket
The most important API provided by the Python socket module is the socket class, and it’s pretty straightforward to define a class like that in JavaScript. Of course, this class won’t wrap an actual socket (no such thing exists in the browser), but it will have a similar API. For example, you can construct it:
class socket {
    constructor(params) {
        console.assert(params.family == "inet", "socket family must be inet")
        console.assert(params.type == "stream", "socket type must be stream")
        console.assert(params.proto == "tcp", "socket proto must be tcp")
    }
}
Note that the actual parameter values don’t matter (since we’re not actually creating a socket) and instead they’re just checked against the expected values. This code, by the way, is real; you can find the full thing in rt.js in the WBE repository.
Next, the connect and send methods also don’t actually have to “do” anything; they just record their arguments so we know what request we’re supposed to make later:
class socket {
    connect(pair) {
        let [host, port] = pair;
        this.host = host;
        this.port = port;
        this.input = "";
        this.closed = true;
        this.scheme = "http";
    }
    send(text) {
        this.input += text;
    }
    close() {
        this.closed = true;
    }
}
This means that by the time the compiled browser calls makefile, this fake socket object has recorded the host name, port, and HTTP request, and is ready to produce a result.
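To make the recording step concrete, here is a minimal, self-contained sketch of what the fake socket holds by the time makefile would be called. The names FakeSocket and fakeSsl are hypothetical, and the behavior of wrap_socket in particular (flipping scheme to "https") is an assumption for illustration; the ssl mock is not shown in this post, and the real code lives in rt.js.

```javascript
// Sketch only: FakeSocket and fakeSsl are hypothetical names, not the rt.js code.
class FakeSocket {
    connect(pair) {
        let [host, port] = pair;
        this.host = host;
        this.port = port;
        this.input = "";       // accumulates the raw HTTP request text
        this.scheme = "http";  // default until wrap_socket is called
    }
    send(text) {
        this.input += text;    // nothing is sent yet; we only record
    }
}

// Assumption: the ssl mock only needs to remember that HTTPS was requested.
const fakeSsl = {
    create_default_context() {
        return {
            wrap_socket(s, params) {
                s.scheme = "https";
                return s;
            },
        };
    },
};

let s = new FakeSocket();
s.connect(["browser.engineering", 443]);
s = fakeSsl.create_default_context()
    .wrap_socket(s, {"server_hostname": "browser.engineering"});
s.send("GET /http.html HTTP/1.0\r\n");
s.send("Host: browser.engineering\r\n\r\n");
// s now holds the host, port, scheme, and raw request text.
```

Nothing has touched the network at this point; the object is just a record of what the compiled browser asked for.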
Parsing the Request
One big difference between fetch and sockets is that, when using fetch, you don’t need to write out raw HTTP requests. This is basically always a good thing (HTTP syntax has some sharp corners) but for our use case it’s a bit of a disaster. After all, to call fetch, we need a URL, a method type, and possibly POST data. But the only way the compiled browser passes us that information is through the s.send(...) call.
The only solution I could find is to make our fake socket object parse the HTTP request to pull out the path and method name:
let [line1] = this.input.split("\r\n", 1);
let [method, path, protocol] = line1.split(" ");
this.url = this.scheme + "://" + this.host + path;
I do this in the makefile method, which the compiled browser calls after calling send and before doing anything else, though I suppose it would have been just as fine to do it in send. Note that this is not a foolproof HTTP parser, but it handles anything the compiled browser might send, so it’s good enough for my purposes.
The HTTP request sent by the compiled browser also has headers and sometimes a POST body. For now let’s ignore those, and go on to send the request.
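For reference, here is one way the rest of the request could be pulled apart when the headers and body are needed. This is a sketch under assumptions, not the code in rt.js: parseRequest is a hypothetical helper, and it is only as robust as the three-line parser above (it assumes well-formed requests of the kind the compiled browser produces).

```javascript
// Sketch only: parseRequest is a hypothetical helper, not part of rt.js.
function parseRequest(input) {
    // Headers end at the first blank line; anything after it is the body.
    let split = input.indexOf("\r\n\r\n");
    let head = split === -1 ? input : input.substring(0, split);
    let body = split === -1 ? "" : input.substring(split + 4);

    let lines = head.split("\r\n");
    let [method, path, protocol] = lines[0].split(" ");

    let headers = {};
    for (let line of lines.slice(1)) {
        let colon = line.indexOf(":");
        if (colon === -1) continue;  // skip blank or malformed lines
        // Header names are case-insensitive, so normalize to lowercase.
        let name = line.substring(0, colon).trim().toLowerCase();
        headers[name] = line.substring(colon + 1).trim();
    }
    return { method, path, protocol, headers, body };
}

let req = parseRequest(
    "POST /add HTTP/1.0\r\n" +
    "Host: browser.engineering\r\n" +
    "Content-Length: 9\r\n" +
    "\r\n" +
    "guest=Ada"
);
// req.method is "POST", req.path is "/add", req.body is "guest=Ada"
```

The method and body are the pieces a fetch call would need for a POST; most of the headers can be left for the real browser to fill in.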
Producing a Response
Now that we know what URL the browser is requesting, we can request it through the browser using fetch. That looks like this:
let response = await fetch(path);
I’ll get back to the “await” part in a second, but for now note that the “response” defined here is an already-parsed response object, with the status, headers, and body available as separate pieces, while the compiled browser is expecting a complete HTTP response as text. So we need to turn the fetch response object back into a full HTTP response:
this.output = "HTTP/1.0 " + response.status +
    " " + response.statusText + "\r\n";
for (let [header, value] of response.headers.entries()) {
    this.output += header + ": " + value + "\r\n";
}
this.output += "\r\n";
this.output += await response.text();
this.closed = false;
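The same reassembly can be tried outside the browser. The sketch below is an illustration under assumptions: serializeResponse is a hypothetical helper that takes plain values (a status, a status text, an iterable of header pairs, and a body string) in place of a real fetch response, so that it runs without a network.

```javascript
// Sketch only: serializeResponse is a hypothetical helper. It takes plain
// values instead of a real fetch response so it can run anywhere.
function serializeResponse(status, statusText, headers, body) {
    let out = "HTTP/1.0 " + status + " " + statusText + "\r\n";
    for (let [header, value] of headers) {
        out += header + ": " + value + "\r\n";
    }
    out += "\r\n";  // blank line separates the headers from the body
    out += body;
    return out;
}

let text = serializeResponse(
    200, "OK",
    new Map([["content-type", "text/html"], ["content-length", "5"]]),
    "hello"
);
```

The result is the kind of string the Python code would have read off a real socket: a status line, one line per header, a blank line, and then the body.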
Now the output field stores the HTTP response that we want the compiled browser to read. But annoyingly, we can’t just return it as a string. Instead, the browser wants to read it line by line with readline.
So we need to fake that too. To do so, I store a cursor idx into the HTTP response, and increment it by one line every time readline is called:
class socket {
    async makefile(mode, params) {
        // …
        this.idx = 0;
        return this;
    }
    readline() {
        console.assert(!this.closed,
            "Attempt to read from a closed socket");
        let nl = this.output.indexOf("\r\n", this.idx);
        if (nl === -1) nl = this.output.length - 2;
        let line = this.output.substring(this.idx, nl + 2);
        this.idx = nl + 2;
        return line;
    }
    read() {
        console.assert(!this.closed,
            "Attempt to read from a closed socket");
        let rest = this.output.substring(this.idx);
        this.idx = this.output.length;
        return rest;
    }
}
Now successive calls to readline will return each line of the response, one by one, until the end is reached, exactly as the compiled browser expects.
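To see the cursor in action, here is a small standalone sketch of the same idea. LineReader is a hypothetical stand-in for the readline and read methods above, operating on a fixed string; it is written slightly differently from the class above but is meant to behave the same way on well-formed responses.

```javascript
// Sketch only: LineReader is a hypothetical stand-in for the fake socket's
// readline/read methods, operating on a fixed string.
class LineReader {
    constructor(output) {
        this.output = output;
        this.idx = 0;  // cursor into output
    }
    readline() {
        let nl = this.output.indexOf("\r\n", this.idx);
        // No terminator left: return whatever remains (possibly "").
        let end = nl === -1 ? this.output.length : nl + 2;
        let line = this.output.substring(this.idx, end);
        this.idx = end;
        return line;
    }
    read() {
        let rest = this.output.substring(this.idx);
        this.idx = this.output.length;
        return rest;
    }
}

let r = new LineReader(
    "HTTP/1.0 200 OK\r\ncontent-type: text/html\r\n\r\n<p>Hi</p>");
let statusline = r.readline();  // "HTTP/1.0 200 OK\r\n"
let header = r.readline();      // "content-type: text/html\r\n"
let blank = r.readline();       // "\r\n", the end of the headers
let body = r.read();            // "<p>Hi</p>"
```

As with Python’s readline, each returned line keeps its "\r\n" terminator, so a line consisting of just "\r\n" marks the end of the headers.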
By the way, some of you may be wondering why I went through this effort of packaging the browser’s fetch result into a string, and then reading it line by line, instead of creating an array of lines right away. I considered that as an alternative, but doing so meant that if an HTTP status message or header value ever contained a newline, you’d see different results in Python and JavaScript,[1] and I couldn’t rule out that I’d want to demonstrate that at some point.
Async/await
One issue I papered over above is that Python’s socket library uses blocking APIs like send and makefile, while the web’s fetch and related APIs don’t block; they return Promises. In some cases this isn’t a problem (the fake socket’s send method doesn’t do anything anyway), but for makefile, which actually makes a network request, it is.
My solution to this is JavaScript’s support for async/await. This basically lets you write code that looks like it’s blocking, but internally it’s all done with Promises. For example, this code:
this.output += "\r\n";
this.output += await response.text();
this.closed = false;
is run as if it were instead:
this.output += "\r\n";
return response.text().then((res) => {
    this.output += res;
    this.closed = false;
});
The simple way of putting it is that async/await let you write code that looks like it’s using a blocking API, but have it automatically be transformed into using callbacks instead.
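The equivalence can be checked with a runnable toy version. In this sketch, fakeText is a hypothetical stand-in for response.text(), and withAwait and withThen are illustrative names; the point is only that both functions return a Promise for the same string.

```javascript
// Sketch only: fakeText stands in for response.text(); all names here are
// hypothetical.
function fakeText() {
    return Promise.resolve("<p>Hi</p>");
}

// Written with await: reads like blocking code.
async function withAwait() {
    let output = "\r\n";
    output += await fakeText();
    return output;
}

// Roughly what it means: the rest of the function becomes a callback.
function withThen() {
    let output = "\r\n";
    return fakeText().then((res) => {
        output += res;
        return output;
    });
}
```

Calling either function gives back a Promise rather than a string, which is why the caller needs await (or then) as well.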
In JavaScript, await can only be used inside a function declared async, and these functions always return a Promise, so basically you want to call them with await as well. This means that using async somewhere can “infect” its callers and cause you to rewrite more and more of your code using async/await.
This is an annoying state of affairs sometimes referred to as “function colors”.[2]
Luckily, in this case I happen to control all of the JavaScript code, and can easily write it a different way by modifying the compiler. Basically, I modified my Python-to-JavaScript compiler to output not “normal” JavaScript code, but maximally async JavaScript code, where every function is defined to be async, and every function call (except to built-in functions) uses await.
This way, the compiled JavaScript code calls await s.makefile(...) instead of just s.makefile(...), and so looks a lot like the blocking Python code, but can still use the underlying non-blocking fetch API.
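As an illustration of what “maximally async” output might look like, here is a sketch for a tiny imagined Python program. Everything in it is hypothetical: the Python source in the comment, the request and load functions, and the stubSocket object that replaces the real fake socket so the example runs without a network.

```javascript
// Sketch only: hypothetical compiler output. Imagined Python source:
//
//     def request(s):
//         response = s.makefile("r")
//         return response.readline()
//
//     def load(s):
//         statusline = request(s)
//         return statusline

// Stub socket: makefile is async, readline is an ordinary method.
const stubSocket = {
    async makefile(mode) { return this; },
    readline() { return "HTTP/1.0 200 OK\r\n"; },
};

// Every function is async and every call is awaited.
async function request(s) {
    let response = (await s.makefile("r"));
    // Awaiting a non-Promise value is harmless; it just yields the value.
    return (await response.readline());
}

async function load(s) {
    let statusline = (await request(s));
    return statusline;
}
```

Because await accepts ordinary values as well as Promises, the compiler does not need to know which functions really are asynchronous; it can await everything.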
Edge cases with async/await
Once I hit upon the idea of compiling all of the Python code to async/await JavaScript code I knew I was close to getting network calls working in the compiled browser. But there were a few annoyances left to resolve.
One is that JavaScript class constructors can’t be async, which is a problem any time a constructor calls another function. For example, in Chapter 3, the Layout class has this in its constructor:
class Layout:
    def __init__(self, tokens):
        # ...
        for tok in tokens:
            self.token(tok)
        self.flush()
This would compile into something like this in JavaScript:
class Layout {
    constructor(tokens) {
        // ...
        for (let tok of tokens) {
            (await this.token(tok));
        }
        (await this.flush());
    }
}
The issue is that we can’t use await in the constructor since it’s not declared async, and we can’t declare a constructor async because that’s not allowed.
My solution to this is to not actually use JavaScript constructors when compiling. Instead, the constructor code is moved to an async init method which returns this, and to construct a Layout object, instead of calling new Layout(tokens), you call await (new Layout().init(tokens)).
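Here is a runnable sketch of that pattern. The token and flush bodies are placeholders invented for the example (the real Layout class does actual text layout), and main is a hypothetical caller; only the shape of init and the construction expression follow the description above.

```javascript
// Sketch only: a toy Layout whose token/flush methods are placeholders.
class Layout {
    // No constructor body: all setup lives in init so that it can await.
    async init(tokens) {
        this.line = [];
        this.display_list = [];
        for (let tok of tokens) {
            (await this.token(tok));
        }
        (await this.flush());
        return this;  // lets callers write: await (new Layout().init(...))
    }
    async token(tok) {
        this.line.push(tok);
    }
    async flush() {
        this.display_list.push(this.line.join(" "));
        this.line = [];
    }
}

async function main() {
    let layout = await (new Layout().init(["Hello,", "world"]));
    return layout.display_list;
}
```

Returning this from init is what keeps construction a single expression, so the compiled code can still use it anywhere the Python code called Layout(tokens).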
A kind-of similar issue happened when compiling Python list comprehensions. I wanted to compile a list comprehension like [f(x) for x in l if g(x)] into calls to JavaScript’s map and filter methods, like this:
l.filter((x) => g(x)).map((x) => f(x))
However, this won’t work with async/await: an async function always returns a Promise, so map produces a list of Promises instead of values, and filter keeps every element because a Promise is always truthy. With map, my compiler instead uses Promise.all, like this:[3]
await Promise.all(l.map(async (x) => await f(x)))
This is a little tricky, but the map call converts the list l into a list of promises, which Promise.all converts back into a single promise containing a list.
Filter doesn’t have a similar trick, unfortunately, so here I just had to write a helper function:
async function asyncfilter(fn, arr) {
    let out = [];
    for (let i = 0; i < arr.length; i++) {
        if (await fn(arr[i])) {
            out.push(arr[i]);
        }
    }
    return out;
}
Because these “async versions” of map and filter are so wordy, I actually only output them if I detect that the list comprehension needs to use await; otherwise I use the shorter map/filter form.
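Putting the two pieces together, the sketch below shows how a comprehension like [f(x) for x in l if g(x)] could come out when f and g are async, alongside the naive filter call that goes wrong. The functions f, g, and comprehension are hypothetical examples, not compiler output copied from the repo.

```javascript
// Sketch only: f, g, and comprehension are hypothetical examples.
async function asyncfilter(fn, arr) {
    let out = [];
    for (let i = 0; i < arr.length; i++) {
        if (await fn(arr[i])) {
            out.push(arr[i]);
        }
    }
    return out;
}

async function g(x) { return x % 2 === 0; }  // keep even numbers
async function f(x) { return x * 10; }

async function comprehension(l) {
    // Naive version: g(x) is a Promise, which is always truthy,
    // so nothing is filtered out.
    let naive = l.filter((x) => g(x));

    // Async version of [f(x) for x in l if g(x)].
    let kept = await asyncfilter(g, l);
    let result = await Promise.all(kept.map(async (x) => await f(x)));
    return { naive, result };
}
```

One difference worth keeping in mind is that the Promise.all form starts every call to f before awaiting any of the results, whereas asyncfilter awaits each call to g in turn.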
The Same-Origin Policy
One last, stark way in which the browser networking APIs are very different from the Python socket API is that, in general, JavaScript code is not allowed to just connect to and read content from arbitrary web pages. (For security reasons, as covered in Chapter 10.)
Luckily, the main web pages I use as examples in the book are the pages of the book itself, and if I host the compiled browser on the book website, there’s no security issue with it accessing itself.
In later chapters, these security restrictions become more of a problem. For example, in Chapter 7, we add the capability to click on links, and the book has some links to external websites. If you click on one of those in the compiled browser, it won’t work due to these security restrictions.
In principle, it might be possible to fix this. For example, I could host what is basically an open proxy on the book domain to tunnel requests from the compiled browser to other pages on the internet. But that seems dangerous (I hear open proxies are bad) and I think readers will be understanding of browser security restrictions, since after all the book itself covers them. So for now, I’ve restricted my fake sockets to only allow URLs on the browser.engineering host name.
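A check like that can be quite small. The following is a sketch under assumptions rather than the code in rt.js: isAllowed and ALLOWED_HOST are hypothetical names, and the check uses the standard URL class to compare the parsed host name exactly.

```javascript
// Sketch only: isAllowed and ALLOWED_HOST are hypothetical names.
const ALLOWED_HOST = "browser.engineering";

function isAllowed(url) {
    let parsed;
    try {
        parsed = new URL(url);
    } catch (e) {
        return false;  // not an absolute URL we can check
    }
    // Compare the whole host name; a substring or prefix test would let
    // through hosts like browser.engineering.evil.example.
    return parsed.hostname === ALLOWED_HOST;
}
```

With this, isAllowed("https://browser.engineering/http.html") is true, while a URL on any other host, or a string that is not a URL at all, is rejected.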
Conclusion
If you want to see how all of this works at run time, please do head over to any of the first few chapters of the book,[4] scroll to the bottom, and play with the embedded browser widget. You’ll see that you can browse around the browser.engineering web page. If you’ve been reading the book and have your own Python browser, you should see the widget’s close resemblance to it. And if you want to read the compiler or this fake socket code, do check out the repo.
[1] That is, Python would mis-parse it, while JavaScript would inexplicably parse things correctly.
[2] To me it is reminiscent of writing Haskell code and finding you have to move more and more code inside of a monad. Async/await is a close cousin of continuation-passing style, which basically means writing all of your code, from the start, in the free monad.
[3] In this case, defining the inline function is unnecessary, but my compiler isn’t smart enough to know that.
[4] As of this writing, Chapters 2–10 have associated widgets.