Patching Existing Python Classes

Sep 21, 2022

My last few posts have been about big, (overly) complicated code I wrote. Let’s change it up and talk about this eight-line function instead:

    def patch(existing_cls):
        def decorator(new_cls):
            for attr in dir(new_cls):
                obj = getattr(new_cls, attr)
                if isinstance(obj, type(decorator)):
                    setattr(existing_cls, attr, obj)
            return existing_cls
        return decorator

Deduplicating a Changing Browser

In Web Browser Engineering, we have a copy of the book’s browser after every chapter. So, for example, there’s lab1.py, which is the browser as of the end of Chapter 1, and lab8.py, which is the browser as of the end of Chapter 8, and so on. We keep each of these copies separate so we can test them independently, for example to make sure that the Chapter 8 browser can submit forms correctly.

But what this means is that we have fifteen very similar files. And any time we fixed a bug or made a change, we had to go through and edit every file. Naturally, this was annoying and bug-prone.

So a while ago, we switched to importing instead. With importing, lab4.py imports, say, the request method from lab1. This works great and cuts down on a lot of duplication.

However, in later chapters importing isn’t quite sufficient. For example, in Chapter 12, we make a lot of changes to the Tab and Browser classes to implement concurrency. Now, while the changes are pretty large, some of those classes’ methods stay the same. But since we’re updating at least one method, we can’t import the Tab and Browser classes from a prior chapter and instead need to copy over the whole class.

More trouble happens when we update more “core” classes. For example, the Element class is defined in Chapter 4, and stays pretty much unchanged between then and Chapter 13, when we add fields to store ongoing animations. Of course, since we’re adding a field, we can’t import Element and need to copy it in.

But let’s think through the mechanics here. There’s a class in lab4.py called Element, and a different class in lab13.py, also called Element. Any code in lab4.py that refers to Element, like the HTMLParser class, is referring to lab4.Element, not the new lab13.Element. So if we import HTMLParser from lab4, it’ll output an HTML tree made of the wrong Element class, which means it won’t work. We’d need to copy the HTMLParser class into lab13.py as well, just so it would refer to the right Element class.

And even worse, loads of other methods in Chapters 5 through 12 also refer to the Element class, and since they all import it from lab4, they all refer to the wrong one. So actually all of that code (and it’s a lot: the CSS selector classes, the CSS parser, all sorts of stuff!) needs to be copied in as well. Which means there’s basically no deduplication going on. Clearly something better was needed.
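The mismatch is easy to reproduce in miniature. The sketch below is a stand-in, not the book’s code: it builds a fake lab4 module in memory with types.ModuleType, gives it a toy Element and a hypothetical parse function playing the role of HTMLParser, and then defines a second Element the way lab13.py would.

```python
import types

# Hypothetical stand-in for lab4.py, built in memory so the example is
# self-contained. The real Element and HTMLParser are much larger.
lab4 = types.ModuleType("lab4")
exec(
    """
class Element:
    def __init__(self, tag):
        self.tag = tag

def parse(tag):
    # Looks up Element in lab4's globals, no matter who calls parse.
    return Element(tag)
""",
    lab4.__dict__,
)

# What "lab13.py" would do: reuse the parser, then define a new Element.
parse = lab4.parse

class Element:
    def __init__(self, tag):
        self.tag = tag
        self.animations = {}  # illustrative new field

node = parse("div")
print(type(node) is lab4.Element)   # True: the parser built the old class
print(type(node) is Element)        # False: not the class defined here
print(hasattr(node, "animations"))  # False: the new field is missing
```

The imported function keeps resolving Element in the module where it was defined, which is why redefining the class elsewhere has no effect on it.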

A solution

So what would a solution look like? Well, I need a way to modify an existing class. That way, lab13.py could change the existing lab4.Element class instead of defining a new one. And Python supports this kind of monkey-patching. For example, you can do something like:

    def new_method(self):
        # ...

    Element.__init__ = new_method

This defines a function and replaces the Element class’s __init__ method with it.
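Here is a small runnable version of that idea, using a toy Element with a tag field and an illustrative animations field (both are assumptions for the example, not the book’s actual fields). It also shows one consequence worth knowing: objects built before the patch keep whatever the old __init__ gave them.

```python
class Element:
    def __init__(self, tag):
        self.tag = tag

old_node = Element("p")  # created before the patch

def new_init(self, tag):
    self.tag = tag
    self.animations = {}  # illustrative new field

# Replace the method on the existing class object.
Element.__init__ = new_init

new_node = Element("p")
print(hasattr(new_node, "animations"))   # True
print(hasattr(old_node, "animations"))   # False: its __init__ already ran
print(type(old_node) is type(new_node))  # True: there is still one class
```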

But this doesn’t quite work for Web Browser Engineering, because we’ve got a linter process that makes sure any code that appears in a chapter also appears in the source code. In the chapter, I’d describe a modification to the Element class’s __init__ method like this:

    class Element:
        def __init__(self):
            # ...

So to please the linter, I actually need to define an Element class in lab13.py, yet I don’t want there to be two Element classes. So I’d need to do something like this:

    from lab4 import Element
    OldElement = Element

    class Element:
        def __init__(self):
            # ...

    OldElement.__init__ = Element.__init__
    Element = OldElement

Here I import the existing Element class from lab4.py, save it under another name, define a new Element class, copy over its method to the saved existing Element class, and then overwrite the new class with the old one.

This works, but is kind of tedious. Luckily, Python supports something called “class decorators”, which work like so. Suppose you have a decorator called decorate. Then you can write:

    @decorate
    class Foo:
        # ...

Python then treats this as the moral equivalent of:

    class _temp_Foo:
        # ...

    Foo = decorate(_temp_Foo)

In other words, the function decorate is called on the class, and the return value of that function is what’s actually bound to the class name.
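A quick way to convince yourself of this is to write a decorator that returns something that isn’t the class at all. Both decorators below are made up for illustration.

```python
def decorate(cls):
    # A hypothetical decorator that tags the class and hands it back.
    cls.decorated = True
    return cls

def replace(cls):
    # Returns something else entirely; the class name gets bound to that.
    return "not a class"

@decorate
class Foo:
    pass

@replace
class Bar:
    pass

print(Foo.decorated)  # True
print(Bar)            # not a class
```

Since the name is bound to whatever the decorator returns, a decorator is free to return a class that already existed.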

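This is perfect for my purpose: I need a decorator that patches the existing class instead of creating a new one. To do that, the decorator also needs the existing class passed in, so you’ll actually need to write: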

    @patch(Element)
    class Element:
        # ...

Here the decorator is patch(Element), meaning that patch took a class as input and returned a function:

    def patch(existing_cls):
        def decorator(new_cls):
            # ...
        return decorator

I need the inner decorator function to return the existing Element class, so that the name Element ends up bound to it. So:

    def patch(existing_cls):
        def decorator(new_cls):
            # ...
            return existing_cls

But before returning the existing class, it needs to copy every method from new_cls to existing_cls. Luckily, Python has a function called dir, which returns the name of every field on an object, including every method on a class:

    def patch(existing_cls):
        def decorator(new_cls):
            for attr in dir(new_cls):
                # ...
            # ...

Unfortunately, classes have a lot of fields besides methods, including stuff like __class__ (a type), __repr__ (a wrapper_descriptor), or __weakref__ (a getset_descriptor). To get all the fields that the programmer deliberately defined, we need to look for stuff with type function, which is the type of stuff defined by Python’s def keyword. Since the function type isn’t a builtin name (the standard library does expose it as types.FunctionType), the easiest way to get access to it without an import is to call type on an existing function:

    def patch(existing_cls):
        def decorator(new_cls):
            for attr in dir(new_cls):
                obj = getattr(new_cls, attr)
                if isinstance(obj, type(decorator)):
                    # ...

Here I use getattr to access a field on a class by name, and type to get the type of an object. Finally, it’s these fields that can be copied over:

    def patch(existing_cls):
        def decorator(new_cls):
            for attr in dir(new_cls):
                obj = getattr(new_cls, attr)
                if isinstance(obj, type(decorator)):
                    setattr(existing_cls, attr, obj)

Here setattr sets a field on a class by name.
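To see what the function-type filter keeps and what it skips, here is a small inspection sketch. The Element class and its is_animating method are toy stand-ins, and the type names printed in the loop are what CPython reports, so treat them as implementation details.

```python
import types

def some_function():
    pass

class Element:
    def __init__(self, tag):
        self.tag = tag

    def is_animating(self):
        return False

# The type obtained from any def-defined function is types.FunctionType.
print(type(some_function) is types.FunctionType)  # True

# Type names of a few of the fields that dir() reports.
for name in ["__class__", "__repr__", "__weakref__", "__init__"]:
    print(name, type(getattr(Element, name)).__name__)

# Only the deliberately defined methods pass the isinstance check.
defined = [
    name for name in dir(Element)
    if isinstance(getattr(Element, name), type(some_function))
]
print(defined)  # ['__init__', 'is_animating']
```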

Putting it all together, this patch decorator lets me write this in lab13.py:

    from lab4 import Element

    @patch(Element)
    class Element:
        def __init__(self):
            # ...

And end up with lab13.Element being bound to the same class as lab4.Element, while overwriting its __init__ method.
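The whole thing can be checked end to end in a single file. In this sketch the first Element plays the role of lab4.Element; the tag and animations fields and the describe method are invented for the example.

```python
def patch(existing_cls):
    def decorator(new_cls):
        for attr in dir(new_cls):
            obj = getattr(new_cls, attr)
            if isinstance(obj, type(decorator)):
                setattr(existing_cls, attr, obj)
        return existing_cls
    return decorator

class Element:  # plays the role of lab4.Element
    def __init__(self, tag):
        self.tag = tag

    def describe(self):
        return "<" + self.tag + ">"

original = Element  # keep a handle so identity can be checked later

# patch(Element) is evaluated before the new class is bound to the name,
# so it receives the original class.
@patch(Element)
class Element:
    def __init__(self, tag):
        self.tag = tag
        self.animations = {}  # illustrative new field

node = Element("div")
print(Element is original)  # True: the name still refers to the old class
print(node.animations)      # {}
print(node.describe())      # <div>
```

The untouched describe method survives, and any code holding a reference to the original class sees the new __init__.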

Don’t use this

Now, any time you have one module modify another module’s classes, that’s bad news, and you shouldn’t do that. Please don’t use this in production code.
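One concrete reason for caution, beyond cross-module surprises, is that the function-type filter only handles plain methods well. The sketch below uses a made-up Widget class to show which kinds of fields get copied. This is my reading of how getattr behaves on a class, not something the book relies on.

```python
def patch(existing_cls):
    def decorator(new_cls):
        for attr in dir(new_cls):
            obj = getattr(new_cls, attr)
            if isinstance(obj, type(decorator)):
                setattr(existing_cls, attr, obj)
        return existing_cls
    return decorator

class Widget:  # hypothetical class, not from the book
    pass

@patch(Widget)
class Widget:
    LIMIT = 10  # class-level constant

    def method(self):
        return "method"

    @property
    def prop(self):
        return "prop"

    @classmethod
    def make(cls):
        return cls()

    @staticmethod
    def helper():
        return "helper"

# Constants, properties, and classmethods are not functions when read off
# the class, so they are skipped. The staticmethod is read back as a bare
# function and copied without its wrapper.
copied = sorted(name for name in vars(Widget) if not name.startswith("__"))
print(copied)  # ['helper', 'method']

try:
    Widget().helper()
    static_ok = True
except TypeError:
    static_ok = False
print(static_ok)  # False: helper now receives self when called on an instance
```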

But for Chris’s and my unusual constraints (keeping many versions of the same code around, while deduplicating where possible), it works pretty well.

Web Browser Engineering Blog
