In this post I will explore Python metaclasses. There are many great posts out there explaining the mechanics and possible usages of metaclasses. Not many of them examine the actual usage in popular libraries though, so that’s the niche I will try to fill with this article. I’ve chosen Django as my test subject: it’s popular, it uses metaclasses, and with the 1.8 release just around the corner there’s official support for the model _meta API, so it might be a good moment to get some more insight into its mechanics.
Let’s start with finding out what metaclasses there are in Django. A metaclass is a subclass of type or of another metaclass. I’ve used this code to find all metaclasses in the Django codebase:
import os
import pyclbr
import collections


def bfs_order(adj_list, start_node):
    if start_node not in adj_list:
        return []

    visited = collections.defaultdict(lambda: False)
    queue = [start_node]
    result = []

    while queue:
        # note: pop() takes the most recently added node
        current = queue.pop()
        visited[current] = True
        result.append((current, adj_list[current]))
        for neighbor in adj_list[current]:
            if not visited[neighbor]:
                queue.append(neighbor)
    return result


def find_metaclasses_in_directory(dirname):
    class_hierarchy = build_class_hierarchy(dirname)
    return bfs_order(class_hierarchy, 'type')


def build_class_hierarchy(dirname):
    # maps a parent class name to the names of its direct subclasses
    class_hierarchy = collections.defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(dirname):
        for filename in filenames:
            if filename.endswith(".py"):
                class_infos = get_classes_in_file(os.path.join(dirpath, filename))
                for class_name, parents in class_infos:
                    for parent in parents:
                        class_hierarchy[parent].append(class_name)
    return class_hierarchy


def get_classes_in_file(filepath):
    """
    returns a list of (class_name: string, parent_classes: list) tuples
    needs a few workarounds for pyclbr, which uses strings and class objects
    interchangeably
    """
    # we need to recreate the package/module name out of a given path
    # which boils down to traversing directory tree and finding directories
    # with __init__.py
    path_root = os.path.dirname(filepath)
    module_name_parts = [os.path.splitext(os.path.basename(filepath))[0]]
    while True:
        if not os.path.exists(os.path.join(path_root, '__init__.py')):
            break
        path_root, last_dir = os.path.split(path_root)
        module_name_parts.append(last_dir)
        if last_dir == '':
            break
    module_name = '.'.join(reversed(module_name_parts))
    class_infos = pyclbr.readmodule(module_name, path=[path_root])
    return [(
        class_name,
        [super_class.name if hasattr(super_class, "name") else super_class
         for super_class in class_info.super]
    ) for class_name, class_info in class_infos.items()
        if os.path.abspath(class_info.file) == os.path.abspath(filepath)]
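pyclbr is a standard library module which returns basic class information without importing the actual module. Basically, we build a DAG linking all the classes to their subclasses, and then traverse it starting from type. (The function is called bfs_order, but since it pops from the end of the list, the visiting order is actually depth-first; the set of classes reached is the same.)
Here are the results of running this against the latest 1.8 codebase: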
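To make the pyclbr part more concrete, here is a minimal sketch of what it reports for a tiny module. The module name demo_mod and its two classes are made up for the demo; the point is that a base class defined in the same file comes back as a descriptor object with a name attribute, while an unknown base such as type comes back as a plain string, which is why the snippet above has to handle both.

```python
import os
import pyclbr
import tempfile

# Hypothetical module source, written to a temporary directory for the demo.
SOURCE = (
    "class Meta(type):\n"
    "    pass\n"
    "\n"
    "class SubMeta(Meta):\n"
    "    pass\n"
)

with tempfile.TemporaryDirectory() as tmpdir:
    with open(os.path.join(tmpdir, "demo_mod.py"), "w") as handle:
        handle.write(SOURCE)
    # readmodule parses the source file; the module is never executed
    class_infos = pyclbr.readmodule("demo_mod", path=[tmpdir])

# Normalize the bases to plain names, whichever form pyclbr used for them.
parents = {}
for class_name, info in class_infos.items():
    parents[class_name] = [
        base if isinstance(base, str) else base.name for base in info.super
    ]

print(parents)
```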
[('type', ['ModelBase', 'InstanceCheckMeta', 'SubfieldBase',
           'MediaDefiningClass', 'RenameMethodsBase', 'FormMixinBase']),
 ('FormMixinBase', []),
 ('RenameMethodsBase', ['RenameFieldMethods', 'BaseMemcachedCacheMethods']),
 ('BaseMemcachedCacheMethods', []),
 ('RenameFieldMethods', []),
 ('MediaDefiningClass', ['DeclarativeFieldsMetaclass']),
 ('DeclarativeFieldsMetaclass', ['ModelFormMetaclass']),
 ('ModelFormMetaclass', []),
 ('SubfieldBase', []),
 ('InstanceCheckMeta', []),
 ('ModelBase', [])]
Some of those metaclasses are intended for use by Django app developers, while
others might only be used internally. Unfortunately, pyclbr does not expose
metaclass information, so to find out metaclass usages I had a choice of using
inspect, some low-level parsing mechanism, or… grep. Laziness being my greatest
virtue, I chose the last one.
Here’s the snippet used for finding metaclass usage in the Django codebase. Unlike the previous snippet, which could be used with other codebases pretty safely, this one makes an assumption about the code structure, namely that all metaclasses are declared through six.with_metaclass:
import subprocess
import re


def find_metaclass_usages(dirpath):
    command = "ack -i --noheading \"class .*six.with_metaclass.*:$\" %s" % dirpath
    output = subprocess.getoutput(command)
    # ack prints "path:line_number:matched line"; only the class name and
    # the with_metaclass arguments are used below
    ack_match_pattern = (
        r"(?P<path>.*):(?P<lineno>.*):.*class"
        r" (?P<classname>.*)\(six.with_metaclass\((?P<bases>.*)\)\):$"
    )
    ret = []
    print(output)
    for line in output.splitlines():
        m = re.match(ack_match_pattern, line)
        if m:  # skip lines that do not look like ack matches
            ret.append((m.group('classname'), m.group('bases').split(',')))
    return ret
And the results (against Django tag 1.8.a1, omitting tests):
[('Model', ['ModelBase']),
 ('EmptyQuerySet', ['InstanceCheckMeta']),
 ('ModelForm', ['ModelFormMetaclass', ' BaseModelForm']),
 ('Field', ['RenameFieldMethods', ' object']),
 ('Form', ['DeclarativeFieldsMetaclass', ' BaseForm']),
 ('Widget', ['MediaDefiningClass']),
 ('BaseMemcachedCache', ['BaseMemcachedCacheMethods', ' BaseCache']),
 ('FormMixin', ['FormMixinBase', ' ContextMixin']),
 ('BaseModelAdmin', ['forms.MediaDefiningClass'])]
We have 10 metaclasses declared in the code and 9 usages of those (not including tests), 19 classes in total. Let’s start analyzing the results (19 usages to go):
RenameMethodsBase and its subclasses are a pretty simple case, and a good example of metaclass usage.
Here’s the whole definition of BaseMemcachedCacheMethods:
class BaseMemcachedCacheMethods(RenameMethodsBase):
    renamed_methods = (
        ('_get_memcache_timeout', 'get_backend_timeout', RemovedInDjango19Warning),
    )
The data structure is self-explanatory: the purpose of this metaclass is to raise warnings when users define or call old method names in an API that changes between releases. That’s exactly what RenameMethodsBase does: it goes through all bases in the MRO, warns if the old method is defined, adds it if it’s not defined (wrapped so that every call also issues a warning), and that’s basically it. RenameFieldMethods with Field is just another instance of this, with different methods renamed; the Field class does not contain any additional metaclass magic beyond the deprecation warning mechanism.
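The renaming idea can be sketched in a few lines. This is a simplified stand-in written in Python 3 syntax, not Django’s RenameMethodsBase: it only looks at the class being created rather than the whole MRO, it uses the built-in DeprecationWarning in place of RemovedInDjango19Warning, and the names RenameMethods, CacheMethods and Cache are made up for the example.

```python
import warnings


def _warn_and_call(method, old_name, new_name, warning_class):
    """Wrap method so that every call through the old name issues a warning."""
    def wrapper(*args, **kwargs):
        warnings.warn(
            "%s is deprecated, use %s instead" % (old_name, new_name),
            warning_class, stacklevel=2)
        return method(*args, **kwargs)
    return wrapper


class RenameMethods(type):
    # (old_name, new_name, warning_class) triples, set by sub-metaclasses
    renamed_methods = ()

    def __new__(mcs, name, bases, attrs):
        new_class = super().__new__(mcs, name, bases, attrs)
        for old_name, new_name, warning_class in mcs.renamed_methods:
            old_method = attrs.get(old_name)
            new_method = attrs.get(new_name)
            if old_method is not None and new_method is None:
                # The class still defines the old name: warn right away,
                # expose the implementation under the new name as well.
                warnings.warn(
                    "%s defines deprecated method %s" % (name, old_name),
                    warning_class, stacklevel=2)
                setattr(new_class, new_name, old_method)
                setattr(new_class, old_name, _warn_and_call(
                    old_method, old_name, new_name, warning_class))
            elif new_method is not None and old_method is None:
                # Only the new name exists: add the old one as a warning alias.
                setattr(new_class, old_name, _warn_and_call(
                    new_method, old_name, new_name, warning_class))
        return new_class


class CacheMethods(RenameMethods):
    renamed_methods = (
        ('_get_memcache_timeout', 'get_backend_timeout', DeprecationWarning),
    )


class Cache(metaclass=CacheMethods):
    def get_backend_timeout(self):
        return 300


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    timeout = Cache()._get_memcache_timeout()  # old name still works
```

Calling the old name returns the same value as the new one, and the call is recorded in caught as a DeprecationWarning.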
14 usages to go.
InstanceCheckMeta is a pretty short one, so I can paste the whole source:
class InstanceCheckMeta(type):
    def __instancecheck__(self, instance):
        return instance.query.is_empty()


class EmptyQuerySet(six.with_metaclass(InstanceCheckMeta)):
    """
    Marker class usable for checking if a queryset is empty by .none():
        isinstance(qs.none(), EmptyQuerySet) -> True
    """

    def __init__(self, *args, **kwargs):
        raise TypeError("EmptyQuerySet can't be instantiated")
The rationale for this is to allow you to write isinstance(some_query_set, EmptyQuerySet), and this is the way to redefine isinstance as described in PEP 3119. This must be done through metaclasses, as isinstance checks the class of its second argument for __instancecheck__.
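A small standalone sketch of the same trick, in Python 3 syntax and without Django: the Basket, EmptyBasket and NotAMarker classes are hypothetical. It also shows why the hook has to live on the metaclass: isinstance looks the method up on the type of its second argument, so defining it on the class itself has no effect.

```python
class EmptyCheckMeta(type):
    def __instancecheck__(cls, instance):
        # Invoked for isinstance(instance, cls) when cls uses this metaclass.
        return getattr(instance, "items", None) == []


class EmptyBasket(metaclass=EmptyCheckMeta):
    """Marker class, never instantiated, only used with isinstance."""


class Basket:
    def __init__(self, items):
        self.items = list(items)


class NotAMarker:
    # Defined on the class rather than on its metaclass, so isinstance
    # never consults it.
    @classmethod
    def __instancecheck__(cls, instance):
        return True


empty_is_marker = isinstance(Basket([]), EmptyBasket)
full_is_marker = isinstance(Basket(["apple"]), EmptyBasket)
class_level_hook_used = isinstance(Basket([]), NotAMarker)
```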
12 usages to go
MediaDefiningClass is responsible for associating assets with forms and widgets, as described in the Django docs. If you read the documentation for this, it seems pretty odd that defining a class inside a Widget could have any special effect. That’s where the metaclass steps in: it scans the class attributes for a Media attribute and assigns a proper media property as a result.
Widget and BaseModelAdmin use the metaclass directly. It’s also used in forms (as they also might require additional media assets for rendering); they have some additional meta behavior though, which will get covered next.
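Here is a rough sketch of how an inner class can end up as a property. It is a simplification under stated assumptions: Django builds a dedicated Media object and merges it with the media of parent classes, while this version just returns a dict, and the MediaDefining metaclass and the widget classes are named for illustration only.

```python
def _media_property(cls):
    """Build a property that reads the inner Media class of cls, if any."""
    def _media(self):
        definition = getattr(cls, "Media", None)
        return {
            "css": tuple(getattr(definition, "css", ())),
            "js": tuple(getattr(definition, "js", ())),
        }
    return property(_media)


class MediaDefining(type):
    def __new__(mcs, name, bases, attrs):
        new_class = super().__new__(mcs, name, bases, attrs)
        # Respect an explicitly written media attribute, otherwise generate one.
        if "media" not in attrs:
            new_class.media = _media_property(new_class)
        return new_class


class Widget(metaclass=MediaDefining):
    pass


class CalendarWidget(Widget):
    class Media:
        js = ("calendar.js",)


calendar_media = CalendarWidget().media
plain_media = Widget().media
```

The subclass author only writes the inner Media class; the property appears because the metaclass runs at class creation time.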
9 usages to go
The purpose of DeclarativeFieldsMetaclass and ModelFormMetaclass is to gather all class attributes which are Field instances (if you ever created a Django form, you are probably familiar with the declarative DSL used there) into the declared_fields and base_fields attributes. The logic for gathering differs for ModelForm (in model forms, the field information is also gathered from the inner class Meta, in a manner similar to media assets), but it’s still a pretty basic attribute traversal.
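The gathering step can be sketched as follows. This is an illustration rather than Django’s code: Field here is a bare placeholder class, DeclarativeFields and the two form classes are hypothetical, and the sketch relies on class namespaces preserving declaration order (Python 3.7+).

```python
class Field:
    """Placeholder for a form field; only carries a label."""
    def __init__(self, label=""):
        self.label = label


class DeclarativeFields(type):
    def __new__(mcs, name, bases, attrs):
        # Collect Field instances from the class body, in declaration order,
        # and remove them from the namespace.
        declared = [(key, value) for key, value in attrs.items()
                    if isinstance(value, Field)]
        for key, _ in declared:
            attrs.pop(key)
        new_class = super().__new__(mcs, name, bases, attrs)

        # Start from the fields of the bases (base-most first), then add ours.
        fields = {}
        for base in reversed(new_class.__mro__[1:]):
            fields.update(getattr(base, "declared_fields", {}))
        fields.update(declared)

        new_class.declared_fields = fields
        new_class.base_fields = fields
        return new_class


class ContactForm(metaclass=DeclarativeFields):
    name = Field("Your name")
    email = Field("E-mail")


class ExtendedForm(ContactForm):
    phone = Field("Phone")


contact_fields = list(ContactForm.declared_fields)
extended_fields = list(ExtendedForm.declared_fields)
```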
It’s interesting that the Form and ModelForm classes are actually empty:
class ModelForm(six.with_metaclass(ModelFormMetaclass, BaseModelForm)):
    pass
They exist so that application developers do not have to declare metaclass usage themselves. The logic related to fields is handled by plain classes, BaseForm and
BaseModelForm, which have no idea how the fields were created in the first place, which is a nice example of separation of
concerns. You could also try to create your own mechanism of populating fields, or subclass from
5 usages to go
SubfieldBase was previously used in creating custom fields; it is now deprecated, hence no usages in the Django codebase (except for tests). It might also be the only metaclass programmers are instructed to use directly, as described here.
4 usages to go
Turns out FormMixinBase is another deprecation-warning use of a metaclass; this time it’s not a method rename but a method signature that is being deprecated. To be precise, the get_form method of FormMixin should have a default value for the form_class argument; if it doesn’t have one, a method with the required signature is generated. The pattern of providing metaclasses for deprecation
warnings is a clear one though; maybe a library of metaclasses for handling that could be extracted from Django or other libs.
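A signature check of this kind could look roughly like the sketch below. It is an approximation under assumptions, not the FormMixinBase source: it uses inspect.signature from Python 3, the built-in DeprecationWarning, and made-up class names (GetFormSignatureCheck, LegacyView); the FormMixin here is a stripped-down stand-in for the real one.

```python
import inspect
import warnings


def _with_default_form_class(old_get_form):
    """Generate a get_form whose form_class argument is optional."""
    def get_form(self, form_class=None):
        if form_class is None:
            form_class = self.get_form_class()
        return old_get_form(self, form_class)
    return get_form


class GetFormSignatureCheck(type):
    def __new__(mcs, name, bases, attrs):
        get_form = attrs.get("get_form")
        if get_form is not None:
            param = inspect.signature(get_form).parameters.get("form_class")
            if param is not None and param.default is inspect.Parameter.empty:
                # Old-style signature: warn and wrap it in a compatible one.
                warnings.warn(
                    "%s.get_form needs a default value for form_class" % name,
                    DeprecationWarning, stacklevel=2)
                attrs["get_form"] = _with_default_form_class(get_form)
        return super().__new__(mcs, name, bases, attrs)


class FormMixin(metaclass=GetFormSignatureCheck):
    form_class = dict  # stand-in for a real form class

    def get_form_class(self):
        return self.form_class

    def get_form(self, form_class=None):
        if form_class is None:
            form_class = self.get_form_class()
        return form_class()


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")

    class LegacyView(FormMixin):
        def get_form(self, form_class):  # no default: the deprecated form
            return form_class()

legacy_form = LegacyView().get_form()  # works thanks to the generated wrapper
```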
2 usages to go
Finally, last but not least: the ModelBase metaclass. The __new__ method alone is about 250 lines of code. Django models need to handle not only things seen previously, like declarative fields (the code from DeclarativeFieldsMetaclass is not reused) and the Meta inner class, but also some more complex usages like abstract models, multi-table inheritance, proxy models and possibly a plethora of other usages I do not even know I don’t know about. The mechanics are similar to previously described usages though. Like in forms, the distinction between gathering meta information and using it is also here, although the Model class does directly specify its metaclass. A slightly new technique is generating unique types for each model, examples being the per-model DoesNotExist and MultipleObjectsReturned exceptions. Model managers are also set up in the metaclass. The most important field being set is _meta which, as the name suggests, contains all meta information about the model, and as of Django 1.8 will be officially supported; you can read about the information retrievable from the _meta field here.
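Two of those techniques, the per-model exception types and the _meta attribute built from the inner Meta class, fit in a short sketch. Everything here is a heavily reduced illustration: Options only keeps three hypothetical settings, ModelBaseSketch is not ModelBase, and fields, managers and inheritance handling are left out entirely.

```python
class ObjectDoesNotExist(Exception):
    """Common base class for the generated per-model exceptions."""


class Options:
    """Tiny stand-in for the object stored in _meta."""
    def __init__(self, meta, model_name):
        self.model_name = model_name
        self.abstract = getattr(meta, "abstract", False)
        self.ordering = list(getattr(meta, "ordering", []))


class ModelBaseSketch(type):
    def __new__(mcs, name, bases, attrs):
        # The inner Meta class is consumed here and replaced by _meta.
        meta = attrs.pop("Meta", None)
        new_class = super().__new__(mcs, name, bases, attrs)
        new_class._meta = Options(meta, name.lower())
        # A brand new exception type per model, so callers can catch
        # Author.DoesNotExist without also catching Book.DoesNotExist.
        new_class.DoesNotExist = type(
            "DoesNotExist", (ObjectDoesNotExist,), {})
        return new_class


class Author(metaclass=ModelBaseSketch):
    class Meta:
        ordering = ["name"]


class Book(metaclass=ModelBaseSketch):
    pass


try:
    raise Author.DoesNotExist("no such author")
except Book.DoesNotExist:
    caught_by = "book"
except Author.DoesNotExist:
    caught_by = "author"
```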
We’ve made it through all the metaclasses in the Django codebase.
To sum up, there are 2 basic usages of metaclasses in Django. The first is generally known: the DSL for specifying Forms and Models. This is what most metaclass guides tell us: metaclasses are a way of creating DSLs and making our code more declarative. The second usage is protecting the framework user from making mistakes. Deprecation warnings are useful when upgrading between Django versions, or when you are simply not aware of API changes, which in a dynamically typed language can cause more problems if breaking changes are not noticed by library users. Though it is a lesser known use, I think it is an interesting idea.
One more thought: I started writing this article with little knowledge of metaclasses (I knew they existed), and as it turns out, understanding them by simply reading the code is not incredibly difficult (maybe except for ModelBase, but metaclasses or not, 250 LOC methods are just difficult to comprehend). So if you encounter metaclasses in code you have to hack on, don’t turn your back just because you’ve heard they’re complex; it’s probably easier than you think.
Hope you enjoyed the read.