If it is a duck - it better quack like a duck.

As you probably know, Python is a dynamically typed language. That is a great feature of the language: it allows rapid prototyping and easier testing, and it keeps the codebase smaller and thus more comprehensible. This is only one side of the story, though. As the project grows, the permissive nature of a dynamic type system might become a problem: it’s not always obvious which types are passed and expected, and it takes longer to produce working code, as each programmer has to deduce the types by themselves. Meanwhile, in the statically typed language universe, type inference features have become prevalent: C# has var, C++ has auto, Scala has val (Java programmers are still wondering where unsigned int is, so type inference is currently out of their scope). This makes declaring types less burdensome for coders while preserving all the other benefits.

Dynamically typed languages have not been lying dormant, though. Tool authors and language designers are well aware of the problem, and there are solutions that can help, especially with large codebases. In JavaScript, for instance, TypeScript and Flow are tools that can seamlessly add type checks to code (I call TypeScript seamless because it is a superset of JavaScript, though of course switching from JS to TS is not a completely smooth transition). But what does Python have to offer? In this article I am going to explore existing solutions for type inference in Python (at least those I am aware of).

PEP 48[234]

But before we jump into the currently available solutions, let’s see what’s coming in the future. At the end of 2014 and the beginning of 2015, a set of three new PEPs was drafted:

482 – Literature Overview for Type Hints
483 – The Theory of Type Hints
484 – Type Hints

Those three PEPs lay the theoretical foundation and an implementation proposal for a type hinting mechanism in Python. You might wonder whether they have any chance of being accepted. You needn’t worry: Guido van Rossum is a co-author of the documents, so it seems likely that these features will be added. Gossip on the interwebs indicates that they are desired for Python 3.5, but with its feature freeze set for May 2015, this is not 100% certain. Another important point is that the PEPs only propose a very general mechanism for type annotations; tooling support would need to be implemented separately (which is understandable, as there’s no real need to include such tools in the standard library from the very beginning). As for the mechanics of the proposal, there is no code in CPython’s repository at the moment, but from what can be read in the PEPs, it will be based on function annotations (available since Python 3.0, released in 2008) and, optionally, comments (for variable type annotations). This approach does not require any additional syntax, so using it with Python 3 versions older than 3.5 seems feasible. The mechanism is designed for development tools (IDEs and linters) and to increase code readability, without any additional cost at runtime (although the type information could be used by optimizing tools if deemed appropriate).
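A minimal sketch of the two mechanisms as they work in any Python 3 interpreter (the names double and counter are made up for illustration): annotations are stored as plain metadata on the function object, the # type: comment is ignored by the interpreter, and nothing is enforced at runtime.

```python
# Function annotations (Python 3.0+) record the expected types as metadata.
def double(x: int) -> int:
    return x * 2


# A type comment: the interpreter ignores it, only type checkers read it.
counter = 0  # type: int

# The annotations are available at runtime, e.g. for IDEs or linters.
print(double.__annotations__)  # {'x': <class 'int'>, 'return': <class 'int'>}

# Nothing is checked at runtime: a str argument is accepted and simply
# repeated, even though the annotation says int.
print(double("ab"))  # abab
```

This is why the proposal adds no runtime cost: the interpreter only stores the annotations, and any checking is left to external tools.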

This is the (possibly not so distant) future, but what about the current state of affairs? PEP 482 lists a set of currently available approaches to type inference in Python and other languages. I’ll go through the most interesting ones, beginning with mypy, which is the foundation for the PEPs mentioned above.

Mypy

Let’s start with a simple example to show mypy’s capabilities:


def add(x):  
    return x + 1

if __name__ == '__main__':  
    print(add('1'))

This code is obviously incorrect, though the error will only occur at runtime. Can mypy help? Of course, that’s what it is for. We need to annotate our code, though:


def add(x: int) -> int:
    return x + 1

if __name__ == '__main__':  
    print(add('1'))

This is still syntactically correct Python 3: it will still execute and fail with the same error, but now we can analyze it with mypy and check for errors before execution (to be precise, we could have analyzed the previous version as well, but it would not have yielded any errors). Let’s see what we get:


tests/str_plus_int_mypy.py, line 5: Argument 1 to "add" has incompatible type "str"; expected "int"

This time, the obvious mistake in the call to add is reported as an error. That’s great: since unannotated code produces no errors, you can introduce type checks in an iterative fashion as the complexity of the project increases. This is an overly simple example, though, so let’s move on to more complex ones, starting with typos:


class TypoClass(object):  
    def __init(self, a, b):
        self.a = a
        self.b = b


def main():  
    a = TypoClass(1, 2)
    return a


if __name__ == '__main__':  
    result = main # type:TypoClass

The errors (I hope you spotted them yourself) are the missing parentheses in the call to main and the misspelled __init method name (it should be __init__). Mypy detects both correctly:


test_mypy3.py: In function "main":  
test_mypy3.py, line 8: Too many arguments for "TypoClass"  
test_mypy3.py: At top level:  
test_mypy3.py, line 13: Incompatible types in assignment (expression has type Function[[], Any], variable has type "TypoClass")

Corrected version:


class TypoClass(object):  
    def __init__(self, a, b):
        self.a = a
        self.b = b


def main() -> TypoClass:  
    a = TypoClass(1, 2)
    return a


if __name__ == '__main__':  
    result = main() # type:TypoClass

Notice the -> TypoClass annotation on the main() function: without it, we could annotate result with any type and no errors would be generated. That is because mypy does not try to guess the return type by analyzing the code, it only uses annotations. It’s important to remember that this is not a magic tool that finds all possible type errors, just an aid.
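A small sketch of the difference, using hypothetical names (Widget, make_untyped, make_typed). The comments describe how mypy treats each assignment rather than quoting captured tool output: the result of the unannotated function is Any, so a wrong type comment slips through, while the annotated one lets the checker catch it. At runtime both calls simply return a Widget.

```python
class Widget(object):
    def __init__(self, size):
        self.size = size


def make_untyped():
    # No return annotation: callers see the result as Any.
    return Widget(1)


def make_typed() -> Widget:
    # Return annotation: callers see the result as a Widget.
    return Widget(2)


if __name__ == '__main__':
    # Wrong type comment, but Any is compatible with str: no error reported.
    a = make_untyped()  # type: str
    # Wrong type comment, and Widget is not a str: mypy flags this assignment.
    b = make_typed()  # type: str
    print(type(a).__name__, type(b).__name__)
```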

Let’s go with one more example, this time using special type definitions provided by mypy:


from typing import NamedTuple

TypeA = NamedTuple('TypeA', [('a', int), ('b', str)])


def main():  
    x = TypeA(1, '1')
    y = TypeA(2, 2)

if __name__ == '__main__':  
    main()

We imported NamedTuple from the typing module provided by mypy, so that we could annotate the tuple fields. You might expect that analyzing this code will show us an error (the second field of y is not a string). Unfortunately, no error is produced. Mypy is still very much a work in progress, so you should not expect everything to work.
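For comparison, here is a sketch of the same definition in the class-based syntax that typing.NamedTuple supports since Python 3.6. Field types are still not enforced when the code runs, so TypeA(2, 2) builds a tuple without complaint; current mypy releases do, however, report the mismatched second argument.

```python
from typing import NamedTuple


class TypeA(NamedTuple):
    # The field annotations double as the tuple's field declarations.
    a: int
    b: str


def main():
    x = TypeA(1, '1')
    # Not checked at runtime: the int 2 ends up in the str field b.
    y = TypeA(2, 2)
    return x, y


if __name__ == '__main__':
    print(main())
```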

Annotating existing code is not always possible: you are probably using some 3rd party libraries that do not have any annotations in their source code. Fortunately, there’s an answer for that: stub files. You can specify the interface of a given library using the same type and method names as the original, just with type annotations, put it in a place mypy will recognize, and you’re good to go.
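As an illustration, a stub for a hypothetical, unannotated third-party module called fancylib might look like this, saved as fancylib.pyi in a directory listed in the MYPYPATH environment variable. The module, its functions and its class are invented for this sketch, and the syntax is the one current mypy accepts; a stub contains only signatures, with ... standing in for the bodies.

```python
# fancylib.pyi: type stub for a hypothetical, unannotated library "fancylib".
# Only the interface is described; every body is replaced by "...".
from typing import List, Optional


def parse(text: str) -> int: ...


def split_words(text: str, limit: Optional[int] = ...) -> List[str]: ...


class Client:
    timeout: float

    def __init__(self, host: str, port: int = ...) -> None: ...

    def fetch(self, path: str) -> bytes: ...
```

The real fancylib stays untouched; mypy reads the .pyi file instead of the library’s source when checking calls into it.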

Overall, mypy is a promising tool with a nice set of features (not all of them described here, but the documentation gives a thorough overview). There are some drawbacks to using it, though:
– It is in an early development phase. This means there can be bugs, performance issues (there’s no incremental analysis at the moment) and API changes, and not every Python feature is supported (which is understandable: if you are creating metaclasses via string manipulation and eval statements, you really shouldn’t blame any linter for not detecting bugs).
– It only supports Python 3. Even though Python 3 is not really new, 2.7 is still in widespread use. The product road map mentions 2.7 support in 2015, although since mypy is the basis for the CPython PEPs, it is unclear how it will continue to evolve. I am sure it will survive in one form or another, so taking some time to get to know it is certainly not a waste.
– The stubs for other libraries are a work in progress. Currently the stdlib is mostly covered, but 3rd party libs are not. Other tools have more extensive stub libraries, and there is some effort to port those stubs to the mypy format, but at this point you might end up writing the stubs yourself or living without type checks (unannotated code is assumed to work with any type).
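To make the last point concrete, here is a minimal sketch of a partially annotated module (all names are hypothetical). The comments describe how current mypy releases behave by default, which may differ in details from the early versions discussed here: unannotated functions take and return Any and their bodies are not checked unless you pass --check-untyped-defs, while --disallow-untyped-defs turns missing annotations into errors. This is what allows annotating a codebase one function at a time.

```python
def add(x: int) -> int:
    # Annotated: mypy checks this body and every call to add().
    return x + 1


def legacy_concat(a, b):
    # Unannotated: a, b and the return value are treated as Any, and by
    # default mypy does not type-check this body at all.
    return a + b


def total(values: list) -> int:
    # New, annotated code can still call legacy code: the Any returned by
    # legacy_concat() is accepted where an int is expected.
    result = 0
    for v in values:
        result = legacy_concat(result, add(v))
    return result


if __name__ == '__main__':
    print(total([1, 2, 3]))  # prints 9
```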

PyCharm

PyCharm is a great IDE for Python, available in both commercial and free flavors. It’s able to run inspections on your code, including type checking. The types are deduced mostly from docstrings (with a special syntax), but function annotations are also supported, and there is a repository of stubs on GitHub covering parts of the stdlib and 3rd party libraries. In fact, PyCharm sometimes does a better job than mypy, and as mypy and Python 3.5 type hinting progress, I am pretty sure they will be integrated into the product. You might think the downside is that an IDE does not run on your continuous integration server, but this is not the case. Even the Community Edition bundles a script that runs the inspections on a specified project without any user interaction; the only part left for developers is interpreting the results (gathered in a straightforward XML file). One downside is that the free Community Edition can only run one PyCharm process at a time, so multiple parallel inspections on CI might create a bottleneck (or you can buy the commercial version). And of course, this is a proprietary solution, so if there are bugs, or it does not meet your needs, your options are limited.
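A sketch of the docstring style, using a made-up function: PyCharm understands reStructuredText (Sphinx) field lists such as :type and :rtype:, and because the hints live in the docstring, the code stays valid on Python 2 as well.

```python
def scale(values, factor):
    """Multiply every element of values by factor.

    :param values: numbers to scale
    :type values: list
    :param factor: multiplier applied to each element
    :type factor: float
    :rtype: list
    """
    # The type information above is only read by tools; the interpreter
    # treats the docstring as ordinary text.
    return [v * factor for v in values]
```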

Other mentions

There’s a plethora of new and old tools available for Python source code analysis. Covering all of them is too big a task for a single article, but I do feel these are worth at least a mention:

Jedi – this is mainly an auto-completion library, but it comes with type hinting capabilities (type inference, or extraction from the most popular docstring conventions, similarly to PyCharm). I haven’t played with it, but it is relatively popular on GitHub and has integration components available for many popular editors. It actually does have a linter mode (invoked by running python -m jedi linter) which, instead of autocompleting a single location in code, analyzes the code as a whole and detects mismatches. It might be a little slow for large repositories, and it currently does not infer any information from annotations, but it is worth checking out if you’re unsatisfied with the previous options.
pysonar2 – this is both a type inferencer and an indexer, written in Java and relatively popular on GitHub (though development seems to have stopped about a year ago).
pylint, pyflakes, pychecker, frosted – these are generic linters, mostly limited in the scope of the inspections they perform. Most of them have been around for quite some time, so there are plenty of resources already available on how to integrate them into your projects.

Plus, there’s a whole set of other tools mentioned in PEP 482, the mypy FAQ, the PyCharm documentation, etc. (some of them are aimed at increasing performance, which is often achievable when the types a program operates on are known).

TypeScript and Flow

Those two solutions come from the JavaScript world, which is not only dynamically but also loosely typed, so the need to detect bugs early is probably even greater. TypeScript takes a similar approach to typing as mypy, though it is a separate language (a superset of JavaScript) rather than JavaScript itself. One thing worth noticing is its approach to adding type information to existing untyped libs: a GitHub repo called DefinitelyTyped already hosts an impressive collection of type definitions for popular JS libraries, and I think we will see an equivalent for mypy/Python 3.5 soon. The second tool is Facebook’s Flow, which (apart from the type inference part) has a client-server architecture aimed at increasing scalability in larger source code repositories.

Wrapup

Type hinting is a welcome addition to the language. It combines dynamic and static typing in an easily deployable way that does not require extensive preparation or refactoring, and with the blessing of the BDFL and other core devs, it might become a great solution for increasing the maintainability of large projects. It’s very probable that more information on the upcoming changes will be shared at PyCon US 2015, April 8th-16th 2015 (in Montreal, Canada, as the name implies). Until then, and until the Python 3.5 release, there are plenty of options for calming your type anxiety, so look them up and introduce them into your projects as needed.

Hope you enjoyed the read.
