High Performance Python (from Training at EuroPython 2011) by Ian Ozsvald - HTML preview

CHAPTER ELEVEN

PYPY

PyPy is a new Just In Time compiler for the Python programming language. It runs on Windows, Mac and Linux and as of the middle of 2011 it runs Python 2.7. Generally your code will just run in PyPy and often it'll run faster (I've seen reports of 2-10x speed-ups). Sometimes a small amount of work is required to correct code that runs in CPython but shows errors in PyPy. Generally this is because the programmer has (probably unwittingly!) used shortcuts that happen to work in CPython but aren't actually guaranteed by the Python specification.
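The text above doesn't say which shortcuts are meant. One commonly cited example (my assumption, not taken from the tutorial) is relying on CPython's reference counting to close a file as soon as the last reference to it disappears. PyPy uses a different garbage collector, so an unclosed file may stay open (and unflushed) for longer. A minimal sketch of the portable form follows; the function names and the temporary file are purely illustrative:

```python
import os
import tempfile


def write_lines_explicit(path, lines):
    """Write lines to path, closing the file deterministically."""
    # The 'with' block closes (and flushes) the file when it exits,
    # regardless of how the interpreter's garbage collector works.
    with open(path, "w") as f:
        for line in lines:
            f.write(line + "\n")


def read_lines(path):
    """Read the lines back, again closing the file explicitly."""
    with open(path) as f:
        return [line.rstrip("\n") for line in f]


# Hypothetical demo file in a fresh temporary directory.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo.txt")
write_lines_explicit(demo_path, ["a", "b"])
result = read_lines(demo_path)
```

The shortcut to avoid is the one-liner open(path, "w").write(data), which leaves closing the file to the garbage collector.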

Our example runs without modification in PyPy. I've used both PyPy 1.5 and the latest HEAD from the nightly builds (taken on June 20th for my Mac). The latest nightly build is a bit faster than PyPy 1.5, so I've used the timings from the nightly build here.

If you aren't using a C library like numpy then you should try PyPy - it might just make your code run several times faster. At EuroPython 2011 I saw a Sobel Edge Detection demo that runs in pure Python - with PyPy it runs 450x faster than CPython! The PyPy team are committed to making PyPy faster and more stable. Since it supports Python 2.7 (which is the end of the Python 2.x line) you can expect it to keep getting faster for a while yet.

If you use a C extension like numpy then expect problems - some C libraries are integrated, many aren't, and some like numpy will probably require a re-write (which will be a multi-month undertaking). During 2011 at least it looks as though full numpy integration will not happen. Note that you can do import numpy in PyPy and you'll get a minimal array interface that behaves in a numpy-like fashion, but for now it has very few functions and only supports double precision arithmetic.

On my MacBook, running pypy pure_python.py 1000 1000 takes 5.9 seconds and running pypy pure_python_2.py 1000 1000 takes 4.9 seconds. Note that there's no graphical output - PIL is supported in PyPy but numpy isn't, and I've used numpy for the list-to-RGB-array conversion (update: see the last section of this document for a fix that removes numpy and allows PIL to work with PyPy!).

As an additional test (not shown in the graphs) I ran pypy shedskin2.py 1000 1000 which runs the expanded math version of the shedskin variant below (this replaces complex numbers with floats and expands abs to avoid the square root). The shedskin2.py result takes 3.2 seconds (which is still much slower than the 0.4s version compiled using shedskin).
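To make the "expanded math" idea concrete, here is a minimal sketch (not the actual shedskin2.py source; the function names are mine) of a Mandelbrot escape test written twice: once with complex numbers and abs, and once with separate real and imaginary floats. The second version avoids the square root hidden inside abs by comparing the squared magnitude against 4.0 instead of comparing the magnitude against 2.0:

```python
def escape_iterations_complex(z, c, maxiter):
    """Reference form: complex arithmetic and abs()."""
    for i in range(maxiter):
        z = z * z + c
        if abs(z) > 2.0:  # abs() of a complex number needs a square root
            return i
    return maxiter


def escape_iterations_expanded(zr, zi, cr, ci, maxiter):
    """Same recurrence with the complex maths expanded into floats."""
    for i in range(maxiter):
        # (zr + zi*j)**2 = (zr*zr - zi*zi) + (2*zr*zi)*j
        zr, zi = zr * zr - zi * zi + cr, 2.0 * zr * zi + ci
        # |z| > 2 is equivalent to |z|**2 > 4, so no square root is needed
        if zr * zr + zi * zi > 4.0:
            return i
    return maxiter


# Compare the two forms on a few sample points.
samples = [0j, 1 + 1j, -1 + 0j, 0.5 + 0j]
complex_counts = [escape_iterations_complex(0j, c, 50) for c in samples]
expanded_counts = [
    escape_iterations_expanded(0.0, 0.0, c.real, c.imag, 50) for c in samples
]
```

Both functions should return the same iteration counts away from the escape boundary; the expanded form simply trades readability for fewer operations per iteration.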

11.1 numpy

Work has started to add a new numpy module to PyPy.  Currently (July 2011) it only supports arrays of double precision numbers and offers very few vectorised functions:

Python 2.7.1 (65b1ed60d7da, Jul 12 2011, 02:00:13)
[PyPy 1.5.0-alpha0 with GCC 4.0.1] on darwin
Type "help", "copyright", "credits" or "license" for more information.
And now for something completely different: ``2008 will be the year of the desktop on #pypy''
>>>> import numpy
>>>> dir(numpy)
['__doc__', '__file__', '__name__', '__package__', 'abs', 'absolute', 'array', 'average',
'copysign', 'empty', 'exp', 'floor', 'maximum', 'mean', 'minimum', 'negative', 'ones',
'reciprocal', 'sign', 'zeros']
>>>> a = numpy.array(range(10))
>>>> [x for x in a] # print the contents of a
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
>>>>
>>>> [x for x in a+3] # perform a vectorised addition on a
[3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]

It would be possible to rewrite the Mandelbrot example using these functions by using non-complex arithmetic (see e.g. the shedskin2.py example later). This is a challenge I’ll leave to the reader.

I strongly urge you to join the PyPy mailing list and talk about your needs for the new numpy library. PyPy shows great promise for high performance Python with little effort, and having access to the wide range of algorithms in the existing numpy library would be a massive boon to the community.