High Performance Python (from Training at EuroPython 2011) by Ian Ozsvald - HTML preview

CHAPTER THIRTEEN

CYTHON

Cython lets us annotate our functions so they can be compiled to C. It takes a little bit of work (30-60 minutes to get started) and then typically gives us a nice speed-up. If you’re new to Cython then the official tutorial is very helpful: http://docs.cython.org/src/userguide/tutorial.html

To start this example I’ll assume you’ve moved pure_python_2.py into a new directory (e.g. cython_pure_python\cython_pure_python.py). We’ll start a new module called calculate_z.py and move the calculate_z function into it. In cython_pure_python.py you’ll have to import calculate_z and replace the reference to calculate_z(...) with calculate_z.calculate_z(...).

Verify that the above runs. The contents of your calculate_z.py will look like:

# calculate_z.py
# based on calculate_z_serial_purepython
def calculate_z(q, maxiter, z):
    output = [0] * len(q)
    for i in range(len(q)):
        zi = z[i]
        qi = q[i]
        for iteration in range(maxiter):
            zi = zi * zi + qi
            if abs(zi) > 2.0:
                output[i] = iteration
                break
    return output

Now rename calculate_z.py to calculate_z.pyx. Cython uses the .pyx extension (inherited from the older Pyrex project) to indicate a file that it will compile to C.

Now add a new setup.py with the following contents:

# setup.py
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

# for notes on compiler flags see:
# http://docs.python.org/install/index.html

setup(
    cmdclass={'build_ext': build_ext},
    ext_modules=[Extension("calculate_z", ["calculate_z.pyx"])]
)

Next run:

>> python setup.py build_ext --inplace

This runs our setup.py script, calling the build_ext command. Our new module is built in-place in our directory; you should end up with a new calculate_z.so there (calculate_z.pyd on Windows).
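
If a leftover calculate_z.py is still in the directory, it isn't always obvious which file Python imports. One way to check is to look at the file the module was loaded from. The sketch below is only an illustration and not part of the original training material: the helper name is_compiled_extension is hypothetical, and json is used purely as a stand-in module so the snippet runs without a build.

```python
import importlib.machinery
import json  # stand-in pure-Python module so the sketch runs without a build


def is_compiled_extension(module):
    """Return True if `module` was loaded from a compiled extension file.

    Hypothetical helper for illustration only.
    """
    path = getattr(module, "__file__", None)  # built-in modules have no __file__
    if path is None:
        return False
    # EXTENSION_SUFFIXES lists this platform's suffixes, e.g. ".so" or ".pyd"
    return any(path.endswith(suffix)
               for suffix in importlib.machinery.EXTENSION_SUFFIXES)


# json is implemented in plain Python files, so this is not an extension
print(is_compiled_extension(json))
```

After a successful build you could pass the imported calculate_z module to the same helper; if it reports that the module is not a compiled extension, Python is probably still picking up a plain .py file.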

Run the new code using python cython_pure_python.py 1000 1000 and confirm that the result is calculated more quickly (you may find that the improvement is very minor at this point!).
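
To make the before/after comparison concrete, here is a minimal timing sketch. It is an assumption-laden illustration rather than the original cython_pure_python.py: the helper time_calculate_z and the coordinate range are hypothetical, and a pure-Python calculate_z is defined inline as a stand-in so the snippet runs without a build.

```python
import time


def calculate_z(q, maxiter, z):
    """Pure-Python stand-in; after building, import the compiled version instead."""
    output = [0] * len(q)
    for i in range(len(q)):
        zi = z[i]
        qi = q[i]
        for iteration in range(maxiter):
            zi = zi * zi + qi
            if abs(zi) > 2.0:
                output[i] = iteration
                break
    return output


def time_calculate_z(width, maxiter):
    """Build a width x width grid of complex points and time a single call."""
    # Hypothetical coordinate range; the original script defines its own.
    xs = [-2.0 + 3.0 * ix / width for ix in range(width)]
    ys = [-1.5 + 3.0 * iy / width for iy in range(width)]
    q = [complex(x, y) for y in ys for x in xs]
    z = [0j] * len(q)
    start = time.perf_counter()
    output = calculate_z(q, maxiter, z)
    elapsed = time.perf_counter() - start
    return output, elapsed


output, elapsed = time_calculate_z(50, 100)
# The sum acts as a cheap checksum: it should not change between versions
print("checksum:", sum(output))
print("elapsed seconds:", round(elapsed, 4))
```

Run it once with the pure-Python function and once with the compiled one; the checksum should stay the same while the elapsed time drops.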

You can take a look to see how well the slower Python calls are being replaced with faster Cython calls using:

>> cython -a calculate_z.pyx

This will generate a new .html file. Open it in your browser and you’ll see something like:


Figure 13.1: Result of “cython -a calculate_z.pyx” in web browser

Each time you add a type annotation Cython has the option to improve the resulting code. When it does so successfully you’ll see the dark yellow lines turn lighter and eventually they’ll turn white (showing that no further improvement is possible).

If you’re curious, double-click a line of yellow code and it’ll expand to show you the Python C API calls that it is making (see Figure 13.2).

Let’s add the annotations. In the example below I’ve added type definitions. Remember to run the cython -a ... command and monitor the reduction in yellow in your web browser.

# based on calculate_z_serial_purepython
def calculate_z(list q, int maxiter, list z):
    cdef unsigned int i
    cdef int iteration
    cdef complex zi, qi # if you get errors here try 'cdef double complex zi, qi'
    cdef list output

    output = [0] * len(q)
    for i in range(len(q)):
        zi = z[i]
        qi = q[i]
        for iteration in range(maxiter):
            zi = zi * zi + qi
            if abs(zi) > 2.0:
                output[i] = iteration
                break
    return output

Figure 13.2: Double click a line to show the underlying C API calls (more calls mean more yellow)

Recompile using the setup.py line above and confirm that the result is much faster!

As you’ll see in the ShedSkin version below we can achieve the best speed-up by expanding the complicated complex object into simpler double precision floating point numbers. The underlying C compiler knows how to execute these instructions in a faster way.

Expanding complex multiplication and addition involves a little bit of algebra (see Wikipedia for details). We declare a set of intermediate variables with cdef double zx, zy, qx, qy, zx_new, zy_new, extract them from z[i] and q[i], and then replace the final abs call with the expanded test if (zx*zx + zy*zy) > 4.0. Comparing the squared magnitude against 4.0 (the square of 2.0) avoids the expensive square root that abs would otherwise perform on the sum of the squares.
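
The algebra is (zx + i zy)^2 + (qx + i qy) = (zx*zx - zy*zy + qx) + i(2*zx*zy + qy). If you want to convince yourself before touching the Cython code, a small pure-Python check like the one below compares one step of each form on random points. This is only a sketch: the function names step_complex and step_expanded are made up for illustration, and results are compared with a tolerance because rounding may differ slightly between the two forms on some platforms.

```python
import random


def step_complex(z, q):
    """One iteration using Python's built-in complex type."""
    return z * z + q


def step_expanded(zx, zy, qx, qy):
    """The same iteration written with real-valued floats only."""
    zx_new = (zx * zx - zy * zy) + qx   # real part of z*z + q
    zy_new = (zx * zy + zy * zx) + qy   # imaginary part of z*z + q
    return zx_new, zy_new


random.seed(0)  # fixed seed so the check is repeatable
max_difference = 0.0
for _ in range(1000):
    z = complex(random.uniform(-2, 2), random.uniform(-2, 2))
    q = complex(random.uniform(-2, 2), random.uniform(-2, 2))
    expected = step_complex(z, q)
    zx_new, zy_new = step_expanded(z.real, z.imag, q.real, q.imag)
    difference = abs(expected - complex(zx_new, zy_new))
    max_difference = max(max_difference, difference)

# Any difference should be at floating-point rounding level
print("largest difference:", max_difference)
```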

# calculate_z.pyx_2_bettermath
def calculate_z(list q, int maxiter, list z):
    cdef unsigned int i
    cdef int iteration
    cdef list output
    cdef double zx, zy, qx, qy, zx_new, zy_new

    output = [0] * len(q)
    for i in range(len(q)):
        zx = z[i].real # need to extract items using dot notation
        zy = z[i].imag
        qx = q[i].real
        qy = q[i].imag

        for iteration in range(maxiter):
            zx_new = (zx * zx - zy * zy) + qx
            zy_new = (zx * zy + zy * zx) + qy
            # must assign after both are computed, else zy_new would use the new zx
            zx = zx_new
            zy = zy_new
            # note - math.sqrt makes this almost twice as slow!
            #if math.sqrt(zx*zx + zy*zy) > 2.0:
            if (zx*zx + zy*zy) > 4.0:
                output[i] = iteration
                break
    return output
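
When you rewrite the inner loop like this, it’s worth checking that the new version still produces the same output as the old one. The sketch below does that with two pure-Python stand-ins for the complex and expanded versions. The function names calculate_z_complex and calculate_z_expanded are hypothetical, and the test points are my own choice: small values for which the arithmetic is exact, so the comparison doesn’t depend on rounding.

```python
def calculate_z_complex(q, maxiter, z):
    """Stand-in for the version that uses complex numbers directly."""
    output = [0] * len(q)
    for i in range(len(q)):
        zi, qi = z[i], q[i]
        for iteration in range(maxiter):
            zi = zi * zi + qi
            if abs(zi) > 2.0:
                output[i] = iteration
                break
    return output


def calculate_z_expanded(q, maxiter, z):
    """Stand-in for the version expanded into real-valued floats."""
    output = [0] * len(q)
    for i in range(len(q)):
        zx, zy = z[i].real, z[i].imag
        qx, qy = q[i].real, q[i].imag
        for iteration in range(maxiter):
            zx_new = (zx * zx - zy * zy) + qx
            zy_new = (zx * zy + zy * zx) + qy
            zx, zy = zx_new, zy_new
            if (zx * zx + zy * zy) > 4.0:
                output[i] = iteration
                break
    return output


# A handful of points: some escape quickly, some never escape
q = [0j, 1 + 0j, -1 + 0j, 1j, 2 + 0j, -2 + 0j, 0.5 + 0j, 1 + 1j]
z = [0j] * len(q)
result_complex = calculate_z_complex(q, 50, z)
result_expanded = calculate_z_expanded(q, 50, z)
print("outputs match:", result_complex == result_expanded)
```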

13.1 Compiler directives

Cython has several compiler directives that enable profiling with cProfile and can improve performance: http://wiki.cython.org/enhancements/compilerdirectives

The directives can be enabled globally, either with a comment at the top of the Cython file or by altering setup.py, or individually by decorating each function. Generally I only have a few functions in a .pyx file, so I enable the directives globally in the module using the comment syntax.
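
As a sketch of the comment syntax, the header below turns on the profile directive for a whole module. The function body is a shortened, untyped stand-in for the one in calculate_z.pyx. A directive comment is just an ordinary comment to the plain Python interpreter, so this snippet also runs uncompiled. The decorator form appears only inside a comment because it needs Cython’s own cython module.

```python
# cython: profile=True
# The directive comment above must come before any code in the file.
# The plain Python interpreter treats it as an ordinary comment, which is
# why this sketch also runs uncompiled.
#
# Per-function form, shown as a comment because it needs Cython's own
# cython module (cimport cython in a .pyx file):
#     @cython.profile(False)
#     def calculate_z(q, maxiter, z): ...


def calculate_z(q, maxiter, z):
    """Untyped stand-in for the function in calculate_z.pyx."""
    output = [0] * len(q)
    for i in range(len(q)):
        zi = z[i]
        qi = q[i]
        for iteration in range(maxiter):
            zi = zi * zi + qi
            if abs(zi) > 2.0:
                output[i] = iteration
                break
    return output


print(calculate_z([2 + 0j, 0j], 10, [0j, 0j]))
```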

profile lets you enable or disable cProfile support.  This is only useful when profiling (and adds a minor overhead). It gives you exactly the same output as running cProfile on a normal Python module.
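
For reference, this is roughly what running cProfile over the function looks like from the calling script. It’s a minimal sketch using only the standard library’s cProfile and pstats modules. The pure-Python calculate_z is a stand-in so the snippet runs without a build, and the input points are arbitrary values chosen for illustration.

```python
import cProfile
import io
import pstats


def calculate_z(q, maxiter, z):
    """Pure-Python stand-in for the compiled function."""
    output = [0] * len(q)
    for i in range(len(q)):
        zi = z[i]
        qi = q[i]
        for iteration in range(maxiter):
            zi = zi * zi + qi
            if abs(zi) > 2.0:
                output[i] = iteration
                break
    return output


# Arbitrary illustrative inputs along the real axis
q = [complex(x / 10.0, 0.0) for x in range(-20, 20)]
z = [0j] * len(q)

profiler = cProfile.Profile()
profiler.enable()
output = calculate_z(q, 100, z)
profiler.disable()

# Collect the report into a string, sorted by cumulative time
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

With the profile directive enabled, functions from the compiled module should appear in a report like this in the same way.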

boundscheck lets you disable out-of-bounds index checking on buffered arrays (mostly this will apply to numpy arrays - see next section).  Since it