CHAPTER
SHEDSKIN
ShedSkin automatically annotates your Python module and compiles it down to C++. It works in a more restricted set of circumstances than Cython, but when it works it Just Works and requires very little effort on your part. One of the included examples is a Commodore 64 emulator whose main emulation code is compiled by ShedSkin and used as an extension module by a PyGTK front end running in CPython. When demoing a game, it jumps from a few frames per second under CPython to over 50 FPS.
Its main limitations are:
The release announcement for v0.8 includes a scalability graph (http://shed-skin.blogspot.com/2011/06/shed-skin-08-programming-language.html) showing compile times for longer Python modules. ShedSkin can output either a compiled executable or an importable module.
You run it using shedskin your_module.py. In our case, move pure_python_2.py into a new directory and rename it (shedskin_pure_python/shedskin_pure_python.py). We could make a new module (as we did for the Cython example), but for now we’ll just use the one Python file.
shedskin shedskin_pure_python.py
make
After this you’ll have an executable called shedskin_pure_python. Run it and see what sort of speed-up you get.
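One rough way to measure the speed-up is to time both programs from Python. The sketch below assumes Python 3 (for time.perf_counter and subprocess.run) and that pure_python_2.py and the compiled shedskin_pure_python executable sit in the current directory; the helper names best_wall_time and speedup are our own. It times whole processes, so interpreter start-up is included in the CPython figure.

```python
import subprocess
import sys
import time


def best_wall_time(cmd, repeats=3):
    """Run cmd (a list of arguments) `repeats` times; return the fastest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the program fails,
        # so a crashed run is never mistaken for a fast one
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline_cmd, candidate_cmd, repeats=3):
    """Ratio of best times; a value above 1.0 means candidate_cmd was faster."""
    return best_wall_time(baseline_cmd, repeats) / best_wall_time(candidate_cmd, repeats)


# Example, assuming both programs are in the current directory:
# speedup([sys.executable, "pure_python_2.py"], ["./shedskin_pure_python"])
```

Taking the best of several runs rather than the mean reduces the influence of other processes competing for the CPU.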
ShedSkin has its own C++ implementations of the parts of the Python standard library it supports (it can only import modules that someone has reimplemented for ShedSkin!). For this reason we can’t use numpy in a ShedSkin executable or module. You can pass a Python list across (numpy can turn an array into a Python list with tolist()), but that comes with a speed hit.
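A minimal sketch of the list hand-over, assuming the compiled code only needs a plain list: numpy arrays and the standard library’s array.array both provide a .tolist() method, so one helper covers both. The fallback import assumes a hypothetical extension module named shedskin_sumsq, built with ShedSkin’s -e option (which generates an extension module rather than an executable); if it has not been built, the pure-Python function is used instead.

```python
from array import array


def to_plain_list(values):
    """Return a plain Python list for handing to ShedSkin-compiled code.

    numpy arrays and array.array objects both provide .tolist(); any other
    iterable is copied with list(). This copy is where the speed hit comes from.
    """
    if hasattr(values, "tolist"):
        return values.tolist()
    return list(values)


def sum_of_squares_py(xs):
    """Pure-Python fallback, also the source a ShedSkin module could be built from."""
    total = 0.0
    for x in xs:
        total += x * x
    return total


try:
    # Hypothetical extension module; only exists if it was compiled with "shedskin -e"
    from shedskin_sumsq import sum_of_squares
except ImportError:
    sum_of_squares = sum_of_squares_py

samples = to_plain_list(array("d", [0.5, 1.5, 2.5]))
result = sum_of_squares(samples)
```

Keeping the conversion at the boundary means the compiled function only ever sees one container type, which suits ShedSkin’s static type inference.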
The complex datatype has been implemented in a way that isn’t as efficient as it could be (ShedSkin’s author Mark Dufour has said that it could be made much more efficient if there’s demand). If we expand the math using some algebra, exactly as we did for the Cython example, we get another huge jump in performance:
```python
def calculate_z_serial_purepython(q, maxiter, z):
    output = [0] * len(q)
    for i in range(len(q)):
        zx, zy = z[i].real, z[i].imag
        qx, qy = q[i].real, q[i].imag
        for iteration in range(maxiter):
            # expand complex numbers to floats, do raw float arithmetic
            # as the shedskin variant isn't so fast
            # I believe Mark Dufour said that complex numbers are allocated on the heap
            # and this could easily be improved for the next shedskin
            zx_new = (zx * zx - zy * zy) + qx
            zy_new = (2 * (zx * zy)) + qy  # uses the old zx, hence zx_new on the previous line
            zx = zx_new
            zy = zy_new
            # remove need for abs and just square the numbers
            if zx * zx + zy * zy > 4.0:
                output[i] = iteration
                break
    return output
```
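The expansion relies on (zx + i·zy)² + (qx + i·qy) = (zx² - zy² + qx) + i·(2·zx·zy + qy), and on |z| > 2 being equivalent to zx² + zy² > 4, so it is worth checking under plain CPython that the float version still matches the complex-number version before compiling. In this sketch calculate_z_complex, calculate_z_expanded and make_grid are our own names and the grid bounds are only illustrative. The two should agree on almost every point; abs() and the squared test can round differently right at the |z| = 2 boundary, so the check tolerates a few mismatches.

```python
def calculate_z_complex(q, maxiter, z):
    """Reference version using Python's complex type and abs()."""
    output = [0] * len(q)
    for i in range(len(q)):
        zi, qi = z[i], q[i]
        for iteration in range(maxiter):
            zi = zi * zi + qi
            if abs(zi) > 2.0:
                output[i] = iteration
                break
    return output


def calculate_z_expanded(q, maxiter, z):
    """Float-only version, same arithmetic as calculate_z_serial_purepython."""
    output = [0] * len(q)
    for i in range(len(q)):
        zx, zy = z[i].real, z[i].imag
        qx, qy = q[i].real, q[i].imag
        for iteration in range(maxiter):
            zx_new = (zx * zx - zy * zy) + qx
            zy_new = (2 * (zx * zy)) + qy
            zx, zy = zx_new, zy_new
            if zx * zx + zy * zy > 4.0:
                output[i] = iteration
                break
    return output


def make_grid(x1, x2, y1, y2, steps):
    """Return a steps x steps grid of complex points covering [x1, x2) x [y1, y2)."""
    xs = [x1 + (x2 - x1) * k / steps for k in range(steps)]
    ys = [y1 + (y2 - y1) * k / steps for k in range(steps)]
    return [complex(x, y) for y in ys for x in xs]


q = make_grid(-2.13, 0.77, -1.3, 1.3, 40)
z = [0j] * len(q)
reference = calculate_z_complex(q, 100, z)
expanded = calculate_z_expanded(q, 100, z)
# count grid points where the two versions report a different escape iteration
mismatches = sum(1 for a, b in zip(reference, expanded) if a != b)
```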
When debugging, it is helpful to know what types the code analysis has detected. Use:
shedskin -a your_module.py
and you’ll have annotated .cpp and .hpp files which tie the generated C++ to the original Python.
I’ve never tried profiling ShedSkin-generated code, but several options (using Valgrind and gprof) were presented in the Google Group: http://groups.google.com/group/shedskin-discuss/browse_thread/thread/fd39b6bb38cfb6d1
You can disable bounds-checking with the -b flag; this generally gives a small speed improvement. Wrap-around checking can be disabled with -w. Neither optimisation improved the run-time for this problem. For 64-bit (long) integer support add -l. For other flags see the documentation.
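The wrap-around and 64-bit integer options matter only if the code depends on what they change. A small illustration, assuming that with -w a negative index is no longer mapped to the end of the list, and that without ShedSkin’s -l/--long flag an int is a 32-bit C++ int (both are our reading of the flags, not something measured here): write indices explicitly, and check that integer results stay inside the signed 32-bit range. The function names are hypothetical.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def last_item_wrap(xs):
    # Relies on Python's negative-index wrap-around; assumed unsafe when compiled with -w
    return xs[-1]


def last_item_explicit(xs):
    # Explicit non-negative index: the same answer whether or not wrap-around is handled
    return xs[len(xs) - 1]


def fits_in_int32(value):
    """True if value fits in a signed 32-bit integer (no need for 64-bit ints)."""
    return INT32_MIN <= value <= INT32_MAX
```

In calculate_z_serial_purepython every index is non-negative and every output value is below maxiter, so neither wrap-around nor 32-bit overflow is a concern for this problem.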
The author made some notes in the ShedSkin Google Group (http://groups.google.com/group/shedskin-discuss/browse_thread/thread/c5bf965a80292a43) on speeding up the code by editing the generated Makefile:
It is possible that automatic vectorisation (e.g. with gcc, http://gcc.gnu.org/projects/tree-ssa/vectorization.html) will help. I don’t have an up-to-date gcc (e.g. 4.6) on my MacBook, so I’ve yet to experiment with this.