High Performance Python (from Training at EuroPython 2011) by Ian Ozsvald


CHAPTER EIGHTEEN

NUMEXPR ON NUMPY VECTORS

numexpr is a wonderfully simple library - you wrap your numpy expression in numexpr.evaluate(<your code>) and often it’ll simply run faster! In the example below I’ve commented out the numpy vector code from the section above and replaced it with the numexpr variant:

import numexpr
import numpy as np

...

def calculate_z_numpy(q, maxiter, z):
    output = np.resize(np.array(0,), q.shape)
    for iteration in range(maxiter):
        #z = z*z + q
        z = numexpr.evaluate("z*z+q")
        #done = np.greater(abs(z), 2.0)
        done = numexpr.evaluate("abs(z).real > 2.0")
        #q = np.where(done,0+0j, q)
        q = numexpr.evaluate("where(done, 0+0j, q)")
        #z = np.where(done,0+0j, z)
        z = numexpr.evaluate("where(done, 0+0j, z)")
        #output = np.where(done, iteration, output)
        output = numexpr.evaluate("where(done, iteration, output)")
    return output

I’ve replaced np.greater with >; the earlier use of np.greater just showed another way of achieving the same comparison. Note that numexpr doesn’t let us refer to numpy functions inside the expression string, only to the functions it provides itself.

You can only use numexpr on numpy code, and it only makes sense to use it on vector operations. In the background numexpr breaks each operation down into smaller segments that will fit into the CPU’s cache, and it will also auto-vectorise across the available math units on the CPU if possible.
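As a minimal sketch of that equivalence (the array names here are my own, not from the book): numpy evaluates an expression like this one operation at a time, allocating a full-size temporary array for each intermediate result, while numexpr compiles the whole string once and streams cache-sized chunks of the inputs through it.

```python
import numpy as np
import numexpr

# Two illustrative input vectors, large enough that the
# intermediate temporaries numpy creates are noticeable
a = np.arange(1_000_000, dtype=np.float64)
b = np.arange(1_000_000, dtype=np.float64)

# Plain numpy: each sub-expression (b ** 2, 2.0 * a, the sum)
# produces its own full-size temporary array
result_np = 2.0 * a + b ** 2

# numexpr: the whole expression is compiled and evaluated in
# cache-friendly chunks, across the available cores
result_ne = numexpr.evaluate("2.0 * a + b ** 2")

assert np.allclose(result_np, result_ne)
```

The results are numerically identical; the win comes purely from avoiding temporaries and keeping the working set inside the cache.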

On my dual-core MacBook I see a 2-3x speed-up. If I had an Intel MKL build of numexpr (warning - it needs a commercial license from Intel or Enthought) then I might see an even greater speed-up.
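Since the speed-up scales with the cores numexpr can use, it can be worth checking and controlling the thread count explicitly. A short sketch using numexpr’s public helpers (check your installed version’s documentation for these names):

```python
import numexpr

# Ask numexpr how many cores it detected on this machine
cores = numexpr.detect_number_of_cores()

# Pin the worker-thread count; the call returns the previous setting
previous = numexpr.set_num_threads(cores)

print("cores detected:", cores, "- previous thread count:", previous)
```

On a dual-core machine like the MacBook above this would report 2 detected cores.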

numexpr can give us some useful system information:

>>> numexpr.print_versions()

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Numexpr version:   1.4.2

NumPy version:     1.5.1

Python version:    2.7.1 (r271:86882M, Nov 30 2010, 09:39:13)

[GCC 4.0.1 (Apple Inc. build 5494)]

Platform:         darwin-i386

AMD/Intel CPU?    False

VML available?    False

Detected cores:    2

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

It can also give us some very low-level information about our CPU:

>>> numexpr.cpu.info

{'arch': 'i386',
 'machine': 'i486',
 'sysctl_hw': {'hw.availcpu': '2',
  'hw.busfrequency': '1064000000',
  'hw.byteorder': '1234',
  'hw.cachelinesize': '64',
  'hw.cpufrequency': '2000000000',
  'hw.epoch': '0',
  'hw.l1dcachesize': '32768',
  'hw.l1icachesize': '32768',
  'hw.l2cachesize': '3145728',
  'hw.l2settings': '1',
  'hw.machine': 'i386',
  'hw.memsize': '4294967296',
  'hw.model': 'MacBook5,2',
  'hw.ncpu': '2',
  'hw.pagesize': '4096',
  'hw.physmem': '2147483648',
  'hw.tbfrequency': '1000000000',
  'hw.usermem': '1841561600',
  'hw.vectorunit': '1'}}

We can also use it to pre-compile expressions (so they don’t have to be compiled dynamically on each call - this can save time if you have a very fast loop) and then look at the disassembly (though I doubt you’d do anything with the disassembled output):

>>> expr = numexpr.NumExpr('avector > 2.0') # pre-compile an expression

>>> numexpr.disassemble(expr)

[('gt_bdd', 'r0', 'r1[output]', 'c2[2.0]')]

>>> somenbrs = np.arange(10) # -> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

>>> expr.run(somenbrs)

array([False, False, False, True, True, True, True, True, True, True], dtype=bool)

You might choose to pre-compile an expression in a fast loop if the overhead of compiling (as reported by kernprof.py) reduces the benefit of the speed-ups achieved.
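A sketch of that pattern (the loop and array here are illustrative, not from the book): the expression string is parsed and compiled once, and only the cheap .run() call sits inside the hot loop. With no explicit signature, numexpr treats the variables as double precision, so the input is created as float64.

```python
import numpy as np
import numexpr

# Compile once, outside the hot loop
expr = numexpr.NumExpr("a > 2.0")

total = 0
for _ in range(1000):
    data = np.arange(10, dtype=np.float64)  # 0.0 .. 9.0
    # Only the pre-compiled bytecode runs here; no re-parsing
    total += expr.run(data).sum()  # 7 of the 10 values exceed 2.0

print(total)
```

Whether this beats plain numexpr.evaluate() in practice depends on how much work each iteration does; profile before committing to it.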