
CHAPTER TWENTY ONE

PARALLELPYTHON

With the ParallelPython module we can easily change the multiprocessing example to run on many machines with all their CPUs. This module takes care of sending work units to local CPUs and remote machines and returning the output to the controller.
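To make the moving parts concrete, here is a minimal, self-contained sketch (my own toy example, not the tutorial's Mandelbrot code) of the basic ParallelPython flow: create a Server, submit a function with its arguments, then call each returned job object to block for its result.

import pp

def count_primes(n):
    # toy CPU-bound task used only for this illustration
    primes = 0
    for candidate in range(2, n):
        for divisor in range(2, int(candidate ** 0.5) + 1):
            if candidate % divisor == 0:
                break
        else:
            primes += 1
    return primes

job_server = pp.Server()  # no first argument, so it uses all local CPUs
jobs = [job_server.submit(count_primes, (50000,)) for _ in range(4)]
results = [job() for job in jobs]  # calling a job blocks until it has finished
print "Results:", results
job_server.print_stats()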

At EuroPython 2011 we had 8 machines in the tutorial (with 1-4 CPUs each) running a larger Mandelbrot problem.

It seems to work with a mix of Python versions - at home I’ve run it on my 32 bit MacBook with Python 2.7 and Mandelbrot jobs have run locally and remotely on a 32 bit Ubuntu machine with Python 2.6.  It seems to send the original source (not compiled bytecode) so Python versions are less of an issue. Do be aware that full environments are not sent - if you use a local binary library (e.g. you import a Cython/ShedSkin compiled module) then that module must be in the PYTHONPATH or local directory on the remote machine. A binary compiled module will only run on machines with a matching architecture and Python version.
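As a rough illustration (my sketch, not code from the tutorial - calculate_z_cython is a hypothetical Cython-compiled module, its calculate_z(q, maxiter, z) signature is assumed to match the pure-Python version, and job_server and chunk are the variables used in the listing later in this chapter), a job with a compiled dependency ships only a pure-Python wrapper and names the compiled module in submit's modules tuple; the matching .so/.pyd must already be importable on each remote machine:

def calculate_z_compiled(chunk):
    # pure-Python wrapper whose source pp can send over the network;
    # calculate_z_cython is imported on the worker via the modules tuple below
    q, maxiter, z = chunk
    return calculate_z_cython.calculate_z(q, maxiter, z)

job = job_server.submit(calculate_z_compiled,
                        (chunk,),                 # args
                        (),                       # depfuncs
                        ("calculate_z_cython",))  # modules to import on the worker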

In this example we’ll use the same chunks code as we developed in the multiprocessing example.

First we define the IP addresses of the servers we'll use in ppservers = (); if we're only using the local machine this can be an empty tuple. We can also specify a tuple of strings (IP addresses or domain names) - remember to end a single-item tuple with a trailing comma or it won't be a tuple, e.g. ppservers = ('localhost',).

Next we iterate over each chunk and use job_server.submit(...) to submit a function with an input list to the job_server; the two empty tuples are for dependent functions and modules to make available on the worker. In return we get a job object. Once all the tasks are submitted we can iterate over the returned job objects, blocking until we get our results. Finally we can use print_stats() to show statistics about the run.

import pp

...

# we have the same work chunks as we did for the multiprocessing example above
# we also use the same tuple of work as we did in the multiprocessing example

start_time = datetime.datetime.now()

# tuple of all parallel python servers to connect with
ppservers = ()  # use this machine
# I can't get autodiscover to work at home
#ppservers = ("*",)  # autodiscover on network

job_server = pp.Server(ppservers=ppservers)
# it'll autodiscover the nbr of cpus it can use if the first arg isn't specified

print "Starting pp with", job_server.get_ncpus(), "local CPU workers"
output = []
jobs = []
for chunk in chunks:
    print "Submitting job with len(q) {}, len(z) {}".format(len(chunk[0]), len(chunk[2]))
    job = job_server.submit(calculate_z_serial_purepython, (chunk,), (), ())
    jobs.append(job)
for job in jobs:
    output_job = job()
    output += output_job
# print statistics about the run
job_server.print_stats()

end_time = datetime.datetime.now()
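The listing records start_time and end_time but does not show a report line; a small addition (mine, not part of the original listing) prints the wall-clock time and the number of results received:

# not in the listing above - report how long the run took and confirm
# that a result came back for every submitted point
print "Total elapsed time:", end_time - start_time
print "Received", len(output), "results"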

Now let’s change the code so it is sent to a ‘remote’ job server (but one that happens to be on our machine!). This is the stepping stone to running on job servers spread over your network.

If you change ppservers as shown below, the job_server will look for an instance of ppserver.py running on the local machine on the default port. In a second shell you should run ppserver.py (it is installed on the PYTHONPATH so it should 'just run' from anywhere); the -d argument turns on DEBUG messages.

# tuple of all parallel python servers to connect with
ppservers = ('localhost',)  # use this machine
# for localhost run 'ppserver.py -d' in another terminal
NBR_LOCAL_CPUS = 0  # if 0, it sends jobs out to other ppservers
job_server = pp.Server(NBR_LOCAL_CPUS, ppservers=ppservers)

Now if you run the example you'll see jobs being received by the ppserver.py. It should run in about the same amount of time as the ppservers = () example. Note that all your CPUs will still be used: none in the main Python process and all of the available CPUs in the ppserver.py process.

Next take another machine and run ifconfig (or similar) to find out its IP address. Add this to ppservers so you have something like:

ppservers = ('localhost', '192.168.13.202')

Run ppserver.py -d on the remote machine too (so now you have two running). Make sure nbr_chunks = 16 or another suitably high number so that there are enough work chunks to be distributed across all the available processors. You should see both ppserver.py instances receiving and processing jobs. Experiment with making many chunks of work, e.g. using nbr_chunks = 256.
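The chunking itself comes from the multiprocessing chapter; as a rough sketch (assuming the (q, maxiter, z) chunk layout implied by the print statement in the listing above, with q, maxiter and z defined as in the earlier examples), raising nbr_chunks simply slices q and z into more, smaller work units:

# a sketch of the chunking assumed from the multiprocessing example:
# slice q and z into nbr_chunks pieces so every local and remote CPU
# has plenty of small jobs to work through
nbr_chunks = 256  # more, smaller chunks travel over the network more reliably
chunk_size = len(q) // nbr_chunks
chunks = []
for chunk_nbr in range(nbr_chunks):
    start = chunk_nbr * chunk_size
    # the final chunk picks up any remainder left by the integer division
    end = len(q) if chunk_nbr == nbr_chunks - 1 else start + chunk_size
    chunks.append((q[start:end], maxiter, z[start:end]))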

I found that a few large jobs distributed poorly over the network - jobs of several MB each were rarely received by the remote processes (they often threw Exceptions in the remote ppserver.py), so utilisation was poor. By using a larger nbr_chunks the tasks are each smaller and are sent and received more reliably. This may just be a quirk of ParallelPython (I'm relatively new to this module!).

As shown at the start of the report the ParallelPython module is very efficient - we get almost a doubling in performance by using both cores on the laptop. When sending jobs over the network the communication adds extra overhead; if your jobs are long-running then this will be a minor part of your run-time.