Parallel Processing

This tutorial is inspired by Chris Kiehl’s great post on multiprocessing.


from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool 

Create Some Data

# Create a range of integers to use as data
data = range(29999)

Create An Operation To Execute On The Data

# Create a function that takes a data point
def some_function(datum):
    # and returns the datum raised to the power of itself
    return datum**datum
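
As a quick sanity check, the function can be tried on a few small inputs before it is run on the full data. The sketch below repeats the definition so it runs on its own; the sample inputs are arbitrary and not part of the original tutorial. Note that Python evaluates 0**0 as 1.

```python
# Same function as in the tutorial, repeated so this snippet is self-contained
def some_function(datum):
    return datum ** datum

# Try it on a handful of small values (arbitrary sample inputs)
sample_outputs = [some_function(x) for x in range(5)]
print(sample_outputs)  # [1, 1, 4, 27, 256]
```

The outputs grow very quickly, which is why the full range of 29999 values takes minutes to process.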

Traditional Approach


# Create an empty list for the results
results = []

# For each value in the data
for datum in data:
    # Append the output of the function when applied to that datum
    results.append(some_function(datum))

CPU times: user 2min 2s, sys: 1.7 s, total: 2min 4s
Wall time: 2min 8s
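
The CPU and wall times above look like the output of IPython's %%time cell magic, although the magic itself does not appear in the text, so that is an assumption. Outside a notebook, a similar measurement can be sketched with the standard time module. The helper name time_sequential and the smaller input size are illustrative choices, not part of the original tutorial.

```python
import time

# Same function as in the tutorial, repeated so this snippet is self-contained
def some_function(datum):
    return datum ** datum

def time_sequential(data):
    """Hypothetical helper: run the plain loop and return (results, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # wall-clock timer
    cpu_start = time.process_time()    # CPU time used by this process
    results = []
    for datum in data:
        results.append(some_function(datum))
    wall_seconds = time.perf_counter() - wall_start
    cpu_seconds = time.process_time() - cpu_start
    return results, wall_seconds, cpu_seconds

# A much smaller range than the tutorial's 29999 values, so the sketch runs quickly
results, wall_seconds, cpu_seconds = time_sequential(range(2000))
print(f"Wall time: {wall_seconds:.3f} s, CPU time: {cpu_seconds:.3f} s")
```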

Parallelism Approach

# Create a pool of worker threads (by default, one per CPU core on the machine)
pool = ThreadPool()

# Apply (map) some_function to the data using the pool of workers
results = pool.map(some_function, data)

# Close the pool so it accepts no new tasks
pool.close()

# Wait for the workers to finish
pool.join()

CPU times: user 1min 56s, sys: 1.59 s, total: 1min 57s
Wall time: 1min 57s
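
The thread pool is only modestly faster than the loop here (about 1min 57s against 2min 8s of wall time). That is consistent with the global interpreter lock in standard CPython builds, which lets only one thread run Python bytecode at a time, so CPU-bound work like this gains little from threads. The process-based Pool imported at the top is never used in the tutorial; the sketch below shows how it could be swapped in. The helper name run_with_processes and the smaller input size are illustrative, and no timing is claimed for it.

```python
from multiprocessing import Pool

# Same function as in the tutorial; it must live at module level so workers can import it
def some_function(datum):
    return datum ** datum

def run_with_processes(data, workers=None):
    """Hypothetical helper: map some_function over data using worker processes."""
    # workers=None lets Pool choose its default number of worker processes
    with Pool(processes=workers) as pool:
        # map blocks until every result is ready and returns them in input order
        return pool.map(some_function, data)

if __name__ == "__main__":
    # The guard matters on platforms that start workers in a fresh interpreter
    results = run_with_processes(range(2000))
    print(len(results))
```

Whether processes actually help depends on the workload: each result has to be pickled and sent back to the parent, which adds overhead for very large integers like these.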