The `multiprocessing` module in Python allows you to create and manage multiple processes, enabling you to run code concurrently on multiple CPU cores. This is particularly useful for CPU-bound tasks that can be parallelized to improve performance. Unlike multithreading, where threads share the same memory space, each process in multiprocessing has its own memory space and its own Python interpreter, so CPU-bound work is not constrained by the Global Interpreter Lock (GIL).
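Since the payoff from parallelizing depends on how many cores are available, it can help to check first; `multiprocessing.cpu_count()` reports this:

```python
import multiprocessing

# Number of CPU cores visible to the OS; a sensible upper bound
# on the number of worker processes for CPU-bound work
print(multiprocessing.cpu_count())
```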
1. Importing the Multiprocessing Module
You can start using the `multiprocessing` module by importing it:

```python
import multiprocessing
```
2. Creating and Starting Processes
To create a new process, you define the function you want to run in a separate process and then create a `Process` object.
2.1. Example: Basic Process Creation
```python
import multiprocessing
import time

def worker():
    print("Worker started")
    time.sleep(2)
    print("Worker finished")

if __name__ == "__main__":
    # Create a process
    process = multiprocessing.Process(target=worker)

    # Start the process
    process.start()

    # Wait for the process to finish
    process.join()

    print("Main process finished")
```
Output:

```
Worker started
Worker finished
Main process finished
```
In this example, the `worker()` function is run in a separate process. The `start()` method starts the process, and the `join()` method waits for it to finish.
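The target function can also take input: `Process` accepts positional arguments via `args` and keyword arguments via `kwargs`. A minimal sketch (the `greet` function here is illustrative, not part of the example above):

```python
import multiprocessing

def greet(name, excited=False):
    # Runs in the child process with the arguments passed below
    print(f"Hello, {name}{'!' if excited else '.'}")

if __name__ == "__main__":
    p = multiprocessing.Process(target=greet, args=("Alice",), kwargs={"excited": True})
    p.start()
    p.join()
```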
3. Using a Process Pool
The `multiprocessing` module provides a `Pool` class that allows you to manage a pool of worker processes. This is useful for parallelizing the execution of a function across multiple input values.
3.1. Example: Using a Process Pool
```python
import multiprocessing

def square(x):
    return x * x

if __name__ == "__main__":
    # Create a pool of 4 worker processes
    with multiprocessing.Pool(processes=4) as pool:
        # Map the square function to a range of numbers
        results = pool.map(square, range(10))
    print(results)
```
Output:

```
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```
In this example, the `square()` function is applied to each element in the range of numbers using a pool of 4 worker processes. The results are collected in a list.
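`map()` covers functions of a single argument; for functions taking several, `Pool.starmap()` unpacks each input tuple into the call. A minimal sketch with a hypothetical `power` function:

```python
import multiprocessing

def power(base, exp):
    return base ** exp

if __name__ == "__main__":
    with multiprocessing.Pool(processes=2) as pool:
        # Each tuple is unpacked into power(base, exp)
        results = pool.starmap(power, [(2, 3), (3, 2), (4, 1)])
    print(results)  # [8, 9, 4]
```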
4. Sharing Data Between Processes
Because each process has its own memory space, sharing data between processes can be tricky. The `multiprocessing` module provides several ways to share data, including `Value`, `Array`, and `Manager`.
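`Value` and `Array` cover simple C-typed data; a `Manager` serves richer proxy objects such as shared lists and dicts from a separate server process. A minimal sketch, using a hypothetical `record` worker:

```python
import multiprocessing

def record(shared_dict, key, value):
    # The proxy forwards this mutation to the manager's server process
    shared_dict[key] = value

if __name__ == "__main__":
    with multiprocessing.Manager() as manager:
        shared_dict = manager.dict()
        processes = [
            multiprocessing.Process(target=record, args=(shared_dict, i, i * i))
            for i in range(3)
        ]
        for p in processes:
            p.start()
        for p in processes:
            p.join()
        print(dict(shared_dict))
```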
4.1. Example: Sharing Data Using `Value` and `Array`
```python
import multiprocessing

def increment_value(val):
    for _ in range(1000):
        # val.value += 1 is a read-modify-write, not an atomic
        # operation, so guard it with the Value's built-in lock
        with val.get_lock():
            val.value += 1

def increment_array(arr):
    with arr.get_lock():
        for i in range(len(arr)):
            arr[i] += 1

if __name__ == "__main__":
    # Shared value and array
    shared_value = multiprocessing.Value('i', 0)
    shared_array = multiprocessing.Array('i', [0, 1, 2, 3, 4])

    # Create and start processes
    processes = []
    for _ in range(2):
        p = multiprocessing.Process(target=increment_value, args=(shared_value,))
        processes.append(p)
        p.start()
    for _ in range(2):
        p = multiprocessing.Process(target=increment_array, args=(shared_array,))
        processes.append(p)
        p.start()

    # Wait for all processes to finish
    for p in processes:
        p.join()

    print("Shared Value:", shared_value.value)
    print("Shared Array:", shared_array[:])
```

Output:

```
Shared Value: 2000
Shared Array: [2, 3, 4, 5, 6]
```

In this example, two processes increment a shared value 1000 times each, and two processes increment each element of a shared array. Because `+= 1` is not atomic, each update is wrapped in the shared object's built-in lock (`get_lock()`); without it, concurrent updates could be lost and the final value could fall short of 2000. The final values reflect the combined updates from all processes.
5. Inter-Process Communication (IPC)
The `multiprocessing` module provides several mechanisms for inter-process communication (IPC), including `Queue`, `Pipe`, and `Manager`. These allow processes to exchange data safely.
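`Pipe()` suits exactly two endpoints: it returns a pair of connected `Connection` objects, one for each side. A minimal sketch:

```python
import multiprocessing

def child(conn):
    # Receive a message, reply, then close this end of the pipe
    msg = conn.recv()
    conn.send(msg.upper())
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = multiprocessing.Pipe()
    p = multiprocessing.Process(target=child, args=(child_conn,))
    p.start()
    parent_conn.send("hello")
    print(parent_conn.recv())  # HELLO
    p.join()
```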
5.1. Example: Using a Queue for IPC
```python
import multiprocessing

def producer(queue):
    for i in range(5):
        queue.put(i)
        print(f"Produced {i}")

def consumer(queue):
    # queue.empty() is dependable here only because the producer
    # has already finished before the consumer starts
    while not queue.empty():
        item = queue.get()
        print(f"Consumed {item}")

if __name__ == "__main__":
    # Create a Queue
    queue = multiprocessing.Queue()

    # Create and start producer and consumer processes
    p1 = multiprocessing.Process(target=producer, args=(queue,))
    p2 = multiprocessing.Process(target=consumer, args=(queue,))

    p1.start()
    p1.join()  # Ensure producer finishes before starting consumer
    p2.start()
    p2.join()

    print("Main process finished")
```
Output:

```
Produced 0
Produced 1
Produced 2
Produced 3
Produced 4
Consumed 0
Consumed 1
Consumed 2
Consumed 3
Consumed 4
Main process finished
```
In this example, the producer process adds items to a queue, and the consumer process retrieves and processes them. The queue is used to safely pass data between the processes.
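Note that `queue.empty()` is only dependable above because the producer is joined before the consumer starts. When both must run concurrently, a common pattern is a sentinel value that marks the end of the stream (a minimal sketch, collecting results in a `Manager` list):

```python
import multiprocessing

SENTINEL = None  # Marks the end of the stream

def producer(queue):
    for i in range(5):
        queue.put(i)
    queue.put(SENTINEL)  # Tell the consumer to stop

def consumer(queue, results):
    while True:
        item = queue.get()  # Blocks until an item is available
        if item is SENTINEL:
            break
        results.append(item)

if __name__ == "__main__":
    queue = multiprocessing.Queue()
    manager = multiprocessing.Manager()
    results = manager.list()
    p1 = multiprocessing.Process(target=producer, args=(queue,))
    p2 = multiprocessing.Process(target=consumer, args=(queue, results))
    p1.start()
    p2.start()  # Both run at the same time
    p1.join()
    p2.join()
    print(list(results))  # [0, 1, 2, 3, 4]
```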
6. Process Synchronization
Sometimes, you may need to synchronize processes to ensure they don’t interfere with each other. The `multiprocessing` module provides synchronization primitives like `Lock`, `Event`, `Semaphore`, and `Condition`.
6.1. Example: Using a Lock for Synchronization
```python
import multiprocessing
import time

def task(lock, shared_value):
    with lock:
        temp = shared_value.value
        time.sleep(0.1)
        shared_value.value = temp + 1

if __name__ == "__main__":
    lock = multiprocessing.Lock()
    shared_value = multiprocessing.Value('i', 0)

    processes = []
    for _ in range(10):
        p = multiprocessing.Process(target=task, args=(lock, shared_value))
        processes.append(p)
        p.start()

    for p in processes:
        p.join()

    print("Final value:", shared_value.value)
```
Output:

```
Final value: 10
```
In this example, a lock is used to ensure that only one process at a time can modify the shared value, preventing race conditions.
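A `Lock` serializes access; an `Event`, by contrast, lets one process signal others that something has happened. A minimal sketch with a hypothetical `waiter` function:

```python
import multiprocessing

def waiter(event, flag):
    event.wait()  # Blocks until the event is set
    flag.value = 1

if __name__ == "__main__":
    event = multiprocessing.Event()
    flag = multiprocessing.Value('i', 0)
    p = multiprocessing.Process(target=waiter, args=(event, flag))
    p.start()
    event.set()  # Wake the waiting process
    p.join()
    print(flag.value)  # 1
```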
7. Best Practices for Multiprocessing
- Use `if __name__ == "__main__"`: When using multiprocessing in scripts, always protect the entry point of the program with `if __name__ == "__main__"` to avoid recursive process spawning.
- Avoid Global State: Each process has its own memory space, so avoid relying on global variables for inter-process communication.
- Use Queues and Pipes for Communication: Use `Queue` and `Pipe` for safe communication between processes.
- Be Mindful of Resource Usage: Multiprocessing can consume significant resources, especially when spawning many processes. Use a `Pool` to manage resources efficiently.
- Handle Exceptions: Ensure that processes are properly managed, and handle exceptions to avoid orphaned processes or deadlocks.