October 13, 2024

Python Multiprocessing

The multiprocessing module in Python allows you to create and manage multiple processes, enabling you to run code concurrently on multiple CPU cores. This is particularly useful for CPU-bound tasks that can be parallelized to improve performance. Unlike multithreading, where threads share one memory space and are constrained by Python's Global Interpreter Lock (GIL), each process has its own memory space and its own interpreter, so CPU-bound work can run truly in parallel.

1. Importing the Multiprocessing Module

You can start using the multiprocessing module by importing it:

import multiprocessing
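
Once imported, you can check how many CPU cores are available, which is a reasonable starting point when deciding how many worker processes to create:

import multiprocessing

print("Available CPU cores:", multiprocessing.cpu_count())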

2. Creating and Starting Processes

To create a new process, you need to define a function that you want to run in a separate process and then create a Process object.

2.1. Example: Basic Process Creation

import multiprocessing
import time

def worker():
    print("Worker started")
    time.sleep(2)
    print("Worker finished")

if __name__ == "__main__":
    # Create a process
    process = multiprocessing.Process(target=worker)

    # Start the process
    process.start()

    # Wait for the process to finish
    process.join()

    print("Main process finished")

Output:

Worker started
Worker finished
Main process finished

In this example, the worker() function runs in a separate process. The start() method launches the process, and the join() method waits for it to finish. The if __name__ == "__main__": guard is needed on platforms that start processes by spawning a fresh interpreter (Windows, and macOS by default), because the child re-imports the main module.
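
2.2. Example: Passing Arguments to a Process

Process also accepts positional and keyword arguments for the target function, and exposes simple status attributes such as is_alive() and exitcode. The sketch below illustrates this; the greet() function and its arguments are made up for illustration.

import multiprocessing

def greet(name, repeat=1):
    for _ in range(repeat):
        print(f"Hello, {name}")

if __name__ == "__main__":
    # Positional arguments go in args, keyword arguments in kwargs
    p = multiprocessing.Process(target=greet, args=("world",), kwargs={"repeat": 2})
    p.start()
    print("Still running?", p.is_alive())
    p.join()
    print("Exit code:", p.exitcode)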

3. Using a Process Pool

The multiprocessing module provides a Pool class that allows you to manage a pool of worker processes. This is useful for parallelizing the execution of a function across multiple input values.

3.1. Example: Using a Process Pool

import multiprocessing

def square(x):
    return x * x

if __name__ == "__main__":
    # Create a pool of 4 worker processes
    with multiprocessing.Pool(processes=4) as pool:
        # Map the square function to a range of numbers
        results = pool.map(square, range(10))

    print(results)

Output:

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In this example, the square() function is applied to each element in the range of numbers using a pool of 4 worker processes. The results are collected in a list.
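
3.2. Example: Other Pool Methods

map() is not the only way to hand work to a pool. The sketch below, with a made-up power() function for illustration, uses starmap() for functions that take several arguments and apply_async() to submit a single call without blocking until its result is requested.

import multiprocessing

def power(base, exponent):
    return base ** exponent

if __name__ == "__main__":
    with multiprocessing.Pool(processes=4) as pool:
        # starmap unpacks each tuple into the function's arguments
        results = pool.starmap(power, [(2, 3), (3, 2), (4, 2)])

        # apply_async returns immediately; .get() blocks until the result is ready
        async_result = pool.apply_async(power, (5, 2))
        single = async_result.get()

    print(results)  # [8, 9, 16]
    print(single)   # 25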

4. Sharing Data Between Processes

Because each process has its own memory space, sharing data between processes can be tricky. The multiprocessing module provides several ways to share data, including Value, Array, and Manager.

4.1. Example: Sharing Data Using Value and Array

import multiprocessing

def increment_value(val):
    for _ in range(1000):
        # += is a read-modify-write, so guard it with the Value's built-in lock
        with val.get_lock():
            val.value += 1

def increment_array(arr):
    with arr.get_lock():
        for i in range(len(arr)):
            arr[i] += 1

if __name__ == "__main__":
    # Shared value and array
    shared_value = multiprocessing.Value('i', 0)
    shared_array = multiprocessing.Array('i', [0, 1, 2, 3, 4])

    # Create and start processes
    processes = []
    for _ in range(2):
        p = multiprocessing.Process(target=increment_value, args=(shared_value,))
        processes.append(p)
        p.start()

    for _ in range(2):
        p = multiprocessing.Process(target=increment_array, args=(shared_array,))
        processes.append(p)
        p.start()

    # Wait for all processes to finish
    for p in processes:
        p.join()

    print("Shared Value:", shared_value.value)
    print("Shared Array:", shared_array[:])

Output:

Shared Value: 2000
Shared Array: [2, 3, 4, 5, 6]

In this example, two processes each increment a shared value 1000 times, and two processes each increment every element of a shared array once. Because += on a shared object is a read-modify-write and not atomic, each update is wrapped in the object's built-in lock via get_lock(); without it, concurrent updates could be lost and the final count could come out below 2000.
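
4.2. Example: Sharing Data Using a Manager

Value and Array only cover simple C-style types. For richer structures such as dictionaries and lists, a Manager runs a server process that serves proxy objects to the workers. The sketch below is illustrative; record_result() and its arguments are made up.

import multiprocessing

def record_result(shared_dict, shared_list, worker_id):
    # Manager proxies can be mutated like ordinary dicts and lists
    shared_dict[worker_id] = worker_id * worker_id
    shared_list.append(worker_id)

if __name__ == "__main__":
    with multiprocessing.Manager() as manager:
        shared_dict = manager.dict()
        shared_list = manager.list()

        processes = [
            multiprocessing.Process(target=record_result,
                                    args=(shared_dict, shared_list, i))
            for i in range(4)
        ]
        for p in processes:
            p.start()
        for p in processes:
            p.join()

        print("Dict:", dict(shared_dict))
        print("List:", sorted(shared_list))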

5. Inter-Process Communication (IPC)

The multiprocessing module provides several mechanisms for inter-process communication (IPC), including Queue, Pipe, and Manager. These allow processes to exchange data safely.

5.1. Example: Using a Queue for IPC

import multiprocessing

def producer(queue):
    for i in range(5):
        queue.put(i)
        print(f"Produced {i}")

def consumer(queue):
    while not queue.empty():
        item = queue.get()
        print(f"Consumed {item}")

if __name__ == "__main__":
    # Create a Queue
    queue = multiprocessing.Queue()

    # Create and start producer and consumer processes
    p1 = multiprocessing.Process(target=producer, args=(queue,))
    p2 = multiprocessing.Process(target=consumer, args=(queue,))

    p1.start()
    p1.join()  # Ensure producer finishes before starting consumer
    p2.start()
    p2.join()

    print("Main process finished")

Output:

Produced 0
Produced 1
Produced 2
Produced 3
Produced 4
Consumed 0
Consumed 1
Consumed 2
Consumed 3
Consumed 4
Main process finished

In this example, the producer process adds items to a queue, and the consumer process retrieves and processes them. The queue safely passes data between the processes. Note that the queue.empty() check is reliable here only because the producer has already finished; in a setup where producer and consumer run concurrently, placing a sentinel value (such as None) on the queue is a more robust way to tell the consumer to stop.
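
5.2. Example: Using a Pipe for IPC

A Pipe is a lighter-weight alternative to a Queue when exactly two endpoints need to exchange messages. The sketch below is illustrative; the child() function and the messages are made up.

import multiprocessing

def child(conn):
    # Receive a message from the parent, reply, and close this end
    msg = conn.recv()
    conn.send(f"child received: {msg}")
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = multiprocessing.Pipe()

    p = multiprocessing.Process(target=child, args=(child_conn,))
    p.start()

    parent_conn.send("hello")
    print(parent_conn.recv())

    p.join()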

6. Process Synchronization

Sometimes, you may need to synchronize processes to ensure they don’t interfere with each other. The multiprocessing module provides synchronization primitives like Lock, Event, Semaphore, and Condition.

6.1. Example: Using a Lock for Synchronization

import multiprocessing
import time

def task(lock, shared_value):
    with lock:
        temp = shared_value.value
        time.sleep(0.1)
        shared_value.value = temp + 1

if __name__ == "__main__":
    lock = multiprocessing.Lock()
    shared_value = multiprocessing.Value('i', 0)

    processes = []
    for _ in range(10):
        p = multiprocessing.Process(target=task, args=(lock, shared_value))
        processes.append(p)
        p.start()

    for p in processes:
        p.join()

    print("Final value:", shared_value.value)

Output:

Final value: 10

In this example, a lock is used to ensure that only one process at a time can modify the shared value, preventing race conditions.
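
6.2. Example: Using an Event to Signal Processes

Lock is only one of the available primitives. An Event lets one process signal any number of waiting processes at once. The sketch below is illustrative; the waiter() function and the sleep are made up.

import multiprocessing
import time

def waiter(event, name):
    print(f"{name} waiting for the event")
    event.wait()  # block until the event is set
    print(f"{name} released")

if __name__ == "__main__":
    event = multiprocessing.Event()

    workers = [
        multiprocessing.Process(target=waiter, args=(event, f"worker-{i}"))
        for i in range(3)
    ]
    for w in workers:
        w.start()

    time.sleep(1)  # simulate some setup work
    event.set()    # release all waiting processes at once

    for w in workers:
        w.join()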

7. Best Practices for Multiprocessing

  • Use if __name__ == "__main__": When using multiprocessing in scripts, always protect the entry point of the program using if __name__ == "__main__" to avoid recursive process spawning.
  • Avoid Global State: Each process has its own memory space, so avoid relying on global variables for inter-process communication.
  • Use Queues and Pipes for Communication: Use Queue and Pipe for safe communication between processes.
  • Be Mindful of Resource Usage: Multiprocessing can consume significant resources, especially when spawning many processes. Use a Pool to manage resources efficiently.
  • Handle Exceptions: Ensure that processes are properly managed, and handle exceptions so they don't leave orphaned processes or deadlocks behind (see the sketch below).
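
7.1. Example: Handling Exceptions from Worker Processes

One common pattern is to let a Pool surface worker exceptions: an exception raised inside a worker is re-raised in the parent when you call .get() on the corresponding result. The sketch below is illustrative; the risky() function and its failure condition are made up.

import multiprocessing

def risky(x):
    if x == 3:
        raise ValueError("bad input")
    return x * x

if __name__ == "__main__":
    with multiprocessing.Pool(processes=2) as pool:
        async_results = [pool.apply_async(risky, (i,)) for i in range(5)]
        for i, res in enumerate(async_results):
            try:
                # .get() re-raises any exception that occurred in the worker
                print(i, res.get(timeout=5))
            except ValueError as exc:
                print(i, "failed:", exc)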