Python uses multi-process pools and tasks
Concurrency is a form of non-strict evaluation, meaning the order of operations cannot be predicted. The multiprocessing
package introduces the concept of a Pool
object. A Pool
object contains many worker processes that can execute concurrently. This package allows the operating system to schedule and time-slice the execution of multiple processes, aiming to maximize the system’s capacity.
To take advantage of this capability, applications must be broken down into components that support non-strict execution. The entire application must be constructed from discrete tasks that can be processed in a non-deterministic order.
For example, an application that collects data from the internet by scraping the web can often be optimized through parallel processing. We can create a Pool
object with multiple equal workers to perform website scraping. Each worker is assigned a task in the form of a URL for analysis.
Applications that analyze multiple log files can also be parallelized. You can create a Pool object containing analysis workers and assign each log file to an analysis worker. This allows the workers in the Pool to read and analyze the log files in parallel. Each worker performs both I/O and computation, but one worker can perform analysis while other workers wait for I/O to complete.
Because the performance gains depend on input and output operations with unpredictable timing, multiprocessing often requires extensive experimentation. Varying the size of the worker pool and measuring runtimes are important parts of designing concurrent applications.