Python Iterator Chaining
Python iterators have another important feature: you can chain multiple iterators together to build efficient data processing “pipelines.” I first encountered this pattern in a PyCon talk by David Beazley, and it left a deep impression on me.
Python’s generator functions and generator expressions allow you to quickly build concise and powerful iterator chains. This section will introduce practical uses of iterator chains and how to apply them to your own programs.
As a quick recap, generators and generator expressions are syntactic sugar for writing iterators in Python. Compared to writing class-based iterators, this approach saves a lot of boilerplate code.
While a regular function only produces a return value once, a generator produces results multiple times. You can think of a generator as producing a “stream” of values throughout its lifetime.
For example, the following generator is a counter that produces a new value each time next() is called, yielding the integers 1 through 8:
def integers():
    for i in range(1, 9):
        yield i
You can confirm this behavior by running it in the Python REPL:
>>> chain = integers()
>>> list(chain)
[1, 2, 3, 4, 5, 6, 7, 8]
So far, not too interesting, but here’s where it gets really cool. Generators can be “chained” together to build efficient data processing algorithms that work like pipelines.
You can take the “stream” of values produced by an integers() generator and feed it into another generator. For example, this one squares each number passed in and yields it back out:
def squared(seq):
    for i in seq:
        yield i * i
This is what a “data pipeline” or “generator chain” does:
>>> chain = squared(integers())
>>> list(chain)
[1, 4, 9, 16, 25, 36, 49, 64]
This pipeline can continue to add new components. Data flows only in one direction, and each processing step is isolated from other processing steps through a strictly defined interface.
This is similar to how pipelines work in Unix. We also chain together a series of processes, with the output of each process feeding directly into the next.
Now let’s add a step to the pipeline that negates each value and passes it to the next step in the chain:
def negated(seq):
    for i in seq:
        yield -i
If we rebuild the generator chain and add the negated generator at the end, we get the following:
>>> chain = negated(squared(integers()))
>>> list(chain)
[-1, -4, -9, -16, -25, -36, -49, -64]
My favorite thing about generator chains is that they process only one element at a time. There’s no buffer between steps in the chain.
(1) The integers generator yields a value, such as 3.
(2) This value “activates” the squared generator, which processes it and yields 3 × 3 = 9 to the next stage.
(3) The squared value produced by the squared generator is immediately fed into the negated generator, which turns it into -9 and yields it again.
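The steps above are easy to verify by instrumenting each stage with a print statement. This is a minimal sketch (the prints are not part of the original example): requesting a single value with next() triggers exactly one pass through every stage of the pipeline, in order.

```python
def integers():
    for i in range(1, 4):
        print(f'integers yielding {i}')
        yield i

def squared(seq):
    for i in seq:
        print(f'squared yielding {i * i}')
        yield i * i

def negated(seq):
    for i in seq:
        print(f'negated yielding {-i}')
        yield -i

chain = negated(squared(integers()))
# Pulling one value prints one line from each stage, then returns -1;
# no other elements have been computed yet.
next(chain)
```

Each call to next() advances every generator in the chain by exactly one step, which is why no buffering between stages is needed.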
You can continue to extend this generator chain, adding your own steps to build a processing pipeline. Generator chains can be executed efficiently and easily modified because each step in the chain is a separate generator function.
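As a sketch of how such an extension might look, here is a hypothetical extra step (evens_only is my own illustrative name, not from the text above) that filters the stream, letting only even squares through:

```python
def integers():
    for i in range(1, 9):
        yield i

def squared(seq):
    for i in seq:
        yield i * i

def evens_only(seq):
    # A filtering stage: yields only the even values flowing through.
    for i in seq:
        if i % 2 == 0:
            yield i

chain = evens_only(squared(integers()))
print(list(chain))  # → [4, 16, 36, 64]
```

Because each stage only depends on receiving an iterable and producing one, stages like this can be inserted, removed, or reordered without touching the rest of the pipeline.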
Each generator function in this processing pipeline is very concise. Here’s a little trick that simplifies the pipeline definition even further without sacrificing readability:
integers = range(1, 9)
squared = (i * i for i in integers)
negated = (-i for i in squared)
Note that each processing step has been replaced with a generator expression built on the output of the previous step, which is equivalent to the generator chain described above.
>>> negated
<generator object <genexpr> at 0x1098bcb48>
>>> list(negated)
[-1, -4, -9, -16, -25, -36, -49, -64]
The only disadvantage of using generator expressions is that they cannot be configured with function arguments, and the same generator expression cannot be reused multiple times within the same pipeline (once exhausted, it stays empty).
However, you can freely mix and match generator expressions and regular generators when building these pipelines, which can help improve the readability of complex pipelines.
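A sketch of such a mix (the multiplied function and its factor parameter are illustrative names, not from the text): a regular generator function handles the step that needs configuration, while generator expressions cover the simple steps.

```python
def multiplied(seq, factor):
    # A regular generator function can take configuration arguments,
    # which a bare generator expression cannot.
    for i in seq:
        yield i * factor

integers = range(1, 5)
squared = (i * i for i in integers)   # simple step: generator expression
pipeline = multiplied(squared, factor=10)  # configurable step: generator function
print(list(pipeline))  # → [10, 40, 90, 160]
```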
Key Takeaways
- Generators can be chained together to form efficient and maintainable data processing pipelines.
- Chained generators process each element passing through the chain one at a time.
- Generator expressions can be used to write concise pipeline definitions, but may reduce code readability.