Using Tuples and Named Tuples
Python tuples are immutable objects, which makes them well suited to functional programming. Tuples have few methods, so most manipulation is done with functions in prefix syntax. Tuples appear widely, mainly as lists of tuples, nested tuples, and generators of tuples.
The named tuple class adds a key feature to tuples: attributes can be referenced by name rather than by number. Named tuples can be used to create objects that accumulate data, allowing for pure functions based on stateless objects while maintaining the organized structure of data as objects.
Subsequently, we will primarily use tuples and named tuples to manipulate collections. For single or double values, named function parameters can be used; when manipulating collections, iterable tuples or iterable named tuples are used.
Whether to use a tuple or a named tuple depends on which data structure is more convenient in a given scenario. For example, a color made up of red, green, and blue components can be abstracted into a triple of the form (number, number, number), but this definition says nothing about which position holds red, green, or blue. There are several ways to make the structure of the tuple clearer.
The following functions can be used to extract components from the triple, making its structure explicit.
red = lambda color: color[0]
green = lambda color: color[1]
blue = lambda color: color[2]
For a tuple item, you can then use red(item) to extract the red component. A more formal definition adds type annotations:
from typing import Tuple, Callable
RGB = Tuple[int, int, int]
red: Callable[[RGB], int] = lambda color: color[0]
This defines a new type, RGB
, that represents a triple. The type signature of red
is Callable[[RGB], int]
, indicating that it is a function that takes an input parameter of type RGB
and returns an integer value.
Or use the old-style named tuple definition:
from collections import namedtuple
Color = namedtuple("Color", ("red", "green", "blue", "name"))
A better approach is to use the NamedTuple
class from the typing
module:
from typing import NamedTuple
class Color(NamedTuple):
    """An RGB color."""
    red: int
    green: int
    blue: int
    name: str
The Color class defines a name and a type for each position in the tuple. While keeping the performance advantages and immutability of a tuple, it also allows mypy to verify that values of the type are used appropriately.
When extracting red using the above two methods, you can use item.red
instead of red(item)
. Both are more readable than item[0]
.
The use of tuples in functional programming centers on the iterable-of-tuples design pattern, sketched below.
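As a small sketch of that pattern, a sequence of Color tuples can be processed with comprehensions and pure functions; the Color class is repeated here so the sketch is self-contained, and the palette values are hypothetical.
from typing import NamedTuple
class Color(NamedTuple):
    """An RGB color."""
    red: int
    green: int
    blue: int
    name: str
# Hypothetical sample data: an iterable of named tuples.
palette = [
    Color(255, 0, 0, "bright red"),
    Color(0, 128, 0, "medium green"),
    Color(0, 0, 255, "pure blue"),
]
# Attribute access by name reads better than positional indexing.
print(palette[0].red)  # 255
names = [c.name for c in palette if c.red > c.blue]
print(names)  # ['bright red']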
Using Generator Expressions
Some examples of generator expressions were given above, and more generator expressions will be given later. This section will introduce some advanced techniques about generators.
Generator expressions are often used to build list or dictionary literals through list comprehensions or dictionary comprehensions. Take [x**2 for x in range(10)]
as an example: it is a list comprehension, also called a list display. A list comprehension is one use of a generator expression; a collection (list or dictionary) comprehension wraps the expression in enclosing literal syntax. Here, the generator expression x**2 for x in range(10)
is wrapped in the list literal syntax []
, indicating a list comprehension that returns a list object built from the enclosed generator expression. This section focuses on generator expressions themselves, not on the resulting list objects.
Collection objects and generator expressions are both iterable, so many behaviors are similar, but as we will see later, they are not equivalent. The disadvantage of a comprehension is that it builds a (potentially large) collection object, while a generator expression is lazy and creates values only on demand, which can improve performance.
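A minimal illustration of the difference (the numbers here are arbitrary):
squares_list = [x**2 for x in range(10)]   # eager: the whole list is built in memory at once
squares_gen = (x**2 for x in range(10))    # lazy: values are produced only when requested
print(sum(squares_list))  # 285
print(sum(squares_gen))   # 285, computed one value at a time as sum() consumes the generator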
Two points about generator expressions are important:
- Except when used by functions such as len(), which require the entire collection, generators behave much like sequences.
- A generator can only be used once; once consumed, it is considered empty.
Here is a generator function used in subsequent examples:
import math
from typing import Iterator

def pfactorsl(x: int) -> Iterator[int]:
    """Yield the prime factors of x, using a loop over the odd candidates."""
    if x % 2 == 0:
        yield 2
        if x // 2 > 1:
            yield from pfactorsl(x // 2)
        return
    for i in range(3, int(math.sqrt(x) + 0.5) + 1, 2):
        if x % i == 0:
            yield i
            if x // i > 1:
                yield from pfactorsl(x // i)
            return
    yield x
This function computes all prime factors of the input value. If the input value x is even, it yields 2 and then recursively yields all prime factors of x/2.
For odd numbers, starting with 3, each odd candidate is tested as a factor of the input. If it is a factor, the value i is yielded, and the function then recursively generates all prime factors of x/i.
If the input value has no factor in this range, it must be prime, and the function yields the value x itself.
Here, 2 is used as a special value to halve the number of iterations, since all prime numbers other than 2 are odd.
In addition to recursion, the function above uses an explicit for loop; because the loop variable i is not used outside the loop body, its stateful nature causes no problems.
This example shows how to implement tail-recursion optimization by hand: the recursion over odd candidates from 3 to √x is replaced with a loop, avoiding deeply nested recursive calls.
Since the recursive call returns an iterable generator, the yield from statement is used to pass all of its values back to the caller.
Be careful when using the return statement in recursive generator functions. Avoid writing a statement of the following form:
return recursive_iter(args)
This only returns the generator object; it does not evaluate the recursive call to produce its values. Instead, use either of the following two forms:
for result in recursive_iter(args):
    yield result
Or
yield from recursive_iter(args)
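A small demonstration of the difference, using hypothetical countdown generators (the names countdown_wrong and countdown_right are illustrative only):
from typing import Iterator
def countdown_wrong(n: int) -> Iterator[int]:
    """Incorrect: return hands back a generator object, whose values are silently lost."""
    yield n
    if n > 0:
        return countdown_wrong(n - 1)
def countdown_right(n: int) -> Iterator[int]:
    """Correct: yield from re-yields every value produced by the recursive call."""
    yield n
    if n > 0:
        yield from countdown_right(n - 1)
print(list(countdown_wrong(3)))  # [3] -- the recursive values never reach the caller
print(list(countdown_right(3)))  # [3, 2, 1, 0]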
A purely recursive version of the above function is as follows:
def pfactorsr(x: int) -> Iterator[int]:
    def factor_n(x: int, n: int) -> Iterator[int]:
        if n * n > x:
            yield x
            return
        if x % n == 0:
            yield n
            if x // n > 1:
                yield from factor_n(x // n, n)
        else:
            yield from factor_n(x, n + 2)
    if x % 2 == 0:
        yield 2
        if x // 2 > 1:
            yield from pfactorsr(x // 2)
        return
    yield from factor_n(x, 3)
First, the internal recursive function factor_n() tests whether the candidate n lies in the range 3 ≤ n ≤ √x. If n is outside this range, x must be prime. Otherwise, it checks whether n is a factor of x; if so, it yields n and recursively generates all prime factors of x/n. If not, it recursively tests whether n+2 is a factor, so the recursion walks through the candidates n+2, n+2+2, n+2+2+2, and so on. While this is simpler to write than the previous for loop version, it cannot handle factors larger than about 1,000 because of Python's stack size limit.
The outer function handles the edge cases. As with other prime-factoring algorithms, it treats 2 as a special case: for even numbers, it first yields 2 and then recursively generates the prime factors of x/2. The remaining prime factors must be odd numbers of at least 3, so starting from 3, the factor_n() function tests each candidate in turn.
The purely recursive version can handle input values only up to roughly 4,000,000; beyond that, the computation fails because it exceeds Python's limit on nested recursion depth.
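A quick check (assuming both functions above are in scope, for example via the ch02_ex4 module) shows that the two versions agree on small inputs; only the loop-based pfactorsl() is safe for very large ones:
>>> list(pfactorsr(1560))
[2, 2, 2, 3, 5, 13]
>>> list(pfactorsl(1560)) == list(pfactorsr(1560))
True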
Limitations of Generators
Generator expressions and generator functions have limitations, as shown below:
>>> from ch02_ex4 import *
>>> pfactorsl(1560)
<generator object pfactorsl at 0x1007b74b0>
>>> list(pfactorsl(1560))
[2, 2, 2, 3, 5, 13]
>>> len(pfactorsl(1560))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: object of type 'generator' has no len()
The first example shows that generator functions are not strictly evaluated: they are lazy and produce no values until something consumes them. This is not a limitation; this laziness is a key reason to use generator expressions in functional Python.
The second example materializes the generator into a list object with the list() function, which makes it easier to inspect the output and to write unit test cases.
The third example illustrates a limitation of generator functions: you can’t use the len()
function to find its length. Because generators are lazy, the total number of values cannot be known until all values are taken.
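If a count is actually needed, one workaround (at the price of consuming the generator) is to materialize it or to count it in a single pass; a minimal sketch, assuming the pfactorsl() function defined above:
factors = list(pfactorsl(1560))   # materialize the generator into a list
print(len(factors))               # 6
count = sum(1 for _ in pfactorsl(1560))  # or count in one pass without building a list
print(count)                      # 6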
Another limitation of generator functions is that they can only be used once, as shown below:
>>> result = pfactorsl(1560)
>>> sum(result)
27
>>> sum(result)
0
After the generator is evaluated for the first time using the sum()
function, the generator is empty the second time it is used, indicating that the generator can only be used once.
Generators in Python have a lifecycle, which is not particularly ideal from a functional programming perspective.
You can use the itertools.tee()
function to work around the single-use limitation. Here's a simple example:
import itertools
from typing import Iterable, Any
def limits(iterable: Iterable[Any]) -> Any:
    max_tee, min_tee = itertools.tee(iterable, 2)
    return max(max_tee), min(min_tee)
This creates two clones of the generator expression passed as an argument, max_tee and min_tee, leaving the original argument untouched. This feature allows for very flexible function composition. By evaluating the two copies separately, we get the maximum and minimum values produced by the generator.
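For example, applied to the prime-factor generator defined earlier, limits() reduces both clones in one call (assuming both definitions are in scope):
>>> limits(pfactorsl(1560))
(13, 2)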
Once the iterable has exhausted all its values, no further values will be generated. When performing multiple reductions on a generator, such as summing, counting, finding maximum or minimum values, the algorithm must be designed to account for its single-use nature.
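One common design is to fold several reductions into a single traversal so the generator is consumed exactly once. The following sketch (the name sum_count is illustrative) computes a sum and a count in one pass:
from functools import reduce
from typing import Iterable, Tuple
def sum_count(iterable: Iterable[int]) -> Tuple[int, int]:
    """Fold a running (sum, count) pair over the iterable in a single pass."""
    return reduce(
        lambda acc, value: (acc[0] + value, acc[1] + 1),
        iterable,
        (0, 0),
    )
print(sum_count(x**2 for x in range(10)))  # (285, 10)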
Combining Generator Expressions
A core characteristic of functional programming is the ability to flexibly combine generator expressions and generator functions to describe complex processing flows. The following describes several common methods for combining generator expressions.
The first method is to combine generators by creating composite functions. Suppose we have a generator (f(x) for x in range())
and need to compute g(f(x))
; we can combine them in the following ways.
The original generator expression can be changed to the following form:
g_f_x = (g(f(x)) for x in range())
While this approach is technically correct, it violates the reuse principle: the existing expression is not reused but rewritten.
You can also embed one expression within another, as shown below:
g_f_x = (g(y) for y in (f(x) for x in range()))
This allows for a convenient variable substitution. Reuse can be achieved as follows:
f_x = (f(x) for x in range())
g_f_x = (g(y) for y in f_x)
The advantage of this implementation is that you don’t need to rewrite the original expression (f(x) for x in range())
; you can simply assign it to a variable.
The combined expression is still a generator expression and is lazy. When a value is taken from g_f_x
, it takes a value from f_x
, which in turn takes a value from the range()
function.
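A concrete sketch of this composition, with hypothetical f and g and a bounded range so the example terminates:
f = lambda x: x * x          # hypothetical inner function
g = lambda y: y + 1          # hypothetical outer function
f_x = (f(x) for x in range(5))
g_f_x = (g(y) for y in f_x)
# Nothing has been computed yet; values appear only as the composite is consumed.
print(list(g_f_x))  # [1, 2, 5, 10, 17]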