Python assigns level values
Python Assigning Rank Values Below, we will solve the ranking problem in two parts. First, we create a general higher-order function to assign a rank value to the x
or y
attribute of a Pair
object. Then, we use this function to wrap the Pair
object, incorporating the rank values of the x
and y
attributes into it, thus avoiding deeply nested structures.
The function that assigns a rank value to each example in the dataset is as follows:
from typing import Callable, Tuple, List, TypeVar, cast, Dict
from typing import Dict, Iterable, Iterator
from collections import defaultdict
D_ = TypeVar("D_")
K_ = TypeVar("K_")
def rank(
data: Iterable[D_],
key: Callable[[D_], K_]=lambda obj: cast(K_, obj)
) -> Iterator[Tuple[float, D_]]:
def build_duplicates(
duplicates: Dict[K_, List[D_]],
data_iter: Iterator[D_],
key: Callable[[D_], K_]
) -> Dict[K_, List[D_]]:
for item in data_iter:
duplicates[key(item)].append(item)
return duplicates
def rank_output(
duplicates: Dict[K_, List[D_]],
key_iter: Iterator[K_],
base: int=0
) -> Iterator[Tuple[float, D_]]:
for k in key_iter:
dups = len(duplicates[k])
for value in duplicates[k]:
yield (base + 1 + base + dups) / 2, value
base += dups
duplicates = build_duplicates(
defaultdict(list), iter(data), key)
return rank_output(duplicates, iter(sorted(duplicates)), 0)
This rank sorting function uses two subfunctions to convert a list of values into a list of two-tuples containing the sequence value and the source data. The first step is to use the build_duplicates()
function to create the dictionary duplicates
. The dictionary’s keys are the values of the source data after being transformed by the key
function, and the corresponding values are a sequence of source data with the same key value. The second step is to call the rank_output()
function to generate a sequence of two-tuples based on duplicates
.
To clarify the relationship between the data, two type variables are defined here. The D_
type variable represents the type of the source data, such as the Leg
type or other complex objects. The K_
type variable represents the type of the rank value used for ranking, which can be different from the source data type. For example, the distance value extracted from the Leg
named tuple is a real number. The conversion from source data to rank value is implemented using the function parameter represented by the key
parameter, which has the type annotation Callable[[D_], K_]
.
build_duplicates()
uses a stateful object to construct a mapping between keys and values. Its implementation is achieved through tail-call optimization of a recursive algorithm. The internal state is first exposed as parameters to the build_duplicates()
function. The recursive implementation uses an empty data_iter
and a value of 0 for base
. These variables are not required for the iterative implementation of the algorithm, but they serve to better illustrate how the recursion works.
Similarly, the rank_output()
function recursively generates two-tuples consisting of source data and rank values. This is optimized here into a double for
loop. To explicitly calculate the rank, we take the average of the left bound (base + 1) and the right bound (base + dups). If an item in duplicates contains only one element, the rank is (2 * base + 2) / 2, making this formula generally applicable.
The type of duplicates is Dict[K_, List[D_]]
, indicating that the key type is K_
and the value type is List[D_]
, which is a list of the source data. This type appears repeatedly in the algorithm, demonstrating that good type definitions can effectively explain the reuse relationships between types.
The following example tests whether it works correctly. The first example sorts a single value, and the second example sorts a list of two-tuples, using an anonymous function to extract the key value from each tuple.
>>> list(rank([0.8, 1.2, 1.2, 2.3, 18]))
[(1.0, 0.8), (2.5, 1.2), (2.5, 1.2), (4.0, 2.3), (5.0, 18)]
>>> data= [(2, 0.8), (3, 1.2), (5, 1.2), (7, 2.3), (11, 18)]
>>> list(rank(data, key=lambda x:x[1]))
[(1.0, (2, 0.8)),
(2.5, (3, 1.2)),
(2.5, (5, 1.2)),
(4.0, (7, 2.3)),
(5.0, (11, 18))]
The sample data contains two identical values. The final calculated rank is 2.5, the middle number between the two orders (2 and 3). This is a common method for calculating the Spearman rank correlation coefficient between two data sets.
The
rank()
function rearranges the input data to find duplicate values. To calculate the rank for each pair ofx
andy
values, the data must be rearranged twice.