Python records, structures, and pure data objects

Compared to arrays, record data structures have a fixed number of fields, each with a name and potentially a different type.

This section introduces records, structures, and “plain data objects” in Python, but only covers built-in data types and classes in the standard library.

By the way, the definition of “record” here is very broad. For example, this section also covers types like Python’s built-in tuple. Since the fields in a tuple don’t have names, it’s not generally considered a record in the strict sense.

Python provides several data types that can be used to implement records, structures, and data transfer objects. This section will quickly introduce each implementation and its characteristics, and conclude with a summary and decision-making guide to help you make your own choice.

Okay, let’s get started!

Python Records, Structures, and Pure Data Objects: Dictionaries—Simple Data Objects

Python dictionaries can store any number of objects, each identified by a unique key. Dictionaries, also often called mappings or associative arrays, provide efficient access to objects associated with a given key.

Python dictionaries can also be used as record data types or data objects. Creating them is easy, thanks to the concise dictionary literal syntax built into the language.

Data objects created with dictionaries are mutable, and since fields can be added and removed at will, there is little protection against misspelled field names. Together, these properties can introduce surprising bugs; there is always a trade-off between convenience and error resistance.

car1 = {
    'color': 'red',
    'mileage': 3812.4,
    'automatic': True,
}
car2 = {
    'color': 'blue',
    'mileage': 40231,
    'automatic': False,
}

# Dictionaries have a nice __repr__ method:
>>> car2
{'color': 'blue', 'mileage': 40231, 'automatic': False}

# Get mileage:
>>> car2['mileage']
40231

# Dictionaries are mutable:
>>> car2['mileage'] = 12
>>> car2['windshield'] = 'broken'
>>> car2
{'color': 'blue', 'mileage': 12,
 'automatic': False, 'windshield': 'broken'}

# No protection against incorrect, missing, or extra field names:
car3 = {
    'colr': 'green',
    'automatic': False,
    'windshield': 'broken',
}
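
To make the risk concrete, here is a small sketch of how a misspelled key goes unnoticed until the field is read. The REQUIRED_FIELDS set and the missing_fields helper are hypothetical names introduced only for this illustration; they are not part of any library.

```python
# Hypothetical helper: report which expected fields a dict-based record lacks.
REQUIRED_FIELDS = {'color', 'mileage', 'automatic'}


def missing_fields(record):
    """Return the set of required field names absent from the record."""
    return REQUIRED_FIELDS - set(record)


car3 = {
    'colr': 'green',        # typo: should be 'color'
    'automatic': False,
    'windshield': 'broken',
}

# Creating the dict raised no error; the typo only shows up on access.
try:
    car3['color']
except KeyError as exc:
    print('missing key:', exc)

print(sorted(missing_fields(car3)))
```

A check like this has to be written and called by hand, which is exactly the kind of protection the other record types in this section provide for free.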

Python Records, Structs, and Pure Data Objects: Tuples - Immutable Collections of Objects

Python tuples are simple data structures used to group arbitrary objects. Tuples are immutable and cannot be modified after creation.

In terms of performance, tuples in CPython take up slightly less memory than lists and are faster to construct.

As the following disassembled bytecode shows, constructing a tuple constant takes a single LOAD_CONST opcode, while constructing a list with the same contents requires several operations. (The listing comes from an older CPython release; the exact opcodes vary between versions.)

>>> import dis
>>> dis.dis(compile("(23, 'a', 'b', 'c')", '', 'eval'))
0 LOAD_CONST 4 ((23, 'a', 'b', 'c'))
3 RETURN_VALUE

>>> dis.dis(compile("[23, 'a', 'b', 'c']", '', 'eval'))
0 LOAD_CONST 0 (23)
3 LOAD_CONST 1 ('a')
6 LOAD_CONST 2 ('b')
9 LOAD_CONST 3 ('c')
12 BUILD_LIST 4
15 RETURN_VALUE

However, you shouldn’t be overly concerned with these differences. In practice, the performance difference is usually negligible, and trying to squeeze out extra performance by replacing lists with tuples is generally misguided.
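
To get a feel for how small the memory difference is, here is a minimal sketch that measures both containers with sys.getsizeof. It assumes CPython; the exact byte counts vary by version and platform, so none are shown here.

```python
import sys

# Same contents, stored once as a tuple and once as a list.
as_tuple = (23, 'a', 'b', 'c')
as_list = [23, 'a', 'b', 'c']

# sys.getsizeof reports the container's own size in bytes, excluding the
# objects it references. Exact numbers vary by Python version and platform.
tuple_size = sys.getsizeof(as_tuple)
list_size = sys.getsizeof(as_list)

print('tuple:', tuple_size, 'bytes')
print('list: ', list_size, 'bytes')
print('difference:', list_size - tuple_size, 'bytes')
```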

A potential disadvantage of plain tuples is that the data stored in them can only be accessed using integer indices, making it impossible to assign names to individual attributes stored in the tuple, which can hinder code readability.

Furthermore, a tuple is always an ad hoc structure: it is difficult to ensure that two tuples store the same number of fields and the same attributes.

This makes it easy to make inadvertent mistakes, such as getting the fields in the wrong order. Therefore, it’s recommended to store as few fields as possible in tuples.

# Fields: color, mileage, automatic
>>> car1 = ('red', 3812.4, True)
>>> car2 = ('blue', 40231.0, False)

# Tuple instances have a nice __repr__ method:
>>> car1
('red', 3812.4, True)
>>> car2
('blue', 40231.0, False)

# Get mileage:
>>> car2[1]
40231.0

# Tuples are immutable:
>>> car2[1] = 12
TypeError:
"'tuple' object does not support item assignment"

# No error handling for incorrect or extra fields, or for providing fields in the wrong order:
>>> car3 = (3431.5, 'green', True, 'silver')
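
One common way to soften the readability problem is tuple unpacking, sketched below with the same car tuples. This is only an illustration of the idea, not a full fix: unpacking catches a wrong number of fields, but it cannot detect fields given in the wrong order.

```python
# Fields: color, mileage, automatic
car1 = ('red', 3812.4, True)

# Unpacking gives each position a readable name at the point of use.
color, mileage, automatic = car1
print(color, mileage, automatic)

# Unpacking also fails loudly when the number of fields is wrong,
# which catches extra or missing fields (but not swapped ones).
car3 = (3431.5, 'green', True, 'silver')
try:
    color, mileage, automatic = car3
except ValueError as exc:
    print('unpacking failed:', exc)
```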

Python Records, Structs, and Plain Data Objects: Writing Custom Classes – Manual Fine-Grained Control

Classes can be used to define reusable “blueprints” for data objects, ensuring that each object provides the same fields.

Regular Python classes work well as record data types, but they require manually implementing conveniences that other implementations provide out of the box. For example, adding new fields to the __init__ constructor is tedious and time-consuming.

Furthermore, the default string representation of objects instantiated from custom classes is not very useful. Addressing this requires adding your own __repr__ method. This method is often lengthy and must be updated each time a new field is added.

Fields stored on classes are mutable, and new fields can be added at will. Using the @property decorator allows for read-only fields and more access control, but this requires writing more glue code.

Writing a custom class is a good fit when you want to add business logic and behavior to record objects, but it means that these objects are technically no longer plain data objects.

class Car:
    def __init__(self, color, mileage, automatic):
        self.color = color
        self.mileage = mileage
        self.automatic = automatic

>>> car1 = Car('red', 3812.4, True)
>>> car2 = Car('blue', 40231.0, False)

# Get mileage:
>>> car2.mileage
40231.0

# The class is mutable:
>>> car2.mileage = 12
>>> car2.windshield = 'broken'

# The default string representation is not very useful; a __repr__ method must be written by hand:
>>> car1
<__main__.Car object at 0x1081e69e8>
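
The extra work described above might look like the following sketch, which adds a hand-written __repr__ and a read-only mileage field via @property. Storing the value in a private _mileage attribute is one possible convention assumed here, not the only way to do it.

```python
class Car:
    def __init__(self, color, mileage, automatic):
        self.color = color
        self._mileage = mileage      # stored privately, exposed read-only
        self.automatic = automatic

    @property
    def mileage(self):
        """Read-only view of the mileage; no setter is defined."""
        return self._mileage

    def __repr__(self):
        # Must be updated by hand whenever a field is added.
        return (f'{self.__class__.__name__}('
                f'color={self.color!r}, '
                f'mileage={self.mileage!r}, '
                f'automatic={self.automatic!r})')


car1 = Car('red', 3812.4, True)
print(car1)

# Assigning to a property without a setter raises AttributeError.
try:
    car1.mileage = 12
except AttributeError as exc:
    print('cannot assign:', exc)
```

Note how much glue code a three-field record already needs, and that other attributes such as color remain freely writable.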

Python Records, Structures, and Pure Data Objects: collections.namedtuple – Convenient Data Objects

The namedtuple class, available since Python 2.6, extends the built-in tuple data type. Like custom classes, namedtuple lets you define reusable “blueprints” for records that ensure the correct field names are used every time.

Like regular tuples, namedtuples are immutable. This means that after a namedtuple instance is created, new fields cannot be added or existing fields modified.

Beyond that, namedtuples are, well, named tuples: each object stored in one can be accessed through a unique identifier. This frees you from having to remember integer indexes or resort to workarounds like defining integer constants as mnemonics for indexes.

Namedtuple objects are implemented internally as regular Python classes. They use memory more efficiently than regular classes and are just as memory efficient as regular tuples (the exact byte counts below vary by Python version and platform):

>>> from collections import namedtuple
>>> from sys import getsizeof

>>> p1 = namedtuple('Point', 'x y z')(1, 2, 3)
>>> p2 = (1, 2, 3)

>>> getsizeof(p1)
72
>>> getsizeof(p2)
72

Using namedtuples can be an easy way to clean up code and make it more readable, because they enforce a better structure for your data.

I’ve found that switching from ad hoc data types (such as dictionaries with a fixed format) to namedtuples helps express the intent of my code more clearly. Often, when I refactor an application to use namedtuples, I magically come up with a better solution to the problem I’m facing.

Replacing ordinary (unstructured) tuples and dictionaries with namedtuples can also reduce the burden on colleagues, because the data passed using namedtuples becomes somewhat self-explanatory.

>>> from collections import namedtuple
>>> Car = namedtuple('Car', 'color mileage automatic')
>>> car1 = Car('red', 3812.4, True)

# Instance has a nice __repr__ method:
>>> car1
Car(color='red', mileage=3812.4, automatic=True)

# Accessing fields:
>>> car1.mileage
3812.4

# Fields are immutable:
>>> car1.mileage = 12
AttributeError: "can't set attribute"
>>> car1.windshield = 'broken'
AttributeError:
"'Car' object has no attribute 'windshield'"

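Since fields cannot be modified in place, updates are usually made by creating a modified copy. The sketch below uses the helper methods namedtuple provides for that and for inspecting a record; it reuses the Car type from the listing above.

```python
from collections import namedtuple

Car = namedtuple('Car', 'color mileage automatic')
car1 = Car('red', 3812.4, True)

# _replace returns a new instance; the original is left untouched.
car2 = car1._replace(mileage=12)
print(car1)
print(car2)

# _fields lists the field names; _asdict converts to a dictionary.
print(Car._fields)
print(car1._asdict())

# A namedtuple is still a tuple, so indexing and unpacking work too.
color, mileage, automatic = car1
print(car1[1] == car1.mileage)
```

The leading underscore on these methods does not mean they are private; it is there to avoid clashes with user-defined field names.
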
Python Records, Structures, and Plain Data Objects: typing.NamedTuple — An Improved NamedTuple

This class was added in Python 3.6 and is a sister class to the namedtuple class in the collections module. It is very similar to namedtuple, differing primarily in the new syntax for defining record types and support for type annotations (type hints).

Note that type annotations are not enforced at runtime; only standalone type checking tools like mypy check them. Even without tool support, though, type annotations can help other programmers understand your code (or confuse them badly if the annotations fall out of date with the code).

>>> from typing import NamedTuple

class Car(NamedTuple):
    color: str
    mileage: float
    automatic: bool

>>> car1 = Car('red', 3812.4, True)

# Instance has a nice __repr__ method:
>>> car1
Car(color='red', mileage=3812.4, automatic=True)

# Accessing fields:
>>> car1.mileage
3812.4

# Fields are immutable:
>>> car1.mileage = 12
AttributeError: "can't set attribute"
>>> car1.windshield = 'broken'
AttributeError:
"'Car' object has no attribute 'windshield'"

# Only type checkers like mypy will honor type annotations:
>>> Car('red', 'NOT_A_FLOAT', 99)
Car(color='red', mileage='NOT_A_FLOAT', automatic=99)
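
The class-based syntax also makes it straightforward to give fields default values and to attach methods. The sketch below assumes a Python version that supports defaults in NamedTuple classes (3.6.1 or later); the describe method is a hypothetical helper added only for illustration.

```python
from typing import NamedTuple


class Car(NamedTuple):
    color: str
    mileage: float
    automatic: bool = True   # fields with defaults must come last

    def describe(self) -> str:
        # Hypothetical helper method, added only for illustration.
        gearbox = 'automatic' if self.automatic else 'manual'
        return f'{self.color} car, {self.mileage} miles, {gearbox}'


car1 = Car('red', 3812.4)
print(car1)
print(car1.describe())

# _field_defaults maps field names to their default values.
print(Car._field_defaults)
```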

Python Records, Structures, and Pure Data Objects: struct.Struct - Serializing C Structures

The struct.Struct class converts between Python values and C structures, serializing them into Python bytes objects. This can be used, for example, to handle binary data stored in files or coming from a network connection.

Structs are defined using a syntax similar to format strings, allowing them to define and organize various C data types (such as char, int, long, and their unsigned variants).

Serialized structures are not generally used to represent data objects that are manipulated exclusively within Python code; instead, they are primarily used as a data exchange format.

In some cases, packing primitive data types into structures can use less memory than other data types. However, in most cases this is an advanced (and potentially unnecessary) optimization.

>>> from struct import Struct
>>> MyStruct = Struct('i?f')
>>> data = MyStruct.pack(23, False, 42.0)

# This gives us a blob of data in memory (exact bytes depend on the platform's byte order and alignment):
>>> data
b'\x17\x00\x00\x00\x00\x00\x00\x00\x00\x00(B'

# The data can be unpacked again:
>>> MyStruct.unpack(data)
(23, False, 42.0)
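
When the packed bytes are meant for a file or a network connection, the layout usually needs to be the same on every machine. As a sketch of one way to do that, the format string below adds the '<' prefix, which selects little-endian byte order with standard sizes and no padding. The variable names are arbitrary.

```python
from struct import Struct

# '<' selects little-endian byte order with standard sizes and no padding,
# so the layout is the same on every platform.
portable = Struct('<i?f')
native = Struct('i?f')       # native alignment; may insert padding bytes

print(portable.size)         # 4 (int) + 1 (bool) + 4 (float) = 9
print(native.size)           # platform dependent

data = portable.pack(23, False, 42.0)
print(data)
print(portable.unpack(data))
```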

Python Records, Structures, and Pure Data Objects: types.SimpleNamespace - Fancy Attribute Access

Here’s another advanced way to create data objects in Python: types.SimpleNamespace. This class, added in Python 3.3, provides attribute access to its namespace.

That is, a SimpleNamespace instance exposes all its keys as attributes. You can therefore use dot-notation syntax like obj.key instead of the obj['key'] square bracket indexing that regular dictionaries require. All instances also include a nice __repr__ by default.

As its name suggests, SimpleNamespace is simple: essentially, it’s an extended dictionary, with nice attribute access and string printing, and the ability to freely add, modify, and delete attributes.

>>> from types import SimpleNamespace
>>> car1 = SimpleNamespace(color='red',
... mileage=3812.4,
... automatic=True)

# Default __repr__ effect:
>>> car1
namespace(color='red', mileage=3812.4, automatic=True)

# Instances support attribute access and are mutable:
>>> car1.mileage = 12
>>> car1.windshield = 'broken'
>>> del car1.automatic
>>> car1
namespace(color='red', mileage=12, windshield='broken')
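
Because a SimpleNamespace is so close to a dictionary, converting between the two is short. The following sketch shows one way to do it with keyword unpacking and the built-in vars function; the car_dict name is just an example.

```python
from types import SimpleNamespace

car_dict = {'color': 'red', 'mileage': 3812.4, 'automatic': True}

# Unpack the dictionary into keyword arguments to build a namespace.
car = SimpleNamespace(**car_dict)
print(car.color)

# vars() exposes the underlying attribute dictionary.
car.mileage = 12
print(vars(car))

# Namespaces compare equal when their attributes are equal.
print(car == SimpleNamespace(color='red', mileage=12, automatic=True))
```

The namespace holds its own copy of the data, so changing car.mileage does not touch car_dict.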

Key Takeaways

So what type of data object should you use in Python? As you can see, there are many different ways to implement records or data objects in Python, and which one to use often depends on the specific situation.
If you only have two or three fields, and the field order is easy to remember or field names are unnecessary, a plain tuple may be enough. For example, think of an (x, y, z) point in three-dimensional space.
If you need to implement a data object with immutable fields, use a plain tuple, collections.namedtuple, or typing.NamedTuple.
If you want to lock field names to prevent typos, collections.namedtuple and typing.NamedTuple are also recommended.
If you want to keep things simple, use a simple dictionary object, which has a convenient syntax similar to JSON.
If you need to modify the data structure,
If you need to add behavior (methods) to your object, you should write your own class from scratch or by extending collections.namedtuple or typing.NamedTuple.
If you want to strictly package data for serialization to disk or sending over the network, I recommend using struct.Struct.
In general, if you want to implement a plain record, structure, or data object in Python, my recommendation is to use collections.namedtuple, or its sibling typing.NamedTuple in Python 3.
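
As an illustration of adding behavior by extending collections.namedtuple, here is a minimal sketch. The needs_service method and its default interval are hypothetical and exist only to show where such logic would go.

```python
from collections import namedtuple


class Car(namedtuple('Car', 'color mileage automatic')):
    # An empty __slots__ keeps instances as compact as a plain namedtuple
    # and prevents adding arbitrary attributes.
    __slots__ = ()

    def needs_service(self, interval=10000):
        """Hypothetical rule: service is due past `interval` miles."""
        return self.mileage >= interval


car1 = Car('red', 3812.4, True)
car2 = Car('blue', 40231.0, False)
print(car1.needs_service())
print(car2.needs_service())
```

The subclass keeps the immutability and the readable __repr__ of the underlying namedtuple while gaining a place for methods.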
