Python CSV reader’s behavior with None and empty strings

In this article, we’ll explain how the Python CSV reader handles None and empty strings. CSV is a commonly used file format for storing tabular data. Python provides the csv module for processing CSV files, and the CSV reader is a common tool for conveniently reading and parsing CSV files.

CSV File Format and Reading Examples

A CSV file is a plain text file that typically uses commas as field delimiters. Each line represents a record, and each field represents a data item. However, the field delimiters in CSV files can be other characters, such as semicolons or tabs.
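Other delimiters are selected with the reader's delimiter parameter. The following is a minimal sketch; the sample strings are made up for illustration, and io.StringIO stands in for an open file so that nothing needs to exist on disk.

```python
import csv
import io

# Hypothetical sample data: one header row and one record, separated by semicolons.
semicolon_data = "Name;Age;Grades\nZhang San;18;90\n"

# io.StringIO behaves like an open text file here.
semicolon_rows = list(csv.reader(io.StringIO(semicolon_data), delimiter=";"))
print(semicolon_rows)

# Tab-separated data is read the same way with delimiter="\t".
tab_data = "Name\tAge\tGrades\nZhang San\t18\t90\n"
tab_rows = list(csv.reader(io.StringIO(tab_data), delimiter="\t"))
print(tab_rows)
```

Both calls produce the same rows, because only the separator character differs between the two inputs.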

The following is a sample CSV file containing three fields: name, age, and grades. Each line represents information for a single student.

Name, Age, Grades
Zhang San, 18, 90
Li Si, 20,
Wang Wu,,

We can use Python’s csv module to read and parse this CSV file.

import csv

with open('students.csv', newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)

Running the above code, we get the following output. Note that the spaces after the commas in the file are kept as part of the fields:

['Name', ' Age', ' Grades']
['Zhang San', ' 18', ' 90']
['Li Si', ' 20', '']
['Wang Wu', '', '']

The CSV reader is an iterator that yields one list per row, so a whole file can be collected into a two-dimensional list. Each element of the outer list represents a row, and each item within a row is one field, returned as a string by default. During the reading process, the CSV reader splits each row into fields according to the field delimiter.
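To make the two-dimensional structure concrete, here is a small sketch that separates the header from the data rows and indexes into the result. The sample data is inlined (written without spaces after the commas) and the variable names are chosen for illustration only.

```python
import csv
import io

# Inlined sample data so the sketch runs without a file on disk.
sample = "Name,Age,Grades\nZhang San,18,90\nLi Si,20,\nWang Wu,,\n"

reader = csv.reader(io.StringIO(sample))
header = next(reader)  # the first row holds the column names
rows = list(reader)    # the remaining rows, as a list of lists of strings

print(header)      # ['Name', 'Age', 'Grades']
print(rows[0][0])  # first field of the first data row: Zhang San
print(len(rows))   # number of data rows: 3
```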

Handling None

In a CSV file, if a field is empty, that position is represented as an empty string. However, in Python, we typically use None to represent an empty value. So, how does the CSV reader handle empty strings?

By default, the CSV reader treats an empty field as an ordinary field value: it is returned as the empty string '' and is never converted to None. This means that in the above example, the empty fields in the result are all empty strings rather than None.
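The difference matters when testing for missing values. The sketch below reads a single made-up line and checks its empty field in a few ways.

```python
import csv
import io

# One record whose last field is empty.
line = "Li Si,20,\n"
row = next(csv.reader(io.StringIO(line)))
grade = row[2]

print(repr(grade))    # ''
print(grade is None)  # False: the reader never produces None here
print(grade == "")    # True
print(not grade)      # True: an empty string is falsy
```

A check such as `if field is None` will therefore never fire on rows that come straight from csv.reader; `if field == ''` or `if not field` is the test that matches.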

The csv module has no csv.QUOTE_NULL constant, so passing quoting=csv.QUOTE_NULL raises an AttributeError. The portable way to get None is to convert empty strings ourselves after each row is read. The following example demonstrates how to do this.

import csv

with open('students.csv', newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        # Replace each empty field with None; leave other fields unchanged
        row = [field if field != '' else None for field in row]
        print(row)

Running the above code, we get the following output:

['Name', ' Age', ' Grades']
['Zhang San', ' 18', ' 90']
['Li Si', ' 20', None]
['Wang Wu', None, None]

The list comprehension replaces every empty string with None and leaves all other fields untouched, which allows us to use None to represent null values. Python 3.12 added the constant csv.QUOTE_NOTNULL, which the documentation describes as making the reader interpret an empty unquoted field as None. Reader support for it appears to have arrived only in Python 3.13, so check your Python version before relying on it.
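Once missing values are represented by None, numeric columns can be converted in the same pass. The following is a minimal sketch under stated assumptions: to_optional_int is a hypothetical helper written for this example (it is not part of the csv module), and the sample data is inlined.

```python
import csv
import io
from typing import Optional


def to_optional_int(text: str) -> Optional[int]:
    """Hypothetical helper: None for a blank field, otherwise the field as an int."""
    text = text.strip()
    return int(text) if text else None


# Inlined sample data so the sketch runs without a file on disk.
sample = "Name,Age,Grades\nZhang San,18,90\nLi Si,20,\nWang Wu,,\n"

reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the header row

students = [
    (name, to_optional_int(age), to_optional_int(grade))
    for name, age, grade in reader
]
for student in students:
    print(student)
```

The tuple unpacking assumes every row has exactly three fields; a row with a different number of fields would raise a ValueError.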

Empty String Handling

Besides empty fields, two related cases deserve attention: a field that is missing altogether, and a field that contains only whitespace. How does the CSV reader handle these situations?

First, if a field does not exist (for example, if one row has fewer fields than another), the CSV reader does not pad the row. It simply returns a shorter list, so the sublists of the two-dimensional list can have different lengths. csv.DictReader behaves differently: it fills missing values with its restval argument, which defaults to None.
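How short rows come back is easy to check. The sketch below uses made-up ragged data, prints the length of each row as returned by csv.reader, pads the rows explicitly, and then reads the same data with csv.DictReader for comparison.

```python
import csv
import io

# Made-up data in which the last two records are missing trailing fields.
ragged = "Name,Age,Grades\nZhang San,18,90\nLi Si,20\nWang Wu\n"

rows = list(csv.reader(io.StringIO(ragged)))
lengths = [len(r) for r in rows]
print(lengths)  # [3, 3, 2, 1]

# Pad short rows with None so that every row is as wide as the header.
width = len(rows[0])
padded = [r + [None] * (width - len(r)) for r in rows]
print(padded[3])  # ['Wang Wu', None, None]

# DictReader fills the missing keys with restval, which defaults to None.
dict_rows = list(csv.DictReader(io.StringIO(ragged)))
print(dict_rows[1])  # {'Name': 'Li Si', 'Age': '20', 'Grades': None}
```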

Second, if a field contains only whitespace (such as a space or tab), the CSV reader returns that whitespace unchanged as a non-empty string. Such a field is therefore not equal to the empty string ''.
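If whitespace-only fields should count as missing, one option is to strip each field before testing it. This is a sketch rather than a built-in feature: blank_to_none is a hypothetical helper name, and the input line is made up.

```python
import csv
import io

# The second field is a single space and the third is a single tab.
data = "Li Si, ,\t\n"
row = next(csv.reader(io.StringIO(data)))
print(row)  # ['Li Si', ' ', '\t']


def blank_to_none(field):
    """Hypothetical helper: strip whitespace and map a blank result to None."""
    stripped = field.strip()
    return stripped if stripped else None


cleaned = [blank_to_none(field) for field in row]
print(cleaned)  # ['Li Si', None, None]
```

Note that this also strips leading and trailing whitespace from non-blank fields, which may or may not be desirable for a given data set.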

The following example uses the skipinitialspace option, which affects fields that begin with spaces.

import csv

with open('students.csv', newline='') as file:
    reader = csv.reader(file, skipinitialspace=True)
    for row in reader:
        print(row)

Running the above code, we get the following output:

['Name', 'Age', 'Grades']
['Zhang San', '18', '90']
['Li Si', '20', '']
['Wang Wu', '', '']

By setting skipinitialspace=True, we instruct the CSV reader to ignore spaces immediately following a delimiter. As a result, ' 18' is read as '18', and a field consisting only of spaces is read as an empty string. Trailing spaces and other whitespace characters such as tabs are not removed.
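The effect of the option is easiest to see by parsing the same line twice. The sketch below uses a made-up line that has spaces both before and after a value, plus a field made only of spaces.

```python
import csv
import io

# Leading and trailing spaces around 20, then a field made only of spaces.
line = "Li Si, 20 ,   ,\n"

default_row = next(csv.reader(io.StringIO(line)))
skipped_row = next(csv.reader(io.StringIO(line), skipinitialspace=True))

print(default_row)  # ['Li Si', ' 20 ', '   ', '']
print(skipped_row)  # ['Li Si', '20 ', '', '']
```

The trailing space in '20 ' survives in both cases, so str.strip() is still needed when fields must be fully trimmed.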

Summary

In this article, we introduced the Python CSV reader’s behavior when handling None and empty strings. By default, the CSV reader returns empty fields as empty strings rather than None, and the csv module has no QUOTE_NULL option; to get None, convert empty strings after reading each row. Furthermore, the CSV reader does not pad rows that have fewer fields, and it returns whitespace-only fields unchanged unless skipinitialspace=True removes the spaces that follow a delimiter. By understanding these behaviors, we can better understand and process CSV file data.
