Reading Tabular Data with Python

Introduction

Reading tabular data is a very common operation in data analysis and processing tasks. Python provides a variety of methods for reading tabular data, including built-in and third-party libraries. This article will introduce how to use Python to read tabular data in different formats and provide relevant code examples. The following topics are covered:

Using the pandas library to read and process tabular data in CSV format.
Using the xlrd library to read and process tabular data in Excel format.
Using the openpyxl library to create and process Excel files.
Using the sqlite3 library to read and process tabular data in SQLite databases.

Using pandas to Read and Process Tabular Data in CSV Format

CSV (Comma-Separated Values) is a commonly used tabular data storage format in which each row represents a record and fields are separated by commas or other delimiters. The `pandas` library is a powerful data analysis tool that provides convenient functions for reading and processing CSV data.

Installing the pandas Library

To use the pandas library, you first need to install it. You can install it with pip using the following command:

pip install pandas

Reading CSV Files

Reading CSV files using the pandas library is straightforward. First, import the pandas library and then use the read_csv() function to read a CSV file. Here’s some sample code for reading a CSV file:

import pandas as pd

# Read a CSV file
data = pd.read_csv('data.csv')

This assumes the CSV file is named data.csv and is in the same directory as the Python script. The read_csv() function reads a CSV file into a DataFrame object. You can verify that the data was read correctly by inspecting the contents of data.
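
As a self-contained illustration, the sketch below reads CSV text from an in-memory buffer instead of a file on disk, so it runs without data.csv being present. The sample rows and the column names (name, score) are made up for this example. The sep parameter shows one way to handle a delimiter other than a comma.

```python
import io

import pandas as pd

# Hypothetical sample data; semicolons are used instead of commas
# to demonstrate the sep parameter.
csv_text = "name;score\nAlice;12\nBob;7\nCarol;15\n"

# read_csv() accepts a file path or any file-like object.
data = pd.read_csv(io.StringIO(csv_text), sep=";")

print(data.shape)    # (number of rows, number of columns)
print(data.dtypes)   # column types inferred by pandas
print(data.head(2))  # first two rows
```

Checking shape and dtypes right after loading is a quick way to catch a wrong delimiter, which typically shows up as a single wide column.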

Processing CSV Data

Once the CSV data has been read into a DataFrame object, you can perform various processing and analysis on the data. The following are some common operations:

Viewing a Data Overview: Use the head() function to view the first few rows of data, the default being the first five.

# View the first five rows of data
print(data.head())

Descriptive Statistics: Use the describe() function to generate descriptive statistics for the data.

# Generate descriptive statistical analysis results
print(data.describe())

Filtering Data: Use a boolean condition to select the rows that meet it, such as rows where a column's value exceeds a threshold.

# Filtering data that meets a specific condition
filtered_data = data[data['column'] > 10]

Sorting Data: Use the sort_values() function to sort data.

# Sort data by a column in ascending order
sorted_data = data.sort_values('column')

Data Aggregation: Use the groupby() function to group and aggregate data.

# Group data by a column and calculate the mean of each numeric column per group
grouped_data = data.groupby('column').mean(numeric_only=True)
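
The operations above use a placeholder column name. As a rough end-to-end sketch, the following applies filtering, sorting, and aggregation to a small DataFrame built in memory. The column names (city, sales) and the values are hypothetical and stand in for the contents of a real CSV file.

```python
import pandas as pd

# Hypothetical sample data standing in for the contents of a CSV file.
data = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Rome", "Rome", "Lima"],
    "sales": [8, 14, 21, 5, 12],
})

# Filter: keep rows where sales is greater than 10.
filtered_data = data[data["sales"] > 10]

# Sort: order rows by sales, largest first.
sorted_data = data.sort_values("sales", ascending=False)

# Aggregate: mean sales per city.
mean_sales = data.groupby("city")["sales"].mean()

print(filtered_data)
print(sorted_data)
print(mean_sales)
```

Selecting the sales column before calling mean() keeps the aggregation limited to the numeric column of interest.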

The above are just some basic examples of the pandas library. It also provides many more functions for processing and analyzing tabular data. For detailed usage instructions, please refer to the official documentation for pandas.

Using xlrd to Read and Process Excel Tabular Data

Excel is another common tabular data storage format; a single workbook can hold multiple worksheets. xlrd is a library for reading and parsing Excel files in the older .xls format. Since version 2.0, xlrd no longer reads .xlsx files.

Installing the xlrd Library

To use the xlrd library, you must first install it. You can install the xlrd library via pip using the following command:

pip install xlrd

Reading Excel Files

Reading Excel files using the xlrd library requires the following steps:

1. Import the xlrd library

2. Open the Excel file to obtain a workbook object

3. Select a worksheet object

4. Read the worksheet data

The following is sample code for reading an Excel file:

import xlrd

# Open an Excel file
workbook = xlrd.open_workbook('data.xls')

# Obtain the first worksheet
worksheet = workbook.sheet_by_index(0)

# Read table data, starting at row 1 to skip row 0 (assumed to be a header)
data = []
for row in range(1, worksheet.nrows):
    row_data = []
    for col in range(worksheet.ncols):
        cell_value = worksheet.cell_value(row, col)
        row_data.append(cell_value)
    data.append(row_data)

Here, we assume that the Excel file is named data.xls and is in the same directory as the Python script. The open_workbook() function opens the Excel file and returns a workbook object. The sheet_by_index() method selects the worksheet to read. The nrows and ncols attributes give the number of rows and columns in the worksheet, and the cell_value() method returns the value of a single cell.
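
The sample loop starts at row 1, which skips the first row. If that first row holds column names, it can be used to label the values instead. The sketch below is one possible way to do this. The helper name sheet_to_records is hypothetical, and the header-in-row-0 layout is an assumption about the file, not something xlrd enforces. It relies on the worksheet's nrows attribute and row_values() method.

```python
def sheet_to_records(worksheet):
    """Convert a worksheet into a list of dicts keyed by the header row.

    Assumes row 0 holds the column names (an assumption about the
    file layout).
    """
    if worksheet.nrows == 0:
        return []
    # row_values() returns all cell values of one row as a list.
    header = worksheet.row_values(0)
    records = []
    for row in range(1, worksheet.nrows):
        records.append(dict(zip(header, worksheet.row_values(row))))
    return records


# Usage with a real file (requires xlrd and an existing data.xls):
# import xlrd
# workbook = xlrd.open_workbook("data.xls")
# records = sheet_to_records(workbook.sheet_by_index(0))
```

Each record is a dictionary, so a value can be looked up by column name rather than by position.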

Processing Excel Data

After reading Excel data, you can perform further processing and analysis as needed. Although the xlrd library provides some functionality for manipulating tabular data, it is relatively limited and not as convenient and powerful as the pandas library. If you need to process more complex Excel files, we recommend using the pandas library.
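
A minimal sketch of the pandas route follows, using the read_excel() function. To stay self-contained, it first writes a small hypothetical table to an in-memory .xlsx file and then reads it back; with a real file you would pass the file path instead. For .xlsx files pandas relies on openpyxl being installed, and for .xls files it relies on xlrd.

```python
import io

import pandas as pd

# Hypothetical sample data written to an in-memory .xlsx file so the
# example does not depend on a file on disk.
source = pd.DataFrame({"item": ["pen", "ink"], "qty": [3, 10]})
buffer = io.BytesIO()
source.to_excel(buffer, index=False, engine="openpyxl")
buffer.seek(0)

# read_excel() loads one worksheet into a DataFrame;
# sheet_name=0 selects the first worksheet.
data = pd.read_excel(buffer, sheet_name=0, engine="openpyxl")
print(data)
```

Once the worksheet is a DataFrame, the filtering, sorting, and grouping operations shown for CSV data apply unchanged.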

Creating and Processing Excel Files with OpenPyXL

In addition to reading Excel files, sometimes we need to create and process Excel files using Python. OpenPyXL is a library for reading and writing Excel files, supporting newer Excel versions (.xlsx format).

Installing the OpenPyXL Library

To use the OpenPyXL library, you first need to install it. You can install the openpyxl library via pip using the following command:

pip install openpyxl

Creating an Excel File

Creating an Excel file using the openpyxl library requires the following steps:

1. Import the openpyxl library

2. Create a workbook object

3. Get the active worksheet object

4. Write table data

5. Save the Excel file

The following is sample code for creating an Excel file:

import openpyxl

# Create a workbook object
workbook = openpyxl.Workbook()

# Get the default (active) worksheet
worksheet = workbook.active

# Write table data
data = [['A1', 'B1', 'C1'],
        ['A2', 'B2', 'C2'],
        ['A3', 'B3', 'C3']]
for i, row_data in enumerate(data, start=1):
    for j, cell_value in enumerate(row_data, start=1):
        worksheet.cell(row=i, column=j, value=cell_value)

# Save the Excel file
workbook.save('output.xlsx')

Use the openpyxl.Workbook() function to create a new workbook object. The active property returns the default worksheet. You can use the create_sheet() method to add more worksheets to the workbook. Use the cell() method to access cells and write values. Finally, use the save() method to save the Excel file.
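
To illustrate create_sheet() and reading a file back, here is a short round-trip sketch. The sheet title "Summary" and the row values are made up for the example. The workbook is saved to an in-memory buffer rather than to disk; with a real file you would pass a file name to save() and load_workbook().

```python
import io

import openpyxl

workbook = openpyxl.Workbook()

# create_sheet() adds a worksheet to the same workbook.
summary = workbook.create_sheet(title="Summary")

# append() writes one row after the last used row.
summary.append(["name", "total"])
summary.append(["A", 10])
summary.append(["B", 20])

# save() accepts a file name or a file-like object; an in-memory
# buffer is used here so no file is written to disk.
buffer = io.BytesIO()
workbook.save(buffer)
buffer.seek(0)

# Load the workbook back and read the rows as tuples of values.
loaded = openpyxl.load_workbook(buffer)
rows = list(loaded["Summary"].iter_rows(values_only=True))

print(loaded.sheetnames)  # ['Sheet', 'Summary']
print(rows)               # [('name', 'total'), ('A', 10), ('B', 20)]
```

Using append() is often shorter than nested cell() calls when the data is written row by row.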
