Reading Tabular Data with Python
Introduction
Reading tabular data is a very common operation in data analysis and processing tasks. Python can read tabular data with both its standard library and third-party packages. This article shows how to read tabular data in several formats with Python and provides code examples for each. It covers the following topics:
- Using the pandas library to read and process tabular data in CSV format.
- Using the xlrd library to read and process tabular data in legacy Excel (.xls) format.
- Using the openpyxl library to create and process Excel files.
- Using the sqlite3 module from the standard library to read and process tabular data in SQLite databases.
Using pandas to Read and Process Tabular Data in CSV Format
CSV (Comma-Separated Values) is a widely used format for storing tabular data: each row represents a record, and the fields within a row are separated by commas or another delimiter. The pandas library is a powerful data analysis tool that provides convenient functions for reading and processing CSV data.
Installing the pandas Library
To use the pandas library, you first need to install it. You can install it with pip using the following command:
pip install pandas
Reading CSV Files
Reading CSV files using the pandas library is straightforward. First, import the pandas library and then use the read_csv() function to read a CSV file. Here’s some sample code for reading a CSV file:
import pandas as pd
# Read a CSV file into a DataFrame
data = pd.read_csv('data.csv')
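This assumes the CSV file is named data.csv and is located in the current working directory (often, but not always, the directory containing the script), because relative paths are resolved against the working directory. The read_csv() function reads a CSV file into a DataFrame object. You can verify that the data was read correctly by inspecting the contents of data.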
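read_csv() can also handle files that deviate from the defaults. The sketch below substitutes an in-memory io.StringIO buffer for data.csv so it runs without a file on disk; the semicolon delimiter and the name/score columns are hypothetical. The sep parameter sets the field delimiter, and the shape and dtypes attributes give a quick check that the data was parsed as expected.

```python
import io

import pandas as pd

# Hypothetical semicolon-delimited data standing in for a file on disk
raw = io.StringIO("name;score\nalice;12\nbob;7\ncarol;15\n")

# sep sets the field delimiter; by default the first row becomes the header
data = pd.read_csv(raw, sep=";")

# Quick checks that the data was parsed as expected
print(data.shape)   # (number of rows, number of columns)
print(data.dtypes)  # type inferred for each column
```

read_csv() accepts a file path or any file-like object, which is why the buffer can stand in for data.csv here.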
Processing CSV Data
Once the CSV data has been read into a DataFrame object, you can process and analyze it in many ways. Some common operations are listed below; in the examples, 'column' stands for the name of a column in your data.
- Viewing a data overview: use the head() method to view the first few rows (five by default).
# View the first five rows of data
print(data.head())
- Descriptive statistics: use the describe() method to compute summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for the numeric columns.
# Generate summary statistics for the numeric columns
print(data.describe())
- Filtering data: use boolean indexing to select the rows that meet a condition, such as rows where a column's value exceeds a threshold.
# Keep only the rows where 'column' is greater than 10
filtered_data = data[data['column'] > 10]
- Sorting data: use the sort_values() method to sort rows by one or more columns.
# Sort data by a column in ascending order
sorted_data = data.sort_values('column')
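- Data aggregation: use the groupby() method to group rows and aggregate each group.
# Group data by a column and calculate the mean of the numeric columns in each group
# (numeric_only=True skips text columns, which would otherwise raise an error in pandas 2.x)
grouped_data = data.groupby('column').mean(numeric_only=True)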
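The operations above are often chained. The following sketch filters, sorts, and aggregates a small DataFrame; it builds the data inline instead of reading a file, and the category and value columns are hypothetical stand-ins for 'column'.

```python
import pandas as pd

# Hypothetical data standing in for a DataFrame returned by read_csv()
data = pd.DataFrame({
    "category": ["a", "b", "a", "b", "c"],
    "value": [5, 20, 15, 30, 12],
})

# Filter: keep only the rows whose value exceeds 10
filtered = data[data["value"] > 10]

# Sort: largest values first
sorted_data = filtered.sort_values("value", ascending=False)

# Aggregate: mean value per category, computed over the filtered rows
grouped = filtered.groupby("category")["value"].mean()

print(sorted_data)
print(grouped)
```

Selecting the value column before calling mean() restricts the aggregation to that column, which avoids the non-numeric-column issue altogether.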
These are only basic examples; pandas provides many more functions for processing and analyzing tabular data. For details, see the official pandas documentation.
Using xlrd to Read and Process Excel Tabular Data
Excel is another common format for storing tabular data; a single Excel file (a workbook) can contain multiple worksheets. xlrd is a library for reading and parsing Excel files. Since version 2.0 it supports only the older .xls format, so it is suited to legacy Excel files.
Installing the xlrd Library
To use the xlrd library, you first need to install it. You can install it with pip using the following command:
pip install xlrd
Reading Excel Files
Reading Excel files with the xlrd library involves the following steps:
1. Import the xlrd library
2. Open the Excel file, which returns a workbook object
3. Select a worksheet
4. Read the cell data
The following is sample code for reading an Excel file:
import xlrd
# Open an Excel file (xlrd 2.0 and later read .xls files only)
workbook = xlrd.open_workbook('data.xls')
# Select the first worksheet (indices start at 0)
worksheet = workbook.sheet_by_index(0)
# Read table data, starting at row 1 to skip the header row
data = []
for row in range(1, worksheet.nrows):
    row_data = []
    for col in range(worksheet.ncols):
        cell_value = worksheet.cell_value(row, col)
        row_data.append(cell_value)
    data.append(row_data)
Here, we assume that the Excel file is named data.xls and is located in the current working directory. The open_workbook() function opens the Excel file and returns a workbook (Book) object. The sheet_by_index() method selects a worksheet by its zero-based position. The nrows and ncols attributes give the number of rows and columns in the worksheet, and the cell_value() method returns the value of the cell at a given row and column.
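cell_value() returns raw values: xlrd reports every number as a float and stores dates as float serial numbers. The sketch below defines a hypothetical helper, convert_cell (not part of xlrd), that inspects a cell's ctype and uses the workbook's datemode with xlrd.xldate_as_datetime() to return int, bool, or datetime values where appropriate. read_converted is likewise a hypothetical wrapper that applies it to the first worksheet of a file such as data.xls.

```python
import xlrd


def convert_cell(cell, datemode):
    """Hypothetical helper: map an xlrd Cell to a more natural Python value."""
    if cell.ctype == xlrd.XL_CELL_DATE:
        # Dates are stored as float serial numbers; datemode selects the 1900 or 1904 epoch
        return xlrd.xldate_as_datetime(cell.value, datemode)
    if cell.ctype == xlrd.XL_CELL_BOOLEAN:
        # Boolean cells come back as 0 or 1
        return bool(cell.value)
    if cell.ctype == xlrd.XL_CELL_NUMBER and cell.value.is_integer():
        # xlrd returns every number as a float; turn whole numbers back into int
        return int(cell.value)
    return cell.value


def read_converted(path):
    """Hypothetical wrapper: read the first worksheet, skipping the header row."""
    workbook = xlrd.open_workbook(path)
    worksheet = workbook.sheet_by_index(0)
    return [
        [convert_cell(worksheet.cell(row, col), workbook.datemode)
         for col in range(worksheet.ncols)]
        for row in range(1, worksheet.nrows)
    ]
```

worksheet.cell() returns a Cell object carrying both ctype and value, which is what makes the type-aware conversion possible; cell_value() returns only the value.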
Processing Excel Data
After reading Excel data, you can process and analyze it further as needed. However, xlrd only reads files; it offers no tools for filtering, sorting, or aggregating data. For more complex processing, we recommend loading the data into pandas, either from the lists read with xlrd or directly with pandas.read_excel(), which uses xlrd to read .xls files.
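A minimal sketch of both routes into pandas follows. The header and rows are hypothetical values shaped like the lists the xlrd loop builds; load_xls is a hypothetical wrapper around pandas.read_excel() and assumes xlrd is installed, since pandas relies on it for .xls files.

```python
import pandas as pd

# Hypothetical header and rows, shaped like the lists built with xlrd
header = ["name", "score"]
rows = [["alice", 12.0], ["bob", 7.0], ["carol", 15.0]]

# Route 1: wrap the plain lists in a DataFrame to get pandas' processing tools
df = pd.DataFrame(rows, columns=header)
print(df.sort_values("score", ascending=False))


def load_xls(path):
    """Route 2 (hypothetical wrapper): let pandas read the first sheet directly."""
    return pd.read_excel(path, sheet_name=0)
```

Route 2 also uses the first row as column names automatically, so the header-skipping loop is not needed.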
Creating and Processing Excel Files with openpyxl
Besides reading Excel files, you sometimes need to create and modify them from Python. openpyxl is a library for reading and writing Excel files in the newer .xlsx format.
Installing the openpyxl Library
To use the openpyxl library, you first need to install it. You can install it with pip using the following command:
pip install openpyxl
Creating an Excel File
Creating an Excel file using the openpyxl library requires the following steps:
1. Import the openpyxl library
2. Create a workbook object
3. Select (or create) a worksheet
4. Write table data
5. Save the Excel file
The following is sample code for creating an Excel file:
import openpyxl
# Create a workbook object
workbook = openpyxl.Workbook()
# Get the default worksheet that every new workbook contains
worksheet = workbook.active
# Write table data (openpyxl row and column numbers start at 1)
data = [['A1', 'B1', 'C1'],
        ['A2', 'B2', 'C2'],
        ['A3', 'B3', 'C3']]
for i, row_data in enumerate(data, start=1):
    for j, cell_value in enumerate(row_data, start=1):
        worksheet.cell(row=i, column=j, value=cell_value)
# Save the Excel file
workbook.save('output.xlsx')
openpyxl.Workbook() creates a new workbook object. Its active property returns the active worksheet, which in a new workbook is the single default sheet. The create_sheet() method adds further worksheets. The cell() method accesses a cell by its 1-based row and column numbers and, when a value is given, writes that value to the cell. Finally, the save() method writes the workbook to an .xlsx file.
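The sketch below goes a step further: it adds a second worksheet with create_sheet(), writes whole rows with append(), and reads the data back with load_workbook() and iter_rows(). The sheet titles, column names, and values are hypothetical, and the workbook is saved to an in-memory io.BytesIO buffer instead of a file so the example leaves nothing on disk.

```python
import io

import openpyxl

# Build a workbook with two worksheets (titles and data are hypothetical)
workbook = openpyxl.Workbook()
sales = workbook.active
sales.title = "Sales"
sales.append(["product", "units"])  # append() writes a row below the last used row
sales.append(["widget", 3])
sales.append(["gadget", 5])

summary = workbook.create_sheet(title="Summary")
summary["A1"] = "total units"
# Compute the total in Python; values_only=True yields tuples of plain values
summary["B1"] = sum(units for _, units in sales.iter_rows(min_row=2, values_only=True))

# Save to an in-memory buffer instead of a file path, then load it back
buffer = io.BytesIO()
workbook.save(buffer)
buffer.seek(0)
loaded = openpyxl.load_workbook(buffer)

rows = list(loaded["Sales"].iter_rows(min_row=2, values_only=True))
print(loaded.sheetnames)  # ['Sales', 'Summary']
print(rows)               # [('widget', 3), ('gadget', 5)]
```

The total is computed in Python rather than written as an Excel formula because openpyxl stores formulas as text and does not evaluate them.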