How to create a pivot table in Python using Pandas?
How to Create a Pivot Table in Python Using Pandas?
Pivot tables are a powerful data analysis tool that allows you to summarize and aggregate data based on different dimensions. In Python, you can create pivot tables using the Pandas library, which provides flexible and efficient tools for data manipulation and analysis.
To create a pivot table in Pandas, you first need a dataset in a Pandas DataFrame. You can load data into a DataFrame from a variety of sources, such as CSV files, Excel spreadsheets, and SQL databases.
Once your data is loaded into a DataFrame, you can use the Pandas pivot_table() function to create a pivot table. Here’s the syntax:
dataframe.pivot(self, index=None, columns=None, values=None, aggfunc)
The pivot_table() function takes several parameters, including the DataFrame you want to use, the index column, the columns you want to use as the pivot table columns, and the values column you want to aggregate. You can also specify the aggregation function to use, such as sum, mean, max, min, and so on.
Before we delve into pivot tables and the pivot_table() function, let’s first create the DataFrame we’ll be using.
DataFrames in Pandas
In Pandas, a DataFrame is a two-dimensional labeled data structure with columns that may be of different types. It is the primary data structure used for data processing and analysis in Pandas.
A DataFrame can be thought of as a spreadsheet or SQL table, with rows and columns. It allows for easy processing and manipulation of data, including indexing, selecting, filtering, merging, and grouping.
Consider the following code. Using a Python dictionary, this code creates a DataFrame object named df with the columns ‘Product’, ‘Category’, ‘Quantity’, and ‘Amount’. Each key of the dictionary corresponds to the name of a column, and its value is a list containing the values for that column.
Example
# Import the Pandas library
import pandas as pd
# Create a DataFrame using a dictionary
df = pd.DataFrame({
'Product': ['Lychee', 'Broccoli', 'Banana', 'Banana', 'Bean', 'Orange', 'Mango', 'Banana'],
'Category': ['Fruit', 'Vegetable', 'Fruit', 'Fruit', 'Vegetable', 'Fruit', 'Fruit'],
'Quantity': [8, 5, 3, 4, 5, 9, 11, 8],
'Amount': [270, 239, 617, 384, 626, 610, 62, 90]
})
# Print DataFrame
print(df)
Output
When this code is executed, it will produce the following output on the terminal −
Product Category Quantity Amount
0 Lychee Fruit 8 270
1 Broccoli Vegetable 5 239
2 Banana Fruit 3 617
3 Banana Fruit 4 384
4 Bean Vegetable 5 626
5 Orange Fruit 9 610
6 Mango Fruit 11 62
7 Banana Fruit 8 90
Creating a Pivot Table with Pandas
Now let’s create a pivot table of sales totals using the pivot_table() function. See the code below.
Example
# Import the pandas library
import pandas as pd
# Create a DataFrame from a dictionary
# Create columns for "Product," "Category," "Quantity," and "Amount" using their values
df = pd.DataFrame({
'Product': ['Lychee', 'Broccoli', 'Banana', 'Banana', 'Beans', 'Orange', 'Mango', 'Banana'],
'Category': ['Fruit', 'Vegetable', 'Fruit', 'Fruit', 'Vegetable', 'Fruit', 'Fruit'],
'Quantity': [8, 5, 3, 4, 5, 9, 11, 8],
'Amount': [270, 239, 617, 384, 626, 610, 62, 90]
})
# Create a pivot table of total sales
# Group by product
pivot = df.pivot_table(index = ['Product'], values = ['Amount'], aggfunc = 'sum')
print(pivot)
# Print the data frame
print(df)
Explanation
- This creates a DataFrame object named df with four columns: ‘Product’, ‘Category’, ‘Quantity’, and ‘Amount’. Each column has its own values and is created using a Python dictionary.
- Then, the code creates a pivot table that groups the sales data by product and calculates the total sales for each product using the pivot_table() function.
- Finally, a pivot table is printed to the console to show the total sales data for each product, and the original DataFrame is printed to the console to show the raw data used to generate the pivot table.
Output
After execution, you will get the following output on the terminal –
Product Amount
Banana 1091
Beans 626
Broccoli 239
Litchi 270
Mango 62
Orange 610
Product Category Quantity Amount
0 Litchi Fruit 8 270
1 Broccoli Vegetable 5 239
2 Banana Fruit 3 617
3 Banana Fruit 4 384
4 Beans Vegetable 5 626
5 Orange Fruit 9 610
6 Mango Fruit 11 62
7 Banana Fruit 8 90
Conclusion
In conclusion, using the pandas library in Python Creating pivot tables in Python is a powerful way to analyze tabular data and extract meaningful insights. By grouping data and calculating aggregate values, pivot tables can help you identify patterns and trends in your data that might otherwise be difficult to spot. With the flexibility and ease of use provided by pandas, creating pivot tables has never been easier.
By following the steps outlined in this tutorial, you should now have a solid foundation for creating and manipulating pivot tables in Python.