Openpyxl Tutorial
This article shows how to use the openpyxl library to work with Excel files in Python. openpyxl is a Python library for reading and writing Excel 2010 xlsx / xlsm / xltx / xltm files. (Python Tutorial)
Excel xlsx
In this tutorial, we use xlsx files. xlsx is the file extension for the open XML spreadsheet file format used by Microsoft Excel. xlsm files support macros. xlsx is a proprietary binary format, while xlsx is based on the Office Open XML Formats.
$ sudo pip3 install openpyxl
We use the pip3
tool to install openpyxl
.
Openpyxl Creates a New File
In this first example, we use openpyxl
to create a new xlsx file.
write_xlsx.py
#!/usr/bin/env python
from openpyxl import Workbook
import time
book = Workbook()
sheet = book.active
sheet['A1'] = 56
sheet['A2'] = 43
now = time.strftime("%x")
sheet['A3'] = now
book.save("sample.xlsx")
In this example, we create a new xlsx file. We write data into three cells.
from openpyxl import Workbook
From the openpyxl
module, we import the Workbook
class. A workbook is a container for all other parts of a document.
book = Workbook()
We create a new workbook. A workbook is always created with at least one worksheet.
sheet = book.active
We get a reference to the active worksheet.
sheet['A1'] = 56
sheet['A2'] = 43
We write numeric data to cells A1 and A2.
now = time.strftime("%x")
sheet['A3'] = now
We write the current date to cell A3.
book.save("sample.xlsx")
We use the save()
method to write the contents to the sample.xlsx
file.
Openpyxl Writing Cells
There are two basic ways to write to cells: using the worksheet’s keys (such as A1 or D3), or using row and column notation with the cell()
method.
write2cell.py
#!/usr/bin/env python
from openpyxl import Workbook
book = Workbook()
sheet = book.active
sheet['A1'] = 1
sheet.cell(row=2, column=2).value = 2
book.save('write2cell.xlsx')
In this example, we write two values to two cells.
sheet['A1'] = 1
Here, we assign a value to cell A1.
sheet.cell(row=2, column=2).value = 2
In this line, we write to cell B2 using row and column notation.
Openpyxl Appending Values
Using the append()
method, we can append a set of values to the bottom of the current worksheet.
appending_values.py
#!/usr/bin/env python
from openpyxl import Workbook
book = Workbook()
sheet = book.active
rows = (
(88, 46, 57),
(89, 38, 12),
(23, 59, 78),
(56, 21, 98),
(24, 18, 43),
(34, 15, 67)
)
for row in rows:
sheet.append(row)
book.save('appending.xlsx')
In this example, we append three columns of data to the current worksheet.
rows = (
(88, 46, 57),
(89, 38, 12),
(23, 59, 78),
(56, 21, 98),
(24, 18, 43),
(34, 15, 67)
)
The data is stored in tuples of tuples.
for row in rows:
sheet.append(row)
We go through the container row by row and insert rows of data using the append()
method.
OpenPyXL Reading Cells
In the following example, we read previously written data from the sample.xlsx
file.
read_cells.py
#!/usr/bin/env python
import openpyxl
book = openpyxl.load_workbook('sample.xlsx')
sheet = book.active
a1 = sheet['A1']
a2 = sheet['A2']
a3 = sheet.cell(row=3, column=1)
print(a1.value)
print(a2.value)
print(a3.value)
This example loads an existing xlsx file and reads three cells.
book = openpyxl.load_workbook('sample.xlsx')
Use the load_workbook()
method to open the file.
a1 = sheet['A1']
a2 = sheet['A2']
a3 = sheet.cell(row=3, column=1)
We read the contents of cells A1, A2, and A3. In the third row, we use the cell()
method to get the value of cell A3.
$ ./read_cells.py
56
43
10/26/16
This is the output of the example.
OpenPyXL Reading Multiple Cells
We have the following data table:
We use the range operator to read the data.
read_cells2.py
#!/usr/bin/env python
import openpyxl
book = openpyxl.load_workbook('items.xlsx')
sheet = book.active
cells = sheet['A1': 'B6']
for c1, c2 in cells:
print("{0:8} {1:8}".format(c1.value, c2.value))
In this example, we use range operations to read data from two columns.
cells = sheet['A1': 'B6']
In this row, we read data from cells A1-B6.
for c1, c2 in cells:
print("{0:8} {1:8}".format(c1.value, c2.value))
The
format()
function is used to neatly output data to the console.
$ ./read_cells2.py
Items Quantity
coins 23
chairs 3
pencils 5
bottles 8
books 30
Openpyxl Iteration by Row
The
iter_rows()
method returns the cells in a worksheet as rows.
iterating_by_rows.py
#!/usr/bin/env python
from openpyxl import Workbook
book = Workbook()
sheet = book.active
rows = (
(88, 46, 57),
(89, 38, 12),
(23, 59, 78),
(56, 21, 98),
(24, 18, 43),
(34, 15, 67)
)
for row in rows:
sheet.append(row)
for row in sheet.iter_rows(min_row=1, min_col=1, max_row=6, max_col=3):
for cell in row:
print(cell.value, end=" ")
print()
book.save('iterbyrows.xlsx')
This example iterates over the data row by row.
for row in sheet.iter_rows(min_row=1, min_col=1, max_row=6, max_col=3):
We provide bounds for the iteration.
$ ./iterating_by_rows.py
88 46 57
89 38 12
23 59 78
56 21 98
24 18 43
34 15 67
Openpyxl Iterating by Columns
iter_cols()
method returns the cells in a worksheet as columns.
iterating_by_columns.py
#!/usr/bin/env python
from openpyxl import Workbook
book = Workbook()
sheet = book.active
rows = (
(88, 46, 57),
(89, 38, 12),
(23, 59, 78),
(56, 21, 98),
(24, 18, 43),
(34, 15, 67)
)
for row in rows:
sheet.append(row)
for row in sheet.iter_cols(min_row=1, min_col=1, max_row=6, max_col=3):
for cell in row:
print(cell.value, end=" ")
print()
book.save('iterbycols.xlsx')
This example iterates through the data column by column.
$ ./iterating_by_columns.py
88 89 23 56 24 34
46 38 59 21 18 15
57 12 78 98 43 67
Statistics
For the next example, we need to create an xlsx file containing numbers. For example, we use the RANDBETWEEN()
function to create 25 rows of numbers in 10 columns.
mystats.py
#!/usr/bin/env python
import openpyxl
import statistics as stats
book = openpyxl.load_workbook('numbers.xlsx', data_only=True)
sheet = book.active
rows = sheet.rows
values = []
for row in rows:
for cell in row:
values.append(cell.value)
print("Number of values: {0}".format(len(values)))
print("Sum of values: {0}".format(sum(values)))
print("Minimum value: {0}".format(min(values)))
print("Maximum value: {0}".format(max(values)))
print("Mean: {0}".format(stats.mean(values)))
print("Median: {0}".format(stats.median(values)))
print("Standard deviation: {0}".format(stats.stdev(values)))
print("Variance: {0}".format(stats.variance(values)))
In this example, we read all the values from the worksheet and calculate some basic statistics.
import statistics as stats
Import the statistics
module to provide some statistical functions, such as median and variance.
book = openpyxl.load_workbook('numbers.xlsx', data_only=True)
Using the data_only
option, we get the values from the cells rather than the formulas.
rows = sheet.rows
We get all rows of cells that are not empty.
for row in rows:
for cell in row:
values.append(cell.value)
In the two for loops, we form a list of integer values from the cells.
print("Number of values: {0}".format(len(values)))
print("Sum of values: {0}".format(sum(values)))
print("Minimum value: {0}".format(min(values)))
print("Maximum value: {0}".format(max(values)))
print("Mean: {0}".format(stats.mean(values)))
print("Median: {0}".format(stats.median(values)))
print("Standard deviation: {0}".format(stats.stdev(values)))
print("Variance: {0}".format(stats.variance(values)))
We calculate and print mathematical statistics about the values. Some functions are built-in, while others are imported through the statistics
module.
$ ./mystats.py
Number of values: 312
Sum of values: 15877
Minimum value: 0
Maximum value: 100
Mean: 50.88782051282051
Median: 54.0
Standard deviation: 28.459203819700967
Variance: 809.9262820512821
Openpyxl Filtering & Sorting Data
Sheets have an auto_filter
attribute that allows for setting filtering and sorting conditions.
Note that Openpyxl sets the conditions, but we have to apply them in the spreadsheet application.
filter_sort.py
#!/usr/bin/env python
from openpyxl import Workbook
wb = Workbook()
sheet = wb.active
data = [
['Item', 'Colour'],
['pen', 'brown'],
['book', 'black'],
['plate', 'white'],
['chair', 'brown'],
['coin', 'gold'],
['bed', 'brown'],
['notebook', 'white'],
]
for r in data:
sheet.append(r)
sheet.auto_filter.ref = 'A1:B8'
sheet.auto_filter.add_filter_column(1, ['brown', 'white'])
sheet.auto_filter.add_sort_condition('B2:B8')
wb.save('filtered.xlsx')
In this example, we create a worksheet containing items and their colors. We set a filter and a sort condition.
Openpyxl Dimensions
To get to the cells that actually contain data, we can use dimensions.
dimensions.py
#!/usr/bin/env python
from openpyxl import Workbook
book = Workbook()
sheet = book.active
sheet['A3'] = 39
sheet['B3'] = 19
rows = [
(88, 46),
(89, 38),
(23, 59),
(56, 21),
(24, 18),
(34, 15)
]
for row in rows:
sheet.append(row)
print(sheet.dimensions)
print("Minimum row: {0}".format(sheet.min_row))
print("Maximum row: {0}".format(sheet.max_row))
print("Minimum column: {0}".format(sheet.min_column))
print("Maximum column: {0}".format(sheet.max_column))
for c1, c2 in sheet[sheet.dimensions]:
print(c1.value, c2.value)
book.save('dimensions.xlsx')
This example calculates the dimensions of two columns of data.
sheet['A3'] = 39
sheet['B3'] = 19
rows = [
(88, 46),
(89, 38),
(23, 59),
(56, 21),
(24, 18),
(34, 15)
]
for row in rows:
sheet.append(row)
We add data to the worksheet. Note that we start adding rows starting with the third one.
print(sheet.dimensions)
The
dimensions
property returns the top-left and bottom-right corners of a range of non-empty cells.
print("Minimum row: {0}".format(sheet.min_row))
print("Maximum row: {0}".format(sheet.max_row))
Using the min_row
and max_row
properties, we can get the minimum and maximum rows containing data.
print("Minimum column: {0}".format(sheet.min_column))
print("Maximum column: {0}".format(sheet.max_column))
Through the min_column
and max_column
attributes, we get the minimum and maximum columns containing data.
for c1, c2 in sheet[sheet.dimensions]:
print(c1.value, c2.value)
We loop through the data and print it to the console.
$ ./dimensions.py
A3:B9
Minimum row: 3
Maximum row: 9
Minimum column: 1
Maximum column: 2
39 19
88 46
89 38
23 59
56 21
24 18
34 15
Worksheets
Each workbook can have multiple worksheets.
Figure: Sheets
Let’s have a workbook with these three sheets.
sheets.py
#!/usr/bin/env python
import openpyxl
book = openpyxl.load_workbook('sheets.xlsx')
print(book.get_sheet_names())
active_sheet = book.active
print(type(active_sheet))
sheet = book.get_sheet_by_name("March")
print(sheet.title)
This program can be used with Excel worksheets.
print(book.get_sheet_names())
The
get_sheet_names()
method returns the names of the available worksheets in a workbook.
active_sheet = book.active
print(type(active_sheet))
We get the active sheet and print its type to the terminal.
sheet = book.get_sheet_by_name("March")
We use the get_sheet_by_name()
method to get a reference to the worksheet.
print(sheet.title)
The title of the retrieved worksheet is printed to the terminal.
$ ./sheets.py
['January', 'February', 'March']
<class 'openpyxl.worksheet.worksheet.Worksheet'>
March
This is the output of the program.
sheets2.py
#!/usr/bin/env python
import openpyxl
book = openpyxl.load_workbook('sheets.xlsx')
book.create_sheet("April")
print(book.sheetnames)
sheet1 = book.get_sheet_by_name("January")
book.remove_sheet(sheet1)
print(book.sheetnames)
book.create_sheet("January", 0)
print(book.sheetnames)
book.save('sheets2.xlsx')
In this example, we create a new worksheet.
book.create_sheet("April")
Use the create_sheet()
method to create a new sheet.
print(book.sheetnames)
Sheet names can also be displayed using the sheetnames
property.
book.remove_sheet(sheet1)
Sheets can be removed using the remove_sheet()
method.
book.create_sheet("January", 0)
You can create a new sheet at a specified location. In this example, we create a new sheet at index 0.
$ ./sheets2.py
['January', 'February', 'March', 'April']
['February', 'March', 'April']
['January', 'February', 'March', 'April']
You can change the background color of a worksheet.
sheets3.py
#!/usr/bin/env python
import openpyxl
book = openpyxl.load_workbook('sheets.xlsx')
sheet = book.get_sheet_by_name("March")
sheet.sheet_properties.tabColor = "0072BA"
book.save('sheets3.xlsx')
This example changes the background color of the worksheet titled “March.”
sheet.sheet_properties.tabColor = "0072BA"
We change the tabColor
property to a new color.
The background color of the third worksheet has been changed to a shade of blue.
Merging Cells
Cells can be merged using the merge_cells()
method, but not using the unmerge_cells()
method. When merging cells, all cells except the top-left corner are removed from the worksheet.
merging_cells.py
#!/usr/bin/env python
from openpyxl import Workbook
from openpyxl.styles import Alignment
book = Workbook()
sheet = book.active
sheet.merge_cells('A1:B2')
cell = sheet.cell(row=1, column=1)
cell.value = 'Sunny day'
cell.alignment = Alignment(horizontal='center', vertical='center')
book.save('merging.xlsx')
In this example, we merge four cells: A1, B1, A2, and B2. The text in the last cell is centered.
from openpyxl.styles import Alignment
To center the text in the last cell, we use the Alignment
class from the openpyxl.styles
module.
sheet.merge_cells('A1:B2')
We merge four cells using the merge_cells()
method.
cell = sheet.cell(row=1, column=1)
We now have the last cell.
cell.value = 'Sunny day'
cell.alignment = Alignment(horizontal='center', vertical='center')
We set the text to the merged cell and update its alignment.
Openpyxl Freeze Panes
When freezing panes, we keep an area of the worksheet visible while scrolling to another area of the worksheet.
freezing.py
#!/usr/bin/env python
from openpyxl import Workbook
from openpyxl.styles import Alignment
book = Workbook()
sheet = book.active
sheet.freeze_panes = 'B2'
book.save('freezing.xlsx')
This example freezes the panes at cell B2.
sheet.freeze_panes = 'B2'
To freeze the panes, we use the freeze_panes
property.
Openpyxl Formulas
The next example shows how to use formulas. openpyxl
does not perform calculations; it writes the formulas to the cells.
formulas.py
#!/usr/bin/env python
from openpyxl import Workbook
book = Workbook()
sheet = book.active
rows = (
(34, 26),
(88, 36),
(24, 29),
(15, 22),
(56, 13),
(76, 18)
)
for row in rows:
sheet.append(row)
cell = sheet.cell(row=7, column=2)
cell.value = "=SUM(A1:B6)"
cell.font = cell.font.copy(bold=True)
book.save('formulas.xlsx')
In this example, we use the SUM()
function to calculate the sum of all values and display the output in bold.
rows = (
(34, 26),
(88, 36),
(24, 29),
(15, 22),
(56, 13),
(76, 18)
)
for row in rows:
sheet.append(row)
We create two columns of data.
cell = sheet.cell(row=7, column=2)
We get the cell that displays the calculation result.
cell.value = "=SUM(A1:B6)"
We write a formula into a cell.
cell.font = cell.font.copy(bold=True)
We change the font style.
OpenPyXL Images
In the following example, we show how to insert an image into a worksheet.
write_image.py
#!/usr/bin/env python
from openpyxl import Workbook
from openpyxl.drawing.image import Image
book = Workbook()
sheet = book.active
img = Image("icesid.png")
sheet['A1'] = 'This is Sid'
sheet.add_image(img, 'B2')
book.save("sheet_image.xlsx")
In this example, we write an image to a sheet of paper.
from openpyxl.drawing.image import Image
We use the Image
class from the openpyxl.drawing.image
module.
img = Image("icesid.png")
This creates a new Image
class. The icesid.png
image is located in the current working directory.
sheet.add_image(img, 'B2')
We use the add_image()
method to add a new image.
Openpyxl Charts
openpyxl
supports creating a variety of charts, including bar, line, area, bubble, scatter, and pie charts.
According to the documentation, openpyxl
only supports creating charts within a worksheet. Existing charts in the workbook will be lost.
create_bar_chart.py
#!/usr/bin/env python
from openpyxl import Workbook
from openpyxl.chart import (
Reference,
Series,
BarChart
)
book = Workbook()
sheet = book.active
rows = [
("USA", 46),
("China", 38),
("UK", 29),
("Russia", 22),
("South Korea", 13),
("Germany", 11)
]
for row in rows:
sheet.append(row)
data = Reference(sheet, min_col=2, min_row=1, max_col=2, max_row=6)
categories = Reference(sheet, min_col=1, min_row=1, max_row=6)
chart = BarChart()
chart.add_data(data=data)
chart.set_categories(categs)
chart.legend = None
chart.y_axis.majorGridlines = None
chart.varyColors = True
chart.title = "Olympic Gold Medals in London"
sheet.add_chart(chart, "A8")
book.save("bar_chart.xlsx")
In this example, we create a bar chart showing the number of Olympic gold medals for each country in London 2012.
from openpyxl.chart import (
Reference,
Series,
BarChart
)
openpyxl.chart
The module has tools for working with charts.
book = Workbook()
sheet = book.active
Create a new workbook.
rows = [
("USA", 46),
("China", 38),
("UK", 29),
("Russia", 22),
("South Korea", 13),
("Germany", 11)
]
for row in rows:
sheet.append(row)
We create some data and append it to the cells of the active worksheet.
data = Reference(sheet, min_col=2, min_row=1, max_col=2, max_row=6)
With the Reference
class, we reference the row in the table that represents the data. In our case, these are the number of Olympic gold medals.
categs = Reference(sheet, min_col=1, min_row=1, max_row=6)
We create a category axis. A category axis is an axis that treats data as a series of non-numeric text labels. In our case, we have text labels representing country names.
chart = BarChart()
chart.add_data(data=data)
chart.set_categories(categs)
We create a bar chart and set its data and categories.
chart.legend = None
chart.y_axis.majorGridlines = None
Use the legend
and majorGridlines
properties to turn off the legend and major gridlines.
chart.varyColors = True
Setting varyColors
to True
gives each bar a different color.
chart.title = "Olympic Gold Medals in London"
Set a title for the chart.
sheet.add_chart(chart, "A8")
Use the add_chart()
method to add the created chart to the worksheet.
In this tutorial, we used the openpyxl library. We read data from and wrote data to Excel files.
You may also be interested in the following related tutorials: Python Tutorial, Python CSV Tutorial, Python SimpleJSON Tutorial, and Python List Comprehensions.