Openpyxl Tutorial

This article shows how to use the openpyxl library to work with Excel files in Python. openpyxl is a Python library for reading and writing Excel 2010 xlsx / xlsm / xltx / xltm files. (Python Tutorial)

Excel xlsx

In this tutorial, we use xlsx files. xlsx is the file extension for the open XML spreadsheet file format used by Microsoft Excel. xlsm files support macros. xlsx is a proprietary binary format, while xlsx is based on the Office Open XML Formats.

$ sudo pip3 install openpyxl

We use the pip3 tool to install openpyxl.

Openpyxl Creates a New File

In this first example, we use openpyxl to create a new xlsx file.

write_xlsx.py

#!/usr/bin/env python

from openpyxl import Workbook
import time

book = Workbook()
sheet = book.active

sheet['A1'] = 56
sheet['A2'] = 43

now = time.strftime("%x")
sheet['A3'] = now

book.save("sample.xlsx")

In this example, we create a new xlsx file. We write data into three cells.

from openpyxl import Workbook

From the openpyxl module, we import the Workbook class. A workbook is a container for all other parts of a document.

book = Workbook()

We create a new workbook. A workbook is always created with at least one worksheet.

sheet = book.active

We get a reference to the active worksheet.

sheet['A1'] = 56
sheet['A2'] = 43

We write numeric data to cells A1 and A2.

now = time.strftime("%x")
sheet['A3'] = now

We write the current date to cell A3.

book.save("sample.xlsx")

We use the save() method to write the contents to the sample.xlsx file.

Python OpenPyXL Tutorial

Openpyxl Writing Cells

There are two basic ways to write to cells: using the worksheet’s keys (such as A1 or D3), or using row and column notation with the cell() method.

write2cell.py

#!/usr/bin/env python

from openpyxl import Workbook

book = Workbook()
sheet = book.active

sheet['A1'] = 1
sheet.cell(row=2, column=2).value = 2

book.save('write2cell.xlsx')

In this example, we write two values to two cells.

sheet['A1'] = 1

Here, we assign a value to cell A1.

sheet.cell(row=2, column=2).value = 2

In this line, we write to cell B2 using row and column notation.

Openpyxl Appending Values

Using the append() method, we can append a set of values to the bottom of the current worksheet.

appending_values.py

#!/usr/bin/env python

from openpyxl import Workbook

book = Workbook()
sheet = book.active

rows = (
(88, 46, 57),
(89, 38, 12),
(23, 59, 78),
(56, 21, 98),
(24, 18, 43),
(34, 15, 67)
)

for row in rows:
sheet.append(row)

book.save('appending.xlsx')

In this example, we append three columns of data to the current worksheet.

rows = (
(88, 46, 57),
(89, 38, 12),
(23, 59, 78),
(56, 21, 98),
(24, 18, 43),
(34, 15, 67)
)

The data is stored in tuples of tuples.

for row in rows:
sheet.append(row)

We go through the container row by row and insert rows of data using the append() method.

OpenPyXL Reading Cells

In the following example, we read previously written data from the sample.xlsx file.

read_cells.py

#!/usr/bin/env python

import openpyxl

book = openpyxl.load_workbook('sample.xlsx')

sheet = book.active

a1 = sheet['A1']
a2 = sheet['A2']
a3 = sheet.cell(row=3, column=1)

print(a1.value)
print(a2.value)
print(a3.value)

This example loads an existing xlsx file and reads three cells.

book = openpyxl.load_workbook('sample.xlsx')

Use the load_workbook() method to open the file.

a1 = sheet['A1']
a2 = sheet['A2']
a3 = sheet.cell(row=3, column=1)

We read the contents of cells A1, A2, and A3. In the third row, we use the cell() method to get the value of cell A3.

$ ./read_cells.py 
56
43
10/26/16

This is the output of the example.

OpenPyXL Reading Multiple Cells

We have the following data table:

Python OpenPyXL Tutorial

We use the range operator to read the data.

read_cells2.py

#!/usr/bin/env python

import openpyxl

book = openpyxl.load_workbook('items.xlsx')

sheet = book.active

cells = sheet['A1': 'B6']

for c1, c2 in cells:
print("{0:8} {1:8}".format(c1.value, c2.value))

In this example, we use range operations to read data from two columns.

cells = sheet['A1': 'B6']

In this row, we read data from cells A1-B6.

for c1, c2 in cells:
print("{0:8} {1:8}".format(c1.value, c2.value))

The

format() function is used to neatly output data to the console.

$ ./read_cells2.py
Items Quantity
coins 23
chairs 3
pencils 5
bottles 8
books 30

Openpyxl Iteration by Row

The

iter_rows() method returns the cells in a worksheet as rows.

iterating_by_rows.py

#!/usr/bin/env python

from openpyxl import Workbook

book = Workbook()
sheet = book.active

rows = (
    (88, 46, 57),
    (89, 38, 12),
    (23, 59, 78),
    (56, 21, 98),
    (24, 18, 43),
    (34, 15, 67)
)

for row in rows:
    sheet.append(row)

for row in sheet.iter_rows(min_row=1, min_col=1, max_row=6, max_col=3):
    for cell in row:
        print(cell.value, end=" ")
    print()

book.save('iterbyrows.xlsx')

This example iterates over the data row by row.

for row in sheet.iter_rows(min_row=1, min_col=1, max_row=6, max_col=3):

We provide bounds for the iteration.

$ ./iterating_by_rows.py 
88 46 57 
89 38 12 
23 59 78 
56 21 98 
24 18 43 
34 15 67

Openpyxl Iterating by Columns

iter_cols() method returns the cells in a worksheet as columns.

iterating_by_columns.py

#!/usr/bin/env python

from openpyxl import Workbook

book = Workbook()
sheet = book.active

rows = (
    (88, 46, 57),
    (89, 38, 12),
    (23, 59, 78),
    (56, 21, 98),
    (24, 18, 43),
    (34, 15, 67)
)

for row in rows:
    sheet.append(row)

for row in sheet.iter_cols(min_row=1, min_col=1, max_row=6, max_col=3):
    for cell in row:
        print(cell.value, end=" ")
    print()

book.save('iterbycols.xlsx')

This example iterates through the data column by column.

$ ./iterating_by_columns.py 
88 89 23 56 24 34 
46 38 59 21 18 15 
57 12 78 98 43 67

Statistics

For the next example, we need to create an xlsx file containing numbers. For example, we use the RANDBETWEEN() function to create 25 rows of numbers in 10 columns.

mystats.py

#!/usr/bin/env python

import openpyxl
import statistics as stats

book = openpyxl.load_workbook('numbers.xlsx', data_only=True)

sheet = book.active

rows = sheet.rows

values = []

for row in rows:
    for cell in row:
        values.append(cell.value)

print("Number of values: {0}".format(len(values)))
print("Sum of values: {0}".format(sum(values)))
print("Minimum value: {0}".format(min(values)))
print("Maximum value: {0}".format(max(values)))
print("Mean: {0}".format(stats.mean(values)))
print("Median: {0}".format(stats.median(values)))
print("Standard deviation: {0}".format(stats.stdev(values)))
print("Variance: {0}".format(stats.variance(values)))

In this example, we read all the values from the worksheet and calculate some basic statistics.

import statistics as stats

Import the statistics module to provide some statistical functions, such as median and variance.

book = openpyxl.load_workbook('numbers.xlsx', data_only=True)

Using the data_only option, we get the values from the cells rather than the formulas.

rows = sheet.rows

We get all rows of cells that are not empty.

for row in rows:
for cell in row:
values.append(cell.value)

In the two for loops, we form a list of integer values from the cells.

print("Number of values: {0}".format(len(values)))
print("Sum of values: {0}".format(sum(values)))
print("Minimum value: {0}".format(min(values)))
print("Maximum value: {0}".format(max(values)))
print("Mean: {0}".format(stats.mean(values)))
print("Median: {0}".format(stats.median(values)))
print("Standard deviation: {0}".format(stats.stdev(values)))
print("Variance: {0}".format(stats.variance(values)))

We calculate and print mathematical statistics about the values. Some functions are built-in, while others are imported through the statistics module.

$ ./mystats.py
Number of values: 312
Sum of values: 15877
Minimum value: 0
Maximum value: 100
Mean: 50.88782051282051
Median: 54.0
Standard deviation: 28.459203819700967
Variance: 809.9262820512821

Openpyxl Filtering & Sorting Data

Sheets have an auto_filter attribute that allows for setting filtering and sorting conditions.

Note that Openpyxl sets the conditions, but we have to apply them in the spreadsheet application.

filter_sort.py

#!/usr/bin/env python

from openpyxl import Workbook

wb = Workbook()
sheet = wb.active

data = [
    ['Item', 'Colour'],
    ['pen', 'brown'],
    ['book', 'black'],
    ['plate', 'white'],
    ['chair', 'brown'],
    ['coin', 'gold'],
    ['bed', 'brown'],
    ['notebook', 'white'],
]

for r in data:
    sheet.append(r)

sheet.auto_filter.ref = 'A1:B8'
sheet.auto_filter.add_filter_column(1, ['brown', 'white'])
sheet.auto_filter.add_sort_condition('B2:B8')

wb.save('filtered.xlsx')

In this example, we create a worksheet containing items and their colors. We set a filter and a sort condition.

Openpyxl Dimensions

To get to the cells that actually contain data, we can use dimensions.

dimensions.py

#!/usr/bin/env python

from openpyxl import Workbook

book = Workbook()
sheet = book.active

sheet['A3'] = 39
sheet['B3'] = 19

rows = [
    (88, 46),
    (89, 38),
    (23, 59),
    (56, 21),
    (24, 18),
    (34, 15)
]

for row in rows:
    sheet.append(row)

print(sheet.dimensions)
print("Minimum row: {0}".format(sheet.min_row))
print("Maximum row: {0}".format(sheet.max_row))
print("Minimum column: {0}".format(sheet.min_column))
print("Maximum column: {0}".format(sheet.max_column))

for c1, c2 in sheet[sheet.dimensions]:
print(c1.value, c2.value)

book.save('dimensions.xlsx')

This example calculates the dimensions of two columns of data.

sheet['A3'] = 39
sheet['B3'] = 19

rows = [
(88, 46),
(89, 38),
(23, 59),
(56, 21),
(24, 18),
(34, 15)
]

for row in rows:
sheet.append(row)

We add data to the worksheet. Note that we start adding rows starting with the third one.

print(sheet.dimensions)

The

dimensions property returns the top-left and bottom-right corners of a range of non-empty cells.

print("Minimum row: {0}".format(sheet.min_row))
print("Maximum row: {0}".format(sheet.max_row))

Using the min_row and max_row properties, we can get the minimum and maximum rows containing data.

print("Minimum column: {0}".format(sheet.min_column))

print("Maximum column: {0}".format(sheet.max_column))

Through the min_column and max_column attributes, we get the minimum and maximum columns containing data.

for c1, c2 in sheet[sheet.dimensions]:
print(c1.value, c2.value)

We loop through the data and print it to the console.

$ ./dimensions.py
A3:B9
Minimum row: 3
Maximum row: 9
Minimum column: 1
Maximum column: 2
39 19
88 46
89 38
23 59
56 21
24 18
34 15

Worksheets

Each workbook can have multiple worksheets.

Python OpenPyXL Tutorial

Figure: Sheets

Let’s have a workbook with these three sheets.

sheets.py

#!/usr/bin/env python

import openpyxl

book = openpyxl.load_workbook('sheets.xlsx')

print(book.get_sheet_names())

active_sheet = book.active
print(type(active_sheet))

sheet = book.get_sheet_by_name("March")
print(sheet.title)

This program can be used with Excel worksheets.

print(book.get_sheet_names())

The

get_sheet_names() method returns the names of the available worksheets in a workbook.

active_sheet = book.active
print(type(active_sheet))

We get the active sheet and print its type to the terminal.

sheet = book.get_sheet_by_name("March")

We use the get_sheet_by_name() method to get a reference to the worksheet.

print(sheet.title)

The title of the retrieved worksheet is printed to the terminal.

$ ./sheets.py
['January', 'February', 'March']
<class 'openpyxl.worksheet.worksheet.Worksheet'>
March

This is the output of the program.

sheets2.py

#!/usr/bin/env python

import openpyxl

book = openpyxl.load_workbook('sheets.xlsx')

book.create_sheet("April")

print(book.sheetnames)

sheet1 = book.get_sheet_by_name("January")

book.remove_sheet(sheet1)

print(book.sheetnames)

book.create_sheet("January", 0)
print(book.sheetnames)

book.save('sheets2.xlsx')

In this example, we create a new worksheet.

book.create_sheet("April")

Use the create_sheet() method to create a new sheet.

print(book.sheetnames)

Sheet names can also be displayed using the sheetnames property.

book.remove_sheet(sheet1)

Sheets can be removed using the remove_sheet() method.

book.create_sheet("January", 0)

You can create a new sheet at a specified location. In this example, we create a new sheet at index 0.

$ ./sheets2.py
['January', 'February', 'March', 'April']
['February', 'March', 'April']
['January', 'February', 'March', 'April']

You can change the background color of a worksheet.

sheets3.py

#!/usr/bin/env python

import openpyxl

book = openpyxl.load_workbook('sheets.xlsx')

sheet = book.get_sheet_by_name("March")
sheet.sheet_properties.tabColor = "0072BA"

book.save('sheets3.xlsx')

This example changes the background color of the worksheet titled “March.”

sheet.sheet_properties.tabColor = "0072BA"

We change the tabColor property to a new color.

Python OpenPyXL Tutorial

The background color of the third worksheet has been changed to a shade of blue.

Merging Cells

Cells can be merged using the merge_cells() method, but not using the unmerge_cells() method. When merging cells, all cells except the top-left corner are removed from the worksheet.

merging_cells.py

#!/usr/bin/env python

from openpyxl import Workbook
from openpyxl.styles import Alignment

book = Workbook()
sheet = book.active

sheet.merge_cells('A1:B2')

cell = sheet.cell(row=1, column=1)
cell.value = 'Sunny day'
cell.alignment = Alignment(horizontal='center', vertical='center')

book.save('merging.xlsx')

In this example, we merge four cells: A1, B1, A2, and B2. The text in the last cell is centered.

from openpyxl.styles import Alignment

To center the text in the last cell, we use the Alignment class from the openpyxl.styles module.

sheet.merge_cells('A1:B2')

We merge four cells using the merge_cells() method.

cell = sheet.cell(row=1, column=1)

We now have the last cell.

cell.value = 'Sunny day'
cell.alignment = Alignment(horizontal='center', vertical='center')

We set the text to the merged cell and update its alignment.

Python OpenPyXL Tutorial

Openpyxl Freeze Panes

When freezing panes, we keep an area of the worksheet visible while scrolling to another area of the worksheet.

freezing.py

#!/usr/bin/env python

from openpyxl import Workbook
from openpyxl.styles import Alignment

book = Workbook()
sheet = book.active

sheet.freeze_panes = 'B2'

book.save('freezing.xlsx')

This example freezes the panes at cell B2.

sheet.freeze_panes = 'B2'

To freeze the panes, we use the freeze_panes property.

Openpyxl Formulas

The next example shows how to use formulas. openpyxl does not perform calculations; it writes the formulas to the cells.

formulas.py

#!/usr/bin/env python

from openpyxl import Workbook

book = Workbook()
sheet = book.active

rows = (
    (34, 26),
    (88, 36),
    (24, 29),
    (15, 22),
    (56, 13),
    (76, 18)
)

for row in rows:
    sheet.append(row)

cell = sheet.cell(row=7, column=2)
cell.value = "=SUM(A1:B6)"
cell.font = cell.font.copy(bold=True)

book.save('formulas.xlsx')

In this example, we use the SUM() function to calculate the sum of all values and display the output in bold.

rows = (
(34, 26),
(88, 36),
(24, 29),
(15, 22),
(56, 13),
(76, 18)
)

for row in rows:
sheet.append(row)

We create two columns of data.

cell = sheet.cell(row=7, column=2)

We get the cell that displays the calculation result.

cell.value = "=SUM(A1:B6)"

We write a formula into a cell.

cell.font = cell.font.copy(bold=True)

We change the font style.

Python OpenPyXL Tutorial

OpenPyXL Images

In the following example, we show how to insert an image into a worksheet.

write_image.py

#!/usr/bin/env python

from openpyxl import Workbook
from openpyxl.drawing.image import Image

book = Workbook()
sheet = book.active

img = Image("icesid.png")
sheet['A1'] = 'This is Sid'

sheet.add_image(img, 'B2')

book.save("sheet_image.xlsx")

In this example, we write an image to a sheet of paper.

from openpyxl.drawing.image import Image

We use the Image class from the openpyxl.drawing.image module.

img = Image("icesid.png")

This creates a new Image class. The icesid.png image is located in the current working directory.

sheet.add_image(img, 'B2')

We use the add_image() method to add a new image.

Openpyxl Charts

openpyxl supports creating a variety of charts, including bar, line, area, bubble, scatter, and pie charts.

According to the documentation, openpyxl only supports creating charts within a worksheet. Existing charts in the workbook will be lost.

create_bar_chart.py

#!/usr/bin/env python

from openpyxl import Workbook
from openpyxl.chart import (
    Reference,
    Series,
    BarChart
)

book = Workbook()
sheet = book.active

rows = [
    ("USA", 46),
    ("China", 38),
    ("UK", 29),
    ("Russia", 22),
    ("South Korea", 13),
    ("Germany", 11)
]

for row in rows:
    sheet.append(row)

data = Reference(sheet, min_col=2, min_row=1, max_col=2, max_row=6)
categories = Reference(sheet, min_col=1, min_row=1, max_row=6)

chart = BarChart()
chart.add_data(data=data)
chart.set_categories(categs)

chart.legend = None
chart.y_axis.majorGridlines = None
chart.varyColors = True
chart.title = "Olympic Gold Medals in London"

sheet.add_chart(chart, "A8")

book.save("bar_chart.xlsx")

In this example, we create a bar chart showing the number of Olympic gold medals for each country in London 2012.

from openpyxl.chart import (
Reference,
Series,
BarChart
)

openpyxl.chart The module has tools for working with charts.

book = Workbook()
sheet = book.active

Create a new workbook.

rows = [
("USA", 46),
("China", 38),
("UK", 29),
("Russia", 22),
("South Korea", 13),
("Germany", 11)
]

for row in rows:
sheet.append(row)

We create some data and append it to the cells of the active worksheet.

data = Reference(sheet, min_col=2, min_row=1, max_col=2, max_row=6)

With the Reference class, we reference the row in the table that represents the data. In our case, these are the number of Olympic gold medals.

categs = Reference(sheet, min_col=1, min_row=1, max_row=6)

We create a category axis. A category axis is an axis that treats data as a series of non-numeric text labels. In our case, we have text labels representing country names.

chart = BarChart()
chart.add_data(data=data)
chart.set_categories(categs)

We create a bar chart and set its data and categories.

chart.legend = None
chart.y_axis.majorGridlines = None

Use the legend and majorGridlines properties to turn off the legend and major gridlines.

chart.varyColors = True

Setting varyColors to True gives each bar a different color.

chart.title = "Olympic Gold Medals in London"

Set a title for the chart.

sheet.add_chart(chart, "A8")

Use the add_chart() method to add the created chart to the worksheet.

Python OpenPyXL Tutorial

In this tutorial, we used the openpyxl library. We read data from and wrote data to Excel files.

You may also be interested in the following related tutorials: Python Tutorial, Python CSV Tutorial, Python SimpleJSON Tutorial, and Python List Comprehensions.

Leave a Reply

Your email address will not be published. Required fields are marked *