Python tarfile module
The tar utility was originally introduced for the UNIX operating system. Its purpose is to collect multiple files into a single archive file, often called a tarball, which makes the files easier to distribute. The tarfile module in Python’s standard library provides functions for creating tar archives and for extracting files from them. Archives can be compressed with gzip, bz2, or lzma, or left uncompressed.
The open() function defined in this module is used to write to or read from tar files.
open() Function
This function returns a TarFile object for the filename passed as an argument. It also takes a mode argument, which defaults to ‘r’: the file is opened for reading and any compression is detected automatically. The available modes are as follows:
Sequence Number | Mode and Operation |
---|---|
1 | ‘r’ or ‘r:*’ opens the file for reading with transparent compression. |
2 | ‘r:’ opens the file for reading without compression. |
3 | ‘r:gz’ opens the file for reading with gzip compression. |
4 | ‘r:bz2’ opens the file for reading with bzip2 compression. |
5 | ‘r:xz’ opens the file for reading with lzma compression. |
6 | ‘x’ or ‘x:’ creates a file without compression. |
7 | ‘x:gz’ creates a file with gzip compression. |
8 | ‘x:bz2’ creates a file with bzip2 compression. |
9 | ‘x:xz’ creates a file with lzma compression. |
10 | ‘a’ or ‘a:’ opens the file for appending without compression. |
11 | ‘w’ or ‘w:’ opens the file for writing without compression. |
12 | ‘w:gz’ opens the file for writing with gzip compression. |
13 | ‘w:bz2’ opens the file for writing with bzip2 compression. |
14 | ‘w:xz’ opens the file for writing with lzma compression. |
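As a rough illustration of the mode strings in the table, the sketch below writes a small gzip-compressed archive with ‘w:gz’, reads it back with the default mode, and then tries ‘x:gz’ on the existing file. The temporary directory and the names note.txt and demo.tar.gz are made up for the example.

```python
import os
import tarfile
import tempfile

# Work in a throwaway directory so the real working tree is untouched.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "note.txt")        # hypothetical input file
archive = os.path.join(workdir, "demo.tar.gz")    # hypothetical archive name

with open(source, "w") as f:
    f.write("hello tar\n")

# 'w:gz' creates (or overwrites) a gzip-compressed archive.
with tarfile.open(archive, "w:gz") as tar:
    tar.add(source, arcname="note.txt")

# The default mode 'r' behaves like 'r:*': compression is detected automatically.
with tarfile.open(archive) as tar:
    names = tar.getnames()

# 'x:gz' refuses to overwrite an existing file.
try:
    tarfile.open(archive, "x:gz")
    exclusive_failed = False
except FileExistsError:
    exclusive_failed = True

print(names, exclusive_failed)
```

The ‘x’ modes are useful when overwriting an existing archive by accident would be a problem; the ‘w’ modes overwrite silently.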
This module defines the TarFile class. TarFile objects can also be instantiated by calling the constructor directly, although open() is the usual way to obtain one.
TarFile() Constructor
Like open(), this constructor takes filename and mode parameters. Unlike open(), it does not handle compression: mode must be one of ‘r’, ‘a’, ‘w’, or ‘x’.
Other methods in this class are as follows:
add() Method
This method adds a file to the archive. It requires a name, which can refer to any type of file: a regular file, a directory, a symbolic link, and so on. By default, directories are added recursively. To prevent this, set the recursive parameter to False.
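The effect of the recursive parameter can be sketched as follows. The directory layout (project/sub/a.txt) and the archive names are hypothetical and are created in a temporary directory only for this example.

```python
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()
project = os.path.join(workdir, "project")          # hypothetical directory
os.makedirs(os.path.join(project, "sub"))
with open(os.path.join(project, "sub", "a.txt"), "w") as f:
    f.write("data\n")

full = os.path.join(workdir, "full.tar")
shallow = os.path.join(workdir, "shallow.tar")

# Default behaviour: the directory and everything below it are added.
with tarfile.open(full, "w") as tar:
    tar.add(project, arcname="project")

# recursive=False stores only the directory entry itself.
with tarfile.open(shallow, "w") as tar:
    tar.add(project, arcname="project", recursive=False)

# Read both archives back to compare their member lists.
with tarfile.open(full) as tar:
    full_names = tar.getnames()
with tarfile.open(shallow) as tar:
    shallow_names = tar.getnames()

print(full_names)
print(shallow_names)
```

The arcname argument sets the name stored in the archive, which keeps the temporary directory's absolute path out of the member names.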
addfile() Method
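This method adds a TarInfo object to the archive. If a file object is also given, the number of bytes stated in the TarInfo object's size attribute is read from it and stored as the member's content.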
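One situation where addfile() is handy is archiving data that never existed as a file on disk. The following is a minimal sketch under that assumption; the member name generated.txt and the payload are invented for the example, and the archive itself is kept in memory.

```python
import io
import tarfile

payload = b"generated in memory\n"

# TarInfo carries the member's metadata; the size must be set by hand.
info = tarfile.TarInfo(name="generated.txt")   # hypothetical member name
info.size = len(payload)

buffer = io.BytesIO()                          # the archive lives in memory
with tarfile.open(fileobj=buffer, mode="w") as tar:
    # addfile() reads info.size bytes from the given file object.
    tar.addfile(info, io.BytesIO(payload))

# Read the archive back to confirm the member and its content.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    member = tar.getmember("generated.txt")
    content = tar.extractfile(member).read()

print(member.name, member.size)
```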
extractall() Method
This method extracts all members of the archive into the current working directory, unless a different path is given.
extract() Method
This method extracts the specified member to the given path, which defaults to the current working directory.
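The difference between the two extraction methods can be sketched as below. The directories one and all are hypothetical targets inside a temporary folder. The filter argument exists only on Python versions that ship extraction filters, so the sketch checks for tarfile.data_filter and falls back to the plain call otherwise.

```python
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "zen.txt")
with open(src, "w") as f:
    f.write("Beautiful is better than ugly.\n")

archive = os.path.join(workdir, "zen.tar.gz")
with tarfile.open(archive, "w:gz") as tar:
    tar.add(src, arcname="zen.txt")

# Use the 'data' extraction filter where this Python version supports it.
extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}

one_dir = os.path.join(workdir, "one")   # hypothetical target directories
all_dir = os.path.join(workdir, "all")
os.makedirs(one_dir)
os.makedirs(all_dir)

with tarfile.open(archive, "r:gz") as tar:
    tar.extract("zen.txt", path=one_dir, **extra)   # a single member
    tar.extractall(path=all_dir, **extra)           # every member

print(os.listdir(one_dir), os.listdir(all_dir))
```

Archives from untrusted sources can contain members with absolute paths or ".." components, so extracting them without a filter or prior inspection is risky.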
The following example creates a gzip-compressed tar file and adds a file to it.
import tarfile
# Create zen.tar.gz with gzip compression and add zen.txt to it.
fp = tarfile.open("zen.tar.gz", "w:gz")
fp.add("zen.txt")
fp.close()
Assuming a ‘zen.txt’ file exists in the current working directory, it will be added to the ‘zen.tar.gz’ file.
The following code extracts all files from the archive (in this case, only one) into the current folder. To verify the result, delete or rename ‘zen.txt’ in the current folder before running it.
import tarfile
# Open the gzip-compressed archive for reading and unpack every member.
fp = tarfile.open("zen.tar.gz", "r:gz")
fp.extractall()
fp.close()
You will find the file “zen.txt” in the current directory.
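Another way to check an archive is to list its members without extracting anything. The sketch below builds a one-member archive in memory (an assumption made to keep the example self-contained) and uses a with statement, which closes the archive automatically.

```python
import io
import tarfile

# Build a one-member gzip archive in memory so the sketch is self-contained.
data = b"Beautiful is better than ugly.\n"
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
    info = tarfile.TarInfo("zen.txt")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

buffer.seek(0)
# The with statement closes the archive even if an error occurs.
with tarfile.open(fileobj=buffer, mode="r:gz") as tar:
    summary = [(m.name, m.size, m.isfile()) for m in tar.getmembers()]

for name, size, is_file in summary:
    print(name, size, is_file)
```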
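To create a tar file containing the files in the current directory, use the following code. Note that the pattern '*.*' matches only names that contain a dot:
import glob
import tarfile
# Add every entry in the current directory whose name matches '*.*'.
fp = tarfile.open("file.tar", "w")
for file in glob.glob("*.*"):
    fp.add(file)
fp.close()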
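If names without a dot should be included too, one possible variation is to iterate over the directory with pathlib instead of glob. This is only a sketch: it runs in a temporary directory with two made-up files (README and data.csv) rather than in the current directory, and it skips the archive file explicitly.

```python
import tarfile
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
(workdir / "README").write_text("no extension\n")   # hypothetical files
(workdir / "data.csv").write_text("a,b\n")

archive = workdir / "file.tar"
with tarfile.open(str(archive), "w") as tar:
    for entry in sorted(workdir.iterdir()):
        # Skip the archive itself and anything that is not a regular file.
        if entry.is_file() and entry != archive:
            tar.add(str(entry), arcname=entry.name)

with tarfile.open(str(archive)) as tar:
    archived = tar.getnames()

print(archived)
```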
Command Line Interface
You can also create and extract tar files from the command line. For example, the following command, run in a command window, creates the archive line.tar and adds the ‘lines.txt’ file to it:
C:\python311>python -m tarfile -c line.tar lines.txt
The following command-line options are available.
Option | Description |
---|---|
-l or --list | Lists the files in the tar archive. |
-c or --create | Creates a tar archive from the source files. |
-e or --extract | Extracts the tar archive into the current directory if output_dir is not specified. |
-t or --test | Tests whether the tar archive is valid. |
-v or --verbose | Verbose output. |
The following command extracts line.tar to the newdir folder in the current directory.
C:\python311>python -m tarfile -e line.tar newdir/
The following command line will list all the files in a tar archive.
C:\python311>python -m tarfile -l files.tar
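The same commands can be driven from a script, for instance in a test. The sketch below is one way to do that with subprocess, assuming the interpreter running the script (sys.executable) is the one whose tarfile module should be used; the temporary directory and the content of lines.txt are invented for the example.

```python
import os
import subprocess
import sys
import tempfile

workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "lines.txt"), "w") as f:
    f.write("first line\n")

# Equivalent to: python -m tarfile -c line.tar lines.txt
subprocess.run(
    [sys.executable, "-m", "tarfile", "-c", "line.tar", "lines.txt"],
    cwd=workdir,
    check=True,
)

# Equivalent to: python -m tarfile -l line.tar
listing = subprocess.run(
    [sys.executable, "-m", "tarfile", "-l", "line.tar"],
    cwd=workdir,
    check=True,
    capture_output=True,
    text=True,
).stdout

print(listing)
```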