tarfile – Tar archive access

Purpose: Tar archive access.
Available In: 2.3 and later

The tarfile module provides read and write access to UNIX tar archives, including compressed files. In addition to the POSIX standards, several GNU tar extensions are supported. Various UNIX special file types (hard and soft links, device nodes, etc.) are also handled.

Testing Tar Files

The is_tarfile() function returns a boolean indicating whether or not the filename passed as an argument refers to a valid tar file.

import tarfile

for filename in [ 'README.txt', 'example.tar', 
                  'bad_example.tar', 'notthere.tar' ]:
    try:
        print '%20s  %s' % (filename, tarfile.is_tarfile(filename))
    except IOError, err:
        print '%20s  %s' % (filename, err)

If the file does not exist, is_tarfile() raises an IOError.

$ python tarfile_is_tarfile.py
          README.txt  False
         example.tar  True
     bad_example.tar  False
        notthere.tar  [Errno 2] No such file or directory: 'notthere.tar'
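
Because is_tarfile() raises an exception for a missing path instead of returning False, code that only wants a yes/no answer may prefer to wrap the call. The following is a minimal sketch in Python 3 syntax (unlike the Python 2 listings in this article); safe_is_tarfile() is a hypothetical helper name, not part of the tarfile module, and the sample files are created in a temporary directory purely for illustration.

```python
import os
import tarfile
import tempfile


def safe_is_tarfile(path):
    """Return True only if path exists and looks like a tar archive.

    Hypothetical helper, not part of the tarfile module.
    """
    try:
        return tarfile.is_tarfile(path)
    except OSError:
        # Missing or unreadable files are treated as "not a tar file".
        return False


# Build a throwaway text file and archive to test against.
workdir = tempfile.mkdtemp()
text_path = os.path.join(workdir, 'README.txt')
with open(text_path, 'wb') as f:
    f.write(b'not an archive\n')

tar_path = os.path.join(workdir, 'example.tar')
with tarfile.open(tar_path, 'w') as archive:
    archive.add(text_path, arcname='README.txt')

missing_path = os.path.join(workdir, 'notthere.tar')

for path in [text_path, tar_path, missing_path]:
    print('%20s  %s' % (os.path.basename(path), safe_is_tarfile(path)))
```

Whether swallowing the error is appropriate depends on the caller; a tool that needs to distinguish "missing" from "not an archive" should keep the exception.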

Reading Meta-data from an Archive

Use the TarFile class to work directly with a tar archive. It supports methods for reading data about existing archives as well as modifying the archives by adding additional files.

To read the names of the files in an existing archive, use getnames():

import tarfile

t = tarfile.open('example.tar', 'r')
print t.getnames()

The return value is a list of strings with the names of the archive contents:

$ python tarfile_getnames.py
['README.txt']

In addition to names, meta-data about the archive members is available as instances of TarInfo objects. Load the meta-data via getmembers() and getmember().

import tarfile
import time

t = tarfile.open('example.tar', 'r')
for member_info in t.getmembers():
    print member_info.name
    print '\tModified:\t', time.ctime(member_info.mtime)
    print '\tMode    :\t', oct(member_info.mode)
    print '\tType    :\t', member_info.type
    print '\tSize    :\t', member_info.size, 'bytes'
    print

$ python tarfile_getmembers.py
README.txt
        Modified:       Sun Feb 22 11:13:55 2009
        Mode    :       0644
        Type    :       0
        Size    :       75 bytes
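
The Type value printed above is the raw type flag stored in the archive header, which is not very readable. TarInfo objects also provide predicate methods such as isfile() and isdir() for the common cases. The sketch below, in Python 3 syntax, builds a small in-memory archive (the member names are made up for the example) and classifies each member with those methods.

```python
import io
import tarfile

# Build a small in-memory archive holding one directory and one regular file.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode='w') as archive:
    dir_info = tarfile.TarInfo('docs')
    dir_info.type = tarfile.DIRTYPE
    dir_info.mode = 0o755
    archive.addfile(dir_info)

    payload = b'hello'
    file_info = tarfile.TarInfo('docs/README.txt')
    file_info.size = len(payload)
    archive.addfile(file_info, io.BytesIO(payload))

# Read the archive back and describe each member by kind.
buffer.seek(0)
kinds = {}
with tarfile.open(fileobj=buffer, mode='r') as archive:
    for member in archive.getmembers():
        if member.isdir():
            kind = 'directory'
        elif member.isfile():
            kind = 'regular file'
        else:
            kind = 'other'
        kinds[member.name] = kind
        print(member.name, kind, member.size)
```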

If you know in advance the name of the archive member, you can retrieve its TarInfo object with getmember().

import tarfile
import time

t = tarfile.open('example.tar', 'r')
for filename in [ 'README.txt', 'notthere.txt' ]:
    try:
        info = t.getmember(filename)
    except KeyError:
        print 'ERROR: Did not find %s in tar archive' % filename
    else:
        print '%s is %d bytes' % (info.name, info.size)

If the archive member is not present, getmember() raises a KeyError.

$ python tarfile_getmember.py
README.txt is 75 bytes
ERROR: Did not find notthere.txt in tar archive

Extracting Files From an Archive

To access the data from an archive member within your program, use the extractfile() method, passing the member’s name. It returns a file-like object from which the member’s contents can be read.

import tarfile

t = tarfile.open('example.tar', 'r')
for filename in [ 'README.txt', 'notthere.txt' ]:
    try:
        f = t.extractfile(filename)
    except KeyError:
        print 'ERROR: Did not find %s in tar archive' % filename
    else:
        print filename, ':', f.read()

$ python tarfile_extractfile.py
README.txt : The examples for the tarfile module use this file and example.tar as data.

ERROR: Did not find notthere.txt in tar archive
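
A missing member is not the only case worth handling: for members that exist but are not regular files or links (a directory, for example), extractfile() returns None rather than raising. The following Python 3 sketch, using a made-up in-memory archive, records all three outcomes.

```python
import io
import tarfile

# In-memory archive with a directory entry and a regular file.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode='w') as archive:
    dir_info = tarfile.TarInfo('outdir')
    dir_info.type = tarfile.DIRTYPE
    archive.addfile(dir_info)

    payload = b'some text\n'
    file_info = tarfile.TarInfo('outdir/README.txt')
    file_info.size = len(payload)
    archive.addfile(file_info, io.BytesIO(payload))

buffer.seek(0)
contents = {}
with tarfile.open(fileobj=buffer, mode='r') as archive:
    for name in ['outdir', 'outdir/README.txt', 'notthere.txt']:
        try:
            f = archive.extractfile(name)
        except KeyError:
            contents[name] = 'missing'
            continue
        if f is None:
            # The member exists but has no data stream (here, a directory).
            contents[name] = 'not a regular file'
        else:
            contents[name] = f.read()

for name, value in contents.items():
    print(name, ':', value)
```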

If you just want to unpack the archive and write the files to the filesystem, use extract() or extractall() instead.

import tarfile
import os

os.mkdir('outdir')
t = tarfile.open('example.tar', 'r')
t.extract('README.txt', 'outdir')
print os.listdir('outdir')

$ python tarfile_extract.py
['README.txt']

Note

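The standard library documentation notes that extract() does not take care of several extraction issues and recommends using extractall() in most cases. It also warns against extracting archives from untrusted sources without inspecting them first, since member names may be absolute paths or contain "..".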

import tarfile
import os

os.mkdir('outdir')
t = tarfile.open('example.tar', 'r')
t.extractall('outdir')
print os.listdir('outdir')

$ python tarfile_extractall.py
['README.txt']

If you only want to extract certain files from the archive, their names can be passed to extractall().

import tarfile
import os

os.mkdir('outdir')
t = tarfile.open('example.tar', 'r')
t.extractall('outdir', members=[t.getmember('README.txt')])
print os.listdir('outdir')

$ python tarfile_extractall_members.py
['README.txt']
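
The members argument can also be used to screen an archive before unpacking it. The sketch below is written in Python 3 and is one possible approach, not a complete defense: safe_members() is a hypothetical helper that keeps only regular files and directories whose destination stays inside the target directory, and the "../escape.txt" member is fabricated to show a path that gets skipped. Recent Python 3 releases also provide a filter argument to extractall() for this purpose.

```python
import io
import os
import tarfile
import tempfile


def safe_members(archive, target_dir):
    """Yield regular files and directories that stay inside target_dir.

    Hypothetical helper: links, devices, and escaping paths are skipped.
    """
    root = os.path.realpath(target_dir)
    for member in archive.getmembers():
        destination = os.path.realpath(os.path.join(root, member.name))
        inside = destination == root or destination.startswith(root + os.sep)
        if inside and (member.isfile() or member.isdir()):
            yield member


# Build an in-memory archive with one ordinary and one escaping member.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode='w') as out:
    for name in ['README.txt', '../escape.txt']:
        payload = b'data'
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        out.addfile(info, io.BytesIO(payload))

buffer.seek(0)
outdir = tempfile.mkdtemp()
with tarfile.open(fileobj=buffer, mode='r') as archive:
    archive.extractall(outdir, members=safe_members(archive, outdir))

extracted = sorted(os.listdir(outdir))
print(extracted)
```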

Creating New Archives

To create a new archive, open it with tarfile.open() using a mode of 'w'. Any existing file is truncated and a new archive is started. To add files, use the add() method.

import tarfile

print 'creating archive'
out = tarfile.open('tarfile_add.tar', mode='w')
try:
    print 'adding README.txt'
    out.add('README.txt')
finally:
    print 'closing'
    out.close()

print
print 'Contents:'
t = tarfile.open('tarfile_add.tar', 'r')
for member_info in t.getmembers():
    print member_info.name

$ python tarfile_add.py
creating archive
adding README.txt
closing

Contents:
README.txt

Using Alternate Archive Member Names

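It is possible to add a file to an archive under a name other than its original file name. Build a TarInfo object with an alternate arcname using gettarinfo(), then pass it to addfile() together with the open source file so that the member's data is written along with its header.

import tarfile

print 'creating archive'
out = tarfile.open('tarfile_addfile.tar', mode='w')
try:
    print 'adding README.txt as RENAMED.txt'
    info = out.gettarinfo('README.txt', arcname='RENAMED.txt')
    # Without the file object, only the header would be written
    source = open('README.txt', 'rb')
    try:
        out.addfile(info, source)
    finally:
        source.close()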
finally:
    print 'closing'
    out.close()

print
print 'Contents:'
t = tarfile.open('tarfile_addfile.tar', 'r')
for member_info in t.getmembers():
    print member_info.name

The archive includes only the changed filename:

$ python tarfile_addfile.py
creating archive
adding README.txt as RENAMED.txt
closing

Contents:
RENAMED.txt
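
When the only goal is a different member name, add() also accepts an arcname argument, which avoids handling the TarInfo object directly. A minimal Python 3 sketch follows; the file and archive names are placeholders created in a temporary directory.

```python
import os
import tarfile
import tempfile

# Create a sample file to archive.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, 'README.txt')
with open(source, 'wb') as f:
    f.write(b'example data\n')

archive_path = os.path.join(workdir, 'renamed.tar')
with tarfile.open(archive_path, mode='w') as out:
    # arcname replaces the on-disk path as the member name.
    out.add(source, arcname='RENAMED.txt')

# Confirm the member name and that its contents were stored.
with tarfile.open(archive_path, 'r') as t:
    names = t.getnames()
    data = t.extractfile('RENAMED.txt').read()

print(names, data)
```

The gettarinfo()/addfile() route remains useful when other header fields, such as ownership or mode, need to be adjusted before the member is written.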

Writing Data from Sources Other Than Files

Sometimes you want to write data to an archive but the data is not in a file on the filesystem. Rather than writing the data to a file, then adding that file to the archive, you can use addfile() to add data from an open file-like handle.

import tarfile
from cStringIO import StringIO

data = 'This is the data to write to the archive.'

out = tarfile.open('tarfile_addfile_string.tar', mode='w')
try:
    info = tarfile.TarInfo('made_up_file.txt')
    info.size = len(data)
    out.addfile(info, StringIO(data))
finally:
    out.close()

print
print 'Contents:'
t = tarfile.open('tarfile_addfile_string.tar', 'r')
for member_info in t.getmembers():
    print member_info.name
    f = t.extractfile(member_info)
    print f.read()

By first constructing a TarInfo object ourselves, we can give the archive member any name we wish. After setting the size, we can write the data to the archive using addfile() and passing a StringIO buffer as a source of the data.

$ python tarfile_addfile_string.py

Contents:
made_up_file.txt
This is the data to write to the archive.
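
Under Python 3 the same technique needs two adjustments: the data must be bytes, and the size must be the encoded byte count rather than the number of characters. The sketch below assumes UTF-8 encoding and also sets mtime, because a freshly constructed TarInfo otherwise carries a modification time of 0. The archive is kept in memory so nothing is written to disk.

```python
import io
import tarfile
import time

text = 'Caf\u00e9 menu: data to write to the archive.'
data = text.encode('utf-8')   # the archive stores bytes, not str

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode='w') as out:
    info = tarfile.TarInfo('made_up_file.txt')
    info.size = len(data)          # byte count, which can exceed len(text)
    info.mtime = int(time.time())  # TarInfo defaults mtime to 0 (the epoch)
    out.addfile(info, io.BytesIO(data))

# Read the member back and decode it to recover the original text.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode='r') as t:
    member = t.getmember('made_up_file.txt')
    restored = t.extractfile(member).read().decode('utf-8')

print(member.size, len(text), restored)
```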

Appending to Archives

In addition to creating new archives, it is possible to append to an existing archive. To open an archive for appending, use mode 'a'. Appending works only with uncompressed archives.

import tarfile

print 'creating archive'
out = tarfile.open('tarfile_append.tar', mode='w')
try:
    out.add('README.txt')
finally:
    out.close()

print 'contents:', [m.name 
                    for m in tarfile.open('tarfile_append.tar', 'r').getmembers()]

print 'adding index.rst'
out = tarfile.open('tarfile_append.tar', mode='a')
try:
    out.add('index.rst')
finally:
    out.close()

print 'contents:', [m.name 
                    for m in tarfile.open('tarfile_append.tar', 'r').getmembers()]

The resulting archive ends up with two members:

$ python tarfile_append.py
creating archive
contents: ['README.txt']
adding index.rst
contents: ['README.txt', 'index.rst']
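
Appending never replaces an existing member. If a file is added again under a name already in the archive, both copies are kept, and getmember() returns the last one. The Python 3 sketch below illustrates this with a throwaway file in a temporary directory.

```python
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()
source = os.path.join(workdir, 'README.txt')
archive_path = os.path.join(workdir, 'append.tar')

# Create the archive with the first version of the file.
with open(source, 'wb') as f:
    f.write(b'first version\n')
with tarfile.open(archive_path, mode='w') as out:
    out.add(source, arcname='README.txt')

# Change the file, then append it again under the same member name.
with open(source, 'wb') as f:
    f.write(b'second version\n')
with tarfile.open(archive_path, mode='a') as out:
    out.add(source, arcname='README.txt')

with tarfile.open(archive_path, 'r') as t:
    names = t.getnames()
    latest = t.extractfile(t.getmember('README.txt')).read()

print(names, latest)
```

Code that wants to avoid duplicates can check getnames() before appending, at the cost of reading the archive first.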

Working with Compressed Archives

Besides regular tar archive files, the tarfile module can work with archives compressed in the gzip or bzip2 formats. To open a compressed archive, modify the mode string passed to open() to include ":gz" or ":bz2", depending on the compression method you want to use.

import tarfile
import os

fmt = '%-30s %-10s'
print fmt % ('FILENAME', 'SIZE')
print fmt % ('README.txt', os.stat('README.txt').st_size)

for filename, write_mode in [
    ('tarfile_compression.tar', 'w'),
    ('tarfile_compression.tar.gz', 'w:gz'),
    ('tarfile_compression.tar.bz2', 'w:bz2'),
    ]:
    out = tarfile.open(filename, mode=write_mode)
    try:
        out.add('README.txt')
    finally:
        out.close()

    print fmt % (filename, os.stat(filename).st_size),
    print [m.name for m in tarfile.open(filename, 'r:*').getmembers()]

When opening an existing archive for reading, you can specify "r:*" to have tarfile determine the compression method to use automatically.

$ python tarfile_compression.py
FILENAME                       SIZE
README.txt                     75
tarfile_compression.tar        10240      ['README.txt']
tarfile_compression.tar.gz     212        ['README.txt']
tarfile_compression.tar.bz2    190        ['README.txt']
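
The automatic detection means reading code does not need to know how an archive was written. The following Python 3 sketch writes the same file to an uncompressed and a gzip-compressed archive (file names are placeholders; bzip2 is left out only to keep the example short) and reads both back with the identical "r:*" mode.

```python
import os
import tarfile
import tempfile

# Create a repetitive sample file so compression has something to work with.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, 'README.txt')
with open(source, 'wb') as f:
    f.write(b'compress me ' * 200)

detected = {}
sizes = {}
for filename, write_mode in [('plain.tar', 'w'), ('packed.tar.gz', 'w:gz')]:
    path = os.path.join(workdir, filename)
    with tarfile.open(path, mode=write_mode) as out:
        out.add(source, arcname='README.txt')

    # The same read mode works for both; tarfile inspects the file itself.
    with tarfile.open(path, mode='r:*') as t:
        detected[filename] = t.getnames()
    sizes[filename] = os.stat(path).st_size
    print(filename, sizes[filename], detected[filename])
```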

See also

tarfile
The standard library documentation for this module.
GNU tar manual
Documentation of the tar format, including extensions.
zipfile
Similar access for ZIP archives.
gzip
GNU zip compression
bz2
bzip2 compression