gc – Garbage Collector
| Purpose: | Manages memory used by Python objects |
|---|---|
| Available In: | 2.1+ |
gc exposes the underlying memory management mechanism of Python, the automatic garbage collector. The module includes functions for controlling how the collector operates and for examining the objects known to the system, whether they are pending collection or stuck in reference cycles and unable to be freed.
Tracing References
With gc you can use the incoming and outgoing references between objects to find cycles in complex data structures. If you know the data structure with the cycle, you can construct custom code to examine its properties. If not, you can use the get_referents() and get_referrers() functions to build generic debugging tools.
For example, get_referents() shows the objects referred to by the input arguments.
import gc
import pprint
class Graph(object):
    def __init__(self, name):
        self.name = name
        self.next = None
    def set_next(self, next):
        print 'Linking nodes %s.next = %s' % (self, next)
        self.next = next
    def __repr__(self):
        return '%s(%s)' % (self.__class__.__name__, self.name)
# Construct a graph cycle
one = Graph('one')
two = Graph('two')
three = Graph('three')
one.set_next(two)
two.set_next(three)
three.set_next(one)
print
print 'three refers to:'
for r in gc.get_referents(three):
    pprint.pprint(r)
In this case, the Graph instance three holds references to its instance dictionary (in the __dict__ attribute) and its class.
$ python gc_get_referents.py
Linking nodes Graph(one).next = Graph(two)
Linking nodes Graph(two).next = Graph(three)
Linking nodes Graph(three).next = Graph(one)
three refers to:
{'name': 'three', 'next': Graph(one)}
<class '__main__.Graph'>
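The same inspection can be sketched in Python 3 syntax. This is only an illustration, not part of the original listing: the Node class and the referenced_nodes() helper are hypothetical names. It assumes that, depending on the interpreter version, the attribute values may be reported either directly or through the instance dictionary, so the helper looks one level into any dictionary it finds.

```python
import gc


class Node:
    """Hypothetical stand-in for the Graph class used above."""

    def __init__(self, name):
        self.name = name
        self.next = None

    def __repr__(self):
        return 'Node(%s)' % self.name


def referenced_nodes(obj):
    """Return the Node objects that obj refers to.

    Attribute values may show up directly or inside the instance
    dictionary, so look one level into any dict that is reported.
    """
    found = []
    for r in gc.get_referents(obj):
        candidates = r.values() if isinstance(r, dict) else [r]
        for c in candidates:
            # Compare by identity to avoid listing a node twice.
            if isinstance(c, Node) and not any(c is f for f in found):
                found.append(c)
    return found


one = Node('one')
three = Node('three')
three.next = one

print('three refers to:', referenced_nodes(three))
```

Filtering on the node type keeps the report focused on the data structure itself and hides the class and the attribute strings.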
The next example uses a Queue to perform a breadth-first traversal of all of the object references, looking for cycles. The items inserted into the queue are tuples containing the reference chain so far and the next object to examine. It starts with three and looks at everything it refers to. Skipping classes avoids looking at methods, modules, etc.
import gc
import pprint
import Queue
class Graph(object):
    def __init__(self, name):
        self.name = name
        self.next = None
    def set_next(self, next):
        print 'Linking nodes %s.next = %s' % (self, next)
        self.next = next
    def __repr__(self):
        return '%s(%s)' % (self.__class__.__name__, self.name)
# Construct a graph cycle
one = Graph('one')
two = Graph('two')
three = Graph('three')
one.set_next(two)
two.set_next(three)
three.set_next(one)
print
seen = set()
to_process = Queue.Queue()
# Start with an empty object chain and Graph three.
to_process.put( ([], three) )
# Look for cycles, building the object chain for each object we find
# in the queue so we can print the full cycle when we're done.
while not to_process.empty():
    chain, next = to_process.get()
    chain = chain[:]
    chain.append(next)
    print 'Examining:', repr(next)
    seen.add(id(next))
    for r in gc.get_referents(next):
        if isinstance(r, basestring) or isinstance(r, type):
            # Ignore strings and classes
            pass
        elif id(r) in seen:
            print
            print 'Found a cycle to %s:' % r
            for i, link in enumerate(chain):
                print ' %d: ' % i,
                pprint.pprint(link)
        else:
            to_process.put( (chain, r) )
The cycle in the nodes is easily found by watching for objects that have already been processed. To avoid holding references to those objects, their id() values are cached in a set. The dictionary objects found in the cycle are the __dict__ values for the Graph instances, and hold their instance attributes.
$ python gc_get_referents_cycles.py
Linking nodes Graph(one).next = Graph(two)
Linking nodes Graph(two).next = Graph(three)
Linking nodes Graph(three).next = Graph(one)
Examining: Graph(three)
Examining: {'name': 'three', 'next': Graph(one)}
Examining: Graph(one)
Examining: {'name': 'one', 'next': Graph(two)}
Examining: Graph(two)
Examining: {'name': 'two', 'next': Graph(three)}
Found a cycle to Graph(three):
0: Graph(three)
1: {'name': 'three', 'next': Graph(one)}
2: Graph(one)
3: {'name': 'one', 'next': Graph(two)}
4: Graph(two)
5: {'name': 'two', 'next': Graph(three)}
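The traversal can also be packaged as a reusable function. The following is a minimal sketch in Python 3 syntax, not the original program: Node and find_cycle() are hypothetical names, and collections.deque stands in for the Queue class. It returns the reference chain instead of printing it, and it stops at the first chain that leads back to the starting object.

```python
import gc
from collections import deque


class Node:
    """Hypothetical stand-in for the Graph class used above."""

    def __init__(self, name):
        self.name = name
        self.next = None

    def __repr__(self):
        return 'Node(%s)' % self.name


def find_cycle(start):
    """Breadth-first search for a reference chain leading back to start.

    Returns the chain as a list beginning with start, or None if no
    chain of references returns to it.
    """
    seen = {id(start)}
    to_process = deque([[start]])
    while to_process:
        chain = to_process.popleft()
        for r in gc.get_referents(chain[-1]):
            if isinstance(r, (str, type)):
                # Ignore strings and classes, as the listing above does.
                continue
            if r is start:
                return chain
            if id(r) not in seen:
                seen.add(id(r))
                to_process.append(chain + [r])
    return None


one, two, three = Node('one'), Node('two'), Node('three')
one.next, two.next, three.next = two, three, one

cycle = find_cycle(three)
print('Cycle through:', [link for link in cycle if isinstance(link, Node)])
```

Whether the instance dictionaries appear in the chain depends on the interpreter version, which is why the report filters on the node type.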
Forcing Garbage Collection
Although the garbage collector runs automatically as the interpreter executes your program, you may want to trigger collection to run at a specific time when you know there are a lot of objects to free or there is not much work happening in your application. Trigger collection using collect().
import gc
import pprint
class Graph(object):
    def __init__(self, name):
        self.name = name
        self.next = None
    def set_next(self, next):
        print 'Linking nodes %s.next = %s' % (self, next)
        self.next = next
    def __repr__(self):
        return '%s(%s)' % (self.__class__.__name__, self.name)
# Construct a graph cycle
one = Graph('one')
two = Graph('two')
three = Graph('three')
one.set_next(two)
two.set_next(three)
three.set_next(one)
print
# Remove references to the graph nodes in this module's namespace
one = two = three = None
# Show the effect of garbage collection
for i in range(2):
    print 'Collecting %d ...' % i
    n = gc.collect()
    print 'Unreachable objects:', n
    print 'Remaining Garbage:',
    pprint.pprint(gc.garbage)
    print
In this example, the cycle is cleared as soon as collection runs the first time, since nothing refers to the Graph nodes except themselves. collect() returns the number of “unreachable” objects it found. In this case, the value is 6: the three Graph instances plus their three instance attribute dictionaries.
$ python gc_collect.py
Linking nodes Graph(one).next = Graph(two)
Linking nodes Graph(two).next = Graph(three)
Linking nodes Graph(three).next = Graph(one)
Collecting 0 ...
Unreachable objects: 6
Remaining Garbage:[]
Collecting 1 ...
Unreachable objects: 0
Remaining Garbage:[]
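One way to confirm that collect() really freed the cycle is to watch a node through a weak reference. This is a sketch in Python 3 syntax under a few assumptions: Node is a hypothetical stand-in for Graph, automatic collection is switched off for the duration so the explicit call does the work, and the exact count returned by collect() is not checked because it varies between interpreter versions.

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for the Graph class used above."""

    def __init__(self, name):
        self.name = name
        self.next = None


was_enabled = gc.isenabled()
gc.disable()   # keep automatic collection from running mid-demo
gc.collect()   # start from a clean slate

one, two, three = Node('one'), Node('two'), Node('three')
one.next, two.next, three.next = two, three, one

probe = weakref.ref(one)   # observes "one" without keeping it alive
one = two = three = None   # only the cycle keeps the nodes alive now

alive_before = probe() is not None
unreachable = gc.collect()
alive_after = probe() is not None

if was_enabled:
    gc.enable()

print('Alive before collect():', alive_before)
print('Unreachable objects found:', unreachable)
print('Alive after collect():', alive_after)
```

A weak reference is used because any ordinary reference held by the test code would keep the node reachable and hide the effect.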
If Graph has a __del__() method, however, the garbage collector cannot break the cycle.
import gc
import pprint
class Graph(object):
    def __init__(self, name):
        self.name = name
        self.next = None
    def set_next(self, next):
        print '%s.next = %s' % (self, next)
        self.next = next
    def __repr__(self):
        return '%s(%s)' % (self.__class__.__name__, self.name)
    def __del__(self):
        print '%s.__del__()' % self
# Construct a graph cycle
one = Graph('one')
two = Graph('two')
three = Graph('three')
one.set_next(two)
two.set_next(three)
three.set_next(one)
# Remove references to the graph nodes in this module's namespace
one = two = three = None
# Show the effect of garbage collection
print 'Collecting...'
n = gc.collect()
print 'Unreachable objects:', n
print 'Remaining Garbage:',
pprint.pprint(gc.garbage)
Because more than one object in the cycle has a finalizer method, the order in which the objects need to be finalized and then garbage collected cannot be determined, so the garbage collector plays it safe and keeps the objects.
$ python gc_collect_with_del.py
Graph(one).next = Graph(two)
Graph(two).next = Graph(three)
Graph(three).next = Graph(one)
Collecting...
Unreachable objects: 6
Remaining Garbage:[Graph(one), Graph(two), Graph(three)]
When the cycle is broken, the Graph instances can be collected.
import gc
import pprint
class Graph(object):
    def __init__(self, name):
        self.name = name
        self.next = None
    def set_next(self, next):
        print 'Linking nodes %s.next = %s' % (self, next)
        self.next = next
    def __repr__(self):
        return '%s(%s)' % (self.__class__.__name__, self.name)
    def __del__(self):
        print '%s.__del__()' % self
# Construct a graph cycle
one = Graph('one')
two = Graph('two')
three = Graph('three')
one.set_next(two)
two.set_next(three)
three.set_next(one)
# Remove references to the graph nodes in this module's namespace
one = two = three = None
# Collecting now keeps the objects as uncollectable
print
print 'Collecting...'
n = gc.collect()
print 'Unreachable objects:', n
print 'Remaining Garbage:',
pprint.pprint(gc.garbage)
# Break the cycle
print
print 'Breaking the cycle'
gc.garbage[0].set_next(None)
print 'Removing references in gc.garbage'
del gc.garbage[:]
# Now the objects are removed
print
print 'Collecting...'
n = gc.collect()
print 'Unreachable objects:', n
print 'Remaining Garbage:',
pprint.pprint(gc.garbage)
Because gc.garbage holds a reference to the objects from the previous garbage collection run, it needs to be cleared out after the cycle is broken to reduce the reference counts so they can be finalized and freed.
$ python gc_collect_break_cycle.py
Linking nodes Graph(one).next = Graph(two)
Linking nodes Graph(two).next = Graph(three)
Linking nodes Graph(three).next = Graph(one)
Collecting...
Unreachable objects: 6
Remaining Garbage:[Graph(one), Graph(two), Graph(three)]
Breaking the cycle
Linking nodes Graph(one).next = None
Removing references in gc.garbage
Graph(two).__del__()
Graph(three).__del__()
Graph(one).__del__()
Collecting...
Unreachable objects: 0
Remaining Garbage:[]
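The uncollectable cycles shown in this section are specific to the Python 2 interpreters these listings target. On CPython 3.4 and later, PEP 442 changed finalization so that cycles whose objects define __del__() can be collected without manual intervention. The sketch below, written in Python 3 syntax with a hypothetical Node class, is one way to check that on a given interpreter. It records which finalizers ran instead of printing from them.

```python
import gc

finalized = []   # names of nodes whose __del__() has run


class Node:
    """Hypothetical stand-in for the Graph class used above."""

    def __init__(self, name):
        self.name = name
        self.next = None

    def __del__(self):
        finalized.append(self.name)


was_enabled = gc.isenabled()
gc.disable()   # make the explicit collect() below do the work
gc.collect()

one, two, three = Node('one'), Node('two'), Node('three')
one.next, two.next, three.next = two, three, one
one = two = three = None

gc.collect()
if was_enabled:
    gc.enable()

leftover = [o for o in gc.garbage if isinstance(o, Node)]
print('Finalized:', sorted(finalized))
print('Nodes left in gc.garbage:', leftover)
```

The order in which the finalizers run within the cycle is not something to rely on, so the names are sorted before they are reported.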
Finding References to Objects that Can’t be Collected
Looking for the object holding a reference to something in the garbage is a little trickier than seeing what an object references. Because the code asking about the reference needs to hold a reference itself, some of the referrers need to be ignored. This example creates a graph cycle, then works through the Graph instances in the garbage and removes the reference in the “parent” node.
import gc
import pprint
import Queue
class Graph(object):
    def __init__(self, name):
        self.name = name
        self.next = None
    def set_next(self, next):
        print 'Linking nodes %s.next = %s' % (self, next)
        self.next = next
    def __repr__(self):
        return '%s(%s)' % (self.__class__.__name__, self.name)
    def __del__(self):
        print '%s.__del__()' % self
# Construct a graph cycle
one = Graph('one')
two = Graph('two')
three = Graph('three')
one.set_next(two)
two.set_next(three)
three.set_next(one)
# Remove references to the graph nodes in this module's namespace
one = two = three = None
# Collecting now keeps the objects as uncollectable
print
print 'Collecting...'
n = gc.collect()
print 'Unreachable objects:', n
print 'Remaining Garbage:',
pprint.pprint(gc.garbage)
REFERRERS_TO_IGNORE = [ locals(), globals(), gc.garbage ]
def find_referring_graphs(obj):
    print 'Looking for references to %s' % repr(obj)
    referrers = (r for r in gc.get_referrers(obj)
                 if r not in REFERRERS_TO_IGNORE)
    for ref in referrers:
        if isinstance(ref, Graph):
            # A graph node
            yield ref
        elif isinstance(ref, dict):
            # An instance or other namespace dictionary
            for parent in find_referring_graphs(ref):
                yield parent
# Look for objects that refer to the objects that remain in
# gc.garbage.
print
print 'Clearing referrers:'
for obj in gc.garbage:
    for ref in find_referring_graphs(obj):
        ref.set_next(None)
        del ref # remove local reference so the node can be deleted
    del obj # remove local reference so the node can be deleted
# Clear references held by gc.garbage
print
print 'Clearing gc.garbage:'
del gc.garbage[:]
# Everything should have been freed this time
print
print 'Collecting...'
n = gc.collect()
print 'Unreachable objects:', n
print 'Remaining Garbage:',
pprint.pprint(gc.garbage)
This sort of logic is overkill if you understand why the cycles are being created in the first place, but if you have an unexplained cycle in your data, using get_referrers() can expose the unexpected relationship.
$ python gc_get_referrers.py
Linking nodes Graph(one).next = Graph(two)
Linking nodes Graph(two).next = Graph(three)
Linking nodes Graph(three).next = Graph(one)
Collecting...
Unreachable objects: 6
Remaining Garbage:[Graph(one), Graph(two), Graph(three)]
Clearing referrers:
Looking for references to Graph(one)
Looking for references to {'name': 'three', 'next': Graph(one)}
Linking nodes Graph(three).next = None
Looking for references to Graph(two)
Looking for references to {'name': 'one', 'next': Graph(two)}
Linking nodes Graph(one).next = None
Looking for references to Graph(three)
Looking for references to {'name': 'two', 'next': Graph(three)}
Linking nodes Graph(two).next = None
Clearing gc.garbage:
Graph(three).__del__()
Graph(two).__del__()
Graph(one).__del__()
Collecting...
Unreachable objects: 0
Remaining Garbage:[]
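The core of the technique, asking which node holds a reference to a given object, can be reduced to a small helper. This is a sketch in Python 3 syntax, not the program above: Node and owners() are hypothetical names. It assumes a referrer may be reported either as the node itself or as the node's instance dictionary, depending on the interpreter version, and it simply ignores every other kind of referrer (namespaces, frames, temporary lists).

```python
import gc


class Node:
    """Hypothetical stand-in for the Graph class used above."""

    def __init__(self, name):
        self.name = name
        self.next = None

    def __repr__(self):
        return 'Node(%s)' % self.name


def owners(obj):
    """Return the Node objects that hold a reference to obj."""
    found = {}   # keyed by id() so each owner is reported once
    for ref in gc.get_referrers(obj):
        if isinstance(ref, Node):
            found[id(ref)] = ref
        elif isinstance(ref, dict):
            # Possibly an instance dictionary: see which node owns it.
            for holder in gc.get_referrers(ref):
                if isinstance(holder, Node):
                    found[id(holder)] = holder
    return list(found.values())


one, two, three = Node('one'), Node('two'), Node('three')
one.next, two.next, three.next = two, three, one

print('Referring to one:', owners(one))
print('Referring to two:', owners(two))
```

Filtering by type plays the same role as the REFERRERS_TO_IGNORE list in the listing above: the module namespace also refers to the nodes, but it is not a Node, so it never reaches the result.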
Collection Thresholds and Generations
The garbage collector maintains three lists of objects it sees as it runs, one for each “generation” tracked by the collector. As objects are examined in each generation, they are either collected or they age into subsequent generations until they finally reach the oldest generation, which is examined least often.
The collector routines can be tuned to run at different frequencies. Generation 0 is examined when the number of allocations minus the number of deallocations since the last run exceeds its threshold. The thresholds for generations 1 and 2 instead count how many times the next-younger generation has been examined since the older generation was last examined. The current thresholds can be examined with get_threshold().
import gc
print gc.get_threshold()
The return value is a tuple with the threshold for each generation.
$ python gc_get_threshold.py
(700, 10, 10)
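To see how close each generation is to its threshold, the thresholds can be compared with the collector's current counters from gc.get_count(). The snippet below is a small sketch in Python 3 syntax. The values it prints depend on the interpreter and on what has already run, so none are shown here.

```python
import gc

thresholds = gc.get_threshold()   # one threshold per generation
counts = gc.get_count()           # current counter per generation

for generation, (limit, current) in enumerate(zip(thresholds, counts)):
    print('generation %d: threshold=%d current count=%d'
          % (generation, limit, current))
```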
The thresholds can be changed with set_threshold(). This example program reads the threshold for generation 0 from the command line, adjusts the gc settings, then allocates a series of objects.
import gc
import pprint
import sys
try:
    threshold = int(sys.argv[1])
except (IndexError, ValueError, TypeError):
    print 'Missing or invalid threshold, using default'
    threshold = 5
class MyObj(object):
    def __init__(self, name):
        self.name = name
        print 'Created', self.name
gc.set_debug(gc.DEBUG_STATS)
gc.set_threshold(threshold, 1, 1)
print 'Thresholds:', gc.get_threshold()
print 'Clear the collector by forcing a run'
gc.collect()
print
print 'Creating objects'
objs = []
for i in range(10):
    objs.append(MyObj(i))
Different threshold values introduce the garbage collection sweeps at different times, shown here because debugging is enabled.
$ python -u gc_threshold.py 5
Thresholds: (5, 1, 1)
Clear the collector by forcing a run
gc: collecting generation 2...
gc: objects in each generation: 144 3163 0
gc: done, 0.0005s elapsed.
Creating objects
gc: collecting generation 0...
gc: objects in each generation: 7 0 3234
gc: done, 0.0000s elapsed.
Created 0
Created 1
Created 2
Created 3
Created 4
gc: collecting generation 0...
gc: objects in each generation: 6 4 3234
gc: done, 0.0000s elapsed.
Created 5
Created 6
Created 7
Created 8
Created 9
gc: collecting generation 2...
gc: objects in each generation: 5 6 3232
gc: done, 0.0004s elapsed.
A smaller threshold causes the sweeps to run more frequently.
$ python -u gc_threshold.py 2
Thresholds: (2, 1, 1)
Clear the collector by forcing a run
gc: collecting generation 2...
gc: objects in each generation: 144 3163 0
gc: done, 0.0004s elapsed.
Creating objects
gc: collecting generation 0...
gc: objects in each generation: 3 0 3234
gc: done, 0.0000s elapsed.
gc: collecting generation 0...
gc: objects in each generation: 4 3 3234
gc: done, 0.0000s elapsed.
Created 0
Created 1
gc: collecting generation 1...
gc: objects in each generation: 3 4 3234
gc: done, 0.0000s elapsed.
Created 2
Created 3
Created 4
gc: collecting generation 0...
gc: objects in each generation: 5 0 3239
gc: done, 0.0000s elapsed.
Created 5
Created 6
Created 7
gc: collecting generation 0...
gc: objects in each generation: 5 3 3239
gc: done, 0.0000s elapsed.
Created 8
Created 9
gc: collecting generation 2...
gc: objects in each generation: 2 6 3235
gc: done, 0.0004s elapsed.
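On Python 3.3 and later, the effect of a threshold can also be observed from inside the program through the gc.callbacks list, rather than by reading the debug output. The following is a rough sketch under that assumption. The record() function and the collections_seen counter are hypothetical names, and the number of runs it reports will differ between interpreter versions.

```python
import gc
from collections import Counter

collections_seen = Counter()   # generation number -> collections started


def record(phase, info):
    # The collector calls this before ("start") and after ("stop") each run.
    if phase == 'start':
        collections_seen[info['generation']] += 1


original = gc.get_threshold()
was_enabled = gc.isenabled()
gc.enable()
gc.callbacks.append(record)
gc.set_threshold(5, 1, 1)
try:
    # Lists are container objects, so each allocation counts toward
    # the generation 0 threshold.
    objs = [[i] for i in range(1000)]
finally:
    # Put the collector back the way it was found.
    gc.set_threshold(*original)
    gc.callbacks.remove(record)
    if not was_enabled:
        gc.disable()

print('Collections per generation:', dict(collections_seen))
```

Restoring the original thresholds in a finally block matters in a long-running program, since a very low threshold left in place would slow down everything that runs afterwards.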
Debugging
Debugging memory leaks can be challenging. gc includes several options to expose the inner workings to make the job easier. The options are bit-flags meant to be combined and passed to set_debug() to configure the garbage collector while your program is running. Debugging information is printed to stderr.
The DEBUG_STATS flag turns on statistics reporting, causing the garbage collector to report when it is running, the number of objects tracked for each generation, and the amount of time it took to perform the sweep.
import gc
gc.set_debug(gc.DEBUG_STATS)
gc.collect()
This example output shows two separate runs of the collector because it runs once when it is invoked explicitly, and a second time when the interpreter exits.
$ python gc_debug_stats.py
gc: collecting generation 2...
gc: objects in each generation: 667 2808 0
gc: done, 0.0005s elapsed.
gc: collecting generation 2...
gc: objects in each generation: 0 0 3164
gc: done, 0.0004s elapsed.
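Because the debug settings are global to the interpreter, it can be useful to save them with get_debug() and put them back afterwards. This is a minimal sketch in Python 3 syntax; the variable names are illustrative only.

```python
import gc

previous = gc.get_debug()   # remember the current settings

gc.set_debug(gc.DEBUG_STATS)
stats_enabled = bool(gc.get_debug() & gc.DEBUG_STATS)

gc.set_debug(previous)      # restore, so later code is unaffected
stats_restored = gc.get_debug() == previous

print('DEBUG_STATS enabled:', stats_enabled)
print('Settings restored:', stats_restored)
```

Testing a flag with the & operator works because each option occupies its own bit in the integer returned by get_debug().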
Enabling DEBUG_COLLECTABLE and DEBUG_UNCOLLECTABLE causes the collector to report on whether each object it examines can or cannot be collected. You need to combine these flags with DEBUG_OBJECTS so gc will print information about the objects being held.
import gc
flags = (gc.DEBUG_COLLECTABLE |
gc.DEBUG_UNCOLLECTABLE |
gc.DEBUG_OBJECTS
)
gc.set_debug(flags)
class Graph(object):
    def __init__(self, name):
        self.name = name
        self.next = None
        print 'Creating %s 0x%x (%s)' % (self.__class__.__name__, id(self), name)
    def set_next(self, next):
        print 'Linking nodes %s.next = %s' % (self, next)
        self.next = next
    def __repr__(self):
        return '%s(%s)' % (self.__class__.__name__, self.name)
class CleanupGraph(Graph):
    def __del__(self):
        print '%s.__del__()' % self
# Construct a graph cycle
one = Graph('one')
two = Graph('two')
one.set_next(two)
two.set_next(one)
# Construct another node that stands on its own
three = CleanupGraph('three')
# Construct a graph cycle with a finalizer
four = CleanupGraph('four')
five = CleanupGraph('five')
four.set_next(five)
five.set_next(four)
# Remove references to the graph nodes in this module's namespace
one = two = three = four = five = None
print
# Force a sweep
print 'Collecting'
gc.collect()
print 'Done'
The two classes Graph and CleanupGraph are constructed so it is possible to create structures that are automatically collectable and structures where cycles need to be explicitly broken by the user.
The output shows that the Graph instances one and two create a cycle, but are still collectable because they do not have a finalizer and their only incoming references are from other objects that can be collected. Although CleanupGraph has a finalizer, three is reclaimed as soon as its reference count goes to zero. In contrast, four and five create a cycle and cannot be freed.
$ python -u gc_debug_collectable_objects.py
Creating Graph 0x100460750 (one)
Creating Graph 0x100460790 (two)
Linking nodes Graph(one).next = Graph(two)
Linking nodes Graph(two).next = Graph(one)
Creating CleanupGraph 0x1004607d0 (three)
Creating CleanupGraph 0x100460810 (four)
Creating CleanupGraph 0x100460850 (five)
Linking nodes CleanupGraph(four).next = CleanupGraph(five)
Linking nodes CleanupGraph(five).next = CleanupGraph(four)
CleanupGraph(three).__del__()
Collecting
gc: collectable <Graph 0x100460750>
gc: collectable <Graph 0x100460790>
gc: collectable <dict 0x100360b30>
gc: collectable <dict 0x100361dc0>
gc: uncollectable <CleanupGraph 0x100460810>
gc: uncollectable <CleanupGraph 0x100460850>
gc: uncollectable <dict 0x100361ee0>
gc: uncollectable <dict 0x100362240>
Done
The flag DEBUG_INSTANCES works much the same way for instances of old-style classes (those not derived from object).
import gc
flags = (gc.DEBUG_COLLECTABLE |
gc.DEBUG_UNCOLLECTABLE |
gc.DEBUG_INSTANCES
)
gc.set_debug(flags)
class Graph:
    def __init__(self, name):
        self.name = name
        self.next = None
        print 'Creating %s 0x%x (%s)' % (self.__class__.__name__, id(self), name)
    def set_next(self, next):
        print 'Linking nodes %s.next = %s' % (self, next)
        self.next = next
    def __repr__(self):
        return '%s(%s)' % (self.__class__.__name__, self.name)
class CleanupGraph(Graph):
    def __del__(self):
        print '%s.__del__()' % self
# Construct a graph cycle
one = Graph('one')
two = Graph('two')
one.set_next(two)
two.set_next(one)
# Construct another node that stands on its own
three = CleanupGraph('three')
# Construct a graph cycle with a finalizer
four = CleanupGraph('four')
five = CleanupGraph('five')
four.set_next(five)
five.set_next(four)
# Remove references to the graph nodes in this module's namespace
one = two = three = four = five = None
print
# Force a sweep
print 'Collecting'
gc.collect()
print 'Done'
In this case, however, the dict objects holding the instance attributes are not included in the output.
$ python -u gc_debug_collectable_instances.py
Creating Graph 0x100469710 (one)
Creating Graph 0x100469758 (two)
Linking nodes Graph(one).next = Graph(two)
Linking nodes Graph(two).next = Graph(one)
Creating CleanupGraph 0x1004697e8 (three)
Creating CleanupGraph 0x100469830 (four)
Creating CleanupGraph 0x100469878 (five)
Linking nodes CleanupGraph(four).next = CleanupGraph(five)
Linking nodes CleanupGraph(five).next = CleanupGraph(four)
CleanupGraph(three).__del__()
Collecting
gc: collectable <Graph instance at 0x100469710>
gc: collectable <Graph instance at 0x100469758>
gc: uncollectable <CleanupGraph instance at 0x100469830>
gc: uncollectable <CleanupGraph instance at 0x100469878>
Done
If seeing the uncollectable objects is not enough information to understand where data is being retained, you can enable DEBUG_SAVEALL. It causes gc to preserve every unreachable object it finds in the garbage list instead of freeing it, so you can examine the objects. This is helpful if, for example, you don’t have access to the constructor to print the object id when each object is created.
import gc
flags = (gc.DEBUG_COLLECTABLE |
gc.DEBUG_UNCOLLECTABLE |
gc.DEBUG_OBJECTS |
gc.DEBUG_SAVEALL
)
gc.set_debug(flags)
class Graph(object):
    def __init__(self, name):
        self.name = name
        self.next = None
    def set_next(self, next):
        self.next = next
    def __repr__(self):
        return '%s(%s)' % (self.__class__.__name__, self.name)
class CleanupGraph(Graph):
    def __del__(self):
        print '%s.__del__()' % self
# Construct a graph cycle
one = Graph('one')
two = Graph('two')
one.set_next(two)
two.set_next(one)
# Construct another node that stands on its own
three = CleanupGraph('three')
# Construct a graph cycle with a finalizer
four = CleanupGraph('four')
five = CleanupGraph('five')
four.set_next(five)
five.set_next(four)
# Remove references to the graph nodes in this module's namespace
one = two = three = four = five = None
# Force a sweep
print 'Collecting'
gc.collect()
print 'Done'
# Report on what was left
for o in gc.garbage:
    if isinstance(o, Graph):
        print 'Retained: %s 0x%x' % (o, id(o))
$ python -u gc_debug_saveall.py
CleanupGraph(three).__del__()
Collecting
gc: collectable <Graph 0x100460790>
gc: collectable <Graph 0x1004607d0>
gc: collectable <dict 0x100361990>
gc: collectable <dict 0x100361db0>
gc: uncollectable <CleanupGraph 0x100460850>
gc: uncollectable <CleanupGraph 0x100460890>
gc: uncollectable <dict 0x100361ed0>
gc: uncollectable <dict 0x100362230>
Done
Retained: Graph(one) 0x100460790
Retained: Graph(two) 0x1004607d0
Retained: CleanupGraph(four) 0x100460850
Retained: CleanupGraph(five) 0x100460890
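The retention behavior of DEBUG_SAVEALL can be checked in isolation with a much smaller program. This is a sketch in Python 3 syntax with a hypothetical Node class. It builds a single collectable cycle, confirms the nodes end up in gc.garbage, and then undoes the debugging setup so the objects can be freed.

```python
import gc


class Node:
    """Hypothetical stand-in for the Graph class used above."""

    def __init__(self, name):
        self.name = name
        self.next = None


was_enabled = gc.isenabled()
gc.disable()   # make the explicit collect() below do the work
gc.collect()
previous = gc.get_debug()

gc.set_debug(gc.DEBUG_SAVEALL)
one, two = Node('one'), Node('two')
one.next, two.next = two, one
one = two = None
gc.collect()

# The cycle was collectable, but DEBUG_SAVEALL parked it in gc.garbage.
retained = sorted(o.name for o in gc.garbage if isinstance(o, Node))
print('Retained:', retained)

# Undo the debugging setup and let the saved objects go.
gc.set_debug(previous)
del gc.garbage[:]
gc.collect()
if was_enabled:
    gc.enable()
```

Clearing gc.garbage is not enough on its own to free the nodes, because they still refer to each other, so one more collect() is needed after the flag is reset.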
For simplicity, DEBUG_LEAK is defined as a combination of the flags needed to report on a leaking program: all of the other options except DEBUG_STATS.
import gc
flags = gc.DEBUG_LEAK
gc.set_debug(flags)
class Graph(object):
    def __init__(self, name):
        self.name = name
        self.next = None
    def set_next(self, next):
        self.next = next
    def __repr__(self):
        return '%s(%s)' % (self.__class__.__name__, self.name)
class CleanupGraph(Graph):
    def __del__(self):
        print '%s.__del__()' % self
# Construct a graph cycle
one = Graph('one')
two = Graph('two')
one.set_next(two)
two.set_next(one)
# Construct another node that stands on its own
three = CleanupGraph('three')
# Construct a graph cycle with a finalizer
four = CleanupGraph('four')
five = CleanupGraph('five')
four.set_next(five)
five.set_next(four)
# Remove references to the graph nodes in this module's namespace
one = two = three = four = five = None
# Force a sweep
print 'Collecting'
gc.collect()
print 'Done'
# Report on what was left
for o in gc.garbage:
    if isinstance(o, Graph):
        print 'Retained: %s 0x%x' % (o, id(o))
Keep in mind that because DEBUG_SAVEALL is enabled by DEBUG_LEAK, even the unreferenced objects that would normally have been collected and deleted are retained.
$ python -u gc_debug_leak.py
CleanupGraph(three).__del__()
Collecting
gc: collectable <Graph 0x100460790>
gc: collectable <Graph 0x1004607d0>
gc: collectable <dict 0x100360b20>
gc: collectable <dict 0x100361d20>
gc: uncollectable <CleanupGraph 0x100460850>
gc: uncollectable <CleanupGraph 0x100460890>
gc: uncollectable <dict 0x100361e40>
gc: uncollectable <dict 0x1003621a0>
Done
Retained: Graph(one) 0x100460790
Retained: Graph(two) 0x1004607d0
Retained: CleanupGraph(four) 0x100460850
Retained: CleanupGraph(five) 0x100460890
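The makeup of DEBUG_LEAK can be checked directly, since the flags are plain integers. The sketch below uses Python 3, where the DEBUG_INSTANCES and DEBUG_OBJECTS flags no longer exist, so the combination there consists of the three remaining flags. The variable names are illustrative only.

```python
import gc

# On Python 3 the leak-debugging combination is made of these three flags.
expected = gc.DEBUG_COLLECTABLE | gc.DEBUG_UNCOLLECTABLE | gc.DEBUG_SAVEALL

leak_is_combination = gc.DEBUG_LEAK == expected
leak_includes_saveall = bool(gc.DEBUG_LEAK & gc.DEBUG_SAVEALL)
leak_includes_stats = bool(gc.DEBUG_LEAK & gc.DEBUG_STATS)

print('DEBUG_LEAK matches the combination:', leak_is_combination)
print('Includes DEBUG_SAVEALL:', leak_includes_saveall)
print('Includes DEBUG_STATS:', leak_includes_stats)
```

If statistics are wanted alongside the leak report, DEBUG_STATS has to be added explicitly, for example with gc.DEBUG_LEAK | gc.DEBUG_STATS.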
See also
- gc
- The standard library documentation for this module.
- weakref
- The weakref module gives you references to objects without increasing their reference count, so they can still be garbage collected.
- Supporting Cyclic Garbage Collection
- Background material from Python’s C API documentation.
- How does Python manage memory?
- An article on Python memory management by Fredrik Lundh.