Manual memory management leaves us open to errors:
// allocate Object* a = new Object; // de-allocate delete a;
Most modern languages do not require the programmer to manage memory themselves.
Instead they provide garbage collectors which automatically free unused objects. Garbage collection was first created for the LISP language.
A simple way of performing garbage collection is called reference counting. It works by keeping track of how many pointers are pointing to a given object.
Each time a pointer is pointed at an object, the counts are updated.
Reference counting can be implemented in a language like C++. This is typically called smart pointers.
The C++ smart_ptr class can be used for that as in this example.
The downside of reference counting is that it does not catch cyclical references.
A cycle of references is present in several data structures such as a circularly linked list:
Because each node has a reference to it from another node, its count will never hit zero, even when the head pointer is removed.
This example demonstrates this problem with smart_ptr.
Reference counting is usually not used for garbage collection for this reason (PHP and Perl are exceptions).
Garbage collectors work by tracing all of the objects in a program. If we start with all of the objects that can be directly accessed, then scan out, we will reach all accessible objects. Any object not reached is garbage.
Objects that are directly accessible: local variables and globals.
Any object that is part of the root set, or accessible indirectly from the root set.
Any object on the heap that is not reachable.
The basic garbage collection algorithm is called mark and sweep. It works as follows:
In a compacting garbage collector, the heap is divided into live and dead sections:
Once the set of reachable objects have been found, they are moved to the live section. This compacts the live objects together.
This means we have to move every live object, but has a few benefits:
In typical programs, there tend to be two types of objects:
Objects that exist for most of the program are allocated early. Recently allocated objects are more likely to become garbage sooner. This is called infant mortality.
A generational garbage collector maintains several generations of objects:
A generational collector works as follows:
Generational GC has a few benefits:
Python uses generational garbage collection as can be seen in the Python source code. The main collection function is on line 821.
Simple garbage collection halts the running program to perform collection. This can produce noticeable lags in execution.
To avoid this, modern garbage collectors use incremental collection. This means that execution and collection are interleaved, usually in separate threads.This presents many challenges:
Incremental collectors usually keep objects in three categories:
Incremental collection is complex and must still pause execution in some instances.
Many collectors can also be explicitly invoked by the programmer. For example, in Java:
Garbage collection involves some overhead, but modern garbage collectors are quite well optimized.
Also manual allocation and de-allocation has an overhead as well. Consider the following two programs:
The Java program will run much faster because of GC optimizations.
Manual memory management is not always faster. Large programs in C or C++ typically try to optimize memory usage by pooling objects and reusing space.
Copyright © 2019 Ian Finlayson | Licensed under a Creative Commons Attribution 4.0 International License.