References and Memory

Overview

Now we will turn our attention to the way that memory works in Java programs. This will be critical to our study of data structures. A data structure works by arranging values in memory so they can be used effectively, so it is hard to use one well without knowing what's going on in memory.

Many of our data structures will also rely on references, which are variables that store the location of an object in memory.

To start understanding references, suppose we have a class declaration like this in Java:


public class Person {
    private String name;
    private int birthYear;

    private Person bestFriend;

    // ...
}

Here, we have a class declaration with a number of variables declared inside of it. A Person object contains a name, birth year, and who the person's best friend is.

The way we store the best friend, however, appears to be by including a Person object inside of the Person. That inner Person must then include a Person for its own best friend, and so on. Is it possible to have objects of the same type nested inside of each other like this?

A picture of a Person object with name, birthYear and bestFriend
stored inside it. The bestFriend is another Person object with those
things, and so on. — An (incorrect) view of how the Person class will be stored in memory

References

The answer to this riddle is that declaring a variable of a class type in Java does not actually create an object in memory. Instead, it creates a reference to an object, which may be created later.

A reference (which is also called a pointer) is essentially a variable which holds the address in memory of an object. When you declare a field of a class type, Java creates the reference variable and initializes it to null, which means it refers to no object (null is often pictured as memory address 0). Local variables are not initialized automatically and must be assigned a value before they are used.

So the way that a Person object is actually stored in memory would look like this:

The Person object stores a 4 byte integer for the birth year, 8 bytes for
the name (which is a reference to a String), and 8 bytes for the best friend (which
is a reference to a Person) — The actual memory layout of a Person object
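
To see that a reference field holds either null or the location of some object, rather than a nested object, here is a minimal sketch. The Node and ReferenceDefaults classes are hypothetical and exist only for this illustration; they are not part of the Person example.

```java
// Hypothetical class for illustration only.
class Node {
    String label; // reference field: starts out as null
    int count;    // primitive field: starts out as 0
    Node next;    // reference to another Node: starts out as null
}

class ReferenceDefaults {
    // Builds a description of the fields of a Node.
    static String describe(Node n) {
        return "label=" + n.label + ", count=" + n.count + ", next=" + n.next;
    }

    public static void main(String[] args) {
        Node n = new Node(); // creates one Node; no nested Node is created
        System.out.println(describe(n)); // label=null, count=0, next=null
    }
}
```

Creating one Node does not create a second Node for the next field, which is why a class can safely contain a field of its own type.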

Instantiation

We now need to actually create objects and put them into these references. Objects are generally created with the new keyword. String objects are a special case: they can be created with new, or by putting text within quotes (which Java supports for convenience).

The code below fills in some references this way:


class Person {

    private String name;
    private int birthYear;
    private Person bestFriend;

    public Person(String name, int birthYear) {
        this.birthYear = birthYear;
        this.name = new String(name);
    }

    public void setFriend(Person friend) {
        bestFriend = friend;
    }
}

public class MemoryTest {
    public static void main(String[] args) {
        // make one person
        Person p1 = new Person("Alice Anderson", 1997);

        // make another person
        Person p2 = new Person("Bill Barber", 1998);

        // set them as each other's friend
        p1.setFriend(p2);
        p2.setFriend(p1);
    }
}

There are a few things happening here. First, the constructor initializes the "name" field using new. This is optional with String objects in Java, but is shown here. The constructor also initializes the birthYear field. Primitive values in Java are not references, so we don't use new for them.

The constructor leaves the bestFriend field as null. It can later be set with the setFriend method. The main method makes two Person objects and sets them as each other's best friend.

After running this program, this is the way that the objects might look in memory:

The memory layout of the fields of the two objects in the example
above. The reference fields contain the address of the objects they
refer to in memory. — Memory diagram of the objects in the program above

The exact memory addresses used are arbitrary examples. The important thing to understand is that these reference fields store the memory addresses of the objects which they refer to.

Because the exact memory addresses themselves don't really matter, we normally draw a diagram like this using arrows instead. That way we can still indicate which objects they are referring to without needing to specify addresses.

In this version of the image we replaced the memory addresses with
arrows indicating which fields refer to which objects in memory. — Memory diagram using arrows instead of addresses to show relationships

Because we draw reference variables as arrows like this, they are also called "pointers".
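
The arrows in the diagram can be followed in code. The sketch below uses LinkedPerson, a trimmed-down, hypothetical variant of Person with getters added purely for the demonstration; the original Person class does not define them.

```java
// Hypothetical variant of Person, with getters added for the demo.
class LinkedPerson {
    private final String name;
    private LinkedPerson bestFriend; // holds a reference, not a nested object

    LinkedPerson(String name) {
        this.name = name;
    }

    void setFriend(LinkedPerson friend) {
        bestFriend = friend;
    }

    LinkedPerson getFriend() {
        return bestFriend;
    }

    String getName() {
        return name;
    }
}

class FollowTheArrows {
    public static void main(String[] args) {
        LinkedPerson p1 = new LinkedPerson("Alice Anderson");
        LinkedPerson p2 = new LinkedPerson("Bill Barber");
        p1.setFriend(p2);
        p2.setFriend(p1);

        // Following one arrow from p1 reaches p2.
        System.out.println(p1.getFriend().getName()); // Bill Barber

        // Following two arrows from p1 leads back to the very same object.
        System.out.println(p1.getFriend().getFriend() == p1); // true
    }
}
```

The == comparison here checks whether two references point at the same object, which is exactly what the arrows in the diagram express.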

Stack vs. Heap Memory

There are actually two distinct areas of memory programs have access to: the stack and the heap. They are used for different purposes:

Stack	Heap
Allocated automatically	Allocated with new
Stores primitives and references	Stores objects
Have names	Are anonymous
Destroyed at end of scope	Destroyed when not referred to
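
As a rough illustration of the table, the sketch below annotates where each value conceptually lives. The StackVsHeap class and its makeCounts method are hypothetical names chosen for this example, and the comments describe the conceptual model only; a real virtual machine is free to optimize the actual placement.

```java
class StackVsHeap {
    static int[] makeCounts() {
        int size = 3;                 // primitive local: lives in this method's stack frame
        int[] counts = new int[size]; // the reference 'counts' is on the stack;
                                      // the array it refers to is on the heap
        counts[0] = 42;
        return counts;                // 'size' and 'counts' go away with the frame,
                                      // but the array itself survives
    }

    public static void main(String[] args) {
        int[] kept = makeCounts();    // a new stack variable referring to the same heap array
        System.out.println(kept[0]);  // 42

        kept = null;                  // nothing refers to the array any more, so it
                                      // becomes eligible for garbage collection
    }
}
```

The array has no name of its own; it can only be reached through whichever named reference currently points at it.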

Let's say that we have the following main method:


public static void main(String[] args) {
    Scanner in = new Scanner(System.in);

    // get user throw
    System.out.println("Enter throw (1)Rock, (2)Paper, (3)Scissors");
    int user = in.nextInt();

    // get computer throw
    Random rng = new Random();
    int comp = rng.nextInt(3) + 1;

    // figure winner
    int difference = user - comp;
    switch (difference) {
        case 0:
            System.out.println("Tie!");
            break;
        case 1:
        case -2:
            System.out.println("You won!");
            break;
        case -1:
        case 2:
            System.out.println("You lost :(");
            break;
    }
}

Which things are placed on the heap and which on the stack?

Stack Frames

The important thing about the stack is that each time you call a method, you are given a new place on the stack to store all of the variables that method might need. This is called a stack frame.

When you return from a method, all of the variables on that stack frame are destroyed. The only variables that can be accessed in a program are those that are in the currently executing method (or objects on the heap it has access to).

The stack essentially keeps track of our history of method calls from oldest to most-recent. For example, consider the following code:


class Stacks {
    public static void f(int x) {
        System.out.println(x);
    }

    public static void g(int x) {
        f(x + 1);
    }

    public static void h(int x) {
        g(x * 2);
    }

    public static void main(String[] args) {
        h(7);
    }
}

When this code runs, execution starts in main, then goes to h, then g, then f. When the functions begin to return, the chain of execution then goes back from f, back to g, then h and finally back to main where the program ends:

At first only main is on the stack. When a function is called, a new
stack frame for it is placed on top, so the stack grows bigger. Then the
functions begin to return and the stack frames are removed until only main
is left again. — The stack as this program is run

When a program runs, a stack is maintained to keep track of which function we are in. The block for each function is the stack frame, and contains all of the variables that method uses. We will not have to worry about the call stack very often because it is maintained for us by the virtual machine.
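
Although the call stack is maintained for us, Java does let us look at it. The following sketch is a variation on the Stacks example, not the original code: the hypothetical StackPeek class has f, g and h return a list of method names instead of printing x, and it uses the standard Throwable.getStackTrace() method to read the frames that are live while f is running.

```java
import java.util.ArrayList;
import java.util.List;

class StackPeek {
    // Collects the names of this class's methods that are currently
    // on the call stack, most recent call first.
    static List<String> f(int x) {
        List<String> names = new ArrayList<>();
        for (StackTraceElement frame : new Throwable().getStackTrace()) {
            // Skip frames from other classes (for example, a launcher).
            if (frame.getClassName().equals(StackPeek.class.getName())) {
                names.add(frame.getMethodName());
            }
        }
        return names;
    }

    static List<String> g(int x) {
        return f(x + 1);
    }

    static List<String> h(int x) {
        return g(x * 2);
    }

    public static void main(String[] args) {
        System.out.println(h(7)); // [f, g, h, main]
    }
}
```

The list reads from the top of the stack downward: f is the method currently executing, and main is the oldest frame at the bottom.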

Common Memory Mistakes

There are two common mistakes when dealing with memory in Java. The first is to use a reference before any object has been instantiated for it. For example, we could do that with code like this:


Person p = null; // a reference exists, but no Person object has been created
// ...
p.show();

This will produce the famous "NullPointerException":

Exception in thread "main" java.lang.NullPointerException
	at Example.main(Example.java:28)

The fix for this is to make sure that all objects you're trying to use have actually been instantiated.
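
When a reference may legitimately be null, another option is to check it before using it. A minimal sketch, where NullCheckDemo and safeLength are hypothetical names and returning -1 is just one possible convention for "no object":

```java
class NullCheckDemo {
    // Hypothetical helper: returns the length of the name,
    // or -1 when the reference does not point at any String.
    static int safeLength(String name) {
        if (name == null) {
            return -1; // calling name.length() here would throw NullPointerException
        }
        return name.length();
    }

    public static void main(String[] args) {
        String missing = null;
        String present = "Alice";

        System.out.println(safeLength(missing)); // -1
        System.out.println(safeLength(present)); // 5
    }
}
```

A check like this is appropriate only when null is an expected value; if the object should always exist, the better fix is still to instantiate it.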

The second most common mistake regarding memory in Java programs is not understanding that object variables are just references, and not objects themselves. Misunderstanding this will lead to countless issues.
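
The smallest version of this misunderstanding involves just two variables. In the sketch below, Counter and AliasDemo are hypothetical classes made up for the illustration.

```java
// Hypothetical class for illustration only.
class Counter {
    int value;
}

class AliasDemo {
    // Shows that assigning an object variable copies the reference.
    static int sharedObject() {
        Counter a = new Counter();
        Counter b = a;  // copies the reference; no second Counter is created
        b.value = 10;   // changes the one object that both a and b refer to
        return a.value;
    }

    // Shows that assigning a primitive copies the value itself.
    static int separatePrimitives() {
        int x = 1;
        int y = x;      // copies the value 1
        y = 10;         // changes only y
        return x;
    }

    public static void main(String[] args) {
        System.out.println(sharedObject());       // 10
        System.out.println(separatePrimitives()); // 1
    }
}
```

Only the new keyword creates an object here, and it appears once, so there is only one Counter no matter how many variables refer to it.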

For example, the following code makes an array of Person objects. It then makes a "defaultPerson" object with some default properties. It sets each slot in the array to this person, then tries to set the names of each individual element after that.


Person[] array = new Person[8]; // eight references, all initially null
Person defaultPerson = new Person("Default", 1990);
for (int i = 0; i < 8; i++) {
    array[i] = defaultPerson;
}

array[0].setName("Alice");
array[1].setName("Billy");
array[2].setName("Claire");
array[3].setName("Dominic");
// ...

What will print if we run the following code?


array[0].printName();