Home CPSC 340

Binary Search Trees Balancing


 

Balance

There are multiple ways to build a binary search tree storing the same data depending on the order in which the nodes are inserted. What happens if we insert values in a sorted order?

What is produced from this is basically a singly-linked list. The reason for this is that when the nodes are inserted in order, each successive value is greater than any previous ones, so the insertion can only take the right branch. So the left branches are never used and we end up with a line of nodes:

This tree has only right children, with each node chaining down the right side,
essentially it looks like a singly-linked list.
A degenerate tree

This is called a degenerate tree.

The ideal case is that the tree is balanced which means that, for each node, no subtree is more than 1 node deeper than the other subtree.

The efficiency of the binary search tree depends on it being reasonably well balanced.


 

Balancing a Binary Search Tree

There are a few ways to balance a binary search tree. Perhaps the simplest way is to take the data out of it and store it, in sorted order, in a temporary array. Then, take the data in the array and re-insert it into the binary search tree in such a way that the tree will be balanced.

To take the data out, we can simply do an inorder traversal, and store the data in an array as we visit each node. Then we can remove each node from the tree using the remove function above repeatedly.

Then all that remains is to rebuild the tree from the array. Given a sorted array of items, in what order should we insert the items to achieve a balanced tree?

An array containing the data beig stored in the tree.
The data is in sorted order in the array.
Storing the tree data in an array

Let's begin with the first item. The first item that is added will become the root of the tree. Which item do we want as the root? Ideally it would be the middle value of the data. That way there will be an equal number of items in the left subtree as there are in the right subtree:

The same array of data contains a label showing that the middle item of the array
is going to become the root of our tree.
The middle item will be the root

After adding the root, we will have to add the items to the left and those to the right. The first item from the left we add will be the immediate left child of the root. For the same reasons that the root should be the middle item, the left child of the root should be from the middle of the left half of the array. Likewise for the right.

This image adds 'left' and 'right' labels showing that the root of the left
subtree is halfway through the left half, and the root of the right subtree is halfway
through the right half.
The roots of the left and right subtrees

We will add the middle item, then recursively add the left and right halves of the array using the same procedure. This balancing algorithm is given below:

  1. Create an array to hold the elements.
  2. Fill the array using an in-order traversal.
  3. Remove all of the nodes from the tree.
  4. Insert the entire array items using the recursive algorithm below.

This recursive algorithm will insert the array items:

  1. If there are elements in the array:
    1. Calculate the middle of the array.
    2. Insert the middle item.
    3. Recursively insert the left half.
    4. Recursively insert the right half.

Balance.java adds this balancing algorithm into the code we have so far.


 

Analysis

The two main binary search tree operations, insertion and searching are both $O(\log n)$ when the tree is balanced. This is due to the fact that with a perfectly balanced tree, the height is limited to the log base 2 of n and the height determines the most times we could recurse.

In the worst case, however, the tree is degenerate so the height will be equal to n. This would make insertion and searching both take $O(n)$ time.

If we balance a binary search tree, we ensure that it can be used in $O(\log n)$ time.

Arrays can also be searched in $O(\log n)$ time using binary search. However, inserting into a binary search tree is faster than inserting into an array - especially since the array must remain sorted for binary search to work.

Copyright © 2024 Ian Finlayson | Licensed under a Creative Commons BY-NC-SA 4.0 License.