There are multiple ways to build a binary search tree storing the same data depending on the order in which the nodes are inserted. What happens if we insert values in a sorted order?
What is produced from this is basically a singly-linked list. The reason for this is that when the nodes are inserted in order, each successive value is greater than any previous ones, so the insertion can only take the right branch. So the left branches are never used and we end up with a line of nodes:
This is called a degenerate tree.
The ideal case is that the tree is balanced which means that, for each node, no subtree is more than 1 node deeper than the other subtree.
The efficiency of the binary search tree depends on it being reasonably well balanced.
There are a few ways to balance a binary search tree. Perhaps the simplest way is to take the data out of it and store it, in sorted order, in a temporary array. Then, take the data in the array and re-insert it into the binary search tree in such a way that the tree will be balanced.
To take the data out, we can simply do an inorder traversal, and store the data in an array as we visit each node. Then we can remove each node from the tree using the remove function above repeatedly.
Then all that remains is to rebuild the tree from the array. Given a sorted array of items, in what order should we insert the items to achieve a balanced tree?
Let's begin with the first item. The first item that is added will become the root of the tree. Which item do we want as the root? Ideally it would be the middle value of the data. That way there will be an equal number of items in the left subtree as there are in the right subtree:
After adding the root, we will have to add the items to the left and those to the right. The first item from the left we add will be the immediate left child of the root. For the same reasons that the root should be the middle item, the left child of the root should be from the middle of the left half of the array. Likewise for the right.
We will add the middle item, then recursively add the left and right halves of the array using the same procedure. This balancing algorithm is given below:
This recursive algorithm will insert the array items:
Balance.java adds this balancing algorithm into the code we have so far.
The two main binary search tree operations, insertion and searching are both $O(\log n)$ when the tree is balanced. This is due to the fact that with a perfectly balanced tree, the height is limited to the log base 2 of n and the height determines the most times we could recurse.
In the worst case, however, the tree is degenerate so the height will be equal to n. This would make insertion and searching both take $O(n)$ time.
If we balance a binary search tree, we ensure that it can be used in $O(\log n)$ time.
Arrays can also be searched in $O(\log n)$ time using binary search. However, inserting into a binary search tree is faster than inserting into an array - especially since the array must remain sorted for binary search to work.
Copyright © 2024 Ian Finlayson | Licensed under a Creative Commons BY-NC-SA 4.0 License.