Balanced Search Trees – AVL Tree

Introduction

When we talk about balanced search trees, we are specifically talking about two things.

  • A search tree.
  • A height balanced tree.

Here in this post we will take a Binary Search Tree and try to equip it with the features it needs to balance itself upon modification. Before starting with the logic and the code, we need to understand the motive behind this kind of data structure, and more precisely why such complexity is needed at all.

Apart from that, we also need to understand what it means for a tree to be termed balanced.

What does balancing mean?

We say a tree is balanced when it is height balanced, which means the following:

  • It is not skewed.
  • It has a uniform distribution of nodes throughout the structure.
  • Its height is as small as possible.
  • As a general relation, we can claim that the height is proportional to log N, where N is the total number of nodes (a concrete example follows this list).
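To make the relation concrete: a perfectly balanced binary tree holding about a million nodes is only around 20 levels deep (since 2^20 ≈ 1,000,000), whereas a fully skewed tree with the same number of nodes degenerates into a chain a million levels long.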

Why is balancing important in a search tree?

Balancing makes a search tree efficient: it guarantees O(log N) running time for all the dictionary operations (insert, search, delete). This is, of course, achieved by paying some cost, as mentioned below:

  • Re-balancing upon insertion of a new node.
  • Re-balancing upon deletion of an existing node.

Both of these operations need to take constant time. If the re-balancing is not constant, we might not get enough benefit out of balancing in the first place.

AVL Tree

The AVL (named after its inventors Georgy Adelson-Velsky and Evgenii Landis) tree is one of the self-balancing binary search tree data structures. Each node of an AVL tree maintains a specific relation between its left and right sub trees.

For any given node, the absolute difference between the heights of its left and right sub trees is either zero or one. This difference between the heights is termed the balance factor.

We can calculate the balance factor of a node as height(left sub tree) – height(right sub tree). The preferred value of the balance factor is an element of the set {-1, 0, +1}.
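As a minimal sketch of how this could look in code (the class and names below are assumptions for illustration, not the actual code from this post), a node that caches its own height makes the balance factor a cheap computation:

    class AVLSketch {

        static class Node {
            int key;
            int height;      // cached height of the sub tree rooted at this node
            Node left, right;

            Node(int key) {
                this.key = key;
                this.height = 1;   // a newly created leaf has height 1
            }
        }

        // Height of a possibly-empty sub tree.
        static int height(Node n) {
            return (n == null) ? 0 : n.height;
        }

        // Balance factor = height(left sub tree) - height(right sub tree);
        // any value outside {-1, 0, +1} marks the node as unbalanced.
        static int balanceFactor(Node n) {
            return (n == null) ? 0 : height(n.left) - height(n.right);
        }
    }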

Upon addition or deletion of a node, the height of the left or right sub tree might change and, in turn, affect the balance factor; in that case the balance factor of the node would be recalculated.

If the value falls outside the prescribed range, the tree is said to be unbalanced, and there is a need to re-balance it.

Re-balancing Scenarios

Upon insertion or deletion of a node, a tree may or may not become unbalanced. If it does not, there is absolutely no need for re-balancing. But if the tree does become unbalanced, then its structure will resemble one of the following:

[Figure: Balanced Search Trees – AVL Tree — the unbalanced cases]

The Left Right Case

The circles are individual nodes of the tree, labeled with the numbers 1, 2 and 3. The triangles are perfectly balanced sub trees which need no re-balancing.

We can fairly assume that the balance factor of all the triangles is zero, and hence that nodes 1, 2 and 3 are disturbing the balance of the tree. Notice that the balance factor of node 1 is two, which does not lie in the preferred range.

The balancing is done through a rotation, which is a multi-step process (a code sketch follows the list):

  • Pick the unbalanced node, i.e. node 1.
  • Get the right child (node 3) of its left child (node 2).
  • Attach the grandchild (node 3) to the left of the unbalanced node (node 1).
  • Attach the left sub tree (sub tree B) of the grandchild (node 3) to the right of node 2.
  • Attach node 2 to the left of node 3.
  • Update the heights of each of the affected nodes.
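In code, the steps above amount to a left rotation of node 2. A hedged sketch, reusing the hypothetical AVLSketch class from earlier (the caller re-attaches the returned node to the left of node 1):

    // Left rotation: x's right child becomes the new root of this sub tree.
    // In the left right case, x is node 2 and its right child y is node 3.
    static Node leftRotate(Node x) {
        Node y = x.right;                  // the grandchild, node 3
        x.right = y.left;                  // sub tree B goes to the right of node 2
        y.left = x;                        // node 2 goes to the left of node 3
        x.height = 1 + Math.max(height(x.left), height(x.right));
        y.height = 1 + Math.max(height(y.left), height(y.right));
        return y;                          // new root of this sub tree
    }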

Once the above steps are followed, our tree lands in another situation:

The Left Left Case

This resembles the second image above. Node 1 is still unbalanced, and to balance the tree we need another rotation, as follows (again, a code sketch follows the list):

  • Remove the right sub tree (sub tree C) of node 3.
  • Remove the link from node 1 to node 3.
  • Attach node 1 to the right of node 3.
  • Attach sub tree C to the left of node 1.
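Again as a sketch under the same assumptions, these steps are a right rotation of node 1, the mirror image of the left rotation above:

    // Right rotation: y's left child becomes the new root of this sub tree.
    // In the left left case, y is node 1 and its left child x is node 3.
    static Node rightRotate(Node y) {
        Node x = y.left;                   // node 3
        y.left = x.right;                  // sub tree C goes to the left of node 1
        x.right = y;                       // node 1 goes to the right of node 3
        y.height = 1 + Math.max(height(y.left), height(y.right));
        x.height = 1 + Math.max(height(x.left), height(x.right));
        return x;                          // new root of this sub tree
    }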

This completes the rotation for the left left case, and finally the tree is balanced. Please note that in some unbalanced cases the tree is already in the left left case, whereas in others it might be in the left right case, which takes one rotation cycle to convert it to the left left case and another rotation cycle to convert it to the balanced state.

Similar to the above, there are two more scenarios, exact mirror images of the two cases above.

The Right Left Case

[Figure: Balanced Search Trees – AVL Tree — the mirrored cases]

The explanation is similar to the one above. If you have difficulty understanding this case, let me know in the comments.
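In short, and mirroring the steps described earlier (the original diagrams are not reproduced here): the right left case is fixed by first rotating the right child to the right, which converts the tree into the right right case, and then rotating the unbalanced node to the left. The right right case needs only that single left rotation.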

Verification

It is pretty common for people new to AVL trees to get the rotations wrong. Here is a verification technique which helps you recheck your work on rotations and re-balancing.

If you look closely at the diagrams in each step, you might notice it: the in-order traversal of the unbalanced tree, the intermediate tree, and the final tree remains exactly the same.

Even if you choose a different technique to rotate and re-balance, you still need to make sure that the in-order traversals of the unbalanced tree and the balanced tree are exactly the same.
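As a quick sketch of that check (again assuming the hypothetical AVLSketch class from above), collect the in-order key sequence once before re-balancing and once after; the two lists must be equal:

    import java.util.ArrayList;   // these imports go at the top of the file
    import java.util.List;

    // In-order traversal (left sub tree, node, right sub tree); a correct
    // rotation must leave this key sequence unchanged.
    static List<Integer> inOrder(Node n, List<Integer> out) {
        if (n != null) {
            inOrder(n.left, out);
            out.add(n.key);
            inOrder(n.right, out);
        }
        return out;
    }

For example, inOrder(root, new ArrayList<>()) taken before the rotation and again after it should produce identical lists.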

Source Code

Here is the essential part of the source code as per the above algorithm.

This is a recursive call, similar to the BST insert. Additionally, it verifies the balance factor after each insertion, and if there is an unbalanced node it tries to balance it as per the algorithm described above.

Note that since this is a recursive call, every node's height is calculated from the bottom-most node upwards, and the first unbalanced node from the bottom is attempted to be balanced first.
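Since the original listing is not reproduced in this extract, the following is only a sketch of what such a recursive insert might look like, reusing the hypothetical helpers from earlier (the rebalance step is sketched just below):

    // Recursive BST insert, plus height maintenance and re-balancing
    // performed while the recursion unwinds.
    static Node insert(Node node, int key) {
        if (node == null) return new Node(key);           // plain BST insert at a leaf

        if (key < node.key)      node.left  = insert(node.left, key);
        else if (key > node.key) node.right = insert(node.right, key);
        else                     return node;             // ignore duplicate keys

        // The recursion unwinds bottom-up, so heights are refreshed from the
        // bottom-most node first, and the first unbalanced node from the
        // bottom is the first one to be re-balanced.
        node.height = 1 + Math.max(height(node.left), height(node.right));
        return rebalance(node, key);
    }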

Here is the code for the rotations, covering all four cases. Each rotation is a constant amount of work. One could claim that each method has to find the heights of two nodes, which is O(log N) work; agreed, but if we have a data structure which stores its height (and keeps it updated in O(1) time), that becomes constant work. In the code as written here, it is O(log N).
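The original listing is again not reproduced in this extract; as a stand-in, here is a hedged sketch of how the four cases might be dispatched (rebalance is the hypothetical helper used by the insert sketch above):

    // Dispatch over the four unbalanced cases described earlier.
    static Node rebalance(Node node, int key) {
        int balance = balanceFactor(node);

        if (balance > 1 && key < node.left.key)           // left left case
            return rightRotate(node);
        if (balance > 1 && key > node.left.key) {         // left right case
            node.left = leftRotate(node.left);            // first rotation -> left left
            return rightRotate(node);                     // second rotation -> balanced
        }
        if (balance < -1 && key > node.right.key)         // right right case
            return leftRotate(node);
        if (balance < -1 && key < node.right.key) {       // right left case
            node.right = rightRotate(node.right);         // first rotation -> right right
            return leftRotate(node);                      // second rotation -> balanced
        }
        return node;                                      // no re-balancing needed
    }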


For the complete source code, please visit the GitHub link.

Analysis

As discussed above, the running time of insertion and deletion is O(log N) if we can build the said data structure (one that stores heights and updates them in O(1) time). If we use this code as is, it is an O(log N * log N), i.e. O(log² N), algorithm.

No additional space is required by the algorithm beyond the recursion stack, hence it has an auxiliary space complexity of O(1) (or O(log N) if the recursion stack is counted).

Enjoy learning.. Stay connected & Stay subscribed…