Understanding Search Trees – Red Black Trees

Introduction

This article is to highlight the importance of having a Search tree with minimum height and revisit couple of data structures which can guarantee a minimum height Tree.

Understanding Search Trees – Red Black Trees

An evolution of BST that aim to keep the tree balanced without affecting the complexity of the primitive operations. This is done by colouring each node in the tree with either red or black and preserving a set of properties that guarantees that the deepest path in the tree is not longer than twice the shortest one.

A red black tree is a binary search tree with the following properties:
• Every node is coloured with either red or black.
• The root and all leaf nodes (nil) are coloured with black
• The parent of red node is black.
• Every path from a node n to a descendant leaf has the same number of black nodes, i.e. the black height of all the leaf nodes is same.

Height of a RBT

Consider the Red Black Tree below:
RBT1

Questions

  • How many leaves are there in the tree above, and the answer is nine.
  • How many keys are there in the tree above, and the answer is eight.
  • In any case for a RBT similar to one above, the number of leaves = number of keys + 1
  • Number of leaves in a branching tree is always one greater than the number of internal nodes.

Now let us merge each red node into its parent. What we get is the following tree:
RBT2

The above tree is an example of a 2,3,4 Tree or you can also say 2,4 Tree. This shows that if we have a RBT and we merge all the red nodes to their parents, we get a 2,3,4 Tree.

In general what is the relation between the height and the total number of leaves in a complete binary Tree?

Answer : The total number of leaves is equal to 2h where h is the height of the binary tree, and 2 is the branching factor(the maximum number of branches each node results into) for a binary tree.

Similarly if the branching factor of a tree is three( that means each node can have a maximum of 3 children), then the total number of leaves would be 3h, where h is the height of the tree.

Hence, we can conclude that if the branching factor of a tree is k then the total number of leaves is kh, for a tree of height h.

Now that the relation is clear, let us see the minimum number of leaves possible in a 2,3,4 tree.
A 2,3,4 tree of height h can have minimum number of leaves only if it can branches into two at every level, which is an ideal case of binary tree. Hence, the minimum number of leaves would be 2h.

Also, let us find the maximum number of leaves possible in a 2,3,4 tree.
A 2,3,4 tree of height h can have maximum number of leaves only if can branches into four at every level, which is an ideal case of quaternary tree. Hence, the minimum number of leaves would be 4h.

Which means that the number of leaves in the 2,3,4 tree will be bounded by 2h <= number of leaves <= 4h.

Now if 2h <= number of leaves, then taking log of both sides.
h <= log(N+1) where N is the number of keys in the 2,3,4 tree or Red black tree. [For detail see the third bullet point above].

Also, the number of red nodes can be at most half the length of the longest path in the tree, because a red node cannot have a red parent. Which means the maximum number of red nodes in a path can be achieved by placing red and black nodes alternatively. Hence, if this is the case then the height of 2,3,4 tree can be at most half the height of the corresponding red black tree.

And we just deduced that h <= log (N+1), that means
The height of Red Black Tree will be 2h, which is equal to 2 * log(N+1)

Hence we can say that the relation between the number of keys and the height of a Red Black Tree is HRBT = log (N+1)

Queries in a RBT

Queries means the operation which ask us question about the data in the data structure.

Now we have a bound over the height of a RBT that means we can guarantee that all the queries like min, max, search etc can be all done in a running time proportional to height i.e. O(log N)

Update in a RBT

Updates means the operations which update the data in the data structure. For e.g.: Insertion and Deletion. This will primarily require the following:

  • Re-coloring – changing the colors of nodes
  • Re-structuring – rotations. (which is a constant time operation)

RBT3

The above diagram from left to right is a result of right rotation of B and the diagram when viewed from right to left is a left rotation of A.
If we perform an inorder traversal on both the trees, we get the following:

Inorder of left tree : X < A < Y < B < Z
Inorder of right tree: X < A < Y < B < Z

This means that the rotation will not disturb the BST property.

The above diagram is to display the rotations in constant time. Assume the triangles X, Y and Z to be valid RBTs and we are trying to rotate A.
If A is the left child of its parent, then we perform a right rotate and if A is the right child of its parent we perform a left rotate. Let us assume we are doing a right rotate:

Step 1: Store the parent of A’s parent(B) in a temporary node GRANDPARENT Step 2: Store the right child of A in a temporary node, in our case TEMP Step 3: Attach the parent(A) i.e. B as the right child of A.
Step 4: Attach the TEMP as the left child of B.
Step 5: Attach A to the left/right of GRANPARENT depending on where B was earlier.

We can rotate on the same lines while doing a left rotate which seems to be a constant time operation.

Conclusion

We learnt about 2,3,4 trees and Red Black Trees and found that both are similar in nature. We also established a strict bound on the height of a RBT and 2,3,4 tree and proved that it is the minimum height tree, which guarantees the O(log N+1) time operations.

Insertion and Deletion in a Red Black Tree to be continued….

Stay connected and stay Subscribed