Building Binary Search Trees – detailed discussion

Introduction

This is the second article in the Binary Search Trees series. To read the previous article, please check the topic Binary Search Tree Basics – a detailed discussion. To get updates on the upcoming articles in the series, please use the Subscribe feature on the right panel.

Purpose of article

This article resumes from where we concluded in the first article. It is primarily meant to help you understand how to build a BST and the problems which arise after building Binary Search Trees. Then we will improve over it and find ways to build better BSTs.

Do remember that there is a purpose behind everything we do; the purpose of shifting our focus from Binary Trees to Binary Search Trees is to get a guarantee of reduced search time for an element compared to a Binary Tree.

Also, in the previous articles (Binary Trees), when we tried to add a child to a node which already had two children, we were not able to add it. We will also try to overcome that limitation in this article.

The thumb rule of a BST

There is one and only one thumb rule associated with a BST and it is as follows:

For a given node N, the value stored at the left child L is always less than the value stored at N, and the value stored at the right child R is always greater than the value stored at N.

With the above understanding we will try to build a Binary Search Tree from an array of N elements.

How to start is always a big question with a BST and there is no good rule to decide this. For simplicity, we agree to start from the beginning of the array and build the tree one element at a time. Below is the array:

array: [ 2, 1, 5, 6, 9, 10, 8, 7 ]
Step 1: We pick up 2 and try to add it to the tree, but since this is the first element there is no tree yet, so we create a root node and store 2 at the root RT.

The tree at this point looks like the below:
step1

Step 2: Now we pick up the second element, i.e. 1, and insert it into the tree. As we already have a root RT, we need to compare the value of RT with the element which we intend to insert. If the element has a value lower than the node we are visiting (in this case the root), we check if the node has a left child.

We find that the node doesn’t have any left child, hence we insert the element 1 as the left child of the node (root RT). The tree looks like below:
step2

Step 3: We pick up the third element, i.e. 5, and insert it into the tree. If the element is greater than the value at RT, we check if the node RT has a right child.

We discover that RT doesn’t have a right child, so we add the element 5 to the right of RT and the tree looks like below:
step3
Step 4: We pick up the fourth element, i.e. 6, and insert it into the tree. If the element is greater than the value at RT, we check if the node RT has a right child.

We discover that RT has a right child, so we move to the right child R1 and do the same comparison again. Is the element greater than the value at node R1?

And the answer is yes, hence we check if R1 has a right child and we discover that R1 has no right child, so we add the element 6 as the right child of R1. The tree looks like the one below:
step4
Step 5: We pick up the fifth element, i.e. 9, and insert it into the tree. If the element is greater than the value at RT, we check if the node RT has a right child.

We discover that RT has a right child, so we move to the right child R1 and do the same comparison again. Is the element greater than the value at node R1?

And the answer is yes, hence we check if R1 has a right child and we discover that R1 has a right child, so we move to the right child of R1, i.e. R2, and do the same comparison again. Is the element 9 greater than the value at node R2?

And the answer is yes, hence we check if R2 has a right child and we discover that R2 has no right child, so we add the element 9 as the right child of R2. The tree looks like the one below:
step5
Step 6: We pick up the sixth element, i.e. 10, and insert it into the tree. If the element is greater than the value at RT, we check if the node RT has a right child.

We discover that RT has a right child, so we move to the right child R1 and do the same comparison again. Is the element greater than the value at node R1?

And the answer is yes, hence we check if R1 has a right child and we discover that R1 has a right child, so we move to the right child of R1 i.e. R2 and do the same comparison again. Is the element 10 greater than the value at node R2?

And the answer is yes, hence we check if R2 has a right child and we discover that R2 has a right child, so we move to the right child of R2 i.e. R3 and do the same comparison again. Is the element 10 greater than the value at node R3?

And the answer is yes, hence we check if R3 has a right child and we discover that R3 has no right child, so we add the element 10 as the right child of R3. The tree looks like the one below:
step6
Step 7: We pick up the seventh element, i.e. 8, and insert it into the tree. If the element is greater than the value at RT, we check if the node RT has a right child.

We discover that RT has a right child, so we move to the right child R1 and do the same comparison again. Is the element 8 greater than the value at node R1?

And the answer is yes, hence we check if R1 has a right child and we discover that R1 has a right child, so we move to the right child of R1 i.e. R2 and do the same comparison again. Is the element 8 greater than the value at node R2?

And the answer is yes, hence we check if R2 has a right child and we discover that R2 has a right child, so we move to the right child of R2 i.e. R3 and do the same comparison again. Is the element 8 greater than the value at node R3?

And the answer is no, hence we check if R3 has a left child and we discover that R3 has no left child, so we add the element 8 as the left child of R3. The tree looks like the one below:
step7
Step 8: We pick up the eighth element, i.e. 7, and insert it into the tree. If the element is greater than the value at RT, we check if the node RT has a right child.

We discover that RT has a right child, so we move to the right child R1 and do the same comparison again. Is the element 7 greater than the value at node R1?

And the answer is yes, hence we check if R1 has a right child and we discover that R1 has a right child, so we move to the right child of R1 i.e. R2 and do the same comparison again. Is the element 7 greater than the value at node R2?

And the answer is yes, hence we check if R2 has a right child and we discover that R2 has a right child, so we move to the right child of R2 i.e. R3 and do the same comparison again. Is the element 7 greater than the value at node R3?

And the answer is no, hence we check if R3 has a left child and we discover that R3 has a left child, so we move to the left child of R3 i.e. L2 and do the same comparison again. Is the element 7 greater than the value at node L2?

And the answer is no, hence we check if L2 has a left child and we discover that L2 has no left child, so we add the element 7 as the left child of L2. The tree looks like the one below:
step8

What is wrong with the above BST?

Well, it is a tree and it also satisfies the criteria of a BST, but somewhere deep inside, when I look at this picture, I do not feel good about it. I believe that we could have done something better than this. Before jumping into that, I would like to point out the problems with this tree; they are listed below:

1) This tree took a lot of time to create; the total number of comparisons grows quickly as the tree gets bigger. For example, inserting 7 took five comparisons; in the best case it could have been done in three or maybe four comparisons (if the root chosen was some other number like 6 or 8). Similar is the case with the other insertions as well.

2) Search queries will take more time in this tree; as inserting 7 took five comparisons, it will take five comparisons to search for 7 as well. Hence the insert time is proportional (or may I say similar) to the fetch time.

3) On a less serious note, definitely it doesn’t look good. 🙂

Imagine a tree which only extended to one side: it would hold the data in sorted order, just like a sorted array, but without the ability to perform binary search on it. What does this mean?

It means that we cannot harness the benefits of binary search, so even if it is a valid binary search tree, it doesn’t give us anything. This precisely means that searches take O(N) time and not O(logN) (where N is the number of elements/nodes), and so does insertion. For details, read Binary Search Tree Basics – a detailed discussion.

What made this happen?

If we observe closely, we see that the input array was almost sorted and that is the culprit; or may I say, we chose the elements to insert in the wrong order, or we chose the root incorrectly.

Had the root and all the elements been chosen randomly, we would have ended up with a better tree, which could have guaranteed a logN insertion and a logN search in most cases.

How to fix this?

Well, the answer lies in the previous paragraph: RANDOMNESS. And believe me, this is a keyword in achieving faster running times for most of the algorithms known today. If you can ensure the data is random enough, you can give much stronger guarantees on the running time.

Had the data for building the binary search tree been picked randomly, you could have got the tree presented in the previous article. It is shown below:
bst

BST Sort

Another point I would like to comment upon is that we unknowingly sorted the input array while building the binary search tree. Just do an inorder traversal of the BST and you will get the data in sorted order.
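As a quick illustration, here is a small sketch of such an inorder traversal, assuming the Node and BSTBuilder classes from the insertion sketch earlier in this article; applied to the tree we built above, it prints the elements in ascending order.

class InorderDemo {
    // Visit the left subtree, then the node itself, then the right subtree.
    // On a BST this prints the stored values in ascending order.
    static void inorder(Node node) {
        if (node == null) {
            return;
        }
        inorder(node.left);
        System.out.print(node.value + " ");
        inorder(node.right);
    }

    public static void main(String[] args) {
        BSTBuilder tree = new BSTBuilder();
        for (int element : new int[] {2, 1, 5, 6, 9, 10, 8, 7}) {
            tree.insert(element);
        }
        inorder(tree.root);   // prints 1 2 5 6 7 8 9 10
    }
}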

Hence, the running time of BST sort is proportional to the sum of the running times for inserting each element into the tree (by inserting an element, we are basically placing it at a position according to its natural ordering).

What is the cost of inserting an element into a perfectly built BST which has its elements evenly distributed across the left and right subtrees?

Well, the answer is very simple: it will not take more comparisons than the height of the tree, because at each level you do just one comparison and figure out whether the element should go to the right or to the left. So, the maximum number of comparisons required to insert an element is H, where H is the height of the BST. If we want to define H in terms of the total number of nodes N present in the BST, it would be H = logN (as each level of a perfectly built binary tree holds roughly twice as many nodes as the level above it).

Now inserting N elements will take N times the cost of inserting one element, which would be N * logN = NlogN. So we can say that BST sort takes O(NlogN) time for a perfectly built binary search tree.
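To put rough numbers on our example: a perfectly built BST holding the eight elements above has height about log2(8) = 3, so no single insertion needs more than about three comparisons (recall that inserting 7 into our skewed tree took five), and building the whole tree stays within roughly 8 * 3 = 24 comparisons, which is exactly the N * logN bound.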

What if the tree is skewed (inclined to one side) as we discussed in the previous section?

The running time will definitely change and will increase, because the insertion will take O(N) time for each element instead of O(logN) time. So building the whole tree might take time proportional to N². And I can say that the BST Sort running time is O(N²).
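To see where the N² comes from: if the elements arrive already sorted, the i-th insertion has to walk past all i - 1 elements inserted before it, so building the tree costs 0 + 1 + 2 + ... + (N - 1) = N(N - 1)/2 comparisons in total, which grows proportionally to N².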

Well, if you can recollect some sorting algorithms, you must have noticed that the comparisons we made in building the binary search tree are the same as the comparisons we made while understanding Quick sort. Definitely not in the same order, but yes, they are the same comparisons. BST Sort closely resembles Quick sort and exhibits almost the same behavior.

Just to demonstrate this, what comparisons did we make while building the BST above?
RT(2) is compared with – 1, 5, 6, 9, 10, 8, 7
R1(5) is compared with – 6, 9, 10, 8, 7
R2(6) is compared with – 9, 10, 8, 7
R3(9) is compared with – 10, 8, 7
L2(8) is compared with – 7

Now let us find the comparisons we made while sorting the same array using quick sort. This is the result of the quick sort code which I took from the Quick sort article:
comparison with 2 – 1, 5, 6, 9, 10, 8, 7
comparison with 5 – 6, 9, 10, 8, 7
comparison with 6 – 9, 10, 8, 7
comparison with 9 – 10, 8, 7
comparison with 7 – 8

This shows that the comparisons are the same for quick sort and BST sort.

Building a perfect binary search tree

As we discussed above, the key to a perfect BST is randomness, so we can also call it a Randomly Built Binary Search Tree. It is easy to build a binary search tree randomly: pick an element at a random index and insert it as in a usual BST. You need to take care of the random number generator and of repetitions, because you cannot insert the element at a given index twice.
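A minimal sketch of this idea, assuming the BSTBuilder class from the insertion sketch earlier: instead of picking random indices one by one, we can simply shuffle the input once and then insert the elements in that order with the usual BST insertion, which also takes care of the repetition concern because every index is used exactly once.

import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class RandomBSTBuilder {
    public static void main(String[] args) {
        // The same input as above, just inserted in a random order.
        List<Integer> input = Arrays.asList(2, 1, 5, 6, 9, 10, 8, 7);
        Collections.shuffle(input);              // randomize the insertion order

        BSTBuilder tree = new BSTBuilder();
        for (int element : input) {
            tree.insert(element);                // the usual BST insertion
        }
    }
}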

There is a theorem which says that the height of a randomly built BST is O(logN), where N is the total number of elements to be inserted. The proof is pretty complicated and I prefer not to share it at this point; we may take it up in the advanced topics later.

Conclusion

In this article we learnt how to build a binary search tree, analyzed the running time of BST Sort, and understood the best ways to build a perfect BST. We also compared it with the quick sort algorithm and figured out the similarities between them. In the coming articles we will deal with writing some code to create a binary search tree and we may visit some interview questions.

Hope this helps, happy reading. Stay connected and stay subscribed.