Binary Search Tree Basics – a detailed discussion

Introduction

The article Binary Search Tree Basics – a detailed discussion is the first article for Binary Search Trees and the ninth article in the series of non-linear data structures, to read more about the previous article, please check the topic Operations on a binary tree – Hierarchical Datastructure. To get updates on the coming articles in the series, please use the Subscribe feature on the right panel.

Purpose of article

This article is to discuss Binary Search Tree in Detail, mostly the theory behind a binary search tree and how it is different from a Binary Tree and what all does it bring to the table if we use a Binary Search Tree. Here onward we will use the term BST and Binary Search Tree interchangeability.

Pre-requisites

To understand Binary Search Tree, we need to understand the term Binary Search. Let me tell you a small story, once upon a time, we decided to store objects (for simplicity consider numbers) for future references. We chose a data structure called array, added each of the number in the array. Few days later someone came to us with a number K and asked to search if K exists in the array, in case it exists he expected the index of the element in return and if it is not present we just tell him that it’s not present.

Now let us assume that we have an array which contains the numbers as below. Assume the length of the array is N.

What is the average time taken to search a number?
I will start from the very basic,
Number of comparisons to search the first element – 1
Number of comparisons to search the second element – 2
Number of comparisons to search the third element – 3
Number of comparisons to search the fourth element – 4
Number of comparisons to search the ith element – i

Hence, total number of comparisons to search each element once would be the sum of all the comparisons listed above, say it Tcomparison. So Tcomparison = 1 + 2 + 3 + 4 + … + i + … N, where N is the size of the array.

Tcomparison = [N * (N+1)] / 2 and the average number of comparisons for a search would be total comparisons divided by total numbers. That would be equal to [N * (N+1)] / [2 * N], which gives us [N+1] / 2 or we can say order N.

This proves that a regular search in an array will take around O(N) running time. If we relate this analysis, then the regular search in a binary tree will also take the same O(N) running time. Because if we want to search an element in N elements, we need to go and look at each element and that makes the running time proportional to N.

Can we do this in less than N time?
The answer is, Yes we can and we will do it but it will have some cost as every improvement has a cost associated with it. What if the input array is sorted or what if we invest some of our time in sorting and then do the searching.

Let us find out the running time of searches in a sorted array. Assume the following array:
step7
If someone asks me to search 5, I perform the following pseudo code:
binarysearch
The code for hte same is below:

Explanation

The algorithm is very simple, every time we find the mid element and compare the value of that element with the number to be search. Three cases arise:
1) The value is equal to the number to be searched, we return the index as mid.

2) The value is lesser than the number to be search, in this case it doesn’t make sense to search in the left side of the mid index, because the value to be searched is greater than the value at the mid. Hence, we search in the right array. For this we shift the min pointer to the mid and the max pointer remains at the end of the array.

3) The value is greater than the number to be search, in this case it doesn’t make sense to search in the right side of the mid index, because the value to be searched is lesser than the value at the mid. Hence, we search in the left array. For this we shift the max pointer to the mid and the min pointer remains at the beginning of the array.

Repeat the above steps till we either find a match and return the value of mid or min becomes equal to max and return -1
This is the Binary Search Technique/Algorithm

Analyzing the running time for this approach

If you observe closely, we divide the array into two halves and shorten the range in which we are searching. There is only one comparison involved with the element at the mid index and we keep shrinking the range till we are left with no more elements to shrink.

That means we compare ones for each mid we calculate, and how many mid do we calculate?
When we keep dividing by 2, we will eventually end up with 3 divisions for the above example, or to generalize it I would say log2N divisions. That means we are doing 1 comparison for each of the log2N divisions of the array. Hence the running time would add up to O(log2N) multiplied by a constant [In programming we write log2N as logN]. And when we analyze the efficiency of an algorithm, we do not care much about the constant terms. So we can say that the running time is O(logN).

Result

After seeing both the approaches above, we now know that the sorted array gives a better performance for searching. One thing we missed and that is the time taken for sorting, but do we really care?
If we have to fire 1,048,576 (i.e 220) search queries and we have an array of 1024(i.e. 210) numbers. The time spent in sorting will not make much difference compared to 1,048,576 search queries because sorting has to be done just once.

With sorted array
Time for sorting is almost NlogN = 1024 * 10
Time for searching is logN = 20.

With unsorted array
Time for searching is N = 1048576

This is a real difference.

What is a Binary Search Tree

With our base built up by the above discussion and the articles on Trees and Binary Trees it will be very easy to understand the idea behind the binary search tree or BST.

The way I explain it is slightly different from the way you will find it at other places. For the sake of understanding you can think it as follows:

“The relationship between the binary tree and a BST is same as the relationship between a unsorted array and a sorted array respectively”.

Well they are not exactly similar but we can say the following:

1) Building a sorted array and building a BST will have similar running time.
2) Search operations on a sorted array and a BST(Randomly build) will take same running time. Yes you read it correctly, it is same when the BST is randomly built, will talk about this phrase in coming articles.

But the above similarities clearly depict that we can interchangeably use the sorted array and the BST. There are definitely some limitations like, in arrays we can do addition only at the end but in trees we can add nodes at 2h locations where h is the height of the root.

Formal definition of Binary Search Tree

To make it more simple I would like to present it in a different way, for a given node N the value stored at the left child L is always less than the value stored at N and the value stored at the right child R is always greater than the value stored at N.

The below diagram makes it more clearer:
bst
In the above diagram, left child of each node contains a value lower than the node value and each right contains a value higher than the node value.

Conclusion

In this article we learnt the binary search algorithm and understood the time complexity associated with it. We also touched the basics of BST(binary search tree) and built a base upon it to create and use binary search tree in the coming articles.

Hope this helps, happy reading. Stay connected and stay Subscribed