Heap(specifically binary heap) is a special data structure which can be viewed as a nearly complete binary tree. This post is mostly focused on understanding the heap datastructure
The first few impressions about heap after reading the above line:
- The height of the binary tree would be minimum. For e.g.: if the tree has N nodes, the height would be logN.
- The tree can also be stored in an array, in fact it would be easier to store this tree in an array (because it is complete).
- This makes it an awesome data structure because the minimum height guarantees logN running time for all the basic operations.
- Each node of the key contains a key.
Binary heaps are of two types: a max heap and a min heap. For further discussion we will consider max heap, and in case you need to learn about min heap, it would be appropriate to reverse few of the things and get a min heap out of it.
Properties of a Heap
In order to offer the benefits like minimal height and minimal running time for certain operations, the heap maintains few properties. These properties are called the heap property. In a max heap:
- For a given node the key stored at the node is always smaller than the key stored at its parent.
- This definition can be recursively applied for all the nodes in the heap except the root, because root has no parent.
- It also means that the root contains the largest key in a max heap. For a min heap the root will contain the smallest key.
- Does this remind us of any practical problem? The priority queue?
- Technically the priority queue can be implemented with a min heap. Depending on how you treat priority you can choose the max heap as well.
- At any given level of the binary tree,the level is filled from left to right.
Here is the tree representation of a max heap.
Storing a Heap
As this is a complete binary tree and we can guarantee that the level is filled from left to right, we can use an array to store this tree. For a N node heap, we can have an sufficiently large array of length L where L > N. And the storage follows:
- The root is stored at index 1 (if the array is 1 indexed)
- The children of the root are stored at index 2 and 3.
- As a general rule, a node stored at index i, will have its left child at index 2i and its right child at index 2i+1.
- This also reversely proves that for a node at index i or index i, its parent is stored at i/2.
Here is the array representation of the above max heap.
Let us check if our formula for parent child relation is true. The element at index 3 is 12 and its children are at indices 2*3 = 6 and 2*3 + 1 = 7.
The value at index 6 is 7 and the value at index 7 is 9 and this seems to be correct because the children of 12 in the above tree are 7 and 9.
Maintaining the Heap Property
As we know that a dynamic data structure offers addition and removal. Situations can occur when an insertion or deletion disturbs the heap property. In such a scenario we need to fix the disturbance and restore the heap property.
This fix is termed as Heapify and before understanding what will heapify do, we need to understand what kind of disturbances can occur.
The possible disturbance is having a key stored at parent which is smaller than the key stored at either or both its children. This can occur as a part of removal or addition of a new key, which we will see later.
The situation is pictorially defined below. In the below diagram we see that the node with value 2 is disturbed and we need to propagate it down to restore the heap property.
The fix is pretty simple
- Exchange the key at disturbed node, with one of its children. How to choose the child
- If both left and right child has a larger key than the node’s key, swap with the right child.
- If the left child has a larger key than the node’s key, swap with the left child.
- Else swap with the right child.
- This will fix the current disturbed node. But two situations arrive.
- The children satisfy the max heap property and nothing needs to be done, heapify terminates.
- One of the children gets disturbed and has to be fixed. So, repeat from step 1 with the children now.
Pseudo Code for Heapify of a max heap
Heapify(A, i) // heapify the element at index i of array A
L ← 2*i // get the index of the left child
R ← 2*i + 1 // get the index of the right child
if L ≤ HEAP_SIZE(A) and A[L] > A[i]
swapIndex ← L
else swapIndex ← i
if R ≤ HEAP_SIZE(A) and A[R] > A[swapIndex]
swapIndex ← R
if swapIndex ≠ i
swap A[i] ↔ A[swapIndex] // swapped the disturbed node with one of the child
Heapify(A, swapIndex) // recursively heapify if the child is disturbed
Running time of the heapify routine:
The above pseudo code suggests that a heapify operation consists of two things:
- A comparison and key swap, which is a constant time operation.
- Recursively heapifying the children’s subtree as we compare with both the children and perform an appropriate operation.
- The recursion divides the problem into smaller pieces and has to evaluate one subtree completely.
- The size of the subtree in the worst case could be a fully filled subtree in which case we will end up comparing nodes till the last level of the tree.
- We can prove that the number of nodes in a fully filled subtree has an upper bound of 2N/3
- Hence the recurrence equation would look like T(N) <= T(2N/3) + Θ(1)
You can always solve this recurrence equation using master method as discussed in a previous post.
Please Note: If you are trying to solve this using master method, for large enough N the graph of Nlog32 is almost parallel to x axis. Hence it will result in a constant similar to Θ(1)
Hence, the heapify routine will run in O(logN) time.
Proving upper bound on number of nodes of fully filled subtree
Here is a tree such that the children’s sub tree is fully filled.
Let us say that the height of the tree is H and the number of nodes in the tree is N
- Hence the number of nodes in left sub tree is 2H – 1
- The number of nodes in the right subtree is 2H-1 – 1
- Total nodes in the tree is 1 + 2H – 1 + 2H-1 – 1 , which is equal to 2H – 1 + 2H-1.
- We can also write this relation as N = 2H – 1 + 2H-1.
- This can be simplified as N+1 = 2H(1+1/2)
- That implies (2/3)(N) = 2H – (2/3) and this can be written as (2/3)(N) >= 2H
- This is an upper bound on the number of nodes of a fully filled subtree.
This was just an introduction to the Heap data structures. We will understand how to build a heap in the next post. Also, we will evaluate the running time of the heap building process.
Apart from that we will also learn about basic operations in heap and will discuss a new sorting algorithm called HeapSort.
Let me know in comments if you have any difficulties understanding any part of this post. Stay connected and stay subscribed.