Analysis of Heap Creation Algorithm
Let us analyze the algorithm in terms of running time and space.
Space complexity seems to be constant or O(1) as there is no extra space, we might just use one or two variables for swapping and loop counters.
Time Complexity is pretty interesting here. A simpler analysis will suggest that in the worst case, we will have N/2 elements (all the inner nodes of the tree) which might disobey the max heap property and all of them need to go through the heapify process.
Also, we learnt in the previous post that the cost of heapify in terms of running time is O(logN) for a N node heap. So If there are N/2 nodes which need heapify, the total cost would be N/2 O(logN) or asymptotically speaking it would be O(NlogN).
I think I actually didn’t spend this much time creating the heap. What it means is that we didn’t do a (logN) operation for each of the N/2 nodes. Probably there we need some more work to understand if it actually takes O(NlogN) or a smaller term for creation of heap.
There are couple of points we must first agree on before deriving any stricter bound on the running time.
- Not all the nodes in the heap are of height logN. There are very few nodes at height logN (in fact just one node is at that height).
- The number of nodes at any given height is N/(2h+1) for any height. Below is the proof:
- The total number of nodes in the tree is N, say the height of the tree is H.
- The total number of nodes at height zero, is all the leaves which is at most 2H . Let us say N(0) = 2H
- For every level upward the number of nodes get divided by 2. Means the number of nodes at height 1 is at most N(1) = (2H)/2
- Going further up for height 2, the number of nodes at height 2 would be most N(2) = (2H)/2*2 and so on.
- Now the relation between 2H and N is , N < (2H+1) or we can say 2H = N/2
- let us generically find N(h) which is the number of nodes at height h. N(h) = (2H)/(2h)
- Replacing 2H = N/2, we get N(h) = N/(2h+1)
- At any height h the running time of heapify is proportional to h. Hence, the running time of heapify is O(h).
- Once we agree to these terms, we can say that the running time of this algorithm can be expressed in the terms of height h as
Where the summation of the series h/2h over infinite numbers converges to 2. Hence the running time T(N) can be denoted as O(n), which is linear in the number of nodes in Heap.
That would be all for this post. Please let me know in comments if any of the part was unclear and needs a formal proof. We learnt to build a heap in linear time and proved it as well.
In the next post we will try and understand the usage of heaps for e.g. heap sort and priority queues and further extend it for some real problems. Stay connected and stay subscribed. 🙂