Heap Sort is one of the efficient sorting algorithms with a guaranteed worst case running time of O(NlogN). This is an in place sorting algorithm but it does not offer stable sorting.
In place Sorting: A sorting algorithm which sorts the input using very small or no extra space. It means that the input data is overwritten by the resulting output data.
Stable Sorting: A stable sorting algorithm is one, in which the relative order of equal keys is preserved. It means if there are non unique keys in the input data, then their relative order of occurrence in the input as well as output remains the same. We can verify this by attaching unique satellite data with the duplicate keys.
After reading the previous posts about Heaps and how to build one from an array, understanding the Heap Sort algorithm is quite easy.
Steps involved in Heap Sort
Let us assume that there is an input array, which is unsorted. The task at hand is to rearrange the elements in the array such that the relative order of all the elements is increasing.
To solve such a problem, we need to use a max heap. Here are the steps involved in heap sort.
- Build a max heap using the array of elements. You can reference the post for this. Building is a process which takes O(N) running time.
- Extract the max element from the heap and swap it with the last element of the heap. This takes O(logN) time.
- This process will reduce the heap size.
- Repeat the process till the heap size reduces to one.
If we closely notice the steps of the algorithm, it divides the array into two parts a sorted one which is the rear part of the array and an unsorted part which is everything from beginning of the array till the left boundary of the sorted part. This is similar to the selection sort except for the fact that the selection sort takes O(N) time to find the max value in the array where as the heap sort algorithm does that it O(logN) time. We will demonstrate this below.
In this process, the total time taken would be O(N) + N * O(logN). Which asymptotically equals to O(NlogN).
Why does Heap Sort guarantees O(NlogN) running time in all cases?
As we can see the first step which is heap building takes O(N) in all cases irrespective of the order of the input data.
Similarly, we are accounting for each of the element to percolate through a distance logN (from root to leaf) which can be the maximum distance possible. Hence no no input would require running time more than O(NlogN) for sorting using the max heap.
Note: You might argue that the running time can be proved to be linear as the heap building process. Here is the reply from Professor Cormen which explains why we can’t have a linear running time.
Now that the algorithm is defined and clear above, we can try and write the pseudo code for the same
for i <- LENGTH(A) down to 2
HEAPSIZE = HEAPSIZE - 1
We can use the example from the last post to diagrammatically understand the whole process.