Median of a Stream of Integers

Problem Statement

Here is a very interesting problem which has been asked in many interviews and coding competitions. The problem statement goes as follows:

Given a never ending stream of number, at any point in time, find the median of all the numbers received till that point.

This means, we need to recalculate the new median whenever there is a number available in the input stream. This problems seems to be a very practical one, imagine a lot of sensors in manufacturing plants are pushing data about temperature, pressure and other physical parameters. It is very important to keep track of the average and median of all such parameters. Hence, Median of a Stream of Integers becomes an interesting and useful problem.

Approaching the Problem

Before approaching the problem, we need to understand the meaning of median.

Meaning of Median

Median of a list of given numbers is defined as below:

  • If the list contains odd number of elements, it is the middle element of the sorted list.
  • If the list contains even number of elements, it is the average of the two middle elements of the sorted list.
  • For e.g for a list 3, 5, 2, 4, 1, 9, 8, 6, 7 . The median can be calculated after sorting the list into 1, 2, 3, 4, 5, 6, 7, 8, 9 and finding the middle element that is 5.
  • In case the list contains even number of elements 3, 5, 2, 4, 1, 9, 8, 6, 7, 10 . The median is calculated after sorting the list into 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 and taking the average of the two middle elements. (5+6)/2 = 11/2 = 5.5

Approach

At any point of time, there can be two scenarios:

  • Total numbers in the stream are even
    • Here we need the two middle elements of the sorted array and take an average for the answer.
  • Total numbers in the stream are odd.
    • Here we need the middle element of the sorted array, which is the answer itself.

m1

Let us assume that at any point the median is M

  • Before the start of the stream M would be 0.
  • At any time we just need two numbers
    • The largest number smaller than the median, and
    • The smallest number larger than the median.
  • But to find these two numbers we need to maintain two equal sized lists.
  • Let say the list with numbers smaller than median is LOW, and
  • The list with numbers bigger than median is HIGH

lh

Algorithm

Let us try to solve the problem generically, for a given incoming number in the stream

  1. If both the lists are of equal size
    1. If the incoming number is smaller than M then add it into the LOW list, the highest number in the LOW list is new median.
    2. Otherwise add the incoming number to the HIGH list, the smallest number in the HIGH list is new median.
  2. If HIGH contains more elements than LOW
    1. If the incoming number is smaller than M, add it to the LOW list
    2. Otherwise remove the smallest number from HIGH list and add it to the LOW list and add the incoming number to the HIGH list.
    3. Take the average of the smallest from HIGH and highest from LOW and that is the new median.
  3. If HIGH contains lesser elements than LOW
    1. If the incoming number is smaller than M, add it to the HIGH list
    2. Otherwise remove the biggest number from the LOW list and add it to the HIGH list and add the incoming number in the LOW list
    3. Take the average of the smallest from HIGH and highest from LOW and that is the new median.
  • Santhosh Unnikrishnan

    One confirmation:
    In sub-section “If HIGH contains lesser elements than LOW”, the first point says “If the incoming number is smaller than M, add it to the HIGH list” is it really smaller than M or greater than M? .