Finding K nearest distances

Problem Statement

You are given millions of two-dimensional points and a utility method that calculates a point's distance from the origin. Write code to return the K nearest unique distances from the origin.

If more than one point lies at the same distance, that distance must be returned only once. For example, if there are five points (1,1), (2,1), (1,2), (2,2) and (3,3) and K is 3, we need to return the following:

  • 1.414 – distance of (1,1) from origin.
  • 2.236 – distance of (2,1) or (1,2) from origin, as both are equally far away.
  • 2.828 – distance of (2,2) from origin.
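
To make the example concrete, here is a minimal sketch of what the distance utility might look like. The problem statement only says that such a method exists, so the class name `DistanceExample` and the method `distanceFromOrigin` are assumptions made for illustration.

```java
final class DistanceExample {
    // Hypothetical stand-in for the utility method from the problem statement:
    // Euclidean distance of (x, y) from the origin (0, 0).
    static double distanceFromOrigin(int x, int y) {
        // Cast to double before multiplying to avoid int overflow for large coordinates.
        return Math.sqrt((double) x * x + (double) y * y);
    }

    public static void main(String[] args) {
        int[][] points = {{1, 1}, {2, 1}, {1, 2}, {2, 2}, {3, 3}};
        for (int[] p : points) {
            System.out.printf("(%d,%d) -> %.3f%n", p[0], p[1], distanceFromOrigin(p[0], p[1]));
        }
    }
}
```

Because (2,1) and (1,2) produce the same sum of squares, they produce exactly the same `double`, which is what lets a set treat them as one distance.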

Test Cases

Here are a few critical test scenarios that must be handled properly:

  1. No points
  2. Fewer points than K
  3. More points than K, but fewer than K unique distances
  4. A lot of repeated points
  5. Millions of points

Naive Approach for Finding K nearest distances

  1. Traverse the list/array of points and add all of them to a Set (handles test case 4)
  2. Traverse the set and add the distance of each point to a second Set (enabler for test case 3)
  3. Copy all the distances from the second set into a List and sort it (the sort function of the Collections framework works on a List, not on a Set)
  4. If the size of the List is less than K, return the whole list (handles test cases 2 and 3)
  5. Otherwise, return the sub-list from index 0 (inclusive) to K (exclusive)

Source Code
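
The five steps above can be sketched in Java roughly as follows. This is an illustrative sketch, not the author's original code: the `Point` class, the `distanceFromOrigin` helper and the method name `kNearestDistances` are all assumptions introduced here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

final class NaiveKNearest {
    // Hypothetical point type; equals/hashCode let a HashSet drop repeated points.
    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
        @Override public boolean equals(Object o) {
            return o instanceof Point && ((Point) o).x == x && ((Point) o).y == y;
        }
        @Override public int hashCode() { return Objects.hash(x, y); }
    }

    // Hypothetical stand-in for the distance utility from the problem statement.
    static double distanceFromOrigin(Point p) {
        return Math.sqrt((double) p.x * p.x + (double) p.y * p.y);
    }

    static List<Double> kNearestDistances(List<Point> points, int k) {
        // Step 1: remove repeated points.
        Set<Point> uniquePoints = new HashSet<>(points);
        // Step 2: collect unique distances.
        Set<Double> uniqueDistances = new HashSet<>();
        for (Point p : uniquePoints) {
            uniqueDistances.add(distanceFromOrigin(p));
        }
        // Step 3: copy into a List so it can be sorted.
        List<Double> sorted = new ArrayList<>(uniqueDistances);
        Collections.sort(sorted);
        // Steps 4 and 5: return everything, or only the first K entries.
        int end = Math.min(Math.max(k, 0), sorted.size());
        return new ArrayList<>(sorted.subList(0, end));
    }
}
```

The sketch holds every unique point and every unique distance in memory at once, so its extra space grows with the size of the input.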

Test case 5 is the bottleneck. As the number of points grows (consider billions or trillions of points), it becomes really tough to allocate that much memory, and the auxiliary sets and lists for points and distances add to the space complexity. So we need a different approach.

Heap based solution for Finding K nearest distances

  1. Create a MaxHeap
  2. Add the first K unique distances to the Heap
  3. Iterate through the remaining points
  4. Calculate the distance of each point
  5. If the distance is not already in the Heap and it is less than the max value in the Heap, extract the max from the Heap and insert this distance. (This approach needs no auxiliary memory beyond size K: the Heap itself plus, for a fast duplicate check, a set of the distances currently in the Heap.)
  6. Return the contents of the Heap (sort them first if they are needed in ascending order)
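
One possible way to sketch the heap steps in Java is shown below. It is not the code behind the GitHub link. It assumes points arrive as `int[]{x, y}` pairs, uses a hypothetical `distanceFromOrigin` helper, and keeps a companion `HashSet` of the distances currently in the heap so that the "not already in the Heap" check does not need a linear scan.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

final class HeapKNearest {
    // Hypothetical stand-in for the distance utility from the problem statement.
    static double distanceFromOrigin(int x, int y) {
        return Math.sqrt((double) x * x + (double) y * y);
    }

    // Each point is assumed to be an int[]{x, y}; points can be streamed one by one.
    static List<Double> kNearestDistances(Iterable<int[]> points, int k) {
        if (k <= 0) {
            return new ArrayList<>();
        }
        // Max-heap: the largest of the K smallest distances seen so far sits on top.
        PriorityQueue<Double> maxHeap = new PriorityQueue<>(Collections.reverseOrder());
        // Mirrors the heap's contents (at most K entries) for a fast duplicate check.
        Set<Double> inHeap = new HashSet<>();

        for (int[] p : points) {
            double d = distanceFromOrigin(p[0], p[1]);
            if (inHeap.contains(d)) {
                continue; // this distance is already among the candidates
            }
            if (maxHeap.size() < k) {
                // Still filling the heap with the first K unique distances.
                maxHeap.add(d);
                inHeap.add(d);
            } else if (d < maxHeap.peek()) {
                // Closer than the current worst candidate: evict the max, insert d.
                inHeap.remove(maxHeap.poll());
                maxHeap.add(d);
                inHeap.add(d);
            }
        }
        // A heap has no sorted iteration order, so sort the K survivors before returning.
        List<Double> result = new ArrayList<>(maxHeap);
        Collections.sort(result);
        return result;
    }
}
```

Only the heap and its companion set are kept in memory, so the extra space stays proportional to K however many points are processed. A single `TreeSet<Double>` capped at K entries could play both roles, at the cost of no longer being a heap in the strict sense.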

To get the complete code, please visit the GitHub link.

Please suggest more test cases so that it can help everyone who is preparing for an interview.

Stay connected and stay subscribed.