Find Median from Data Stream

Question

Numbers keep coming, return the median of numbers at every time a new number added.

Clarification

What's the definition of Median?

  • Median is the number that in the middle of a sorted array. If there are n numbers in a sorted array A, the median is A[(n - 1) / 2]. For example, if A=[1,2,3], median is 2. If A=[1,19], median is 1.

Example

For numbers coming list: [1, 2, 3, 4, 5], return [1, 1, 2, 2, 3].

For numbers coming list: [4, 5, 1, 3, 2, 6, 0], return [4, 4, 4, 3, 3, 3, 3].

For numbers coming list: [2, 20, 100], return [2, 2, 20].

Challenge

Total run time in O(nlogn).

Tags

LintCode Copyright Heap Priority Queue Google

Related Problems

Hard Sliding Window Median 17 % Easy Median 22 % Hard Median of two Sorted Arrays

Problem description on LeetCode:

Median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value. So the median is the mean of the two middle value.

For example,

[2,3,4], the median is3

[2,3], the median is(2 + 3) / 2 = 2.5

Design a data structure that supports the following two operations:

  • void addNum(int num) - Add a integer number from the data stream to the data structure.

  • double findMedian() - Return the median of all elements so far.

Example:

addNum(1)
addNum(2)
findMedian() -> 1.5
addNum(3) 
findMedian() -> 2

Follow up:

  1. If all integer numbers from the stream are between 0 and 100, how would you optimize it?

  2. If 99% of all integer numbers from the stream are between 0 and 100, how would you optimize it?

Analysis

寻找中位数median,这里有一种很巧妙的思路,需要利用Heap的半排序特性,是指root node的值是min (min heap)或者max(max heap)。

利用两个heap,一个minHeap,一个maxHeap,将nums[]的数顺序存入时,则可以分别存入maxHeap和minHeap,并保持这两个heap的size相同,保持两者size相同的操作通过minHeap.offer(maxHeap.poll());maxHeap.offer(minHeap.poll());

The basic idea is to maintain two heaps: a max-heap and a min-heap. The max heap stores the smaller half of all numbers while the min heap stores the larger half. The sizes of two heaps need to be balanced each time when a new number is inserted so that their size will not be different by more than 1. Therefore each time when findMedian() is called we check if two heaps have the same size. If they do, we should return the average of the two top values of heaps. Otherwise we return the top of the heap which has one more element. -- @hanhanbu

https://discuss.leetcode.com/topic/27506/easy-to-understand-double-heap-solution-in-java

Notice

LeetCode中与LintCode里稍有不同,在于对于Median的定义:

LeetCode要求是如果是even number,就计算中间两个数字的平均值,也就是 (maxHeap.peek() + (minHeap.peek()))/2

而LintCode的要求则是当even number时取[N/2]那一个,也就是说不论是否even number,需要返回的都是 maxHeap.peek()

延伸思考

这里建立Heap依然需要知道即将传入的nums[]的长度,如果对于一个未知长度的nums[]这里应当如何处理呢?

Solution

Double Heap (minHeap + maxHeap)

public class Solution {
    PriorityQueue<Integer> maxHeap;//lower half
    PriorityQueue<Integer> minHeap;//higher half

     /**
     * @param nums: A list of integers.
     * @return: the median of numbers
     */
    public int[] medianII(int[] nums) {

        int count = nums.length;
        maxHeap = new PriorityQueue<Integer>(count, Collections.reverseOrder());
        minHeap = new PriorityQueue<Integer>(count);

        int[] ans = new int[count];

        for (int i = 0; i < count; ++i) {
            addNum(nums[i]);
            ans[i] = findMedian();
        }
        return ans;
    }

    // Adds a number into the data structure.
    public void addNum(int num) {
        maxHeap.offer(num);
        minHeap.offer(maxHeap.poll());

        if(maxHeap.size() < minHeap.size()){
            maxHeap.offer(minHeap.poll());
        }
    }

    // Returns the median of current data stream
    public int findMedian() {
        if(maxHeap.size() == minHeap.size()){
            return maxHeap.peek(); // Or `(maxHeap.peek() + (minHeap.peek()))/2`
        }else{
            return maxHeap.peek();
        }
    }
}

Reference

Last updated