This website contains ALL LeetCode **Premium** problems for
**FREE!!**.

All leaked interview problems are collected from Internet.

All leaked interview problems are collected from Internet.

Median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value. So the median is the mean of the two middle value.

Examples:`[2,3,4]`

, the median is `3`

`[2,3]`

, the median is `(2 + 3) / 2 = 2.5`

Design a data structure that supports the following two operations:

- void addNum(int num) - Add a integer number from the data stream to the data structure.
- double findMedian() - Return the median of all elements so far.

For example:

addNum(1) addNum(2) findMedian() -> 1.5 addNum(3) findMedian() -> 2

**Credits:**

Special thanks to @Louis1992 for adding this problem and creating all test cases.

b'

\n## Solution

\n

\n#### Approach #1 Simple Sorting [Time Limit Exceeded]

\n

\n#### Approach #2 Insertion Sort [Time Limit Exceeded]

\n

\n#### Approach #3 Two Heaps! [Accepted]

\n\n\n

\n#### Approach #4 Multiset and Two Pointers [Accepted]

\n

\n#### Further Thoughts

\n

\n

'
\n\n

\n\n

**Intuition**

Do what the question says.

\n**Algorithm**

Store the numbers in a resize-able container. Every time you need to output the median, sort the container and output the median.

\n\n**Complexity Analysis**

- \n
- \n
Time complexity: .

\n- \n
- Adding a number takes amortized time for a container with an efficient resizing scheme. \n
- Finding the median is primarily dependent on the sorting that takes place. This takes time for a standard comparative sort. \n

\n - \n
Space complexity: linear space to hold input in a container. No extra space other than that needed (since sorting can usually be done in-place).

\n \n

\n

**Intuition**

Keeping our input container always sorted (i.e. maintaining the sorted nature of the container as an *invariant*).

**Algorithm**

Which algorithm allows a number to be added to a sorted list of numbers and yet keeps the entire list sorted? Well, for one, **insertion sort!**

We assume that the current list is already sorted. When a new number comes, we have to add it to the list while maintaining the sorted nature of the list. This is achieved easily by finding the correct place to insert the incoming number, using a **binary search** (remember, the list is *always sorted*). Once the position is found, we need to shift all higher elements by one space to make room for the incoming number.

This method would work well when the amount of insertion queries is lesser or about the same as the amount of median finding queries.

\n\n**Complexity Analysis**

- \n
- \n
Time complexity: .

\n- \n
- Binary Search takes time to find correct insertion position. \n
- Insertion can take up to time since elements have to be shifted inside the container to make room for the new element. \n

\n

\n\n\n

Pop quiz:Can we use alinearsearch instead of abinarysearch to find insertion position, without incurring any significant runtime penalty?

- \n
- Space complexity: linear space to hold input in a container. \n

\n

**Intuition**

The above two approaches gave us some valuable insights on how to tackle this problem. Concretely, one can infer two things:

\n- \n
- If we could maintain direct access to median elements at all times, then finding the median would take a constant amount of time. \n
- If we could find a reasonably fast way of adding numbers to our containers, additional penalties incurred could be lessened. \n

But perhaps the most important insight, which is not readily observable, is the fact that we *only* need a consistent way to access the median elements. Keeping the *entire* input sorted is **not a requirement.**

\n\nWell, if only there were a data structure which could handle our needs.

\n

As it turns out there are two data structures for the job:

\n- \n
- Heaps (or Priority Queues
^{1}) \n - Self-balancing Binary Search Trees (we\'ll talk more about them in Approach #4) \n

Heaps are a natural ingredient for this dish! Adding elements to them take logarithmic order of time. They also give direct access to the maximal/minimal elements in a group.

\nIf we could maintain *two* heaps in the following way:

- \n
- A max-heap to store the smaller half of the input numbers \n
- A min-heap to store the larger half of the input numbers \n

This gives access to median values in the input: they comprise the top of the heaps!

\n**Wait, what? How?**

If the following conditions are met:

\n- \n
- Both the heaps are balanced (or nearly balanced) \n
- The max-heap contains all the smaller numbers while the min-heap contains all the larger numbers \n

then we can say that:

\n- \n
- All the numbers in the max-heap are smaller or equal to the top element of the max-heap (let\'s call it ) \n
- All the numbers in the min-heap are larger or equal to the top element of the min-heap (let\'s call it ) \n

Then and/or are smaller than (or equal to) almost half of the elements and larger than (or equal to) the other half. That is *the* definition of **median** elements.

This leads us to a huge point of pain in this approach: **balancing the two heaps!**

**Algorithm**

- \n
- \n
Two priority queues:

\n- \n
- A max-heap
`lo`

to store the smaller half of the numbers \n - A min-heap
`hi`

to store the larger half of the numbers \n

\n - A max-heap
- \n
The max-heap

\n`lo`

is allowed to store, at worst, one more element more than the min-heap`hi`

. Hence if we have processed elements:- \n
- If , then
`lo`

is allowed to hold elements, while`hi`

can hold elements. \n - If , then both heaps are balanced and hold elements each. \n

This gives us the nice property that when the heaps are perfectly balanced, the median can be derived from the tops of both heaps. Otherwise, the top of the max-heap

\n`lo`

holds the legitimate median. \n - If , then
- \n
Adding a number

\n`num`

:- \n
- Add
`num`

to max-heap`lo`

. Since`lo`

received a new element, we must do a balancing step for`hi`

. So remove the largest element from`lo`

and offer it to`hi`

. \n - The min-heap
`hi`

might end holding more elements than the max-heap`lo`

, after the previous operation. We fix that by removing the smallest element from`hi`

and offering it to`lo`

. \n

The above step ensures that we do not disturb the nice little size property we just mentioned.

\n \n - Add

A little example will clear this up! Say we take input from the stream `[41, 35, 62, 5, 97, 108]`

. The run-though of the algorithm looks like this:

Adding number 41\nMaxHeap lo: [41] // MaxHeap stores the largest value at the top (index 0)\nMinHeap hi: [] // MinHeap stores the smallest value at the top (index 0)\nMedian is 41\n=======================\nAdding number 35\nMaxHeap lo: [35]\nMinHeap hi: [41]\nMedian is 38\n=======================\nAdding number 62\nMaxHeap lo: [41, 35]\nMinHeap hi: [62]\nMedian is 41\n=======================\nAdding number 4\nMaxHeap lo: [35, 4]\nMinHeap hi: [41, 62]\nMedian is 38\n=======================\nAdding number 97\nMaxHeap lo: [41, 35, 4]\nMinHeap hi: [62, 97]\nMedian is 41\n=======================\nAdding number 108\nMaxHeap lo: [41, 35, 4]\nMinHeap hi: [62, 97, 108]\nMedian is 51.5\n

**Complexity Analysis**

- \n
- \n
Time complexity: .

\n- \n
- At worst, there are three heap insertions and two heap deletions from the top. Each of these takes about time. \n
- Finding the mean takes constant time since the tops of heaps are directly accessible. \n

\n - \n
Space complexity: linear space to hold input in containers.

\n \n

\n

**Intuition**

Self-balancing Binary Search Trees (like an AVL Tree) have some *very* interesting properties. They maintain the tree\'s height to a logarithmic bound. Thus inserting a new element has reasonably good time performance. The median **always** winds up in the root of the tree and/or one of its children. Solving this problem using the same approach as Approach #3 but using a Self-balancing BST seems like a good choice. Except the fact that implementing such a tree is not trivial and prone to errors.

*Why reinvent the wheel?* Most languages implement a `multiset`

class which emulates such behavior. The only problem remains keeping track of the median elements. That is easily solved with **pointers!** ^{2}

We maintain two pointers: one for the lower median element and the other for the higher median element. When the total number of elements is odd, both the pointers point to the same median element (since there is only one median in this case). When the number of elements is even, the pointers point to two consecutive elements, whose mean is the representative median of the input.

\n**Algorithm**

- \n
- \n
Two iterators/pointers

\n`lo_median`

and`hi_median`

, which iterate over the`data`

multiset. \n - \n
While adding a number

\n`num`

, three cases arise:- \n
- The container is currently
**empty.**Hence we simply insert`num`

and set both pointers to point to this element. \n - \n
The container currently holds an

\n**odd**number of elements. This means that both the pointers currently point to the same element.- \n
- If
`num`

is not equal to the current median element, then`num`

goes on either side of it. Whichever side it goes, the size of that part increases and hence the corresponding pointer is updated. For example, if`num`

is less than the median element, the size of the lesser half of input increases by on inserting`num`

. Thus it makes sense to decrement`lo_median`

. \n - If
`num`

is equal to the current median element, then the action taken is dependent on how`num`

is inserted into`data`

.**NOTE:**In our given C++ code example,`std::multiset::insert`

inserts an element*after*all elements of equal value. Hence we increment`hi_median`

. \n

\n - If
- \n
The container currently holds an

\n**even**number of elements. This means that the pointers currently point to consecutive elements.- \n
- If
`num`

is a number between both median elements, then`num`

becomes the new median. Both pointers must point to it. \n - Otherwise,
`num`

increases the size of either the lesser or higher half of the input. We update the pointers accordingly. It is important to remember that both the pointerspoint to the same element now.*must*\n

\n - If

\n - The container is currently
- \n
Finding the median is easy! It is simply the

\n**mean**of the elements pointed to by the two pointers`lo_median`

and`hi_median`

. \n

A much shorter (but harder to understand), *one**pointer* version ^{3} of this solution is given below:

**Complexity Analysis**

- \n
- \n
Time complexity: .

\n- \n
- Inserting a number takes time for a standard
`multiset`

scheme.^{4}\n - Finding the mean takes constant time since the median elements are directly accessible from the two pointers. \n

\n - Inserting a number takes time for a standard
- \n
Space complexity: linear space to hold input in container.

\n \n

\n

There are so many ways around this problem, that frankly, it is scary. Here are a few more that I came across:

\n- \n
- \n

\n**Buckets!**If the numbers in the stream are statistically distributed, then it is easier to keep track of buckets where the median would land, than the entire array. Once you know the correct bucket, simply sort it find the median. If the bucket size is significantly smaller than the size of input processed, this results in huge time saving. @mitbbs8080 has an interesting implementation here. \n - \n

\n**Reservoir Sampling.**Following along the lines of using buckets: if the stream is statistically distributed, you can rely on Reservoir Sampling. Basically, if you could maintain just one good bucket (or*reservoir*) which could hold a representative sample of the entire stream, you could estimate the median of the entire stream from just this one bucket. This means good time and memory performance. Reservoir Sampling lets you do just that. Determining a**"good"**size for your reservoir?*Now, that\'s a whole other challenge.*A good explanation for this can be found in this StackOverflow answer. \n - \n

\n**Segment Trees**are a great data structure if you need to do a lot of insertions or a lot of read queries over a limited range of input values. They allow us to do all such operations*fast*and in roughly the*same amount of time*,**always.**The only problem is that they are far from trivial to implement. Take a look at my introductory article on Segment Trees if you are interested. \n - \n

\n**Order Statistic Trees**are data structures which seem to be tailor-made for this problem. They have all the nice features of a BST, but also let you find the order element stored in the tree. They are a pain to implement and no standard interview would require you to code these up. But they are fun to use if they are already implemented in the language of your choice.^{5}\n

\n

Analysis written by @babhishek21.

\n\n

\n

\n\n

- \n
- \n
Priority Queues queue out elements based on a predefined priority. They are an abstract concept and can, as such, be implemented in many different ways. Heaps are an efficient way to implement Priority Queues.\xc2\xa0\xe2\x86\xa9

\n \n - \n
Shout-out to @pharese for this approach.\xc2\xa0\xe2\x86\xa9

\n \n - \n
Inspired from this post by @StefanPochmann.\xc2\xa0\xe2\x86\xa9

\n \n - \n
Hinting can reduce that to amortized constant time.\xc2\xa0\xe2\x86\xa9

\n \n - \n

\n**GNU**`libstdc++`

users are in luck! Take a look at this StackOverflow answer.\xc2\xa0\xe2\x86\xa9 \n