Given the arrival and departure times of N trains that reach a railway station, and k platforms, return the maximum number of trains that we can house on the k platforms (k << N).
Arrival and departure time arrays:
Input: arr[] = {9:00, 9:40, 9:50, 11:00, 15:00, 18:00}
       dep[] = {9:10, 12:00, 11:20, 11:30, 19:00, 20:00}
I was asked this question in an interview, so what is the best algorithm for it? The question is slightly modified from this question.
I tried a greedy algorithm for it, but it does not work for all test cases.
For example, for k = 2 we are given the time intervals
arr[] = {1:00, 1:30, 2:00}
dept[] = {1:40, 2:10, 3:30}
By removing the {1:30, 2:10} interval we can do the task for k = 2: {1:00-1:40} and {2:00-3:30}, as no other train occurs between these times.
It seems to me (I do not have a rigorous proof for it) that a greedy algorithm should work:
1. Sort the trains by their departure time.
2. Maintain an array lastDeparture of size k, where lastDeparture[i] is the time when the last train leaves platform i (initially filled with zeros).
3. Iterate over the trains and do the following:
Find all platforms such that lastDeparture[i] <= currentTrain.arrival.
If there are no such platforms, skip the current train.
Otherwise, choose the one with the largest lastDeparture value (if there are several such platforms, pick any of them).
Increase the answer by one and assign the current train to this platform (that is, set lastDeparture[i] = currentTrain.departure).
A sketch of a proof:
Let's assume that our solution is not optimal, and find the first train that is in our answer but not in the optimal one. The optimal solution must use some other train in its place, but since we process trains in order of departure, that train's departure time is no earlier than ours. Thus, exchanging these two trains keeps the optimal schedule feasible without reducing its size, so our solution is no worse than the optimal one.
Time complexity: O(n log n) (step 3 can be handled efficiently using a balanced binary search tree that keeps platforms sorted by their last departure time).
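A minimal Python sketch of this greedy (my own illustration, not the poster's code): times are assumed to be plain comparable numbers such as minutes since midnight, and a sorted list plus bisect stands in for the balanced BST, so each platform update is O(k) rather than O(log k).

from bisect import bisect_right, insort

def max_trains(arrivals, departures, k):
    trains = sorted(zip(arrivals, departures), key=lambda t: t[1])  # sort by departure
    last_dep = [0] * k                        # last departure time per platform, kept sorted
    housed = 0
    for arr, dep in trains:
        i = bisect_right(last_dep, arr) - 1   # rightmost platform free by this arrival
        if i < 0:
            continue                          # every platform is still occupied
        last_dep.pop(i)                       # reuse that platform...
        insort(last_dep, dep)                 # ...until this train departs
        housed += 1
    return housed

For the example above with times in minutes, max_trains([540, 580, 590, 660, 900, 1080], [550, 720, 680, 690, 1140, 1200], 2) returns 5.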
I think I previously misunderstood the question.
If we have a limited number of platforms, I now think we're being asked to cancel a minimum number of trains so that the schedule never overwhelms the platforms.
Brute force:
Merge & Sort the Arrivals and Departures (but keeping track of which is which and identifying which train arrives/departs).
Walk through the array adding one to a counter for each arrival and subtracting one for each departure.
If the counter is at k and a train arrives, cancel whichever train at the station has the longest time left on its platform at the time of the 'overflowing' arrival. NB: this may be the arriving train or a train already on a platform.
The answer is the total number of trains minus the number of cancelled trains.
Notice that by cancelling the train with the longest time left on its platform we cancel the minimum number of trains: we have to cancel some train at the station to release a platform, and the one with the most time left has the greatest potential to block platforms for future arrivals.
This will be O(N*K) in the worst case if the arrivals and departures are given sorted and can be quickly riffled together; I notice the given example is nearly so.
Otherwise, the complexity is the worst case of the sort plus the O(N*K) tallying.
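A Python sketch of this sweep (my own illustration, not the answerer's code): instead of riffling two sorted arrays, I keep the housed trains' departure times in one sorted list, and I assume a departure at the same instant as an arrival frees its platform first.

from bisect import bisect_right, insort

def max_housed(arrivals, departures, k):
    trains = sorted(zip(arrivals, departures))   # sweep arrivals in time order
    on_platform = []          # departure times of housed trains, kept sorted
    cancelled = 0
    for arr, dep in trains:
        i = bisect_right(on_platform, arr)
        del on_platform[:i]   # trains that have departed free their platforms
        if len(on_platform) == k:                # this arrival would overflow the platforms
            if on_platform and on_platform[-1] > dep:
                on_platform.pop()                # a housed train stays longer: cancel it instead
                insort(on_platform, dep)
            cancelled += 1                       # either way, one train is cancelled
        else:
            insort(on_platform, dep)
    return len(arrivals) - cancelled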
If I understand the problem correctly, I believe this can be done by using a stack of size k that will contain the trains currently on a platform. For each train (in sorted order by departure times):
    while stack and train.arrival > stack[-1].departure:
        stack.pop()                    # that train has already left its platform
    if len(stack) < k:
        stack.append(train)            # push the current train only if there is room
    answer = max(answer, len(stack))
The maximum size that your stack reaches would be the answer to the problem.
This should be O(n log n) because of the sort, since each train enters / leaves the stack at most once.
This is an interview question.
There are billions and billions of stars in the universe. Which data structure would you use to answer the query "Give me the k stars nearest to Earth"?
I thought of heaps: we can heapify in O(n) and then extract each minimum in O(log n), so getting the k smallest costs O(n + k*log(n)). Is there a better data structure suited for this purpose?
Assuming the input could not be all stored in memory at the same time (that would be a challenge!), but would be a stream of the stars in the universe -- like you would get an iterator or something -- you could benefit from using a Max Heap (instead of a Min Heap which might come to mind first).
At the start you would just push the stars in the heap, keyed by their distance to earth, until your heap has k entries.
From then on, you ignore any new star when it has a greater distance than the root of your heap. When it is closer than the root-star, substitute the root with that new star and sift it down to restore the heap property.
Your heap will not grow greater than k elements, and at all times it will consist of the k closest stars among those you have processed.
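A minimal Python sketch of that (my own illustration; heapq is a min-heap, so distances are negated to simulate a max-heap, and stars are assumed to arrive as (name, distance) pairs):

from heapq import heappush, heapreplace, heappop

def k_nearest(stars, k):
    heap = []                               # max-heap of size <= k via negated distances
    for name, dist in stars:
        if len(heap) < k:
            heappush(heap, (-dist, name))
        elif dist < -heap[0][0]:            # closer than the farthest star we keep
            heapreplace(heap, (-dist, name))
    # popping the root repeatedly yields the k nearest in descending distance
    return [heappop(heap)[1] for _ in range(len(heap))]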
Some remarks:
Since it is a Max Heap, you don't know which is the closest star (in constant time). But if you stop the algorithm and pull out the root nodes one after another, you get the k closest stars in order of descending distance.
As the observable(!) Universe has an estimated 10^21 stars, you would need one of the best supercomputers (1 exaFLOPS) to hope to process all of them in a reasonable time. But at least this algorithm only needs to keep k stars in memory.
The first problem you're going to run into is scale. There are somewhere between 100 billion and 400 billion stars in the Milky Way galaxy alone, and an estimated 10 billion galaxies. If we assume an average of 100 billion stars per galaxy, that's 10^21 stars in the universe. It's unlikely you'll have the memory for that. And even if you did have enough memory, you probably don't have the time: assuming your heapify operation could do a billion iterations per second, it would take a trillion seconds (about 31,700 years). And then you have to add the time it would take to remove the k smallest from the heap.
It's unlikely that you could get a significant improvement by using multiple threads or processes to build the heap.
The key here will be to pre-process the data and store it in a form that lets you quickly eliminate the majority of possibilities. The easiest way would be to have a sorted list of stars, ordered by their distance from Earth. So Sol would be at the top of the list, Proxima Centauri would be next, etc. Then, getting the nearest k stars would be an O(k) operation: just read the top k items from the list.
A sorted list would be pretty hard to update, though. A better alternative would be a k-d tree. It's easier to update, and getting the k nearest neighbors is still reasonably quick.
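As a toy illustration of the k-d tree route (my own sketch; the star coordinates are made up, and scipy's KDTree is one readily available implementation):

from scipy.spatial import KDTree

# Hypothetical 3-D star positions in light-years, with Earth at the origin.
stars = [(4.2, 0.0, 0.3), (8.6, -1.2, 0.5), (11.9, 3.4, -2.0), (0.0, 1.3, 7.8)]
tree = KDTree(stars)                                   # built once, queried many times
distances, indices = tree.query((0.0, 0.0, 0.0), k=3)  # the 3 nearest stars to Earth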
So this question is more of an algorithm/approach-seeking question, where I'm looking for any thoughts or insights on how I can approach this problem. I was browsing through a set of programming problems and came across one where I'm required to find the minimum number of moves needed to sort a list of items. Although the problem is marked as 'Easy', I can't find a good solution for it. Your thoughts are welcome.
The problem statement is something like this.
X has N disks of equal radius. Every disk has a distinct number from 1 to N associated with it. The disks are placed one over the other in a single pile, in random order. X wants to sort this pile of disks in increasing order, top to bottom, but he has a very special method of doing so: in a single step he can only choose one disk out of the pile and put it on top. X wants to sort his pile in the minimum possible number of steps. Can you find the minimum number of moves required to sort a pile of randomly ordered disks?
The easy way to solve it, without considering the number of moves, would be: take the disk with the max value and put it on top, then take the second max and put it on top, and so on until all are sorted. This greedy approach will not always give you the minimum number of steps.
Consider this example: [5,4,1,2,3]. With the above greedy approach it goes like this:
[5,4,1,2,3]
[4,1,2,3,5]
[1,2,3,5,4]
[1,2,5,4,3]
[1,5,4,3,2]
[5,4,3,2,1]
This takes 5 moves, but the minimum sequence is:
[5,4,1,2,3]
[5,4,1,3,2]
[5,4,3,2,1]
This takes only 2 moves.
To get the minimum number of moves, first count how many values, starting from N and counting down, already appear in the pile in that descending order; those disks never need to move. Every remaining disk has to be moved exactly once, which is the minimum. For example:
[1,5,2,3,10,4,9,6,8,7]
Starting from 10, there are in total 4 numbers already in descending order, [10,9,8,7]; the rest need to move. So the minimum number of moves is 10 - 4 = 6:
[1,5,2,3,10,4,9,6,8,7]
[1,5,2,3,10,4,9,8,7,6]
[1,2,3,10,4,9,8,7,6,5]
[1,2,3,10,9,8,7,6,5,4]
[1,2,10,9,8,7,6,5,4,3]
[1,10,9,8,7,6,5,4,3,2]
[10,9,8,7,6,5,4,3,2,1]
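A small Python sketch of this counting rule (my own illustration; the pile is a list with the bottom disk first, as in the traces above):

def min_moves(pile):
    # pile[0] is the bottom disk; pile[-1] is the top
    need = len(pile)          # next value of the descending chain N, N-1, ...
    for disk in pile:
        if disk == need:
            need -= 1         # this disk already sits in the right relative order
    return need               # disks 1..need each have to be moved once

min_moves([1,5,2,3,10,4,9,6,8,7])  # -> 6
min_moves([5,4,1,2,3])             # -> 2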
I'm trying to solve a variant of 2048 with Monte-Carlo Tree Search. I found that UCT could be a good way to get some trade-off between exploration and exploitation.
My only issue is that all the versions I've seen assume that the score is a win percentage. How can I adapt it to a game where the score is the value of the board at the last state, ranging from 1 to MAX rather than being a win or a loss?
I could normalize the score by dividing by MAX (folding it into the constant c), but then it would overweight exploration at the early stage of the game (since you get bad average scores) and overweight exploitation at the late stage.
Indeed, most of the literature assumes your games are either lost or won and awards a score of 0 or 1, which turns into a win ratio when averaged over the number of games played. The exploration parameter C is then usually set to sqrt(2), which is optimal for UCB1 in bandit problems.
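For reference, a minimal sketch of the UCB1 value that UCT maximizes when picking a child (my own illustration; the names are made up):

import math

def uct_value(total_reward, visits, parent_visits, c=math.sqrt(2)):
    # UCB1: average reward (a win ratio for 0/1 games) plus an exploration
    # bonus that shrinks as a child is visited more often.
    if visits == 0:
        return math.inf            # unvisited children are tried first
    return total_reward / visits + c * math.sqrt(math.log(parent_visits) / visits)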
To find out what a good C is in general you have to step back a bit and see what the UCT is really doing. If one node in your tree had an exceptionally bad score in the one rollout it had then exploitation says you should never choose it again. But you've only played that node once, so it might have just been bad luck. To acknowledge this you give that node a bonus. How much? Enough to make it a viable choice even if its average score is the lowest possible and some other node has the highest average score possible. Because with enough plays it might turn out that the one rollout your bad node had was indeed a fluke, and the node actually turns out to be pretty reliable with good scores. Of course, if you get more bad scores then it will likely not be bad luck so it won't deserve more rollouts.
So with scores ranging from 0 to 1, a C of sqrt(2) is a good value. If your game has a maximum achievable score, then you can normalize your scores by dividing by the max and force them into the 0-1 range to suit a C of sqrt(2). Alternatively, you don't normalize the scores but multiply C by your maximum score. The effect is the same: the UCT exploration bonus is large enough to give your underdog nodes some rollouts and a chance to prove themselves.
There is an alternative way of setting C dynamically that has given me good results. As you play, you keep track of the highest and lowest scores you've ever seen in each node (and subtree). This is the range of scores possible, and it gives you a hint of how big C should be in order to give under-explored underdog nodes a fair chance. Every time I descend into the tree and pick a new root, I adjust C to be sqrt(2) * score range for the new root. In addition, as rollouts complete and their scores turn out to be a new highest or lowest score, I adjust C in the same way. By continually adjusting C this way, both as you play and as you pick a new root, you keep C as large as it needs to be to converge, but as small as it can be to converge fast. Note that the minimum score is as important as the maximum: if every rollout yields at least a certain score, then C won't need to overcome it. Only the difference between max and min matters.
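A sketch of that adjustment (my own illustration; it assumes the minimum and maximum rollout scores are tracked per subtree as described above):

import math

def adjusted_c(min_score, max_score):
    # Scale the usual sqrt(2) by the score range observed in the current
    # root's subtree; only the difference between max and min matters.
    return math.sqrt(2) * (max_score - min_score)

# e.g. recompute on every new root, and whenever a rollout sets a new extreme:
# C = adjusted_c(root.min_score, root.max_score)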
Years ago, in a job interview, I was asked: when is it worth sorting an array? I remember not being able to answer properly. I recently took an algorithms course, and I have come to the conclusion that a more "academic" response would probably have got me that job... Anyway, it is not possible to fix the past, so now I am trying to answer it formally for myself. This is where I am:
Given an array, the time to search will be
O(n) if not sorted
O(log(n)) if sorted
Considering that quick sort sorts in O(n*log(n))
When is it worth to sort an array? It would of course depend on the number of times we are going to search the array.
Cost of searching x times in sorted array = O(n*log(n)) + [O(log(n)) * x]
Cost of searching x times in unsorted array = O(n) * x
What would be the value of x?
Personally, I would answer that it is worth sorting an array if any of the following is true:
we plan to often ask for the biggest value in the array (cut cost from O(n) to O(1)),
we plan to often ask for the smallest value in the array (cut cost from O(n) to O(1)),
we will often seek a given value in the array (cut cost from O(n) to O(log(n))).
If it is possible to sort the array in O(n) (for example, the data fulfils the criteria for counting sort), we start gaining from the search operations (so that the total time, including the time necessary for sorting, is smaller than the time taken by searching the unsorted array) after k searches, where k = timeToSortArray / (n - log(n)), i.e. the time it took to sort the array divided by the gain from a single search in the sorted array. Since an O(n) sort costs c*n for some small constant c, and n - log(n) is approximately n, k is roughly c: a constant number of searches already pays for the sort.
If we sort the array in O(n*log(n)), using for example heapsort or quicksort (where the constant hidden in the big-O notation is small), the same reasoning gives k = (constant*n*log(n)) / (n - log(n)). Again, n - log(n), the saving from a single search in the sorted array compared to a search in the unsorted one, is approximately n, so if we consider our constant to be small (much smaller than n), the break-even point is approximately n*log(n) / n = log(n) searches.
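As a concrete check (my own numbers, under the assumptions above): for n = 10^6, a linear search costs about 10^6 comparisons, sorting costs about n*log2(n) ≈ 2*10^7, and a binary search costs about 20 comparisons. The break-even point is then 2*10^7 / (10^6 - 20) ≈ 20 searches, which is log2(n) as claimed.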
If we include calculations of gains from getting the biggest/smallest value, we start gaining from sorting an array much faster.
I have input array A
A[0], A[1], ... , A[N-1]
I want a function Max(T, A) which returns B, the maximum of A over a moving window of size T, where
B[i+T] = max(A[i], A[i+1], ..., A[i+T])
By using a max heap to keep track of the max value in the current moving window A[i] to A[i+T], this algorithm yields O(N log(T)) worst case.
I would like to know: is there a better algorithm, maybe an O(N) algorithm?
O(N) is possible using a deque that holds (value, index) pairs. At every step you evict the head if it has left the window, strip dominated elements from the tail, append the new element, and read the current maximum off the head:

from collections import deque

def sliding_max(A, T):
    dq = deque()                 # (value, index) pairs; values are kept decreasing
    B = []
    for i, value in enumerate(A):
        if dq and dq[0][1] <= i - T:
            dq.popleft()         # head is too old, it is leaving the window
        while dq and dq[-1][0] <= value:
            dq.pop()             # elements smaller than the new value can never be the window max
        dq.append((value, i))
        B.append(dq[0][0])       # head value is the max of the current window
    return B
It's called RMQ (range minimum/maximum query). Actually, I once wrote an article about that (with C++ code); see http://attiix.com/2011/08/22/4-ways-to-solve-%C2%B11-rmq/
or you may prefer the Wikipedia article, Range Minimum Query.
After the preparation, you can get the max of any given range in O(1).
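For illustration, a minimal sparse-table sketch for range-maximum queries (my own Python code, not taken from the linked article): O(N log N) preparation, then any range maximum in O(1).

def build_sparse_table(A):
    # st[j][i] holds the max of A[i .. i + 2^j - 1]
    n = len(A)
    st = [A[:]]
    j = 1
    while (1 << j) <= n:
        half = 1 << (j - 1)
        prev = st[-1]
        st.append([max(prev[i], prev[i + half]) for i in range(n - (1 << j) + 1)])
        j += 1
    return st

def range_max(st, lo, hi):
    # max of A[lo..hi] inclusive: two power-of-two blocks that overlap
    j = (hi - lo + 1).bit_length() - 1
    return max(st[j][lo], st[j][hi - (1 << j) + 1])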
There is a sub-field in image processing called Mathematical Morphology. The operation you are implementing is a core concept in this field, called dilation. Obviously, this operation has been studied extensively and we know how to implement it very efficiently.
The most efficient algorithm for this problem was proposed in 1992 and 1993, independently by van Herk, and Gil and Werman. This algorithm needs exactly 3 comparisons per sample, independently of the size of T.
Some years later, Gil and Kimmel further refined the algorithm to need only 2.5 comparisons per sample. Though the increased complexity of the method might offset the fewer comparisons (I find that more complex code runs more slowly). I have never implemented this variant.
The HGW algorithm, as it's called, needs two intermediate buffers of the same size as the input. For ridiculously large inputs (billions of samples), you could split up the data into chunks and process it chunk-wise.
In short, you walk through the data forward, computing the cumulative max over chunks of size T. You do the same walking backward. Each of these requires one comparison per sample. Finally, each result is the maximum of one value from each of these two temporary arrays. For data locality, you can do the two passes over the input at the same time.
I guess you could even do a running version, where the temporary arrays are of length 2*T, but that would be more complex to implement.
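A Python sketch of the HGW recipe above (my own illustration; the window ending at index i is taken to cover A[i-T+1] .. A[i]):

def hgw_max_filter(A, T):
    n = len(A)
    fwd = A[:]                        # cumulative max, restarting at each chunk of size T
    for i in range(1, n):
        if i % T:
            fwd[i] = max(fwd[i], fwd[i - 1])
    bwd = A[:]                        # cumulative max right-to-left within each chunk
    for i in range(n - 2, -1, -1):
        if (i + 1) % T:
            bwd[i] = max(bwd[i], bwd[i + 1])
    # a window spans at most two chunks: bwd covers its left part, fwd its right part
    return [max(bwd[i - T + 1], fwd[i]) for i in range(T - 1, n)]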
van Herk, "A fast algorithm for local minimum and maximum filters on rectangular and octagonal kernels", Pattern Recognition Letters 13(7):517-521, 1992 (doi)
Gil, Werman, "Computing 2-D min, median, and max filters", IEEE Transactions on Pattern Analysis and Machine Intelligence 15(5):504-507 , 1993 (doi)
Gil, Kimmel, "Efficient dilation, erosion, opening, and closing algorithms", IEEE Transactions on Pattern Analysis and Machine Intelligence 24(12):1606-1617, 2002 (doi)
(Note: cross-posted from this related question on Code Review.)