Magic Array Index Time/Space Complexity - arrays

I've been looking at the following problem:
Magic Index: A magic index in an array A[0...n-1] is defined to be an index i such that A[i] = i. Given a sorted array of non-distinct integers, write a method to find a magic index if one exists.
Here is my solution:
static int magicNonDistinct(int[] array, int start, int end) {
    if (end < start) return -1;
    int mid = start + (end - start) / 2;
    if (mid < 0 || mid >= array.length) return -1;
    int v = array[mid];
    if (v == mid) return mid;
    // Values are non-distinct, so both sides may still contain a magic index,
    // but the sort order lets us clamp each sub-range.
    int leftEnd = Math.min(v, mid - 1);
    int leftRes = magicNonDistinct(array, start, leftEnd);
    if (leftRes != -1) return leftRes;
    int rightStart = Math.max(v, mid + 1);
    int rightRes = magicNonDistinct(array, rightStart, end);
    return rightRes;
}
It works just fine and is the recommended solution from the book Cracking the Coding Interview, 6th Edition, problem 8.3 follow-up (sorry for spoiling).
However, when running this on a distinct array with no magic index, it visits all the elements, yielding a worst-case running time of O(n).
Since it is recursive, it also takes O(n) memory in the worst case.
Why would this solution be preferable to just iterating over the array? I would argue this solution (my own) is better:
static int magicNonDistinctV2(int[] array) {
    for (int i = 0; i < array.length; ++i) {
        int v = array[i];
        if (v == i) return v;
        if (v >= array.length) return -1;
        else if (v > i) i = v - 1; // jump ahead: indices i..v-1 cannot be magic
    }
    return -1;
}
O(n) running time and O(1) space, always?
Could somebody derive a better time complexity for the initial algorithm? I've been wondering whether it is O(d), where d is the number of distinct elements, but that bound is also wrong, since the min/max only works in one direction (think about what happens if v = 5, mid = 4 and the lower part of the array is all fives).
EDIT:
OK, people think I'm bananas and scream O(log(n)) as soon as they see something that looks like binary search. Sorry for being unclear, folks.
Let's talk about the code in the first posting I made (the solution by CTCI):
If we have an array looking like this: [-1, 0, 1, 2, 3, 4, 5, 6, 7, 8] (in general, an array of the form [-1, ..., n-2] of size n), we know that no element can match. However, the algorithm will visit all the elements, since the elements aren't unique. I dare you, run it: it cannot divide the search space by 2 as in a regular binary search. Please tell me what is wrong with my reasoning.
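For anyone who wants to take the dare, here is a minimal Java harness (my own sketch, a copy of the book's function with a hypothetical call counter bolted on) that shows the call count growing linearly on exactly that array:

static int calls = 0;

static int magicCounted(int[] a, int start, int end) {
    calls++; // count every recursive invocation
    if (end < start) return -1;
    int mid = start + (end - start) / 2;
    int v = a[mid];
    if (v == mid) return mid;
    int left = magicCounted(a, start, Math.min(v, mid - 1));
    if (left != -1) return left;
    return magicCounted(a, Math.max(v, mid + 1), end);
}

public static void main(String[] args) {
    int n = 1024;
    int[] a = new int[n];
    for (int i = 0; i < n; i++) a[i] = i - 1; // [-1, 0, ..., n-2]: no magic index
    System.out.println("result = " + magicCounted(a, 0, n - 1) + ", calls = " + calls);
    // Since a[mid] = mid - 1, min(v, mid - 1) and max(v, mid + 1) never shrink
    // either half, so both sides are searched in full: calls is about 2n.
}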

No, in my opinion the first solution is not O(log n) as other answers state; it is really O(n) in the worst case (it may still need to go through all the elements; consider the index sequence shifted down by one, [-1, 0, ..., n-2], as also mentioned by the author).
The reason it is not O(log n) is that it may need to search on both sides of the middle (binary search only checks one side of the middle, which is why it is O(log n)).
It does allow skipping items if you're lucky, but your second, iterative solution skips items too when it doesn't need to look at them (because the array is sorted, you know there cannot be a magic index in such a range). So in my opinion the second solution is better: the same time complexity, but iterative, i.e. better space complexity and no relatively expensive recursive calls.
EDIT: However, when I thought about the first solution again, it does on the other hand allow "skipping backwards" when possible, which the iterative solution does not. Consider for example an array like { -10, -9, -8, -7, -6, -5 }: the iterative solution has to check all the elements, because it starts at the beginning and the values never allow it to skip forward, whereas starting from the middle, the algorithm can completely skip checking the first half, then the first half of the second half, and so on.

You are correct, the worst-case complexity is O(n). You may have to visit all the elements of your array.
There is only one reason not to visit the array elements [mid, end], and that is when array[mid] > end (because in that case, the magic index is surely absent from the elements [mid, end]).
Similarly, there is only one reason not to visit the array elements [start, mid], and that is when array[start] > mid.
So there is hope that you may not have to visit all the elements; it is an optimization which may pay off.
Thus, this binary-search-like method seems better than iterating over the entire array linearly, but in the worst case you will still hit O(n).
PS: I've assumed that the array is sorted in ascending order.

It looks like you misunderstood the time complexity of the required solution. The worst case is not O(n), it is O(log(n)). This is because during each pass you search only half of the array next time.
Here is a C++ example; check that for a whole array of 11 elements, it takes only 3 checks.

Related

Worst case time complexity to search an element in a closely sorted array of elements?

This was asked in my interview. The actual meaning of the question is to find the time complexity, specifically the worst-case time complexity, of searching an array of elements which are already in sorted order.
The main point to note here is that the difference between two adjacent numbers in the array is very small or insignificant.
I approached this problem as a simple binary search, which requires the array to be in sorted order, and thought the worst-case time complexity is O(log n). But will this answer change if the array elements are very close to each other, as mentioned in the question?
What is the correct approach to solve this problem?
According to the question, we can assume the array looks like the first picture below (values packed closely together).
It is definitely not like the second picture shown below, where the elements differ sparsely and we can use plain binary search.
The O(log n) binary search complexity will not change even if all the elements are equal (or "very close to each other"), as long as the array is sorted. Perhaps we can improve performance by taking advantage of the distribution of the array values and using interpolation search: https://en.wikipedia.org/wiki/Interpolation_search
But if implemented poorly, interpolation search could result in O(n) complexity.
If the array has an almost linear slope, meaning that the difference between two consecutive elements is almost constant across the array, you can use linear interpolation to make a guess for the index where the value could be stored:
Here is an implementation in JavaScript, without much language-specific syntax; it should be clear what is happening:
function search(arr, val) {
    var low = 0;
    var high = arr.length - 1;
    var guess;
    while (low <= high && val >= arr[low] && val <= arr[high]) {
        // Guard: all values in the range are equal; avoid dividing by zero.
        if (arr[low] == arr[high]) return low;
        // Use linear interpolation to make a guess for the index:
        guess = Math.round(low + (high - low) * (val - arr[low]) / (arr[high] - arr[low]));
        if (arr[guess] == val) return guess;
        console.log('Tried index ' + guess + '. No match yet for ' + val);
        if (arr[guess] < val) {
            low = guess + 1;
        } else {
            high = guess - 1;
        }
    }
    return -1; // not found
}
var arr = [1, 2, 3, 4, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16];
index = search(arr, 7);
console.log('Search result: index ' + index);
When the array is perfectly linear, the algorithm finds the element on the first guess, so in O(1) time. Depending on how much deviation is present in the intervals, the time will be somewhere between O(1) and O(log n).
In a normal binary search, each time you split the remaining range into two halves and check the element in the middle, BUT if you know the elements are very close together, you can add a simple tweak to the procedure:
instead of:
1. check the median
2. if the median is bigger: go left
3. else if the median is smaller: go right
4. else: return the median
5. repeat
you can modify steps 2 and 3 into: go left/right by abs(median - searched_number). This should shorten your average-case time complexity, though I'm not sure how to measure it. A sketch follows.
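For sorted distinct integers that jump is safe: moving one index changes the value by at least 1, so the target cannot lie within abs(a[mid] - val) positions of mid. Here is a minimal Java sketch of that modified binary search (my own illustration, not from the original answer; it assumes the array holds distinct integers):

static int gapSearch(int[] a, int val) {
    // Binary search that jumps by the value gap instead of only halving.
    // Assumes a is sorted and holds distinct integers, so that
    // a[mid + d] >= a[mid] + d and a[mid - d] <= a[mid] - d.
    int low = 0, high = a.length - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;
        if (a[mid] == val) return mid;
        if (a[mid] < val) {
            low = mid + (val - a[mid]);   // val cannot occur before this index
        } else {
            high = mid - (a[mid] - val);  // val cannot occur after this index
        }
    }
    return -1; // not found
}

When adjacent values differ by exactly 1, the first jump lands directly on the answer, matching the O(1) best case of interpolation search above.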

Analysis of Algorithms - Find missing Integer in Sorted Array better than O(n)

I am working through an analysis of algorithms class for the first time, and was wondering if anyone could assist with the example below. I believe I have solved it with O(n) complexity, but is there a better version that I am not thinking of, perhaps O(log n)?
Let A[1] <= ... <= A[n] be a sorted array of n distinct integers, each in the range [1...n+1]. That is, exactly one integer out of {1,...,n+1} is missing from A. Describe an efficient algorithm to find the missing integer. Analyze the worst-case complexity (number of accesses to array A) of your algorithm.
The solution I have is relatively simple, and I believe results in a worst-case O(n) complexity. Maybe I am overthinking the example, but is there a better solution?
My Solution
for (i = 1; i < n + 1; i++):
    if (A[i-1] > i):
        return i
return n + 1  # if no mismatch is found, the missing integer is n+1
The logic behind this is that since the array is sorted, the first element must be 1, the second must be 2, and so on, until some element is larger than the value it is supposed to be, indicating that an element was skipped; return the value it should have been and we have the missing one. (If the loop finishes without returning, the missing integer is n+1.)
Is this correct logic? Is there a better way to go about it?
Thanks for reading and thanks in advance for the assistance.
This logic is certainly correct, but it is not fast enough to beat O(n) because you check every element.
You can do it faster by observing that if A[i] == i, then all elements at j < i are at their proper places. This observation is sufficient to construct a divide-and-conquer approach that runs in O(log n):
Check the element in the middle
If it's at the wrong spot, go left
Otherwise, go right
More formally, you are looking for a spot where A[i] == i and A[i+1] == i+2. You start with the interval spanning the whole array. Each probe at the middle of the interval shrinks the remaining interval twofold. At some point you are left with an interval of just two elements: the element on the left is the last "correct" element, while the element on the right is the first element after the missing number.
You can binary search for the first index i with A[i] > i. If the missing integer is k, then A[i] = i for i < k and A[i] = i + 1 for i >= k.
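A minimal Java sketch of that binary search (my own illustration; 0-based array a holding the n distinct values, so the 1-based condition A[i] > i becomes a[i] > i + 1):

// Binary search for the first index where a[i] > i + 1;
// the missing integer sits just past the intact prefix.
static int findMissing(int[] a) {
    int lo = 0, hi = a.length;          // search window [lo, hi)
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] > mid + 1) hi = mid; // the gap is at or before mid
        else lo = mid + 1;              // prefix up to mid is intact
    }
    return lo + 1;                      // yields n+1 when no prefix mismatch exists
}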
Sometimes the trick is to think of the problem in a different way.
In spirit, you are simply working with an array of boolean values; the entry at index n is the truth of a[n] > n.
Furthermore, this array begins with zero or more consecutive false, and the remaining entries are all true.
Your problem, now, is to find the index of the first instance of true in the (sorted) array of boolean values.

Given an array, find out the next smaller element for each element

Given an array, find the next smaller element in the array for each element, without changing the original order of the elements.
For example, suppose the given array is 4,2,1,5,3.
The resultant array would be 2,1,-1,3,-1.
I was asked this question in an interview, but I couldn't think of a solution better than the trivial O(n^2) one.
Any approach that I could think of, i.e. making a binary search tree or sorting the array, would distort the original order of the elements and hence lead to a wrong result.
Any help would be highly appreciated.
O(N) Algorithm
Initialize output array to all -1s.
Create an empty stack of indexes of items we have visited in the input array but don't yet know the answer for in the output array.
Iterate over each element in the input array:
Is it smaller than the item indexed by the top of the stack?
Yes: it is the first such element to be so. Fill in the corresponding element in our output array, remove the item from the stack, and try again until the stack is empty or the answer is no.
No: continue to the next step.
Add this index to the stack and continue the iteration from step 3.
Python implementation
def find_next_smaller_elements(xs):
    ys = [-1 for x in xs]
    stack = []
    for i, x in enumerate(xs):
        # Pop every pending index whose value is larger than x;
        # x is their next smaller element.
        while len(stack) > 0 and x < xs[stack[-1]]:
            ys[stack.pop()] = x
        stack.append(i)
    return ys
>>> find_next_smaller_elements([4,2,1,5,3])
[2, 1, -1, 3, -1]
>>> find_next_smaller_elements([1,2,3,4,5])
[-1, -1, -1, -1, -1]
>>> find_next_smaller_elements([5,4,3,2,1])
[4, 3, 2, 1, -1]
>>> find_next_smaller_elements([1,3,5,4,2])
[-1, 2, 4, 2, -1]
>>> find_next_smaller_elements([6,4,2])
[4, 2, -1]
Explanation
How it works
This works because whenever we add an item to the stack, we know its value is greater than or equal to every element already in the stack. When we visit an element in the array, we know that if it's lower than any item in the stack, it must be lower than the last item in the stack, because the last item must be the largest. So we don't need to do any kind of search on the stack; we can just consider the last item.
Note: you can skip the initialization step as long as you add a final step to empty the stack and use each remaining index to set the corresponding output array element to -1. It's just easier in Python to initialize the output to -1s when creating it.
Time complexity
This is O(N). The main loop clearly visits each index once. Each index is added to the stack exactly once and removed at most once.
Solving as an interview question
This kind of question can be pretty intimidating in an interview, but I'd like to point out that (hopefully) an interviewer isn't going to expect the solution to spring from your mind fully-formed. Talk them through your thought process. Mine went something like this:
Is there some relationship between the positions of numbers and their next smaller number in the array? Does knowing some of them constrain what the others might possibly be?
If I were in front of a whiteboard I would probably sketch out the example array and draw lines between the elements. I might also draw them as a 2D bar graph - horizontal axis being position in input array and vertical axis being value.
I had a hunch this would show a pattern, but no paper to hand. I think the diagram would make it obvious. Thinking about it carefully, I could see that the lines would not overlap arbitrarily, but would only nest.
Around this point, it occurred to me that this is incredibly similar to the algorithm Python uses internally to transform indentation into INDENT and DEDENT virtual tokens, which I'd read about before. See "How does the compiler parse the indentation?" on this page: http://www.secnetix.de/olli/Python/block_indentation.hawk However, it wasn't until I actually worked out an algorithm that I followed up on this thought and determined that it was in fact the same, so I don't think it helped too much. Still, if you can see a similarity to some other problem you know, it's probably a good idea to mention it, and say how it's similar and how it's different.
From here the general shape of the stack-based algorithm became apparent, but I still needed to think about it a bit more to be sure it would work okay for those elements that have no subsequent smaller element.
Even if you don't come up with a working algorithm, try to let your interviewer see what you're thinking about. Often it is the thought process more than the answer that they're interested in. For a tough problem, failing to find the best solution but showing insight into the problem can be better than knowing a canned answer but not being able to give it much analysis.
Start building a BST from the end of the array. For each value 'v', the answer is the last node where you went "right" on the way to inserting 'v'; you can easily keep track of that node in either a recursive or iterative version.
UPDATE:
Going by your requirements, you can approach this in a linear fashion:
If every next element is smaller than the current element (e.g. 6 5 4 3 2 1), you can process all of them linearly without requiring any extra memory. The interesting case arises when you start getting jumbled elements (e.g. 4 2 1 5 3), in which case you need to remember their order as long as you don't get their 'smaller counterparts'.
A simple stack based approach goes like this:
Push the first element (a[0]) onto a stack.
For each next element a[i], peek into the stack: if the value peek() is greater than the one in hand, a[i], you have found the next smaller number for that stack element, so keep popping elements as long as peek() > a[i], and print/store the corresponding value for each popped element.
Otherwise, simply push your a[i] onto the stack.
In the end, the stack will contain those elements which never had a value smaller than them (to their right). You can fill in -1 for them in your output.
e.g. A = [4, 2, 1, 5, 3];
stack: 4
a[i] = 2: pop 4, push 2 (you got the result for 4)
stack: 2
a[i] = 1: pop 2, push 1 (you got the result for 2)
stack: 1
a[i] = 5: push 5
stack: 1 5
a[i] = 3: pop 5, push 3 (you got the result for 5)
stack: 1 3
1 and 3 don't have any counterparts, so store -1 for them.
Assuming you mean the first next element which is lower than the current element, here are two solutions:
Use sqrt(N) segmentation. Divide the array into sqrt(N) segments, with each segment's length being sqrt(N). For each segment, calculate its minimum element using a loop; this way, you have pre-calculated each segment's minimum element in O(N). Now, for each element, the next lower element can be in the same segment or in any of the subsequent segments. So first check all the following elements in the current segment. If all are larger, loop through the subsequent segments to find the first one that has an element lower than the current element. If you can't find any, the result is -1. Otherwise, check every element of that segment to find its first element lower than the current element. Overall, the algorithm complexity is O(N*sqrt(N)), i.e. O(N^1.5). A sketch follows below.
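Here is a hedged Java sketch of the sqrt(N) segmentation approach just described (my own illustration; the names are made up):

import java.util.Arrays;

// Next smaller element to the right via sqrt(N) segmentation: O(N^1.5) worst case.
static int[] nextSmallerSqrt(int[] a) {
    int n = a.length;
    int bs = Math.max(1, (int) Math.sqrt(n));    // segment length ~ sqrt(N)
    int nb = (n + bs - 1) / bs;                  // number of segments
    int[] segMin = new int[nb];
    Arrays.fill(segMin, Integer.MAX_VALUE);
    for (int i = 0; i < n; i++)
        segMin[i / bs] = Math.min(segMin[i / bs], a[i]); // pre-compute minima, O(N)

    int[] res = new int[n];
    for (int i = 0; i < n; i++) {
        int found = -1;
        int end = Math.min(n, (i / bs + 1) * bs);        // end of i's own segment
        for (int j = i + 1; j < end && found < 0; j++)   // 1) rest of own segment
            if (a[j] < a[i]) found = a[j];
        for (int b = i / bs + 1; b < nb && found < 0; b++)  // 2) later segments
            if (segMin[b] < a[i])                           // first promising one
                for (int j = b * bs; j < Math.min(n, (b + 1) * bs); j++)
                    if (a[j] < a[i]) { found = a[j]; break; }
        res[i] = found;
    }
    return res;
}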
You can achieve O(N log N) using a segment tree with a similar approach.
Or: sort the array in ascending order first (keeping the original position of each element as satellite data). Now, assuming each element of the array is distinct, for each element we need to find the lowest original position on the left side of that element. This is a classic RMQ (Range Minimum Query) problem and can be solved in many ways, including an O(N) one. As we need to sort first, the overall complexity is O(N log N). You can learn more about RMQ in a TopCoder tutorial.
For some reason, I find it easier to reason about the "previous smaller element", aka "all nearest smaller elements". Applied backward, it gives the "next smaller".
For the record, here is a Python implementation in O(n) time, O(1) space (i.e. without a stack), supporting negative values in the array:
def next_smaller(l):
    """ Return positions of next smaller items """
    res = [None] * len(l)
    for i in range(len(l) - 2, -1, -1):
        # Follow already-computed answers to skip over larger items.
        j = i + 1
        while j is not None and l[j] > l[i]:
            j = res[j]
        res[i] = j
    return res

def next_smaller_elements(l):
    """ Return next smaller items themselves """
    res = next_smaller(l)
    return [l[i] if i is not None else None for i in res]
Here is the JavaScript code:
function findNextSmallerElem(source) {
    let length = source.length;
    let outPut = [...Array(length)].map(() => -1);
    let stack = [];
    for (let i = 0; i < length; i++) {
        let stackTopVal = stack[stack.length - 1] && stack[stack.length - 1].val;
        // If stack is empty or current elem is greater than stack top
        if (!stack.length || source[i] > stackTopVal) {
            stack.push({ val: source[i], ind: i });
        } else {
            // While stack top is greater than current elem, keep popping
            while (source[i] < (stack[stack.length - 1] && stack[stack.length - 1].val)) {
                outPut[stack.pop().ind] = source[i];
            }
            stack.push({ val: source[i], ind: i });
        }
    }
    return outPut;
}
Output -
findNextSmallerElem([98,23,54,12,20,7,27])
[23, 12, 12, 7, 7, -1, -1]
Time complexity O(N), space complexity O(N).
A clean Java solution that keeps the order of the array:
public static int[] getNGE(int[] a) {
    var s = new Stack<Pair<Integer, Integer>>();
    int n = a.length;
    var result = new int[n];
    s.push(Pair.of(0, a[0]));
    for (int i = 1; i < n; i++) {
        while (!s.isEmpty() && s.peek().v2 > a[i]) {
            var top = s.pop();
            result[top.v1] = a[i];
        }
        s.push(Pair.of(i, a[i]));
    }
    while (!s.isEmpty()) {
        var top = s.pop();
        result[top.v1] = -1;
    }
    return result;
}

static class Pair<K, V> {
    K v1;
    V v2;

    public static <K, V> Pair<K, V> of(K v1, V v2) {
        Pair p = new Pair();
        p.v1 = v1;
        p.v2 = v2;
        return p;
    }
}
Here is an observation that I think can be made into an O(n log n) solution. Suppose you have the answer for the last k elements of the array. What would you need in order to figure out the value for the element just before them? You can think of the last k elements as being split into a series of ranges, each of which starts at some element and continues forward until it hits a smaller element. These ranges must be in descending order, so you could do a binary search over them to find the first range starting with a value smaller than that element. You could then update the ranges to factor in this new element.
Now, how best to represent this? The best way I've thought of is to use a splay tree whose keys are the elements defining these ranges and whose values are the indexes at which they start. You can then, in amortized O(log n) time, do a predecessor search to find the predecessor of the current element; this finds the earliest value smaller than the current one. Then, in amortized O(log n) time, insert the current element into the tree; this defines a new range from that element forward. To discard all the ranges this supersedes, you then cut the right child of the new node (which, because this is a splay tree, is at the root) from the tree.
Overall, this does O(n) iterations of an O(log n) process, for a total of O(n log n).
Here is an O(n) algorithm using DP (well, two passes, so O(2n)):
int n = array.length;
The array min[] records the minimum value found from index i until the end of the array:
int[] min = new int[n];
min[n-1] = array[n-1];
for (int i = n - 2; i >= 0; i--)
    min[i] = Math.min(min[i+1], array[i]);
Then search and compare through the original array and min[]:
int[] result = new int[n];
result[n-1] = -1;
for (int i = 0; i < n - 1; i++)
    result[i] = min[i+1] < array[i] ? min[i+1] : -1;
Here is the new solution to find the "next smaller element":
int n = array.length;
int[] answer = new int[n];
answer[n-1] = -1;
for (int i = 0; i < n - 1; i++)
    answer[i] = array[i+1] < array[i] ? array[i+1] : -1;
All that is actually not required, I think:
case 1: a, b
answer: -a + b
case 2: a, b, c
answer: a - 2b + c
case 3: a, b, c, d
answer: -a + 3b - 3c + d
case 4: a, b, c, d, e
answer: a - 4b + 6c - 4d + e
...
Recognize the pattern in it?
It is Pascal's triangle!
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
So it can be calculated using the Nth row of Pascal's triangle,
with alternating + and - signs for odd/even levels!
It is O(1).
You can solve this in O(n) runtime with O(n) space complexity.
Start with a stack and keep pushing elements until you find arr[i] such that arr[i] < the stack's top element; then store this index.
Code Snippet:
vector<int> findNext(vector<int> values) {
    stack<int> st;
    vector<int> nextSmall(values.size(), -1);
    st.push(0);
    for (int i = 1; i < values.size(); i++) {
        // Change values[i] < values[st.top()] to values[i] > values[st.top()]
        // to find the next greater element instead.
        while (!st.empty() && values[i] < values[st.top()]) {
            nextSmall[st.top()] = i;
            st.pop();
        }
        st.push(i);
    }
    return nextSmall;
}
A solution with O(1) space complexity and O(n) time complexity:
void replace_next_smallest(int a[], int n)
{
    int ns = a[n - 1];
    for (int i = n - 1; i >= 0; i--) {
        if (i == n - 1) {
            a[i] = -1;
        }
        else if (a[i] > ns) {
            int t = ns;
            ns = a[i];
            a[i] = t;
        }
        else if (a[i] == ns) {
            a[i] = a[i + 1];
        }
        else {
            ns = a[i];
            a[i] = -1;
        }
    }
}
A solution with O(n) time complexity and O(1) space complexity. This solution is not complex to understand and is implemented without a stack:
def min_secMin(a, n):
    min = a[0]
    sec_min = a[1]
    for i in range(1, n):
        if a[i] < min:
            sec_min = min
            min = a[i]
        if a[i] > min and a[i] < sec_min:
            sec_min = a[i]
    return min, sec_min
Given an array, find the next smaller element in the array for each element without changing the original order of the elements, where arr is the array and n is the length of the array.
Using Python logic:
def next_smallest_array(arr, n):
    for i in range(0, n - 1, 1):
        if arr[i] > arr[i + 1]:
            arr[i] = arr[i + 1]
        else:
            arr[i] = -1
    arr[n - 1] = -1
    return arr
next_smallest_array([4, 2, 1, 5, 3], 5)
Output is [2, 1, -1, 3, -1]
next_smallest_array([1, 2, 3, 4, 5], 5)
Output is [-1, -1, -1, -1, -1]

Finding kth smallest number from n sorted arrays

So, you have n sorted arrays (not necessarily of equal length), and you are to return the kth smallest element in the combined array (i.e. the array formed by merging all n sorted arrays).
I have been trying this problem and its variants for quite a while now, and so far I only feel comfortable with the case of two sorted arrays of equal length, where one has to return the median of the two.
This has logarithmic time complexity.
After this I tried to generalize it to finding kth smallest among two sorted arrays. Here is the question on SO.
Even here the solution given is not obvious to me. But even if I somehow manage to convince myself of this solution, I am still curious as to how to solve the absolute general case (which is my question)
Can somebody explain to me a step-by-step solution (which, again, in my opinion should take logarithmic time, i.e. O(log(n1) + log(n2) + ... + log(nN)), where n1, n2, ..., nN are the lengths of the n arrays), one which starts from the more specific cases and moves on to the more general one?
I know similar questions for more specific cases are there all over the internet, but I haven't found a convincing and clear answer.
Here is a link to a question (and its answer) on SO which deals with 5 sorted arrays and finding the median of the combined array. The answer just gets too complicated for me to be able to generalize it.
Even clean approaches for the more specific cases (as I mentioned during the post) are welcome.
PS: Do you think this can be further generalized to the case of unsorted arrays?
PPS: It's not a homework problem, I am just preparing for interviews.
This doesn't generalize the links, but does solve the problem:
Go through all the arrays and, if any has length > k, truncate it to length k (this seems silly, but we'll mess with k later, so do it anyway).
Identify the largest remaining array A. If there is more than one, pick one.
Pick the middle element M of the largest array A.
Use a binary search on the remaining arrays to find the same element (or the largest element <= M).
Based on the indexes of the various elements, calculate the total number of elements <= M and > M. This gives you two numbers: L, the number <= M, and G, the number > M.
If k < L, truncate all the arrays at the split points you've found and iterate on the smaller arrays (use the bottom halves).
If k > L, truncate all the arrays at the split points you've found and iterate on the smaller arrays (use the top halves, and search for element k - L).
When you get to the point where you only have one element per array (or 0), make a new array of size n with that data, sort it, and pick the kth element.
Because you're always guaranteed to remove at least half of one array, in N iterations you'll get rid of half the elements. That means there are N log k iterations. Each iteration is of order N log k (due to the binary searches), so the whole thing is N^2 (log k)^2. That is all, of course, worst case, based on the assumption that you only get rid of half of the largest array, not of the other arrays. In practice, I imagine the typical performance would be quite a bit better than the worst case.
It cannot be done in less than O(n) time. Proof sketch: if it could, the algorithm would have to completely skip at least one array, and obviously one array can arbitrarily change the value of the kth element.
I have a relatively simple O(n*log(n)*log(m)) solution, where m is the length of the longest array. I'm sure it is possible to be slightly faster, but not a lot faster.
Consider the simple case where you have n arrays, each of length 1. Obviously, this is isomorphic to finding the kth element in an unsorted list of length n. It is possible to find this in O(n); see the Median of Medians algorithm, originally by Blum, Floyd, Pratt, Rivest and Tarjan, and no (asymptotically) faster algorithms are possible.
Now the problem is how to expand this to longer sorted arrays. Here is the algorithm: find the median of each array. Sort the list of (median, length of array / 2) tuples by median. Walk through it keeping a running sum of the lengths, until you reach a sum greater than k. You now have a pair of medians such that you know the kth element is between them. Now, for each median, we know whether the kth element is greater or less than it, so we can throw away half of each array. Repeat. Once the arrays are all one element long (or less), we use the selection algorithm.
Implementing this will reveal additional complexities and edge conditions, but nothing that increases the asymptotic complexity. Each step:
Finds the medians of the arrays, O(1) each, so O(n) total
Sorts the medians, O(n log n)
Walks through the sorted list, O(n)
Slices the arrays, O(1) each, so O(n) total
That is O(n) + O(n log n) + O(n) + O(n) = O(n log n) per step. We must perform this until the longest array has length 1, which takes log m steps, for a total of O(n*log(n)*log(m)).
You ask if this can be generalized to the case of unsorted arrays. Sadly, the answer is no. Consider the case where we have only one array; then the best algorithm must compare at least once with each element, for a total of O(m). If there were a faster solution for n unsorted arrays, we could implement selection by splitting our single array into n parts. Since we just argued selection takes O(m), we are stuck.
You could look at my recent answer on the related question here. The same idea can be generalized to multiple arrays instead of 2. In each iteration, you can reject the second half of the array with the largest middle element if k is less than the sum of the mid indexes of all arrays. Alternatively, you can reject the first half of the array with the smallest middle element if k is greater than the sum of the mid indexes of all arrays, adjusting k accordingly. Keep doing this until all but one array have been reduced to 0 length. The answer is the kth element of the last array that wasn't stripped to 0 elements.
Run-time analysis:
You get rid of half of one array in each iteration, but to determine which array is going to be reduced, you spend time linear in the number of arrays. Assuming each array is of the same length, the run time is going to be O(c*c*log(n)), where c is the number of arrays and n is the length of each array.
There exists a generalization that solves the problem in O(N log k) time; see the question here.
Old question, but none of the answers were good enough, so I am posting the solution using the sliding window technique and a heap:
import java.util.List;
import java.util.PriorityQueue;

class Node {
    int elementIndex;
    int arrayIndex;

    public Node(int elementIndex, int arrayIndex) {
        this.elementIndex = elementIndex;
        this.arrayIndex = arrayIndex;
    }
}

public class KthSmallestInMSortedArrays {

    public int findKthSmallest(List<Integer[]> lists, int k) {
        int ans = 0;
        PriorityQueue<Node> pq = new PriorityQueue<>((a, b) -> {
            return lists.get(a.arrayIndex)[a.elementIndex]
                 - lists.get(b.arrayIndex)[b.elementIndex];
        });
        // Seed the heap with the head of every non-empty array.
        for (int i = 0; i < lists.size(); i++) {
            Integer[] arr = lists.get(i);
            if (arr != null && arr.length > 0) {
                pq.add(new Node(0, i));
            }
        }
        int count = 0;
        while (!pq.isEmpty()) {
            Node curr = pq.poll();
            ans = lists.get(curr.arrayIndex)[curr.elementIndex];
            if (++count == k) {
                break;
            }
            // Advance within the same array; re-offer only while in bounds.
            curr.elementIndex++;
            if (curr.elementIndex < lists.get(curr.arrayIndex).length) {
                pq.offer(curr);
            }
        }
        return ans;
    }
}
The maximum number of elements that we need to access here is O(K), and there are M arrays, so each heap operation costs O(log(M)). The effective time complexity will therefore be O(K*log(M)).
This would be the code, O(k*log(m)):
public int findKSmallest(int[][] A, int k) {
    PriorityQueue<int[]> queue =
            new PriorityQueue<>(Comparator.comparingInt(x -> A[x[0]][x[1]]));
    for (int i = 0; i < A.length; i++)
        queue.offer(new int[] { i, 0 });
    int ans = 0;
    while (!queue.isEmpty() && --k >= 0) {
        int[] el = queue.poll();
        ans = A[el[0]][el[1]];
        if (el[1] < A[el[0]].length - 1) {
            el[1]++;
            queue.offer(el);
        }
    }
    return ans;
}
If k is not that huge, we can maintain a priority min-queue: loop over the head of every sorted array to get the smallest element and enqueue it; when the size of the queue reaches k, we have the first k smallest elements.
Maybe we can also regard the n sorted arrays as buckets and try the bucket sort method.
This could be considered the second half of a merge sort. We could simply merge all the sorted lists into a single list, but only keep k elements in the combined lists from merge to merge (sketch below). This has the advantage of only using O(k) space, with something slightly better than merge sort's O(n log n) complexity; that is, in practice it should operate slightly faster than a merge sort. Choosing the kth smallest from the final combined list is O(1). This kind of complexity is not so bad.
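A hedged Java sketch of that k-truncated merge (my own illustration; fold it over the n arrays, then the kth smallest is element k-1 of the final result):

// Merge two sorted arrays, keeping only the first k elements of the result.
static int[] mergeKeepK(int[] x, int[] y, int k) {
    int[] out = new int[Math.min(k, x.length + y.length)];
    int i = 0, j = 0, t = 0;
    while (t < out.length) {
        // Take from x while it has the smaller head (or y is exhausted).
        if (i < x.length && (j >= y.length || x[i] <= y[j])) out[t++] = x[i++];
        else out[t++] = y[j++];
    }
    return out;
}

Each pairwise merge is capped at k output elements, so the working space stays O(k) per merge, as the answer describes.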
It can be done by doing a binary search in each array while counting the number of smaller elements.
I used bisect_left and bisect_right to make it work for non-unique numbers as well:
from bisect import bisect_left
from bisect import bisect_right

def kthOfPiles(givenPiles, k, count):
    '''
    Perform binary search for kth element in multiple sorted lists

    parameters
    ==========
    givenPiles is a list of sorted lists
    count is the total number of elements
    k is the target index in range [0..count-1]
    '''
    begins = [0 for pile in givenPiles]
    ends = [len(pile) for pile in givenPiles]
    for pileidx, pivotpile in enumerate(givenPiles):
        while begins[pileidx] < ends[pileidx]:
            mid = (begins[pileidx] + ends[pileidx]) >> 1
            midval = pivotpile[mid]
            smaller_count = 0
            smaller_right_count = 0
            for pile in givenPiles:
                smaller_count += bisect_left(pile, midval)
                smaller_right_count += bisect_right(pile, midval)
            if smaller_count <= k and k < smaller_right_count:
                return midval
            elif smaller_count > k:
                ends[pileidx] = mid
            else:
                begins[pileidx] = mid + 1
    return -1
Please find below the C# code to find the kth smallest element in the union of two sorted arrays. Time complexity: O(log k).
public int findKthElement(int k, int[] array1, int start1, int end1, int[] array2, int start2, int end2)
{
    // k is 0-based; end1/end2 are exclusive (if k > m + n: exception)
    if (start1 == end1)
    {
        return array2[start2 + k];
    }
    if (start2 == end2)
    {
        return array1[start1 + k];
    }
    if (k == 0)
    {
        return Math.Min(array1[start1], array2[start2]);
    }
    // Discard about half of the first k+1 candidates on each call.
    int half = (k + 1) / 2;
    int sub1 = Math.Min(half, end1 - start1);
    int sub2 = Math.Min(k + 1 - sub1, end2 - start2);
    if (array1[start1 + sub1 - 1] < array2[start2 + sub2 - 1])
    {
        // The first sub1 elements of array1 all rank before the kth element.
        return findKthElement(k - sub1, array1, start1 + sub1, end1, array2, start2, end2);
    }
    else
    {
        return findKthElement(k - sub2, array1, start1, end1, array2, start2 + sub2, end2);
    }
}

What is the bug in this code?

Based on logic given as an answer on SO to a different (similar) question, to remove repeated numbers in an array in O(N) time complexity, I implemented that logic in C, as shown below. But the result of my code does not return unique numbers. I tried debugging but could not figure out the logic behind it to fix this.
#include <stdio.h>

int remove_repeat(int *a, int n)
{
    int i, k;
    k = 0;
    for (i = 1; i < n; i++)
    {
        if (a[k] != a[i])
        {
            a[k+1] = a[i];
            k++;
        }
    }
    return (k+1);
}

int main(void)
{
    int a[] = {1, 4, 1, 2, 3, 3, 3, 1, 5};
    int n;
    int i;

    n = remove_repeat(a, 9);
    for (i = 0; i < n; i++)
        printf("a[%d] = %d\n", i, a[i]);
    return 0;
}
1] What is incorrect in the above code to remove duplicates?
2] Is there any other O(N) or O(NlogN) solution for this problem? What is its logic?
Heap sort in O(n log n) time.
Iterate through in O(n) time replacing repeating elements with a sentinel value (such as INT_MAX).
Heap sort again in O(n log n) to distil out the repeating elements.
Still bounded by O(n log n).
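For illustration, here is a hedged Java sketch of that sort / sentinel / sort-again idea (my own, substituting Arrays.sort for an explicit heap sort; it assumes Integer.MAX_VALUE never occurs as real data):

import java.util.Arrays;

// Sort, overwrite all but one copy of each repeated value with a sentinel,
// then sort again so the sentinels sink to the end.
static int dedupeBySorting(int[] a) {
    Arrays.sort(a);                                      // O(n log n)
    for (int i = 0; i + 1 < a.length; i++)
        if (a[i] == a[i + 1]) a[i] = Integer.MAX_VALUE;  // mark the repeat
    Arrays.sort(a);                                      // sentinels move to the back
    int count = 0;
    while (count < a.length && a[count] != Integer.MAX_VALUE) count++;
    return count; // a[0..count-1] now holds the unique values, sorted
}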
Your code only checks whether an item in the array is the same as its immediate predecessor.
If your array starts out sorted, that will work, because all instances of a particular number will be contiguous.
If your array isn't sorted to start with, that won't work because instances of a particular number may not be contiguous, so you have to look through all the preceding numbers to determine whether one has been seen yet.
To do the job in O(N log N) time, you can sort the array, then use the logic you already have to remove duplicates from the sorted array. Obviously enough, this is only useful if you're all right with rearranging the numbers.
If you want to retain the original order, you can use something like a hash table or bit set to track whether a number has been seen yet, and only copy each number to the output when/if it has not yet been seen. To do this, we change your current:
if (a[k] != a[i])
    a[k+1] = a[i];
to something like:
if (!hash_find(hash_table, a[i])) {
    hash_insert(hash_table, a[i]);
    a[k+1] = a[i];
}
If your numbers all fall within fairly narrow bounds or you expect the values to be dense (i.e., most values are present) you might want to use a bit-set instead of a hash table. This would be just an array of bits, set to zero or one to indicate whether a particular number has been seen yet.
On the other hand, if you're more concerned with the upper bound on complexity than the average case, you could use a balanced tree-based collection instead of a hash table. This will typically use more memory and run more slowly, but its expected complexity and worst-case complexity are essentially identical (O(N log N)). A typical hash table degenerates from constant complexity to linear complexity in the worst case, which would change your overall complexity from O(N) to O(N^2).
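For completeness, here is a hedged Java sketch of the hash-based, order-preserving variant described above (the name and signature are my own):

import java.util.HashSet;
import java.util.Set;

// Keep only the first occurrence of each value, preserving order; returns
// the number of unique values packed at the front of a.
static int removeRepeatHashed(int[] a, int n) {
    Set<Integer> seen = new HashSet<>();
    int k = 0;
    for (int i = 0; i < n; i++) {
        if (seen.add(a[i])) {  // add() returns false if the value was seen
            a[k++] = a[i];
        }
    }
    return k;
}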
Your code would appear to require that the input is sorted. With unsorted inputs like the one you are testing with, your code will not remove all duplicates (only adjacent ones).
You are able to get an O(N) solution if the number of possible integer values is known up front and smaller than the amount of memory you have :). Make one pass to determine the unique integers you have using auxiliary storage, then another to output the unique values.
Code below is in Java, but hopefully you get the idea.
int[] removeRepeats(int[] a) {
    // Assume these are the integers between 0 and 1000
    Boolean[] v = new Boolean[1000]; // a lazy way of getting a tri-state var (false, true, null)
    for (int i = 0; i < a.length; ++i) {
        v[a[i]] = Boolean.TRUE;
    }
    // v[i] = null => number not seen
    // v[i] = true => number seen
    int[] out = new int[a.length];
    int ptr = 0;
    for (int i = 0; i < a.length; ++i) {
        if (v[a[i]] != null && v[a[i]].equals(Boolean.TRUE)) {
            out[ptr++] = a[i];
            v[a[i]] = Boolean.FALSE;
        }
    }
    // out now doesn't contain duplicates, order is preserved and ptr represents how
    // many elements are set.
    return out;
}
You are going to need two loops: one to go through the source and one to check each item in the destination array.
You are not going to get O(N).
[EDIT]
The article you linked to suggests a sorted output array, which means the search for duplicates in the output array can be a binary search, which is O(log N).
Your logic is just wrong, so the code is wrong too. Work out your logic by yourself before coding it.
I suggest an O(N ln N) way with a modification of heapsort.
With heapsort, we go from a[i] to a[n], find the minimum and replace it with a[i], right?
So here is the modification: if the minimum is the same as a[i-1], then swap the minimum and a[n], and reduce your array's item count by 1.
It should do the trick in an O(N ln N) way.
Your code will work only in particular cases. Clearly, you're checking adjacent values, but duplicate values can occur anywhere in the array. Hence, it's totally wrong.
