Find shortest subarray containing all elements - arrays

Suppose you have an array of numbers, and another set of numbers. You have to find the shortest subarray containing all numbers with minimal complexity.
The array can have duplicates, and let's assume the set of numbers does not. It's not ordered - the subarray may contain the set of numbers in any order.
For example:
Array: 1 2 5 8 7 6 2 6 5 3 8 5
Numbers: 5 7
Then the shortest subarray is obviously Array[2:5] (python notation).
Also, what would you do if you want to avoid sorting the array for some reason (a la online algorithms)?

Proof of a linear-time solution
I will write right-extension to mean increasing the right endpoint of a range by 1, and left-contraction to mean increasing the left endpoint of a range by 1. This answer is a slight variation of Aasmund Eldhuset's answer. The difference here is that once we find the smallest j such that [0, j] contains all interesting numbers, we thereafter consider only ranges that contain all interesting numbers. (It's possible to interpret Aasmund's answer this way, but it's also possible to interpret it as allowing a single interesting number to be lost due to a left-contraction -- an algorithm whose correctness has yet to be established.)
The basic idea is that for each position j, we will find the shortest satisfying range ending at position j, given that we know the shortest satisfying range ending at position j-1.
EDIT: Fixed a glitch in the base case.
Base case: Find the smallest j' such that [0, j'] contains all interesting numbers. By construction, there can be no ranges [0, k < j'] that contain all interesting numbers, so we don't need to worry about them further. Now find the largest i such that [i, j'] contains all interesting numbers (i.e. hold j' fixed). This is the smallest satisfying range ending at position j'.
To find the smallest satisfying range ending at any arbitrary position j, we can right-extend the smallest satisfying range ending at position j-1 by 1 position. This range will necessarily also contain all interesting numbers, though it may not be minimal-length. The fact that we already know this is a satisfying range means that we don't have to worry about extending the range "backwards" to the left, since that can only increase the range over its minimal length (i.e. make the solution worse). The only operations we need to consider are left-contractions that preserve the property of containing all interesting numbers. So the left endpoint of the range should be advanced as far as possible while this property holds. When no more left-contractions can be performed, we have the minimal-length satisfying range ending at j (since further left-contractions clearly cannot make the range satisfying again) and we are done.
Since we perform this for each rightmost position j, we can take the minimum-length range over all rightmost positions to find the overall minimum. This can be done using a nested loop in which j advances on each outer loop cycle. Clearly j advances by 1 n times. Since at any point in time we only ever need the leftmost position of the best range for the previous value of j, we can store this in i and just update it as we go. i starts at 0, is at all times <= j <= n, and only ever advances upwards by 1, meaning it can advance at most n times. Both i and j advance at most n times, meaning that the algorithm is linear-time.
In the following pseudo-code, I've combined both phases into a single loop. We only try to contract the left side if we have reached the stage of having all interesting numbers:
# x[0..m-1] is the array of interesting numbers.
# Load them into a hash/dictionary:
For i from 0 to m-1:
    isInteresting[x[i]] = 1
i = 0
nDistinctInteresting = 0
minRange = infinity
For j from 0 to n-1:
    If count[a[j]] == 0 and isInteresting[a[j]]:
        nDistinctInteresting++
    count[a[j]]++
    If nDistinctInteresting == m:
        # We are in phase 2: contract the left side as far as possible
        While count[a[i]] > 1 or not isInteresting[a[i]]:
            count[a[i]]--
            i++
        # Record the window if it beats the best one seen so far
        If j - i < minRange:
            minRange = j - i
            (minI, minJ) = (i, j)
count[] and isInteresting[] are hashes/dictionaries (or plain arrays if the numbers involved are small).
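The pseudo-code above can be sketched in Python roughly as follows. This is only an illustration under a few assumptions: the function name `shortest_window` is made up here, the result is returned as inclusive `(i, j)` bounds, and `None` is returned when no window contains every interesting number.

```python
from collections import defaultdict


def shortest_window(a, interesting):
    """Return inclusive (i, j) bounds of the shortest window of `a` that
    contains every value in `interesting`, or None if there is none."""
    wanted = set(interesting)
    if not wanted:
        return None
    count = defaultdict(int)   # occurrences of each interesting value in the window
    n_distinct = 0             # how many interesting values are currently covered
    best = None
    i = 0
    for j, value in enumerate(a):
        if value in wanted:
            if count[value] == 0:
                n_distinct += 1
            count[value] += 1
        if n_distinct == len(wanted):
            # Phase 2: contract the left side while the window stays satisfying.
            while a[i] not in wanted or count[a[i]] > 1:
                if a[i] in wanted:
                    count[a[i]] -= 1
                i += 1
            if best is None or j - i < best[1] - best[0]:
                best = (i, j)
    return best
```

For the array from the question, `shortest_window([1, 2, 5, 8, 7, 6, 2, 6, 5, 3, 8, 5], [5, 7])` returns `(2, 4)`, which corresponds to Array[2:5] in Python slice notation.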

This sounds like a problem that is well-suited for a sliding window approach: maintain a window (a subarray) that is gradually expanding and contracting, and use a hashmap to keep track of the number of times each "interesting" number occurs in the window. E.g. start with an empty window, then expand it to include only element 0, then elements 0-1, then 0-2, 0-3, and so on, by adding subsequent elements (and using the hashmap to keep track of which numbers exist in the window). When the hashmap tells you that all interesting numbers exist in the window, you can begin contracting it: e.g. 0-5, 1-5, 2-5, etc., until you find out that the window no longer contains all interesting numbers. Then, you can begin expanding it on the right hand side again, and so on. I'm quite (but not entirely) sure that this would work for your problem, and it can be implemented to run in linear time.

Say the array has n elements, and the set has m elements.
Sort the array, noting the reverse index (position in the original array)
// O(n log n) time
for each element in given set
    find it in the array
    // O(m log n) time - log n for binary search, m times
    keep track of the minimum and maximum index for each found element
min - max defines your range
Total time complexity: O((m+n) log n)

This solution definitely does not run in O(n) time as suggested by some of the pseudocode above, however it is real (Python) code that solves the problem and by my estimates runs in O(n^2):
def small_sub(A, B):
    len_A = len(A)
    len_B = len(B)
    sub_A = []
    sub_size = -1
    dict_b = {}
    for elem in B:
        if elem in dict_b:
            dict_b[elem] += 1
        else:
            dict_b.update({elem: 1})
    for i in range(0, len_A - len_B + 1):
        if A[i] in dict_b:
            temp_size, temp_sub = find_sub(A[i:], dict_b.copy())
            if (sub_size == -1 or (temp_size != -1 and temp_size < sub_size)):
                sub_A = temp_sub
                sub_size = temp_size
    return sub_size, sub_A

def find_sub(A, dict_b):
    index = 0
    for i in A:
        if len(dict_b) == 0:
            break
        if i in dict_b:
            dict_b[i] -= 1
            if dict_b[i] <= 0:
                del(dict_b[i])
        index += 1
    if len(dict_b) > 0:
        return -1, {}
    else:
        return index, A[0:index]

Here's how I solved this problem in linear time using collections.Counter objects
from collections import Counter

def smallest_subsequence(stream, search):
    if not search:
        return []  # the shortest subsequence containing nothing is nothing
    stream_counts = Counter(stream)
    search_counts = Counter(search)
    minimal_subsequence = None
    start = 0
    end = 0
    subsequence_counts = Counter()
    while True:
        # while subsequence_counts doesn't have enough elements to cancel out every
        # element in search_counts, take the next element from stream
        while search_counts - subsequence_counts:
            if end == len(stream):  # if we've reached the end of the list, we're done
                return minimal_subsequence
            subsequence_counts[stream[end]] += 1
            end += 1
        # while subsequence_counts has enough elements to cover search_counts, keep
        # removing from the start of the sequence
        while not search_counts - subsequence_counts:
            if minimal_subsequence is None or (end - start) < len(minimal_subsequence):
                minimal_subsequence = stream[start:end]
            subsequence_counts[stream[start]] -= 1
            start += 1

print(smallest_subsequence([1, 2, 5, 8, 7, 6, 2, 6, 5, 3, 8, 5], [5, 7]))
# [5, 8, 7]

Java solution
// Example input:
//   List<String> paragraph = Arrays.asList("a", "c", "d", "m", "b", "a");
//   Set<String> keywords = new HashSet<>(Arrays.asList("a", "b"));
// The method header was missing from the snippet; the name below is a placeholder.
// Subarray is assumed to be a small holder class with int fields start and end.
static Subarray shortestSubarray(List<String> paragraph, Set<String> keywords) {
    Subarray result = new Subarray(-1, -1);
    Map<String, Integer> keyWordFreq = new HashMap<>();
    int numKeywords = keywords.size();
    // slide the window to contain all the keywords,
    // starting with [0,0]
    for (int left = 0, right = 0; right < paragraph.size(); right++) {
        // expand right to contain all the keywords
        String currRight = paragraph.get(right);
        if (keywords.contains(currRight)) {
            keyWordFreq.put(currRight, keyWordFreq.get(currRight) == null ? 1 : keyWordFreq.get(currRight) + 1);
        }
        // loop enters when all the keywords are present in the current window;
        // contract left while all the keywords are still present
        while (keyWordFreq.size() == numKeywords) {
            String currLeft = paragraph.get(left);
            if (keywords.contains(currLeft)) {
                // remove from the map if it's the last occurrence, so that the loop exits
                if (keyWordFreq.get(currLeft).equals(1)) {
                    // now check if the current subarray is the smallest
                    if ((result.start == -1 && result.end == -1) || (right - left) < (result.end - result.start)) {
                        result = new Subarray(left, right);
                    }
                    keyWordFreq.remove(currLeft);
                } else {
                    // else reduce the frequency
                    keyWordFreq.put(currLeft, keyWordFreq.get(currLeft) - 1);
                }
            }
            left++;
        }
    }
    return result;
}

Related

Counting segments without the function Count

I had the following problem: given an array, count the number of segments of length k for which the number of positives in the left half of the segment is greater than or equal to the number in the right half.
As an example (assume segment lengths are always even, so that there is no discussion about what a half is):
k=2 ---> count(array[-4,-2,2,1],k) ---> 2, as [-4,-2] fulfils it and so does [2,1]
k=4 ---> count(array[-4,-2,2,1],k) ---> 0, as [-4,-2,2,1] does not fulfil it.
k=6 ---> count(array[-4,-2,2,1],k) ---> 0, as there are no segments of length 6.
I have solved it recursively, using the function Count, in a trivial way: I move through the array from left to right, enumerating all the segments of length k, and applying Count to each of those. It is done in Dafny:
function method Count_segments(sequ: seq<int>, seg_length: int): int
{
  if |sequ| == 0 then 0
  else (if (Count(x => x >= 0, sequ[0..seg_length/2])) >= (Count(x => x >= 0, sequ[seg_length/2..seg_length])) then
          1 + if (|sequ|-1 < seg_length) then 0 // I add the condition that says that if in the next iteration, sequ will be smaller than the sequence_length, then we end.
              else Count_segments(sequ[1..], seg_length)
        else if (|sequ|-1 < seg_length) then 0
        else Count_segments(sequ[1..], seg_length)
       )
}
But, obviously, using Count, I am not doing a linear search iteratively (in the first example, instead of searching 4 times, it does 6 times). I would like to implement this in O(n) but cannot find any info, does anyone have an idea? I do not care about the programming language (can answer me in any language), but about the algorithm itself.
Thanks!!
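One possible way to get to O(n) for the problem described above is a prefix-count array, so that the number of positives in any half can be read off in constant time. The sketch below is only an illustration: the name `count_segments` is hypothetical, "positive" is assumed to mean strictly greater than zero (the Dafny code tests `x >= 0`, so adjust the predicate if zero should count), and odd or oversized k is assumed to give 0.

```python
from itertools import accumulate


def count_segments(values, k):
    """Count the length-k windows whose left half contains at least as many
    positive numbers as the right half (k is assumed to be even)."""
    if k <= 0 or k % 2 != 0 or k > len(values):
        return 0
    # prefix[i] = number of positive entries among values[:i]
    prefix = [0] + list(accumulate(1 if v > 0 else 0 for v in values))
    half = k // 2
    total = 0
    for start in range(len(values) - k + 1):
        mid = start + half
        end = start + k
        left = prefix[mid] - prefix[start]    # positives in the left half
        right = prefix[end] - prefix[mid]     # positives in the right half
        if left >= right:
            total += 1
    return total
```

On the example from the question, `count_segments([-4, -2, 2, 1], 2)` gives 2, and k=4 and k=6 both give 0. Each window is handled with two subtractions, so the whole scan is linear in the array length.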

Number of subarrays with same 'degree' as the array

So this problem was asked in a quiz and the problem goes like:
You are given an array 'a' with elements ranging from 1 to 10^6, and the size of the array can be at most 10^5. Now we are asked to find the number of subarrays with the same 'degree' as the original array. Degree of an array is defined as the frequency of the maximum occurring element in the array. Multiple elements could have the same frequency.
I was stuck in this problem for like an hour but couldn't think of any solution. How do I solve it?
Sample Input:
first-input
1,2,2,3,1
first-output 2
second-input
1,1,2,1,2,2
second-output 4
The element that occurs most frequently is called the mode; this problem defines degree as the frequency count. Your tasks are:
Identify all of the mode values.
For each mode value, find the index range of that value. For instance, in the array
[1, 1, 2, 1, 3, 3, 2, 4, 2, 4, 5, 5, 5]
You have three modes (1 2 5) with a degree of 3. The index ranges are
1 - 0:3
2 - 2:8
5 - 10:12
You need to count all index ranges (subarrays) that include at least one of those three ranges.
I've tailored this example to have both basic cases: modes that overlap, and those that do not. Note that containment is a moot point: if you have an array where one mode's range contains another:
[0, 1, 1, 1, 0, 0]
You can ignore the outer one altogether: any subarray that contains 0 will also contain 1.
ANALYSIS
A subarray is defined by two numbers, the starting and ending indices. Since we must have 0 <= start <= end <= len(array), this is the "handshake" problem between array bounds. We have N(N+1)/2 possible subarrays.
For 10**5 elements, you could just brute-force the problem from here: for each pair of indices, check to see whether that range contains any of the mode ranges. However, you can easily cut that down with interval recognition.
ALGORITHM
Step through the mode ranges, left to right. First, count all subranges that include the first mode range [0:3]. There is only 1 possible start [0] and 10 possible ends [3:12]; that's 10 subarrays.
Now move to the second mode range, [2:8]. You need to count subarrays that include this, but exclude those you've already counted. Since there's an overlap, you need a starting point later than 0, or an ending point before 3. This second clause is not possible with the given range.
Thus, you consider start [1:2], end [8:12]. That's 2 * 5 more subarrays.
For the third range [10:12] (no overlap), you need a starting point that does not include any other subrange. This means that any starting point [3:10] will do. Since there's only one possible endpoint, you have 8*1, or 8 more subarrays.
Can you turn this into something formal?
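One way the counting above might be formalized: a subarray has the full degree exactly when it contains the whole index range of at least one mode, so for every end index it is enough to know the largest left endpoint among the mode ranges that have already closed. The following is a sketch under that reading; the function name `count_same_degree_subarrays` is made up for illustration.

```python
from collections import Counter


def count_same_degree_subarrays(a):
    """Count subarrays whose degree equals the degree of the whole array."""
    if not a:
        return 0
    freq = Counter(a)
    degree = max(freq.values())
    first, last = {}, {}
    for idx, v in enumerate(a):
        first.setdefault(v, idx)   # first occurrence of v
        last[v] = idx              # last occurrence of v
    # left_at_end[r] = left endpoint of the mode range that ends exactly at r
    left_at_end = {}
    for v, c in freq.items():
        if c == degree:
            r = last[v]
            left_at_end[r] = max(left_at_end.get(r, -1), first[v])
    total = 0
    best_left = -1   # largest left endpoint among mode ranges ending at or before `end`
    for end in range(len(a)):
        best_left = max(best_left, left_at_end.get(end, -1))
        # any start in 0..best_left yields a subarray containing a full mode range
        total += best_left + 1
    return total
```

For the tailored example `[1, 1, 2, 1, 3, 3, 2, 4, 2, 4, 5, 5, 5]` this returns 28, matching the 10 + 10 + 8 counted by hand above.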
Taking reference from leet code
https://leetcode.com/problems/degree-of-an-array/solution/
solve
class Solution {
    public int findShortestSubArray(int[] nums) {
        Map<Integer, Integer> left = new HashMap(),
            right = new HashMap(), count = new HashMap();
        for (int i = 0; i < nums.length; i++) {
            int x = nums[i];
            if (left.get(x) == null) left.put(x, i);
            right.put(x, i);
            count.put(x, count.getOrDefault(x, 0) + 1);
        }
        int ans = nums.length;
        int degree = Collections.max(count.values());
        for (int x: count.keySet()) {
            if (count.get(x) == degree) {
                ans = Math.min(ans, right.get(x) - left.get(x) + 1);
            }
        }
        return ans;
    }
}

Optimal way to find number of operation required to convert all K numbers to lie in the range [L,R] (i.e. L≤x≤R)

I am solving this question which requires some optimized techniques to
solve it. I can think of the brute force method only which requires
combinatorics.
Given an array A consisting of n integers. We call an integer "good"
if it lies in the range [L,R] (i.e. L≤x≤R). We need to make sure if we
pick up any K integers from the array at least one of them should be a
good integer.
For achieving this, in a single operation, we are allowed to
increase/decrease any element of the array by one.
What will be the minimum number of operations we will need for a
fixed k?
We need the answer for every k from 1 to n.
input:
L R
1 2
A=[ 1 3 3 ]
output:
for k=1 : 2
for k=2 : 1
for k=3 : 0
For k=1, you have to convert both the 3s into 2s to make sure that if
you select any one of the 3 integers, the selected integer is good.
For k=2, one of the possible ways is to convert one of the 3s into 2.
For k=3, no operation is needed as 1 is a good integer.
As burnpanck has explained in his answer, to make sure that when you pick any k elements in the array, and at least one of them is in range [L,R], we need to make sure that there are at least n - k + 1 numbers in range [L,R] in the array.
So, first, for each element, we calculate the cost to make this element valid (i.e. in range [L,R]) and store those costs in an array cost.
We notice that:
For k = 1, the minimum cost is the sum of array cost.
For k = 2, the minimum cost is the sum of cost, minus the largest element.
For k = 3, the minimum cost is the sum of cost, minus two largest elements.
...
So, we need a prefixSum array, whose ith position is the sum of the sorted cost array from index 0 to i.
After calculating prefixSum, we can answer the result for each k in O(1).
So here is the algorithm in Java; notice the time complexity is O(n log n):
int[] cost = new int[n];
for (int i = 0; i < n; i++)
    // Min cost for element i: 0 if A[i] is already in [L, R], else the distance to the nearer bound
    cost[i] = Math.max(0, Math.max(L - A[i], A[i] - R));
Arrays.sort(cost);
int[] prefix = new int[n];
for (int i = 0; i < n; i++)
    prefix[i] = cost[i] + (i > 0 ? prefix[i - 1] : 0);
for (int i = n - 1; i >= 0; i--)
    System.out.println("Result for k = " + (n - i) + " is " + prefix[i]);
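The same prefix-sum idea can be sketched compactly in Python. This is an illustrative version only; the function name `min_ops_for_each_k` and the parameter names `lo` and `hi` (standing in for L and R) are assumptions, not part of the original answer.

```python
from itertools import accumulate


def min_ops_for_each_k(values, lo, hi):
    """Return a list whose entry k-1 is the minimum number of +/-1 operations
    so that any k picked elements include at least one in [lo, hi]."""
    # Cost of moving each element into [lo, hi], cheapest first.
    costs = sorted(max(0, lo - v, v - hi) for v in values)
    prefix = list(accumulate(costs))   # prefix[i] = sum of the i+1 cheapest costs
    n = len(values)
    # For a given k we need n - k + 1 good elements, i.e. the n - k + 1 cheapest costs.
    return [prefix[n - k] for k in range(1, n + 1)]
```

With the input from the question, `min_ops_for_each_k([1, 3, 3], 1, 2)` yields `[2, 1, 0]` for k = 1, 2, 3. Sorting dominates the running time; each individual k is then a single lookup.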
To be sure that picking any k elements gives at least one valid element, you must have no more than k-1 invalid elements in your set. You therefore need to find the cheapest way to make enough elements valid. This I would do as follows: in a single pass, generate a map that counts how many elements in the set need $n$ operations to be made valid. Then, you clearly want to take those elements that need the fewest operations, so take the required number of elements in ascending order of required number of operations, and sum the number of operations.
In python:
def min_ops(L, R, A_set):
    n_ops = dict()  # create an empty mapping
    for a in A_set:  # loop over all a in the set A_set
        n = max(0, max(a - R, L - a))  # the number of operations required to make a valid
        n_ops[n] = n_ops.get(n, 0) + 1  # in the mapping, increment the element keyed by *n* by one. If it does not exist yet, assume it was 0.
    allret = []  # create a new list to hold the result for all k
    for k in range(1, len(A_set) + 1):  # iterate over all k in the range [1,N+1) == [1,N]
        n_good_required = len(A_set) - k + 1
        ret = 0
        # iterate over all pairs of keys,values from the mapping, sorted by key.
        # The key is the number of ops required, the value the number of elements available
        for n, nel in sorted(n_ops.items()):
            if n_good_required <= 0:
                break  # we already have enough valid elements for this k
            ret += n * min(nel, n_good_required)
            n_good_required -= nel
        allret.append(ret)  # append the answer for this k to the result list
    return allret
As an example:
A_set = [1,3,3,6,8,5,4,7]
L,R = 4,6
For each A, we find how many operations we need to make it valid:
n = [3,1,1,0,2,0,0,1]
(i.e. 1 needs 3 steps, 3 needs one, and so on)
Then we count them:
n_ops = {
0: 3, # we already have three valid elements
1: 3, # three elements that require one op
2: 1,
3: 1, # and finally one that requires 3 ops
}
Now, for each k, we find out how many valid elements we need in the set,
e.g. for k = 4, we need at most 3 invalid in the set of 8, so we need 5 valid ones.
Thus:
ret = 0
n_good_required = 5
with n=0, we have 3 so take all of them
ret = 0
n_good_required = 2
with n=1, we have 3, but we need just two, so take those
ret = 2
we're finished

Given an array, find out the next smaller element for each element

Given an array find the next smaller element in array for each element without changing the original order of the elements.
For example, suppose the given array is 4,2,1,5,3.
The resultant array would be 2,1,-1,3,-1.
I was asked this question in an interview, but I couldn't think of a solution better than the trivial O(n^2) solution.
Any approach that I could think of, i.e. making a binary search tree, or sorting the array, will distort the original order of the elements and hence lead to a wrong result.
Any help would be highly appreciated.
O(N) Algorithm
1. Initialize output array to all -1s.
2. Create an empty stack of indexes of items we have visited in the input array but don't yet know the answer for in the output array.
3. Iterate over each element in the input array:
    1. Is it smaller than the item indexed by the top of the stack?
        a. Yes. It is the first such element to be so. Fill in the corresponding element in our output array, remove the item from the stack, and try again until the stack is empty or the answer is no.
        b. No. Continue to 3.2.
    2. Add this index to the stack. Continue iteration from 3.
Python implementation
def find_next_smaller_elements(xs):
    ys = [-1 for x in xs]
    stack = []
    for i, x in enumerate(xs):
        while len(stack) > 0 and x < xs[stack[-1]]:
            ys[stack.pop()] = x
        stack.append(i)
    return ys

>>> find_next_smaller_elements([4,2,1,5,3])
[2, 1, -1, 3, -1]
>>> find_next_smaller_elements([1,2,3,4,5])
[-1, -1, -1, -1, -1]
>>> find_next_smaller_elements([5,4,3,2,1])
[4, 3, 2, 1, -1]
>>> find_next_smaller_elements([1,3,5,4,2])
[-1, 2, 4, 2, -1]
>>> find_next_smaller_elements([6,4,2])
[4, 2, -1]
Explanation
How it works
This works because whenever we add an item to the stack, we know its value is greater or equal to every element in the stack already. When we visit an element in the array, we know that if it's lower than any item in the stack, it must be lower than the last item in the stack, because the last item must be the largest. So we don't need to do any kind of search on the stack, we can just consider the last item.
Note: You can skip the initialization step so long as you add a final step to empty the stack and use each remaining index to set the corresponding output array element to -1. It's just easier in Python to initialize it to -1s when creating it.
Time complexity
This is O(N). The main loop clearly visits each index once. Each index is added to the stack exactly once and removed at most once.
Solving as an interview question
This kind of question can be pretty intimidating in an interview, but I'd like to point out that (hopefully) an interviewer isn't going to expect the solution to spring from your mind fully-formed. Talk them through your thought process. Mine went something like this:
Is there some relationship between the positions of numbers and their next smaller number in the array? Does knowing some of them constrain what the others might possibly be?
If I were in front of a whiteboard I would probably sketch out the example array and draw lines between the elements. I might also draw them as a 2D bar graph - horizontal axis being position in input array and vertical axis being value.
I had a hunch this would show a pattern, but no paper to hand. I think the diagram would make it obvious. Thinking about it carefully, I could see that the lines would not overlap arbitrarily, but would only nest.
Around this point, it occurred to me that this is incredibly similar to the algorithm Python uses internally to transform indentation into INDENT and DEDENT virtual tokens, which I'd read about before. See "How does the compiler parse the indentation?" on this page: http://www.secnetix.de/olli/Python/block_indentation.hawk However, it wasn't until I actually worked out an algorithm that I followed up on this thought and determined that it was in fact the same, so I don't think it helped too much. Still, if you can see a similarity to some other problem you know, it's probably a good idea to mention it, and say how it's similar and how it's different.
From here the general shape of the stack-based algorithm became apparent, but I still needed to think about it a bit more to be sure it would work okay for those elements that have no subsequent smaller element.
Even if you don't come up with a working algorithm, try to let your interviewer see what you're thinking about. Often it is the thought process more than the answer that they're interested in. For a tough problem, failing to find the best solution but showing insight into the problem can be better than knowing a canned answer but not being able to give it much analysis.
Start making a BST, starting from the array end. For each value 'v' answer would be the last node "Right" that you took on your way to inserting 'v', of which you can easily keep track of in recursive or iterative version.
UPDATE:
Going by your requirements, you can approach this in a linear fashion:
If every next element is smaller than the current element (e.g. 6 5 4 3 2 1) you can process this linearly without requiring any extra memory. The interesting case arises when you start getting jumbled elements (e.g. 4 2 1 5 3), in which case you need to remember their order for as long as you haven't found their 'smaller counterparts'.
A simple stack based approach goes like this:
1. Push the first element (a[0]) onto a stack.
2. For each next element a[i], peek into the stack. If the value peek() is greater than the one in hand, a[i], you have found the next smaller number for that stack element. Keep popping elements as long as peek() > a[i], and print/store a[i] as the result for each popped element.
3. Then push a[i] onto the stack.
4. In the end the stack will contain those elements which never had a smaller value to their right. You can fill in -1 for them in your output.
e.g. A=[4, 2, 1, 5, 3];
stack: 4
a[i] = 2, Pop 4, Push 2 (you got result for 4)
stack: 2
a[i] = 1, Pop 2, Push 1 (you got result for 2)
stack: 1
a[i] = 5
stack: 1 5
a[i] = 3, Pop 5, Push 3 (you got result for 5)
stack: 1 3
1,3 don't have any counterparts for them. so store -1 for them.
Assuming you meant the first next element which is lower than the current element, here are 2 solutions:
1. Use sqrt(N) segmentation. Divide the array into sqrt(N) segments, each of length sqrt(N). For each segment calculate its minimum element using a loop. In this way, you have pre-calculated each segment's minimum element in O(N). Now, for each element, the next lower element can be in the same segment as that one or in any of the subsequent segments. So, first check all the next elements in the current segment. If all are larger, then loop through the subsequent segments to find the first one whose minimum is lower than the current element. If you can't find any, the result is -1. Otherwise, check every element of that segment to find the first element lower than the current element. Overall, the algorithm complexity is O(N*sqrt(N)) or O(N^1.5).
   You can achieve O(N lg N) using a segment tree with a similar approach.
2. Sort the array ascending first (keeping the original position of the elements as satellite data). Now, assuming each element of the array is distinct, for each element, we will need to find the lowest original position on the left side of that element. It is a classic RMQ (Range Min Query) problem and can be solved in many ways including an O(N) one. As we need to sort first, overall complexity is O(N log N). You can learn more about RMQ in a TopCoder tutorial.
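The sqrt(N) segmentation idea from the first option could look roughly like this in Python. It is a sketch, not a tuned implementation: the function name `next_smaller_sqrt` is invented here, and the block length is taken as the integer square root of the array length.

```python
import math


def next_smaller_sqrt(a):
    """For each element, return the first later element that is strictly
    smaller, or -1 if there is none, using sqrt decomposition."""
    n = len(a)
    if n == 0:
        return []
    size = max(1, math.isqrt(n))
    # block_min[b] = minimum of a[b*size : (b+1)*size]
    block_min = [min(a[s:s + size]) for s in range(0, n, size)]
    result = [-1] * n
    for i, x in enumerate(a):
        block = i // size
        block_end = min(n, (block + 1) * size)
        # 1) scan the remainder of the current block
        j = next((t for t in range(i + 1, block_end) if a[t] < x), None)
        if j is None:
            # 2) skip whole blocks until one has a minimum below x, then scan it
            for b in range(block + 1, len(block_min)):
                if block_min[b] < x:
                    start = b * size
                    j = next(t for t in range(start, min(n, start + size)) if a[t] < x)
                    break
        if j is not None:
            result[i] = a[j]
    return result
```

For `[4, 2, 1, 5, 3]` this produces `[2, 1, -1, 3, -1]`. Each element costs at most one partial block scan, one pass over the block minima, and one full block scan, which is where the O(N*sqrt(N)) bound comes from.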
For some reasons, I find it easier to reason about "previous smaller element", aka "all nearest smaller elements". Thus applied backward gives the "next smaller".
For the record, a Python implementation in O(n) time, O(1) space (i.e. without stack), supporting negative values in the array :
def next_smaller(l):
    """ Return positions of next smaller items """
    res = [None] * len(l)
    for i in range(len(l) - 2, -1, -1):
        j = i + 1
        # Follow the chain of already-computed answers; >= skips equal
        # values so that the result is strictly smaller than l[i].
        while j is not None and (l[j] >= l[i]):
            j = res[j]
        res[i] = j
    return res

def next_smaller_elements(l):
    """ Return next smaller items themselves """
    res = next_smaller(l)
    return [l[i] if i is not None else None for i in res]
Here is the JavaScript code. This video explains the algorithm better.
function findNextSmallerElem(source){
    let length = source.length;
    let outPut = [...Array(length)].map(() => -1);
    let stack = [];
    for(let i = 0 ; i < length ; i++){
        let stackTopVal = stack[ stack.length - 1] && stack[ stack.length - 1].val;
        // If stack is empty or current elem is greater than stack top
        if(!stack.length || source[i] > stackTopVal ){
            stack.push({ val: source[i], ind: i} );
        } else {
            // While stacktop is greater than current elem , keep popping
            while( source[i] < (stack[ stack.length - 1] && stack[ stack.length - 1].val) ){
                outPut[stack.pop().ind] = source[i];
            }
            stack.push({ val: source[i], ind: i} );
        }
    }
    return outPut;
}
Output -
findNextSmallerElem([98,23,54,12,20,7,27])
[23, 12, 12, 7, 7, -1, -1]
Time complexity O(N), space complexity O(N).
Clean solution in Java, keeping the order of the array:
public static int[] getNGE(int[] a) {
    var s = new Stack<Pair<Integer, Integer>>();
    int n = a.length;
    var result = new int[n];
    s.push(Pair.of(0, a[0]));
    for (int i = 1; i < n; i++) {
        // every stacked element larger than a[i] has a[i] as its next smaller element
        while (!s.isEmpty() && s.peek().v2 > a[i]) {
            var top = s.pop();
            result[top.v1] = a[i];
        }
        s.push(Pair.of(i, a[i]));
    }
    // whatever is left on the stack has no smaller element to its right
    while (!s.isEmpty()) {
        var top = s.pop();
        result[top.v1] = -1;
    }
    return result;
}

static class Pair<K, V> {
    K v1;
    V v2;

    public static <K, V> Pair<K, V> of(K v1, V v2) {
        Pair<K, V> p = new Pair<>();
        p.v1 = v1;
        p.v2 = v2;
        return p;
    }
}
Here is an observation that I think can be made into an O(n log n) solution. Suppose you have the answer for the last k elements of the array. What would you need in order to figure out the value for the element just before this? You can think of the last k elements as being split into a series of ranges, each of which starts at some element and continues forward until it hits a smaller element. These ranges must be in descending order, so you could think about doing a binary search over them to find the first interval smaller than that element. You could then update the ranges to factor in this new element.
Now, how best to represent this? The best way I've thought of is to use a splay tree whose keys are the elements defining these ranges and whose values are the index at which they start. You can then in time O(log n) amortized do a predecessor search to find the predecessor of the current element. This finds the earliest value smaller than the current. Then, in amortized O(log n) time, insert the current element into the tree. This represents defining a new range from that element forward. To discard all ranges this supersedes, you then cut the right child of the new node, which because this is a splay tree is at the root, from the tree.
Overall, this does O(n) iterations of an O(log n) process for total O(n lg n).
Here is an O(n) algorithm using DP (two passes, so about 2n steps). Note that it reports the smallest element to the right of each position, which is not always the first smaller one:
int n = array.length;
// min[i] records the minimum value found from index i to the end of the array.
int[] min = new int[n];
min[n-1] = array[n-1];
for (int i = n-2; i >= 0; i--)
    min[i] = Math.min(min[i+1], array[i]);
// Compare each element of the original array against min[].
int[] result = new int[n];
result[n-1] = -1;
for (int i = 0; i < n-1; i++)
    result[i] = min[i+1] < array[i] ? min[i+1] : -1;
Here is the new solution to find the "next smaller element" (it compares each element only with its immediate right neighbour):
int n = array.length;
int[] answer = new int[n];
answer[n-1] = -1;
for (int i = 0; i < n-1; i++)
    answer[i] = array[i+1] < array[i] ? array[i+1] : -1;
All that is actually not required i think
case 1: a,b
answer : -a+b
case 2: a,b,c
answer : a-2b+c
case 3: a,b,c,d
answer : -a+3b-3c+d
case 4 :a,b,c,d,e
answer : a-4b+6c-4d+e
.
.
.
recognize the pattern in it?
it is the pascal's triangle!
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
so it can be calculated using Nth row of pascal's triangle!
with alternating + and - signs for odd and even levels!
it is O(1)
You can solve this in O(n) runtime with O(n) space complexity.
Start with a stack and keep pushing elements until you find arr[i] such that arr[i] < stack.top element. Then store this index.
Code Snippet:
vector<int> findNext(vector<int> values) {
    stack<int> st;
    vector<int> nextSmall(values.size(), -1);
    st.push(0);
    for (int i = 1; i < values.size(); i++) {
        while (!st.empty() && values[i] < values[st.top()]) {
            // change values[i] < values[st.top()] to values[i] > values[st.top()] to find the next greater element.
            nextSmall[st.top()] = i;
            st.pop();
        }
        st.push(i);
    }
    return nextSmall;
}
Solution with O(1) space complexity and O(n) time complexity.
void replace_next_smallest(int a[], int n)
{
    int ns = a[n - 1];
    for (int i = n - 1; i >= 0; i--) {
        if (i == n - 1) {
            a[i] = -1;
        }
        else if (a[i] > ns) {
            int t = ns;
            ns = a[i];
            a[i] = t;
        }
        else if (a[i] == ns) {
            a[i] = a[i + 1];
        }
        else {
            ns = a[i];
            a[i] = -1;
        }
    }
}
Solution With O(n) Time Complexity and O(1) Space Complexity. This Solution is not complex to understand and implemented without stack.
def min_secMin(a, n):
    min = a[0]
    sec_min = a[1]
    for i in range(1, n):
        if(a[i] < min):
            sec_min = min
            min = a[i]
        if(a[i] > min and a[i] < sec_min):
            sec_min = a[i]
    return min, sec_min
Given an array find the next smaller element in array for each element without changing the original order of the elements.
Here arr is the array and n is the length of the array.
Using Python logic,
def next_smallest_array(arr, n):
    for i in range(0, n-1, 1):
        if arr[i] > arr[i+1]:
            arr[i] = arr[i+1]
        else:
            arr[i] = -1
    arr[n-1] = -1
    return arr

next_smallest_array([4,2,1,5,3], 5)
Output is [2, 1, -1, 3, -1]
next_smallest_array([1,2,3,4,5], 5)
Output is [-1, -1, -1, -1, -1]

How can I find a number which occurs an odd number of times in a SORTED array in O(n) time?

I have a question and I tried to think over it again and again... but got nothing so posting the question here. Maybe I could get some view-point of others, to try and make it work...
The question is: we are given a SORTED array, which consists of a collection of values occurring an EVEN number of times, except one, which occurs ODD number of times. We need to find the solution in log n time.
It is easy to find the solution in O(n) time, but it looks pretty tricky to perform in log n time.
Theorem: Every deterministic algorithm for this problem probes Ω(log² n) memory locations in the worst case.
Proof (completely rewritten in a more formal style):
Let k > 0 be an odd integer and let n = k². We describe an adversary that forces (log₂(k + 1))² = Ω(log² n) probes.
We call the maximal subsequences of identical elements groups. The adversary's possible inputs consist of k length-k segments x1 x2 … xk. For each segment xj, there exists an integer bj ∈ [0, k] such that xj consists of bj copies of j - 1 followed by k - bj copies of j. Each group overlaps at most two segments, and each segment overlaps at most two groups.
Group boundaries
|   |     |   |   |
 0 0 1 1 1 2 2 3 3
|     |     |     |
Segment boundaries
Wherever there is an increase of two, we assume a double boundary by convention.
Group boundaries
|    ||       |   |
 0 0 0 2 2 2 2 3 3
Claim: The location of the jth group boundary (1 ≤ j ≤ k) is uniquely determined by the segment xj.
Proof: It's just after the ((j - 1) k + bj)th memory location, and xj uniquely determines bj. //
We say that the algorithm has observed the jth group boundary in case the results of its probes of xj uniquely determine xj. By convention, the beginning and the end of the input are always observed. It is possible for the algorithm to uniquely determine the location of a group boundary without observing it.
Group boundaries
|   X   |   |     |
 0 0 ? 1 2 2 3 3 3
|     |     |     |
Segment boundaries
Given only 0 0 ?, the algorithm cannot tell for sure whether ? is a 0 or a 1. In context, however, ? must be a 1, as otherwise there would be three odd groups, and the group boundary at X can be inferred. These inferences could be problematic for the adversary, but it turns out that they can be made only after the group boundary in question is "irrelevant".
Claim: At any given point during the algorithm's execution, consider the set of group boundaries that it has observed. Exactly one consecutive pair is at odd distance, and the odd group lies between them.
Proof: Every other consecutive pair bounds only even groups. //
Define the odd-length subsequence bounded by the special consecutive pair to be the relevant subsequence.
Claim: No group boundary in the interior of the relevant subsequence is uniquely determined. If there is at least one such boundary, then the identity of the odd group is not uniquely determined.
Proof: Without loss of generality, assume that each memory location not in the relevant subsequence has been probed and that each segment contained in the relevant subsequence has exactly one location that has not been probed. Suppose that the jth group boundary (call it B) lies in the interior of the relevant subsequence. By hypothesis, the probes to xj determine B's location up to two consecutive possibilities. We call the one at odd distance from the left observed boundary odd-left and the other odd-right. For both possibilities, we work left to right and fix the location of every remaining interior group boundary so that the group to its left is even. (We can do this because they each have two consecutive possibilities as well.) If B is at odd-left, then the group to its left is the unique odd group. If B is at odd-right, then the last group in the relevant subsequence is the unique odd group. Both are valid inputs, so the algorithm has uniquely determined neither the location of B nor the odd group. //
Example:
Observed group boundaries; relevant subsequence marked by […]
[             ]   |
 0 0 Y 1 1 Z 2 3 3
|     |     |     |
Segment boundaries
Possibility #1: Y=0, Z=2
Possibility #2: Y=1, Z=2
Possibility #3: Y=1, Z=1
As a consequence of this claim, the algorithm, regardless of how it works, must narrow the relevant subsequence to one group. By definition, it therefore must observe some group boundaries. The adversary now has the simple task of keeping open as many possibilities as it can.
At any given point during the algorithm's execution, the adversary is internally committed to one possibility for each memory location outside of the relevant subsequence. At the beginning, the relevant subsequence is the entire input, so there are no initial commitments. Whenever the algorithm probes an uncommitted location of xj, the adversary must commit to one of two values: j - 1, or j. If it can avoid letting the jth boundary be observed, it chooses a value that leaves at least half of the remaining possibilities (with respect to observation). Otherwise, it chooses so as to keep at least half of the groups in the relevant interval and commits values for the others.
In this way, the adversary forces the algorithm to observe at least log₂(k + 1) group boundaries, and in observing the jth group boundary, the algorithm is forced to make at least log₂(k + 1) probes.
Extensions:
This result extends straightforwardly to randomized algorithms by randomizing the input, replacing "at best halved" (from the algorithm's point of view) with "at best halved in expectation", and applying standard concentration inequalities.
It also extends to the case where no group can be larger than s copies; in this case the lower bound is Ω(log n log s).
A sorted array suggests a binary search. We have to redefine equality and comparison. Equality simply means an odd number of elements. We can do comparison by observing the index of the first or last element of the group. The first element will be at an even index (0-based) before the odd group, and at an odd index after the odd group. We can find the first and last elements of a group using binary search. The total cost is O((log N)²).
PROOF OF O((log N)²)
T(2) = 1 //to make the summation nice
T(N) = log(N) + T(N/2) //log(N) is finding the first/last elements
For some N=2^k,
T(2^k) = (log 2^k) + T(2^(k-1))
= (log 2^k) + (log 2^(k-1)) + T(2^(k-2))
= (log 2^k) + (log 2^(k-1)) + (log 2^(k-2)) + ... + (log 2^2) + 1
= k + (k-1) + (k-2) + ... + 1
= k(k+1)/2
= (k² + k)/2
= (log(N)² + log(N))/ 2
= O(log(N)²)
Look at the middle element of the array. With a couple of appropriate binary searches, you can find its first and last appearance in the array. E.g., if the middle element is 'a', you need to find i and j as shown below:
[* * * * a a a a * * *]
         ^     ^
         |     |
         |     |
         i     j
Is j - i an even number? Then 'a' occurs an odd number of times and you are done! Otherwise (and this is the key here), the question to ask is: is i an even or an odd number? Do you see what this piece of knowledge implies? Then the rest is easy.
This answer is in support of the answer posted by "throwawayacct". He deserves the bounty. I spent some time on this question and I'm totally convinced that his proof is correct that you need Ω(log(n)^2) queries to find the number that occurs an odd number of times. I'm convinced because I ended up recreating the exact same argument after only skimming his solution.
In the solution, an adversary creates an input to make life hard for the algorithm, but also simple for a human analyzer. The input consists of k pages that each have k entries. The total number of entries is n = k^2, and it is important that O(log(k)) = O(log(n)) and Ω(log(k)) = Ω(log(n)). To make the input, the adversary makes a string of length k of the form 00...011...1, with the transition in an arbitrary position. Then each symbol in the string is expanded into a page of length k of the form aa...abb...b, where on the ith page, a=i and b=i+1. The transition on each page is also in an arbitrary position, except that the parity agrees with the symbol that the page was expanded from.
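To make the construction concrete, here is a rough Python sketch that builds one such paged input. Several things are assumptions on my part, not part of the argument above: the names build_paged_input and odd_values are invented, the transitions are picked at random rather than adversarially, pages are numbered from 1, and "parity of a transition" is read as the parity of its absolute index in the whole array (that reading is what makes exactly one value occur an odd number of times).

```python
import random
from collections import Counter


def build_paged_input(k, split, rng):
    """Build k pages of k entries; page i (1-based) holds values i and i+1.

    Pages 1..split get symbol 0, the remaining pages get symbol 1
    (the string 00...011...1 with its transition after position `split`).
    ASSUMPTION: the absolute index of each page's transition has the same
    parity as that page's symbol.
    """
    assert k % 2 == 1 and 0 <= split <= k
    data = []
    for i in range(1, k + 1):
        symbol = 0 if i <= split else 1
        offset = (i - 1) * k  # absolute index where page i starts
        # t = number of copies of i on this page; keep only the allowed parities
        choices = [t for t in range(k + 1) if (offset + t) % 2 == symbol]
        t = rng.choice(choices)
        data += [i] * t + [i + 1] * (k - t)
    return data


def odd_values(data):
    """Values that occur an odd number of times (linear scan, for checking only)."""
    return [v for v, c in Counter(data).items() if c % 2 == 1]


page_input = build_paged_input(5, 2, random.Random(0))
print(page_input, odd_values(page_input))
```

The result is always sorted, has length k², and contains a single odd value, so it is a legal instance of the problem.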
It is important to understand the "adversary method" of analyzing an algorithm's worst case. The adversary answers queries about the algorithm's input, without committing to future answers. The answers have to be consistent, and the game is over when the adversary has been pinned down enough for the algorithm to reach a conclusion.
With that background, here are some observations:
1) If you want to learn the parity of a transition in a page by making queries in that page, you have to learn the exact position of the transition and you need Ω(log(k)) queries. Any collection of queries restricts the transition point to an interval, and any interval of length more than 1 has both parities. The most efficient search for the transition in that page is a binary search.
2) The most subtle and most important point: There are two ways to determine the parity of a transition inside a specific page. You can either make enough queries in that page to find the transition, or you can infer the parity if you find the same parity in both an earlier and a later page. There is no escape from this either-or. Any set of queries restricts the transition point in each page to some interval. The only restriction on parities comes from intervals of length 1. Otherwise the transition points are free to wiggle to have any consistent parities.
3) In the adversary method, there are no lucky strikes. For instance, suppose that your first query in some page is toward one end instead of in the middle. Since the adversary hasn't committed to an answer, he's free to put the transition on the long side.
4) The end result is that you are forced to directly probe the parities in Ω(log(k)) pages, and the work for each of these subproblems is also Ω(log(k)).
5) Things are not much better with random choices than with adversarial choices. The math is more complicated, because now you can get partial statistical information, rather than a strict yes you know a parity or no you don't know it. But it makes little difference. For instance, you can give each page length k^2, so that with high probability, the first log(k) queries in each page tell you almost nothing about the parity in that page. The adversary can make random choices at the beginning and it still works.
Start at the middle of the array and walk backward until you get to a value that's different from the one at the center. Check whether the number above that boundary is at an odd or even index. If it's odd, then the number occurring an odd number of times is to the left, so repeat your search between the beginning and the boundary you found. If it's even, then the number occurring an odd number of times must be later in the array, so repeat the search in the right half.
As stated, this has both a logarithmic and a linear component. If you want to keep the whole thing logarithmic, instead of just walking backward through the array to a different value, you want to use a binary search instead. Unless you expect many repetitions of the same numbers, the binary search may not be worthwhile though.
I have an algorithm which works in log(N/C)*log(K), where K is the length of maximum same-value range, and C is the length of range being searched for.
The main difference of this algorithm from most posted before is that it takes advantage of the case where all same-value ranges are short. It finds boundaries not by binary-searching the entire array, but by first quickly finding a rough estimate by jumping back by 1, 2, 4, 8, ... (log(K) iterations) steps, and then binary-searching the resulting range (log(K) again).
The algorithm is as follows (written in C#):
// Finds the start of the range of equal numbers containing the index "index",
// which is assumed to be inside the array
//
// Complexity is O(log(K)) with K being the length of range
static int findRangeStart (int[] arr, int index)
{
int candidate = index;
int value = arr[index];
int step = 1;
// find the boundary for binary search:
while(candidate>=0 && arr[candidate] == value)
{
candidate -= step;
step *= 2;
}
// binary search:
// use -1 (not 0) as the lower sentinel so that a range starting at index 0 is found correctly
int a = Math.Max(-1,candidate);
int b = candidate+step/2;
while(a+1!=b)
{
int c = (a+b)/2;
if(arr[c] == value)
b = c;
else
a = c;
}
return b;
}
// Finds the index after the only "odd" range of equal numbers in the array.
// The result should be in the range (start; end]
// The "end" is considered to always be the end of some equal number range.
static int search(int[] arr, int start, int end)
{
if(arr[start] == arr[end-1])
return end;
int middle = (start+end)/2;
int rangeStart = findRangeStart(arr,middle);
if((rangeStart & 1) == 0)
return search(arr, middle, end);
return search(arr, start, rangeStart);
}
// Finds the index after the only "odd" range of equal numbers in the array
static int search(int[] arr)
{
return search(arr, 0, arr.Length);
}
Take the middle element e. Use binary search to find its first and last occurrence. O(log(n))
If the number of occurrences is odd, return e.
Otherwise, recurse onto the side that has an odd number of elements [....]eeee[....]
Runtime will be log(n) + log(n/2) + log(n/4).... = O(log(n)^2).
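The steps above can be sketched in Python with the standard bisect module. This is only one possible reading of the answer; the function name find_odd_occurrence is made up, and the input is assumed to be sorted with exactly one value occurring an odd number of times.

```python
from bisect import bisect_left, bisect_right


def find_odd_occurrence(a):
    """Return the value that occurs an odd number of times in sorted list a."""
    lo, hi = 0, len(a)  # a[lo:hi] is made of whole runs and has odd length
    while lo < hi:
        mid = (lo + hi) // 2
        first = bisect_left(a, a[mid], lo, hi)   # first index of the run of a[mid]
        last = bisect_right(a, a[mid], lo, hi)   # one past the last index of that run
        if (last - first) % 2 == 1:
            return a[mid]
        if (first - lo) % 2 == 1:
            hi = first   # the left part has odd length, so the odd run is there
        else:
            lo = last    # otherwise it must be in the right part
    raise ValueError("no value occurs an odd number of times")


print(find_odd_occurrence([1, 1, 1, 1, 2, 2, 3, 3, 3]))  # 3
```

Each round costs two binary searches and at least halves the window, which matches the O(log(n)^2) estimate.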
AHhh. There is an answer.
Do a binary search and as you search, for each value, move backwards until you find the first entry with that same value. If its index is even, it is before the oddball, so move to the right.
If its array index is odd, it is after the oddball, so move to the left.
In pseudocode (this is the general idea, not tested...):
private static int FindOddBall(int[] ary)
{
int l = 0,
r = ary.Length - 1;
int n = (l+r)/2;
while (r > l+2)
{
n = (l + r) / 2;
while (ary[n] == ary[n-1])
n = FindBreakIndex(ary, l, n);
if (n % 2 == 0) // even index we are on or to the left of the oddball
l = n;
else // odd index we are to the right of the oddball
r = n-1;
}
return ary[l];
}
private static int FindBreakIndex(int[] ary, int l, int n)
{
var t = ary[n];
var r = n;
while(ary[n] != t || ary[n] == ary[n-1])
if(ary[n] == t)
{
r = n;
n = (l + r)/2;
}
else
{
l = n;
n = (l + r)/2;
}
return n;
}
You can use this algorithm:
int GetSpecialOne(int[] array, int length)
{
int specialOne = array[0];
for(int i=1; i < length; i++)
{
specialOne ^= array[i];
}
return specialOne;
}
Solved with the help of a similar question which can be found here on http://www.technicalinterviewquestions.net
We don't have any information about the distribution of lengths inside the array, and of the array as a whole, right?
So the array length might be 1, 11, 101, 1001 or something, 1 at least with no upper bound, and must contain at least 1 type of elements ('number') up to (length-1)/2 + 1 elements, for total sizes of 1, 11, 101: 1, 1 to 6, 1 to 51 elements and so on.
Shall we assume every possible size of equal probability? This would lead to a middle length of subarrays of size/4, wouldn't it?
An array of size 5 could be divided into 1, 2 or 3 sublists.
What seems to be obvious is not that obvious, if we go into details.
An array of size 5 can be 'divided' into one sublist in just one way, with arguable right to call it 'dividing'. It's just a list of 5 elements (aaaaa). To avoid confusion let's assume the elements inside the list to be ordered characters, not numbers (a,b,c, ...).
Divided into two sublist, they might be (1, 4), (2, 3), (3, 2), (4, 1). (abbbb, aabbb, aaabb, aaaab).
Now let's look back at the claim made before: Shall the 'division' (5) be assumed the same probability as those 4 divisions into 2 sublists? Or shall we mix them together, and assume every partition as evenly probable, (1/5)?
Or can we calculate the solution without knowing the probability of the length of the sublists?
The clue is you're looking for log(n). That's less than n.
Stepping through the entire array, one at a time? That's n. That's not going to work.
We know the first two indexes in the array (0 and 1) should be the same number. Same with 50 and 51, if the odd number in the array is after them.
So find the middle element in the array, compare it to the element right after it. If the change in numbers happens on the wrong index, we know the odd number in the array is before it; otherwise, it's after. With one set of comparisons, we figure out which half of the array the target is in.
Keep going from there.
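A minimal Python sketch of this pair-comparison search follows. Note the loud assumption: it is only valid when every value other than the odd one occurs exactly twice. With longer runs (say four equal values after the odd one) the "wrong index" test is no longer monotone and the search can go the wrong way. The function name is invented for illustration.

```python
def odd_one_assuming_pairs(x):
    """Binary search over pairs (x[2p], x[2p+1]).

    ASSUMPTION: x is sorted, one value occurs an odd number of times,
    and every other value occurs exactly twice.
    """
    lo, hi = 0, len(x) // 2  # candidate pair indices; pair `hi` is the lone last element
    while lo < hi:
        p = (lo + hi) // 2
        if x[2 * p] == x[2 * p + 1]:
            lo = p + 1   # pair intact: the odd value starts further right
        else:
            hi = p       # pair broken: the odd value ends here or earlier
    return x[2 * lo]


print(odd_one_assuming_pairs([1, 1, 2, 3, 3]))  # 2
```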
Use a hash table
For each element E in the input set
    if E is set in the hash table
        increment its value
    else
        set E in the hash table and initialize it to 1
For each key K in hash table
    if the value stored for K is odd
        return K
As this algorithm takes about 2n steps, it belongs to O(n)
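In Python the same counting idea fits in a few lines with collections.Counter. This is just a sketch with a made-up function name; it does not need the input to be sorted, but it uses O(n) extra space.

```python
from collections import Counter


def odd_occurrence_by_counting(values):
    """Return a value whose count is odd, or None if there is no such value."""
    for value, count in Counter(values).items():
        if count % 2 == 1:
            return value
    return None


print(odd_occurrence_by_counting([1, 1, 2, 2, 2, 3, 3]))  # 2
```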
Try this:
int getOddOccurrence(int ar[], int ar_size)
{
    int i;
    int res = 0;  /* running XOR of all elements seen so far */
    for (i = 0; i < ar_size; i++)
        res = res ^ ar[i];
    return res;
}
XOR cancels out every time you XOR with the same number, so 1^1=0 but 1^1^1=1. Every pair therefore cancels out, leaving only the number that occurs an odd number of times.
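The cancellation is easy to check with a small Python sketch (function name chosen here for illustration). It assumes integer inputs and, like the C version, runs in O(n) rather than logarithmic time.

```python
from functools import reduce
from operator import xor


def odd_occurrence_xor(values):
    """XOR all integers together; values occurring an even number of times cancel."""
    return reduce(xor, values, 0)


print(1 ^ 1, 1 ^ 1 ^ 1)                            # 0 1
print(odd_occurrence_xor([1, 1, 2, 2, 2, 3, 3]))   # 2
```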
Assume indexing starts at 0. Binary search for the smallest even i such that x[i] != x[i+1]; your answer is x[i].
edit: due to public demand, here is the code
int f(int *x, int min, int max) {
int size = max;
min /= 2;
max /= 2;
while (min < max) {
int i = (min + max)/2;
if (i==0 || x[2*i-1] == x[2*i])
min = i+1;
else
max = i-1;
}
if (2*max == size || x[2*max] != x[2*max+1])
return x[2*max];
return x[2*min];
}

Resources