Counting segments without the function Count

Counting segments without the function Count - arrays

I had the next problem: Given an array, count the number of segments of length k in which this happens; the number of positives in the left-half of the segment is bigger or equal to the right-half.
As an example (imagine segments can only be even, so that there is no discussion about what a half is):
k=2 ---> count(array[-4,-2,2,1],k) ---> 2, as [-4,-2] fulfills and also [2,1]
k=4 ---> count(array[-4,-2,2,1],k) ---> 0, as [-4,-2,2,1] does not fulfil.
k=6 ---> count(array[-4,-2,2,1],k) ---> 0, as there are not length 6 segments.
I have solved it recursively, using the function Count, in a trivial way: I move the array from left to right, enumerating all the segments of length k, and applying the count on each of those. It is done in Dafny:
function method Count_segments(sequ: seq<int>, seg_length:int): int
{
if |sequ| == 0 then 0
else (if (Count(x => x >= 0, sequ[0..seg_length/2])) >= (Count(x => x >= 0, sequ[seg_length/2..seg_length])) then
1 + if (|sequ|-1 < seg_length) then 0 //I add the condition that says that if in the next iteration, sequ will be smaller than the sequence_length, then we end.
else Count_segments(sequ[1..], seg_length)
else if (|sequ|-1 < seg_length) then 0
else Count_segments(sequ[1..], seg_length)
)
}
But, obviously, using Count, I am not doing a linear search iteratively (in the first example, instead of searching 4 times, it does 6 times). I would like to implement this in O(n) but cannot find any info, does anyone have an idea? I do not care about the programming language (can answer me in any language), but about the algorithm itself.
Thanks!!

Related

Find the longest subarray that contains a majority element

I am trying to solve this algorithmic problem:
https://dunjudge.me/analysis/problems/469/
For convenience, I have summarized the problem statement below.
Given an array of length (<= 2,000,000) containing integers in the range [0, 1,000,000], find the
longest subarray that contains a majority element.
A majority element is defined as an element that occurs > floor(n/2) times in a list of length n.
Time limit: 1.5s
For example:
If the given array is [1, 2, 1, 2, 3, 2],
The answer is 5 because the subarray [2, 1, 2, 3, 2] of length 5 from position 1 to 5 (0-indexed) has the number 2 which appears 3 > floor(5/2) times. Note that we cannot take the entire array because 3 = floor(6/2).
My attempt:
The first thing that comes to mind is an obvious brute force (but correct) solution which fixes the start and end indexes of a subarray and loop through it to check if it contains a majority element. Then we take the length of the longest subarray that contains a majority element. This works in O(n^2) with a small optimization. Clearly, this will not pass the time limit.
I was also thinking of dividing the elements into buckets that contain their indexes in sorted order.
Using the example above, these buckets would be:
1: 0, 2
2: 1, 3, 5
3: 4
Then for each bucket, I would make an attempt to merge the indexes together to find the longest subarray that contains k as the majority element where k is the integer label of that bucket.
We could then take the maximum length over all values of k. I didn't try out this solution as I didn't know how to perform the merging step.
Could someone please advise me on a better approach to solve this problem?
Edit:
I solved this problem thanks to the answers of PhamTrung and hk6279. Although I accepted the answer from PhamTrung because he first suggested the idea, I highly recommend looking at the answer by hk6279 because his answer elaborates the idea of PhamTrung and is much more detailed (and also comes with a nice formal proof!).

Note: attempt 1 is wrong as #hk6279 has given a counter example. Thanks for pointing it out.
Attempt 1:
The answer is quite complex, so I will discuss a brief idea
Let process each unique number one by one.
Processing each occurrence of number x from left to right, at index i, let add an segment (i, i) indicates the start and end of the current subarray. After that, we need to look to the left side of this segment, and try to merge the left neighbour of this segment into (i, i), (So, if the left is (st, ed), we try to make it become (st, i) if it satisfy the condition) if possible, and continue to merge them until we are not able to merge, or there is no left neighbour.
We keep all those segments in a stack for faster look up/add/remove.
Finally, for each segment, we try to enlarge them as large as possible, and keep the biggest result.
Time complexity should be O(n) as each element could only be merged once.
Attempt 2:
Let process each unique number one by one
For each unique number x, we maintain an array of counter. From 0 to end of the array, if we encounter a value x we increase the count, and if we don't we decrease, so for this array
[0,1,2,0,0,3,4,5,0,0] and number 0, we have this array counter
[1,0,-1,0,1,0,-1,-2,-1,0]
So, in order to make a valid subarray which ends at a specific index i, the value of counter[i] - counter[start - 1] must be greater than 0 (This can be easily explained if you view the array as making from 1 and -1 entries; with 1 is when there is an occurrence of x, -1 otherwise; and the problem can be converted into finding the subarray with sum is positive)
So, with the help of a binary search, the above algo still have an complexity of O(n ^ 2 log n) (in case we have n/2 unique numbers, we need to do the above process n/2 times, each time take O (n log n))
To improve it, we make an observation that, we actually don't need to store all values for all counter, but just the values of counter of x, we saw that we can store for above array counter:
[1,#,#,0,1,#,#,#,-1,0]
This will leads to O (n log n) solution, which only go through each element once.

This elaborate and explain how attempt 2 in #PhamTrung solution is working
To get the length of longest subarray. We should
Find the max. number of majority element in a valid array, denote as m
This is done by attempt 2 in #PhamTrung solution
Return min( 2*m-1, length of given array)
Concept
The attempt is stem from a method to solve longest positive subarray
We maintain an array of counter for each unique number x. We do a +1 when we encounter x. Otherwise, do a -1.
Take array [0,1,2,0,0,3,4,5,0,0,1,0] and unique number 0, we have array counter [1,0,-1,0,1,0,-1,-2,-1,0,-1,0]. If we blind those are not target unique number, we get [1,#,#,0,1,#,#,#,-1,0,#,0].
We can get valid array from the blinded counter array when there exist two counter such that the value of the right counter is greater than or equal to the left one. See Proof part.
To further improve it, we can ignore all # as they are useless and we get [1(0),0(3),1(4),-1(8),0(9),0(11)] in count(index) format.
We can further improve this by not record counter that is greater than its previous effective counter. Take counter of index 8,9 as an example, if you can form subarray with index 9, then you must be able to form subarray with index 8. So, we only need [1(0),0(3),-1(8)] for computation.
You can form valid subarray with current index with all previous index using binary search on counter array by looking for closest value that is less than or equal to current counter value (if found)
Proof
When right counter greater than left counter by r for a particular x, where k,r >=0 , there must be k+r number of x and k number of non x exist after left counter. Thus
The two counter is at index position i and r+2k+i
The subarray form between [i, r+2k+i] has exactly k+r+1 number of x
The subarray length is 2k+r+1
The subarray is valid as (2k+r+1) <= 2 * (k+r+1) -1
Procedure
Let m = 1
Loop the array from left to right
For each index pi
If the number is first encounter,
Create a new counter array [1(pi)]
Create a new index record storing current index value (pi) and counter value (1)
Otherwise, reuse the counter array and index array of the number and perform
Calculate current counter value ci by cprev+2-(pi - pprev), where cprev,pprev are counter value and index value in index record
Perform binary search to find the longest subarray that can be formed with current index position and all previous index position. i.e. Find the closest c, cclosest, in counter array where c<=ci. If not found, jump to step 5
Calculate number of x in the subarray found in step 2
r = ci - cclosest
k = (pi-pclosest-r)/2
number of x = k+r+1
Update counter m by number of x if subarray has number of x > m
Update counter array by append current counter if counter value less than last recorded counter value
Update index record by current index (pi) and counter value (ci)

For completeness, here's an outline of an O(n) theory. Consider the following, where * are characters different from c:
* c * * c * * c c c
i: 0 1 2 3 4 5 6 7 8 9
A plot for adding 1 for c and subtracting 1 for a character other than c could look like:
sum_sequence
0 c c
-1 * * c c
-2 * * c
-3 *
A plot for the minimum of the above sum sequence, seen for c, could look like:
min_sum
0 c * *
-1 * c * *
-2 c c c
Clearly, for each occurrence of c, we are looking for the leftmost occurrence of c with sum_sequence lower than or equal to the current sum_sequence. A non-negative difference would mean c is a majority, and leftmost guarantees the interval is the longest up to our position. (We can extrapolate a maximal length that is bounded by characters other than c from the inner bounds of c as the former can be flexible without affecting the majority.)
Observe that from one occurrence of c to the next, its sum_sequence can decrease by an arbitrary size. However, it can only ever increase by 1 between two consecutive occurrences of c. Rather than each value of min_sum for c, we can record linear segments, marked by cs occurrences. A visual example:
[start_min
\
\
\
\
end_min, start_min
\
\
end_min]
We iterate over occurrences of c and maintain a pointer to the optimal segment of min_sum. Clearly we can derive the next sum_sequence value for c from the previous one since it is exactly diminished by the number of characters in between.
An increase in sum_sequence for c corresponds with a shift of 1 back or no change in the pointer to the optimal min_sum segment. If there is no change in the pointer, we hash the current sum_sequence value as a key to the current pointer value. There can be O(num_occurrences_of_c) such hash keys.
With an arbitrary decrease in c's sum_sequence value, either (1) sum_sequence is lower than the lowest min_sum segment recorded so we add a new, lower segment and update the pointer, or (2) we've seen this exact sum_sequence value before (since all increases are by 1 only) and can use our hash to retrieve the optimal min_sum segment in O(1).
As Matt Timmermans pointed out in the question comments, if we were just to continually update the pointer to the optimal min_sum by iterating over the list, we would still only perform O(1) amortized-time iterations per character occurrence. We see that for each increasing segment of sum_sequence, we can update the pointer in O(1). If we used binary search only for the descents, we would add at most (log k) iterations for every k occurences (this assumes we jump down all the way), which keeps our overall time at O(n).

Algorithm :
Essentially, what Boyer-Moore does is look for a suffix sufsuf of nums where suf[0]suf[0] is the majority element in that suffix. To do this, we maintain a count, which is incremented whenever we see an instance of our current candidate for majority element and decremented whenever we see anything else. Whenever count equals 0, we effectively forget about everything in nums up to the current index and consider the current number as the candidate for majority element. It is not immediately obvious why we can get away with forgetting prefixes of nums - consider the following examples (pipes are inserted to separate runs of nonzero count).
[7, 7, 5, 7, 5, 1 | 5, 7 | 5, 5, 7, 7 | 7, 7, 7, 7]
Here, the 7 at index 0 is selected to be the first candidate for majority element. count will eventually reach 0 after index 5 is processed, so the 5 at index 6 will be the next candidate. In this case, 7 is the true majority element, so by disregarding this prefix, we are ignoring an equal number of majority and minority elements - therefore, 7 will still be the majority element in the suffix formed by throwing away the first prefix.
[7, 7, 5, 7, 5, 1 | 5, 7 | 5, 5, 7, 7 | 5, 5, 5, 5]
Now, the majority element is 5 (we changed the last run of the array from 7s to 5s), but our first candidate is still 7. In this case, our candidate is not the true majority element, but we still cannot discard more majority elements than minority elements (this would imply that count could reach -1 before we reassign candidate, which is obviously false).
Therefore, given that it is impossible (in both cases) to discard more majority elements than minority elements, we are safe in discarding the prefix and attempting to recursively solve the majority element problem for the suffix. Eventually, a suffix will be found for which count does not hit 0, and the majority element of that suffix will necessarily be the same as the majority element of the overall array.
Here's Java Solution :
Time complexity : O(n)
Space complexity : O(1)
public int majorityElement(int[] nums) {
int count = 0;
Integer candidate = null;
for (int num : nums) {
if (count == 0) {
candidate = num;
}
count += (num == candidate) ? 1 : -1;
}
return candidate;
}

How do you reorganize an array within O(n) runtime & O(1) space complexity?

I'm a 'space-complexity' neophyte and was given a problem.
Suppose I have an array of arbitrary integers:
[1,0,4,2,1,0,5]
How would I reorder this array to have all the zeros at one end:
[1,4,2,1,5,0,0]
...and compute the count of non-zero integers (in this case: 5)?
... in O(n) runtime with O(1) space complexity?
I'm not good at this.
My background is more environmental engineering than computer science so I normally think in the abstract.
I thought I could do a sort, then count the non-zero integers.
Then I thought I could merely do a element-per-element copy as I re-arrange the array.
Then I thought something like a bubble sort, switching neighboring elements till I reached the end with the zeroes.
I thought I could save on the 'space-complexity' via shift array-members' addresses, being that the array point points to the array, with offsets to its members.
I either enhance the runtime at the expense of the space complexity or vice versa.
What's the solution?

Two-pointer approach will solve this task and keep within the time and memory constraints.
Start by placing one pointer at the end, another at the start of the array. Then decrement the end pointer until you see the first non-zero element.
Now the main loop:
If the start pointer points to zero, swap it with the value pointed
by the end pointer; then decrement the end pointer.
Always increment the start pointer.
Finish when start pointer becomes greater than or equal to the end
pointer.
Finally, return the position of the start pointer - that's the number of nonzero elements.

This is the Swift code for the smart answer provided by #kfx
func putZeroesToLeft(inout nums: [Int]) {
guard var firstNonZeroIndex: Int = (nums.enumerate().filter { $0.element != 0 }).first?.index else { return }
for index in firstNonZeroIndex..<nums.count {
if nums[index] == 0 {
swap(&nums[firstNonZeroIndex], &nums[index])
firstNonZeroIndex += 1
}
}
}
Time complexity
There are 2 simple (not nested) loops repeated max n times (where n is the length of input array). So time is O(n).
Space complexity
Beside the input array we only use the firstAvailableSlot int var. So the space is definitely a constant: O(1).

As indicated by the other answers, the idea is to have two pointers, p and q, one pointing at the end of the array (specifically at the first nonzero entry from behind) and the other pointing at the beginning of the array. Scan the array with q, each time you hit a 0, swap elements pointed to by p and q, increment p and decrement q (specifically, make it point to the next nonzero entry from behind); iterate as long as p < q.
In C++, you could do something like this:
void rearrange(std::vector<int>& v) {
int p = 0, q = v.size()-1;
// make q point to the right position
while (q >= 0 && !v[q]) --q;
while (p < q) {
if (!v[p]) { // found a zero element
std::swap(v[p], v[q]);
while (q >= 0 && !v[q]) --q; // make q point to the right position
}
++p;
}
}

Start at the far end of the array and work backwards. First scan until you hit a nonzero (if any). Keep track of the location of this nonzero. Keep scanning. Whenever you encounter a zero -- swap. Otherwise increase the count of nonzeros.
A Python implementation:
def consolidateAndCount(nums):
count = 0
#first locate last nonzero
i = len(nums)-1
while nums[i] == 0:
i -=1
if i < 0:
#no nonzeros encountered
return 0
count = 1 #since a nonzero was encountered
for j in range(i-1,-1,-1):
if nums[j] == 0:
#move to end
nums[j], nums[i] = nums[i],nums[j] #swap is constant space
i -=1
else:
count += 1
return count
For example:
>>> nums = [1,0,4,2,1,0,5]
>>> consolidateAndCount(nums)
5
>>> nums
[1, 5, 4, 2, 1, 0, 0]

The suggested answers with 2 pointers and swapping are changing the order of non-zero array elements which is in conflict with the example provided. (Although he doesn't name that restriction explicitly, so maybe it is irrelevant)
Instead, go through the list from left to right and keep track of the number of 0s encountered so far.
Set counter = 0 (zeros encountered so far).
In each step, do the following:
Check if the current element is 0 or not.
If the current element is 0, increment the counter.
Otherwise, move the current element by counter to the left.
Go to the next element.
When you reach the end of the list, overwrite the values from array[end-counter] to the end of the array with 0s.
The number of non-zero integers is the size of the array minus the counted zeros.
This algorithm has O(n) time complexity as we go at most twice through the whole array (array of all 0s; we could modify the update scheme a little to only go through at most exactly once though). It only uses an additional variable for counting which satisfies the O(1) space constraint.

Start with iterating over the array (say, i) and maintaining count of zeros encountered (say zero_count) till now.
Do not increment the iterative counter when the current element is 0. Instead increment zero_count.
Copy the value in i + zero_count index to the current index i.
Terminate the loop when i + zero_count is greater than array length.
Set the remaining array elements to 0.
Pseudo code:
zero_count = 0;
i = 0;
while i + zero_count < arr.length
if (arr[i] == 0) {
zero_count++;
if (i + zero_count < arr.length)
arr[i] = arr[i+zero_count]
} else {
i++;
}
while i < arr.length
arr[i] = 0;
i++;
Additionally, this also preserves the order of non-zero elements in the array,

You can actually solve a more generic problem called the Dutch national flag problem, which is used to in Quicksort. It partitions an array into 3 parts according to a given mid value. First, place all numbers less than mid, then all numbers equal to mid and then all numbers greater than mid.
Then you can pick the mid value as infinity and treat 0 as infinity.
The pseudocode given by the above link:
procedure three-way-partition(A : array of values, mid : value):
i ← 0
j ← 0
n ← size of A - 1
while j ≤ n:
if A[j] < mid:
swap A[i] and A[j]
i ← i + 1
j ← j + 1
else if A[j] > mid:
swap A[j] and A[n]
n ← n - 1
else:
j ← j + 1

Find shortest subarray containing all elements

Suppose you have an array of numbers, and another set of numbers. You have to find the shortest subarray containing all numbers with minimal complexity.
The array can have duplicates, and let's assume the set of numbers does not. It's not ordered - the subarray may contain the set of number in any order.
For example:
Array: 1 2 5 8 7 6 2 6 5 3 8 5
Numbers: 5 7
Then the shortest subarray is obviously Array[2:5] (python notation).
Also, what would you do if you want to avoid sorting the array for some reason (a la online algorithms)?

Proof of a linear-time solution
I will write right-extension to mean increasing the right endpoint of a range by 1, and left-contraction to mean increasing the left endpoint of a range by 1. This answer is a slight variation of Aasmund Eldhuset's answer. The difference here is that once we find the smallest j such that [0, j] contains all interesting numbers, we thereafter consider only ranges that contain all interesting numbers. (It's possible to interpret Aasmund's answer this way, but it's also possible to interpret it as allowing a single interesting number to be lost due to a left-contraction -- an algorithm whose correctness has yet to be established.)
The basic idea is that for each position j, we will find the shortest satisfying range ending at position j, given that we know the shortest satisfying range ending at position j-1.
EDIT: Fixed a glitch in the base case.
Base case: Find the smallest j' such that [0, j'] contains all interesting numbers. By construction, there can be no ranges [0, k < j'] that contain all interesting numbers so we don't need to worry about them further. Now find the smallestlargest i such that [i, j'] contains all interesting numbers (i.e. hold j' fixed). This is the smallest satisfying range ending at position j'.
To find the smallest satisfying range ending at any arbitrary position j, we can right-extend the smallest satisfying range ending at position j-1 by 1 position. This range will necessarily also contain all interesting numbers, though it may not be minimal-length. The fact that we already know this is a satisfying range means that we don't have to worry about extending the range "backwards" to the left, since that can only increase the range over its minimal length (i.e. make the solution worse). The only operations we need to consider are left-contractions that preserve the property of containing all interesting numbers. So the left endpoint of the range should be advanced as far as possible while this property holds. When no more left-contractions can be performed, we have the minimal-length satisfying range ending at j (since further left-contractions clearly cannot make the range satisfying again) and we are done.
Since we perform this for each rightmost position j, we can take the minimum-length range over all rightmost positions to find the overall minimum. This can be done using a nested loop in which j advances on each outer loop cycle. Clearly j advances by 1 n times. Since at any point in time we only ever need the leftmost position of the best range for the previous value of j, we can store this in i and just update it as we go. i starts at 0, is at all times <= j <= n, and only ever advances upwards by 1, meaning it can advance at most n times. Both i and j advance at most n times, meaning that the algorithm is linear-time.
In the following pseudo-code, I've combined both phases into a single loop. We only try to contract the left side if we have reached the stage of having all interesting numbers:
# x[0..m-1] is the array of interesting numbers.
# Load them into a hash/dictionary:
For i from 0 to m-1:
isInteresting[x[i]] = 1
i = 0
nDistinctInteresting = 0
minRange = infinity
For j from 0 to n-1:
If count[a[j]] == 0 and isInteresting[a[j]]:
nDistinctInteresting++
count[a[j]]++
If nDistinctInteresting == m:
# We are in phase 2: contract the left side as far as possible
While count[a[i]] > 1 or not isInteresting[a[i]]:
count[a[i]]--
i++
If j - i < minRange:
(minI, minJ) = (i, j)
count[] and isInteresting[] are hashes/dictionaries (or plain arrays if the numbers involved are small).

This sounds like a problem that is well-suited for a sliding window approach: maintain a window (a subarray) that is gradually expanding and contracting, and use a hashmap to keep track of the number of times each "interesting" number occurs in the window. E.g. start with an empty window, then expand it to include only element 0, then elements 0-1, then 0-2, 0-3, and so on, by adding subsequent elements (and using the hashmap to keep track of which numbers exist in the window). When the hashmap tells you that all interesting numbers exist in the window, you can begin contracting it: e.g. 0-5, 1-5, 2-5, etc., until you find out that the window no longer contains all interesting numbers. Then, you can begin expanding it on the right hand side again, and so on. I'm quite (but not entirely) sure that this would work for your problem, and it can be implemented to run in linear time.

Say the array has n elements, and set has m elements
Sort the array, noting the reverse index (position in the original array)
// O (n log n) time
for each element in given set
find it in the array
// O (m log n) time - log n for binary serch, m times
keep track of the minimum and maximum index for each found element
min - max defines your range
Total time complexity: O ((m+n) log n)

This solution definitely does not run in O(n) time as suggested by some of the pseudocode above, however it is real (Python) code that solves the problem and by my estimates runs in O(n^2):
def small_sub(A, B):
len_A = len(A)
len_B = len(B)
sub_A = []
sub_size = -1
dict_b = {}
for elem in B:
if elem in dict_b:
dict_b[elem] += 1
else:
dict_b.update({elem: 1})
for i in range(0, len_A - len_B + 1):
if A[i] in dict_b:
temp_size, temp_sub = find_sub(A[i:], dict_b.copy())
if (sub_size == -1 or (temp_size != -1 and temp_size < sub_size)):
sub_A = temp_sub
sub_size = temp_size
return sub_size, sub_A
def find_sub(A, dict_b):
index = 0
for i in A:
if len(dict_b) == 0:
break
if i in dict_b:
dict_b[i] -= 1
if dict_b[i] <= 0:
del(dict_b[i])
index += 1
if len(dict_b) > 0:
return -1, {}
else:
return index, A[0:index]

Here's how I solved this problem in linear time using collections.Counter objects
from collections import Counter
def smallest_subsequence(stream, search):
if not search:
return [] # the shortest subsequence containing nothing is nothing
stream_counts = Counter(stream)
search_counts = Counter(search)
minimal_subsequence = None
start = 0
end = 0
subsequence_counts = Counter()
while True:
# while subsequence_counts doesn't have enough elements to cancel out every
# element in search_counts, take the next element from search
while search_counts - subsequence_counts:
if end == len(stream): # if we've reached the end of the list, we're done
return minimal_subsequence
subsequence_counts[stream[end]] += 1
end += 1
# while subsequence_counts has enough elements to cover search_counts, keep
# removing from the start of the sequence
while not search_counts - subsequence_counts:
if minimal_subsequence is None or (end - start) < len(minimal_subsequence):
minimal_subsequence = stream[start:end]
subsequence_counts[stream[start]] -= 1
start += 1
print(smallest_subsequence([1, 2, 5, 8, 7, 6, 2, 6, 5, 3, 8, 5], [5, 7]))
# [5, 8, 7]

Java solution
List<String> paragraph = Arrays.asList("a", "c", "d", "m", "b", "a");
Set<String> keywords = Arrays.asList("a","b");
Subarray result = new Subarray(-1,-1);
Map<String, Integer> keyWordFreq = new HashMap<>();
int numKeywords = keywords.size();
// slide the window to contain the all the keywords**
// starting with [0,0]
for (int left = 0, right = 0 ; right < paragraph.size() ; right++){
// expand right to contain all the keywords
String currRight = paragraph.get(right);
if (keywords.contains(currRight)){
keyWordFreq.put(currRight, keyWordFreq.get(currRight) == null ? 1 : keyWordFreq.get(currRight) + 1);
}
// loop enters when all the keywords are present in the current window
// contract left until the all the keywords are still present
while (keyWordFreq.size() == numKeywords){
String currLeft = paragraph.get(left);
if (keywords.contains(currLeft)){
// remove from the map if its the last available so that loop exists
if (keyWordFreq.get(currLeft).equals(1)){
// now check if current sub array is the smallest
if((result.start == -1 && result.end == -1) || (right - left) < (result.end - result.start)){
result = new Subarray(left, right);
}
keyWordFreq.remove(currLeft);
}else {
// else reduce the frequcency
keyWordFreq.put(currLeft, keyWordFreq.get(currLeft) - 1);
}
}
left++;
}
}
return result;
}

Finding the maximum subsequence binary sets that have an equal number of 1s and 0s

I found the following problem on the internet, and would like to know how I would go about solving it:
You are given an array ' containing 0s and 1s. Find O(n) time and O(1) space algorithm to find the maximum sub sequence which has equal number of 1s and 0s.
Examples:
10101010 -
The longest sub sequence that satisfies the problem is the input itself
1101000 -
The longest sub sequence that satisfies the problem is 110100

Update.
I have to completely rephrase my answer. (If you had upvoted the earlier version, well, you were tricked!)
Lets sum up the easy case again, to get it out of the way:
Find the longest prefix of the bit-string containing
an equal number of 1s and 0s of the
array.
This is trivial: A simple counter is needed, counting how many more 1s we have than 0s, and iterating the bitstring while maintaining this. The position where this counter becomes zero for the last time is the end of the longest sought prefix. O(N) time, O(1) space. (I'm completely convinced by now that this is what the original problem asked for. )
Now lets switch to the more difficult version of the problem: we no longer require subsequences to be prefixes - they can start anywhere.
After some back and forth thought, I thought there might be no linear algorithm for this. For example, consider the prefix "111111111111111111...". Every single 1 of those may be the start of the longest subsequence, there is no candidate subsequence start position that dominates (i.e. always gives better solutions than) any other position, so we can't throw away any of them (O(N) space) and at any step, we must be able to select the best start (which has an equal number of 1s and 0s to the current position) out of linearly many candidates, in O(1) time. It turns out this is doable, and easily doable too, since we can select the candidate based on the running sum of 1s (+1) and 0s (-1), this has at most size N, and we can store the first position we reach each sum in 2N cells - see pmod's answer below (yellowfog's comments and geometric insight too).
Failing to spot this trick, I had replaced a fast but wrong with a slow but sure algorithm, (since correct algorithms are preferable to wrong ones!):
Build an array A with the accumulated number of 1s from the start to that position, e.g. if the bitstring is "001001001", then the array would be [0, 0, 1, 1, 1, 2, 2, 2, 3]. Using this, we can test in O(1) whether the subsequence (i,j), inclusive, is valid: isValid(i, j) = (j - i + 1 == 2 * (A[j] - A[i - 1]), i.e. it is valid if its length is double the amount of 1s in it. For example, the subsequence (3,6) is valid because 6 - 3 + 1 == 2 * A[6] - A[2] = 4.
Plain old double loop:
maxSubsLength = 0
for i = 1 to N - 1
for j = i + 1 to N
if isValid(i, j) ... #maintain maxSubsLength
end
end
This can be sped up a bit using some branch-and-bound by skipping i/j sequences which are shorter than the current maxSubsLength, but asymptotically this is still O(n^2). Slow, but with a big plus on its side: correct!

Strictly speaking, the answer is that no such algorithm exists because the language of strings consisting of an equal number of zeros and ones is not regular.
Of course everyone ignores that fact that storing an integer of magnitude n is O(log n) in space and treats it as O(1) in space. :-) Pretty much all big-O's, including time ones, are full of (or rather empty of) missing log n factors, or equivalently, they assume n is bounded by the size of a machine word, which means you're really looking at a finite problem and everything is O(1).

New solution:
Suppose we have for n-bit input bit-array 2*n-size array to keep position of bit. So, the size of array element must have enough size to keep maximum position number. For 256 input bit array, it's needed 256x2 array of bytes (byte is enough to keep 255 - the maximum position).
Moving from the first position of bit-array we put the position into array starting from the middle of array (index is n) using a rule:
1. Increment the position if we passed "1" bit and decrement when passed "0" bit
2. When meet already initialized array element - don't change it and remember the difference between positions (current minus taken from array element) - this is a size of local maximum sequence.
3. Every time we meet local maximum compare it with the global maximum and update if the latter is less.
For example: bit sequence is 0,0,0,1,0,1
initial array index is n
set arr[n] = 0 (position)
bit 0 -> index--
set arr[n-1] = 1
bit 0 -> index--
set arr[n-2] = 2
bit 0 -> index--
set arr[n-3] = 3
bit 1 -> index++
arr[n-2] already contains 2 -> thus, local max seq is [3,2] becomes abs. maximum
will not overwrite arr[n-2]
bit 0 -> index--
arr[n-3] already contains 3 -> thus, local max seq is [4,3] is not abs. maximum
bit 1 -> index++
arr[n-2] already contains 2 -> thus, local max seq is [5,2] is abs. max
Thus, we passing through the whole bit array only once.
Does this solves the task?
input:
n - number of bits
a[n] - input bit-array
track_pos[2*n] = {0,};
ind = n;
/* start from position 1 since zero has
meaning track_pos[x] is not initialized */
for (i = 1; i < n+1; i++) {
if (track_pos[ind]) {
seq_size = i - track_pos[ind];
if (glob_seq_size < seq_size) {
/* store as interm. result */
glob_seq_size = seq_size;
glob_pos_from = track_pos[ind];
glob_pos_to = i;
}
} else {
track_pos[ind] = i;
}
if (a[i-1])
ind++;
else
ind--;
}
output:
glob_seq_size - length of maximum sequence
glob_pos_from - start position of max sequence
glob_pos_to - end position of max sequence

In this thread ( http://discuss.techinterview.org/default.asp?interview.11.792102.31 ), poster A.F. has given an algorithm that runs in O(n) time and uses O(sqrt(n log n)) bits.

brute force: start with maximum length of the array to count the o's and l's. if o eqals l, you are finished. else reduce search length by 1 and do the algorithm for all subsequences of the reduced length (that is maximium length minus reduced length) and so on. stop when the subtraction is 0.

As was pointed out by user "R..", there is no solution, strictly speaking, unless you ignore the "log n" space complexity. In the following, I will consider that the array length fits in a machine register (e.g. a 64-bit word) and that a machine register has size O(1).
The important point to notice is that if there are more 1's than 0's, then the maximum subsequence that you are looking for necessarily includes all the 0's, and that many 1's. So here the algorithm:
Notations: the array has length n, indices are counted from 0 to n-1.
First pass: count the number of 1's (c1) and 0's (c0). If c1 = c0 then your maximal subsequence is the entire array (end of algorithm). Otherwise, let d be the digit which appears the less often (d = 0 if c0 < c1, otherwise d = 1).
Compute m = min(c0, c1) * 2. This is the size of the subsequence you are looking for.
Second pass: scan the array to find the index j of the first occurrence of d.
Compute k = max(j, n - m). The subsequence starts at index k and has length m.
Note that there could be several solutions (several subsequences of maximal length which match the criterion).
In plain words: assuming that there are more 1's than 0's, then I consider the smallest subsequence which contains all the 0's. By definition, that subsequence is surrounded by bunches of 1's. So I just grab enough 1's from the sides.
Edit: as was pointed out, this does not work... The "important point" is actually wrong.

Try something like this:
/* bit(n) is a macro that returns the nth bit, 0 or 1. len is number of bits */
int c[2] = {0,0};
int d, i, a, b, p;
for(i=0; i<len; i++) c[bit(i)]++;
d = c[1] < c[0];
if (c[d] == 0) return; /* all bits identical; fail */
for(i=0; bit(i)!=d; i++);
a = b = i;
for(p=0; i<len; i++) {
p += 2*bit(i)-1;
if (!p) b = i;
}
if (a == b) { /* account for case where we need bits before the first d */
b = len - 1;
a -= abs(p);
}
printf("maximal subsequence consists of bits %d through %d\n", a, b);
Completely untested but modulo stupid mistakes it should work. Based on my reply to Thomas's answer which failed in certain cases.

New Solution:
Space complexity of O(1) and time complexity O(n^2)
int iStart = 0, iEnd = 0;
int[] arrInput = { 1, 0, 1, 1, 1,0,0,1,0,1,0,0 };
for (int i = 0; i < arrInput.Length; i++)
{
int iCurrEndIndex = i;
int iSum = 0;
for (int j = i; j < arrInput.Length; j++)
{
iSum = (arrInput[j] == 1) ? iSum+1 : iSum-1;
if (iSum == 0)
{
iCurrEndIndex = j;
}
}
if ((iEnd - iStart) < (iCurrEndIndex - i))
{
iEnd = iCurrEndIndex;
iStart = i;
}
}

I am not sure whether the array you are referring is int array of 0's and 1's or bitarray??
If its about bitarray, here is my approach:
int isEvenBitCount(int n)
{
//n ... //Decimal equivalent of the input binary sequence
int cnt1 = 0, cnt0 = 0;
while(n){
if(n&0x01) { printf("1 "); cnt1++;}
else { printf("0 "); cnt0++; }
n = n>>1;
}
printf("\n");
return cnt0 == cnt1;
}
int main()
{
int i = 40, j = 25, k = 35;
isEvenBitCount(i)?printf("-->Yes\n"):printf("-->No\n");
isEvenBitCount(j)?printf("-->Yes\n"):printf("-->No\n");
isEvenBitCount(k)?printf("-->Yes\n"):printf("-->No\n");
}
with use of bitwise operations the time complexity is almost O(1) also.

How can I find a number which occurs an odd number of times in a SORTED array in O(n) time?

I have a question and I tried to think over it again and again... but got nothing so posting the question here. Maybe I could get some view-point of others, to try and make it work...
The question is: we are given a SORTED array, which consists of a collection of values occurring an EVEN number of times, except one, which occurs ODD number of times. We need to find the solution in log n time.
It is easy to find the solution in O(n) time, but it looks pretty tricky to perform in log n time.

Theorem: Every deterministic algorithm for this problem probes Ω(log2 n) memory locations in the worst case.
Proof (completely rewritten in a more formal style):
Let k > 0 be an odd integer and let n = k2. We describe an adversary that forces (log2 (k + 1))2 = Ω(log2 n) probes.
We call the maximal subsequences of identical elements groups. The adversary's possible inputs consist of k length-k segments x1 x2 … xk. For each segment xj, there exists an integer bj ∈ [0, k] such that xj consists of bj copies of j - 1 followed by k - bj copies of j. Each group overlaps at most two segments, and each segment overlaps at most two groups.
Group boundaries
| | | | |
0 0 1 1 1 2 2 3 3
| | | |
Segment boundaries
Wherever there is an increase of two, we assume a double boundary by convention.
Group boundaries
| || | |
0 0 0 2 2 2 2 3 3
Claim: The location of the jth group boundary (1 ≤ j ≤ k) is uniquely determined by the segment xj.
Proof: It's just after the ((j - 1) k + bj)th memory location, and xj uniquely determines bj. //
We say that the algorithm has observed the jth group boundary in case the results of its probes of xj uniquely determine xj. By convention, the beginning and the end of the input are always observed. It is possible for the algorithm to uniquely determine the location of a group boundary without observing it.
Group boundaries
| X | | |
0 0 ? 1 2 2 3 3 3
| | | |
Segment boundaries
Given only 0 0 ?, the algorithm cannot tell for sure whether ? is a 0 or a 1. In context, however, ? must be a 1, as otherwise there would be three odd groups, and the group boundary at X can be inferred. These inferences could be problematic for the adversary, but it turns out that they can be made only after the group boundary in question is "irrelevant".
Claim: At any given point during the algorithm's execution, consider the set of group boundaries that it has observed. Exactly one consecutive pair is at odd distance, and the odd group lies between them.
Proof: Every other consecutive pair bounds only even groups. //
Define the odd-length subsequence bounded by the special consecutive pair to be the relevant subsequence.
Claim: No group boundary in the interior of the relevant subsequence is uniquely determined. If there is at least one such boundary, then the identity of the odd group is not uniquely determined.
Proof: Without loss of generality, assume that each memory location not in the relevant subsequence has been probed and that each segment contained in the relevant subsequence has exactly one location that has not been probed. Suppose that the jth group boundary (call it B) lies in the interior of the relevant subsequence. By hypothesis, the probes to xj determine B's location up to two consecutive possibilities. We call the one at odd distance from the left observed boundary odd-left and the other odd-right. For both possibilities, we work left to right and fix the location of every remaining interior group boundary so that the group to its left is even. (We can do this because they each have two consecutive possibilities as well.) If B is at odd-left, then the group to its left is the unique odd group. If B is at odd-right, then the last group in the relevant subsequence is the unique odd group. Both are valid inputs, so the algorithm has uniquely determined neither the location of B nor the odd group. //
Example:
Observed group boundaries; relevant subsequence marked by […]
[ ] |
0 0 Y 1 1 Z 2 3 3
| | | |
Segment boundaries
Possibility #1: Y=0, Z=2
Possibility #2: Y=1, Z=2
Possibility #3: Y=1, Z=1
As a consequence of this claim, the algorithm, regardless of how it works, must narrow the relevant subsequence to one group. By definition, it therefore must observe some group boundaries. The adversary now has the simple task of keeping open as many possibilities as it can.
At any given point during the algorithm's execution, the adversary is internally committed to one possibility for each memory location outside of the relevant subsequence. At the beginning, the relevant subsequence is the entire input, so there are no initial commitments. Whenever the algorithm probes an uncommitted location of xj, the adversary must commit to one of two values: j - 1, or j. If it can avoid letting the jth boundary be observed, it chooses a value that leaves at least half of the remaining possibilities (with respect to observation). Otherwise, it chooses so as to keep at least half of the groups in the relevant interval and commits values for the others.
In this way, the adversary forces the algorithm to observe at least log2 (k + 1) group boundaries, and in observing the jth group boundary, the algorithm is forced to make at least log2 (k + 1) probes.
Extensions:
This result extends straightforwardly to randomized algorithms by randomizing the input, replacing "at best halved" (from the algorithm's point of view) with "at best halved in expectation", and applying standard concentration inequalities.
It also extends to the case where no group can be larger than s copies; in this case the lower bound is Ω(log n log s).

A sorted array suggests a binary search. We have to redefine equality and comparison. Equality simple means an odd number of elements. We can do comparison by observing the index of the first or last element of the group. The first element will be an even index (0-based) before the odd group, and an odd index after the odd group. We can find the first and last elements of a group using binary search. The total cost is O((log N)²).
PROOF OF O((log N)²)
T(2) = 1 //to make the summation nice
T(N) = log(N) + T(N/2) //log(N) is finding the first/last elements
For some N=2^k,
T(2^k) = (log 2^k) + T(2^(k-1))
= (log 2^k) + (log 2^(k-1)) + T(2^(k-2))
= (log 2^k) + (log 2^(k-1)) + (log 2^(k-2)) + ... + (log 2^2) + 1
= k + (k-1) + (k-2) + ... + 1
= k(k+1)/2
= (k² + k)/2
= (log(N)² + log(N))/ 2
= O(log(N)²)

Look at the middle element of the array. With a couple of appropriate binary searches, you can find the first and its last appearance in the array. E.g., if the middle element is 'a', you need to find i and j as shown below:
[* * * * a a a a * * *]
^ ^
| |
| |
i j
Is j - i an even number? You are done! Otherwise (and this is the key here), the question to ask is i an even or an odd number? Do you see what this piece of knowledge implies? Then the rest is easy.

This answer is in support of the answer posted by "throwawayacct". He deserves the bounty. I spent some time on this question and I'm totally convinced that his proof is correct that you need Ω(log(n)^2) queries to find the number that occurs an odd number of times. I'm convinced because I ended up recreating the exact same argument after only skimming his solution.
In the solution, an adversary creates an input to make life hard for the algorithm, but also simple for a human analyzer. The input consists of k pages that each have k entries. The total number of entries is n = k^2, and it is important that O(log(k)) = O(log(n)) and Ω(log(k)) = Ω(log(n)). To make the input, the adversary makes a string of length k of the form 00...011...1, with the transition in an arbitrary position. Then each symbol in the string is expanded into a page of length k of the form aa...abb...b, where on the ith page, a=i and b=i+1. The transition on each page is also in an arbitrary position, except that the parity agrees with the symbol that the page was expanded from.
It is important to understand the "adversary method" of analyzing an algorithm's worst case. The adversary answers queries about the algorithm's input, without committing to future answers. The answers have to be consistent, and the game is over when the adversary has been pinned down enough for the algorithm to reach a conclusion.
With that background, here are some observations:
1) If you want to learn the parity of a transition in a page by making queries in that page, you have to learn the exact position of the transition and you need Ω(log(k)) queries. Any collection of queries restricts the transition point to an interval, and any interval of length more than 1 has both parities. The most efficient search for the transition in that page is a binary search.
2) The most subtle and most important point: There are two ways to determine the parity of a transition inside a specific page. You can either make enough queries in that page to find the transition, or you can infer the parity if you find the same parity in both an earlier and a later page. There is no escape from this either-or. Any set of queries restricts the transition point in each page to some interval. The only restriction on parities comes from intervals of length 1. Otherwise the transition points are free to wiggle to have any consistent parities.
3) In the adversary method, there are no lucky strikes. For instance, suppose that your first query in some page is toward one end instead of in the middle. Since the adversary hasn't committed to an answer, he's free to put the transition on the long side.
4) The end result is that you are forced to directly probe the parities in Ω(log(k)) pages, and the work for each of these subproblems is also Ω(log(k)).
5) Things are not much better with random choices than with adversarial choices. The math is more complicated, because now you can get partial statistical information, rather than a strict yes you know a parity or no you don't know it. But it makes little difference. For instance, you can give each page length k^2, so that with high probability, the first log(k) queries in each page tell you almost nothing about the parity in that page. The adversary can make random choices at the beginning and it still works.

Start at the middle of the array and walk backward until you get to a value that's different from the one at the center. Check whether the number above that boundary is at an odd or even index. If it's odd, then the number occurring an odd number of times is to the left, so repeat your search between the beginning and the boundary you found. If it's even, then the number occurring an odd number of times must be later in the array, so repeat the search in the right half.
As stated, this has both a logarithmic and a linear component. If you want to keep the whole thing logarithmic, instead of just walking backward through the array to a different value, you want to use a binary search instead. Unless you expect many repetitions of the same numbers, the binary search may not be worthwhile though.

I have an algorithm which works in log(N/C)*log(K), where K is the length of maximum same-value range, and C is the length of range being searched for.
The main difference of this algorithm from most posted before is that it takes advantage of the case where all same-value ranges are short. It finds boundaries not by binary-searching the entire array, but by first quickly finding a rough estimate by jumping back by 1, 2, 4, 8, ... (log(K) iterations) steps, and then binary-searching the resulting range (log(K) again).
The algorithm is as follows (written in C#):
// Finds the start of the range of equal numbers containing the index "index",
// which is assumed to be inside the array
//
// Complexity is O(log(K)) with K being the length of range
static int findRangeStart (int[] arr, int index)
{
int candidate = index;
int value = arr[index];
int step = 1;
// find the boundary for binary search:
while(candidate>=0 && arr[candidate] == value)
{
candidate -= step;
step *= 2;
}
// binary search:
int a = Math.Max(0,candidate);
int b = candidate+step/2;
while(a+1!=b)
{
int c = (a+b)/2;
if(arr[c] == value)
b = c;
else
a = c;
}
return b;
}
// Finds the index after the only "odd" range of equal numbers in the array.
// The result should be in the range (start; end]
// The "end" is considered to always be the end of some equal number range.
static int search(int[] arr, int start, int end)
{
if(arr[start] == arr[end-1])
return end;
int middle = (start+end)/2;
int rangeStart = findRangeStart(arr,middle);
if((rangeStart & 1) == 0)
return search(arr, middle, end);
return search(arr, start, rangeStart);
}
// Finds the index after the only "odd" range of equal numbers in the array
static int search(int[] arr)
{
return search(arr, 0, arr.Length);
}

Take the middle element e. Use binary search to find the first and last occurrence. O(log(n))
If it is odd return e.
Otherwise, recurse onto the side that has an odd number of elements [....]eeee[....]
Runtime will be log(n) + log(n/2) + log(n/4).... = O(log(n)^2).

AHhh. There is an answer.
Do a binary search and as you search, for each value, move backwards until you find the first entry with that same value. If its index is even, it is before the oddball, so move to the right.
If its array index is odd, it is after the oddball, so move to the left.
In pseudocode (this is the general idea, not tested...):
private static int FindOddBall(int[] ary)
{
int l = 0,
r = ary.Length - 1;
int n = (l+r)/2;
while (r > l+2)
{
n = (l + r) / 2;
while (ary[n] == ary[n-1])
n = FindBreakIndex(ary, l, n);
if (n % 2 == 0) // even index we are on or to the left of the oddball
l = n;
else // odd index we are to the right of the oddball
r = n-1;
}
return ary[l];
}
private static int FindBreakIndex(int[] ary, int l, int n)
{
var t = ary[n];
var r = n;
while(ary[n] != t || ary[n] == ary[n-1])
if(ary[n] == t)
{
r = n;
n = (l + r)/2;
}
else
{
l = n;
n = (l + r)/2;
}
return n;
}

You can use this algorithm:
int GetSpecialOne(int[] array, int length)
{
int specialOne = array[0];
for(int i=1; i < length; i++)
{
specialOne ^= array[i];
}
return specialOne;
}
Solved with the help of a similar question which can be found here on http://www.technicalinterviewquestions.net

We don't have any information about the distribution of lenghts inside the array, and of the array as a whole, right?
So the arraylength might be 1, 11, 101, 1001 or something, 1 at least with no upper bound, and must contain at least 1 type of elements ('number') up to (length-1)/2 + 1 elements, for total sizes of 1, 11, 101: 1, 1 to 6, 1 to 51 elements and so on.
Shall we assume every possible size of equal probability? This would lead to a middle length of subarrays of size/4, wouldn't it?
An array of size 5 could be divided into 1, 2 or 3 sublists.
What seems to be obvious is not that obvious, if we go into details.
An array of size 5 can be 'divided' into one sublist in just one way, with arguable right to call it 'dividing'. It's just a list of 5 elements (aaaaa). To avoid confusion let's assume the elements inside the list to be ordered characters, not numbers (a,b,c, ...).
Divided into two sublist, they might be (1, 4), (2, 3), (3, 2), (4, 1). (abbbb, aabbb, aaabb, aaaab).
Now let's look back at the claim made before: Shall the 'division' (5) be assumed the same probability as those 4 divisions into 2 sublists? Or shall we mix them together, and assume every partition as evenly probable, (1/5)?
Or can we calculate the solution without knowing the probability of the length of the sublists?

The clue is you're looking for log(n). That's less than n.
Stepping through the entire array, one at a time? That's n. That's not going to work.
We know the first two indexes in the array (0 and 1) should be the same number. Same with 50 and 51, if the odd number in the array is after them.
So find the middle element in the array, compare it to the element right after it. If the change in numbers happens on the wrong index, we know the odd number in the array is before it; otherwise, it's after. With one set of comparisons, we figure out which half of the array the target is in.
Keep going from there.

Use a hash table
For each element E in the input set
if E is set in the hash table
increment it's value
else
set E in the hash table and initialize it to 0
For each key K in hash table
if K % 2 = 1
return K
As this algorithm is 2n it belongs to O(n)

Try this:
int getOddOccurrence(int ar[], int ar_size)
{
int i;
int xor = 0;
for (i=0; i < ar_size; i++)
xor = xor ^ ar[i];
return res;
}
XOR will cancel out everytime you XOR with the same number so 1^1=0 but 1^1^1=1 so every pair should cancel out leaving the odd number out.

Assume indexing start at 0. Binary search for the smallest even i such that x[i] != x[i+1]; your answer is x[i].
edit: due to public demand, here is the code
int f(int *x, int min, int max) {
int size = max;
min /= 2;
max /= 2;
while (min < max) {
int i = (min + max)/2;
if (i==0 || x[2*i-1] == x[2*i])
min = i+1;
else
max = i-1;
}
if (2*max == size || x[2*max] != x[2*max+1])
return x[2*max];
return x[2*min];
}

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight