Shortest unique subsequence that distinguishes a set of strings - arrays

I have N bit strings, each of size M: X1[0..M-1], ..., XN[0..M-1]. I need pseudocode/an algorithm to find the smallest-length subsequence of positions (not necessarily contiguous) whose extracted bits are unique in each given string. For example,
if the strings are 011011, 011000, 010010, the subsequence of positions [2, 4] (11, 10, 01) is different in each string. So is the subsequence [2, 4, 5] (111, 100, 010), and so is the subsequence [4, 5] (11, 00, 10).
But not the subsequence [0, 1, 5] (011, 010, 010) ---> not unique in each string.
EDIT : 1 <= M <= 1000, 2 <= N <= 10.
EDIT : Currently, my solution is this :
The minimum length of subsequence will range between ceil(log2(N)) and N-1.
So, the pseudocode will be :
for i = ceil(log2(N)) to N-1:
    check all subsequences of size i
    if any subsequence distinguishes all N strings, return i
The first step can be done by generating all C(M, i) combinations of positions.
The second step can be done by extracting the subsequence from all N strings and checking that all of them are distinct.
But this algorithm has exponential complexity. I wanted to know if a better algorithm is possible.
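For reference, a minimal Python sketch of this brute-force search (the function name and structure are mine, not from the interview): it tries position subsets of increasing size and returns the first one whose projections are pairwise distinct.

from itertools import combinations
from math import ceil, log2

def smallest_distinguishing_positions(strings):
    n, m = len(strings), len(strings[0])
    # sizes below ceil(log2(N)) cannot work; N-1 positions suffice when the strings are distinct
    for size in range(ceil(log2(n)), min(n - 1, m) + 1):
        for positions in combinations(range(m), size):
            projections = {tuple(s[p] for p in positions) for s in strings}
            if len(projections) == n:        # all N projections are distinct
                return list(positions)
    return None                              # only possible if some strings are identical

print(smallest_distinguishing_positions(["011011", "011000", "010010"]))  # [2, 4]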
EDIT : No, It isn't homework. It was asked in an interview.

I think something like this would work:
First:
create matrix A (M x N) and array B(M)
for each bit i from right to left, compute the decimal value of word j in A[i][j]
// that means A[i][j] holds the decimal value of word j up to bit i
in the same loop, B[i] will hold whether bit i is the same in all words.
if B[i] = true, it means we don't need to look at that position, because it tells us nothing.
create set D // to check for equal elements
create array C(M)
for each position P in [0...M] where B[P] = false:
    for each bit i = P ... 0
        for each word j
            C[j] = C[j]*2 + word[j][i]   // word[j][i] = bit i of word j
        bool finished = true;
        for each e in C:
            if (D.count(e) > 0) {
                finished = false;
                break;
            } else {
                D.add(e)
            }
        if (finished) return range(P...i);
        D.clear()
not possible;
What this algorithm does is: starting from the useful positions, it builds up a value for each word, and the moment you are able to add all of them to the set (i.e. all of them are different), you are done: you have found a range where they all differ (the range is P - i + 1 bits long).
You have to run this for every position P where B[P] = false, so in the worst case it runs in roughly cubic time.
Note that there are some optimizations that can be done knowing the number of strings and their size. For example: if there are 10 strings of size 3, you know it is impossible to distinguish them (because there aren't 10 different binary strings of size 3). Given the number of strings, you only need to search sizes (contiguous or not) of at least ceil(log2(number of strings)). For example, 5 words can't all differ in one bit, nor in 2 bits, but with 3 bits they can.
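For what it's worth, here is a small Python sketch of this contiguous-range idea (my own naming, a straightforward O(M^2 * N) version rather than the incremental one above): for each start position it extends the window to the right until all N extracted substrings are distinct, and keeps the shortest such range.

def shortest_distinguishing_range(words):
    # words: equal-length bit strings; returns (start, end) inclusive, or None
    m = len(words[0])
    best = None
    for start in range(m):
        for end in range(start, m):
            pieces = {w[start:end + 1] for w in words}
            if len(pieces) == len(words):          # all windows are distinct
                if best is None or end - start < best[1] - best[0]:
                    best = (start, end)
                break                              # extending further only lengthens the range
    return best

print(shortest_distinguishing_range(["011011", "011000", "010010"]))  # (4, 5)

On the question's example this returns (4, 5), which matches the [4, 5] range given above.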

Related

Find the first missing positive integer

Given an array of integers, find the first missing positive integer in linear time and constant space. In other words, find the lowest positive integer that does not exist in the array. The array can contain duplicates and negative numbers as well.
For example, the input [3, 4, -1, 1] should give 2. The input [1, 2, 0] should give 3.
I tried this but could not get through it, then searched on Google and found an answer on GeeksforGeeks, but I could not understand it. Can anyone explain the logic for this using simple concepts? I have just started competitive programming.
One way to find the solution is to rearrange the array and then find the first
misplaced number:
#include <cstddef>
#include <utility>
#include <vector>

int find_missing(std::vector<int>& v)
{
    for (std::size_t i = 0; i != v.size(); ++i) {
        std::size_t e = i;
        while (0 < v[e]                          // Correct range
               && std::size_t(v[e]) <= v.size()  // Correct range
               && std::size_t(v[e]) != e + 1     // Correct place
               && v[e] != v[v[e] - 1]            // Duplicate
        ) {
            std::swap(v[e], v[v[e] - 1]);
        }
    }
    // Now the array looks like
    // {1, 2, 3, x, 5, 6, x}
    // Find the first misplaced number
    for (std::size_t i = 0; i != v.size(); ++i) {
        if (std::size_t(v[i]) != i + 1) {
            return i + 1;
        }
    }
    // All are correctly placed, so the answer is one past the end:
    return v.size() + 1;
}
If a bitmap (an extension of a bitmask) is acceptable, then we could use 1 bit per positive integer and just scan the array. The bitmap is initialized with all bits set to 0. As we scan the array, we ignore negatives and turn the nth bit on when we encounter n. When we find, for example, 13, we turn the 13th bit to 1 (likewise the number 1 would turn the first bit to 1). Then we scan the bitmap and report the position of the first zero. Done.
However, this might not be considered constant space at all, since when the maximum positive int is MAXINT, we need the bitmap to be MAXINT bits large. Too bad. In theory, though, this is correct. Also O(2*N) = O(N).
So we have to store some information in the array or this is impossible to solve in O(N) in a single go.
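As an illustration of the bitmap idea, here is a hedged Python sketch (my own naming, using a byte array as a stand-in for a bitmap). It bounds the map at len(arr) + 1, since the answer can never exceed that, so it is linear time but still linear extra space:

def first_missing_positive_bitmap(arr):
    # one flag per candidate value; the answer never exceeds len(arr) + 1
    seen = bytearray(len(arr) + 2)
    for v in arr:
        if 1 <= v <= len(arr) + 1:
            seen[v] = 1
    for v in range(1, len(arr) + 2):
        if not seen[v]:
            return v

print(first_missing_positive_bitmap([3, 4, -1, 1]))  # 2
print(first_missing_positive_bitmap([1, 2, 0]))      # 3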
Another solution consists of mapping array indices to integers and storing information using the sign. If the array size is L, for example, the missing int will be less than or equal to L+1 (L+1 when the array is full, like [1,2,3,4], unless this case counts as no element missing). Thanks Jarod for the hint on this.
Considering that O(3N) is still O(N), how about:
step 1: scan the array and swap negatives and zeroes towards the beginning. Turn everything non-positive that was swapped this way into 1. The authentic positives will then start at index j.
step 2: The whole array is now positive, but the true data lies from j to the end of the array. Scan the subarray with the authentic data, and when you find, say, the number H, turn the Hth element of the whole array negative. If H is greater than the array size, skip it. When you find, for example, 2, turn arr[1] (the second element) negative.
step 3: scan the array again, checking for the first positive number. Based on its index you know what the first missing positive integer is. (See the sketch below.)
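A minimal Python sketch of these three steps (the naming is mine; step 2 reads values through abs() because earlier marks may already have flipped their sign):

def first_missing_positive(arr):
    n = len(arr)
    # step 1: move non-positive values to the front and overwrite them with 1s
    j = 0
    for i in range(n):
        if arr[i] <= 0:
            arr[i], arr[j] = arr[j], arr[i]
            arr[j] = 1
            j += 1
    # step 2: for each authentic value H, mark its presence by negating arr[H - 1];
    # values larger than n are skipped
    for i in range(j, n):
        h = abs(arr[i])
        if h <= n and arr[h - 1] > 0:
            arr[h - 1] = -arr[h - 1]
    # step 3: the first index that stayed positive tells us the missing value
    for i in range(n):
        if arr[i] > 0:
            return i + 1
    return n + 1

print(first_missing_positive([3, 4, -1, 1]))  # 2
print(first_missing_positive([1, 2, 0]))      # 3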

Number of subarrays with same 'degree' as the array

So this problem was asked in a quiz and the problem goes like:
You are given an array 'a' with elements ranging from 1 to 10^6, and the size of the array can be at most 10^5. We are asked to find the number of subarrays with the same 'degree' as the original array. The degree of an array is defined as the frequency of the most frequently occurring element in the array. Multiple elements can have the same frequency.
I was stuck in this problem for like an hour but couldn't think of any solution. How do I solve it?
Sample Input:
first input: 1,2,2,3,1
first output: 2
second input: 1,1,2,1,2,2
second output: 4
The element that occurs most frequently is called the mode; this problem defines degree as the frequency count. Your tasks are:
Identify all of the mode values.
For each mode value, find the index range of that value. For instance, in the array
[1, 1, 2, 1, 3, 3, 2, 4, 2, 4, 5, 5, 5]
You have three modes (1 2 5) with a degree of 3. The index ranges are
1 - 0:3
2 - 2:8
5 - 10:12
You need to count all index ranges (subarrays) that include at least one of those three ranges.
I've tailored this example to have both basic cases: modes that overlap, and those that do not. Note that containment is a moot point: if you have an array where one mode's range contains another:
[0, 1, 1, 1, 0, 0]
You can ignore the outer one altogether: any subarray that contains 0 will also contain 1.
ANALYSIS
A subarray is defined by two numbers, the starting and ending indices. Since we must have 0 <= start <= end <= len(array) - 1, this is the "handshake" problem between array bounds. We have N(N+1)/2 possible subarrays.
For 10**5 elements, you could just brute-force the problem from here: for each pair of indices, check to see whether that range contains any of the mode ranges. However, you can easily cut that down with interval recognition.
ALGORITHM
Step through the mode ranges, left to right. First, count all subranges that include the first mode range [0:3]. There is only 1 possible start [0] and 10 possible ends [3:12]; that's 10 subarrays.
Now move to the second mode range, [2:8]. You need to count subarrays that include this, but exclude those you've already counted. Since there's an overlap, you need a starting point later than 0, or an ending point before 3. This second clause is not possible with the given range.
Thus, you consider start [1:2], end [8:12]. That's 2 * 5 more subarrays.
For the third range [10:12] (no overlap), you need a starting point that does not include any other subrange. This means that any starting point in [3:10] will do. Since there's only one possible endpoint, you have 8*1, or 8 more subarrays.
Can you turn this into something formal?
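To make this concrete, here is a hedged Python sketch of the counting described above (the function name and structure are mine, not from the quiz). On the 13-element example above it returns 10 + 10 + 8 = 28:

from collections import Counter

def count_subarrays_with_same_degree(arr):
    n = len(arr)
    freq = Counter(arr)
    degree = max(freq.values())
    # [first, last] index range for every mode value
    ranges = []
    for v, c in freq.items():
        if c == degree:
            idx = [i for i, x in enumerate(arr) if x == v]
            ranges.append((idx[0], idx[-1]))
    ranges.sort()
    # drop any range that strictly contains another one (the inner one suffices)
    kept = []
    for l, r in ranges:
        while kept and kept[-1][1] >= r:
            kept.pop()
        kept.append((l, r))
    # count subarrays whose leftmost covered mode range is (l, r)
    total, prev_l = 0, -1
    for l, r in kept:
        total += (l - prev_l) * (n - r)
        prev_l = l
    return total

print(count_subarrays_with_same_degree([1, 1, 2, 1, 3, 3, 2, 4, 2, 4, 5, 5, 5]))  # 28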
Taking reference from the LeetCode solution:
https://leetcode.com/problems/degree-of-an-array/solution/
import java.util.*;

class Solution {
    public int findShortestSubArray(int[] nums) {
        Map<Integer, Integer> left = new HashMap(),
            right = new HashMap(), count = new HashMap();

        for (int i = 0; i < nums.length; i++) {
            int x = nums[i];
            if (left.get(x) == null) left.put(x, i);
            right.put(x, i);
            count.put(x, count.getOrDefault(x, 0) + 1);
        }

        int ans = nums.length;
        int degree = Collections.max(count.values());
        for (int x: count.keySet()) {
            if (count.get(x) == degree) {
                ans = Math.min(ans, right.get(x) - left.get(x) + 1);
            }
        }
        return ans;
    }
}

Is there an O(n) algorithm to generate a prefix-less array for an positive integer array?

For the array [4, 3, 5, 1, 2]:
the prefix of 4 is NULL, so the prefix-less of 4 is 0;
the prefix of 3 is [4], and the prefix-less of 3 is 0, because nothing in the prefix is less than 3;
the prefix of 5 is [4, 3], and the prefix-less of 5 is 2, because 4 and 3 are both less than 5;
the prefix of 1 is [4, 3, 5], and the prefix-less of 1 is 0, because nothing in the prefix is less than 1;
the prefix of 2 is [4, 3, 5, 1], and the prefix-less of 2 is 1, because only 1 is less than 2.
So for the array [4, 3, 5, 1, 2], we get the prefix-less array [0, 0, 2, 0, 1].
Can we get an O(n) algorithm to compute the prefix-less array?
It can't be done in O(n) for the same reasons a comparison sort requires O(n log n) comparisons. The number of possible prefix-less arrays is n! so you need at least log2(n!) bits of information to identify the correct prefix-less array. log2(n!) is O(n log n), by Stirling's approximation.
Assuming that the input elements are always fixed-width integers you can use a technique based on radix sort to achieve linear time:
L is the input array
X is the list of indexes of L in focus for the current pass
n is the bit we are currently working on
Count is the number of elements seen so far (to the left of the current one) whose bit n is 0
Y is the list of indexes of a subsequence of L for recursion
P is a zero-initialized array that is the output (the prefix-less array)
In pseudo-code...
Def PrefixLess(L, X, n)
    if (n == 0)
        return;
    // set up prefix-less counts for bit n
    Count = 0
    For I in 1 to |X|
        If (L(X(I))[n] == 1)
            P(X(I)) += Count   // every earlier element with a 0 at bit n is smaller
        Else
            Count++
    // go through the subsequence where bit(n) = 1, on bit n-1
    Y = []
    For I in 1 to |X|
        If (L(X(I))[n] == 1)
            Y.append(X(I))
    PrefixLess(L, Y, n-1)
    // go through the subsequence where bit(n) = 0, on bit n-1
    Y = []
    For I in 1 to |X|
        If (L(X(I))[n] == 0)
            Y.append(X(I))
    PrefixLess(L, Y, n-1)
    return P
and then execute:
PrefixLess(L, 1..|L|, 32)
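For reference, a direct Python translation of the pseudocode above (my naming); on the question's example it produces [0, 0, 2, 0, 1]:

def prefix_less(L, bits=32):
    # P[i] = number of j < i with L[j] < L[i]
    P = [0] * len(L)

    def rec(X, n):
        # X: indexes of L, in original order, that agree on all bits above n
        if n < 0 or len(X) < 2:
            return
        count = 0                    # elements seen so far whose bit n is 0
        ones, zeros = [], []
        for i in X:
            if (L[i] >> n) & 1:
                P[i] += count        # every earlier zero-bit element is smaller
                ones.append(i)
            else:
                count += 1
                zeros.append(i)
        rec(ones, n - 1)
        rec(zeros, n - 1)

    rec(list(range(len(L))), bits - 1)
    return P

print(prefix_less([4, 3, 5, 1, 2]))  # [0, 0, 2, 0, 1]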
I think this should work, but double check the details. Let's call an element in the original array a[i] and one in the prefix array as p[i] where i is the ith element of the respective arrays.
So, say we are at a[i] and we have already computed the value of p[i]. There are three possible cases. If a[i] == a[i+1], then p[i] == p[i+1]. If a[i] < a[i+1], then p[i+1] >= p[i] + 1. This leaves us with the case where a[i] > a[i+1]. In this situation we know that p[i+1] <= p[i].
In the naïve case, we go back through the prefix and start counting items less than a[i]. However, we can do better than that. First, recognize that the minimum value for p[i] is 0 and the maximum is i. Next look at the case of an index j, where i > j. If a[i] >= a[j], then p[i] >= p[j]. If a[i] < a[j], then p[i] <= p[j] + (i - j - 1). So, we can go backwards through p, updating the values of p[i]_min and p[i]_max. If p[i]_min equals p[i]_max, then we have our solution.
Doing a back-of-the-envelope analysis of the algorithm, it has O(n) best-case performance. This is the case where the list is already sorted. The worst case is where it is reverse sorted. Then the performance is O(n^2). The average performance is going to be O(k*n) where k is how much one needs to backtrack. My guess is that for randomly distributed integers, k will be small.
I am also pretty sure there would be ways to optimize this algorithm for cases of partially sorted data. I would look at Timsort for some inspiration on how to do this. It uses run detection to detect partially sorted data. So the basic idea for the algorithm would be to go through the list once and look for runs of data. For ascending runs of data you are going to have the case where p[i+1] = p[i]+1. For descending runs, p[i] = p_run[0] where p_run is the first element in the run.

Find shortest subarray containing all elements

Suppose you have an array of numbers, and another set of numbers. You have to find the shortest subarray containing all numbers with minimal complexity.
The array can have duplicates, and let's assume the set of numbers does not. It's not ordered - the subarray may contain the set of numbers in any order.
For example:
Array: 1 2 5 8 7 6 2 6 5 3 8 5
Numbers: 5 7
Then the shortest subarray is obviously Array[2:5] (python notation).
Also, what would you do if you want to avoid sorting the array for some reason (a la online algorithms)?
Proof of a linear-time solution
I will write right-extension to mean increasing the right endpoint of a range by 1, and left-contraction to mean increasing the left endpoint of a range by 1. This answer is a slight variation of Aasmund Eldhuset's answer. The difference here is that once we find the smallest j such that [0, j] contains all interesting numbers, we thereafter consider only ranges that contain all interesting numbers. (It's possible to interpret Aasmund's answer this way, but it's also possible to interpret it as allowing a single interesting number to be lost due to a left-contraction -- an algorithm whose correctness has yet to be established.)
The basic idea is that for each position j, we will find the shortest satisfying range ending at position j, given that we know the shortest satisfying range ending at position j-1.
EDIT: Fixed a glitch in the base case.
Base case: Find the smallest j' such that [0, j'] contains all interesting numbers. By construction, there can be no range [0, k] with k < j' that contains all interesting numbers, so we don't need to worry about them further. Now find the largest i such that [i, j'] contains all interesting numbers (i.e. hold j' fixed). This is the smallest satisfying range ending at position j'.
To find the smallest satisfying range ending at any arbitrary position j, we can right-extend the smallest satisfying range ending at position j-1 by 1 position. This range will necessarily also contain all interesting numbers, though it may not be minimal-length. The fact that we already know this is a satisfying range means that we don't have to worry about extending the range "backwards" to the left, since that can only increase the range over its minimal length (i.e. make the solution worse). The only operations we need to consider are left-contractions that preserve the property of containing all interesting numbers. So the left endpoint of the range should be advanced as far as possible while this property holds. When no more left-contractions can be performed, we have the minimal-length satisfying range ending at j (since further left-contractions clearly cannot make the range satisfying again) and we are done.
Since we perform this for each rightmost position j, we can take the minimum-length range over all rightmost positions to find the overall minimum. This can be done using a nested loop in which j advances on each outer loop cycle. Clearly j advances by 1 n times. Since at any point in time we only ever need the leftmost position of the best range for the previous value of j, we can store this in i and just update it as we go. i starts at 0, is at all times <= j <= n, and only ever advances upwards by 1, meaning it can advance at most n times. Both i and j advance at most n times, meaning that the algorithm is linear-time.
In the following pseudo-code, I've combined both phases into a single loop. We only try to contract the left side if we have reached the stage of having all interesting numbers:
# x[0..m-1] is the array of interesting numbers.
# Load them into a hash/dictionary:
For i from 0 to m-1:
    isInteresting[x[i]] = 1

i = 0
nDistinctInteresting = 0
minRange = infinity
For j from 0 to n-1:
    If count[a[j]] == 0 and isInteresting[a[j]]:
        nDistinctInteresting++
    count[a[j]]++
    If nDistinctInteresting == m:
        # We are in phase 2: contract the left side as far as possible
        While count[a[i]] > 1 or not isInteresting[a[i]]:
            count[a[i]]--
            i++
        If j - i < minRange:
            minRange = j - i
            (minI, minJ) = (i, j)
count[] and isInteresting[] are hashes/dictionaries (or plain arrays if the numbers involved are small).
This sounds like a problem that is well-suited for a sliding window approach: maintain a window (a subarray) that is gradually expanding and contracting, and use a hashmap to keep track of the number of times each "interesting" number occurs in the window. E.g. start with an empty window, then expand it to include only element 0, then elements 0-1, then 0-2, 0-3, and so on, by adding subsequent elements (and using the hashmap to keep track of which numbers exist in the window). When the hashmap tells you that all interesting numbers exist in the window, you can begin contracting it: e.g. 0-5, 1-5, 2-5, etc., until you find out that the window no longer contains all interesting numbers. Then, you can begin expanding it on the right hand side again, and so on. I'm quite (but not entirely) sure that this would work for your problem, and it can be implemented to run in linear time.
Say the array has n elements and the set has m elements.
Sort the array, noting the reverse index (position in the original array) - O(n log n) time.
For each element in the given set, find it in the array - O(m log n) time (log n for the binary search, m times).
Keep track of the minimum and maximum index over the found elements; min - max defines your range.
Total time complexity: O((m+n) log n)
This solution definitely does not run in O(n) time as suggested by some of the pseudocode above; however, it is real (Python) code that solves the problem, and by my estimate it runs in O(n^2):
def small_sub(A, B):
    len_A = len(A)
    len_B = len(B)
    sub_A = []
    sub_size = -1
    dict_b = {}
    for elem in B:
        if elem in dict_b:
            dict_b[elem] += 1
        else:
            dict_b.update({elem: 1})
    for i in range(0, len_A - len_B + 1):
        if A[i] in dict_b:
            temp_size, temp_sub = find_sub(A[i:], dict_b.copy())
            if (sub_size == -1 or (temp_size != -1 and temp_size < sub_size)):
                sub_A = temp_sub
                sub_size = temp_size
    return sub_size, sub_A

def find_sub(A, dict_b):
    index = 0
    for i in A:
        if len(dict_b) == 0:
            break
        if i in dict_b:
            dict_b[i] -= 1
            if dict_b[i] <= 0:
                del(dict_b[i])
        index += 1
    if len(dict_b) > 0:
        return -1, {}
    else:
        return index, A[0:index]
Here's how I solved this problem in linear time using collections.Counter objects
from collections import Counter

def smallest_subsequence(stream, search):
    if not search:
        return []  # the shortest subsequence containing nothing is nothing
    stream_counts = Counter(stream)
    search_counts = Counter(search)

    minimal_subsequence = None

    start = 0
    end = 0
    subsequence_counts = Counter()
    while True:
        # while subsequence_counts doesn't have enough elements to cancel out every
        # element in search_counts, take the next element from search
        while search_counts - subsequence_counts:
            if end == len(stream):  # if we've reached the end of the list, we're done
                return minimal_subsequence
            subsequence_counts[stream[end]] += 1
            end += 1

        # while subsequence_counts has enough elements to cover search_counts, keep
        # removing from the start of the sequence
        while not search_counts - subsequence_counts:
            if minimal_subsequence is None or (end - start) < len(minimal_subsequence):
                minimal_subsequence = stream[start:end]
            subsequence_counts[stream[start]] -= 1
            start += 1

print(smallest_subsequence([1, 2, 5, 8, 7, 6, 2, 6, 5, 3, 8, 5], [5, 7]))
# [5, 8, 7]
Java solution
List<String> paragraph = Arrays.asList("a", "c", "d", "m", "b", "a");
Set<String> keywords = new HashSet<>(Arrays.asList("a", "b"));
Subarray result = new Subarray(-1, -1);
Map<String, Integer> keyWordFreq = new HashMap<>();
int numKeywords = keywords.size();
// slide the window to contain all the keywords,
// starting with [0,0]
for (int left = 0, right = 0; right < paragraph.size(); right++) {
    // expand right to contain all the keywords
    String currRight = paragraph.get(right);
    if (keywords.contains(currRight)) {
        keyWordFreq.put(currRight, keyWordFreq.get(currRight) == null ? 1 : keyWordFreq.get(currRight) + 1);
    }

    // loop enters when all the keywords are present in the current window;
    // contract left while all the keywords are still present
    while (keyWordFreq.size() == numKeywords) {
        String currLeft = paragraph.get(left);
        if (keywords.contains(currLeft)) {
            // remove from the map if it's the last occurrence so that the loop exits
            if (keyWordFreq.get(currLeft).equals(1)) {
                // now check if the current subarray is the smallest
                if ((result.start == -1 && result.end == -1) || (right - left) < (result.end - result.start)) {
                    result = new Subarray(left, right);
                }
                keyWordFreq.remove(currLeft);
            } else {
                // else reduce the frequency
                keyWordFreq.put(currLeft, keyWordFreq.get(currLeft) - 1);
            }
        }
        left++;
    }
}
return result;
}

Finding the maximum subsequence binary sets that have an equal number of 1s and 0s

I found the following problem on the internet, and would like to know how I would go about solving it:
You are given an array containing 0s and 1s. Find an O(n) time and O(1) space algorithm to find the maximum subsequence which has an equal number of 1s and 0s.
Examples:
10101010 -
The longest subsequence that satisfies the problem is the input itself.
1101000 -
The longest subsequence that satisfies the problem is 110100.
Update.
I have to completely rephrase my answer. (If you had upvoted the earlier version, well, you were tricked!)
Let's sum up the easy case again, to get it out of the way:
Find the longest prefix of the bit-string containing an equal number of 1s and 0s.
This is trivial: a simple counter is needed, counting how many more 1s we have than 0s, and iterating over the bitstring while maintaining this. The position where this counter becomes zero for the last time is the end of the longest sought prefix. O(N) time, O(1) space. (I'm completely convinced by now that this is what the original problem asked for.)
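A tiny Python sketch of this prefix version (assuming the input is a string of '0'/'1' characters; the naming is mine):

def longest_balanced_prefix(bits):
    balance = 0      # (number of 1s) - (number of 0s) seen so far
    best_end = 0     # length of the longest balanced prefix found
    for i, b in enumerate(bits, start=1):
        balance += 1 if b == '1' else -1
        if balance == 0:
            best_end = i
    return bits[:best_end]

print(longest_balanced_prefix("1101000"))  # 110100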
Now lets switch to the more difficult version of the problem: we no longer require subsequences to be prefixes - they can start anywhere.
After some back and forth thought, I thought there might be no linear algorithm for this. For example, consider the prefix "111111111111111111...". Every single 1 of those may be the start of the longest subsequence, there is no candidate subsequence start position that dominates (i.e. always gives better solutions than) any other position, so we can't throw away any of them (O(N) space) and at any step, we must be able to select the best start (which has an equal number of 1s and 0s to the current position) out of linearly many candidates, in O(1) time. It turns out this is doable, and easily doable too, since we can select the candidate based on the running sum of 1s (+1) and 0s (-1), this has at most size N, and we can store the first position we reach each sum in 2N cells - see pmod's answer below (yellowfog's comments and geometric insight too).
Failing to spot this trick, I had replaced a fast but wrong algorithm with a slow but sure one (since correct algorithms are preferable to wrong ones!):
Build an array A with the accumulated number of 1s from the start up to each position, e.g. if the bitstring is "001001001", then the array would be [0, 0, 1, 1, 1, 2, 2, 2, 3]. Using this, we can test in O(1) whether the subsequence (i,j), inclusive, is valid: isValid(i, j) = (j - i + 1 == 2 * (A[j] - A[i - 1])), i.e. it is valid if its length is double the number of 1s in it. For example, the subsequence (3,6) is valid because 6 - 3 + 1 == 2 * (A[6] - A[2]) = 4.
Plain old double loop:
maxSubsLength = 0
for i = 1 to N - 1
    for j = i + 1 to N
        if isValid(i, j) ... # maintain maxSubsLength
    end
end
This can be sped up a bit using some branch-and-bound by skipping i/j sequences which are shorter than the current maxSubsLength, but asymptotically this is still O(n^2). Slow, but with a big plus on its side: correct!
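For concreteness, here is the same O(n^2) idea as runnable Python (my naming, 0-indexed, so the prefix-sum array has A[k] = number of 1s among the first k bits):

def longest_balanced_subarray_quadratic(bits):
    n = len(bits)
    A = [0] * (n + 1)                     # A[k] = number of 1s in bits[0:k]
    for k, b in enumerate(bits, start=1):
        A[k] = A[k - 1] + (b == '1')
    best_len, best_start = 0, 0
    for i in range(n):
        for j in range(i + 1, n):
            length = j - i + 1
            if length == 2 * (A[j + 1] - A[i]) and length > best_len:
                best_len, best_start = length, i
    return bits[best_start:best_start + best_len]

print(longest_balanced_subarray_quadratic("1101000"))  # 110100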
Strictly speaking, the answer is that no such algorithm exists because the language of strings consisting of an equal number of zeros and ones is not regular.
Of course everyone ignores that fact that storing an integer of magnitude n is O(log n) in space and treats it as O(1) in space. :-) Pretty much all big-O's, including time ones, are full of (or rather empty of) missing log n factors, or equivalently, they assume n is bounded by the size of a machine word, which means you're really looking at a finite problem and everything is O(1).
New solution:
Suppose that for an n-bit input bit-array we keep a 2*n-sized array of positions. The array elements must be large enough to hold the maximum position; for a 256-bit input, a 256x2 array of bytes is needed (a byte is enough to hold 255, the maximum position).
Moving from the first position of the bit-array, we put positions into the array, starting from the middle of the array (index n), using these rules:
1. Increment the index when we pass a "1" bit and decrement it when we pass a "0" bit.
2. When we meet an already-initialized array element, don't change it, and remember the difference between the positions (the current one minus the one taken from the array element) - this is the size of a local maximum sequence.
3. Every time we meet a local maximum, compare it with the global maximum and update the latter if it is smaller.
For example: bit sequence is 0,0,0,1,0,1
initial array index is n
set arr[n] = 0 (position)
bit 0 -> index--
set arr[n-1] = 1
bit 0 -> index--
set arr[n-2] = 2
bit 0 -> index--
set arr[n-3] = 3
bit 1 -> index++
arr[n-2] already contains 2 -> thus the local max sequence is [3,2]; it becomes the absolute maximum
will not overwrite arr[n-2]
bit 0 -> index--
arr[n-3] already contains 3 -> thus the local max sequence is [4,3]; it is not the absolute maximum
bit 1 -> index++
arr[n-2] already contains 2 -> thus the local max sequence is [5,2]; it is the absolute maximum
Thus, we pass through the whole bit array only once.
Does this solve the task?
input:
n - number of bits
a[n] - input bit-array

track_pos[2*n + 1] = {0,};  /* track_pos[ind] = first position (1-based) at which the
                               running balance reached ind - n; 0 means "not seen yet" */
ind = n;                    /* running balance of 1s minus 0s, offset by n */
glob_seq_size = 0;

/* positions start from 1 since zero means track_pos[x] is not initialized;
   i runs one step past the last bit so the balance after the final bit is also checked */
for (i = 1; i <= n + 1; i++) {
    if (track_pos[ind]) {
        /* the same balance was seen before: the bits in between are balanced */
        seq_size = i - track_pos[ind];
        if (glob_seq_size < seq_size) {
            /* store as interim result */
            glob_seq_size = seq_size;
            glob_pos_from = track_pos[ind];
            glob_pos_to = i;
        }
    } else {
        track_pos[ind] = i;
    }
    if (i <= n) {
        if (a[i-1])
            ind++;
        else
            ind--;
    }
}

output:
glob_seq_size - length of the maximum sequence
glob_pos_from - 1-based position of its first bit
glob_pos_to - one past its last bit
In this thread ( http://discuss.techinterview.org/default.asp?interview.11.792102.31 ), poster A.F. has given an algorithm that runs in O(n) time and uses O(sqrt(n log n)) bits.
Brute force: start with the maximum length (the whole array) and count the 0's and 1's. If the counts are equal, you are finished. Otherwise, reduce the search length by 1 and run the check for all subsequences of the reduced length, and so on. Stop when the length reaches 0.
As was pointed out by user "R..", there is no solution, strictly speaking, unless you ignore the "log n" space complexity. In the following, I will consider that the array length fits in a machine register (e.g. a 64-bit word) and that a machine register has size O(1).
The important point to notice is that if there are more 1's than 0's, then the maximum subsequence that you are looking for necessarily includes all the 0's, and that many 1's. So here is the algorithm:
Notations: the array has length n, indices are counted from 0 to n-1.
First pass: count the number of 1's (c1) and 0's (c0). If c1 = c0 then your maximal subsequence is the entire array (end of algorithm). Otherwise, let d be the digit which appears less often (d = 0 if c0 < c1, otherwise d = 1).
Compute m = min(c0, c1) * 2. This is the size of the subsequence you are looking for.
Second pass: scan the array to find the index j of the first occurrence of d.
Compute k = max(j, n - m). The subsequence starts at index k and has length m.
Note that there could be several solutions (several subsequences of maximal length which match the criterion).
In plain words: assuming that there are more 1's than 0's, then I consider the smallest subsequence which contains all the 0's. By definition, that subsequence is surrounded by bunches of 1's. So I just grab enough 1's from the sides.
Edit: as was pointed out, this does not work... The "important point" is actually wrong.
Try something like this:
/* bit(n) is a macro that returns the nth bit, 0 or 1. len is the number of bits */
int c[2] = {0, 0};
int d, i, a, b, p;

for (i = 0; i < len; i++) c[bit(i)]++;
d = c[1] < c[0];
if (c[d] == 0) return; /* all bits identical; fail */
for (i = 0; bit(i) != d; i++);
a = b = i;
for (p = 0; i < len; i++) {
    p += 2*bit(i) - 1;
    if (!p) b = i;
}
if (a == b) { /* account for case where we need bits before the first d */
    b = len - 1;
    a -= abs(p);
}
printf("maximal subsequence consists of bits %d through %d\n", a, b);
Completely untested but modulo stupid mistakes it should work. Based on my reply to Thomas's answer which failed in certain cases.
New Solution:
Space complexity of O(1) and time complexity O(n^2)
int iStart = 0, iEnd = 0;
int[] arrInput = { 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0 };
for (int i = 0; i < arrInput.Length; i++)
{
    int iCurrEndIndex = i;
    int iSum = 0;
    for (int j = i; j < arrInput.Length; j++)
    {
        iSum = (arrInput[j] == 1) ? iSum + 1 : iSum - 1;
        if (iSum == 0)
        {
            iCurrEndIndex = j;
        }
    }
    if ((iEnd - iStart) < (iCurrEndIndex - i))
    {
        iEnd = iCurrEndIndex;
        iStart = i;
    }
}
I am not sure whether the array you are referring to is an int array of 0's and 1's or a bit array.
If it's about a bit array, here is my approach:
#include <stdio.h>

int isEvenBitCount(int n)
{
    /* n is the decimal equivalent of the input binary sequence */
    int cnt1 = 0, cnt0 = 0;
    while (n) {
        if (n & 0x01) { printf("1 "); cnt1++; }
        else          { printf("0 "); cnt0++; }
        n = n >> 1;
    }
    printf("\n");
    return cnt0 == cnt1;
}

int main()
{
    int i = 40, j = 25, k = 35;
    isEvenBitCount(i) ? printf("-->Yes\n") : printf("-->No\n");
    isEvenBitCount(j) ? printf("-->Yes\n") : printf("-->No\n");
    isEvenBitCount(k) ? printf("-->Yes\n") : printf("-->No\n");
    return 0;
}
With bitwise operations, the loop runs once per bit of n, so for fixed-width integers the time is effectively O(1) as well.
