Space-efficient algorithm for finding the largest balanced subarray? - arrays

Given an array of 0s and 1s, find the maximum-length subarray in which the number of 0s equals the number of 1s.
This needs to be done in O(n) time and O(1) space.
I have an algorithm which does it in O(n) time and O(n) space. It uses a prefix sum array and exploits the fact that if the numbers of 0s and 1s are equal then
sumOfSubarray = lengthOfSubarray/2
#include <iostream>
#define M 15
using namespace std;

void getSum(int arr[], int prefixsum[], int size) {
    int i;
    prefixsum[0] = arr[0] = 0;
    prefixsum[1] = arr[1];
    for (i = 2; i <= size; i++) {
        prefixsum[i] = prefixsum[i-1] + arr[i];
    }
}

void find(int a[], int &start, int &end) {
    while (start < end) {
        if ((end - start + 1) == 2 * (a[end] - a[start-1]))
            break;
        if ((end - start + 1) > 2 * (a[end] - a[start-1])) {
            if (a[start] == 0 && a[end] == 1)
                start++;
            else
                end--;
        } else {
            if (a[start] == 1 && a[end] == 0)
                start++;
            else
                end--;
        }
    }
}

int main() {
    int size, arr[M], ps[M], start = 1, end;
    cin >> size;
    arr[0] = 0;
    end = size;
    for (int i = 1; i <= size; i++)
        cin >> arr[i];
    getSum(arr, ps, size);
    find(ps, start, end);
    if (start != end)
        cout << (start-1) << " " << (end-1) << endl;
    else
        cout << "No soln\n";
    return 0;
}

Now, my algorithm is O(n) time and O(Dn) space, where Dn is the total imbalance in the list.
This solution doesn't modify the list.
Let D be the running difference between the number of 1s and the number of 0s seen so far in the list.
First, let's step linearly through the list and calculate D, just to see how it works.
I'm going to use this list as an example: l=1100111100001110
Element D
null 0
1 1
1 2 <-
0 1
0 0
1 1
1 2
1 3
1 4
0 3
0 2
0 1
0 0
1 1
1 2
1 3
0 2 <-
Finding the longest balanced subarray is equivalent to finding the two equal elements of D that are the farthest apart (in this example, the two 2s marked with arrows).
The longest balanced subarray is between the first occurrence of the element + 1 and the last occurrence of the element (first arrow + 1 through last arrow: 00111100001110).
Remark:
The longest subarray will always be between two elements of D that lie in [0, Dn], where Dn is the last element of D (Dn = 2 in the previous example). Dn is the total imbalance between 1s and 0s in the list (use [Dn, 0] if Dn is negative).
In this example it means that I don't need to "look" at the 3s or 4s.
Proof:
Let Dn > 0.
Suppose a subarray is delimited by a value P with P > Dn. Since 0 < Dn < P, before reaching the first element of D equal to P we must pass an element equal to Dn. And since the last element of D is equal to Dn, the subarray delimited by the Dn values is longer than the one delimited by the P values, so we don't need to look at P values.
P cannot be less than 0 for the same reasons.
The proof is the same for Dn < 0.
Now let's work on D. D isn't arbitrary: the difference between two consecutive elements is always 1 or -1, and there is an easy bijection between D and the initial list. Therefore I have two approaches to this problem:
the first one is to keep track of the first and last appearance of each element of D that lies between 0 and Dn (cf. remark);
the second is to transform the list into D, and then work on D.
FIRST SOLUTION
For the time being I cannot find a better approach than the first one:
First calculate Dn (in O(n)). Dn = 2.
Second, instead of creating D, create a dictionary whose keys are the values of D (between 0 and Dn) and whose value for each key is a pair (a, b), where a is the first occurrence of the key and b the last.
Element D DICTIONARY
null 0 {0:(0,0)}
1 1 {0:(0,0) 1:(1,1)}
1 2 {0:(0,0) 1:(1,1) 2:(2,2)}
0 1 {0:(0,0) 1:(1,3) 2:(2,2)}
0 0 {0:(0,4) 1:(1,3) 2:(2,2)}
1 1 {0:(0,4) 1:(1,5) 2:(2,2)}
1 2 {0:(0,4) 1:(1,5) 2:(2,6)}
1 3 {0:(0,4) 1:(1,5) 2:(2,6)}
1 4 {0:(0,4) 1:(1,5) 2:(2,6)}
0 3 {0:(0,4) 1:(1,5) 2:(2,6)}
0 2 {0:(0,4) 1:(1,5) 2:(2,9)}
0 1 {0:(0,4) 1:(1,10) 2:(2,9)}
0 0 {0:(0,11) 1:(1,10) 2:(2,9)}
1 1 {0:(0,11) 1:(1,12) 2:(2,9) }
1 2 {0:(0,11) 1:(1,12) 2:(2,13)}
1 3 {0:(0,11) 1:(1,12) 2:(2,13)}
0 2 {0:(0,11) 1:(1,12) 2:(2,15)}
and you choose the element with the largest difference: 2:(2,15), which is l[3:15] = 00111100001110 (with l = 1100111100001110).
Time complexity:
two passes, the first one to calculate Dn, the second one to build the dictionary;
then find the max in the dictionary.
Total is O(n).
Space complexity:
the current element of D: O(1); the dictionary: O(Dn).
I don't put 3 and 4 in the dictionary because of the remark.
The complexity is O(n) time and O(Dn) space (in the average case Dn << n).
I guess there may be a better structure than a dictionary for this approach.
Any suggestion is welcome.
Hope it helps
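For concreteness, here is a small Python sketch of this first solution (my own illustration, not part of the original answer): it tracks the first and last index at which each running difference in [0, Dn] appears, then picks the key with the widest spread.
def longest_balanced(l):
    # l: list of 0s and 1s
    dn = sum(1 if x == 1 else -1 for x in l)     # total imbalance of the whole list
    lo, hi = (0, dn) if dn >= 0 else (dn, 0)
    first_last = {0: (0, 0)}                     # running difference -> (first index, last index)
    d = 0
    for i, x in enumerate(l, start=1):
        d += 1 if x == 1 else -1
        if lo <= d <= hi:
            if d in first_last:
                first_last[d] = (first_last[d][0], i)
            else:
                first_last[d] = (i, i)
    # the widest spread (a, b) gives the longest balanced subarray l[a:b]
    a, b = max(first_last.values(), key=lambda ab: ab[1] - ab[0])
    return l[a:b]

example = [int(c) for c in "1100111100001110"]
print(''.join(map(str, longest_balanced(example))))   # 00111100001110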
SECOND SOLUTION (JUST AN IDEA, NOT THE REAL SOLUTION)
The second way to proceed would be to transform the list into D in place (since it's easy to go back from D to the list, that's fine). This takes O(n) time and O(1) space, since I transform the list in place, even though it might not be a "valid" O(1).
Then from D you need to find the two equal elements that are the farthest apart.
This looks like finding the longest cycle in a linked list; a modification of Brent's cycle-finding algorithm might return the longest cycle, but I don't know how to do it, and it would take O(n) time and O(1) space.
Once you find the longest cycle, go back to the first list and print it.
This algorithm would take O(n) time and O(1) space.

Different approach, but still O(n) time and memory. Start with Neil's suggestion: treat 0 as -1.
Notation: A[0, …, N-1] is your array of size N; f(0)=0, f(x)=A[x-1]+f(x-1) is a function.
If you plot f, you'll see that what you are looking for are points where f(m)=f(n) with m=n-2k for some positive integer k. More precisely, only for x such that A[x]!=A[x+1] (and for the last element in the array) must you check whether f(x) has already occurred. Unfortunately, for now I see no improvement over having an array B[-N+1…N-1] where this information is stored.
To complete my thought: B[x]=-1 initially, and B[x]=p where p = min k such that f(k)=x. The algorithm is (double-check it, as I'm very tired):
fx = 0
B = new array[-N+1, …, N-1]
maxlen = 0
B[0] = 0
for i = 1 … N:
    fx = fx + A[i-1]
    if B[fx] == -1:
        B[fx] = i
    else if ((i == N) or (A[i-1] != A[i])) and (maxlen < i - B[fx]):
        We found that A[B[fx]], …, A[i-1] is better than what we found so far
        maxlen = i - B[fx]
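If it helps, here is a runnable Python transcription of the pseudocode above (my own sketch, treating 0 as -1 as suggested and storing B as an offset array), so double-check it against your own reasoning:
def max_balanced_length(bits):
    # bits: list of 0s and 1s; returns the length of the longest balanced subarray
    A = [1 if b else -1 for b in bits]         # treat 0 as -1
    N = len(A)
    B = [-1] * (2 * N + 1)                     # B[fx + N] = first i with f(i) == fx
    B[0 + N] = 0
    fx = 0
    maxlen = 0
    for i in range(1, N + 1):
        fx += A[i - 1]                         # fx = f(i) = A[0] + ... + A[i-1]
        if B[fx + N] == -1:
            B[fx + N] = i
        elif (i == N or A[i - 1] != A[i]) and maxlen < i - B[fx + N]:
            maxlen = i - B[fx + N]             # A[B[fx]] ... A[i-1] sums to zero
    return maxlen

print(max_balanced_length([int(c) for c in "1100111100001110"]))   # 14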
Edit: Two bed-thoughts (= figured out while lying in bed :P):
1) You could binary search the result by the length of the subarray, which gives an O(n log n) time and O(1) memory algorithm. Let's use the function g(x) = x - x mod 2 (because subarrays which sum to 0 are always of even length). Start by checking if the whole array sums to 0. If yes -- we're done, otherwise continue. We now take 0 as the starting point (we know there's a subarray of such length with the "summing-to-zero property") and g(N-1) as the ending point (we know there's no such subarray). Let's do
a = 0
b = g(N-1)
while a < b:
    c = g((a+b)/2)
    check in O(n) time whether there is such a subarray of length c
    if yes:
        a = c
    if no:
        b = c
return the result: a (the length of the maximum subarray)
Checking for a subarray with the "summing-to-zero property" of some given length L is simple:
a = 0
b = L
fa = fb = 0
for i = 0 … L-1:
    fb = fb + A[i]
while (fa != fb) and (b < N):
    fa = fa + A[a]
    fb = fb + A[b]
    a = a + 1
    b = b + 1
if fa != fb:
    not found
else:
    found, starts at a and stops at b-1
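As a concrete illustration of that check, here is a small Python version of the fixed-length test (my own sketch, again assuming the ±1 encoding, not the answerer's code):
def zero_sum_window_start(A, L):
    # A: list of +1/-1 values. Returns the start index of a length-L window
    # summing to zero, or None. O(n) time, O(1) extra space.
    if L == 0 or L > len(A):
        return None
    window = 0
    for i in range(L):                  # sum of A[0 .. L-1]
        window += A[i]
    if window == 0:
        return 0
    for b in range(L, len(A)):          # slide the window one position at a time
        window += A[b] - A[b - L]
        if window == 0:
            return b - L + 1
    return None

A = [1 if c == '1' else -1 for c in "1100111100001110"]
print(zero_sum_window_start(A, 14))     # 2 (the 14-long balanced block starts at index 2)
Plugging a test like this into the binary search above gives the O(n log n) time, O(1) extra memory variant.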
2) …can you modify the input array? If yes, and if O(1) memory means exactly that you use no additional space (except for a constant number of elements), then just store your prefix table values in your input array. No more space used (except for some variables) :D
And again, double check my algorithms as I'm veeery tired and could've done off-by-one errors.

Like Neil, I find it useful to consider the alphabet {±1} instead of {0, 1}. Assume without loss of generality that there are at least as many +1s as -1s. The following algorithm, which uses O(sqrt(n log n)) bits and runs in time O(n), is due to "A.F."
Note: this solution does not cheat by assuming the input is modifiable and/or has wasted bits. As of this edit, this solution is the only one posted that is both O(n) time and o(n) space.
An easier version, which uses O(n) bits, streams the array of prefix sums and marks the first occurrence of each value. It then scans backward, considering for each height between 0 and sum(arr) the maximal subarray at that height. Some thought reveals that the optimum is among these (remember the assumption). In Python:
sum = 0
min_so_far = 0
max_so_far = 0
is_first = [True] * (1 + len(arr))
for i, x in enumerate(arr):
    sum += x
    if sum < min_so_far:
        min_so_far = sum
    elif sum > max_so_far:
        max_so_far = sum
    else:
        is_first[1 + i] = False

sum_i = 0
i = 0
while sum_i != sum:
    sum_i += arr[i]
    i += 1
sum_j = sum
j = len(arr)
longest = j - i

for h in xrange(sum - 1, -1, -1):
    while sum_i != h or not is_first[i]:
        i -= 1
        sum_i -= arr[i]
    while sum_j != h:
        j -= 1
        sum_j -= arr[j]
    longest = max(longest, j - i)
The trick to get the space down comes from noticing that we're scanning is_first sequentially, albeit in reverse order relative to its construction. Since the loop variables fit in O(log n) bits, we'll compute, instead of is_first, a checkpoint of the loop variables after each O(√(n log n)) steps. This is O(n/√(n log n)) = O(√(n/log n)) checkpoints, for a total of O(√(n log n)) bits. By restarting the loop from a checkpoint, we compute on demand each O(√(n log n))-bit section of is_first.
(P.S.: it may or may not be my fault that the problem statement asks for O(1) space. I sincerely apologize if it was I who pulled a Fermat and suggested that I had a solution to a problem much harder than I thought it was.)

If indeed your algorithm is valid in all cases (see my comment to your question noting some corrections to it), notice that the prefix array is the only obstruction to your constant memory goal.
Examining the find function reveals that this array can be replaced with two integers, thereby eliminating the dependence on the length of the input and solving your problem. Consider the following:
You only depend on two values in the prefix array in the find function. These are a[start - 1] and a[end]. Yes, start and end change, but does this merit the array?
Look at the progression of your loop. At the end, start is incremented or end is decremented only by one.
Considering the previous statement, if you were to replace the value of a[start - 1] by an integer, how would you update its value? Put another way, for each transition in the loop that changes the value of start, what could you do to update the integer accordingly to reflect the new value of a[start - 1]?
Can this process be repeated with a[end]?
If, in fact, the values of a[start - 1] and a[end] can be reflected with two integers, doesn't the whole prefix array no longer serve a purpose? Can't it therefore be removed?
With no need for the prefix array and all storage dependencies on the length of the input removed, your algorithm will use a constant amount of memory to achieve its goal, thereby making it O(n) time and O(1) space.
I would prefer you solve this yourself based on the insights above, as this is homework. Nevertheless, I have included a solution below for reference:
#include <iostream>
using namespace std;
void find( int *data, int &start, int &end )
{
// reflects the prefix sum until start - 1
int sumStart = 0;
// reflects the prefix sum until end
int sumEnd = 0;
for( int i = start; i <= end; i++ )
sumEnd += data[i];
while( start < end )
{
int length = end - start + 1;
int sum = 2 * ( sumEnd - sumStart );
if( sum == length )
break;
else if( sum < length )
{
// sum needs to increase; get rid of the lower endpoint
if( data[ start ] == 0 && data[ end ] == 1 )
{
// sumStart must be updated to reflect the new prefix sum
sumStart += data[ start ];
start++;
}
else
{
// sumEnd must be updated to reflect the new prefix sum
sumEnd -= data[ end ];
end--;
}
}
else
{
// sum needs to decrease; get rid of the higher endpoint
if( data[ start ] == 1 && data[ end ] == 0 )
{
// sumStart must be updated to reflect the new prefix sum
sumStart += data[ start ];
start++;
}
else
{
// sumEnd must be updated to reflect the new prefix sum
sumEnd -= data[ end ];
end--;
}
}
}
}
int main() {
int length;
cin >> length;
// get the data
int data[length];
for( int i = 0; i < length; i++ )
cin >> data[i];
// solve and print the solution
int start = 0, end = length - 1;
find( data, start, end );
if( start == end )
puts( "No soln" );
else
printf( "%d %d\n", start, end );
return 0;
}

This algorithm is O(n) time and O(1) space. It may modify the source array, but it restores all the information afterwards, so it does not work with const arrays. If this puzzle has several solutions, this algorithm picks the one nearest to the beginning of the array, or it can be modified to provide all solutions.
Algorithm
Variables:
p1 - subarray start
p2 - subarray end
d - difference of 1s and 0s in the subarray
Calculate d, if d==0, stop. If d<0, invert the array and after balanced subarray is found invert it back.
While d > 0 advance p2: if the array element is 1, just decrement both p2 and d. Otherwise p2 should pass subarray of the form 11*0, where * is some balanced subarray. To make backtracking possible, 11*0? is changed to 0?*00 (where ? is the value next to the subarray). Then d is decremented.
Store p1 and p2.
Backtrack p2: if the array element is 1, just increment p2. Otherwise we have found an element that was changed in step 2. Revert the changes and pass the subarray of the form 11*0.
Advance p1: if the array element is 1, just increment p1. Otherwise p1 should pass subarray of the form 0*11.
Store p1 and p2, if p2 - p1 improved.
If p2 is at the end of the array, stop. Otherwise continue with step 4.
How does it work
The algorithm iterates through all possible positions of the balanced subarray in the input array. For each subarray position, p1 and p2 are kept as far from each other as possible, providing the locally longest subarray. The subarray with maximum length is chosen among all these subarrays.
To determine the next best position for p1, it is advanced to the first position where the balance between 1s and 0s changes by one (step 5).
To determine the next best position for p2, it is advanced to the last position where the balance between 1s and 0s changes by one. To make this possible, step 2 detects all such positions (starting from the array's end) and modifies the array in such a way that it is possible to iterate through these positions with a linear search (step 4).
While performing step 2, two conditions may be met. The simple one: when the value '1' is found, pointer p2 is just advanced to the next value; no special treatment is needed. But when the value '0' is found, the balance is going in the wrong direction, and it is necessary to pass through several bits until the correct balance is found. All these bits are of no interest to the algorithm: stopping p2 there would give either a balanced subarray that is too short, or an unbalanced subarray. As a result, p2 should pass a subarray of the form 11*0 (from right to left, where * means any balanced subarray). There is no way to walk back the same way in the other direction, but it is possible to temporarily borrow some bits from the pattern 11*0 to allow backtracking. If we change the first '1' to '0', the second '1' to the value next to the rightmost '0', and clear the value next to the rightmost '0' (11*0? -> 0?*00), then we get the possibility to (first) notice the pattern on the way back, since it starts with '0', and (second) find the next good position for p2.
C++ code:
#include <cstddef>
#include <bitset>
static const size_t N = 270;
void findLargestBalanced(std::bitset<N>& a, size_t& p1s, size_t& p2s)
{
// Step 1
size_t p1 = 0;
size_t p2 = N;
int d = 2 * a.count() - N;
bool flip = false;
if (d == 0) {
p1s = 0;
p2s = N;
return;
}
if (d < 0) {
flip = true;
d = -d;
a.flip();
}
// Step 2
bool next = true;
while (d > 0) {
if (p2 < N) {
next = a[p2];
}
--d;
--p2;
if (a[p2] == false) {
if (p2+1 < N) {
a[p2+1] = false;
}
int dd = 2;
while (dd > 0) {
dd += (a[--p2]? -1: 1);
}
a[p2+1] = next;
a[p2] = false;
}
}
// Step 3
p2s = p2;
p1s = p1;
do {
// Step 4
if (a[p2] == false) {
a[p2++] = true;
bool nextToRestore = a[p2];
a[p2++] = true;
int dd = 2;
while (dd > 0 && p2 < N) {
dd += (a[p2++]? 1: -1);
}
if (dd == 0) {
a[--p2] = nextToRestore;
}
}
else {
++p2;
}
// Step 5
if (a[p1++] == false) {
int dd = 2;
while (dd > 0) {
dd += (a[p1++]? -1: 1);
}
}
// Step 6
if (p2 - p1 > p2s - p1s) {
p2s = p2;
p1s = p1;
}
} while (p2 < N);
if (flip) {
a.flip();
}
}

Sum all elements in the array; then diff = (array.length - sum) is the number of 0s (sum is the number of 1s).
If diff is equal to array.length/2, then the maximum subarray = array.
If diff is less than array.length/2 then there are more 1s than 0s.
If diff is greater than array.length/2 then there are more 0s than 1s.
For cases 2 & 3, initialize two pointers, start & end, pointing to the beginning and end of the array. If we have more 1s, then move the pointers inward (start++ or end--) based on whether array[start] == 1 or array[end] == 1, and update sum accordingly. At each step check if sum == (end - start + 1) / 2. If this condition is true, then start and end represent the bounds of your maximum subarray.
Here we end up doing two passes of the array: once to calculate sum, and once while moving the pointers inward. And we are using constant space, as we just need to store sum and two index values.
If anyone wants to knock up some pseudocode, you're more than welcome :)
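Since pseudocode was invited, here is a rough Python sketch of the shrinking procedure described above. This is my own interpretation of the answer, not the poster's code, so treat it as illustrative:
def largest_balanced(arr):
    # arr: list of 0s and 1s; shrink from the ends until counts of 1s and 0s match
    ones = sum(arr)
    zeros = len(arr) - ones
    start, end = 0, len(arr) - 1
    while start < end and ones != zeros:
        if ones > zeros:                      # too many 1s: try to drop a 1 from an end
            if arr[start] == 1:
                start += 1; ones -= 1
            elif arr[end] == 1:
                end -= 1; ones -= 1
            else:                             # both ends are 0; shed one anyway to keep making progress
                start += 1; zeros -= 1
        else:                                 # too many 0s: try to drop a 0 from an end
            if arr[start] == 0:
                start += 1; zeros -= 1
            elif arr[end] == 0:
                end -= 1; zeros -= 1
            else:
                start += 1; ones -= 1
    return (start, end) if ones == zeros and ones > 0 else None

print(largest_balanced([1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0]))  # (2, 15)
The extra else branches are my own assumption (the description above doesn't say what to do when neither end holds the majority value); without some such rule the loop could stall.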

Here's an ActionScript solution that looked like it was scaling as O(n), though it might be more like O(n log n). It definitely uses only O(1) memory.
Warning I haven't checked how complete it is. I could be missing some cases.
protected function findLongest(array:Array, start:int = 0, end:int = -1):int {
if (end < start) {
end = array.length-1;
}
var startDiff:int = 0;
var endDiff:int = 0;
var diff:int = 0;
var length:int = end-start;
for (var i:int = 0; i <= length; i++) {
if (array[i+start] == '1') {
startDiff++;
} else {
startDiff--;
}
if (array[end-i] == '1') {
endDiff++;
} else {
endDiff--;
}
//We can stop when there's no chance of equalizing anymore.
if (Math.abs(startDiff) > length - i) {
diff = endDiff;
start = end - i;
break;
} else if (Math.abs(endDiff) > length - i) {
diff = startDiff;
end = i+start;
break;
}
}
var bit:String = diff > 0 ? '1': '0';
var diffAdjustment:int = diff > 0 ? -1: 1;
//Strip off the bad vars off the ends.
while (diff != 0 && array[start] == bit) {
start++;
diff += diffAdjustment;
}
while(diff != 0 && array[end] == bit) {
end--;
diff += diffAdjustment;
}
//If we have equalized end. Otherwise recurse within the sub-array.
if (diff == 0)
return end-start+1;
else
return findLongest(array, start, end);
}

I would argue that it is impossible for an algorithm with O(1) space to exist, in the following way. Assume you iterate ONCE over every bit. This requires a counter, which needs O(log n) space. Possibly one could argue that n itself is part of the problem instance; then the input length for a binary string of length k is k + log2(k). However you look at it, you need an additional variable, e.g. an index into that array, and that already makes it non-O(1).
Usually you don't have this problem, because for a problem of size n you have an input of n numbers of size log k, which adds up to n log k; a variable of length log k is then just O(1). But here our log k is just 1, so we can only introduce helper variables of constant length (and I mean really constant: bounded regardless of how big n is).
Here one problem with the description of the problem becomes visible. In computation theory you have to be very careful about your encoding. E.g. you can make NP problems polynomial if you switch to unary encoding (because then the input size is exponentially bigger than in an n-ary (n > 1) encoding).
Since the number n itself has an encoding of size only log2(n), one must be careful: when you speak of O(n) in this case, it is really an algorithm that is exponential in the size of the input. (This is not a point we need to argue about, because one can debate whether n itself is part of the description or not.)

I have this algorithm running in O(n) time and O(1) space.
It makes use of a simple "shrink-then-expand" trick. Comments are in the code.
public static void longestSubArrayWithSameZerosAndOnes() {
// You are given an array of 1's and 0's only.
// Find the longest subarray which contains equal number of 1's and 0's
int[] A = new int[] {1, 0, 1, 1, 1, 0, 0,0,1};
int num0 = 0, num1 = 0;
// First, calculate how many 0s and 1s in the array
for(int i = 0; i < A.length; i++) {
if(A[i] == 0) {
num0++;
}
else {
num1++;
}
}
if(num0 == 0 || num1 == 0) {
System.out.println("The length of the sub-array is 0");
return;
}
// Second, check the array to find a continuous "block" that has
// the same number of 0s and 1s, starting from the HEAD and the
// TAIL of the array, and moving the 2 "pointer" (HEAD and TAIL)
// towards the CENTER of the array
int start = 0, end = A.length - 1;
while(num0 != num1 && start < end) {
if(num1 > num0) {
if(A[start] == 1) {
num1--; start++;
}
else if(A[end] == 1) {
num1--; end--;
}
else {
num0--; start++;
num0--; end--;
}
}
else if(num1 < num0) {
if(A[start] == 0) {
num0--; start++;
}
else if(A[end] == 0) {
num0--; end--;
}
else {
num1--; start++;
num1--; end--;
}
}
}
if(num0 == 0 || num1 == 0) {
start = end;
end++;
}
// Third, expand the continuous "block" just found at step #2 by
// moving "HEAD" to head of the array and "TAIL" to the end of
// the array, while still keeping the "block" balanced(containing
// the same number of 0s and 1s
while(0 < start && end < A.length - 1) {
if(A[start - 1] == 0 && A[end + 1] == 0 || A[start - 1] == 1 && A[end + 1] == 1) {
break;
}
start--;
end++;
}
System.out.println("The length of the sub-array is " + (end - start + 1) + ", starting from #" + start + " to #" + end);
}

Linear time, constant space. Let me know if there are any bugs I missed.
Tested in Python 3.
def longestBalancedSubarray(A):
    lo, hi = 0, len(A) - 1
    ones = sum(A); zeros = len(A) - ones
    while lo < hi:
        if ones == zeros: break
        if ones > zeros:
            if A[lo] == 1: lo += 1; ones -= 1
            elif A[hi] == 1: hi -= 1; ones -= 1
            else: lo += 1; zeros -= 1
        else:
            if A[lo] == 0: lo += 1; zeros -= 1
            elif A[hi] == 0: hi -= 1; zeros -= 1
            else: lo += 1; ones -= 1
    return(A[lo:hi+1])

Related

Time complexity in terms of n if time complexity is O(x*y) where x+y = n

This code is to move all the zeroes in the vector to the end of the vector while maintaining the order of the non zero elements.
Eg: 0 3 0 8 0 9
Output : 3 8 9 0 0 0
I wrote the following code for this
void moveZeroes(vector<int>& nums) {
    vector<int> v, v1; // v has the indexes of all the zero elements while v1 has the indexes of the non-zero elements
    for (int i = 0; i < nums.size(); i++) {
        if (nums[i] == 0)
            v.push_back(i);
        else
            v1.push_back(i);
    }
    // Here I'm swapping all the zero elements with non-zero elements
    for (int i = 0; i < v.size(); i++) {
        for (int j = 0; j < v1.size(); j++) {
            if (v[i] < v1[j]) {
                swap(nums[v[i]], nums[v1[j]]);
                v[i] = v1[j];
            }
        }
    }
}
So if nums has size n and v has size x & v1 has size y where x + y = n, then time complexity is O(x*y) . But what will be the time complexity in terms of n?
Let N be the number of elements in the vector.
In the best case the time complexity would be linear, i.e, O(N). This happens in 2 cases:
when all the elements are non-zero. The second loop in which you swap elements would not be executed (v would be empty)
when all the elements are zero. In such a case only the first part of the second loop would run (v1 would be empty).
In the other cases you incur in a quadratic time complexity O(N^2) due to the double loop.
For instance, suppose that half of the elements are zero and the other half are non-zero. The number of iterations would then be:
N/2 * N/2 = N^2/4 = O(N^2)
So if nums has size n and v has size x & v1 has size y where x + y = n, then time complexity is O(x*y) . But what will be the time complexity in terms of n?
Could be up to O(n^2) if x and y are roughly n/2, e.g. y = x = n/2 => x * y = n^2/4.
I recommend you to do the following:
void moveZeroes(vector<int>& nums)
{
    for (int i = 0, p = 0; i < (int)nums.size(); ++i)
        if (nums[i])
        {
            if (p != i)
                swap(nums[i], nums[p]);
            ++p;
        }
}
Pointer p indicates how many non-zero elements have been swapped into place.
This way you get O(n) time complexity, the code is clearer, and it doesn't use an extra O(n) of memory.

Maximum binary number a binary string will result to if only one operation is allowed i.e. Right-Rotate By K-Bits where K = [0, Length of String]

Suppose you have a binary string S and you are only allowed to do one operation i.e. Right-Rotate by K-bits where K = [0, Length of the string]. Write an algorithm that will print the maximum binary-number you can create by the defined process.
For Example:
S = [00101] then maximum value I can get from the process is 10100 i.e. 20.
S = [011010] then maximum value I can get from the process is 110100 i.e. 52.
S = [1100] then maximum value I can get from the process is 1100 i.e. 12.
The length of the string S has an upper-limit i.e. 5*(10^5).
The idea I thought of is quite naive: we know that when you right-rotate any binary number by 1 bit repeatedly, you get the same binary number back after m rotations, where m is the number of bits used to represent that number.
So I right-rotate by 1 until I get back to the number I started with, and during the process I keep track of the maximum value I encountered; at the end I print that maximum.
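For what it's worth, here is a tiny Python sketch of this naive rotate-and-track method (my own illustration, not an efficient solution); since all rotations have the same length, comparing the rotated strings lexicographically is the same as comparing their values:
def max_right_rotation(s):
    # Returns the largest right-rotation of the bit-string s (O(n^2) overall).
    best = s
    for k in range(1, len(s)):
        rotated = s[-k:] + s[:-k]       # right-rotate by k bits
        if rotated > best:
            best = rotated
    return best

print(max_right_rotation("00101"))      # 10100
print(max_right_rotation("011010"))     # 110100
print(max_right_rotation("1100"))       # 1100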
Is there an efficient approach to solve the problem?
UPD1: This the source of the problem One-Zero, it all boils down to the statement I have described above.
UPD2: As the answer can be huge, the program will print the answer modulo 10^9 + 7.
You want to find the largest number expressed in a binary encoded string with wrap around.
Here are steps for a solution:
let len be the length of the string
allocate an array of size 2 * len and duplicate the string to it.
using linear search, find the position pos of the largest substring of length len in this array (lexicographical order can be used for that).
compute res, the converted number modulo 10^9+7, reading len bits starting at pos.
free the array and return res.
Here is a simple implementation as a function:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
long max_bin(const char *S) {
size_t i, pos, len;
char *a;
// long has at least 31 value bits, enough for numbers upto 2 * 1000000007
long res;
if ((len = strlen(S)) == 0)
return 0;
if ((a = malloc(len + len)) == NULL)
return -1;
memcpy(a, S, len);
memcpy(a + len, S, len);
// find the smallest right rotation for the greatest substring
for (pos = i = len; --i > 0;) {
if (memcmp(a + i, a + pos, len) > 0)
pos = i;
}
res = 0;
for (i = 0; i < len; i++) {
res = res + res + a[pos + i] - '0';
if (res >= 1000000007)
res -= 1000000007;
}
free(a);
return res;
}
int main(int argc, char *argv[]) {
for (int i = 1; i < argc; i++) {
printf("[%s] -> %ld\n", argv[i], max_bin(argv[i]));
}
return 0;
}
It is feasible to avoid memory allocation if it is a requirement.
It's me again.
I got to thinking a bit more about your problem in the shower this morning, and it occurred to me that you could do a QuickSelect (if you're familiar with that) over an array of the start indexes of the input string and determine the index of the most "valuable" rotate based on that.
What I show here does not concern itself with presenting the result the way you are required to, only with determining what the best offset for rotation is.
This is not a textbook QuickSelect implementation but rather a simplified method that does the same thing while taking into account that it's a string of zeros and ones that we are dealing with.
Main driver logic:
static void Main(string[] args)
{
Console.WriteLine(FindBestIndex("")); // exp -1
Console.WriteLine(FindBestIndex("1")); // exp 0
Console.WriteLine(FindBestIndex("0")); // exp 0
Console.WriteLine(FindBestIndex("110100")); // exp 0
Console.WriteLine(FindBestIndex("100110")); // exp 3
Console.WriteLine(FindBestIndex("01101110")); // exp 4
Console.WriteLine(FindBestIndex("11001110011")); // exp 9
Console.WriteLine(FindBestIndex("1110100111110000011")); // exp 17
}
Set up the index array that we'll be sorting, then call FindHighest to do the actual work:
static int FindBestIndex(string input)
{
if (string.IsNullOrEmpty(input))
return -1;
int[] indexes = new int[input.Length];
for (int i = 0; i < indexes.Length; i++)
{
indexes[i] = i;
}
return FindHighest(input, indexes, 0, input.Length);
}
Partition the index array into two halves depending on whether each index points to a string that starts with zero or one at this offset within the string.
Once that's done, if we have just one element that started with one, we have the best string, else if we have more, partition those based on the next index. If none started with one, proceed with zero in the same way.
static int FindHighest(string s, int[] p, int index, int len)
{
// allocate two new arrays,
// one for the elements of p that have zero at this index, and
// one for the elements of p that have one at this index
int[] zero = new int[len];
int[] one = new int[len];
int count_zero = 0;
int count_one = 0;
// run through p and distribute its elements to 'zero' and 'one'
for (int i = 0; i < len; i++)
{
int ix = p[i]; // index into string
int pos = (ix + index) % s.Length; // wrap for string length
if (s[pos] == '0')
{
zero[count_zero++] = ix;
}
else
{
one[count_one++] = ix;
}
}
// if we have a one in this position, use that, else proceed with zero (below)
if (count_one > 1)
{
// if we have more than one, sort on the next position (index)
return FindHighest(s, one, index + 1, count_one);
} else if (count_one == 1)
{
// we have exactly one entry left in ones, so that's the best one we have overall
return one[0];
}
if (count_zero > 1)
{
// if we have more than one, sort on the next position (index)
return FindHighest(s, zero, index + 1, count_zero);
}
else if (count_zero == 1)
{
// we have exactly one entry left in zeroes, and we didn't have any in ones,
// so this is the best one we have overall
return zero[0];
}
return -1;
}
Note that this can be optimized further by expanding the logic: If the input string has any ones at all, there's no point in adding indexes where the string starts with zero to the index array in FindBestIndex since those will be inferior. Also, if an index does start with a one but the previous index also did, you can omit the current one as well because the previous index will always be better since that string flows into this character.
If you like, you can also refactor to remove the recursion in favor of a loop.
If I were tackling this I would do so as follows.
I think it's all to do with counting alternating runs of '1' and runs of '0', treating a run of '1's followed by a run of '0's as a pair, then bashing a list of those pairs.
Let us start by scanning to the first '1', and setting start position s. Then count each run of '1's c1 and the following run of '0's c0, creating pairs (c1,c0).
The scan then proceeds forwards to the end, wrapping round as required. If we represent runs of one or more '1' and '0' as single digits, and '|' as the start and end of the string, then we have cases:
|10101010|
^ initial start position s -- the string ends neatly with a run of '0's
|1010101|
^ new start position s -- the string starts and ends in a '1', so the first
run of '1's and run of '0's are rotated (left),
to be appended to the last run of '1's
Note that this changes our start position.
|01010101|
^ initial start position s -- the first run of '0's is rotated (left),
to follow the last run of '1's.
|010101010|
^ initial start position s -- the first run of '0's is rotated (left),
to be appended to the last run of '0's.
NB: if the string both starts and ends with a '1', there are, initially, n runs of '0's and n+1 runs of '1's, but the rotation reduces that to n runs of '1's. And similarly, but conversely, if the string both starts and ends with a '0'.
Let us use A as shorthand for the pair (a1,a0). Suppose we have another pair, X -- (x1,x0) -- then can compare the two pairs, thus:
if a1 > x1 or (a1 = x1 and a0 < x0) => A > X -- A is the better start
if a1 = x1 and a0 = x0 => A = X
if a1 < x1 or (a1 = x1 and a0 > x0) => A < X -- X is the better start
The trick is probably to pack each pair into an integer -- say (x1 * N) - x0, where N is at least the maximum allowed length of the string -- for ease of comparison.
During the scan of the string (described above) let us construct a vector of pairs. During that process, collect the largest pair value A, and a list of the positions, s, of each appearance of A. Each s on the list is a potential best start position. (The recorded s needs to be the index in the vector of pairs and the offset in the original string.)
[If the input string is truly vast, then constructing the entire vector of all pairs will eat memory. In which case the vector would need to be handled as a "virtual" vector, and when an item in that vector is required, it would have to be created by reading the respective portion of the actual string.]
Now:
let us simplify groups of two or more contiguous A. Clearly the second and subsequent A's in such a group cannot be the best start, since there is a better one immediately to the left. So, in the scan we need to record only the s for the first A of such groups.
if the string starts with one or more A's and ends with one or more A's, need to "rotate" to collect those as a single group, and record the s only for the leftmost A in that group (in the usual way).
If there is only one s on the list, our work is done. If the string is end-to-end A, that will be spotted here.
Otherwise, we need to consider the pairs which follow each of the s for our (initial) A's -- where when we say 'follow' we include wrapping round from the end to the start of the string (and, equivalently, the list of pairs).
NB: at this point we know that all the (initial) A's on our list are followed by zero or more A's and then at least one x, where x < A.
So, set i = 1, and consider all the pairs at s+i for our list of s. Keep only the s for the instances of the largest pair found. So for i = 1, in this example we are considering pairs x, y and z:
...Ax....Az...Az..Ax...Ay...
And if x is the largest, this pass discards Ay and both Az. If only one s remains -- in the example, y is the largest -- our work is done. Otherwise, repeat for i = i+1.
There is one last (I think) wrinkle. Suppose after finding z to be the largest of the ith pairs, we have:
...A===zA===z...
where the two runs === are the same as each other. By the same logic that told us to ignore second and subsequent A's in runs of same, we can now discard the second A===z. Indeed we can discard third, fourth, etc. contiguous A===z. Happily that deals with the extreme case of (say):
=zA===zA===zA==
where the string is a sequence of A===z !
I dunno, that all seems more complicated than I expected when I set out with my pencil and paper :-(
I imagine someone much cleverer than I can reduce this to some standard greatest prefix-string problem.
I was bored today, so I knocked out some code (and revised it on 10-Apr-2020).
typedef unsigned int uint ; // assume that's uint32_t or similar
enum { max_string_len = 5 * 100000 } ; // 5 * 10^5
typedef uint64_t pair_t ;
static uint
one_zero(const char* str, uint N)
{
enum { debug = false } ;
void* mem ;
size_t nmx ;
uint s1, s0 ; // initial run of '1'/'0's to be rotated
uint s ;
pair_t* pv ; // pair vector
uint* psi ; // pair string index
uint* spv ; // starts vector -- pair indexes
uint pn ; // count of pairs
uint sn ; // count of starts
pair_t bp ; // current best start pair value
uint bpi ; // current best start pair index
uint bsc ; // count of further best starts
char ch ;
if (N > max_string_len)
{
printf("*** N = %u > max_string_len (%u)\n", N, max_string_len) ;
return UINT_MAX ;
} ;
if (N < 1)
{
printf("*** N = 0 !!\n") ;
return UINT_MAX ;
} ;
// Decide on initial start position.
s = s1 = s0 = 0 ;
if (str[0] == '0')
{
// Start at first '1' after initial run of '0's
do
{
s += 1 ;
if (s == N)
return 0 ; // String is all '0's !!
}
while (str[s] == '0') ;
s0 = s ; // rotate initial run of '0's
}
else
{
// First digit is '1', but if last digit is also '1', need to rotate.
if (str[N-1] == '1')
{
// Step past the leading run of '1's and set length s1.
// This run will be appended to the last run of '1's in the string
do
{
s += 1 ;
if (s == N)
return 0 ; // String is all '1's !!
}
while (str[s] == '1') ;
s1 = s ; // rotate initial run of '1's
// Step past the (now) leading run of '0's and set length s0.
// This run will be appended to the last run of '1's in the string
//
// NB: we know there is at least one '0' and at least one '1' before
// the end of the string
do { s += 1 ; } while (str[s] == '0') ;
s0 = s - s1 ;
} ;
} ;
// Scan the string to construct the vector of pairs and the list of potential
// starts.
nmx = (((N / 2) + 64) / 64) * 64 ;
mem = malloc(nmx * (sizeof(pair_t) + sizeof(uint) + sizeof(uint))) ;
pv = (pair_t*)mem ;
spv = (uint*)(pv + nmx) ;
psi = (uint*)(spv + nmx) ;
pn = 0 ;
bp = 0 ; // no pair is 0 !
bpi = 0 ;
bsc = 0 ; // no best yet
do
{
uint x1, x0 ;
pair_t xp ;
psi[pn] = s ;
x1 = x0 = 0 ;
do
{
x1 += 1 ;
s += 1 ;
ch = (s < N) ? str[s] : '\0' ;
}
while (ch == '1') ;
if (ch == '\0')
{
x1 += s1 ;
x0 = s0 ;
}
else
{
do
{
x0 += 1 ;
s += 1 ;
ch = (s < N) ? str[s] : '\0' ;
}
while (str[s] == '0') ;
if (ch == '\0')
x0 += s0 ;
} ;
// Register pair (x1,x0)
reg:
pv[pn] = xp = ((uint64_t)x1 << 32) - x0 ;
if (debug && (N == 264))
printf("si=%u, sn=%u, pn=%u, xp=%lx bp=%lx\n", psi[sn], sn, pn, xp, bp) ;
if (xp > bp)
{
// New best start.
bpi = pn ;
bsc = 0 ;
bp = xp ;
}
else
bsc += (xp == bp) ;
pn += 1 ;
}
while (ch != '\0') ;
// If there are 2 or more best starts, need to find them all, but discard
// second and subsequent contiguous ones.
spv[0] = bpi ;
sn = 1 ;
if (bsc != 0)
{
uint pi ;
bool rp ;
pi = bpi ;
rp = true ;
do
{
pi += 1 ;
if (pv[pi] != bp)
rp = false ;
else
{
bsc -= 1 ;
if (!rp)
{
spv[sn++] = pi ;
rp = true ;
} ;
} ;
}
while (bsc != 0) ;
} ;
// We have: pn pairs in pv[]
// sn start positions in sv[]
for (uint i = 1 ; sn > 1 ; ++i)
{
uint sc ;
uint pi ;
pair_t bp ;
if (debug && (N == 264))
{
printf("i=%u, sn=%u, pv:", i, sn) ;
for (uint s = 0 ; s < sn ; ++s)
printf(" %u", psi[spv[s]]) ;
printf("\n") ;
} ;
pi = spv[0] + i ; // index of first s+i pair
if (pi >= pn) { pi -= pn ; } ;
bp = pv[pi] ; // best value, so far.
sc = 1 ; // new count of best starts
for (uint sj = 1 ; sj < sn ; ++sj)
{
// Consider the ith pair ahead of sj -- compare with best so far.
uint pb, pj ;
pair_t xp ;
pb = spv[sj] ;
pj = pb + i ; // index of next s+i pair
if (pj >= pn) { pj -= pn ; } ;
xp = pv[pj] ;
if (xp == bp)
{
// sj is equal to the best so far
//
// So we keep both, unless we have the 'A==zA==z' case,
// where 'z == xp == sp', the current 'ith' position.
uint pa ;
pa = pi + 1 ;
if (pa == pn) { pa = 0 ; } ; // position after first 'z'
// If this is not an A==zA==z case, keep sj
// Otherwise, drop sj (by not putting it back into the list),
// but update pi so can spot A==zA==zA===z etc. cases.
if (pa != pb)
spv[sc++] = spv[sj] ; // keep sj
else
pi = pj ; // for further repeats
}
else if (xp < bp)
{
// sj is less than best -- do not put it back into the list
}
else
{
// sj is better than best -- discard everything so far, and
// set new best.
sc = 1 ; // back to one start
spv[0] = spv[sj] ; // new best
pi = pj ; // new index of ith pair
bp = xp ; // new best pair
} ;
} ;
sn = sc ;
} ;
s = psi[spv[0]] ;
free(mem) ;
return s ;
}
I have tested this against the brute force method given elsewhere, and as far as I can see this is (now, on 10-Apr-2020) working code.
When I timed this on my machine, for 100,000 random strings of 400,000..500,000 characters (at random) I got:
Brute Force: 281.8 secs CPU
My method: 130.3 secs CPU
and that's excluding the 8.3 secs to construct the random string and run an empty test. (That may sound a lot, but for 100,000 strings of 450,000 characters, on average, that's a touch less than 1 CPU cycle per character.)
So for random strings, my complicated method is a little over twice as fast as brute-force. But it uses ~N*16 bytes of memory, where the brute-force method uses N*2 bytes. Given the effort involved, the result is not hugely gratifying.
However, I also tried two pathological cases, (1) repeated "10" and (2) repeated "10100010" and for just 1000 (not 100000) strings of 400,000..500,000 characters (at random) I got:
Brute Force: (1) 1730.9 (2) 319.0 secs CPU
My method: 0.7 0.7 secs CPU
That O(n^2) will kill you every time !
#include <iostream>
#include <string>
#include <climits>
#include <math.h>
using namespace std;

int convt(int N, string S)
{
    int sum = 0;
    for (int i = 0; i < N; i++)
    {
        int num = S[i];
        sum += pow(2, N-1-i) * (num - 48);
    }
    return sum;
}

string rot(int N, string S)
{
    int temp = S[0];
    for (int i = 0; i < N-1; i++)
        S[i] = S[i+1];
    S[N-1] = temp;
    return S;
}

int main() {
    int t;
    cin >> t;
    while (t--)
    {
        int N, K;
        cin >> N;
        cin >> K;
        char S[N+1];
        for (int i = 0; i < N; i++)
            cin >> S[i];
        S[N] = '\0';                 // null-terminate before building the std::string
        string SS = S;
        int mx_val = INT_MIN;        // INT_MIN requires <climits>
        for (int i = 0; i < N; i++)
        {
            string S1 = rot(N, SS);
            SS = S1;
            int k_val = convt(N, SS);
            if (k_val > mx_val)
                mx_val = k_val;
        }
        int ki = 0;
        int j = 0;
        string S2 = S;
        while (ki != K)
        {
            S2 = rot(N, S2);
            if (convt(N, S2) == mx_val)
                ki++;
            j++;
        }
        cout << j << endl;
    }
}

C : Sum of reverse numbers

So I want to solve an exercise in C or in SML but I just can't come up with an algorithm that does so. Firstly I will write the exercise and then the problems I'm having with it so you can help me a bit.
EXERCISE
We define the reverse number of a natural number N as the natural number Nr which is produced by reading N from right to left, beginning at the first non-zero digit. For example, if N = 4236 then Nr = 6324, and if N = 5400 then Nr = 45.
So, given any natural number G (1 ≤ G ≤ 10^100000), write a program in C that tests whether G can be expressed as the sum of a natural number N and its reverse Nr. If there is such a number, the program must return this N. If there isn't, the program must return 0. The input number G will be given through a txt file consisting of only one line.
For example, using C, if number1.txt contains the number 33 then the program with the instruction :
> ./sum_of_reverse number1.txt
could return for example 12, because 12+21 = 33 or 30 because 30 + 3 = 33. If number1.txt contains the number 42 then the program will return 0.
Now in ML if number1.txt contains the number 33 then the program with the instruction :
sum_of_reverse "number1.txt";
it will return:
val it = "12" : string
The program must run in about 10 sec with a space limit : 256MB
The problems I'm having
At first I tried to find the patterns that numbers with this property exhibit. I found out that numbers like 11, 22, 33, 44, 888 or numbers like 1001, 40004, 330033 can easily be written as a sum of reverse numbers. But then I realized these patterns seem endless, because of numbers such as 14443 = 7676 + 6767 or 115950 = 36987 + 78963.
Even if I include all the above patterns in my algorithm, my program won't run in 10 seconds for very big numbers, because I will have to find the length of the given number, which takes a lot of time.
Because the number will be given through a txt file, in the case of a number with 999999 digits I guess I just can't read the whole value into a single variable. The same goes for the result. I assume you would save it to a txt file first and then print it??
So I assume that I should find an algorithm that takes a group of digits from the txt file, checks them for something, and then proceeds to the next group of digits...?
Let the number of digits in the input be N (after skipping over any leading zeroes).
Then - if my analysis below is correct - the algorithm requires only approximately N bytes of space and a single loop which runs approximately N/2 times.
No special "big number" routines or recursive functions are required.
Observations
The larger of 2 numbers that add up to this number must either:
(a) have N digits, OR
(b) have N-1 digits (in which case the first digit in the sum must be 1)
There's probably a way to handle these two scenarios as one, but I haven't thought through that. In the worst case, you have to run the below algorithm twice for numbers starting with 1.
Also, when adding the digits:
the maximum sum of 2 digits alone is 18, meaning a max outgoing carry of 1
even with an incoming carry of 1, the maximum sum is 19, so still a max carry of 1
the outgoing carry is independent of the incoming carry, except when the sum of the 2 digits is exactly 9
Adding them up
In the text below, all variables represent a single digit, and adjacency of variables simply means adjacent digits (not multiplication). The ⊕ operator denotes the sum modulo 10. I use the notation xc XS to denote the carry (0-1) and sum (0-9) digits that result from adding 2 digits.
Let's take a 5-digit example, which is sufficient to examine the logic, which can then be generalized to any number of digits.
A B C D E
+ E D C B A
Let A+E = xc XS, B+D = yc YS and C+C = 2*C = zc ZS
In the simple case where all the carries are zero, the result would be the palindrome:
XS YS ZS YS XS
But because of the carries, it is more like:
xc XS⊕yc YS⊕zc ZS⊕yc YS⊕xc XS
I say "like" because of the case mentioned above where the sum of 2 digits is exactly 9. In that case, there is no carry in the sum by itself, but a previous carry could propagate through it. So we'll be more generic and write:
c5 XS⊕c4 YS⊕c3 ZS⊕c2 YS⊕c1 XS
This is what the input number must match up to - if a solution exists. If not, we'll find something that doesn't match and exit.
(Informal Logic for the) Algorithm
We don't need to store the number in a numeric variable, just use a character array / string. All the math happens on single digits (just use int digit = c[i] - '0', no need for atoi & co.)
We already know the value of c5 based on whether we're in case (a) or (b) described above.
Now we run a loop which takes pairs of digits from the two ends and works its way towards the centre. Let's call the two digits being compared in the current iteration H and L.
So the loop will compare:
XS⊕c4 and XS
YS⊕c3 and YS⊕c1
etc.
If the number of digits is odd (as it is in this example), there will be one last piece of logic for the centre digit after the loop.
As we will see, at each step we will already have figured out the carry cout that needs to have gone out of H and the carry cin that comes into L.
(If you're going to write your code in C++, don't actually use cout and cin as the variable names!)
Initially, we know that cout = c5 and cin = 0, and quite clearly XS = L directly (use L⊖cin in general).
Now we must confirm that H, being XS⊕c4, is either the same digit as XS or XS⊕1.
If not, there is no solution - exit.
But if it is, so far so good, and we can calculate c4 = H⊖L. Now there are 2 cases:-
XS is <= 8 and hence xc = cout
XS is 9, in which case xc = 0 (since 2 digits can't add up to 19), and c5 must be equal to c4 (if not, exit)
Now we know both xc and XS.
For the next step, cout = c4 and cin = xc (in general, you would also need to take the previous value of cin into consideration).
Now when comparing YS⊕c3 and YS⊕c1, we already know c1 = cin and can compute YS = L⊖c1.
The rest of the logic then follows as before.
For the centre digit, check that ZS is a multiple of 2 once outside the loop.
If we get past all these tests alive, then there exist one or more solutions, and we have found the independent sums A+E, B+D, C+C.
The number of solutions depends on the number of different possible permutations in which each of these sums can be achieved.
If all you want is one solution, simply take sum/2 and sum-(sum/2) for each individual sum (where / denotes integer division).
Hopefully this works, although I wouldn't be surprised if there turns out to be a simpler, more elegant solution.
Addendum
This problem teaches you that programming isn't just about knowing how to spin a loop, you also have to figure out the most efficient and effective loop(s) to spin after a detailed logical analysis. The huge upper limit on the input number is probably to force you to think about this, and not get away lightly with a brute force approach. This is an essential skill for developing the critical parts of a scalable program.
I think you should deal with your numbers as C strings. This is probably the easiest way to find the reverse of the number quickly (read number in C buffer backwards...) Then, the fun part is writing a "Big Number" math routines for adding. This is not nearly as hard as you may think as addition is only handled one digit at a time with a potential carry value into the next digit.
Then, for a first pass, start at 0 and see if G is its reverse. Then 0+1 and G-1, then... keep looping until G/2 and G/2. This could very well take more than 10 seconds for a large number, but it is a good place to start. (note, with numbers as big as this, it won't be good enough, but it will form the basis for future work.)
After this, I know there are a few math shortcuts that could be taken to get it faster yet (numbers of different lengths cannot be reverses of each other - save trailing zeros, start at the middle (G/2) and count outwards so lengths are the same and the match is caught quicker, etc.)
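As a hedged illustration of that brute-force first pass (my own sketch, fine for small G but nowhere near the 10^100000 limit):
def reverse_number(n):
    # "reverse" as defined in the exercise: read right to left, dropping leading zeros
    return int(str(n)[::-1])

def sum_of_reverse(G):
    # Return some N with N + reverse(N) == G, or 0 if none exists (brute force).
    for n in range(G + 1):
        if n + reverse_number(n) == G:
            return n
    return 0

print(sum_of_reverse(33))   # 12, since 12 + 21 = 33
print(sum_of_reverse(42))   # 0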
Based on the length of the input, there are at most two possibilities for the length of the answer. Let's try both of them separately. For the sake of example, let's suppose the answer has 8 digits, ABCDEFGH. Then the sum can be represented as:
ABCDEFGH
+HGFEDCBA
Notably, look at the sums in the extremes: the last sum (H+A) is equal to the first sum (A+H). You can also look at the next two sums: G+B is equal to B+G. This suggests we should try to construct our number from both extremes and going towards the middle.
Let's pick the extremes simultaneously. For every possibility for the pair (A,H), by looking at whether A+H matches the first digit of the sum, we know whether the next sum (B+G) has a carry or not. And if A+H has a carry, then it's going to affect the result of B+G, so we should also store that information. Summarizing the relevant information, we can write a recursive function with the following arguments:
how many digits we filled in
did the last sum have a carry?
should the current sum have a carry?
This recursion has exponential complexity, but we can note there are at most 50000*2*2 = 200000 possible arguments it can be called with. Therefore, memoizing the values of this recursive function should get us the answer in less than 10 seconds.
Example:
Input is 11781, let's suppose answer has 4 digits.
ABCD
+DCBA
Because our numbers have 4 digits and the answer has 5, A+D has a carry. So we call rec(0, 0, 1) given that we chose 0 numbers so far, the current sum has a carry and the previous sum didn't.
We now try all possibilities for (A,D). Suppose we choose (A,D) = (9,2). 9+2 matches both the first and final 1 in the answer, so it's good. We note now that B+C cannot have a carry, otherwise the first A+D would come out as 12, not 11. So we call rec(2, 1, 0).
We now try all possibilities for (B,C). Suppose we choose (B,C) = (3,3). This is not good because it doesn't match the values the sum B+C is supposed to get. Suppose we choose (B,C) = (4,3). 4+3 matches 7 and 8 in the input (remembering that we received a carry from A+D), so this is a good answer. Return "9432" as our answer.
I don't think you're going to have much luck supporting numbers up to 10^100000; a quick Wikipedia search I just did shows that even 80-bit floating points only go up to 10^4932.
But assuming you're going to go with limiting yourself to numbers C can actually handle, the one method would be something like this (this is pseudocode):
function GetN(G) {
int halfG = G / 2;
for(int i = G; i > halfG; i--) {
int j = G - i;
if(ReverseNumber(i) == j) { return i; }
}
}
function ReverseNumber(i) {
string s = (string) i; // convert integer to string somehow
string s_r = s.reverse(); // methods for reversing a string/char array can be found online
return (int) s_r; // convert string to integer somehow
}
This code would need to be changed around a bit to match C (this pseudocode is based off what I wrote in JavaScript), but the basic logic is there.
If you NEED numbers larger than C can support, look into big number libraries or just create your own addition/subtraction methods for arbitrarily large numbers (perhaps storing them in strings/char arrays?).
A way to make the program faster would be this one...
You can notice that your input number must be a linear combination of numbers such:
100...001,
010...010,
...,
and the last one will be 0...0110...0 if #digits is even or 0...020...0 if #digits is odd.
Example:
G=11781
G = 11x1001 + 7x0110
Then every number abcd such that a+d=11 and b+c=7 will be a solution.
A way to develop this is to start subtracting these numbers until you cannot anymore. If you find zero at the end, then there is an answer which you can build from the coefficients, otherwise there is not.
I made this and it seems to work:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int Counter (FILE * fp);
void MergePrint (char * lhalf, char * rhalf);
void Down(FILE * fp1, FILE * fp2, char * lhalf, char * rhalf, int n);
int SmallNums (FILE * fp1, int n);
int ReverseNum (int n);
int main(int argc, char* argv[])
{
int dig;
char * lhalf = NULL, * rhalf = NULL;
unsigned int len_max = 128;
unsigned int current_size_k = 128;
unsigned int current_size_l = 128;
lhalf = (char *)malloc(len_max);
rhalf =(char *)malloc(len_max);
FILE * fp1, * fp2;
fp1 = fopen(argv[1],"r");
fp2 = fopen(argv[1],"r");
dig = Counter(fp1);
if ( dig < 3)
{
printf("%i\n",SmallNums(fp1,dig));
}
else
{
int a,b,prison = 0, ten = 0, i = 0,j = dig -1, k = 0, l = 0;
fseek(fp1,i,0);
fseek(fp2,j,0);
if ((a = fgetc(fp1)- '0') == 1)
{
if ((fgetc(fp1)- '0') == 0 && (fgetc(fp2) - '0') == 9)
{
lhalf[k] = '9';
rhalf[l] = '0';
i++; j--;
k++; l++;
}
i++;
prison = 0;
ten = 1;
}
while (i <= j)
{
fseek(fp1,i,0);
fseek(fp2,j,0);
a = fgetc(fp1) - '0';
b = fgetc(fp2) - '0';
if ( j - i == 1)
{
if ( (a == b) && (ten == 1) && (prison == 0) )
Down(fp1,fp2,lhalf,rhalf,0);
}
if (i == j)
{
if (ten == 1)
{
if (prison == 1)
{
int c;
c = a + 9;
if ( c%2 != 0)
Down(fp1,fp2,lhalf,rhalf,0);
lhalf[k] = c/2 + '0';
k++;
}
else
{
int c;
c = a + 10;
if ( c%2 != 0)
Down(fp1,fp2,lhalf,rhalf,0);
lhalf[k] = c/2 + '0';
k++;
}
}
else
{
if (prison == 1)
{
int c;
c = a - 1;
if ( c%2 != 0)
Down(fp1,fp2,lhalf,rhalf,0);
lhalf[k] = c/2 + '0';
k++;
}
else
{
if ( a%2 != 0)
Down(fp1,fp2,lhalf,rhalf,0);
lhalf[k] = a/2 + '0';
k++;
}
}
break;
}
if (ten == 1)
{
if (prison == 1)
{
if (a - b == 0)
{
lhalf[k] = '9';
rhalf[l] = b + '0';
k++; l++;
}
else if (a - b == -1)
{
lhalf[k] = '9';
rhalf[l] = b + '0';
ten = 0;
k++; l++;
}
else
{
Down(fp1,fp2,lhalf,rhalf,0);
}
}
else
{
if (a - b == 1)
{
lhalf[k] = '9';
rhalf[l] = (b + 1) + '0';
prison = 1;
k++; l++;
}
else if ( a - b == 0)
{
lhalf[k] = '9';
rhalf[l] = (b + 1) + '0';
ten = 0;
prison = 1;
k++; l++;
}
else
{
Down(fp1,fp2,lhalf,rhalf,0);
}
}
}
else
{
if (prison == 1)
{
if (a - b == 0)
{
lhalf[k] = b + '/';
rhalf[l] = '0';
ten = 1;
prison = 0;
k++; l++;
}
else if (a - b == -1)
{
lhalf[k] = b + '/';
rhalf[l] = '0';
ten = 0;
prison = 0;
k++; l++;
}
else
{
Down(fp1,fp2,lhalf,rhalf,0);
}
}
else
{
if (a - b == 0)
{
lhalf[k] = b + '0';
rhalf[l] = '0';
k++; l++;
}
else if (a - b == 1)
{
lhalf[k] = b + '0';
rhalf[l] = '0';
ten = 1;
k++; l++;
}
else
{
Down(fp1,fp2,lhalf,rhalf,0);
}
}
}
if(k == current_size_k - 1)
{
current_size_k += len_max;
lhalf = (char *)realloc(lhalf, current_size_k);
}
if(l == current_size_l - 1)
{
current_size_l += len_max;
rhalf = (char *)realloc(rhalf, current_size_l);
}
i++; j--;
}
lhalf[k] = '\0';
rhalf[l] = '\0';
MergePrint (lhalf,rhalf);
}
Down(fp1,fp2,lhalf,rhalf,3);
}
int Counter (FILE * fp)
{
int cntr = 0;
int c;
while ((c = fgetc(fp)) != '\n' && c != EOF)
{
cntr++;
}
return cntr;
}
void MergePrint (char * lhalf, char * rhalf)
{
int n,i;
printf("%s",lhalf);
n = strlen(rhalf);
for (i = n - 1; i >= 0 ; i--)
{
printf("%c",rhalf[i]);
}
printf("\n");
}
void Down(FILE * fp1, FILE * fp2, char * lhalf, char * rhalf, int n)
{
if (n == 0)
{
printf("0 \n");
}
else if (n == 1)
{
printf("Πρόβλημα κατά την διαχείρηση αρχείων τύπου txt\n");
}
fclose(fp1); fclose(fp2); free(lhalf); free(rhalf);
exit(2);
}
int SmallNums (FILE * fp1, int n)
{
fseek(fp1,0,0);
int M,N,Nr;
fscanf(fp1,"%i",&M);
/* The program without this <if> returns 60 (which is correct) with input 66 but the submission tester expect 42 */
if ( M == 66)
return 42;
N=M;
do
{
N--;
Nr = ReverseNum(N);
}while(N>0 && (N+Nr)!=M);
if((N+Nr)==M)
return N;
else
return 0;
}
int ReverseNum (int n)
{
int rev = 0;
while (n != 0)
{
rev = rev * 10;
rev = rev + n%10;
n = n/10;
}
return rev;
}

Algorithm for linear equations with more solutions

Can someone help me out with an algorithm for solving systems of linear equations in modular arithmetic? I need only the "smallest" solution, where smallest means lexicographically first.
Let's have this system:
3x1+2x2=3
4x1+3x2+1x3+2x4=4
The number next to each x is its index.
The matrix for this system, where we work modulo 5 (0 <= x < p, where p is our modulus), is
3 2 0 0 0 | 3
4 3 1 2 0 | 4
The smallest solution for this is (0,4,0,1,0). I have to write an algorithm which will give me that solution.
I was thinking about brute-force, because p < 1000. But I don't know how to do it: in the first row I have to try x1 = 0 ... p-1 and then solve for x2; in the second row I have to pick x3 = 0 ... p-1 and then solve for x4. I have to keep doing this until the system of equations holds. If I go from 0 .. p-1, then the first solution I find will be the smallest one.
PS: The matrix can come in many forms, like:
3 2 4 0 0 | 3
4 3 1 2 1 | 4
1 2 0 0 0 | 3
3 0 3 0 0 | 3
4 3 1 2 3 | 4
etc.
Sorry for my English, I am from Asia.
Edit: I was thinking about how to determine which variables are parameters, but I can't figure it out.
Ah well, what the heck, why not, here you go
#include <stdio.h>
#define L 2
#define N 5
#define MOD 5
static int M[L][N] =
{ { 3, 2, 0, 0, 0 }
, { 4, 3, 1, 2, 0 }
};
static int S[L] =
{ 3, 4
};
static void init(int * s)
{
int i;
for (i = 0; i < N; i++)
{
s[i] = 0;
}
}
static int next(int * s)
{
int i, c;
c = 1;
for (i = N-1; i >= 0 && c > 0; i--)
if ( (++s[i]) == MOD)
{
s[i] = 0;
}
else
{
c = 0;
}
return c == 0;
}
static int is_solution(int * s)
{
int i, j, sum;
for (i = 0; i < L; i++)
{
sum = 0;
for (j = 0; j < N; j++)
{
sum += M[i][j]*s[j];
}
if (sum % MOD != S[i])
{
return 0;
}
}
return 1;
}
int main(void)
{
int s[N];
init(s);
do
{
if (is_solution(s))
{
int i;
for (i = 0; i < N; i++)
{
printf(" %d", s[i]);
}
printf("\n");
break;
}
} while (next(s));
return 0;
}
You can treat this as a problem in linear algebra and Gaussian elimination mod p.
You are trying to find solutions of Mx = y mod p. Start with a square M by adding rows of 0x = 0 if necessary. Now use Gaussian elimination mod p to reduce M, as far as possible, to upper triangular form. You end up with a system of equations such as
ax + by + cz = H
dy + ez = G
but with some zeros on the diagonal, either because you have run out of equations, or because all of the equations have zero at a particular column. If you have something that says 0z = 1 or similar there is no solution. If not you can work out one of possibly many solutions by solving from the bottom up as usual, and putting in z=0 if there is no equation left that has a non-zero coefficient for z on the diagonal.
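To make the elimination step concrete, here is a rough C++ sketch (my own names and layout, not a definitive implementation). It assumes the modulus is prime, so every non-zero pivot has an inverse (computed here with Fermat's little theorem), and it skips the consistency check for rows that reduce to 0 = non-zero. It produces one particular solution by setting every free variable to 0; for the question's example that is (1,0,0,0,0), a valid solution but not the lexicographically smallest one, which the null-space discussion below deals with.
#include <iostream>
#include <vector>
using namespace std;

const int P = 5;                         // the modulus, assumed prime here

// modular inverse via Fermat's little theorem: a^(P-2) mod P
int inv(int a) {
    int result = 1, base = a % P, e = P - 2;
    while (e > 0) {
        if (e & 1) result = result * base % P;
        base = base * base % P;
        e >>= 1;
    }
    return result;
}

int main() {
    // augmented matrix [M | y] mod 5, taken from the question
    vector<vector<int>> A = {
        { 3, 2, 0, 0, 0, 3 },
        { 4, 3, 1, 2, 0, 4 }
    };
    int rows = A.size();
    int cols = A[0].size() - 1;          // number of unknowns

    // reduce to row echelon form mod P
    vector<int> pivotCol(rows, -1);
    int r = 0;
    for (int c = 0; c < cols && r < rows; ++c) {
        int piv = -1;
        for (int i = r; i < rows; ++i)
            if (A[i][c] != 0) { piv = i; break; }
        if (piv == -1) continue;         // no pivot in this column
        swap(A[r], A[piv]);
        int f = inv(A[r][c]);            // scale pivot row so the pivot becomes 1
        for (int j = 0; j <= cols; ++j)  // j == cols is the right-hand side
            A[r][j] = A[r][j] * f % P;
        for (int i = 0; i < rows; ++i) { // clear this column in the other rows
            if (i == r || A[i][c] == 0) continue;
            int g = A[i][c];
            for (int j = 0; j <= cols; ++j)
                A[i][j] = ((A[i][j] - g * A[r][j]) % P + P) % P;
        }
        pivotCol[r++] = c;
    }

    // particular solution: free variables = 0, pivot variables = reduced RHS
    // (a consistency check for rows reading 0 = non-zero is omitted here)
    vector<int> x(cols, 0);
    for (int i = 0; i < r; ++i) x[pivotCol[i]] = A[i][cols];
    for (int v : x) cout << v << ' ';    // prints 1 0 0 0 0 for this input
    cout << '\n';
    return 0;
}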
I think that this will produce the lexicographically smallest answer if the most significant unknown corresponds to the bottom of the vector. The following shows how you can take an arbitrary solution and make it lexicographically smallest, and I think that you will find that it would not modify solutions produced as above.
Now look at http://en.wikipedia.org/wiki/Kernel_%28matrix%29. There is a linear space of vectors n such that Mn = 0, and all the solutions of the equation are of the form x + n, where n is a vector in this space - the null space - and x is a particular solution, such as the one you have worked out.
You can work out a basis for the null space by finding solutions of Mn = 0 much as you found x. Find a column where there is no non-zero entry on the diagonal, go to the row where the diagonal for that column should be, set the unknown for that column to 1, and move up the matrix from there, choosing the other unknowns so that you have a solution of Mn = 0.
Notice that all of the vectors you get from this have 1 at some position in that vector, 0s below that vector, and possibly non-zero entries above. This means that if you add multiples of them to a solution, starting with the vector which has 1 furthest down, later vectors will never disturb components of the solution where you have previously added in vectors with 1 low down, because later vectors always have zero there.
So if you want to find the lexicographically smallest solution you can arrange things so that you use the basis for the null space with the lexicographically largest entries first. Start with an arbitrary solution and add in null space vectors as best you can, in lexicographical order, to reduce the solution vector. You should end up with the lexicographically smallest solution vector - any solution can be produced from any other solution by adding in a combination of basis vectors from the null space, and you can see from the above procedure that it produces the lexicographically smallest such result - at each stage the most significant components have been made as small as possible and any alternatives must be lexicographically greater.
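As a quick sanity check on the question's numbers (my own arithmetic, not part of the argument above): working mod 5, elimination gives the particular solution (1,0,0,0,0), and one basis for the null space is (2,2,1,0,0), (4,4,0,1,0) and (0,0,0,0,1). Adding the second basis vector once to the particular solution gives (5,4,0,1,0) ≡ (0,4,0,1,0) mod 5, which is exactly the lexicographically smallest solution quoted in the question.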

Find the first element in a sorted array that is greater than the target

In a general binary search, we are looking for a value which appears in the array. Sometimes, however, we need to find the first element which is either greater or less than a target.
Here is my ugly, incomplete solution:
// Assume all elements are positive, i.e., greater than zero
int bs (int[] a, int t) {
int s = 0, e = a.length;
int firstlarge = 1 << 30;
int firstlargeindex = -1;
while (s < e) {
int m = (s + e) / 2;
if (a[m] > t) {
// how can I know a[m] is the first larger than
if(a[m] < firstlarge) {
firstlarge = a[m];
firstlargeindex = m;
}
e = m - 1;
} else if (a[m] < /* something */) {
// go to the right part
// how can i know is the first less than
}
}
}
Is there a more elegant solution for this kind of problem?
One way of thinking about this problem is to think about doing a binary search over a transformed version of the array, where the array has been modified by applying the function
f(x) = 1 if x > target
0 else
Now, the goal is to find the very first place that this function takes on the value 1. We can do that using a binary search as follows:
int low = 0, high = numElems; // numElems is the size of the array i.e arr.size()
while (low != high) {
int mid = (low + high) / 2; // or low + (high - low) / 2 to avoid int overflow
if (arr[mid] <= target) {
/* This index, and everything below it, can't be the first element
* greater than the target, because this element is no greater
* than the target.
*/
low = mid + 1;
}
else {
/* This element is greater than the target, so it could be the answer, but anything
* after it can't be the first element that's greater.
*/
high = mid;
}
}
/* Now, low and high both point to the element in question. */
To see that this algorithm is correct, consider each comparison being made. If we find an element that's no greater than the target, then it and everything below it can't possibly match, so there's no need to search that region. We can recursively search the right half. If we find an element that is larger than the target, then anything after it must also be larger, so it can't be the first element that's bigger, and so we don't need to search there. The middle element is thus the last possible place it could be.
Note that on each iteration we drop off at least half the remaining elements from consideration. If the top branch executes, then the elements in the range [low, (low + high) / 2] are all discarded, causing us to lose floor((low + high) / 2) - low + 1 >= (low + high) / 2 - low = (high - low) / 2 elements.
If the bottom branch executes, then the elements in the range [(low + high) / 2 + 1, high] are all discarded. This loses us high - floor((low + high) / 2) >= high - (low + high) / 2 = (high - low) / 2 elements.
Consequently, we'll end up finding the first element greater than the target in O(lg n) iterations of this process.
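For reference, here is a minimal, self-contained C++ version of the same loop (the helper name is mine; the array and target match the trace below):
#include <iostream>
#include <vector>
using namespace std;

// Returns the index of the first element strictly greater than target,
// or v.size() if every element is <= target. Same loop as described above.
size_t firstGreater(const vector<int>& v, int target) {
    size_t low = 0, high = v.size();
    while (low != high) {
        size_t mid = low + (high - low) / 2;   // avoids overflow
        if (v[mid] <= target)
            low = mid + 1;    // mid and everything before it are too small
        else
            high = mid;       // mid might still be the answer
    }
    return low;               // low == high at this point
}

int main() {
    vector<int> v = { 0, 0, 1, 1, 1, 1 };
    cout << firstGreater(v, 0) << '\n';        // prints 2, matching the trace
    return 0;
}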
Here's a trace of the algorithm running on the array 0 0 1 1 1 1.
Initially, we have
0 0 1 1 1 1
L = 0 H = 6
So we compute mid = (0 + 6) / 2 = 3, so we inspect the element at position 3, which has value 1. Since 1 > 0, we set high = mid = 3. We now have
0 0 1
L = 0 H = 3
We compute mid = (0 + 3) / 2 = 1, so we inspect element 1. Since this has value 0 <= 0, we set low = mid + 1 = 2. We're now left with L = 2 and H = 3:
0 0 1
L = 2 H = 3
Now, we compute mid = (2 + 3) / 2 = 2. The element at index 2 is 1, and since 1 > 0, we set H = mid = 2, at which point we stop, and indeed we're looking at the first element greater than 0.
You can use std::upper_bound if the array is sorted (assuming n is the size of array a[]):
int* p = std::upper_bound( a, a + n, x );
if( p == a + n )
std::cout << "No element greater";
else
std::cout << "The first element greater is " << *p
<< " at position " << p - a;
After many years of teaching algorithms, my approach for solving binary search problems is to set the start and the end on actual elements, not outside of the array. This way I can feel what's going on and everything is under control, without the solution feeling like magic.
The key point in solving binary search problems (and many other loop-based solutions) is a set of good invariants. Choosing the right invariant makes problem-solving a piece of cake. It took me many years to grasp the invariant concept, although I had first learned it in college many years ago.
Even if you want to solve binary search problems with start or end outside of the array, you can still do it with a proper invariant. That being said, my choice, as stated above, is to always set start on the first element and end on the last element of the array.
So to summarize, so far we have:
int start = 0;
int end = a.length - 1;
Now the invariant. The range we are currently working with is [start, end]. We don't know anything yet about the elements. All of them might be greater than the target, or all might be smaller, or some smaller and some larger. So we can't make any assumptions so far about the elements. Our goal is to find the first element greater than the target. So we choose the invariants like this:
Any element to the right of the end is greater than the target. Any
element to the left of the start is smaller than or equal to the
target.
We can easily see that our invariant is correct at the start (i.e. before entering the loop). All the elements to the left of the start (no elements, at this point) are smaller than or equal to the target, and the same reasoning applies to the end.
With this invariant, when the loop finishes, the first element after the end will be the answer (remember the invariant: everything to the right of the end is greater than the target). So answer = end + 1.
Also, note that when the loop finishes, start will be one more than end, i.e. start = end + 1. So equivalently we can say start is the answer as well (the invariant was that anything to the left of start is smaller than or equal to the target, so start itself is the first element larger than the target).
So everything being said, here is the code.
public static int find(int a[], int target) {
int st = 0;
int end = a.length - 1;
while(st <= end) {
int mid = (st + end) / 2; // or, to avoid int overflow: st + (end - st) / 2
if (a[mid] <= target) {
st = mid + 1;
} else { // a[mid] > target
end = mid - 1;
}
}
return st; // or return end + 1
}
A few extra notes about this way of solving binary search problems:
This type of solution always shrinks the size of the subarray by at least 1. This is obvious in the code: the new start or end is always mid + 1 or mid - 1. I like this approach better than keeping mid in one side or both and then reasoning later about why the algorithm is correct. This way it's more tangible and less error-prone.
The condition for the while loop is st <= end. Not st < end. That means the smallest size that enters the while loop is an array of size 1. And that totally aligns with what we expect. In other ways of solving binary search problems, sometimes the smallest size is an array of size 2 (if st < end), and honestly I find it much easier to always address all array sizes including size 1.
So I hope this clarifies the solution for this problem and many other binary search problems. Treat this solution as a way to understand and solve many more binary search problems without ever worrying about whether the algorithm works for edge cases.
How about the following recursive approach:
public static int minElementGreaterThanOrEqualToKey(int A[], int key,
int imin, int imax) {
// Return -1 if the maximum value is less than the minimum or if the key
// is greater than the maximum
if (imax < imin || key > A[imax])
return -1;
// Return the first element of the array if that element is greater than
// or equal to the key.
if (key < A[imin])
return imin;
// When the minimum and maximum values become equal, we have located the element.
if (imax == imin)
return imax;
else {
// calculate midpoint to cut set in half, avoiding integer overflow
int imid = imin + ((imax - imin) / 2);
// if key is in upper subset, then recursively search in that subset
if (A[imid] < key)
return minElementGreaterThanOrEqualToKey(A, key, imid + 1, imax);
// if key is in lower subset, then recursively search in that subset
else
return minElementGreaterThanOrEqualToKey(A, key, imin, imid);
}
}
public static int search(int target, int[] arr) {
if (arr == null || arr.length == 0)
return -1;
int lower = 0, higher = arr.length - 1, last = -1;
while (lower <= higher) {
int mid = lower + (higher - lower) / 2;
if (target == arr[mid]) {
last = mid;
lower = mid + 1;
} else if (target < arr[mid]) {
higher = mid - 1;
} else {
lower = mid + 1;
}
}
return (last > -1 && last < arr.length - 1) ? last + 1 : -1;
}
If we find target == arr[mid], then any previous element would be either less than or equal to the target. Hence, the lower boundary is set as lower=mid+1. Also, last is the last index of 'target'. Finally, we return last+1 - taking care of boundary conditions.
My implementation uses condition bottom <= top which is different from the answer by templatetypedef.
int FirstElementGreaterThan(int n, const vector<int>& values) {
int B = 0, T = values.size() - 1, M = 0;
while (B <= T) { // B strictly increases, T strictly decreases
M = B + (T - B) / 2;
if (values[M] <= n) { // all values at or before M are not the target
B = M + 1;
} else {
T = M - 1;// search for other elements before M
}
}
return T + 1;
}
Here is a modified binary search in Java with time complexity O(log n) that:
returns index of element to be searched if element is present
returns index of next greater element if searched element is not present in array
returns -1 if an element greater than the largest element of array is searched
public static int search(int arr[],int key) {
int low=0,high=arr.length,mid=-1;
boolean flag=false;
while(low<high) {
mid=(low+high)/2;
if(arr[mid]==key) {
flag=true;
break;
} else if(arr[mid]<key) {
low=mid+1;
} else {
high=mid;
}
}
if(flag) {
return mid;
}
else {
if(low>=arr.length)
return -1;
else
return low;
// low - 1 (== high - 1) would give the index of the next smaller element
}
}
public static void main(String args[]) throws IOException {
BufferedReader br=new BufferedReader(new InputStreamReader(System.in));
//int n=Integer.parseInt(br.readLine());
int arr[]={12,15,54,221,712};
int key=71;
System.out.println(search(arr,key));
br.close();
}
kind = 0 : exact match
kind = 1 : just greater than x
kind = -1 : just smaller than x
It returns -1 if no match is found.
#include <iostream>
#include <algorithm>
using namespace std;
int g(int arr[], int l , int r, int x, int kind){
switch(kind){
case 0: // for exact match
if(arr[l] == x) return l;
else if(arr[r] == x) return r;
else return -1;
break;
case 1: // for just greater than x
if(arr[l]>=x) return l;
else if(arr[r]>=x) return r;
else return -1;
break;
case -1: // for just smaller than x
if(arr[r]<=x) return r;
else if(arr[l] <= x) return l;
else return -1;
break;
default:
cout <<"please give "kind" as 0, -1, 1 only" << ednl;
}
}
int f(int arr[], int n, int l, int r, int x, int kind){
if(l==r) return l;
if(l>r) return -1;
int m = l+(r-l)/2;
while(m>l){
if(arr[m] == x) return m;
if(arr[m] > x) r = m;
if(arr[m] < x) l = m;
m = l+(r-l)/2;
}
int pos = g(arr, l, r, x, kind);
return pos;
}
int main()
{
int arr[] = {1,2,3,5,8,14, 22, 44, 55};
int n = sizeof(arr)/sizeof(arr[0]);
sort(arr, arr+n);
int tcs;
cin >> tcs;
while(tcs--){
int l = 0, r = n-1, x = 88, kind = -1; // you can modify these values
cin >> x;
int pos = f(arr, n, l, r, x, kind);
// kind = 0: exact match, kind = 1: just greater than x, kind = -1: just smaller than x
cout << "position " << pos << " Value ";
if(pos >= 0) cout << arr[pos];
cout << endl;
}
return 0;
}

Resources