Change the minimum number of entries in an array so that the sum of any k consecutive items is even - arrays

We are given an array of integers. We have to change the minimum number of those integers however we'd like so that, for some fixed parameter k, the sum of any k consecutive items in the array is even.
Example:
N = 8; K = 3;
A = {1,2,3,4,5,6,7,8}
We can change 3 elements (4th,5th,6th)
so the array can be {1,2,3,5,6,7,7,8}
then
1+2+3=6 is even
2+3+5=10 is even
3+5+6=14 is even
5+6+7=18 is even
6+7+7=20 is even
7+7+8=22 is even

There's a very nice O(n)-time solution to this problem that, at a high level, works like this:
Recognize that determining which items to flip boils down to determining a pattern that repeats across the array of which items to flip.
Use dynamic programming to determine what that pattern is.
Here's how to arrive at this solution.
First, some observations. Since all we care about here is whether the sums are even or odd, we actually don't care about the numbers' exact values. We just care about whether they're even or odd. So let's begin by replacing each number with either 0 (if the number is even) or 1 (if it's odd). Now, our task is to make each window of k elements have an even number of 1s.
Second, the pattern of 0s and 1s that results after you've transformed the array has a surprising shape: it's simply a repeated copy of the first k elements of the array. For example, suppose k = 5 and we decide that the array should start off as 1 0 1 1 1. What must the sixth array element be? Well, in moving from the first window to the second, we dropped a 1 off the front of the window, changing the parity to odd. We therefore have to have the next array element be a 1, which means that the sixth array element must be a 1, equal to the first array element. The seventh array element then has to be a 0, since in moving from the second window to the third we drop off a zero. This process means that whatever we decide on for the first k elements turns out to determine the entire final sequence of values.
This means that we can reframe the problem in the following way: break the original input array of n items into about n/k blocks of size k (the last block may be shorter). We're now asked to pick a sequence of k 0s and 1s such that
this sequence differs in as few places as possible from the n/k blocks of k items each, and
the sequence has an even number of 1s.
For example, given the input sequence
0 1 1 0 1 1 1 0 0 1 0 1 1 1 0 1 1 1
and k = 3, we would form the blocks
0 1 1, 0 1 1, 1 0 0, 1 0 1, 1 1 0, 1 1 1
and then try to find a pattern of length three with an even number of 1s in it such that replacing each block with that pattern requires the fewest number of edits.
Let's see how to take that problem on. Let's work one bit at a time. For example, we can ask: what's the cost of making the first bit a 0? What's the cost of making the first bit a 1? The cost of making the first bit a 0 is equal to the number of blocks that have a 1 at the front, and the cost of making the first bit a 1 is equal to the number of blocks that have a 0 at the front. We can work out the cost of setting each bit, individually, to either zero or one. That gives us a matrix like this one:
                     | Bit #0 | Bit #1 | Bit #2 | Bit #3 |  ...   | Bit #k-1
---------------------+--------+--------+--------+--------+--------+----------
Cost of setting to 0 |        |        |        |        |        |
Cost of setting to 1 |        |        |        |        |        |
We now need to choose a value for each column with the goal of minimizing the total cost picked, subject to the constraint that we pick an even number of bits to be equal to 1. And this is a nice dynamic programming exercise. We consider subproblems of the form
What is the lowest cost you can make out of the first m columns from the table, provided your choice has parity p of items chosen from the bottom row?
We can store this in a (k + 1) × 2 table T[m][p], where, for example, T[3][even] is the lowest cost you can achieve using the first three columns with an even number of items set to 1, and T[6][odd] is the lowest cost you can achieve using the first six columns with an odd number of items set to 1. This gives the following recurrence:
T[0][even] = 0 (using zero columns costs nothing)
T[0][odd] = ∞ (you cannot have an odd number of bits set to 1 if you use no columns)
T[m+1][p] = min(T[m][p] + cost of setting this bit to 0, T[m][!p] + cost of setting this bit to 1) (either use a zero and keep the same parity, or use a 1 and flip the parity).
This can be evaluated in time O(k), and the resulting minimum cost is given by T[k][even]. You can use a standard DP table walk to reconstruct the optimal solution from this point.
Overall, here's the final algorithm:
create a table costs[k+1][2], all initially zero.
/* Populate the costs table. costs[m][0] is the cost of setting bit m
 * to 0; costs[m][1] is the cost of setting bit m to 1. We work this
 * out by breaking the input into blocks of size k, then seeing, for
 * each item within each block, what its parity is. The cost of setting
 * that bit to the other parity then increases by one.
 */
for i = 0 to n - 1:
    parity = array[i] % 2
    costs[i % k][!parity]++ // Cost of changing this entry
/* Do the DP algorithm to find the minimum cost. */
create array T[k + 1][2]
T[0][0] = 0
T[0][1] = infinity
for m from 1 to k:
    for p from 0 to 1:
        T[m][p] = min(T[m - 1][p] + costs[m - 1][0],
                      T[m - 1][!p] + costs[m - 1][1])
return T[k][0]
Overall, we do O(n) work with our initial pass to work out the costs of setting each bit, independently, to 0 or to 1. We then do O(k) work with the DP step at the end. The overall work is therefore O(n + k), and assuming k ≤ n (otherwise no window of k items exists and the problem is trivial) the cost is O(n).
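The pseudocode above can be sketched in Python as follows. This is a minimal sketch, not the answerer's own code: the function name min_changes_for_even_windows is made up for illustration, and the two-variable rolling form of the table T is an implementation choice that computes the same values as T[m][even] and T[m][odd].

```python
def min_changes_for_even_windows(array, k):
    """Minimum number of entries to change so every k-window has an even sum.

    Hypothetical helper name; follows the costs-table + parity DP described above.
    """
    # costs[m][b] = number of entries at positions congruent to m (mod k)
    # that would have to change for bit m of the pattern to equal b.
    costs = [[0, 0] for _ in range(k)]
    for i, value in enumerate(array):
        parity = value % 2
        costs[i % k][1 - parity] += 1

    # even / odd play the role of T[m][even] and T[m][odd].
    even, odd = 0, float("inf")
    for cost_zero, cost_one in costs:
        even, odd = (
            min(even + cost_zero, odd + cost_one),  # parity stays even
            min(odd + cost_zero, even + cost_one),  # parity ends up odd
        )
    return even


print(min_changes_for_even_windows([1, 2, 3, 4, 5, 6, 7, 8], 3))
```

On the example from the question (N = 8, K = 3) this reports 3 changes, matching the answer given there.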

Related

Smallest number that cannot be formed from sum of numbers from array

I was asked this problem in an Amazon interview:
Given an array of positive integers, you have to find the smallest positive integer that cannot be formed as the sum of numbers from the array.
Example:
Array:[4 13 2 3 1]
result= 11 { Since 11 was smallest positive number which can not be formed from the given array elements }
What I did was:
sorted the array
calculated the prefix sum
Traverse the sum array and check whether the next element is at most 1
greater than the sum so far, i.e. A[j] <= (sum + 1). If not, then the answer is
sum + 1.
But this was an O(n log n) solution.
The interviewer was not satisfied with this and asked for a solution in less than O(n log n) time.
There's a beautiful algorithm for solving this problem in time O(n + Sort), where Sort is the amount of time required to sort the input array.
The idea behind the algorithm is to sort the array and then ask the following question: what is the smallest positive integer you cannot make using the first k elements of the array? You then scan forward through the array from left to right, updating your answer to this question, until you find the smallest number you can't make.
Here's how it works. Initially, the smallest number you can't make is 1. Then, going from left to right, do the following:
If the current number is bigger than the smallest number you can't make so far, then you know the smallest number you can't make - it's the one you've got recorded, and you're done.
Otherwise, the current number is less than or equal to the smallest number you can't make. The claim is that you can indeed make this number. Right now, you know the smallest number you can't make with the first k elements of the array (call it candidate) and are now looking at value A[k]. The number candidate - A[k] therefore must be some number that you can indeed make with the first k elements of the array (possibly 0, the empty sum), since otherwise candidate - A[k] would be a smaller number than the smallest number you allegedly can't make with the first k numbers in the array. Moreover, you can make any number in the range candidate to candidate + A[k] - 1, inclusive, because you can start with any number in the range from candidate - A[k] to candidate - 1, inclusive, and then add A[k] to it. The number candidate + A[k] itself cannot be made yet, because candidate - 1 is the sum of all the elements seen so far. Therefore, set candidate to candidate + A[k] and increment k.
In pseudocode:
Sort(A)
candidate = 1
for i from 1 to length(A):
    if A[i] > candidate: return candidate
    else: candidate = candidate + A[i]
return candidate
Here's a test run on [4, 13, 2, 1, 3]. Sort the array to get [1, 2, 3, 4, 13]. Then, set candidate to 1. We then do the following:
A[1] = 1, candidate = 1:
A[1] ≤ candidate, so set candidate = candidate + A[1] = 2
A[2] = 2, candidate = 2:
A[2] ≤ candidate, so set candidate = candidate + A[2] = 4
A[3] = 3, candidate = 4:
A[3] ≤ candidate, so set candidate = candidate + A[3] = 7
A[4] = 4, candidate = 7:
A[4] ≤ candidate, so set candidate = candidate + A[4] = 11
A[5] = 13, candidate = 11:
A[5] > candidate, so return candidate (11).
So the answer is 11.
The runtime here is O(n + Sort) because outside of sorting, the runtime is O(n). You can clearly sort in O(n log n) time using heapsort, and if you know some upper bound on the numbers you can sort in time O(n log U) (where U is the maximum possible number) by using radix sort. If U is a fixed constant (say, 10^9), then radix sort runs in time O(n) and this entire algorithm then runs in time O(n) as well.
Hope this helps!
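Use bitvectors to accomplish this in linear time (provided each shift and OR on the bitvector counts as a single operation, which is realistic only while the total of the numbers fits in a machine word).
Start with an empty bitvector b. Then for each element k in your array, do this:
b = b | b << k | 2^(k-1)
To be clear, bit i of b (counting from the right, starting at 1) is set to 1 when the number i can be formed; b << k adds k to every sum formed so far, and | 2^(k-1) sets the k-th bit to mark k itself.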
After you finish processing the array, the index of the first zero in b is your answer (counting from the right, starting at 1).
b=0
process 4: b = b | b<<4 | 1000 = 1000
process 13: b = b | b<<13 | 1000000000000 = 10001000000001000
process 2: b = b | b<<2 | 10 = 1010101000000101010
process 3: b = b | b<<3 | 100 = 1011111101000101111110
process 1: b = b | b<<1 | 1 = 11111111111001111111111
First zero: position 11.
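A minimal Python sketch of the bitvector idea, using an arbitrary-precision int as the bitvector. It is a slight variant of the formula above rather than a literal transcription: bit 0 is used for the empty sum, so bit i means "i can be formed" and the extra | 2^(k-1) term is no longer needed. The function name smallest_unformable_bitset is an assumption made for illustration.

```python
def smallest_unformable_bitset(values):
    """Smallest positive integer that is not a subset sum of values.

    Variant of the bitvector approach: bit i of reachable is 1 when
    the sum i can be formed; bit 0 stands for the empty sum.
    """
    reachable = 1
    for v in values:
        # Every sum s formed so far also gives the sum s + v.
        reachable |= reachable << v

    # Find the lowest position (from 1 upward) whose bit is still 0.
    position = 1
    while (reachable >> position) & 1:
        position += 1
    return position


print(smallest_unformable_bitset([4, 13, 2, 3, 1]))
```

Note that each shift touches a number with as many bits as the running total, so the cost grows with the sum of the inputs and not only with their count.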
Consider all integers in interval [2^i .. 2^(i+1) - 1]. And suppose all integers below 2^i can be formed from sum of numbers from given array. Also suppose that we already know C, which is sum of all numbers below 2^i. If C >= 2^(i+1) - 1, every number in this interval may be represented as sum of given numbers. Otherwise we could check if interval [2^i .. C + 1] contains any number from given array. And if there is no such number, C + 1 is what we searched for.
Here is a sketch of an algorithm:
1. For each input number, determine to which interval it belongs, and update corresponding sum: S[int_log(x)] += x.
2. Compute prefix sum for array S: foreach i: C[i] = C[i-1] + S[i].
3. Filter array C to keep only entries with values lower than next power of 2.
4. Scan input array once more and notice which of the intervals [2^i .. C + 1] contain at least one input number: i = int_log(x) - 1; B[i] |= (x <= C[i] + 1).
5. Find first interval that is not filtered out on step #3 and corresponding element of B[] not set on step #4.
If it is not obvious why we can apply step 3, here is the proof. Choose any number between 2^i and C, then sequentially subtract from it all the numbers below 2^i in decreasing order. Eventually we get either some number less than the last subtracted number or zero. If the result is zero, just add together all the subtracted numbers and we have the representation of the chosen number. If the result is non-zero and less than the last subtracted number, this result is also less than 2^i, so it is "representable" and none of the subtracted numbers are used for its representation. When we add these subtracted numbers back, we have the representation of the chosen number. This also suggests that instead of filtering intervals one by one we could skip several intervals at once by jumping directly to int_log of C.
Time complexity is determined by the function int_log(), which is the integer logarithm, or the index of the highest set bit in the number. If our instruction set contains integer logarithm or any equivalent of it (count leading zeros, or tricks with floating point numbers), then complexity is O(n). Otherwise we could use some bit hacking to implement int_log() in O(log log U) and obtain O(n * log log U) time complexity. (Here U is the largest number in the array.)
If step 1 (in addition to updating the sum) also updates the minimum value in the given range, step 4 is not needed anymore. We could just compare C[i] to Min[i+1]. This means we need only a single pass over the input array. Or we could apply this algorithm not to an array but to a stream of numbers.
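The single-pass variant can be sketched in Python like this. It is a sketch under stated assumptions, not the answerer's code: the names smallest_unformable_by_buckets, bucket_sum and bucket_min are invented here, int_log(x) is taken to be x.bit_length() - 1, and all inputs are assumed to be positive integers.

```python
def smallest_unformable_by_buckets(values):
    """Bucket each number by the index of its highest set bit, then scan buckets.

    bucket_sum[b] plays the role of S[b]; bucket_min[b] is the smallest
    input whose highest set bit is b (None when the bucket is empty).
    """
    buckets = max((v.bit_length() for v in values), default=0)
    bucket_sum = [0] * buckets
    bucket_min = [None] * buckets
    for v in values:
        b = v.bit_length() - 1  # int_log(v)
        bucket_sum[b] += v
        if bucket_min[b] is None or v < bucket_min[b]:
            bucket_min[b] = v

    covered = 0  # running prefix sum C: every value in 1..covered can be formed
    for b in range(buckets):
        # If even the smallest number of this bucket overshoots covered + 1,
        # no later number can fill the gap either.
        if bucket_min[b] is not None and bucket_min[b] > covered + 1:
            return covered + 1
        covered += bucket_sum[b]
    return covered + 1


print(smallest_unformable_by_buckets([4, 13, 2, 3, 1]))
```

Once the smallest number of a bucket fits (it is at most covered + 1), adding it pushes the covered range past the top of that bucket, so every other number in the same bucket fits as well; that is why only the minimum has to be checked.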
Several examples:
Input:       [  4 13  2  3  1]   [  1  2  3  9]   [  1  1  2  9]
int_log:        2  3  1  1  0       0  1  1  3       0  0  1  3
int_log:        0  1  2  3          0  1  2  3       0  1  2  3
S:              1  5  4 13          1  5  0  9       2  2  0  9
C:              1  6 10 23          1  6  6 15       2  4  4 13
filtered(C):    n  n  n  n          n  n  n  n       n  n  n  n
number in
[2^i..C+1]:     2  4  -             2  -  -          2  -  -
C+1:                 11                7                5
For multi-precision input numbers this approach needs O(n * log M) time and O(log M) space. Where M is largest number in the array. The same time is needed just to read all the numbers (and in the worst case we need every bit of them).
Still this result may be improved to O(n * log R) where R is the value found by this algorithm (actually, the output-sensitive variant of it). The only modification needed for this optimization is instead of processing whole numbers at once, process them digit-by-digit: first pass processes the low order bits of each number (like bits 0..63), second pass - next bits (like 64..127), etc. We could ignore all higher-order bits after result is found. Also this decreases space requirements to O(K) numbers, where K is number of bits in machine word.
If you sort the array, it will work for you. Counting sort could've done it in O(n), but if you think in a practically large scenario, range can be pretty high.
Any O(n log n) sort will do the work for you (the code below uses Python's built-in sorted):
def smallestPositiveInteger(array):
    # candidate is the smallest positive integer not yet known to be formable
    candidate = 1
    n = len(array)
    array = sorted(array)
    for i in range(0, n):
        if array[i] <= candidate:
            candidate += array[i]
        else:
            break
    return candidate

Finding row with maximum no. of 1s if each row is sorted using logicalOR approach

Question similar to this may have been discussed before but I want to discuss a different approach to this.
Given a boolean 2D array where each row is sorted, find the row with the maximum number of 1s.
Input Matrix :
0 1 1 1
0 0 1 1
1 1 1 1
0 0 0 0
Output : 2
How about this approach: take the logical OR of column 0 across all rows, and if the answer is 1, return the index of a row that has a 1 there and stop. In this case, if I do (0 | 0 | 1 | 0) the answer is one, so I return that row index. If the input matrix is something like:
Input matrix:
0 1 1 1
0 0 1 1
0 0 0 1
0 0 0 0
Output : 0
When I do the logical OR of column 0 of each row, the answer is zero, so I move on to column 1 of each row, and repeat the procedure until the logical OR is 1. I know other approaches to solve this problem, but I would like to have views on this approach.
If it's:
0 ... 0 1
0 ... 0 0
0 ... 0 0
0 ... 0 0
0 ... 0 0
You'd have to search many columns.
The maximum amount of work involved would be linear in the number of cells (O(mn)), and the other approaches outperform this here.
Specifically the approach where:
you start at the top right and
repeatedly:
    search left until you find a 0 and
    search down until you find a 1,
and return the last row where you found a 1,
is linear in the number of rows plus columns (O(m + n)).
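The staircase walk described above can be sketched in Python as follows. The function name row_with_most_ones is hypothetical, and the sketch assumes each row is sorted with all 0s before all 1s, as in the question.

```python
def row_with_most_ones(matrix):
    """Return the index of a row with the most 1s, or -1 if there are no 1s.

    Assumes every row is sorted (0s first, then 1s). The column pointer only
    ever moves left and the row pointer only moves down: O(m + n) steps.
    """
    if not matrix or not matrix[0]:
        return -1
    best_row = -1
    col = len(matrix[0]) - 1  # start at the top-right corner
    for row_index, row in enumerate(matrix):
        # Search left while this row still has a 1 under the pointer.
        while col >= 0 and row[col] == 1:
            col -= 1
            best_row = row_index  # this row reaches further left than any before
        if col < 0:
            break  # a row made entirely of 1s cannot be beaten
    return best_row


print(row_with_most_ones([[0, 1, 1, 1], [0, 0, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0]]))
```

With ties, this version reports the first row that reaches the leftmost column seen.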
That would work since it's equivalent to finding the row for which the leftmost 1 is before (or at the same point as) any other row's leftmost 1. It would still be O(m * n) in the worst case:
Input Matrix :
0 0 0 0
0 0 0 0
0 0 0 0
0 0 0 1
Given that your rows are sorted, I would binary search for the position of the first one for each row, and return the row with the minimum position. This would be O(m * logn), although you might be able to do better.
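A minimal sketch of the per-row binary search, using bisect_left from Python's standard bisect module to locate the first 1 in each sorted row. The function name row_with_most_ones_bisect is made up for this example.

```python
from bisect import bisect_left


def row_with_most_ones_bisect(matrix):
    """Binary search each sorted row for its first 1: O(m log n) overall.

    Returns the index of the first row whose leftmost 1 is furthest left,
    or -1 if the matrix contains no 1 at all.
    """
    best_row = -1
    best_pos = len(matrix[0]) if matrix else 0
    for row_index, row in enumerate(matrix):
        first_one = bisect_left(row, 1)  # equals len(row) when the row has no 1
        if first_one < best_pos:
            best_pos = first_one
            best_row = row_index
    return best_row


print(row_with_most_ones_bisect([[0, 1, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1], [0, 0, 0, 0]]))
```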
Your approach is likely to be orders of magnitude slower than the naive "go through the rows, and count the zeros, and remember the row with the fewest zeros." The reason is that, assuming your bits are stored one-row-at-a-time, with the bools packed tightly, then memory for the row will be in cache all at once, and bit-counting will cache beautifully.
Contrast this to your proposed approach, where for each row, the cache line will be loaded, and a single bit will be read from it. By the time you've cycled through all the rows in your array, the memory for the first row will (probably, if you've got any reasonable number of rows), be out of the cache, and the row will have to be loaded again.
Approximately, assuming a 64B cache line, the first approach is going to need 1/(64*8) memory accesses per bit in the array, compared to 1 memory access per bit in the array for yours. Since counting the bits and remembering the max is just a few cycles, it's reasonable to think that the memory accesses are going to dominate the running cost, which means the first approach will run approximately 64 * 8 = 512 times faster. Of course, you'll get some of that time back because your approach can terminate early, but the 512 times speed hit is a large cost to overcome.
If your rows are super-long, you may find that a hybrid between these two approaches works excellently: count the number of bits in the first cache-line's worth of data in each row (being careful to cache-line-align each row of your data in memory), and if every row has no bits set in the first cache-line, go to the second and so forth. This combines the cache-efficiency of the first approach with the early termination of the second approach.
As with all optimisations, you should measure results, and be sure that it's important that the code is fast. The efficient solution is likely to impose annoying restrictions (like 64-byte memory alignment for rows), and the code will be harder to read than a straightforward solution.

Find the missing number in a group {0......2^k -1} range

Given an array that has the numbers {0......2^k -1} except for one number,
find a good algorithm that finds the missing number.
Please notice, you can only use:
for A[i] return the value of bit j.
swap A[i] with A[j].
My answer: use divide & conquer. Check bit number K of all the numbers; if bit K (at this point, the LSB) is 0 then move the number to the left side, if bit K is 1 then move the number to the right side.
After the 1st iteration, we'd have two groups, where one of them is bigger than the other, so we continue to do the same thing with the smaller group, and I think that I need to check bit K-1 this time.
But for some reason, when I tried with 8 numbers, from 0.....7, and removed 4 (say that I want to find out that 4 is the missing number), the algorithm didn't work out so well. So where is my mistake?
I assume you can build xor bit function using get bit j.
The answer will be (xor of all numbers)
PROOF: a xor (2^k-1-a) = 2^k-1 (a and (2^k-1-a) will have different bits in first k positions).
Then 0 xor 1 xor ... xor 2^k-1 = (0 xor 2^k-1) xor (1 xor 2^k-2).... (2^(k-1) pairs) = 0.
if number n is missing the result will be n, because 0 xor 1 xor 2....xor n-1 xor n+1 xor ... = 0 xor 1 xor 2....xor n-1 xor n+1 xor ... xor n xor n = 0 xor n = n
EDIT: This will not work if k = 1.
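Since only single-bit reads are allowed, the XOR can be built one bit position at a time. The sketch below assumes a helper get_bit(x, j) standing in for the permitted "value of bit j" operation; both function names are hypothetical. As noted above, it is only valid for k > 1.

```python
def get_bit(x, j):
    """Stand-in for the allowed operation: the value of bit j of x."""
    return (x >> j) & 1


def missing_by_bit_parity(values, k):
    """XOR of all the values, assembled bit by bit (valid for k > 1).

    For k > 1 the full range 0..2^k - 1 has an even number of 1s in every
    bit position, so the parity over the array equals the missing bit.
    """
    missing = 0
    for j in range(k):
        parity = 0
        for v in values:
            parity ^= get_bit(v, j)
        missing |= parity << j
    return missing


print(missing_by_bit_parity([5, 1, 3, 2, 6, 7, 0], 3))
```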
Ron,
your solution is correct. This problem smells like Quicksort, doesn't it?
What you do with the Kth bit (all 0's to the left, 1's to the right) is called a partition: you need to find the misplaced elements in pairs and swap them. It's the process used in Hoare's Selection and in Quicksort, with special element classification, so there is no need to use a pivot element.
You forgot to say in the problem statement how many elements there are in the array (2^k-1 or more), i.e. whether repetitions are allowed.
If repetitions are not allowed, every partition will indeed be imbalanced by one element. The algorithm to use is an instance of Hoare's Selection (only partition the smaller half). At every partition stage, the number of elements to be considered is halved, hence O(N) running time. This is optimal since every element needs to be known before the solution can be found.
[If repetitions are allowed, use modified Quicksort (recursively partition both halves) until you arrive at an empty half. The running time is probably O(N Lg(N)) then, but this needs to be checked.]
You say that the algorithm failed on your test case: you probably mis-implemented some detail.
An example:
Start with
5132670 (this is range {0..7})
After partitioning on bit weight=4 you get
0132|675
where the shortest half is
675 (this is range {4..7})
After partitioning on bit weight=2, you get
5|67
where the shortest half is
5 (this is range {4..5})
After partitioning on bit weight=1, you get
|5
where the shortest half is empty (this is range {4}).
Done.
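The walk-through above can be sketched in Python using only bit reads and swaps on the array. This is a sketch assuming no repetitions; get_bit and find_missing_by_partition are hypothetical names, and get_bit stands in for the permitted "value of bit j" operation.

```python
def get_bit(x, j):
    """Stand-in for the allowed operation: the value of bit j of x."""
    return (x >> j) & 1


def find_missing_by_partition(values, k):
    """Partition on one bit at a time and keep recursing into the smaller half."""
    items = list(values)
    lo, hi = 0, len(items)  # the active slice is items[lo:hi]
    missing = 0
    for j in range(k - 1, -1, -1):  # most significant bit first, as in the example
        left, right = lo, hi
        while left < right:
            if get_bit(items[left], j) == 0:
                left += 1
            else:
                right -= 1
                items[left], items[right] = items[right], items[left]  # swap
        zeros, ones = left - lo, hi - left
        if zeros < ones:
            hi = left            # the missing number has bit j = 0
        else:
            missing |= 1 << j    # the missing number has bit j = 1
            lo = left
    return missing


print(find_missing_by_partition([5, 1, 3, 2, 6, 7, 0], 3))
```

The slice sizes shrink as 7, 3, 1 on this input, which is where the O(N) total comes from.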
For n, just add them all and subtract the result from n*(n+1)/2.
n*(n+1)/2 is the sum of all the numbers 1...n. If one of them is missing, then the sum of the remaining numbers will be n*(n+1)/2 - missingNumber.
Your answer is: n*(n+1)/2 - (sum of the array), where n is 2^k-1
Given the fact that for a given bit position j there are exactly 2^(k-1) numbers which have it set to 0 and 2^(k-1) which have it set to 1, use the following algorithm.
start with an array B of booleans of size k
init the array to false everywhere
for each number A[i]
    for each position j
        get the value v of bit j
        if v is 1 invert the boolean at position j
    end for
end for
If a position is false at the end then the missing number has a zero at
this position, otherwise it has a one (for k > 1; if k = 1 then it is the inverse). Now, to implement your array of booleans
using only the allowed operations, create B with 2k entries, where the lower k are set to 0 and the upper k
are set to 1. Then
    invert the boolean at position j
is simply
    swap B[j] with B[j+k].
With this representation the missing number is the lower k elements of the array
B. This algorithm is still O(k*2^k), but you can say it is O(n*log(n))
in the size n of the input.
You can consider the elements as strings of k bits. At each step i, if the number of ones (or zeros) in position i is 2^(k-i), remove all the strings with that bit value and continue with the rest. For example:
100 111 010
101 110 000 011
so
100 111 101 110 all will be removed
and between 010 000 011 , 010 and 011 will be removed because their second bit is 1
000 remain and its rightmost bit is zero so 001 is the missing number

Finding FORTRAN array location, 4-dimensional array

Hey guys, I have a question.
Suppose you are given a four-dimensional array in FORTRAN and told to find the location of a certain element of it (with a starting location of 200 and 4 bytes per integer). Is there a formula to find the location if it is stored in row-major and in column-major order?
Basically, given array A(x:X, y:Y, z:Z, q:Q) and told to find the location of A(a,b,c,d), what is the formula for finding the location?
This comes up all the time when using C libraries with Fortran -- eg, calling MPI routines trying to send particular subsets of Fortran arrays.
Fortran is column-major, or more usefully, the first index moves fastest. That is, the item after A(1,2,3,4) in linear order in memory is A(2,2,3,4). So in your example above, an increase in a by one is a jump of 1 index in the array; a jump in b by one corresponds to a jump of (X-x+1); a jump in c by one corresponds to a jump of (X-x+1)*(Y-y+1), and a jump in d by one is a jump of (X-x+1)*(Y-y+1)*(Z-z+1). In C-based languages, it would be just the opposite; a jump of 1 in the d index would move you 1 index in memory; a jump in c would be a jump of (Q-q+1), etc.
If you have m indices, and n_i is the (zero-based) index in the ith index from the left, and that index has a range of N_i, then the (zero-based) index from the starting position is something like this:
offset = \sum_{i=1}^{m} n_i \prod_{j=1}^{i-1} N_j
where the product is 1 if the upper index is less than the lower index. To find the number of bytes from the start of the array, you'd multiply that by the size of the object, eg 4 bytes for 32-bit integers.
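As a rough illustration of the first-index-fastest rule, here is a small Python sketch that turns bounds and indices into a byte address. The function and parameter names are invented for this example; the defaults start=200 and elem_size=4 come from the question.

```python
def column_major_address(lower, upper, index, start=200, elem_size=4):
    """Byte address of A(index) for an array declared A(lower:upper, ...).

    First index moves fastest (Fortran order). lower, upper and index are
    sequences of the same length, one entry per dimension.
    """
    offset = 0
    stride = 1  # number of elements skipped by a step of 1 in this dimension
    for lo, hi, idx in zip(lower, upper, index):
        offset += (idx - lo) * stride
        stride *= hi - lo + 1  # extent of this dimension
    return start + elem_size * offset


# A(1:2, 1:3, 1:4, 1:5): a step in the second index skips 2 elements.
print(column_major_address([1, 1, 1, 1], [2, 3, 4, 5], [1, 2, 1, 1]))
```

For the C-style (last index fastest) layout the same loop would run over the dimensions in reverse order.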
Been over 25 years since I did any FORTRAN.
I believe FORTRAN, unlike many other languages, lays arrays out in
column major order. That means the leftmost index is the
one that changes most frequently when processing a multi
dimensional array in linear order. Once
the maximum dimension of the leftmost index is reached, set it back to 1, assuming 1 based
indexing, and increment the next level index by 1 and start the process over again.
To calculate the index configuration for any given address offset
you need to know the value of each of the 4 array dimensions. Without this
you can't do it.
Example:
Suppose your array has dimensions 2 by 3 by 4 by 5. This implies a
total of 2 * 3 * 4 * 5 = 120 cells in the matrix. You want the index corresponding
to the 200th byte.
This would be the (200 / 4) - 1 = 49th cell (this assumes 4 bytes per cell and offset zero
is the first cell).
First observe how specific indices translate into offsets...
What cell number does the element X(1,1,1,1) occur at? Simple answer: 1
What cell number does element X(1, 2, 1, 1) occur at? Since we cycled through
the leftmost dimension it must be that dimension plus 1. In other words,
2 + 1 = 3. How about X(1, 1, 2, 1)? We cycled through the first two dimensions
which is 2 * 3 = 6 plus 1 to give us 7. Finally X(1, 1, 1, 2) must be:
2 * 3 * 4 = 24 plus 1 gives the 25th cell.
Notice that the next rightmost index does not increment until the cell number
exceeds the product of the indices to its left. Using this observation you can
calculate the indices for any given cell number by working from the rightmost
index to the left most as follows:
Right most index increments every (2 * 3 * 4 = 24) cells. 24 goes into 49 (the cell number
we want to find the indexing for) twice
leaving 1 left over. Add 1 (for 1 based indexing) that gives us a rightmost
index value of 2 + 1 = 3. Next index (moving left) changes every (2 * 3 = 6) cells. Six goes into 1
zero times, this gives us index 0 + 1 = 1. Next index changes every 2 cells. Two goes into 1 zero
times giving an index value of 1. For the last (leftmost index) just add 1 to whatever is
left over, 1 + 1 = 2. This gives us the following reference X(2, 1, 1, 3).
Double check by working it back to an offset:
(2 - 1) + ((1 - 1) * 2) + ((1 - 1) * 2 * 3) + ((3 - 1) * 2 * 3 * 4) = 49.
Just change the numbers and use the same process for any number of dimensions
and/or offsets.
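The same conversion can be sketched in Python with repeated divmod. Note that this sketch peels indices off from the leftmost dimension instead of the rightmost one used in the walk-through; both orders give the same result. The function name indices_from_cell is hypothetical, and cell is the zero-based cell offset.

```python
def indices_from_cell(cell, dims):
    """Convert a zero-based cell offset into 1-based column-major indices."""
    indices = []
    for extent in dims:
        # The remainder is the zero-based index in this dimension;
        # the quotient carries over to the dimensions to the right.
        cell, remainder = divmod(cell, extent)
        indices.append(remainder + 1)
    return tuple(indices)


# Byte 200 with 4-byte cells is zero-based cell 49 in a 2 x 3 x 4 x 5 array.
print(indices_from_cell(200 // 4 - 1, (2, 3, 4, 5)))
```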
Fortran has column-major order for arrays. This is described at http://en.wikipedia.org/wiki/Row-major_order#Column-major_order. Further down in that article there is the equation for the memory offset of a higher dimensional array.

Finding the maximum area in given binary data

I have a problem with describing algorithm for finding maximum rectangular area of binary data, where 1 occurs k-times more often than 0. Data is always n^2 bits like this:
For example data for n = 4 looks like:
1 0 1 0
0 0 1 1
0 1 1 1
1 1 0 1
Value of k can be 1 .. j (k = 1 means, that number of 0 and 1 is equal).
For above example of data and for k = 1 solution is:
1 0 1 0 <- 4 x '0' and 4 x '1'
0 0 1 1
0 1 1 1
1 1 0 1
But in this example:
1 1 1 0
0 1 0 0
0 0 0 0
0 1 1 1
Solution would be:
1 1 1 0
0 1 0 0
0 0 0 0
0 1 1 1
I tried with few brute force algorithms but for n > 20 it is getting too slow. Can you advise me how I should solve this problem?
As RBerteig proposed - the problem can be also described like that: "In a given square bitmap with cells set to 1 or 0 by some arbitrary process, find the largest rectangular area where the 1's and 0's occur in a specified ratio, k."
Bruteforce should do just fine here for n < 100, if properly implemented: solution below has O(n^4) time and O(n^2) memory complexity. 10^8 operations should be well under 1 second on modern PC (especially considering that each operation is very cheap: few additions and subtractions).
Some observations
There're O(n^4) sub-rectangles to consider and each of them can be a solution.
If we can find number of 1's and 0's in each sub-rectangle in O(1) (constant time), we'll solve problem in O(n^4) time.
If we know number of 1's in some sub-rectangle, we can find number of zeroes (through area).
So, the problem is reduced to following: create data structure allowing to find number of 1's in each sub-rectangle in constant time.
Now, imagine we have sub-rectangle [i0..i1]x[j0..j1]. I.e., it occupies rows between i0 and i1 and columns between j0 and j1. And let count_ones be the function to count number of 1's in subrectangle.
This is the main observation:
count_ones([i0..i1]x[j0..j1]) = count_ones([0..i1]x[0..j1]) - count_ones([0..i0 - 1]x[0..j1]) - count_ones([0..i1]x[0..j0 - 1]) + count_ones([0..i0 - 1]x[0..j0 - 1])
Same observation with practical example:
AAAABBB
AAAABBB
CCCCDDD
CCCCDDD
CCCCDDD
CCCCDDD
If we need to find number of 1's in D sub-rectangle (3x4), we can do it by taking number of 1's in the whole rectangle (A + B + C + D), subtracting number of 1's in (A + B) rectangle, subtracting number of 1's in (A + C) rectangle, and adding number of 1's in (A) rectangle. (A + B + C + D) - (A + B) - (A + C) + (A) = D
Thus, we need table sums, for each i and j containing number of 1's in sub-rectangle [0..i][0..j].
You can create this table in O(n^2), but even the direct way to fill it (for each i and j iterate all elements of [0..i][0..j] area) will be O(n^4).
Having this table,
count_ones([i0..i1]x[j0..j1]) = sums[i1][j1] - sums[i0 - 1][j1] - sums[i1][j0 - 1] + sums[i0 - 1][j0 - 1] (where any sums entry with index -1 is taken to be 0)
Therefore, time complexity O(n^4) reached.
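A minimal Python sketch of this brute force with a prefix-sum table. The function name largest_balanced_area is hypothetical, and the table is padded with an extra row and column of zeros so that the i0 - 1 and j0 - 1 lookups never go out of range; that padding is an implementation choice, not part of the answer above.

```python
def largest_balanced_area(grid, k):
    """Largest sub-rectangle area in which 1s occur exactly k times as often as 0s.

    O(n^2) to build the prefix-sum table, O(n^4) to try every sub-rectangle.
    Returns 0 if no sub-rectangle qualifies.
    """
    n = len(grid)
    # sums[i + 1][j + 1] = number of 1s in rows 0..i and columns 0..j.
    sums = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            sums[i + 1][j + 1] = grid[i][j] + sums[i][j + 1] + sums[i + 1][j] - sums[i][j]

    best = 0
    for i0 in range(n):
        for i1 in range(i0, n):
            for j0 in range(n):
                for j1 in range(j0, n):
                    area = (i1 - i0 + 1) * (j1 - j0 + 1)
                    ones = (sums[i1 + 1][j1 + 1] - sums[i0][j1 + 1]
                            - sums[i1 + 1][j0] + sums[i0][j0])
                    # ones == k * zeros  <=>  ones * (k + 1) == area * k
                    if area > best and ones * (k + 1) == area * k:
                        best = area
    return best


example = [[1, 0, 1, 0], [0, 0, 1, 1], [0, 1, 1, 1], [1, 1, 0, 1]]
print(largest_balanced_area(example, 1))
```

On the first example from the question this finds an area of 8 (4 zeros and 4 ones).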
This is still brute force, but something you should note is that you don't have to recompute everything from scratch for a new i*j rectangle. Instead, for each possible rectangle size, you can move the rectangle across the n*n grid one step at a time, decrementing the counts for the bits no longer within the rectangle and incrementing the counts for the bits that newly entered the rectangle. You could potentially combine this with varying the rectangle size, and try to find an optimal pattern for moving and resizing the rectangle.
Just some hints..
You could impose better restrictions on the values. The requirement leads to condition
N1*(k+1) == S*k, where N1 is number of ones in an area, and S=dx*dy is its surface.
It can be rewritten in better form:
N1/k == S/(k+1).
Because the greatest common divisor of numbers n and n+1 is always 1, then N1 have to be multiple of k and dx*dy to be multiple of k+1. It reduces greatly the possible space of solutions, the larger is k, the better (for dx*dy case you'll need to play with prime divisors of k+1).
Now, because you need just the surface of the largest area with such a property, it would be wise to start from the largest areas and move to smaller ones. By trying values of dx*dy from n^2 down to k+1 that satisfy the divisor and the bounding conditions, you'll find the solution quite fast, typically much faster than O(n^4), for a special reason: except in cases when the array was specially constructed, if we assume a random input, the probability that there are N1 ones out of S values in one of the (n-dx+1)*(n-dy+1) areas that have the surface S will constantly grow as S decreases. (Large values of k will make the probability smaller, but at the same time they will make the filter for dx and dy pairs stronger.)
Also, this problem: http://ioinformatics.org/locations/ioi99/contest/land/land.shtml , looks somehow similar, maybe you'll find some ideas in their solution.
