Given an NxNxN binary array (containing only 0's and 1's), how can we find the largest cuboid (containing only 1's) with a non-trivial algorithm, i.e. in O(N^3)?
--
It is the same problem as "Find largest rectangle containing only zeros in an N×N binary matrix", but in a higher dimension.
Also, in my case, the largest rectangle can "cross the edge" of the array, i.e. the space is like a torus for a 2D matrix.
For a 2D array, if the input is:
00111
00111
11000
00000
00111
the solution depicted by 'X' is
00XXX
00XXX
11000
00000
00XXX
I've done the computation for an NxN binary array and found a solution to the largest-rectangle problem in O(N^2) by following the idea in http://tech-queries.blogspot.de/2011/03/maximum-area-rectangle-in-histogram.html.
But I don't know how to apply it for a 3D array.
--
Example for a 3x3x3 array where the solution "crosses the edge":
111
100
011

111
001
111

011
110
011
the solution should be:
1XX
100
0XX

1XX
001
1XX

0XX
110
0XX
This solution has O(N^3 log^2 N) complexity (it may be optimized to O(N^3 log N)). An additional integer array of size 2*8*N^3 will be needed.
1. Compute r(i,j,k): for each of the N^2 rows, compute the cumulative sum of non-zero elements, resetting it when a zero element is found.
2. Perform the following steps for various values of K, using Golden section search (or Fibonacci search) to find the maximum result.
3. Compute c(i,j,k): for each of the N^2 columns, compute the cumulative sum of all elements with r(i,j,k) >= K, resetting it when an element with r(i,j,k) < K is found. For good visualization of steps 1 and 2, see this answer.
4. Perform the last step for various values of M, using Golden section search to find the maximum result.
5. Compute sum: for each of the N^2 lines along the 3rd coordinate, compute the cumulative sum of all elements with c(i,j,k) >= M, resetting it when an element with c(i,j,k) < M is found. Calculate sum*K*M and update the best-so-far result if necessary.
"cross the edge" property of the array is handled in obvious way: iterate every index twice and keep all cumulative sums not larger than N.
For the multidimensional case, this algorithm has O(N^D log^(D-1) N) time complexity and O(D*N^D) space complexity.
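For illustration, a minimal sketch (my own, with assumed names, not part of the algorithm description above) of how step 1 and the torus handling might look:
#include <algorithm>
#include <vector>

using Grid = std::vector<std::vector<std::vector<int>>>;

Grid computeRowRuns(const Grid& a, int N)
{
    Grid r(N, std::vector<std::vector<int>>(N, std::vector<int>(N, 0)));
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
        {
            int run = 0;
            for (int t = 0; t < 2 * N; ++t)            // two passes over the row handle the wrap
            {
                int k = t % N;
                run = a[i][j][k] ? std::min(run + 1, N) : 0;   // cap cumulative run at N
                r[i][j][k] = std::max(r[i][j][k], run);        // keep the best run ending at k
            }
        }
    return r;
}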
Optimization to O(N^3 log N)
Step 4 of the algorithm sets a global value for M. This step may be excluded (and the complexity decreased by a factor of log N) if the value of M is determined locally.
To do this, step 5 should be improved. It should maintain a double-ended queue (whose head contains the local value of M) and a stack (keeping the starting positions for all values of M evicted from the queue).
While c(i,j,k) increases, it is appended to the tail of the queue.
If c(i,j,k) decreases, all larger values are removed from the queue's tail. If it decreases further (the queue is empty), the stack is used to restore the 'sum' value and to put the corresponding 'M' value back into the queue.
Then several elements may be removed from the head of the queue (and pushed to the stack) if this allows the local solution's value to increase.
For the multidimensional case, this optimization gives O(N^D log^(D-2) N) complexity.
Here is an O(N^4) solution.
Let's assume you are storing the cuboid in bool cuboid[N][N][N];
bool array2d[N][N];
int largest_volume = 0;
for(int x_min = 0; x_min < N; x_min++) {
//initializing array2d
for(int y = 0; y < N; y++) {
for(int z = 0; z < N; z++) {
array2d[y][z] = true;
}
}
//computation
for(int x_max = x_min; x_max < N; x_max++) {
// now we want to find the largest cuboid whose
// X coordinates span exactly x_min..x_max;
// the cell at (y, z) can be used in the cuboid if and only if
// there are only 1's in cuboid[x][y][z] for x_min <= x <= x_max,
// so let's compute, for each cell in array2d,
// whether there are only 1's in cuboid[x][y][z] for x_min <= x <= x_max
for(int y = 0; y < N; y++) {
for(int z = 0; z < N; z++) {
array2d[y][z] &= cuboid[x_max][y][z];
}
}
//you already know how to find largest rectangle in 2d in O(N^2)
int local_volume = (x_max - x_min + 1) * find_largest_area(array2d);
largest_volume = max(largest_volume, local_volume);
}
}
You can use the same trick to compute the best solution in X dimensions: just reduce the problem to X-1 dimensions. Complexity: O(N^(2*X-2)).
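In case it's useful, here is a hedged sketch of the find_largest_area helper assumed above (my own illustration, not a definitive implementation): it uses the classic "largest rectangle in a histogram" stack technique in O(N^2), assumes N is a compile-time constant as in the snippet above, and ignores the torus wrap, just as the loop above does.
#include <algorithm>
#include <stack>

int find_largest_area(bool grid[N][N])
{
    int height[N] = {0};     // histogram: consecutive true cells ending at the current row
    int best = 0;
    for (int y = 0; y < N; y++) {
        for (int z = 0; z < N; z++)
            height[z] = grid[y][z] ? height[z] + 1 : 0;

        // largest rectangle in the histogram 'height', using a stack of column indices
        std::stack<int> st;
        for (int z = 0; z <= N; z++) {
            int h = (z == N) ? 0 : height[z];          // sentinel flushes the stack at the end
            while (!st.empty() && height[st.top()] >= h) {
                int top = height[st.top()];
                st.pop();
                int width = st.empty() ? z : z - st.top() - 1;
                best = std::max(best, top * width);
            }
            st.push(z);
        }
    }
    return best;
}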
Related
I wrote the following randomized quick-select algorithm that moves the smallest k elements of an array to the beginning of it in linear time (technically the worst case is O(n^2), but the probability of that drops exponentially):
// This function moves the smallest k elements of the array to
// the beginning of it in time O(n).
void moveKSmallestValuesToTheLeft( double arr[] ,
unsigned int n ,
unsigned int k )
{
int l = 0, r = n - 1; // Beginning and end indices of the array
while (0 < k && k < n && n > 10)
{
unsigned int partition_index, left_size, pivot;
//Partition the data around a random pivot
pivot = generatePivot(arr, l, n, k); //explained later
partition_index = partition(arr, l, r, pivot); //standard quick sort partition
left_size = partition_index - l + 1;
if (k < left_size)
{
//Continue with left subarray
r = partition_index - 1;
n = partition_index - l;
}
else
{
//Continue with right subarray
l += left_size;
n -= left_size;
k -= left_size;
}
}
if (n <= 10)
insertionSort(arr + l, n);
}
I tested 3 different methods for generating the pivot, all of them based on selecting 5 random candidates and returning one of them; I ran the code 100,000 times for each method. These were the methods:
Choose 5 random elements and return their median.
Choose 5 random elements, calculate k/n and check which element of the 5 is closest to it, i.e. if k/n <= 1/5 return the min, if k/n <= 2/5 return the second smallest value, if k/n <= 3/5 return the median, and so on (a sketch of this selection rule appears after the list).
Exactly the same as method 2, but giving more weight to the pivots closer to the median, based on their binomial coefficients: I calculated the binomial coefficients for n = 5-1 and got [1 4 6 4 1], normalized them, took their cumulative sum to get [0.0625 0.3125 0.6875 0.9375 1], and then did: if k/n <= 0.0625 return the min, if k/n <= 0.3125 return the second smallest value, if k/n <= 0.6875 return the median, and so on...
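For concreteness, here is a sketch of method 2's selection rule (the actual generatePivot I benchmarked differs in details; the function name here is just illustrative):
#include <algorithm>
#include <cmath>
#include <cstdlib>

double pivotByTargetRank(double arr[], unsigned int l, unsigned int n, unsigned int k)
{
    double cand[5];
    for (int i = 0; i < 5; ++i)
        cand[i] = arr[l + std::rand() % n];            // 5 random candidates (repeats possible)
    std::sort(cand, cand + 5);
    // k/n <= 1/5 -> min, k/n <= 2/5 -> 2nd smallest, k/n <= 3/5 -> median, and so on
    int rank = std::max(0, std::min(4, (int)std::ceil(5.0 * k / n) - 1));
    return cand[rank];
}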
My intuition told me that method 2 would perform the best, because it always chooses the pivot most likely to be closest to the k'th smallest element and would therefore probably decrease k or n the most at each iteration, but instead, every time I ran the code, I got the following results (methods ranked fastest to slowest based on average and worst-case times):
Average running time:
First place (fastest): Method 3
Second place: Method 2
Last place: Method 1
Worst case running time:
First place (fastest): Method 1
Second place: Method 3
Last place: Method 2
My question is: is there any mathematical way to explain these results, or at least to give some intuition for them? My intuition was completely wrong; method 2 didn't outperform either of the other 2 methods.
EDIT
So apparently the problem was that I only tested k = n/2, which is an edge case, so I got these weird results.
I am stuck on a simple problem and am looking for a better solution than mine.
I have an integer matrix (tab[N][M]) and an integer (k), and I have to find the smallest rectangle (sub-matrix) whose sum of elements is greater than k.
So, my current attempt at a solution is:
Make an additional matrix sum[N+1][M+1] and an integer solution = infinity
For each 1 <= i <= N and 1 <= j <= M:
sum[ i ][ j ] = sum[ i - 1 ][ j ] + sum[ i ][ j - 1 ] + tab[ i ][ j ] - sum[ i - 1 ][ j - 1 ]
Then look at each rectangle, e.g. the rectangle with top-left corner (x, y) and bottom-right corner (a, b):
Rectangle_(x,y)_(a,b) = sum[ a ][ b ] - sum[ x - 1 ][ b ] - sum[ a ][ y - 1 ] + sum[ x - 1 ][ y - 1 ]
and if Rectangle_(x,y)_(a,b) > k then solution = minimum of the current solution and (a - x + 1) * (b - y + 1)
But this solution is quite slow (quartic time); is there any possibility of making it faster? I am looking for iterated logarithmic time (or worse/better). I managed to reduce my time, but not substantially.
If the matrix only contains values >= 0, then there is a linear time solution in the 1D case that can be extended to a cubic time solution in the 2D case.
For the 1D case, you do a single pass from left to right, sliding a window across the array, stretching or shrinking it as you go so that the numbers contained in the interval always sum to at least k (or breaking out of the loop if this is not possible).
Initially, set the left index bound of the interval to the first element, and the right index bound to -1, then in a loop:
Increment the right bound by 1, and then keep incrementing it until either the values inside the interval sum to > k, or the end of the array is reached.
Increment the left bound to shrink the interval as small as possible without letting the values sum to less than or equal to k.
If the result is a valid interval (meaning the first step did not reach the end of the array without finding a valid interval) then compare it to the smallest so far and update if necessary.
This doesn't work if negative values are allowed, because in the second step you need to be able to assume that shrinking the interval always leads to a smaller sum, so when the sum dips below k you know that's the smallest possible for a given interval endpoint.
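In code, the 1D sweep can look roughly like this (a minimal sketch, assuming non-negative values and the "sum > k" convention used above; names are illustrative):
#include <vector>

int shortestIntervalAboveK(const std::vector<int>& a, long long k)
{
    int best = -1;
    long long sum = 0;
    int left = 0;
    for (int right = 0; right < (int)a.size(); ++right)
    {
        sum += a[right];                        // grow the window to the right
        while (left < right && sum - a[left] > k)
            sum -= a[left++];                   // shrink from the left while the sum stays > k
        if (sum > k && (best == -1 || right - left + 1 < best))
            best = right - left + 1;            // record the best valid interval so far
    }
    return best;                                // -1 means no interval sums to more than k
}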
For the 2D case, you can iterate over all possible sub-matrix heights, and over each possible starting row for a given height, and perform this horizontal sweep for each row.
In pseudo-code:
Assume you have a function rectangle_sum(x, y, a, b) that returns the sum of the values from (x, y) to (a, b) inclusive and runs in O(1) time using a summed area table.
for(height = 1; height <= M; height++) // iterate over submatrix heights
{
for(row = 0; row <= (M-height); row++) // iterate over all rows
{
start = 0; end = -1; // initialize interval
while(end < N) // iterate across the row
{
valid_interval = false;
// increment end until the interval sums to > k:
while(end < (N-1))
{
end = end + 1;
if(rectangle_sum(start, row, end, row + height - 1) > k)
{
valid_interval = true;
break;
}
}
if(!valid_interval)
break;
// shrink interval by incrementing start:
while((start < end) &&
(rectangle_sum(start+1, row, end, row + height - 1) > k))
start = start + 1;
compare (start, row), (end, row + height) with current smallest
submatrix and make it the new current if it is smaller
}
}
}
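For completeness, here is a hedged sketch (mine, not part of the pseudo-code above) of the assumed rectangle_sum helper, backed by a summed area table built once in O(N*M); to match the calls above, the first coordinate of each corner is the column and the second is the row:
#include <vector>

struct SummedAreaTable
{
    std::vector<std::vector<long long>> s;   // s[r+1][c+1] = sum of tab[0..r][0..c]

    explicit SummedAreaTable(const std::vector<std::vector<int>>& tab)
        : s(tab.size() + 1,
            std::vector<long long>(tab.empty() ? 1 : tab[0].size() + 1, 0))
    {
        for (size_t r = 0; r < tab.size(); ++r)
            for (size_t c = 0; c < tab[r].size(); ++c)
                s[r + 1][c + 1] = s[r][c + 1] + s[r + 1][c] - s[r][c] + tab[r][c];
    }

    // sum of tab over columns x..a and rows y..b, all bounds inclusive
    long long rectangle_sum(int x, int y, int a, int b) const
    {
        return s[b + 1][a + 1] - s[y][a + 1] - s[b + 1][x] + s[y][x];
    }
};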
I have seen a number of answers to matrix rectangle problems here which worked by solving a similar 1-dimensional problem and then applying it to every row of the matrix, to every row formed by taking the sum of two adjacent rows, to every sum of three adjacent rows, and so on. So here's an attempt at finding the smallest interval in a line which has at least a given sum. (Clearly, if your matrix is tall and thin instead of short and fat, you would work with columns instead of rows.)
Work from left to right, maintaining the sums of all prefixes of the values seen so far, up to the current position. The value of an interval ending in a position is the sum up to and including that position, minus the sum of a prefix which ends just before the interval starts. So if you keep a list of the prefix sums up to just before the current position you can find, at each point, the shortest interval ending at that point which passes your threshold. I'll explain how to search for this efficiently in the next paragraph.
In fact, you probably don't need a list of all prefix sums. Smaller prefix sums are more valuable, and prefix sums which end further along are more valuable. So any prefix sum which ends before another prefix sum and is also larger than that other prefix sum is pointless. So the prefix sums you want can be arranged into a list which retains the order in which they were calculated but also has the property that each prefix sum is smaller than the prefix sum to the right of it. This means that when you want to find the closest prefix sum which is at most a given value you can do this by binary search. It also means that when you calculate a new prefix sum you can put it into its place in the list by just discarding all prefix sums at the right hand end of the list which are larger than it, or equal to it.
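To make the idea concrete, here is a hedged sketch of that 1D routine (the names are mine): it returns the length of the shortest interval with sum at least k, or -1 if there is none, and it works even when negative values are present.
#include <algorithm>
#include <iterator>
#include <utility>
#include <vector>

int shortestWithSumAtLeast(const std::vector<long long>& a, long long k)
{
    // mono keeps (prefix sum, index) pairs, strictly increasing in both components
    std::vector<std::pair<long long, int>> mono = { {0LL, -1} };
    long long prefix = 0;
    int best = -1;
    for (int i = 0; i < (int)a.size(); ++i)
    {
        prefix += a[i];
        // the rightmost stored prefix <= prefix - k marks the latest valid interval start
        std::pair<long long, int> key(prefix - k, (int)a.size());
        auto it = std::upper_bound(mono.begin(), mono.end(), key);
        if (it != mono.begin())
        {
            int start = std::prev(it)->second;
            int len = i - start;
            if (best == -1 || len < best)
                best = len;
        }
        // discard stored prefixes that are >= the new one: they can never give a shorter interval
        while (!mono.empty() && mono.back().first >= prefix)
            mono.pop_back();
        mono.push_back({prefix, i});
    }
    return best;
}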
So, I wanted to have some fun with graphs and now it's driving me crazy.
First, I generate a connected graph with a given number of edges. This is the easy part, which became my curse. Basically, it works as intended, but the results I'm getting are quite bizarre (well, maybe they're not, and I'm the issue here). The algorithm for generating the graph is fairly simple.
I have two arrays, one of them is filled with numbers from 0 to n - 1, and the other is empty.
At the beginning I shuffle the first one and move its last element to the empty one.
Then, in a loop, I'm creating an edge between the last element of the first array and a random element from the second one and after that I, again, move the last element from the first array to the other one.
After that part is done, I have to create random edges between the vertices until I get as many as I need. This is, again, very easy. I just pick two random numbers in the range from 0 to n - 1 and, if there is no edge between those vertices, I create one.
This is the code:
void generate(int n, double d) {
initMatrix(n); // <- creates an adjacency matrix n x n, filled with 0s
int *array1 = malloc(n * sizeof(int));
int *array2 = malloc(n * sizeof(int));
int j = n - 1, k = 0;
for (int i = 0; i < n; ++i) {
array1[i] = i;
array2[i] = 0;
}
shuffle(array1, 0, n); // <- Fisher-Yates shuffle
array2[k++] = array1[j--];
int edges = d * n * (n - 1) * .5;
if (edges % 2) {
++edges;
}
while (j >= 0) {
int r = rand() % k;
createEdge(array1[j], array2[r]);
array2[k++] = array1[j--];
--edges;
}
free(array1);
free(array2);
while (edges) {
int a = rand() % n;
int b = rand() % n;
if (a == b || checkEdge(a, b)) {
continue;
}
createEdge(a, b);
--edges;
}
}
Now, if I print it out, it's a fine graph. Then I want to find a Hamiltonian cycle. This part works. Then I get to my bane - the Eulerian cycle. What's the problem?
Well, first I check if all vertices have even degree. And they do not. Always. Every single time, unless I choose to generate a complete graph.
I now feel destroyed by my own code. Is something wrong? Or is it supposed to be like this? I knew that Eulerian circuits would be rare, but not that rare. Please, help.
Let's analyze the probability of having an Eulerian cycle, and for simplicity let's do it for all graphs with n vertices, regardless of the number of edges.
Given a graph G of size n, choose one arbitrary vertex v. The probability of its degree being even is roughly 1/2 (assuming for each u1, u2: P((v,u1) exists) = P((v,u2) exists)).
Now remove v from G, creating a new graph G' with n-1 vertices and without all the edges connected to v.
Similarly, for any arbitrary vertex v' in G': if (v,v') was an edge in G, we need d(v') to be odd in G'; otherwise, we need d(v') to be even in G'. Either way, the probability of that is still roughly ~1/2 (independent of the previous degree of v).
....
For the i-th round, let #(v) be the number of edges connected to v that were discarded before reaching the current graph. If #(v) is odd, the probability of v's current degree being odd is ~1/2, and if #(v) is even, the probability of its current degree being even is also ~1/2; either way we remain with a probability of ~1/2.
We can now see how this works and write a recurrence formula for the probability of the graph having an Eulerian cycle:
P(n) ~= 1/2*P(n-1)
P(1) = 1
This is going to give us P(n) ~= 2^-n, which is very unlikely for reasonable n.
Note that 1/2 is just a rough estimate (and is correct as n -> infinity); the probability is in fact a bit higher, but it is still exponential in -n, which makes it very unlikely for reasonably sized graphs.
If n numbers are given, how would I find the total number of possible triangles? Is there any method that does this in less than O(n^3) time?
I am considering a+b>c, b+c>a and a+c>b as the conditions for being a triangle.
Assume there are no equal numbers among the given n, and that it's allowed to use one number more than once. For example, given the numbers {1,2,3}, we can create 7 triangles:
1 1 1
1 2 2
1 3 3
2 2 2
2 2 3
2 3 3
3 3 3
If any of those assumptions isn't true, it's easy to modify the algorithm.
Here I present an algorithm which takes O(n^2) time in the worst case:
Sort the numbers (ascending order).
We will take triples ai <= aj <= ak, such that i <= j <= k.
For each i, j you need to find the largest k that satisfies ak < ai + aj. Then all triples (ai, aj, al) with j <= l <= k are triangles (because ak >= aj >= ai, the only inequality that can be violated is ak < ai + aj).
Consider two pairs (i, j1) and (i, j2) with j1 <= j2. It's easy to see that k2 (found in the previous step for (i, j2)) >= k1 (found in the previous step for (i, j1)). It means that as you iterate over j, you only need to check numbers starting from the previous k. So it gives you O(n) time complexity for each particular i, which implies O(n^2) for the whole algorithm.
C++ source code:
#include <algorithm>

int Solve(int* a, int n)
{
int answer = 0;
std::sort(a, a + n);
for (int i = 0; i < n; ++i)
{
int k = i;
for (int j = i; j < n; ++j)
{
while (n > k && a[i] + a[j] > a[k])
++k;
answer += k - j;
}
}
return answer;
}
Update for downvoters:
This definitely is O(n^2)! Please read carefully the chapter about Amortized Analysis in "Introduction to Algorithms" by Thomas H. Cormen (17.2 in the second edition).
Finding complexity by counting nested loops is completely wrong sometimes.
Here I try to explain it as simply as I can. Let's fix the variable i. Then for that i we must iterate j from i to n (an O(n) operation), and the inner while loop iterates k from i to n in total (also an O(n) operation). Note: I don't restart the while loop from the beginning for each j. We also need to do this for each i from 0 to n. So it gives us n * (O(n) + O(n)) = O(n^2).
There is a simple algorithm in O(n^2*logn).
Assume you want all triangles as triples (a, b, c) where a <= b <= c.
There are 3 triangle inequalities, but checking only a + b > c suffices (the others then hold trivially).
And now:
Sort the sequence in O(n * logn), e.g. by merge-sort.
For each pair (a, b) with a <= b, the remaining value c needs to be at least b and less than a + b.
So you need to count the number of items in the interval [b, a+b).
This can be done simply by binary-searching for a+b (O(logn)) and counting the number of items in [b, a+b), which is the difference between the positions found for a+b and for b.
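To make this concrete, here is a minimal sketch of the whole counting (my illustration; it follows the question's convention that a number may be reused, by searching from position j onward):
#include <algorithm>
#include <vector>

long long countTrianglesNLogN(std::vector<int> a)
{
    std::sort(a.begin(), a.end());
    long long total = 0;
    for (size_t i = 0; i < a.size(); ++i)
        for (size_t j = i; j < a.size(); ++j)
        {
            // the third side c must satisfy a[j] <= c < a[i] + a[j];
            // candidates live at positions >= j, so count them with one binary search
            auto hi = std::lower_bound(a.begin() + j, a.end(), a[i] + a[j]);
            total += hi - (a.begin() + j);
        }
    return total;
}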
All together O(n logn + n^2 logn), which is O(n^2 logn). Hope this helps.
If you use a binary sort, that's O(n log(n)), right? Keep your binary tree handy, and for each pair (a,b) where a < b, find all values c such that c > b and c < (a+b).
Let a, b and c be the three sides. The conditions below must hold for a triangle (the sum of any two sides is greater than the third side):
i) a + b > c
ii) b + c > a
iii) a + c > b
The following are the steps to count triangles.
1. Sort the array in non-decreasing order.
2. Initialize two pointers 'i' and 'j' to the first and second elements respectively, and initialize the count of triangles as 0.
3. Fix 'i' and 'j' and find the rightmost index 'k' (or the largest 'arr[k]') such that 'arr[i] + arr[j] > arr[k]'. The number of triangles that can be formed with 'arr[i]' and 'arr[j]' as two sides is 'k - j'; add 'k - j' to the count of triangles. Let us consider 'arr[i]' as 'a', 'arr[j]' as 'b' and all elements between 'arr[j+1]' and 'arr[k]' as 'c'. Conditions (ii) and (iii) above are satisfied because 'arr[i] < arr[j] < arr[k]', and we check condition (i) when we pick 'k'.
4. Increment 'j' to fix the second element again. Note that in step 3 we can reuse the previous value of 'k'. The reason is simple: if we know that 'arr[i] + arr[j-1]' is greater than 'arr[k]', then 'arr[i] + arr[j]' will also be greater than 'arr[k]', because the array is sorted in increasing order.
5. If 'j' has reached the end, then increment 'i', initialize 'j' as 'i + 1' and 'k' as 'i + 2', and repeat steps 3 and 4.
Time Complexity: O(n^2).
The time complexity looks higher because of the 3 nested loops, but if we take a closer look at the algorithm, we observe that 'k' is initialized only once in the outermost loop. The innermost loop executes at most O(n) times in total for every iteration of the outermost loop, because 'k' starts from i+2 and only goes up to n across all values of 'j'. Therefore, the time complexity is O(n^2).
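A minimal sketch of these steps (my own illustration; unlike the question's examples, it counts triangles built from three distinct positions i < j < k, as this answer's steps do):
#include <algorithm>

int countTriangles(int arr[], int n)
{
    std::sort(arr, arr + n);                   // step 1: sort in non-decreasing order
    int count = 0;
    for (int i = 0; i < n - 2; ++i)            // 'i' is the smallest side
    {
        int k = i + 2;                         // 'k' is reused across 'j' for the same 'i'
        for (int j = i + 1; j < n - 1; ++j)    // 'j' is the middle side
        {
            if (k < j + 1)
                k = j + 1;                     // 'k' must stay ahead of 'j'
            while (k < n && arr[i] + arr[j] > arr[k])
                ++k;                           // advance 'k' while the triangle inequality holds
            count += k - j - 1;                // sides arr[j+1..k-1] all work with arr[i], arr[j]
        }
    }
    return count;
}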
I have worked out an algorithm that runs in O(n^2 lg n) time. I think it's correct...
The code is written in C++...
int Search_Closest(int A[], int p, int q, int n) /* Returns the index of the element closest to n
in array A[p..q] */
{
if(p<q)
{
int r = (p+q)/2;
if(n==A[r])
return r;
if(p==r)
return r;
if(n<A[r])
return Search_Closest(A,p,r,n);
else
return Search_Closest(A,r,q,n);
}
else
return p;
}
int no_of_triangles(int A[], int p, int q) /*Returns the no of triangles possible in A[p..q]*/
{
int sum = 0;
Quicksort(A,p,q); //Sorts the array A[p..q] in O(nlgn) expected case time
for(int i=p;i<=q;i++)
for(int j =i+1;j<=q;j++)
{
int c = A[i]+A[j];
int k = Search_Closest(A,j,q,c);
/* the number of triangles formed with A[i] and A[j] as two sides is (k+1)-2 if A[k] is smaller than or equal to c, else it is (k+1)-3; as the index starts from zero we need to add 1 to the value */
if(A[k]>c)
sum+=k-2;
else
sum+=k-1;
}
return sum;
}
Hope it helps........
A possible answer: we could also use binary search to find the value of 'k' and hence improve the time complexity!
Sort the numbers N0, N1, N2, ..., Nn-1 in descending order to get X0, X1, X2, ..., Xn-1 with X0 >= X1 >= X2 >= ... >= Xn-1.
Choose X0 (and, in turn, each candidate down to Xn-3) as the largest side, then choose the remaining two items from the rest, e.g. the case (X0, X1, X2).
Check whether X0 < X1 + X2.
If it holds, a triangle is found and you continue with the other pairs; if not, skip the remaining choices for this largest side, since every later candidate pair is even smaller.
It seems there is no algorithm better than O(n^3). In the worst case, the result set itself has O(n^3) elements.
For example, if n equal numbers are given, the algorithm has to return n*(n-1)*(n-2) results.
Given an array arr of size 100000, where each element satisfies 0 <= arr[i] < 100 (not sorted, contains duplicates).
Find out how many triplets (i,j,k) are present such that arr[i] ^ arr[j] ^ arr[k] == 0
Note: ^ is the XOR operator. Also 0 <= i <= j <= k < 100000.
I have a feeling I have to calculate the frequencies and do some calculation using them, but I just can't seem to get started.
Any algorithm better than the obvious O(n^3) is welcome. :)
It's not homework. :)
I think the key is that you don't need to identify the i, j, k, just count how many there are.
Initialise an array of size 100.
Loop through arr, counting how many of each value there are - O(n).
Loop through the non-zero elements of the small array, working out which triples meet the condition - assume the counts of the three numbers involved are A, B, C - the number of combinations in the original arr is (A+B+C)!/(A!B!C!) - 100**3 operations, but that's still O(1) assuming the 100 is a fixed value.
So, O(n).
Possible O(n^2) solution, if it works: maintain a variable count and two arrays, single[100] and pair[100]. Iterate over arr, and for each element of value n:
update count: count += pair[n]
update pair: iterate over the array single and, for each index x with single[x] != 0, do pair[x^n] += single[x]
update single: single[n]++
In the end count holds the result.
Possible O(100 * n) = O(n) solution.
It solves the problem with the ordering i <= j <= k.
As you know, A ^ B = 0 <=> A = B, so:
#include <vector>
using namespace std;

long long calcTripletsCount( const vector<int>& sourceArray )
{
long long res = 0;
vector<int> count(128);
vector<int> countPairs(128);
for(int i = 0; i < sourceArray.size(); i++)
{
count[sourceArray[i]]++; // count[t] contain count of element t in (sourceArray[0]..sourceArray[i])
for(int j = 0; j < count.size(); j++)
countPairs[j ^ sourceArray[i]] += count[j]; // countPairs[t] contain count of pairs p1, p2 (p1 <= p2 for keeping order) where t = sourceArray[i] ^ sourceArray[j]
res += countPairs[sourceArray[i]]; // a ^ b ^ c = 0 if a ^ b = c, we add count of pairs (p1, p2) where sourceArray[p1] ^ sourceArray[p2] = sourceArray[i]. it easy to see that we keep order(p1 <= p2 <= i)
}
return res;
}
Sorry for my bad English...
I have a (simple) O(n^2 log n) solution which takes into account the fact that i, j and k refer to indices, not values.
A simple first pass allows us to build an array A of 100 entries (value -> list of indices); we keep each list sorted for later use. O(n log n)
For each pair i,j such that i <= j, we compute X = arr[i]^arr[j]. We then perform a binary search in A[X] to locate the number of indices k such that k >= j. O(n^2 log n)
I could not find any way to leverage sorting / counting algorithm because they annihilate the index requirement.
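A hedged sketch of this approach (my reconstruction of the idea, not the original code):
#include <algorithm>
#include <vector>

long long countXorTriples(const std::vector<int>& arr)
{
    std::vector<std::vector<int>> A(128);               // value -> sorted list of indices
    for (int idx = 0; idx < (int)arr.size(); ++idx)
        A[arr[idx]].push_back(idx);                     // pushed in increasing index order

    long long total = 0;
    for (int i = 0; i < (int)arr.size(); ++i)
        for (int j = i; j < (int)arr.size(); ++j)
        {
            int x = arr[i] ^ arr[j];
            const std::vector<int>& lst = A[x];
            // count indices k >= j holding the value x
            total += lst.end() - std::lower_bound(lst.begin(), lst.end(), j);
        }
    return total;
}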
Sort the array, keeping a map of new indices to originals. O(nlgn)
Loop over i,j:i<j. O(n^2)
Calculate x = arr[i] ^ arr[j]
Since x ^ arr[k] == 0, arr[k] = x, so binary search k>j for x. O(lgn)
For all found k, print mapped i,j,k
O(n^2 lgn)
Start with a frequency count of the number of occurrences of each number between 0 and 99, as Paul suggests. This produces an array freq[] of length 100.
Next, instead of looping over triples A,B,C from that array and testing the condition A^B^C=0,
loop over pairs A,B with A < B. For each A,B, calculate C=A^B (so that now A^B^C=0), and verify that A < B < C < 100. (Any triple will occur in some order, so this doesn't miss triples. But see below). The running total will look like:
Sum+=freq[A]*freq[B]*freq[C]
The work is O(n) for the frequency count, plus about 5000 for the loop over A < B.
Since every triple of three different numbers A,B,C must occur in some order, this finds each such triple exactly once. Next you'll have to look for triples in which two numbers are equal. But if two numbers are equal and the xor of three of them is 0, the third number must be zero. So this amounts to a secondary linear search for B over the frequency count array, counting occurrences of (A=0, B=C < 100). (Be very careful with this case, and especially careful with the case B=0. The count is not just freq[B] ** 2 or freq[0] ** 3. There is a little combinatorics problem hiding there.)
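For illustration, a hedged sketch of the pair loop for the distinct-value case (my code; the equal-value cases discussed above still need their own handling):
long long countDistinctValueTriples(const int freq[100])
{
    long long sum = 0;
    for (int A = 0; A < 100; ++A)
        for (int B = A + 1; B < 100; ++B)
        {
            int C = A ^ B;                     // forces A ^ B ^ C == 0
            if (C > B && C < 100)              // keep A < B < C so each value triple counts once
                sum += (long long)freq[A] * freq[B] * freq[C];
        }
    return sum;
}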
Hope this helps!