Let's say I have 3 arrays image, blur and out, all of dimensions M×N×3.
I want to compute the bilateral gradient of each pixel in the array image (current_pixel - (previous_pixel + next_pixel) / 2) over the x and y dimensions, divide it by some floats, then add the value of the corresponding pixel from the array blur, and finally put the result into the array out.
My question is, in C, what is the most efficient way to do it (regarding the memory access speed and computing efficiency) :
One loop indexing the 3 arrays at once :
for (i = 0; i < M-1; i++)
    for (j = 0; j < N-1; j++)
        for (k = 0; k < 3; k++)
            out[i][j][k] = (2 * image[i][j][k] - image[i+1][j][k] - image[i][j+1][k]) / 2. + lambda * blur[i][j][k];
Two loops indexing only two arrays :
for (i = 0; i < M-1; i++)
    for (j = 0; j < N-1; j++)
        for (k = 0; k < 3; k++)
            out[i][j][k] = (2 * image[i][j][k] - image[i+1][j][k] - image[i][j+1][k]) / 2.;
for (i = 0; i < M-1; i++)
    for (j = 0; j < N-1; j++)
        for (k = 0; k < 3; k++)
            out[i][j][k] += lambda * blur[i][j][k];
(for readability, I only wrote a simple forward gradient, but the complete formula is given above).
Or is there another faster way ? I'm programming for x86_64 CPUs.
A single loop indexing the 3 arrays at once will be slightly easier for the compiler to optimize. But you can easily check this by benchmarking both versions.
Given two arrays A and B and an upper limit k, what is the most efficient way to compute the index pair (i, j) such that,
s = A[i] + B[j]
s = max(A[a] + B[b]) for a = 0, 1, 2, .. , len(A)-1 and b = 0, 1, 2, .. , len(B)-1
and
s < k
For example,
Given,
A = [9,2,5]
B = [2,1,6]
k = 5
we get,
s = 2 + 2 = 4 < 5
and hence,
i = 1 and j = 0
So the output should be (1,0)
A straight-forward approach would be looping through all the elements of A and B but that would make the worst case time complexity O(nm) where n = len(A) and m = len(B).
Is there a better way to solve this problem?
This type of problem can be solved by sorting one of the arrays.
One approach could be this:
Make an array temp of tuples (value, index), where value is an item of B and index is its original index in B.
Now sort temp with respect to the first item of each tuple, i.e. value.
Iterate through A and, for each A[i], use binary search to find the lower bound of k - A[i] in temp (the first position whose value is >= k - A[i]). Let it be at index j.
Because j is a lower bound, A[i] + temp[j][0] >= k whenever j is inside the array, so temp[j] itself can never be used.
The best partner for A[i] is therefore temp[j - 1], the largest value with A[i] + temp[j - 1][0] < k. If j - 1 exists, update currentMaximum with that sum if it is larger.
If the lower bound falls past the end of temp, every value is small enough, and j - 1 is simply the last element. If j is 0, no element of B can be paired with A[i].
If you need the indices, then whenever you update currentMaximum, also store i and temp[j - 1][1].
In this way you can find the maximum pair sum that is less than k, together with the original index in B.
If the order of the elements does not matter, just sort B and do the same steps on B instead of temp.
Time Complexity
For sorting = O( len(B) * Log(len(B)) )
For traversing A and doing binary search on B = O(len(A) * log(len(B))), i.e. O(n log(n)) overall when both arrays have about n elements.
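The steps above can be sketched in Python as follows. This is only an illustration under the assumption that the sum must be strictly less than k; the function name max_pair_sum_below and its None return value for "no valid pair" are choices made here, not part of the original answer.

```python
from bisect import bisect_left


def max_pair_sum_below(A, B, k):
    """Return (i, j), indices into the ORIGINAL A and B, such that A[i] + B[j]
    is the largest sum strictly below k, or None if no pair qualifies."""
    # temp holds (value, original index) pairs from B, sorted by value.
    temp = sorted((value, index) for index, value in enumerate(B))
    values = [value for value, _ in temp]
    best_sum, best_pair = None, None
    for i, a in enumerate(A):
        # Lower bound of k - a: first position whose value is >= k - a.
        # The element just before it is the largest value keeping the sum < k.
        pos = bisect_left(values, k - a) - 1
        if pos < 0:
            continue  # every element of B makes the sum >= k
        candidate = a + values[pos]
        if best_sum is None or candidate > best_sum:
            best_sum, best_pair = candidate, (i, temp[pos][1])
    return best_pair
```

For the example in the question, max_pair_sum_below([9, 2, 5], [2, 1, 6], 5) returns (1, 0).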
You can sort both A and B. Then you can break early once a sum is >= k. The function below returns indices into the sorted arrays, s.t. A[i] + B[j] < k and A[p] + B[q] < A[i] + B[j], for all p < i and for all q < j.
import numpy as np

def sum_less_than_k(A, B, k):
    i_max = -1
    j_max = -1
    s_max = -np.inf
    for i, a in enumerate(A):
        # A is sorted, so no later a can work either
        if a + B[0] >= k:
            break
        for j, b in enumerate(B):
            # B is sorted, so no later b can work for this a
            if a + b >= k:
                break
            if a + b > s_max:
                s_max = a + b
                i_max = i
                j_max = j
    return i_max, j_max

A.sort()
B.sort()
i, j = sum_less_than_k(A, B, k)
I wrote the code for Saurab's suggestion as well which is way faster for large k relative to what's in the list. However, for rather short lists or small k the two for loops are faster according to some sample runs.
from bisect import bisect_left
import numpy as np

def sum_less_than_k(A, B, k):
    i_max = j_max = -1
    s_max = -np.inf
    for i, a in enumerate(A):
        # index of the largest element of B that is strictly smaller than k - a
        j = bisect_left(B, k - a) - 1
        if j > -1 and a + B[j] > s_max:
            s_max = a + B[j]
            i_max = i
            j_max = j
    return i_max, j_max

B.sort()
i, j = sum_less_than_k(A, B, k)
For a given sequence of positive integers A_1, A_2, …, A_N, you are supposed to find the number of triplets (i, j, k) such that A_i ^ A_{i+1} ^ ... ^ A_{j-1} = A_j ^ A_{j+1} ^ ... ^ A_k
where ^ denotes bitwise XOR.
The link to the question is here: https://www.codechef.com/AUG19B/problems/KS1
All I did is try to find all subarrays with xor 0. The solution works but is quadratic time and thus too slow.
This is the solution that I managed to get to.
for (int i = 0; i < arr.length; i++) {
    int xor = arr[i];
    for (int j = i + 1; j < arr.length; j++) {
        xor ^= arr[j];
        if (xor == 0) {
            ans += (j - i);
        }
    }
}
finAns.append(ans + "\n");
Here's an O(n) solution based on CiaPan's comment under the question description:
If xor of items at indices I through J-1 equals that from J to K, then xor from I to K equals zero. And for any such subarray [I .. K] every J between I+1 and K-1 makes a triplet satisfying the requirements. And xor from I to K equals (xor from 0 to K) xor (xor from 0 to I-1). So I suppose you might find xor-s of all possible initial parts of the sequence and look for equal pairs of them.
The function f is the main method. brute_force is used for validation.
Python 2.7 code:
import random

def brute_force(A):
    res = 0
    for i in xrange(len(A) - 1):
        left = A[i]
        for j in xrange(i + 1, len(A)):
            if j > i + 1:
                left ^= A[j - 1]
            right = A[j]
            for k in xrange(j, len(A)):
                if k > j:
                    right ^= A[k]
                if left == right:
                    res += 1
    return res

def f(A):
    ps = [A[0]] + [0] * (len(A) - 1)
    for i in xrange(1, len(A)):
        ps[i] = ps[i- 1] ^ A[i]
    res = 0
    seen = {0: (-1, 1, 0)}
    for i in xrange(len(A)):
        if ps[i] in seen:
            prev_i, i_count, count = seen[ps[i]]
            new_count = count + i_count * (i - prev_i) - 1
            res += new_count
            seen[ps[i]] = (i, i_count + 1, new_count)
        else:
            seen[ps[i]] = (i, 1, 0)
    return res

for i in xrange(100):
    A = [random.randint(1, 10) for x in xrange(200)]
    f_A, brute_force_A = f(A), brute_force(A)
    assert f_A == brute_force_A
print "Done"
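For readers on Python 3, the same prefix-xor idea can also be sketched with two dictionaries instead of a tuple per value. This is an alternative formulation, not the answer's code: the name count_xor_triplets and the 1-based prefix positions are choices made here. For every pair of equal prefix xors at positions p < q, the subarray between them has xor 0 and contributes q - p - 1 triplets.

```python
from collections import defaultdict


def count_xor_triplets(A):
    """Count triplets (i, j, k), i < j <= k, with xor(A[i..j-1]) == xor(A[j..k])."""
    count = defaultdict(int)      # occurrences of each prefix-xor value so far
    index_sum = defaultdict(int)  # sum of the prefix positions where it occurred
    count[0] = 1                  # the empty prefix (position 0) has xor 0
    prefix = 0
    total = 0
    for q in range(1, len(A) + 1):
        prefix ^= A[q - 1]
        # Each earlier position p with the same prefix xor gives a zero-xor
        # subarray of length q - p, which admits q - p - 1 split points j.
        # Summed over all such p: count * (q - 1) - (sum of the p values).
        total += count[prefix] * (q - 1) - index_sum[prefix]
        count[prefix] += 1
        index_sum[prefix] += q
    return total
```

For example, count_xor_triplets([5, 2, 7]) returns 2, since 5 = 2 ^ 7 and 5 ^ 2 = 7.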
I have the following code and I am trying to understand its time complexity:
for (int i = 1 ; i <= n ; i = i*2)
    for (int j = 1 ; j <= n ; j = j*2)
        for (int k = 1 ; k <= j ; k++)
What I did was:
the first loop runs log n times, the second loop also runs log n times, and the third loop is a geometric series,
so overall the running time will be: n*(log(n))^2
Is this correct?
thank you!
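n*(log(n))^2 is a valid upper bound, but it is not tight. For a fixed i, the two inner loops together run 1 + 2 + 4 + ... + 2^floor(log n) = 2^(floor(log n) + 1) - 1 < 2n times (this is the geometric series you mention), and the outer loop repeats that floor(log n) + 1 times, so the complexity is Θ(n log n).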
In practice, let's iterate for n = 1000:
i = 1; n = 1000; j = 1; k = 1; result = 0
while i <= n:
    j = 1
    while j <= n:
        k = 1
        while k <= j:
            result = result + 1
            k = k + 1
        j = j * 2
    i = i * 2
print(result)
and we get result = 10230.
This is exactly the value given by the formula (floor(log2(n)) + 1) * (2^(floor(log2(n)) + 1) - 1). For n = 1000 it is 10 * (2^10 - 1) = 10230.
For n = 2^25
we get result = 1744830438, which also matches the formula: 26 * (2^26 - 1) = 1744830438.
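To check the closed form for many values of n without waiting for the triple loop, the innermost loop can be replaced by adding j directly. This is a small sketch; the function names count_iterations and closed_form are chosen here for illustration. Since 2^(floor(log2 n) + 1) - 1 is below 2n, the formula also shows the count growing like n log n.

```python
def count_iterations(n):
    """Count how many times the innermost loop body runs, by simulating the two outer loops."""
    result = 0
    i = 1
    while i <= n:
        j = 1
        while j <= n:
            result += j  # the k loop runs exactly j times
            j *= 2
        i *= 2
    return result


def closed_form(n):
    """(floor(log2 n) + 1) * (2**(floor(log2 n) + 1) - 1), valid for n >= 1."""
    levels = n.bit_length()  # equals floor(log2 n) + 1 for n >= 1
    return levels * (2 ** levels - 1)
```

Both functions return 10230 for n = 1000, and closed_form(2**25) gives 1744830438.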
I was working on a function to transpose an NxN matrix which is stored in an array of floats. My first implementation seemed to cause the function to loop infinitely and I can't seem to figure out why. Here is the original code:
for(int i = 0; i < numRows % 2 == 0 ? numRows / 2 : numRows / 2 + 1; i++)
{
    for(int j = i + 1; j < numColumns; j++)
    {
        //Swap [i,j]th element with [j,i]th element
    }
}
However the function never returns. Failing to see the error in my logic I rephrased the expression and now have the following working code:
int middleRow = numRows % 2 == 0 ? numRows / 2 : numRows / 2 + 1;
for(int i = 0; i < middleRow; i++)
{
    for(int j = i + 1; j < numColumns; j++)
    {
        //Swap [i,j]th element with [j,i]th element
    }
}
Can anybody help explain why the first version does not work but the seemingly equivalent second version does?
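As per the operator precedence table, < and == have higher priority than ?:, so the loop condition is parsed as ((i < numRows % 2) == 0) ? numRows / 2 : numRows / 2 + 1. For numRows >= 2 both branches are non-zero, so the condition is always true and the loop never ends. You need to use () explicitly to enforce the intended grouping.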
Change
for(int i = 0; i < numRows % 2 == 0 ? numRows / 2 : numRows / 2 + 1; i++)
to
for(int i = 0; i < ( numRows % 2 == 0 ? numRows / 2 : numRows / 2 + 1) ; i++)
Note: Please use the second approach. It is much better for readability, maintenance and understanding.
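To see the effect of the grouping without running an endless C loop, the two readings of the condition can be emulated in Python. This is only an emulation under the assumption that numRows is non-negative (so // matches C's integer division); the function names are made up for this sketch.

```python
def condition_as_parsed_by_c(i, num_rows):
    """Value of `i < numRows % 2 == 0 ? numRows / 2 : numRows / 2 + 1` under C precedence."""
    # C groups it as ((i < (numRows % 2)) == 0) ? numRows / 2 : numRows / 2 + 1
    comparison = int(i < num_rows % 2)
    return num_rows // 2 if comparison == 0 else num_rows // 2 + 1


def condition_as_intended(i, num_rows):
    """Value of `i < (numRows % 2 == 0 ? numRows / 2 : numRows / 2 + 1)`."""
    middle_row = num_rows // 2 if num_rows % 2 == 0 else num_rows // 2 + 1
    return int(i < middle_row)
```

For any num_rows >= 2, condition_as_parsed_by_c returns a non-zero number regardless of i, which C treats as true, whereas condition_as_intended drops to 0 once i reaches the middle row.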
I think there is a problem with the precedence of the operators.
If you want to keep the cluttered first version (which I don't recommend), use parentheses:
i < (numRows % 2 == 0 ? numRows / 2 : numRows / 2 + 1)
Try:
i < ((numRows + 1) / 2)
If numRows is even, it will just be numRows/2. If odd, it will be numRows/2+1.
That may be faster because it avoids the branch caused by the comparison (unless you have an excellent compiler that recognizes this pattern, which is unlikely).
You sometimes have to step back to see the whole picture.
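The equivalence of the two expressions is easy to confirm for non-negative sizes. A short Python sketch, where // stands in for C's integer division on non-negative values and the function names are illustrative:

```python
def middle_row_ternary(num_rows):
    """numRows % 2 == 0 ? numRows / 2 : numRows / 2 + 1"""
    return num_rows // 2 if num_rows % 2 == 0 else num_rows // 2 + 1


def middle_row_rounded(num_rows):
    """(numRows + 1) / 2 with integer division"""
    return (num_rows + 1) // 2


# Compare both forms over a range of non-negative sizes.
mismatches = [n for n in range(0, 1000) if middle_row_ternary(n) != middle_row_rounded(n)]
```

The list mismatches stays empty, so both forms give the same loop bound for every size in that range.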
The problem:
Larry is very bad at math - he usually uses a calculator, which worked well throughout college. Unfortunately, he is now stuck on a deserted island with his good buddy Ryan after a snowboarding accident. They're now trying to spend some time figuring out some good problems, and Ryan will eat Larry if he cannot answer, so his fate is up to you!
It's a very simple problem - given a number N, how many ways can K numbers less than N add up to N?
For example, for N = 20 and K = 2, there are 21 ways:
0+20
1+19
2+18
3+17
4+16
5+15
...
18+2
19+1
20+0
Input
Each line will contain a pair of numbers N and K. N and K will both be an integer from 1 to 100, inclusive. The input will terminate on 2 0's.
Output
Since Larry is only interested in the last few digits of the answer, for each pair of numbers N and K, print a single number mod 1,000,000 on a single line.
Sample Input
20 2
20 2
0 0
Sample Output
21
21
The solution code:
#include<iostream>
#include<stdlib.h>
#include<stdio.h>
using namespace std;
#define maxn 100
typedef long ss;
ss T[maxn+2][maxn+2];
void Gen() {
    ss i, j;
    for(i = 0; i<= maxn; i++)
        T[1][i] = 1;
    for(i = 2; i<= 100; i++) {
        T[i][0] = 1;
        for(j = 1; j <= 100; j++)
            T[i][j] = (T[i][j-1] + T[i-1][j]) % 1000000;
    }
}
int main() {
    //freopen("in.txt", "r", stdin);
    ss n, m;
    Gen();
    while(cin>>n>>m) {
        if(!n && !m) break;
        cout<<T[m][n]<<endl;
    }
    return 0;
}
How was this calculation derived?
Where does the recurrence T[i][j] = (T[i][j-1] + T[i-1][j]) come from?
Note: I only use n and k (lower case) to refer to some anonymous variable. I will always use N and K (upper case) to refer to N and K as defined in the question (sum and the number of portions).
Let C(n, k) be the result of n choose k, then the solution to the problem is C(N + K - 1, K - 1), with the assumption that those K numbers are non-negative (otherwise there would be infinitely many solutions even for N = 0 and K = 2).
Since the K numbers are non-negative and the sum N is fixed, we can think of the problem as: how many ways are there to divide N candies among K people? We can divide the candies by laying them out in a line and putting (K - 1) separators between them. The (K - 1) separators split the candies into K portions. From another perspective, it is like choosing (K - 1) positions among (N + K - 1) positions for the separators; the rest of the positions are candies. This explains why the number of ways is N + (K - 1) choose (K - 1).
Then the problem reduces to how to find the least significant digits of C(n, k). (Since the maximum of N and K is 100, as defined in maxn, we don't have to worry even if the algorithm goes up to O(n^3).)
The calculation uses the combinatorial identity C(n, k) = C(n - 1, k) + C(n - 1, k - 1) (Pascal's rule). The clever thing about the implementation is that it doesn't store the table of binomial coefficients C(n, k) (which is a jagged array); instead it stores the answer for each pair (K, N) directly. The identity shows up as T[i][j] = (T[i][j-1] + T[i-1][j]):
The first dimension is actually K, the number of portions. And the second dimension is the sum N. T[K][N] will directly store the result, and according to the mathematical result derived above, is (least significant digits of) C(N + K - 1, K - 1).
Re-writing the T[i][j] = (T[i][j-1] + T[i-1][j]) back to equivalent mathematical result:
C(i + j - 1, i - 1) = C(i + j - 2, i - 1) + C(i + j - 2, i - 2), which is correct according to the identity.
The program will fill the array row by row:
The row K = 0 is already initialized to 0, using the fact that static array is initialized to 0.
It fills the row K = 1 with 1 (there is only 1 way to divide N into 1 portion).
For the rest of the rows, it sets the case N = 0 to 1 (there is only 1 way to divide 0 into K parts - all parts are 0).
Then the rest are filled with the expression T[i][j] = (T[i][j-1] + T[i-1][j]), which refers to the previous row and to the previous element of the same row, both of which have been filled in earlier iterations.
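The claim that T[K][N] equals C(N + K - 1, K - 1) modulo 1,000,000 can be checked numerically. The following Python sketch rebuilds the table the same way Gen() does and compares it with the closed form; it is not part of the original C++ solution, build_table and stars_and_bars are names chosen here, and math.comb requires Python 3.8 or newer.

```python
from math import comb

MOD = 1_000_000
MAXN = 100


def build_table():
    """Rebuild T as in Gen(): T[k][n] counts ordered sums of k non-negative integers equal to n, mod MOD."""
    T = [[0] * (MAXN + 1) for _ in range(MAXN + 1)]
    for n in range(MAXN + 1):
        T[1][n] = 1  # one way to write n as a single number
    for k in range(2, MAXN + 1):
        T[k][0] = 1  # all k parts are 0
        for n in range(1, MAXN + 1):
            T[k][n] = (T[k][n - 1] + T[k - 1][n]) % MOD
    return T


def stars_and_bars(n, k):
    """Closed form C(n + k - 1, k - 1), reduced mod MOD."""
    return comb(n + k - 1, k - 1) % MOD


T = build_table()
table_matches = all(
    T[k][n] == stars_and_bars(n, k)
    for k in range(1, MAXN + 1)
    for n in range(MAXN + 1)
)
```

Here T[2][20] is 21, as in the sample output, and table_matches is True over the whole input range.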
Let C(x, y) be the result of x choose y, then the value of T[i][j] equals: C(i - 1 + j, j).
You can prove this by induction.
Base cases:
T[1][j] = C(1 - 1 + j, j) = C(j, j) = 1
T[i][0] = C(i - 1, 0) = 1
For the induction step, use the formula (for 0<=y<=x):
C(x,y) = C(x - 1, y - 1) + C(x - 1, y)
Therefore:
C(i - 1 + j, j) = C(i-1+j - 1, j - 1) + C(i-1+j - 1, j) = C(i-1+(j-1), (j-1)) + C((i-1)-1+j, j)
Or in other words:
T[i][j] = T[i][j-1] + T[i-1][j]
Now, as nhahtdh mentioned before, the value you are looking for is C(N + K - 1, K - 1)
which equals:
T[N+1][K-1] = C(N+1-1+K-1, K-1)
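(modulo 1000000). By the symmetry C(x, y) = C(x, x - y), this is the same value as T[K][N] = C(K - 1 + N, N), which is the entry the program actually prints.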
This is a famous problem: how many ways are there to drop N identical balls into K boxes?
The following algorithm is a dynamic-programming solution to your problem:
Define D[i,j] to be the number of ways j non-negative numbers can sum up to i.
0 <= i <= N
1 <= j <= K
Where D[i,1] = 1 for every i.
And where j > 1 you get:
D[i,j] = D[i,j-1] + D[i-1,j-1] +...+ D[0,j-1]
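The problem is closely related to "the integer partition problem", although strictly speaking it counts weak compositions (ordered sums in which zeros are allowed) rather than partitions. There is a recursive computation for the number of ways to split n into k parts, and your solution is just the dynamic-programming version of it (non-recursive, computing bottom-up).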