I have a matrix M (4*2) with values:
[1 0;
0 0;
1 1;
0 1]
And an array X = [0.3 0.4 0.5 0.2];
All entries of M are binary (0/1). What I want is to map each value of X into an N-D array Z of size [2,2], at the position given by the corresponding row of M: in each dimension, a 0 selects the first index and a 1 selects the second. So X(1) needs to go to Z(2,1), X(2) needs to go to Z(1,1), and so on.
Z will look like this:
Z =
[0.4 0.2;
0.3 0.5];
Currently I am looping over this, but it is really expensive. Note that this is a minimal example: I need to do this for a 128x7 matrix mapped into a 7-D array.
Any suggestions on how to speed up this process?
You could try using accumarray (not sure if it's faster):
>> Z = accumarray(M + 1, X, [2 2])
Z =
0.4000 0.2000
0.3000 0.5000
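For readers without MATLAB at hand, here is a minimal pure-Python sketch of what accumarray(M + 1, X, [2 2]) computes for this input. The helper name `accumarray_2d` is hypothetical; it only mirrors accumarray's default behavior of summing values that share a subscript and filling unused cells with 0.

```python
def accumarray_2d(subs, vals, shape):
    """Sketch of accumarray(subs, vals, shape) for 2-D, 1-based subscripts.

    Values that share a subscript are summed; untouched cells stay 0.
    """
    rows, cols = shape
    out = [[0.0] * cols for _ in range(rows)]
    for (r, c), v in zip(subs, vals):
        out[r - 1][c - 1] += v  # MATLAB subscripts are 1-based
    return out


M = [(1, 0), (0, 0), (1, 1), (0, 1)]
X = [0.3, 0.4, 0.5, 0.2]
subs = [(a + 1, b + 1) for a, b in M]  # the "M + 1" step
Z = accumarray_2d(subs, X, (2, 2))
print(Z)  # [[0.4, 0.2], [0.3, 0.5]]
```

Because every row of M is distinct here, summing and plain assignment produce the same Z; the summing only matters when rows of M repeat.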
How about
D=[1 0;0 0;1 1;0 1]; X = [0.3 0.4 0.5 0.2];
Z(sub2ind([2 2], D(:, 1) + 1, D(:, 2) + 1)) = X;
Z = reshape(Z, 2, 2);
EDIT
Most of the overhead of sub2ind is, unfortunately, error checking. If you are confident that all your subscript values (D above, M below) are in range, you can effectively inline the sub2ind operation. Here's an example:
ndx = M(:, 1) + 1;
ndx = ndx + M(:, 2) * 2;
Z2=[];
Z2(ndx) = X;
Z2 = reshape(Z2, 2, 2);
Using this code and the timing test in @johnnyfuego's answer, I get
Elapsed time is 0.154196 seconds. <--- johnny's
Elapsed time is 0.288680 seconds. <--- mine
Elapsed time is 0.143874 seconds. <--- Rafael's
So, better, but still not beating the other two. However, note that there is a break-even point here. If I change the setup code in the time test to
M = randi(2, 1000, 2) - 1;
X = rand(1, 1000);
That is, if I increase the number of values to write from 4 to 1000, the time trial gives
Elapsed time is 3.650833 seconds. <--- johnny's
Elapsed time is 0.607361 seconds. <--- mine
Elapsed time is 0.872595 seconds. <--- Rafael's
EDIT #2
Here's how you would unroll a multidimensional sub2ind:
siz = [2 2 2 2 2 2 2];
offsets = cumprod(siz);
ndx = M(:, 1) + 1;
ndx = ndx + M(:, 2) * offsets(1);
ndx = ndx + M(:, 3) * offsets(2);
ndx = ndx + M(:, 4) * offsets(3);
ndx = ndx + M(:, 5) * offsets(4);
ndx = ndx + M(:, 6) * offsets(5);
ndx = ndx + M(:, 7) * offsets(6);
Z2=[];
Z2(ndx) = X;
Z2 = reshape(Z2, [siz]);
Using that with your updated time test I get:
Elapsed time is 43.754363 seconds. <--- johnny's
Elapsed time is 1.045980 seconds. <--- mine
Elapsed time is 0.689487 seconds. <--- Rafael's
So, still better than looping, but it looks like in this multidimensional case accumarray (Rafael's answer) wins. I'd consider awarding him the "accepted answer" points.
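The unrolled arithmetic above is a dot product of each 0/1 row with the column-major strides 1, 2, 4, ... (the role cumprod(siz) plays). A pure-Python sketch under that assumption; `linear_indices` and `scatter` are hypothetical names, and the flat list stands in for MATLAB's column-major storage.

```python
from itertools import product


def linear_indices(rows, siz):
    """Unrolled sub2ind for 0/1 subscripts: 1-based, column-major indices.

    Column d of a row is multiplied by the stride prod(siz[:d]), which is
    what multiplying M(:, d) by offsets(d-1), offsets = cumprod(siz), does.
    """
    strides = [1]
    for s in siz[:-1]:
        strides.append(strides[-1] * s)
    return [1 + sum(b * st for b, st in zip(row, strides)) for row in rows]


def scatter(rows, values, siz):
    """Write each value to its linear index in a flat, column-major list."""
    total = 1
    for s in siz:
        total *= s
    flat = [0.0] * total
    for ndx, v in zip(linear_indices(rows, siz), values):
        flat[ndx - 1] = v  # MATLAB indices are 1-based, Python's are 0-based
    return flat


# The 2x2 example: flat holds Z(1,1), Z(2,1), Z(1,2), Z(2,2)
M = [(1, 0), (0, 0), (1, 1), (0, 1)]
X = [0.3, 0.4, 0.5, 0.2]
print(scatter(M, X, [2, 2]))  # [0.4, 0.3, 0.2, 0.5]

# The 128x7 case: all 7-bit rows, each mapping to a distinct cell
rows7 = list(product([0, 1], repeat=7))
```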
Thank you @rafael-monteiro & @SCFRench.
For this small example, my original procedure was faster. I have pasted a benchmarking script below.
M=[1 0; 0 0; 1 1; 0 1];
X = [0.3 0.4 0.5 0.2];
nrep=50000;
%% My own code
tic
for A=1:nrep;
MN=M+1; % I know I can do this outside of the loop, but comparison with this seems more fair.
Z1=zeros(size(X,2)/2,size(X,2)/2); % without pre-allocation it is twice as fast, I guess finding the size and the computation does not help here!
for I=1:4
Z1(MN(I,1),MN(I,2))=X(I);
end
end
toc
%% SCFrench code
tic
for A=1:nrep;
Z2(sub2ind([2 2], M(:, 1) + 1, M(:, 2) + 1)) = X;
Z2 = reshape(Z2, 2, 2);
end
toc
%% Rafael code
tic
for A=1:nrep;
Z3 = accumarray(M + 1, X, [2 2]);
end
toc
Elapsed time is 0.115488 seconds. % mine
Elapsed time is 1.082505 seconds. % SCFrench
Elapsed time is 0.282693 seconds. % rafael
EDIT:
Using larger data, the first implementation seems far slower.
alts=7;
M = dec2bin(0:2^alts-1)-'0';
X = rand(size(M,1),1);
nrep=50000;
tic
for A=1:nrep;
MN=M+1;
for I=1:128
Z1(MN(I,1),MN(I,2),MN(I,3),MN(I,4),MN(I,5),MN(I,6),MN(I,7))=X(I);
end
end
toc
tic
for A=1:nrep;
Z2(sub2ind([2 2 2 2 2 2 2], M(:, 1) + 1, M(:, 2) + 1, M(:, 3) + 1, M(:, 4) + 1, M(:, 5) + 1, M(:, 6) + 1, M(:, 7) + 1)) = X;
Z2 = reshape(Z2, [2 2 2 2 2 2 2]);
end
toc
tic
for A=1:nrep;
Z3 = accumarray(M + 1, X, [2 2 2 2 2 2 2]);
end
toc
Elapsed time is 33.390247 seconds. % Mine
Elapsed time is 4.280668 seconds. % SCFrench
Elapsed time is 0.629584 seconds. % Rafael
Related
Given an array {1,3,5,7}, its subparts are defined as {1357,135,137,157,357,13,15,17,35,37,57,1,3,5,7}.
I have to find the sum of all these numbers in the new array. In this case sum comes out to be 2333.
Please help me find a solution in O(n). My O(n^2) solution times out.
link to the problem is here or here.
My current attempt (at finding a pattern) is:
for (i = 0 to len-1)        // len is the length of the array
{
    for (j = 0 to len-1-i)
    {
        sum += arr[i] * pow(10, j) * C(len-1-i, j) * pow(2, i)
    }
}
In words: C(len-1-i, j) = (number of integers to the right of arr[i]) choose (its digit position j), i.e. combinations,
and 2^i = 2 to the power of (number of integers to the left).
Thanks
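The pattern described in the question can be checked with a short Python sketch of the double loop, cross-checked against brute force. It assumes single-digit elements (so the brute force can concatenate digits) and does no modular reduction; `subpart_sum_quadratic` and `subpart_sum_brute` are hypothetical names.

```python
from itertools import combinations
from math import comb


def subpart_sum_quadratic(arr):
    """O(n^2): arr[i] lands at decimal position j in C(n-1-i, j) ways
    (digits chosen to its right) times 2**i ways (digits to its left)."""
    n = len(arr)
    total = 0
    for i in range(n):
        right = n - 1 - i
        for j in range(right + 1):
            total += arr[i] * 10**j * comb(right, j) * 2**i
    return total


def subpart_sum_brute(arr):
    """Exponential brute force, only for cross-checking small inputs."""
    total = 0
    for r in range(1, len(arr) + 1):
        for sub in combinations(arr, r):  # combinations keep the original order
            total += int("".join(map(str, sub)))
    return total


print(subpart_sum_quadratic([1, 3, 5, 7]))  # 2333
```

Summing the inner loop over j with the binomial theorem gives 11^(len-1-i), which is where the O(n) formulas in the answers come from.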
You can easily solve this problem with a simple recursive function.
def F(arr):
    # returns (sum of all subparts of arr, number of subparts)
    if len(arr) == 1:
        return (arr[0], 1)
    else:
        r = F(arr[:-1])
        return (11 * r[0] + (r[1] + 1) * arr[-1], 2 * r[1] + 1)
So, how does it work? It is simple. Let's say we want to compute the sum of all subparts of {1,3,5,7}. Assume that we know the number of subparts (combinations) of {1,3,5} and the sum of the subparts of {1,3,5}; then we can easily compute the sum for {1,3,5,7} using the following formula:
SUM_SUBPART({1,3,5,7}) = 11 * SUM_SUBPART({1,3,5}) + NUMBER_COMBINATION({1,3,5}) * 7 + 7
This formula follows from a simple observation. Let's say we have all the subparts of {1,3,5}:
A = [135, 13, 15, 35, 1, 3, 5]
We can then create the list for {1,3,5,7} as
A = [135, 13, 15, 35, 1, 3, 5] +
[135 * 10 + 7,
13 * 10 + 7,
15 * 10 + 7,
35 * 10 + 7,
1 * 10 + 7,
3 * 10 + 7,
5 * 10 + 7] + [7]
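The same recurrence can also be run as a loop, which avoids copying arr[:-1] at every level (that copying makes the recursive version quadratic overall) and Python's recursion limit. A minimal sketch; `subpart_sum_iterative` is a hypothetical name, and in a judge setting you would reduce both values modulo the required modulus.

```python
def subpart_sum_iterative(arr):
    """Iterative form of the recurrence:
    new_sum = 11 * old_sum + (old_count + 1) * x
    new_count = 2 * old_count + 1
    """
    total, count = 0, 0  # empty prefix: no subparts yet
    for x in arr:
        total = 11 * total + (count + 1) * x
        count = 2 * count + 1
    return total, count


print(subpart_sum_iterative([1, 3, 5, 7]))  # (2333, 15)
```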
Well, you could look at the subparts as sums of numbers:
1357 = 1000*1 + 100*3 + 10*5 + 1*7
135 = 100*1 + 10*3 + 1*5
137 = 100*1 + 10*3 + 1*7
etc..
So, all you need to do is sum up the numbers you have, and then according to the number of items work out what is the multiplier:
Two numbers [x, y]:
[x, y, 10x+y, 10y+x]
=> your multiplier is 1 + 10 + 1 = 12
Three numbers [x, y, z]:
[x, y, z,
10x+y, 10x+z,
10y+x, 10y+z,
10z+x, 10z+y,
100x+10y+z, 100x+10z+y,
...]
=> your multiplier is 1+10+10+1+1+100+100+10+10+1+1 = 245
You can easily work out the equation for n numbers....
If you expand invisal's recursive solution you get this explicit formula:
subpart sum = sum for k=0 to N-1: 11^(N-1-k) * 2^k * a[k]
This suggests the following O(n) algorithm:
multiplier = 1
for k from 0 to N-1:
    a[k] = a[k]*multiplier
    multiplier = multiplier*2
multiplier = 1
sum = 0
for k from N-1 down to 0:
    sum = sum + a[k]*multiplier
    multiplier = multiplier*11
Multiplication and addition should be done modulo M of course.
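A Python sketch of the two passes, assuming the modulus is passed in as `mod` (the problem's actual modulus isn't given here, so 10**9 + 7 below is only an example value). `subpart_sum_mod` is a hypothetical name; unlike the pseudocode, it leaves the input list unmodified.

```python
def subpart_sum_mod(a, mod):
    """sum over k of 11**(N-1-k) * 2**k * a[k], reduced modulo `mod`, in O(N)."""
    n = len(a)
    scaled = []
    mult = 1
    for k in range(n):  # first pass: a[k] * 2**k
        scaled.append(a[k] * mult % mod)
        mult = mult * 2 % mod
    total = 0
    mult = 1
    for k in range(n - 1, -1, -1):  # second pass: times 11**(N-1-k)
        total = (total + scaled[k] * mult) % mod
        mult = mult * 11 % mod
    return total


print(subpart_sum_mod([1, 3, 5, 7], 10**9 + 7))  # 2333
```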
I've just found out how to solve this in O(n^2 log n) time (assuming each array has the same length):
for each A[i]:
    for each B[j]:
        k = C.binarySearch(S - A[i] - B[j])   // C sorted ascending
        if k is found:
            return (i, j, k)
Is there any way to solve this in O(n^2) time or to improve the above algorithm?
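In Python the inner binary search maps onto the standard `bisect` module. A minimal sketch of the O(n^2 log n) approach, assuming C is sorted ascending; `find_triplet` is a hypothetical name, and it returns the first match found or None.

```python
from bisect import bisect_left


def find_triplet(A, B, C, S):
    """Return (i, j, k) with A[i] + B[j] + C[k] == S, or None.

    C must be sorted ascending; A and B can be in any order.
    """
    for i, a in enumerate(A):
        for j, b in enumerate(B):
            target = S - a - b
            k = bisect_left(C, target)  # binary search in sorted C
            if k < len(C) and C[k] == target:
                return i, j, k
    return None


print(find_triplet([1, 2, 3, 4, 5, 6, 7], [2, 3, 5, 7, 11], [1, 1, 2, 3, 5, 7], 15))  # (0, 3, 5)
```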
The algorithm you have isn't bad. Relative to n^2, log(n) grows so slowly that it can practically be considered a constant. For example, for n = 1000000, n^2 = 1000000000000 while log2(n) is about 20. Once n becomes large enough for log(n) to have any significant influence, n^2 will already be so big that the result cannot be computed anyway.
A solution, inspired by @YvesDaoust, though I'm not sure if it's exactly the same (it assumes B and C are sorted ascending):
For every A[i], calculate the remainder R = S - A[i], which should be the sum of some B[j] and C[k];
Let j = 0 and k = |C|-1 (the last index in C);
If B[j] + C[k] < R, increase j;
If B[j] + C[k] > R, decrease k;
Repeat the two previous steps until B[j] + C[k] = R or j >= |B| or k < 0
I suggest not complicating the algorithm too much with micro-optimizations. For any reasonably small set of numbers, it will be fast enough. If the arrays become too large for this approach, your problem would make a good candidate for Machine Learning approaches such as Hill Climbing.
If the arrays contain only non-negative values:
* you can trim all 3 arrays at S (drop every A[n] > S, and likewise for B and C)
* similarly, don't bother checking array C if A[aIdx] + B[bIdx] > S
prepare:
sort each array ascending  +O(N*log(N))
implement binary search on each array  O(log(N))
compute:
i = bin_search(smallest i such that A[i]+B[Nb-1]+C[Nc-1] >= S);
for (; i < Na; i++)
{
    if (A[i]+B[0]+C[0] > S) break;
    j = bin_search(smallest j such that A[i]+B[j]+C[Nc-1] >= S);
    for (; j < Nb; j++)
    {
        if (A[i]+B[j]+C[0] > S) break;
        ss = S-A[i]-B[j];
        if (ss < 0) break;
        k = bin_search(ss);
        if (k found) return;   // found solution is: i,j,k
    }
}
If I see it right, with N = max(Na,Nb,Nc) and M = max(size of the valid intervals of A,B,C), M <= N,
it is 3*N*log(N) + log(N) + M*(log(N) + M*log(N)) -> O((M^2)*log(N))
The j binary search can be called just once and then iterated +1 as needed;
the complexity class is the same, only N changes.
Under average conditions this is much faster because M << N.
The O(N²) solution is very simple.
First consider the case of two arrays, finding A[i] + B[j] = S'.
This can be rewritten as A[i] = S' - B[j] = B'[j]: you need to find equal values in two sorted arrays. This is readily done in linear time with a merging process. (You can explicitly compute the array B' but this is unnecessary, just do it on the fly: instead of fetching B'[j], get S' - B[NB-1-j]).
Having established this procedure, it suffices to use it for all elements of C, in search of S - C[k].
Here is Python code that does that and reports all solutions. (It has been rewritten to be compact and symmetric.)
for k in range(NC):
    # Find S - C[k] in top-to-tail merged A and B
    i, j = 0, NB - 1
    while i < NA and 0 <= j:
        if A[i] + B[j] + C[k] < S:
            # Move forward A
            i += 1
        elif A[i] + B[j] + C[k] > S:
            # Move back B
            j -= 1
        else:
            # Found
            print(A[i] + B[j] + C[k], "=", A[i], "+", B[j], "+", C[k])
            i += 1; j -= 1
Execution with
A= [1, 2, 3, 4, 5, 6, 7]; NA= len(A)
B= [2, 3, 5, 7, 11]; NB= len(B)
C= [1, 1, 2, 3, 5, 7]; NC= len(C)
S= 15
gives
15 = 3 + 11 + 1
15 = 7 + 7 + 1
15 = 3 + 11 + 1
15 = 7 + 7 + 1
15 = 2 + 11 + 2
15 = 6 + 7 + 2
15 = 1 + 11 + 3
15 = 5 + 7 + 3
15 = 7 + 5 + 3
15 = 3 + 7 + 5
15 = 5 + 5 + 5
15 = 7 + 3 + 5
15 = 1 + 7 + 7
15 = 3 + 5 + 7
15 = 5 + 3 + 7
15 = 6 + 2 + 7
What is the fastest way of taking an array A and outputting both unique(A) [i.e. the set of unique array elements of A] and the multiplicity array, whose i-th entry is the number of times the i-th entry of unique(A) occurs in A?
That's a mouthful, so here's an example. Given A=[1 1 3 1 4 5 3], I want:
unique(A)=[1 3 4 5]
mult = [3 2 1 1]
This can be done with a tedious for loop, but I would like to know if there is a way to exploit the array nature of MATLAB.
uA = unique(A);
mult = histc(A,uA);
Alternatively:
uA = unique(A);
mult = sum(bsxfun(@eq, uA(:).', A(:)));
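Outside MATLAB, the same pair (sorted unique values and their multiplicities) falls out of a hash-based count. A Python sketch using `collections.Counter`; `unique_with_counts` is a hypothetical name.

```python
from collections import Counter


def unique_with_counts(A):
    """Sorted unique values of A and how often each occurs, in the spirit of
    uA = unique(A); mult = histc(A, uA);"""
    counts = Counter(A)
    uA = sorted(counts)
    mult = [counts[u] for u in uA]
    return uA, mult


print(unique_with_counts([1, 1, 3, 1, 4, 5, 3]))  # ([1, 3, 4, 5], [3, 2, 1, 1])
```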
Benchmarking
N = 100;
A = randi(N,1,2*N); %// size 1 x 2*N
%// Luis Mendo, first approach
tic
for iter = 1:1e3;
uA = unique(A);
mult = histc(A,uA);
end
toc
%// Luis Mendo, second approach
tic
for iter = 1:1e3;
uA = unique(A);
mult = sum(bsxfun(@eq, uA(:).', A(:)));
end
toc
%// chappjc
tic
for iter = 1:1e3;
[uA,~,ic] = unique(A); % uA(ic) == A
mult = accumarray(ic(:),1);
end
toc
Results with N = 100:
Elapsed time is 0.096206 seconds.
Elapsed time is 0.235686 seconds.
Elapsed time is 0.154150 seconds.
Results with N = 1000:
Elapsed time is 0.481456 seconds.
Elapsed time is 4.534572 seconds.
Elapsed time is 0.550606 seconds.
[uA,~,ic] = unique(A); % uA(ic) == A
mult = accumarray(ic(:),1);
accumarray is very fast. Unfortunately, unique gets slow with 3 outputs.
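The idea behind this answer, an inverse index ic followed by counting, can be sketched in Python as follows. `unique_inverse_counts` is a hypothetical name, and ic is kept 1-based to match MATLAB.

```python
def unique_inverse_counts(A):
    """Mimic [uA, ~, ic] = unique(A); mult = accumarray(ic(:), 1).

    ic maps each element of A to its 1-based position in uA.
    """
    uA = sorted(set(A))
    position = {u: p for p, u in enumerate(uA, start=1)}
    ic = [position[a] for a in A]  # uA[ic[t] - 1] == A[t] for every t
    mult = [0] * len(uA)
    for p in ic:
        mult[p - 1] += 1  # accumulating the value 1 counts occurrences
    return uA, ic, mult


print(unique_inverse_counts([1, 1, 3, 1, 4, 5, 3]))
```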
Late addition:
uA = unique(A);
mult = nonzeros(accumarray(A(:),1,[],@sum,0,true));
S = sparse(A,1,1);
[uA,~,mult] = find(S);
I've found this elegant solution in an old Newsgroup thread.
Testing with the benchmark of Luis Mendo for N = 1000 :
Elapsed time is 0.228704 seconds. % histc
Elapsed time is 1.838388 seconds. % bsxfun
Elapsed time is 0.128791 seconds. % sparse
(On my machine, accumarray results in Error: Maximum variable size allowed by the program is exceeded.)
I have an array in Matlab, let's say of size (256, 256). Now I need to build a new array with 256*256 rows and 3 columns, containing in each row the value and the (row, column) index of that value in the original array, i.e.:
test = [1,2,3;4,5,6;7,8,9]
test =
1 2 3
4 5 6
7 8 9
I need as result:
[1, 1, 1; 2, 1, 2; 3, 1, 3; 4, 2, 1; 5, 2, 2; and so on]
Any ideas?
Thanks in advance!
What you want is the output of meshgrid
[C,B]=meshgrid(1:size(test,2),1:size(test,1))
M=test;
M(:,:,2)=B;
M(:,:,3)=C;
Here's what I came up with:
test = [1,2,3;4,5,6;7,8,9]; % orig matrix
[m, n] = size(test); % example 1, breaks with value zero elems
[r, c] = find(test); % row and column of each nonzero element
test1 = [r, reshape(test, m*n, 1), c]
Elapsed time is 0.004104 seconds.
% one liner from above
% (depending on data size might want to avoid dual find calls)
% single-output find returns linear indices, so columns 1 and 3 are both linear indices here
test2=[ find(test) reshape(test, size(test,1)*size(test,2), 1 ) find(test)]
Elapsed time is 0.008121 seconds.
[r, c, v] = find(test); % just another way to write above, still breaks on zeros
test3 = [r, v, c]
Elapsed time is 0.009516 seconds.
[i, j] =ind2sub([m n],[1:m*n]); % use ind2sub to build tables of indices
% and reshape to build col vector
test4 = [i', reshape(test, m*n, 1), j']
Elapsed time is 0.011579 seconds.
test0 = [1,2,3;0,5,6;0,8,9]; % testing find with zeros.....breaks
% test5=[ find(test0) reshape(test0, size(test0,1)*size(test0,2), 1 ) find(test0)] % error in horzcat
[i, j] =ind2sub([m n],[1:m*n]); % testing ind2sub with zeros.... winner
test6 = [i', reshape(test0, m*n, 1), j']
Elapsed time is 0.014166 seconds.
Using meshgrid from above:
Elapsed time is 0.048007 seconds.
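The ind2sub approach (the one that survives zeros) boils down to converting each column-major linear index into a (row, column) pair. A Python sketch for a list-of-rows matrix; `value_index_table` is a hypothetical name, and it emits [value, row, col] as in the question, in the column-major order that reshape(test, m*n, 1) uses.

```python
def value_index_table(matrix):
    """Rows [value, row, col] (1-based), in column-major order.

    Uses ind2sub's arithmetic for a 2-D array: row = lin mod m, col = lin div m.
    Zeros are treated like any other value.
    """
    m, n = len(matrix), len(matrix[0])
    table = []
    for lin in range(m * n):  # 0-based, column-major linear index
        r, c = lin % m, lin // m
        table.append([matrix[r][c], r + 1, c + 1])
    return table


print(value_index_table([[1, 2, 3], [4, 5, 6], [7, 8, 9]])[:4])
# [[1, 1, 1], [4, 2, 1], [7, 3, 1], [2, 1, 2]]
```

Sorting the rows (or looping over rows before columns) gives the row-major listing shown in the question.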
I've found the following macro in a utility header in our codebase:
#define CEILING(x,y) (((x) + (y) - 1) / (y))
Which (with help from this answer) I've parsed as:
// Return the smallest integer N such that:
// x <= y * N
But, no matter how much I stare at how this macro is used in our codebase, I can't understand the value of such an operation. None of the usages are commented, which seems to indicate it is something obvious.
Can anyone offer an English explanation of a use-case for this macro? It's probably blindingly obvious, I just can't see it...
Say you want to allocate memory in chunks (think: cache lines, disk sectors); how many chunks will it take to hold X bytes? If the chunk size is Y, then the answer is CEILING(X,Y) chunks, i.e. CEILING(X,Y) * Y bytes.
When you use integer division in C like this
y = a / b
you get the result of the division rounded towards zero, e.g. 5 / 2 == 2, -5 / 2 == -2. Sometimes it's desirable to round the other way so that 5 / 2 == 3. For example, if you want the minimal int array size to hold n bytes, you want n / sizeof(int) rounded up, because you need space for the extra bytes.
So this macro does exactly this: CEILING(5,2) == 3, but note that it works only for positive y (and non-negative x), so be careful.
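Python's // rounds toward negative infinity, so to check the macro's behavior the sketch below emulates C's truncating division. `c_div`, `ceiling` and `exact_ceil` are hypothetical helpers; `exact_ceil` uses -(-x // y), which is the exact ceiling in Python for y > 0.

```python
def c_div(a, b):
    """Integer division truncated toward zero, as in C99 (b != 0)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def ceiling(x, y):
    """CEILING(x, y) from the macro, evaluated with C-style division."""
    return c_div(x + y - 1, y)


def exact_ceil(x, y):
    """Exact ceiling of x / y for y > 0 (Python's // floors)."""
    return -(-x // y)


print(c_div(5, 2), c_div(-5, 2))      # 2 -2, matching the C examples
print(ceiling(5, 2), ceiling(47, 5))  # 3 10
# For x >= 0 and y > 0 the macro agrees with the exact ceiling
print(all(ceiling(x, y) == exact_ceil(x, y) for x in range(100) for y in range(1, 20)))  # True
# ...but not for every negative x: ceil(-4 / 2) is -2, the macro gives -1
print(ceiling(-4, 2), exact_ceil(-4, 2))  # -1 -2
```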
Hmm... English example... You can only buy bananas in bunches of 5. You have 47 people who want a banana. How many bunches do you need? Answer = CEILING(47,5) = ((47 + 5) - 1) / 5 = 51 / 5 = 10 (dropping the remainder - integer division).
Let's try some test values
CEILING(6, 3) = (6 + 3 -1) / 3 = 8 / 3 = 2 // integer division
CEILING(7, 3) = (7 + 3 -1) / 3 = 9 / 3 = 3
CEILING(8, 3) = (8 + 3 -1) / 3 = 10 / 3 = 3
CEILING(9, 3) = (9 + 3 -1) / 3 = 11 / 3 = 3
CEILING(10, 3) = (10 + 3 -1) / 3 = 12 / 3 = 4
As you see, the result of the macro is an integer, the smallest possible z which satisfies: z * y >= x.
We can try with symbolics, as well:
CEILING(k*y, y) = (k*y + y -1) / y = ((k+1)*y - 1) / y = k
CEILING(k*y + 1, y) = ((k*y + 1) + y -1) / y = ((k+1)*y) / y = k + 1
CEILING(k*y + 2, y) = ((k*y + 2) + y -1) / y = ((k+1)*y + 1) / y = k + 1
....
CEILING(k*y + y - 1, y) = ((k*y + y - 1) + y -1) / y = ((k+1)*y + y - 2) / y = k + 1
CEILING(k*y + y, y) = ((k*y + y) + y -1) / y = ((k+1)*y + y - 1) / y = k + 1
CEILING(k*y + y + 1, y) = ((k*y + y + 1) + y -1) / y = ((k+2)*y) / y = k + 2
You can use this to allocate memory with a size that is a multiple of a constant, to determine how many tiles are needed to fill a screen, etc.
Watch out, though. This works only for positive y.
Hope it helps.
CEILING(x,y) gives you, assuming y > 0, the ceiling of x/y (mathematical division). One use case for that would be a prime sieve starting at an offset x, where you'd mark all multiples of the prime y in the sieve range as composites.
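A minimal Python sketch of that sieve use case, assuming lo >= 0 and that the caller supplies every prime p with p*p < hi; `segmented_sieve` is a hypothetical name. The first multiple of p that is >= lo is p * CEILING(lo, p), written with the macro's (lo + p - 1) // p, which matches the C result here because both operands are non-negative.

```python
def segmented_sieve(lo, hi, base_primes):
    """Primes in [lo, hi), given all primes p with p * p < hi."""
    is_prime = [True] * (hi - lo)
    for p in base_primes:
        first = p * ((lo + p - 1) // p)  # p * CEILING(lo, p)
        start = max(p * p, first)        # smaller multiples are marked by smaller primes
        for m in range(start, hi, p):
            is_prime[m - lo] = False
    return [lo + i for i, flag in enumerate(is_prime) if flag and lo + i >= 2]


print(segmented_sieve(100, 130, [2, 3, 5, 7, 11]))  # [101, 103, 107, 109, 113, 127]
```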