I have a large array A with ~500.000 rows of the form
[ id1 id2 value1 value2 zero zero ]
and another, smaller Array B (~20.000 rows) with rows with some of the identifiers from A
[ id1 id2 value3 value4 ]
All the pairs of IDs in B exist in A. I want to update the values of B into A at the positions where respectively the values of id1 and id2 match. The (row-)order of the new array may be arbitraty.
An example:
A = 1 1 3 5 0 0
1 2 6 4 0 0
1 3 3 1 0 0
2 1 3 8 0 0
3 4 0 2 0 0
B = 2 1 7 4
1 1 2 1
should yield
C = 1 1 3 5 2 1
1 2 6 4 0 0
1 3 3 1 0 0
2 1 3 8 7 4
3 4 0 2 0 0
Iterating through A for each element in B with for loops takes incredibly long. I hope there is a faster way.
You can use ismember to obtain the indices of the rows where "id1" and "id2" match, and then update the last two columns with the corresponding values from B:
C = A;
[tf, loc] = ismember(B(:, 1:2), A(:, 1:2), 'rows');
C(loc, 5:6) = B(:, 3:4);
Related
I have a tab-delimited data frame like this:
Sample 1
A 2
B 1
D 2
Sample 2
C 1
D 1
E 2
Sample 3
D 1
E 3
Sample 4
A 1
E 3
Sample 5
A 2
B 3
Sample 6
C 1
D 2
Sample 7
D 1
E 3
Sample 8
A 3
D 2
Sample 9
A 1
Sample 10
A 1
C 2
E 3
and I would like to transpose it and link the alphabets and the associated values. If the sample does not contain alphabets, I want "0" inserted.
So the modified dataframe I desire is
Sample A B C D E
1 2 1 0 2 0
2 0 0 1 1 1
3 0 0 0 1 3
4 1 0 0 0 3
5 2 3 0 0 0
6 0 0 1 2 0
7 0 0 0 1 3
8 3 0 0 2 0
9 1 0 0 0 0
10 1 0 2 0 3
I tried to summarize the datafreame, and then transpose it. When I used,
awk 'NR==1{print} NR>1{a[$1]=a[$1]" "$2}END{for (i in a){print i " " a[i]}}' TEST
the output data was
Sample 1
A 2 1 2 3 1 1
B 1 3
C 1 1 2
D 2 1 1 2 1 2
E 2 3 3 3 3
Sample 2 3 4 5 6 7 8 9 10
Sample 1 was isolated, and there was no space when the alphabets were not included by the samples.
I hope that makes sense.
Within the context of writing a certain function, I have the following example matrix:
temp =
1 2 0 0 1 0
1 0 0 0 0 0
0 1 0 0 0 1
I want to obtain an array whose each element indicates the number of the element out of all non-zero elements which starts that column. If a column is empty, the element should correspond to the next non-empty column. For the matrix temp, the result would be:
result = [1 3 5 5 5 6]
Because the first non-zero element starts the first column, the third starts the second column, the fifth starts the fifth column and the sixth starts the sixth column.
How can I do this operation for any general matrix (one which may or may not contain empty columns) in a vectorized way?
Code:
temp = [1 2 0 0 1 0; 1 0 0 0 0 0; 0 1 0 0 0 1]
t10 = temp~=0
l2 = cumsum(t10(end:-1:1))
temp2 = reshape(l2(end)-l2(end:-1:1)+1, size(temp))
result = temp2(1,:)
Output:
temp =
1 2 0 0 1 0
1 0 0 0 0 0
0 1 0 0 0 1
t10 =
1 1 0 0 1 0
1 0 0 0 0 0
0 1 0 0 0 1
l2 =
1 1 1 1 1 2 2 2 2 2 2 2 3 3 4 4 5 6
temp2 =
1 3 5 5 5 6
2 4 5 5 6 6
3 4 5 5 6 6
result =
1 3 5 5 5 6
Printing values of each step may be clearer than my explanation. Basically we use cumsum to get the IDs of the non-zero elements. As you need to know the ID before reaching the element, a reversed cumsum will do. Then the only thing left is to reverse the ID numbers back.
Here's another way:
temp = [1 2 0 0 1 0; 1 0 0 0 0 0; 0 1 0 0 0 1]; % data
[~, c] = find(temp); % col indices of nonzero elements
result = accumarray(c, 1:numel(c), [], #min, NaN).'; % index, among all nonzero
% values, of the first nonzero value of each col; or NaN if none exists
result = cummin(result, 'reverse'); % fill NaN's using backwards cumulative maximum
I have the following matrix:
1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2
0 0 1 2 0 0 1 2 0 0 1 2 0 0 1 2 0 0 1 2 0 0 1 2
0 0 0 0 1 1 1 1 2 2 2 2 0 0 0 0 1 1 1 1 2 2 2 2
I'd like to randomly permute the columns, with the constraint that every four numbers in the second row should contain some form of
0 0 1 2
e.g. Columns 1:4, 5:8, 9:12, 13:16, 17:20, 21:24 in the example below each contain the numbers 0 0 1 2.
0 1 0 2 2 0 1 0 0 0 2 1 1 2 0 0 2 0 1 0 1 0 0 2
Every column in the permuted matrix should have a corresponding one in the first matrix. In other words, nothing should be altered within a column.
I can't seem to think of an intuitive solution to this - Is there another way of coming up with some form of the initial matrix that both satisfies the constraint and retains the integrity of the columns? Each column represents conditions in an experiment, which is why I'd like them to be balanced.
You can compute the permutations directly in the following manner: First, permute all columns with 0 in the second row among themselves, then all 1s among themselves, and finally all 2s among themselves. This ensures that, for example, any two 0 columns are equally likely to be the first two columns in the resulting permutation of A.
The second step is to permute all columns in blocks of 4: permute columns 1-4 randomly, permute columns 5-8 randomly, etc. Once you do this, you have a matrix that maintains the (0 0 1 2) pattern for every block of 4 columns, but each set of (0 0 1 2) is equally likely to be in any given block of 4, and the (0 0 1 2) are equally likely to be in any order.
A = [1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2
0 0 1 2 0 0 1 2 0 0 1 2 0 0 1 2 0 0 1 2 0 0 1 2
0 0 0 0 1 1 1 1 2 2 2 2 0 0 0 0 1 1 1 1 2 2 2 2];
%% Find the indices of the zeros and generate a random permutation with that size
zeroes = find(A(2,:)==0);
perm0 = zeroes(randperm(length(zeroes)));
%% Find the indices of the ones and generate a random permutation with that size
wons = find(A(2,:) == 1);
perm1 = wons(randperm(length(wons)));
%% NOTE: the spelling of `zeroes` and `wons` is to prevent overwriting
%% the MATLAB builtin functions `zeros` and `ones`
%% Find the indices of the twos and generate a random permutation with that size
twos = find(A(2,:) == 2);
perm2 = twos(randperm(length(twos)));
%% permute the zeros among themselves, the ones among themselves and the twos among themselves
A(:,zeroes) = A(:,perm0);
A(:,wons) = A(:,perm1);
A(:,twos) = A(:,perm2);
%% finally, permute each block of 4 columns, so that the (0 0 1 2) pattern is preserved, but each column still has an
%% equi-probable chance of being in any position
for i = 1:size(A,2)/4
perm = randperm(4) + 4*i-4;
A(:, 4*i-3:4*i) = A(:,perm);
end
Example result:
A =
Columns 1 through 15
1 1 2 2 2 2 1 1 2 2 1 2 2 1 2
0 0 2 1 0 2 0 1 0 2 1 0 1 2 0
0 1 2 2 2 0 1 1 1 1 2 0 0 2 0
Columns 16 through 24
2 1 1 1 1 1 2 2 1
0 2 0 0 1 0 0 1 2
1 1 2 2 0 0 2 1 0
I was able to generate 100000 constrained permutations of A in about 9.32 seconds running MATLAB 2016a, to give you an idea of how long this code takes. There are certainly ways to optimize the permutation selection so you don't have to make quite so many random draws, but I always prefer the simple, straightforward approach until it proves insufficient.
You could use a rejection method: keep trying random permutations, chosen equiprobably, until one satisfies the requirement. This guarantees that all valid permutations have the same probability of being picked.
A = [ 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2
0 0 1 2 0 0 1 2 0 0 1 2 0 0 1 2 0 0 1 2 0 0 1 2
0 0 0 0 1 1 1 1 2 2 2 2 0 0 0 0 1 1 1 1 2 2 2 2 ]; % data matrix
required = [0 0 1 2]; % restriction
row = 2; % row to which the resitriction applies
sorted_req = sort(required(:)); % sort required values
done = false; % initiallize
while ~done
result = A(:, randperm(size(A,2))); % random permutation of columns of A
test = sort(reshape(result(row,:), numel(required), []), 1); % reshape row
% into blocks, each block in a column; and sort each block
done = all(all(bsxfun(#eq, test, sorted_req))); % test if valid
end
Here's an example result:
result =
2 1 1 1 1 2 1 2 1 2 2 1 2 2 1 2 2 2 1 1 1 2 1 2
2 0 0 1 2 1 0 0 0 1 0 2 2 0 1 0 1 2 0 0 2 0 1 0
2 1 2 2 1 2 2 0 1 1 1 2 1 1 0 0 0 0 0 0 0 2 1 2
Apologies in advance if this question is a duplicate, or if the solution to this question is very straightforward in Matlab. I have a M x N matrix A, a 1 x M vector ind, and another vector val. For example,
A = zeros(6,5);
ind = [3 4 2 4 2 3];
val = [1 2 3];
I would like to vectorize the following code:
for i = 1 : size(A,1)
A(i, ind(i)-1 : ind(i)+1) = val;
end
>> A
A =
0 1 2 3 0
0 0 1 2 3
1 2 3 0 0
0 0 1 2 3
1 2 3 0 0
0 1 2 3 0
That is, for row i of A, I want to insert the vector val in a certain location, as specificied by the i'th entry of ind. What's the best way to do this in Matlab without a for loop?
It can be done using bsxfun's masking capability: build a mask telling where the values will be placed, and then fill those values in. In doing this, it's easier to work with columns instead of rows (because of Matlab's column major order), and transpose at the end.
The code below determines the minimum number of columns in the final A so that all values fit at the specified positions.
Your example applies a displacement of -1 with respect to ind. The code includes a generic displacement, which can be modified.
%// Data
ind = [3 4 2 4 2 3]; %// indices
val = [1 2 3]; %// values
d = -1; %// displacement for indices. -1 in your example
%// Let's go
n = numel(val);
m = numel(ind);
N = max(ind-1) + n + d; %// number of rows in A (rows before transposition)
mask = bsxfun(#ge, (1:N).', ind+d) & bsxfun(#le, (1:N).', ind+n-1+d); %// build mask
A = zeros(size(mask)); %/// define A with zeros
A(mask) = repmat(val(:), m, 1); %// fill in values as indicated by mask
A = A.'; %// transpose
Result in your example:
A =
0 1 2 3 0
0 0 1 2 3
1 2 3 0 0
0 0 1 2 3
1 2 3 0 0
0 1 2 3 0
Result with d = 0 (no displacement):
A =
0 0 1 2 3 0
0 0 0 1 2 3
0 1 2 3 0 0
0 0 0 1 2 3
0 1 2 3 0 0
0 0 1 2 3 0
If you can handle a bit of bsxfun overdose, here's one with bsxfun's adding capability -
N = numel(ind);
A(bsxfun(#plus,N*[-1:1]',(ind-1)*N + [1:N])) = repmat(val(:),1,N)
Sample run -
>> ind
ind =
3 4 2 4 2 3
>> val
val =
1 2 3
>> A = zeros(6,5);
>> N = numel(ind);
>> A(bsxfun(#plus,N*[-1:1]',(ind-1)*N + [1:N])) = repmat(val(:),1,N)
A =
0 1 2 3 0
0 0 1 2 3
1 2 3 0 0
0 0 1 2 3
1 2 3 0 0
0 1 2 3 0
can any one help me to Vectorized this loop.
i have large Matrix and i want to replace all the pixel values whose length is less then some threshold Value For simplicity lets say
a = randi([1 5],10,10);
for i = 1:length(a)
someMat=a(a==i);
if length(someMat)<20
a(a==i)=0;
end
end
but its killing me.
Example:
a = randi([1 5],10,10)
a =
5 2 1 5 5 5 2 2 3 2
3 3 5 4 4 4 3 1 1 5
5 1 3 5 3 3 4 1 3 1
3 1 5 3 2 5 1 1 5 1
1 1 4 3 4 3 4 4 5 1
1 4 3 5 1 1 2 2 2 1
3 3 5 2 4 1 1 3 2 4
4 1 5 3 4 5 3 4 3 3
5 3 5 5 4 3 1 3 4 1
4 1 1 3 5 5 1 3 3 5
Result for Thresold 20
5 0 1 5 5 5 0 0 3 0
3 3 5 0 0 0 3 1 1 5
5 1 3 5 3 3 0 1 3 1
3 1 5 3 0 5 1 1 5 1
1 1 0 3 0 3 0 0 5 1
1 0 3 5 1 1 0 0 0 1
3 3 5 0 0 1 1 3 0 0
0 1 5 3 0 5 3 0 3 3
5 3 5 5 0 3 1 3 0 1
0 1 1 3 5 5 1 3 3 5
length of pixel 4 was 17
length of pixel 2 was 10
i try it by some thing like
[nVal Index] = histc(a(:),unique(a)); %
nVal(nVal>20) = 1; % just some threshold value and assigning by some Number may be zero as well
But I dont Know how to replace the Index Values of the corresponding Pixal and apply reshape to get it in original form. Here Even i am not sure that i will get the same Matrix With Reshape . Please Help me.....
thanks
I think this does what you want:
threshold_length = 20;
replace_value = 0;
u = unique(a); %// values of a
h = histc(a(:), u); %// count for each value
r = u(h<threshold_length); %// values to be removed
a(ismember(a,r)) = replace_value; %// remove those values
I see #LuisMendo arrived at mostly the same solution quicker than I did, but an alternative to using ismember is to use more of what unique gives you:
threshold = 20;
[vals, ~, ix] = unique(a); % capture the values and their indices
counts = histc(a(:), vals); % count the occurrences of each value
vals(counts<threshold) = 0; % zero the values that aren't common enough
a(:) = vals(ix); % recreate the matrix with updated values