How can I generate this matrix (containing only 0s and ±1s)?

I would like to generate matrix of size (n(n-1)/2, n) that looks like this (n=5 in this case):
-1 1 0 0 0
-1 0 1 0 0
-1 0 0 1 0
-1 0 0 0 1
0 -1 1 0 0
0 -1 0 1 0
0 -1 0 0 1
0 0 -1 1 0
0 0 -1 0 1
0 0 0 -1 1
This is what I, quickly, came up with:
G = [];
for i = 1:n-1
    for j = i+1:n
        v = sparse(1,i,-1,1,n);
        w = sparse(1,j,1,1,n);
        vw = v+w;
        G = [G; vw];
    end
end
G = full(G);
It works, but is there a faster/cleaner way of doing it?

Use nchoosek to generate the indices of the columns that will be nonzero:
n = 5; %// number of columns
ind = nchoosek(1:n,2); %// ind(:,1): columns with "-1". ind(:,2): with "1".
m = size(ind,1);
rows = (1:m).'; %// row indices
G = zeros(m,n);
G(rows + m*(ind(:,1)-1)) = -1;
G(rows + m*(ind(:,2)-1)) = 1;
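The same column indices can also be passed to sparse directly, which avoids computing linear indices by hand. This is only a minimal sketch of that variation, assuming the same n as above; the name Gs is illustrative and not part of the original answer.

```matlab
n = 5;                              % number of columns
ind = nchoosek(1:n, 2);             % one row per pair (i,j) with i < j
m = size(ind, 1);                   % m = n*(n-1)/2
rows = (1:m).';                     % row index of each pair
% Place -1 at (row, i) and +1 at (row, j), then convert to a full matrix
Gs = full(sparse([rows; rows], [ind(:,1); ind(:,2)], ...
                 [-ones(m,1); ones(m,1)], m, n));
```

Every row of Gs then holds exactly one -1 and one 1, so each row sums to zero.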

Your two nested loops perform O(N^2) non-vectorized operations, which is more than this task needs. Notice that your matrix actually has a recursive pattern:
G(n+1) = [ -1  I(n) ]
         [  0  G(n) ];
where I(n) is the identity matrix of size n, -1 stands for a column of n entries equal to -1, and 0 stands for a zero column. This is how you can express the pattern in MATLAB:
function G = mat(n)
    % Treat original call as G(n+1)
    n = n - 1;
    % Non-recursive branch for trivial case
    if n == 1
        G = [-1 1];
        return;
    end
    RT = eye(n);                  % Right-top: I(n)
    LT = repmat(-1, n, 1);        % Left-top: -1
    RB = mat(n);                  % Right-bottom: G(n), recursive
    LB = zeros(size(RB, 1), 1);   % Left-bottom: 0
    G = [LT RT; LB RB];
end
This gives O(N) complexity of non-vectorized operations. It will probably waste some memory during recursion and matrix composition if MATLAB is not smart enough to optimize these away. If that is critical, you can unroll the recursion into a loop and iteratively fill in the corresponding places in a pre-allocated matrix.
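The unrolled variant mentioned above could look roughly like the following. It is a sketch rather than the answerer's code: the names r, b and rowsK are illustrative, and it assumes n >= 2.

```matlab
n = 5;                          % number of columns
G = zeros(n*(n-1)/2, n);        % pre-allocate the full result
r = 0;                          % last row filled so far
for k = 1:n-1
    b = n - k;                  % number of rows in block k
    rowsK = r + (1:b);          % rows belonging to block k
    G(rowsK, k) = -1;           % column of -1s
    G(rowsK, k+1:n) = eye(b);   % identity block to its right
    r = r + b;
end
```

Each pass of the loop writes one [-1 I] block of the recursive pattern, so no intermediate matrices are concatenated.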

Related

Creating Indicator matrix based on vector with group IDs

I have a vector of group IDs:
groups = [ 1 ; 1; 2; 2; 3];
which I want to use to create a matrix consisting of 1's in case the i-th and the j-th element are in the same group, and 0 otherwise. Currently I do this as follows:
n = size(groups, 1);
indMatrix = zeros(n,n);
for i = 1:n
    for j = 1:n
        indMatrix(i,j) = groups(i) == groups(j);
    end
end
indMatrix
indMatrix =
1 1 0 0 0
1 1 0 0 0
0 0 1 1 0
0 0 1 1 0
0 0 0 0 1
Is there a better solution avoiding the nasty double for-loop? Thanks!
This can be done quite easily using implicit singleton expansion, for R2016b or later:
indMatrix = groups==groups.';
For MATLAB versions before R2016b you need bsxfun to achieve singleton expansion:
indMatrix = bsxfun(@eq, groups, groups.');
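As a quick sanity check, the vectorized result can be compared with the double loop from the question. This is a small sketch; the names ref and same are only illustrative. Note that the comparison returns a logical matrix, whereas the loop fills a double matrix.

```matlab
groups = [1; 1; 2; 2; 3];
% Reference result from the double loop in the question
n = size(groups, 1);
ref = zeros(n, n);
for i = 1:n
    for j = 1:n
        ref(i,j) = groups(i) == groups(j);
    end
end
% bsxfun form, which also works on releases before R2016b
indMatrix = bsxfun(@eq, groups, groups.');
same = isequal(indMatrix, logical(ref));   % true if both approaches agree
```

If a numeric matrix is needed, wrapping the result in double(...) converts it.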

Create a matrix with a diagonal and left-diagonal of all 1s in MATLAB

I would like to create a square matrix of size n x n where the diagonal elements as well as the left-diagonal are all equal to 1. The rest of the elements are equal to 0.
For example, this would be the expected result if the matrix was 5 x 5:
1 0 0 0 0
1 1 0 0 0
0 1 1 0 0
0 0 1 1 0
0 0 0 1 1
How could I do this in MATLAB?
Trivial using the tril function:
tril(ones(n),0) - tril(ones(n),-2)
And if you wanted a thicker line of 1s just adjust that -2:
n = 10;
m = 4;
tril(ones(n),0) - tril(ones(n),-m)
If you prefer to use diag like excaza suggested then try
diag(ones(n,1)) + diag(ones(n-1,1),-1)
but you can't control the 'thickness' of the stripe this way. However, for a thickness of 2, it might perform better. You'd have to test it though.
You can also use spdiags to create that matrix:
n = 5;
v = ones(n,1);
d = full(spdiags([v v], [-1 0], n, n));
We get:
>> d
d =
1 0 0 0 0
1 1 0 0 0
0 1 1 0 0
0 0 1 1 0
0 0 0 1 1
The first two lines define the desired size of the matrix, assuming a square n x n, as well as a vector of ones of size n x 1. We then call spdiags to specify which diagonals this vector populates: the main diagonal (0) and the diagonal directly below it (-1). spdiags adjusts the number of elements used for the off-diagonal to fit.
We also specify that the output is of size n x n. Because spdiags returns a sparse matrix, we convert it with full to get the final result.
With a bit of index juggling, you can also do this:
N = 5;
ind = repelem(1:N, 2); % [1 1 2 2 3 3 ... N N]
M = full(sparse(ind(2:end), ind(1:end-1), 1))
Simple approach using linear indexing:
n = 5;
M = eye(n);
M(2:n+1:end) = 1;
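To see why the stride n+1 works, it may help to convert the linear indices back to subscripts. The sketch below assumes MATLAB's column-major storage; the names idx, r and c are illustrative.

```matlab
n = 5;
M = eye(n);
% Element (r, c) has linear index r + (c-1)*n in column-major order.
% (2,1) is index 2, and moving one row down and one column right adds n+1.
idx = 2:n+1:n*n;                 % same positions as 2:n+1:end
[r, c] = ind2sub([n n], idx);    % r = 2:n, c = 1:n-1
M(idx) = 1;
```

The subscripts confirm that the stride walks along the first subdiagonal.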
This can also be done with bsxfun:
n = 5; %// matrix size
d = [0 -1]; %// diagonals you want set to 1
M = double(ismember(bsxfun(@minus, 1:n, (1:n).'), d));
For example, to obtain a 5x5 matrix with the main diagonal and the two diagonals below set to 1, define n=5 and d = [0 -1 -2], which gives
M =
1 0 0 0 0
1 1 0 0 0
1 1 1 0 0
0 1 1 1 0
0 0 1 1 1

Finding the column indices of submatrices in MATLAB

Suppose I have the following matrix
1 1 0 0 0
1 1 0 0 0
0 0 1 1 1
0 0 1 1 1
0 0 1 1 1
The result would be
{[1,2],[3,4,5]}
How would I implement this?
I have an ugly solution involving a loop that runs through the diagonal (except (1,1)) and checks whether the element directly left is 0. If not, that is the start of a new cluster.
Is there a prettier solution?
EDIT: current solution:
n = size(input, 2);
result = cell(1,n);
result{1} = 1;
counter = 1;
for i = 2:n
    if input(i,i-1) ~= 1
        counter = counter + 1;
    end
    result{counter} = [result{counter} i];
end
result = result(~cellfun('isempty',result));
Use unique with the 'rows' option on the transposed matrix; its third output assigns each column the index of its group.
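A minimal sketch of that suggestion follows. It assumes that columns belonging to the same block are identical, as in the example, and it uses the 'stable' flag (available from R2012a) so that groups are numbered in order of first appearance; the names A and label are illustrative.

```matlab
A = [1 1 0 0 0; 1 1 0 0 0; 0 0 1 1 1; 0 0 1 1 1; 0 0 1 1 1];
% Rows of A.' are the columns of A; label(k) is the group of column k
[~, ~, label] = unique(A.', 'rows', 'stable');
% Collect the column indices of each group into a cell array
result = accumarray(label, (1:size(A,2)).', [], @(x) {sort(x).'});
result = result.';               % 1-by-2 cell: {[1 2], [3 4 5]}
```

The sort inside the anonymous function is there because accumarray does not guarantee the order of the values it passes to the function.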

In matlab, find the frequency at which unique rows appear in a matrix

In Matlab, say I have the following matrix, which represents a population of 10 individuals:
pop = [0 0 0 0 0; 1 1 1 0 0; 1 1 1 1 1; 1 1 1 0 0; 0 0 0 0 0; 0 0 0 0 0; 1 0 0 0 0; 1 1 1 1 1; 0 0 0 0 0; 0 0 0 0 0];
Where rows of ones and zeros define 6 different 'types' of individuals.
a = [0 0 0 0 0];
b = [1 0 0 0 0];
c = [1 1 0 0 0];
d = [1 1 1 0 0];
e = [1 1 1 1 0];
f = [1 1 1 1 1];
I want to define the proportion/frequency of a, b, c, d, e and f in pop.
I want to end up with the following list:
a = 0.5;
b = 0.1;
c = 0;
d = 0.2;
e = 0;
f = 0.2;
One way I can think of is by summing the rows, then counting the number of times each appears, and then sorting and indexing
sum_pop = sum(pop')';
x = unique(sum_pop);
N = numel(x);
count = zeros(N,1);
for l = 1:N
    count(l) = sum(sum_pop==x(l));
end
pop_frequency = [x(:) count/10];
But this doesn't quite get me what I want (i.e. when frequency = 0) and it seems there must be a faster way?
You can use pdist2 (Statistics Toolbox) to get all frequencies:
indiv = [a;b;c;d;e;f]; %// matrix with all individuals
result = mean(pdist2(pop, indiv)==0, 1);
This gives, in your example,
result =
0.5000 0.1000 0 0.2000 0 0.2000
Equivalently, you can use bsxfun to manually compute pdist2(pop, indiv)==0, as in Divakar's answer.
For the specific individuals in your example (that can be identified by the number of ones) you could also do
result = histc(sum(pop, 2), 0:size(pop,2)) / size(pop,1);
There is some functionality in unique that can be used for this. If
[q,w,e] = unique(pop,'rows');
q is the matrix of unique rows, and w gives the index at which each unique row appears in pop. The third output e contains indices into q such that pop = q(e,:). Armed with this, the rest of the problem is straightforward: the relative frequency of each value in e is the frequency with which the corresponding row appears in pop.
The counting can be done with histc
histc(e,1:max(e))/length(e)
and the non-occurring rows can be found with
ismember(a,q,'rows')
There are of course other ways as well, maybe (probably) faster ones, or one-liners. I am posting this because it is easy to understand, readable, and does not require any special toolboxes.
EDIT
This example gives expected output
a = [0,0,0,0,0;1,0,0,0,0;1,1,0,0,0;1,1,1,0,0;1,1,1,1,0;1,1,1,1,1]; % catenated a-f
[q,w,e] = unique(pop,'rows');
prob = histc(e,1:max(e))/length(e);
out = zeros(size(a,1),1);
out(ismember(a,q,'rows')) = prob;
Approach #1
With bsxfun -
A = cat(1,a,b,c,d,e,f)
out = squeeze(sum(all(bsxfun(@eq,pop,permute(A,[3 2 1])),2),1))/size(pop,1)
Output -
out =
0.5000
0.1000
0
0.2000
0
0.2000
Approach #2
If those elements are binary numbers, you can convert them into decimal format.
Thus, decimal format for pop becomes -
>> bi2de(pop)
ans =
0
7
31
7
0
0
1
31
0
0
And that of the concatenated array, A becomes -
>> bi2de(A)
ans =
0
1
3
7
15
31
Finally, you need to count the decimal formatted numbers from A in that of pop, which you can do with histc. Here's the code -
A = cat(1,a,b,c,d,e,f)
out = histc(bi2de(pop),bi2de(A))/size(pop,1)
Output -
out =
0.5000
0.1000
0
0.2000
0
0.2000
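bi2de is, as far as I know, part of the Communications Toolbox. If that toolbox is not available, the same decimal codes can be obtained with a plain matrix product. The sketch below assumes the least-significant-bit-first ordering that bi2de uses by default, and it swaps histc for a direct equality comparison; the names w, popDec and ADec are illustrative.

```matlab
pop = [0 0 0 0 0; 1 1 1 0 0; 1 1 1 1 1; 1 1 1 0 0; 0 0 0 0 0; ...
       0 0 0 0 0; 1 0 0 0 0; 1 1 1 1 1; 0 0 0 0 0; 0 0 0 0 0];
A = [0 0 0 0 0; 1 0 0 0 0; 1 1 0 0 0; 1 1 1 0 0; 1 1 1 1 0; 1 1 1 1 1];
w = 2.^(0:size(pop,2)-1).';      % bit weights, least significant bit first
popDec = pop * w;                % decimal code of every individual
ADec = A * w;                    % decimal code of every type
% Fraction of individuals whose code equals each type's code
out = mean(bsxfun(@eq, popDec, ADec.'), 1).';
```

Comparing codes for equality does not require the type codes to be sorted, unlike using them as histc edges.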
I think ismember is the most direct and general way to do this. If your groups were more complicated, this would be the way to go:
population = [0,0,0,0,0; 1,1,1,0,0; 1,1,1,1,1; 1,1,1,0,0; 0,0,0,0,0; 0,0,0,0,0; 1,0,0,0,0; 1,1,1,1,1; 0,0,0,0,0; 0,0,0,0,0];
groups = [0,0,0,0,0; 1,0,0,0,0; 1,1,0,0,0; 1,1,1,0,0; 1,1,1,1,0; 1,1,1,1,1];
[~, whichGroup] = ismember(population, groups, 'rows');
freqOfGroup = accumarray(whichGroup, 1, [size(groups, 1) 1])/size(population, 1); % counts per group divided by the number of individuals
In your special case the groups can be represented by their sums, so if this generic solution is not fast enough, use the sum-histc simplification Luis used.

How to find the longest interval of 1's in a list [matlab]

I need to find the longest interval of 1's in a matrix, and the position of the first "1" in that interval.
For example, if I have a matrix: [1 0 0 1 1 1 0 0 0 0 1 1 1 1 1 1 1 ]
I need to get both the length, 7, and the position of the first 1 in that interval, which is 11.
Any suggestions on how to proceed would be appreciated.
Using this answer as a basis, you can do as follows:
a = [1 0 0 1 1 1 0 0 0 0 1 1 1 1 1 1 1 ]
dsig = diff([0 a 0]);
startIndex = find(dsig > 0);
endIndex = find(dsig < 0) - 1;
duration = endIndex-startIndex+1;
duration
startIdx = startIndex(duration == max(duration))
endIdx = endIndex(duration == max(duration))
This outputs:
duration =
1 3 7
startIdx =
11
endIdx =
17
Please note that you should double-check whether this works for cases other than your example. Nevertheless, I think it is a step in the right direction. If not, the linked answer has more information and possibilities.
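One way to do that double-checking is to run the same diff-based steps over a few edge cases: no ones at all, all ones, and two runs of equal length. This is a sketch only; the test vectors and the names tests, lens and starts are illustrative, and the empty case is guarded explicitly.

```matlab
tests = {[1 0 0 1 1 1 0 0 0 0 1 1 1 1 1 1 1], [0 0 0], [1 1 1], [1 1 0 1 1]};
starts = cell(size(tests));      % start position(s) of the longest run
lens = zeros(size(tests));       % length of the longest run
for k = 1:numel(tests)
    a = tests{k};
    dsig = diff([0 a 0]);
    startIndex = find(dsig > 0);
    endIndex = find(dsig < 0) - 1;
    duration = endIndex - startIndex + 1;
    if isempty(duration)         % vector contains no ones
        lens(k) = 0;
        starts{k} = [];
    else
        lens(k) = max(duration);
        starts{k} = startIndex(duration == lens(k));
    end
end
```

For the last test vector both runs have length 2, so starts{4} contains both start positions, 1 and 4.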
If several intervals of ones share the maximum length, the last line below returns the start position of each of them.
A=round(rand(1,20)) %// test vector
[~,p2]=find(diff([0 A])==1); %// finds where a string of 1's starts
[~,p3]=find(diff([A 0])==-1); %// finds where a string of 1's ends
le=p3-p2+1; %// length of each interval of 1's
ML=max(le); %// length of longest interval
ML %// display ML
p2(le==ML) %// find where strings of maximum length begin (per Marcin's answer)
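I have thought of a brute force approach:
clc; clear all; close all;
A = [1 0 0 1 1 1 0 0 0 0 1 1 1 1 1 1 1 ];
index = 0;        %// start of the longest run found so far
globalCount = 0;  %// length of the longest run found so far
runStart = 0;     %// start of the run currently being counted
count = 0;
flag = 0; %// A flag to keep if the previous encounter was 0 or 1
for i = 1 : length(A)
    if A(i) == 1
        count = count + 1;
        if flag == 0
            runStart = i;
            flag = 1;
        end
    end
    if A(i) == 0 || i == length(A)
        if count > globalCount
            globalCount = count;
            index = runStart; %// record the start only when a longer run is found
        end
        flag = 0;
        count = 0;
    end
end
%// For the example vector, globalCount is 7 and index is 11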
