Transforming a data file into a matrix with columns of identical elements? - arrays

I have a very large data file in the following format:
1 2 3 4 6 7 8
1 2 3 4 6
1 2 3 5 4 6
1 2 3 4 6
1 2 3 4 6
1 2 3 4 6 8
I am trying to load this data into MATLAB. My aim is to create a matrix in which each column holds only one distinct value, with a zero wherever that value is missing from a row. The output should look like this:
1 2 3 4 0 6 7 8
1 2 3 4 0 6 0 0
1 2 3 4 5 6 0 0
1 2 3 4 0 6 0 0
1 2 3 4 0 6 0 0
1 2 3 4 0 6 0 8
Can someone give me any ideas, code snippets, or links for achieving this?

OK. Here is how I did it (test.dat is the name of the file with the input data):
%// The first section reads the dat file and fills missing entries in columns with zeros
fid = fopen('test.dat');
textLine = fgets(fid); % Read first line.
lineCounter = 1;
while ischar(textLine)
% get into numbers array.
numbers = sscanf(textLine, '%f ');
% Put numbers into a cell array IF and only if
% you need them after the loop has exited.
% First method - each number in one cell.
for k = 1 : length(numbers)
ca{lineCounter, k} = numbers(k);
end
% Alternate way where the whole array is in one cell.
ca2{lineCounter} = numbers;
% Read the next line.
textLine = fgets(fid);
lineCounter = lineCounter + 1;
end
fclose(fid);
emptyIndex = cellfun(@isempty,ca); %# Find indices of empty cells
ca(emptyIndex) = {0}; %# Fill empty cells with 0
A=cell2mat(ca);
%// The second section creates a new matrix AA from matrix A,
%// in which each column holds a single unique value, with missing entries as zero
uniq=unique(A);
row=size(A);
row=row(1);
%not considering zero
AA=zeros(row,uniq(end));
AA_idx=[];
for x=uniq(2):uniq(end)
AA_idxr=mod(find(A==x),row);
AA_idxr(AA_idxr==0)=row;
AA_idxc=x*ones(length(AA_idxr),1);
% AA_idxc(AA_idxc==0)=uniq(end)
c=[AA_idxr AA_idxc];
AA_idx=cat(1,AA_idx,c);
c=[];
end
for i=1:length(AA_idx)
index=AA_idx(i,:);
a=index(1);
b=index(2);
AA(a,b)=b;
end
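A shorter route to the same matrix is to use each value as its own column index. The following is only a minimal sketch: it assumes every value is a positive integer, and the cell array rows is a hypothetical in-memory stand-in for the lines read from test.dat.

```matlab
% Hypothetical stand-in for the parsed lines of test.dat (one cell per line)
rows = {[1 2 3 4 6 7 8], [1 2 3 4 6], [1 2 3 5 4 6], ...
        [1 2 3 4 6], [1 2 3 4 6], [1 2 3 4 6 8]};

% The largest value in the file determines the number of columns
nCols = max(cellfun(@max, rows));

% Start from zeros so that missing values stay zero
AA = zeros(numel(rows), nCols);
for r = 1:numel(rows)
    % Assumption: values are positive integers, so a value can index its column
    AA(r, rows{r}) = rows{r};
end
disp(AA)
```

Because each value is written to the column with the same number, the order of values within a line (such as the 5 before the 4 in the third line) does not matter.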

Related

Number 0's and 1's blocks in a binary vector

In MATLAB, there is the bwlabel function, that given a binary vector, for instance x=[1 1 0 0 0 1 1 0 0 1 1 1 0] gives (bwlabel(x)):
[1 1 0 0 0 2 2 0 0 3 3 3 0]
but what I want to obtain is
[1 1 2 2 2 3 3 4 4 5 5 5 6]
I know I can negate x to obtain (bwlabel(~x))
[0 0 1 1 1 0 0 2 2 0 0 0 3]
But how can I combine them?
All in one line:
y = cumsum([1,abs(diff(x))])
Namely, abs(diff(x)) marks every position where the binary vector changes value, and the cumulative sum of those marks (seeded with a leading 1) numbers the blocks.
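To see why this works, it can help to look at the intermediate vectors. This is just the one-liner unpacked into steps; the variable names changes and y are illustrative.

```matlab
x = [1 1 0 0 0 1 1 0 0 1 1 1 0];

% 1 wherever the value differs from the previous element, 0 otherwise
changes = abs(diff(x));        % [0 1 0 0 1 0 1 0 1 0 0 1]

% A leading 1 opens the first block; every change opens the next one
y = cumsum([1, changes]);      % [1 1 2 2 2 3 3 4 4 5 5 5 6]
disp(y)
```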
You can still do it using bwlabel by vertically concatenating x and ~x, using 4-connected components for the labeling, then taking the maximum down each column:
>> max(bwlabel([x; ~x], 4))
ans =
1 1 2 2 2 3 3 4 4 5 5 5 6
However, the solution from Bentoy13 is probably a bit faster.
x=[1 1 0 0 0 1 1 0 0 1 1 1 0];
A = bwlabel(x);
B = bwlabel(~x);
if x(1)==1
tmp = A>0;
A(tmp) = 2*A(tmp)-1;
tmp = B>0;
B(tmp) = 2*B(tmp);
C = A+B
elseif x(1)==0
tmp = A>0;
A(tmp) = 2*A(tmp);
tmp = B>1;
B(tmp) = 2*B(tmp)-1;
C = A+B
end
C =
1 1 2 2 2 3 3 4 4 5 5 5 6
The first block should keep label 1, the second block should go from 1 to 2, the third from 2 to 3, and so on. So the labels in whichever array contains the first block must become odd (2*label-1), and the labels in the other array must become even (2*label). Checking x(1) tells you whether A or B holds the first block; after relabeling, simply add the two arrays.
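The odd/even relabeling can also be written without the if/elseif branches. The sketch below is an illustration, not the answer's own code: to stay runnable without the Image Processing Toolbox it swaps bwlabel for a cumsum-based labeling that should give the same labels for a row vector, and the names nx and firstIsOne are made up for the example.

```matlab
x  = [1 1 0 0 0 1 1 0 0 1 1 1 0];
nx = 1 - x;                              % complement, kept as doubles

% Stand-ins for bwlabel(x) and bwlabel(~x) on a row vector:
% count block starts (0 -> 1 transitions), then keep only the block positions
A = cumsum(diff([0 x])  == 1) .* x;      % [1 1 0 0 0 2 2 0 0 3 3 3 0]
B = cumsum(diff([0 nx]) == 1) .* nx;     % [0 0 1 1 1 0 0 2 2 0 0 0 3]

% The array holding the first block gets odd labels, the other gets even ones
firstIsOne = (x(1) == 1);
C = (2*A - firstIsOne) .* (A > 0) + (2*B - ~firstIsOne) .* (B > 0);
disp(C)
```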
I found this function that does exactly what I wanted:
https://github.com/davidstutz/matlab-multi-label-connected-components
So, clone the repository and compile it in MATLAB using mex:
mex sp_fast_connected_relabel.cpp
Then,
labels = sp_fast_connected_relabel(x);

Finding the distribution of length of islands in a 2D array?

I will explain my question using an example. Imagine you have a 2D matrix like below:
5 4 3 8 0 0
5 4 2 9 1 0
5 6 2 7 2 0
5 4 7 9 0 0
5 6 7 1 2 0
By islands I mean column groups of same elements (except zeros).
I would like to find the histogram of length of islands except those consisting of zero elements.
This matrix has
island-length occurrence
5 1
2 3
1 12
How can I do this in MATLAB?
Maybe there are shorter possibilities, but this will do - and it is fully vectorized:
A = [5 4 3 8 0 0
5 4 2 9 1 0
5 6 2 7 2 0
5 4 7 9 0 0
5 6 7 1 2 0]
%// pad zeros to first line of A
X(2:size(A,1)+1,:) = A;
%// differences of X
dX = diff(X)
%// cumulative sum of "logicalized" differences
cs = cumsum(logical(dX(:)))
%// filter out zeros
cs = cs(logical(A(:)))
%// count occurrences
aa = accumarray(cs,1)
%// unique occurrences
uaa = unique(aa)
%// count unique occurrences
occ = hist(aa,uaa).'
%// accumarray may introduce new zeros, filter out
mask = logical(uaa)
%// output
out = [occ(mask) uaa(mask)]
out =
12 1
3 2
1 5
Needed a slight modification to one of my old snippets to filter the zeros. Here you go:
% Your Matrix
A = [ 5 4 3 8 0 0;
5 4 2 9 1 0;
5 6 2 7 2 0;
5 4 7 9 0 0;
5 6 7 1 2 0];
% Find Edges (Ends of Islands)
B = diff(A);
B = [ones(1,size(A,2));B~=0;ones(1,size(A,2))];
% At each column, find distances between island edges, filter out zero islands.
R = cell(size(A,2),1);
for i = 1:size(A,2)
[C ~] = find(B(:,i));
Ac = A(C(1:end-1),i);
D = diff(C);
D(Ac==0)=[];
R{i} = D;
end
% Find histogram of island lengths
R = R(find(~cellfun(@isempty,R)),1);
R = cell2mat(R);
[a,~,c] = unique(R);
out = [a, accumarray(c,ones(size(R)))];
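Both answers follow the same pattern: mark where each island starts, number the islands, measure them, and then discard the islands made of zeros. A compact sketch of that pattern is shown below; the variable names are illustrative, and the final counting step assumes at least one nonzero island exists.

```matlab
A = [5 4 3 8 0 0;
     5 4 2 9 1 0;
     5 6 2 7 2 0;
     5 4 7 9 0 0;
     5 6 7 1 2 0];

% An island starts in the first row or where the value differs from the row above
isStart = [true(1, size(A,2)); diff(A) ~= 0];

% Number the islands in column-major order and measure each one
runId  = cumsum(isStart(:));
runLen = accumarray(runId, 1);

% Value of each island, read at its first element; drop islands of zeros
runVal = A(isStart);
lens   = runLen(runVal ~= 0);

% Histogram of island lengths: [length, occurrences]
counts = accumarray(lens, 1);
islandLength = find(counts);
out = [islandLength, counts(islandLength)];
disp(out)
```

For the matrix above this should list lengths 1, 2 and 5 with 12, 3 and 1 occurrences, matching the table in the question.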

Removing zeros and then vertically collapse the matrix

In MATLAB, say I have a set of square matrices A with trace(A)=0, for example:
A = [0 1 2; 3 0 4; 5 6 0]
How can I remove the zeros and then vertically collapse the matrix so that it becomes:
A_reduced = [1 2; 3 4; 5 6]
More generally, what if the zeros can appear anywhere in a column (i.e., not necessarily on the main diagonal)? Assume, of course, that every column contains the same number of zeros.
The matrix can be quite big (hundreds by hundreds in dimension), so an efficient method would be appreciated.
To compress the matrix vertically (assuming every column has the same number of zeros):
A_reduced_v = reshape(nonzeros(A), nnz(A(:,1)), []);
To compress the matrix horizontally (assuming every row has the same number of zeros):
A_reduced_h = reshape(nonzeros(A.'), nnz(A(1,:)), []).';
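Running both one-liners on the matrix from the question shows how the two directions differ. This is only a small check of the lines above; note that the layout the question asks for, [1 2; 3 4; 5 6], comes out of the horizontal (per-row) variant.

```matlab
A = [0 1 2; 3 0 4; 5 6 0];

% Per column: nonzeros are read column by column, two per column
A_reduced_v = reshape(nonzeros(A), nnz(A(:,1)), []);
disp(A_reduced_v)   % [3 1 2; 5 6 4]

% Per row: transpose first so that nonzeros are read row by row
A_reduced_h = reshape(nonzeros(A.'), nnz(A(1,:)), []).';
disp(A_reduced_h)   % [1 2; 3 4; 5 6]
```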
Case #1
Assuming that A has equal number of zeros across all rows, you can compress it horizontally (i.e. per row) with this -
At = A' %//'# transpose input array
out = reshape(At(At~=0),size(A,2)-sum(A(1,:)==0),[]).' %//'# final output
Sample code run -
>> A
A =
0 3 0 2
3 0 0 1
7 0 6 0
1 0 6 0
0 16 0 9
>> out
out =
3 2
3 1
7 6
1 6
16 9
Case #2
If A has equal number of zeros across all columns, you can compress it vertically (i.e. per column) with something like this -
out = reshape(A(A~=0),size(A,1)-sum(A(:,1)==0),[]) %//'# final output
Sample code run -
>> A
A =
0 3 7 1 0
3 0 0 0 16
0 0 6 6 0
2 1 0 0 9
>> out
out =
3 3 7 1 16
2 1 6 6 9
This seems to work, although it is quite fiddly to get the behaviour right with the transposes:
>> B = A';
>> C = B(:);
>> reshape(C(C~=0), size(A) - [1, 0])'
ans =
1 2
3 4
5 6
As your zeros are always in the main diagonal you can do the following:
l = tril(A, -1);
u = triu(A, 1);
out = l(:, 1:end-1) + u(:, 2:end)
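A very simple approach that works for this particular example, where the nonzero values happen to be in sorted order (note that sort discards the original positions, and the result comes out as the transpose of the requested layout):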
A = [0 1 2; 3 0 4; 5 6 0]
A =
0 1 2
3 0 4
5 6 0
A = sort((A(find(A))))
A =
1
2
3
4
5
6
A = reshape(A, 2, 3)
A =
1 3 5
2 4 6
I came up with almost the same solution as Mr E's, though with another reshape command. This solution is more universal, as it uses the number of rows in A to create the final matrix instead of counting the zeros or assuming a fixed number of them.
B = A.';
B = B(:);
C = reshape(B(B~=0),[],size(A,1)).'

Matlab- moving numbers to new row if condition is met

I have a variable like this that is all one row:
1 2 3 4 5 6 7 8 9 2 4 5 6 5
I want to write a for loop that will find where a number is less than the previous one and put the rest of the numbers in a new row, like this
1 2 3 4 5 6 7 8 9
2 4 5 6
5
I have tried this:
test = [1 2 3 4 5 6 7 8 9 2 4 5 6 5];
m = zeros(size(test));
for i=1:numel(test)-1;
for rows=1:size(m,1)
if test(i) > test(i+1);
m(i+1, rows+1) = test(i+1:end)
end % for rows
end % for
But it's clearly not right and just hangs.
Let x be your data vector. What you want can be done quite simply as follows:
ind = [find(diff(x)<0) numel(x)]; %// find ends of increasing subsequences
ind(2:end) = diff(ind); %// compute lengths of those subsequences
y = mat2cell(x, 1, ind); %// split data vector according to those lengths
This produces the desired result in cell array y. A cell array is used so that each "row" can have a different number of columns.
Example:
x = [1 2 3 4 5 6 7 8 9 2 4 5 6 5];
gives
y{1} =
1 2 3 4 5 6 7 8 9
y{2} =
2 4 5 6
y{3} =
5
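For reference, the intermediate values for this example can be traced as follows. It is the same technique with the two steps kept in separate variables (ends and lens are illustrative names) so that each one can be inspected.

```matlab
x = [1 2 3 4 5 6 7 8 9 2 4 5 6 5];

% Last index of each increasing run: a drop follows positions 9 and 13
ends = [find(diff(x) < 0) numel(x)];   % [9 13 14]

% Run lengths are the gaps between consecutive run ends
lens = [ends(1) diff(ends)];           % [9 4 1]

% Split x into one cell per run
y = mat2cell(x, 1, lens);
celldisp(y)
```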
If you are looking for a numeric array output, you need to fill the "gaps" with something, and filling with zeros seems like a good option, as your code appears to do the same.
So, here's a bsxfun based approach to achieve the same -
test = [1 2 3 4 5 6 7 8 9 2 4 5 6 5] %// Input
idx = [find(diff(test)<0) numel(test)] %// positions of row shifts
lens = [idx(1) diff(idx)] %// lengths of each row in the proposed output
m = zeros(max(lens),numel(lens)) %// setup output matrix
m(bsxfun(@le,[1:max(lens)]',lens)) = test; %//'# put values from input array
m = m.' %//'# Output that is a transposed version after putting the values
Output -
m =
1 2 3 4 5 6 7 8 9
2 4 5 6 0 0 0 0 0
5 0 0 0 0 0 0 0 0
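In MATLAB R2016b and later, implicit expansion lets the comparison be written directly, without bsxfun. The sketch below assumes such a release; the logic is otherwise the same as above, with the mask held in its own variable for clarity.

```matlab
test = [1 2 3 4 5 6 7 8 9 2 4 5 6 5];

idx  = [find(diff(test) < 0) numel(test)];   % ends of the increasing runs
lens = [idx(1) diff(idx)];                   % length of each run

% Column k of the mask is true for the first lens(k) rows
% (column vector compared with row vector: implicit expansion, R2016b+)
mask = (1:max(lens)).' <= lens;

% Fill column by column, then transpose so that each run becomes a row
m = zeros(max(lens), numel(lens));
m(mask) = test;
m = m.';
disp(m)
```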

Counting the occurrence of a unique number in an array - MATLAB

I have an array that looks something like...
1 0 0 1 2 2 1 1 2 1 0
2 1 0 0 0 1 1 0 0 2 1
1 2 2 1 1 1 2 0 0 1 0
0 0 0 1 2 1 1 2 0 1 2
however my real array is (50x50).
I am relatively new to MATLAB and need to count how many times each unique value occurs in each row and column. For example, there are four '1's in row 2 and three '0's in column 3. I need to be able to do this with my real array.
It would help even more if these counts were also stored in arrays of their own.
PLEASE use simple language, or else I will get lost. For example, if representing an array, don't call it x, but perhaps column_occurances_array... for me please :)
What I would do is iterate over each row of your matrix and calculate a histogram of occurrences for each row. Use histc to calculate the occurrences of each row. The thing that is nice about histc is that you are able to specify where the bins are to start accumulating. These correspond to the unique entries for each row of your matrix. As such, use unique to compute these unique entries.
Now, I would use arrayfun to iterate over all of your rows in your matrix, and this will produce a cell array. Each element in this cell array will give you the counts for each unique value for each row. Therefore, assuming your matrix of values is stored in A, you would simply do:
vals = arrayfun(@(x) [unique(A(x,:)); histc(A(x,:), unique(A(x,:)))], 1:size(A,1), 'uni', 0);
Now, if we want to display all of our counts, use celldisp. Using your example, and with the above code combined with celldisp, this is what I get:
vals{1} =
0 1 2
3 5 3
vals{2} =
0 1 2
5 4 2
vals{3} =
0 1 2
3 5 3
vals{4} =
0 1 2
4 4 3
What the above display is saying is that for the first row, you have 3 zeros, 5 ones and 3 twos. The second row has 5 zeros, 4 ones and 2 twos and so on. These are just for the rows. If you want to do these for columns, you have to modify your code slightly to operate along columns:
vals = arrayfun(@(x) [unique(A(:,x)) histc(A(:,x), unique(A(:,x)))].', 1:size(A,2), 'uni', 0);
By using celldisp, this is what we get:
vals{1} =
0 1 2
1 2 1
vals{2} =
0 1 2
2 1 1
vals{3} =
0 2
3 1
vals{4} =
0 1
1 3
vals{5} =
0 1 2
1 1 2
vals{6} =
1 2
3 1
vals{7} =
1 2
3 1
vals{8} =
0 1 2
2 1 1
vals{9} =
0 2
3 1
vals{10} =
1 2
3 1
vals{11} =
0 1 2
2 1 1
This means that in the first column, we see 1 zero, 2 ones and 1 two, etc. etc.
I absolutely agree with rayryeng! However, here is some code which might be easier to understand for you as a beginner. It is without cell arrays or arrayfuns and quite self-explanatory:
%% initialize your array randomly for demonstration:
numRows = 50;
numCols = 50;
yourArray = round(10*rand(numRows,numCols));
%% do some stuff of what you are asking for
% find all occurring numbers in yourArray
occVals = unique(yourArray(:));
% now you could sort them just for convenience
occVals = sort(occVals);
% now we could create a matrix occMat_row of dimension |occVals| x numRows
% where occMat_row(i,j) represents how often the ith value occurs in the
% jth row, analogously occMat_col:
occMat_row = zeros(length(occVals),numRows);
occMat_col = zeros(length(occVals),numCols);
for k = 1:length(occVals)
occMat_row(k,:) = sum(yourArray == occVals(k),2)';
occMat_col(k,:) = sum(yourArray == occVals(k),1);
end
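As a check, the same loop can be run on the small 4x11 array from the question, where the expected counts are known. The names below (unique_values, row_counts, column_counts) are illustrative, and row_counts is laid out with one row per array row, which differs from occMat_row above.

```matlab
A = [1 0 0 1 2 2 1 1 2 1 0;
     2 1 0 0 0 1 1 0 0 2 1;
     1 2 2 1 1 1 2 0 0 1 0;
     0 0 0 1 2 1 1 2 0 1 2];

unique_values = unique(A(:)).';      % [0 1 2]

% row_counts(i,k):    how often unique_values(k) occurs in row i
% column_counts(k,j): how often unique_values(k) occurs in column j
row_counts    = zeros(size(A,1), numel(unique_values));
column_counts = zeros(numel(unique_values), size(A,2));
for k = 1:numel(unique_values)
    row_counts(:,k)    = sum(A == unique_values(k), 2);
    column_counts(k,:) = sum(A == unique_values(k), 1);
end

disp(row_counts(2,2))      % number of 1s in row 2
disp(column_counts(1,3))   % number of 0s in column 3
```

These two values should be 4 and 3, matching the counts given in the question.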
