I have a cell array that I need to split into several matrices so that I can take the sum of subsets of the data. This is a sample of what I have:
A = {'M00.300', '1644.07';...
'M00.300', '9745.42'; ...
'M00.300', '2232.88'; ...
'M00.600', '13180.82'; ...
'M00.600', '2755.19'; ...
'M00.600', '15800.38'; ...
'M00.900', '18088.11'; ...
'M00.900', '1666.61'};
I want the sum of the second columns for each of 'M00.300', 'M00.600', and 'M00.900'. For example, to correspond to 'M00.300' I would want 1644.07 + 9745.42 + 2232.88.
I don't want to just hard code it because each data set is different, so I need the code to work for different size cell arrays.
I'm not sure of the best way to do this, I was going to begin by looping through A and comparing the strings in the first column and creating matrices within that loop, but that sounded messy and not efficient.
Is there a simpler way to do this?
Classic use of accumarray. You would use the first column as an index and the second column as the values associated with each index. accumarray works where you group values that belong to the same index together and you apply a function to those values. In your case, you'd use the default behaviour and sum things together.
However, you'll need to convert the first column into numeric labels. The third output of unique will help you do this. You'll also need to convert the second column into a numeric array and so str2double is a perfect way to do this.
Without further ado:
[val,~,id] = unique(A(:,1)); %// Get unique values and indices
out = accumarray(id, str2double(A(:,2))); %// Aggregate the groups and sum
format long g; %// For better display of precision
T = table(val, out) %// Display on a nice table
I get this:
>> T = table(val, out)
T =
val out
_________ ________
'M00.300' 13622.37
'M00.600' 31736.39
'M00.900' 19754.72
The above uses the table class that is available from R2013b and onwards. If you don't have this, you can perhaps use a for loop and print out each cell and value separately:
for idx = 1 : numel(out)
fprintf('%s: %f\n', val{idx}, out(idx));
end
We get:
M00.300: 13622.370000
M00.600: 31736.390000
M00.900: 19754.720000
Related
So I m trying to find some alternative to sumifs in excel where each condition needs to be checked in a 2D range instead of a 1D range.
For example, in the below table I want the sum of values in column V for rows where A12 ("IJ") is present in range A2:C8 (P), B12 ("NM") is present in the range D2:F8 (S) and C12 ("XX") is present in range G2:I8 (A)
I am trying to find a solution involving an array-based formula (without VBA).
Like for example in the below-given formulas,
SUMPRODUCT((B2:B8'=A12)*J2:J8) will give an array-based calculation as follows
SUMPRODUCT({TRUE;FALSE;TRUE;FALSE;FALSE;TRUE;FALSE}*{22;79;45;67;43;72;52})
= SUMPRODUCT({22;0;45;0;0;72;0})
=139
It is easy when there is only one condition needs to be checked but like sumifs, I intend to check multiple conditions, but as soon as I add other conditions, the array becomes multidimensional and gives the wrong answer.
Example:
SUMPRODUCT((A2:C8=A12)*(D2:F8=B12)*J2:J8) breaks down to
=SUMPRODUCT(
{FALSE,TRUE,FALSE;FALSE,FALSE,FALSE;FALSE,TRUE,FALSE;FALSE,FALSE,FALSE;FALSE,FALSE,FALSE;FALSE,TRUE,FALSE;FALSE,FALSE,FALSE}*
{TRUE,FALSE,FALSE;FALSE,FALSE,FALSE;TRUE,FALSE,FALSE;FALSE,FALSE,FALSE;FALSE,FALSE,FALSE;TRUE,FALSE,FALSE;FALSE,FALSE,FALSE}
*J2:J8)
in the background what is happening is (example for 3rd row)
SUMPRODUCT( ({FALSE, TRUE ,FALSE} * {TRUE,FALSE,FALSE}) * 45 )
= SUMPRODUCT({FALSE,FALSE,FALSE} *45 )
=0
SUMPRODUCT(({FALSE,TRUE ,FALSE} + {TRUE,FALSE,FALSE}) * 45 )
= SUMPRODUCT({TRUE,TRUE,FALSE} *45 )
= 90
#expected answer =45
Can someone help me understand where I am going wrong or what I am missing?
If there is any other way then suggestions are always welcome.
Please note this is a dummy data actual data is very big for each header (P,S,A) there are values in 10 columns respectively and the number of rows is also very large.
Try this...
=SUMPRODUCT( ((A2:A8=A12)+(B2:B8=A12)+(C2:C8=A12)) * ((D2:D8=B12)+(E2:E8=B12)+(F2:F8=B12)) * ((G2:G8=C12)+(H2:H8=C12)+(I2:I8=C12)) * J2:J8 )
For SUMPRODUCT to work, the shape of the Boolean array needs to match the shape of the array you wish to conditionally sum.
J2:J8 is seven rows tall by one column wide.
The above formula creates an array of 1s and 0s from your three criteria ranges and shapes it into seven rows tall by one column wide.
At that point, SUMPRODUCT can do it's normal thing because the criteria array matches the dimension of the sum array J2:J8.
I have a column matrix and a cell array which has two columns.The first column has 1x2 doubles and the second column has 1x1 doubles.
For example
columnMatrix = [1;5];
cellArray = {[1,8],[10];[8,1],[20];[4,6],[80];[3,5],[40];[14,16],[85];[5,10],[36]};
I would like to search each element of columnMatrix in cellArray(:,1) and then return its corresponding value in cellArray(:,2)
For example the output has to be like this
newCell = {[1],[10,20];[5],[40,36]};
I tried using the ismember function in this way
[~,idx] = ismember(cell2mat(cellArray(:,1)),columnMatrix (: , 1));
This returns all the indices which have the searched element but they are in two seperate columns and I can not perform any other logical operation to get the corresponding second column entry.
Is there some way this operation can be achieved? Could some one please help?
Thanks
First of all, convert first column of cellArray to a matrix so it would be easier to search values in. Then iterate over columnMatrix values (e.g. using arrayfun, but you could also use for loop), find rows that match (any across columns) and select corresponding values from the second column of cellArray, converting to numeric array ([cellArray{...,2}]). Finally, append the columnMatrix as the first column of the resulting cell array:
columnMatrix = [1;5]; cellArray = {[1,8],[10];[8,1],[20];[4,6],[80];[3,5],[40];[14,16],[85];[5,10],[36]};
mat = cell2mat(cellArray(:,1));
values = arrayfun(#(x) [cellArray{any(mat==x,2),2}], columnMatrix, 'uni', false);
result = [num2cell(columnMatrix), values];
I'm quite new to MatLab and this problem really drives me insane:
I have a huge array of 2 column and about 31,000 rows. One of the two columns depicts a spatial coordinate on a grid the other one a dependent parameter. What I want to do is the following:
I. I need to split the array into smaller parts defined by the spatial column; let's say the spatial coordinate are ranging from 0 to 500 - I now want arrays that give me the two column values for spatial coordinate 0-10, then 10-20 and so on. This would result in 50 arrays of unequal size that cover a spatial range from 0 to 500.
II. Secondly, I would need to calculate the average values of the resulting columns of every single array so that I obtain per array one 2-dimensional point.
III. Thirdly, I could plot these points and I would be super happy.
Sadly, I'm super confused since I miserably fail at step I. - Maybe there is even an easier way than to split the giant array in so many small arrays - who knows..
I would be really really happy for any suggestion.
Thank you,
Arne
First of all, since you wish a data structure of array of different size you will need to place them in a cell array so you could try something like this:
res = arrayfun(#(x)arr(arr(:,1)==x,:), unique(arr(:,1)), 'UniformOutput', 0);
The previous code return a cell array with the array splitted according its first column with #(x)arr(arr(:,1)==x,:) you are doing a function on x and arrayfun(function, ..., 'UniformOutput', 0) applies function to each element in the following arguments (taken a single value of each argument to evaluate the function) but you must notice that arr must be numeric so if not you should map your values to numeric values or use another way to select this values.
In the same way you could do
uo = 'UniformOutput';
res = arrayfun(#(x){arr(arr(:,1)==x,:), mean(arr(arr(:,1)==x,2))), unique(arr(:,1)), uo, 0);
You will probably want to flat the returning value, check the function cat, you could do:
res = cat(1,res{:})
Plot your data depends on their format, so I can't help if i don't know how the data are, but you could try to plot inside a loop over your 'res' variable or something similar.
Step I indeed comes with some difficulties. Once these are solved, I guess steps II and III can easily be solved. Let me make some suggestions for step I:
You first define the maximum value (maxValue = 500;) and the step size (stepSize = 10;). Now it is possible to iterate through all steps and create your new vectors.
for k=1:maxValue/stepSize
...
end
As every resulting array will have different dimensions, I suggest you save the vectors in a cell array:
Y = cell(maxValue/stepSize,1);
Use the find function to find the rows of the entries for each matrix. At each step k, the range of values of interest will be (k-1)*stepSize to k*stepSize.
row = find( (k-1)*stepSize <= X(:,1) & X(:,1) < k*stepSize );
You can now create the matrix for a stepk by
Y{k,1} = X(row,:);
Putting everything together you should be able to create the cell array Y containing your matrices and continue with the other tasks. You could also save the average of each value range in a second column of the cell array Y:
Y{k,2} = mean( Y{k,1}(:,2) );
I hope this helps you with your task. Note that these are only suggestions and there may be different (maybe more appropriate) ways to handle this.
Going nuts with cell array, because I just can't get rid of it... However, it will be an easy one for you guys out here.
So here is why:
I have a dataset (data) which contains two variables: A (Numbers) and B (cell array).
Unfortunately I can't even reconstruct the problem nevertheless my imported table looks like this:
data=dataset;
data.A = [1;1;3;3;3];
data.B = ['A';'A';'BUU';'BUU';'A'];
where data.B is of the type 5x1 cell which I can't reconstruct
all I want now is the unique rows like
ans= [1 A;3 BUU;3 A]
the result should be in a dataset or just two vectors where the rows are equivalent.
but unique([dataA dataB],'rows') can't handle cell arrays and I can't find anywhere in the www how I simple convert the cell array B to a vector of strings (does it exist?).
cell2mat() didn't work for me, because of the different word length ('A' vs 'BUU').
Though, two things I would love to learn: Making an 5x1 cell to an string vector
and find unique rows out of numbers and strings (or cells).
Thank you very much!
Cheers Dominik
The problem is that the A and B fields are of a different type. Although they could be concatenated into a cell array, unique can't handle that. A general trick for cases like this is to "translate" elements of each field (column) to unique identifiers, i.e. numbers. This translation can be done applying unique to each field separately and getting its third output. The obtained identifiers can now be concatenated into a matrix, so that each row of this matrix is a "composite identifier". Finally, unique with 'rows' option can be applied to this matrix.
So, in your case:
[~, ~, kA] = unique(data.A);
[~, ~, kB] = unique(data.B);
[~, jR] = unique([kA kB], 'rows');
Now build the result as (same format as data)
result.A = data.A(jR);
result.B = data.B(jR);
or as (2D cell array)
result = cat(2, mat2cell(data.A(jR), ones(1,numel(jR))), data.B(jR));
Here is my clumpsy solution
tt.A = [1;1;3;3;3];
tt.B = {'A';'A';'BUU';'BUU';'A'};
Convert integers to characters, then merge and find unique strings
tt.C = cellstr(num2str(tt.A));
tt.D = cellfun(#(x,y) [x y],tt.C,tt.B,'UniformOutput',0);
[tt.F,tt.E] = unique(tt.D);
Display results
tt.F
My array contains a string in the first row
how can I sum the array from the 2nd row to the Nth/1442th row (as in my example) disregarding the negative signs present in the column?
for example, my code for an array called data2 is:
S = sum(data2(2,15):data2(1442,15));
so sum all of the elements from row 2 to row 1442 in column 15.
This doesn't work but it also does not have anything to deal with the absolute value of whatever row its checking
data is from a .csv:
You should do something like this:
sum(abs(data(2:1442,15)));
The abs function will find the absolute value of each value in the array (i.e. disregard the negative sign). data(2:1442,15) will grab rows 2-1442 of the 15th column, as you wanted.
EDIT: apparently data is a cell array, so you could do the following, I think:
sum(abs([data{2:1442,15}]));
Ok so it looks like you have a constant column so
data2(2,15) = -0.02
and further down
data2(1442,15) = -0.02 %(I would assume)
So when you form:
data2(2,15):data2(1442,15)
this is essential like trying to create an array but of a single value since:
-0.02:-0.02
ans =
-0.0200
which of course gives:
>> sum(-0.02:-0.02)
ans =
-0.0200
What you want should be more like:
sum(data2(2:1442,15))
That way, the index: 2:1442, forms a vector of all the row references for you.
To disregard the negative values:
your answer = sum(abs(data2(2:1442,15)))
EDIT: For a cell array this works:
sum(abs(cell2mat(data2(2:1442,15))))