Non-empty size information for cell arrays

In MATLAB I would like to keep a list of polylines - containing vertices (x,y) - in a container and I thought the best idea is to use cell arrays for this task. Each line would be represented in a row in a cell array, with vertices (x,y) being the elements of the cells. The different lines would be of different length, that's why I thought it would be a good idea to use cell arrays.
My problem, however, is that I don't know how to append after the last non-empty element of each row in a cell array.
Here is an example:
cell{1,1} = 1
cell{2,1} = 2
cell{3,1} = 3
cell{2,2} = 4
cell{2,3} = 5
cell =
[1] [] []
[2] [4] [5]
[3] [] []
For example now I want to append a new element to the end of row 1, and another one to row 2. How do I know what is the first position where I can append a new element?
Or shall I use cell arrays inside cell arrays for this task?
How would you implement a container for a list of polylines in MATLAB?

This is a bad way to store your data, for the very problems you're encountering. A couple notes:
The first column is used as an index (i.e. 1 for polyline 1, 2 for polyline 2, etc.), which is unnecessary since that info is already stored implicitly in the structure of your data.
With this method, points will have to be stacked next to each other, which will be a nightmare for indexing.
With each x and y in a different cell, it's going to be an unneeded hassle to plot/store even a single point.
There are 2 good ways to store all this information.
Cell array: Like Clement pointed out, this is nice and simple, and will let you stack different points in the same polyline along a second dimension.
celldata = {[] [4 5] []};
celldata{2} = [celldata{2}; 1 1];
celldata{3} = [0.5 0.5];
>> celldata
celldata =
[] [2x2 double] [1x2 double]
Structure array: This is a nice way to go if you want to store polyline-level metadata along with your points.
strucdata = struct('points', {[] [4 5] []}, 'info', {'blah', 'blah', 'blah'});
strucdata(2).points = [strucdata(2).points; 1 1];
strucdata(3).points = [0.5 0.5];
>> strucdata
strucdata =
1x3 struct array with fields:
points
info
>> strucdata(2)
ans =
points: [2x2 double]
info: 'blah'
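As a rough sketch of how the cell-array layout could hold polylines, assuming each cell stores an N-by-2 matrix with one [x y] vertex per row (the variable names and coordinates here are illustrative, not from the question):

```matlab
% Each cell holds one polyline as an N-by-2 matrix of [x y] rows (assumed layout).
polylines = {[0 0; 1 1], [2 0; 3 1; 4 0]};

polylines{1}(end+1, :) = [2 0];    % append a vertex to polyline 1
polylines{end+1} = [5 5; 6 6];     % append a whole new polyline

% Number of vertices per polyline, with no bookkeeping of empty cells needed.
nVertices = cellfun(@(p) size(p, 1), polylines);
```

With this layout, a single polyline could be drawn with something like plot(polylines{k}(:,1), polylines{k}(:,2)).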

To answer your first question, you can use this:
n=1;
length([cell{n,:}])+1
n=2;
length([cell{n,:}])+1
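The [...] concatenates the comma-separated list produced by cell{n,:} into a single array. Empty cells contribute nothing to the concatenation, so the length of the result is the number of non-empty cells in row n (assuming each non-empty cell holds a scalar).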
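If the cells may hold something other than scalars, one alternative is to look for the first empty cell directly. The following is only a sketch under that assumption; C, rows and newValues are hypothetical names (C is used instead of cell to avoid shadowing the built-in function):

```matlab
% Sketch: append to the first free slot of a row in a 2-D cell array.
% C is a hypothetical name, chosen to avoid shadowing the built-in cell function.
C = cell(3, 3);
C{1,1} = 1; C{2,1} = 2; C{3,1} = 3;
C{2,2} = 4; C{2,3} = 5;

rows = [1 2];          % rows to append to
newValues = [7 8];     % illustrative values
for k = 1:numel(rows)
    r = rows(k);
    col = find(cellfun(@isempty, C(r,:)), 1);  % first empty column in row r
    if isempty(col)
        col = size(C, 2) + 1;                  % row is full, so grow by one column
    end
    C{r, col} = newValues(k);
end
```

Row 1 has free slots, so the new value lands in column 2. Row 2 is already full, so the cell array grows to four columns.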

Related

Concatenate cell array in MATLAB

In Matlab you can concatenate arrays by saying:
a=[];
a=[a,1];
How do you do something similar with a cell array?
a={};
a={a,'asd'};
The code above keeps on nesting cells within cells. I just want to append elements to the cell array. How do I accomplish this?
If a and b are cell arrays, then you concatenate them in the same way you concatenate other arrays: using []:
>> a={1,'f'}
a =
1×2 cell array
{[1]} {'f'}
>> b={'q',5}
b =
1×2 cell array
{'q'} {[5]}
>> [a,b]
ans =
1×4 cell array
{[1]} {'f'} {'q'} {[5]}
You can also use the functional form, cat, in which you can select along which dimension you want to concatenate:
>> cat(3,a,b)
1×2×2 cell array
ans(:,:,1) =
{[1]} {'f'}
ans(:,:,2) =
{'q'} {[5]}
To append a single element, you can do a=[a,{1}], but this is not efficient (see this Q&A). Instead, do a{end+1}=1 or a(end+1)={1}.
Remember that a cell array is just an array, like any other. You use the same tools to manipulate them, including indexing, which you do with (). The () indexing returns the same type of array as the one you index into, so it returns a cell array, even if you index just a single element. Just about every value in MATLAB is an array, including 6, which is a 1x1 double array.
The {} syntax is used to create a cell array, and to extract its content: a{1} is not a cell array, it extracts the contents of the first element of the array.
{5, 8, 3} is the same as [{5}, {8}, {3}]. 5 is a double array, {5} is a cell array containing a double array.
a{5} = 0 is the same as a(5) = {0}.
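A small sketch of the points above, using made-up values, to show what () and {} return and that both append forms give the same result:

```matlab
a = {5, 'text', [1 2 3]};

asCell    = a(1);    % () keeps the container: a 1x1 cell array
asContent = a{1};    % {} unwraps it: the double 5

a{end+1} = 0;        % append by assigning the content of a new cell
a(end+1) = {0};      % append by assigning a 1x1 cell, same effect

b = [{5}, {8}, {3}]; % concatenating 1x1 cells gives the same array as {5, 8, 3}
```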

How to convert two associated arrays so that elements are evenly distributed?

There are two arrays, an array of images and an array of the corresponding labels (e.g., pictures of figures and their values).
The occurrences in the labels are unevenly distributed.
What I want is to cut both arrays in such a way, that the labels are evenly distributed. E.g. every label occurs 2 times.
To test I've just created two 1D arrays and it was working:
import numpy as np

labels = np.array([1, 2, 3, 3, 1, 2, 1, 3, 1, 3, 1])
images = np.array(['A', 'B', 'C', 'C', 'A', 'B', 'A', 'C', 'A', 'C', 'A'])
x, y = zip(*sorted(zip(images, labels)))
label = list(set(y))
new_images = []
new_labels = []
amount = 2
for i in label:
    start = y.index(i)
    stop = start + amount
    new_images = np.append(new_images, x[start: stop])
    new_labels = np.append(new_labels, y[start: stop])
What I get/want is this:
new_labels: [ 1. 1. 2. 2. 3. 3.]
new_images: ['A' 'A' 'B' 'B' 'C' 'C']
(It is not necessary, that the arrays are sorted)
But when I tried it with the right data (images.shape = (35000, 32, 32, 3), labels.shape = (35000)) I've got an error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
This does not help me a lot:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I think that my solution is quite dirty anyhow. Is there a way to do it right?
Thank you very much in advance!
The sort has to compare the tuples you give it, and with your real data each tuple contains an array (instead of a single value, as in the 1D test). Two arrays cannot be ordered with a single True/False comparison, so Python raises this error.
Let me explain in a bit more detail:
x, y = zip(*sorted(zip(images, labels)))
First, you zip your images and labels. This creates tuples of corresponding elements: the first element of images paired with the first element of labels, and so on.
With your real data, each label is paired with an array of shape (32, 32, 3).
Second, you sort all those tuples. Tuples are compared element by element, starting with the first, which here is the image. Comparing two arrays yields an array of booleans rather than a single value, so sorted cannot decide their order and throws the error.
You can solve this by explicitly telling the sorted function to sort only on the label, which is the second tuple element.
x, y = zip(*sorted(zip(images, labels), key=lambda pair: pair[1]))
If performance matters, itemgetter is usually a little faster than a lambda.
from operator import itemgetter
x, y = zip(*sorted(zip(images, labels), key=itemgetter(1)))

Form array of equivalence related classes

I have an array in MATLAB. I labelled every entry of the array with a natural number, which defines an equivalence relation on the array.
For example,
array = [1 2 3 5 6 7]
classes = [1 2 1 1 3 3].
I want to get a cell array whose i-th cell corresponds to the i-th entry of the initial array and lists the elements that are in the same class as that entry. For the example above, I would get:
{[1 3 5], [2], [1 3 5], [1 3 5], [6 7], [6 7]}
It can be done easily with a for-loop, but is there any other solution? Ideally it would run faster than O(n^2), where n is the size of the initial array.
Edit.
The problem would be solved if I knew how to split a sorted array into cells containing the indices of equal elements in O(n).
array = [1 1 1 2 3 3]
groups = {[1 2 3], [4], [5 6]}
Not sure about complexity, but accumarray with cell output is useful for splitting up the array based on unique values of the classes:
data = sortrows([classes; array].',1) %' stable w.r.t. array
arrayPieces = accumarray(data(:,1),data(:,2)',[],@(x){x.'})
classElements = arrayPieces(classes).'
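For reference, here is a self-contained sketch of the same accumarray idea applied to the example from the question. It sorts with sort instead of sortrows and uses names (sortedClasses, order, pieces) that are not in the original, so treat it as one possible arrangement rather than the exact code above:

```matlab
array   = [1 2 3 5 6 7];
classes = [1 2 1 1 3 3];

% Sort the class labels so the members of each class are contiguous.
% sort is stable, so elements keep their original order within a class.
[sortedClasses, order] = sort(classes(:));

% Collect the array values of each class into one cell (as a row vector).
pieces = accumarray(sortedClasses, array(order).', [], @(x) {x.'});

% Expand back out: one cell per input element, holding its whole class.
classElements = pieces(classes).';
```

This assumes the class labels are positive integers, since accumarray uses them as subscripts.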
Regarding splitting a sorted array into cells of indices:
>> array = [1 1 1 2 3 3]
>> arrayinds = accumarray(array',1:numel(array),[],@(x){x'})' %' transpose for rows
arrayinds =
[1x3 double] [4] [1x2 double]
>> arrayinds{:}
ans =
1 2 3
ans =
4
ans =
5 6
I don't know how to do this without for-loops entirely, but you can use a combination of sort, diff, and find to organize and partition the equivalence class identifiers. That'll give you a mostly vectorized solution, where the M-code level for-loop is O(n) where n is the number of classes, not the length of the whole input array. This should be pretty fast in practice.
Here's a rough example using some index munging. Be careful; there's probably an off-by-one edge case bug in there somewhere since I just banged this out.
function [eqVals,eqIx] = equivsets(a,x)
%EQUIVSETS Find indexes of equivalent values
[b,ix] = sort(x);
ixEdges = find(diff(b)); % identifies partitions between equiv classes
ix2 = [0 ixEdges numel(ix)];
eqVals = cell([1 numel(ix2)-1]);
eqIx = cell([1 numel(ix2)-1]);
% Map back to original input indexes and values
for i = 1:numel(ix2)-1
    eqIx{i} = ix((ix2(i)+1):ix2(i+1));
    eqVals{i} = a(eqIx{i});
end
I included the indexes in the output because they're often more useful than the values themselves. You'd call it like this.
% Get the values belonging to each class (the second output holds their indexes)
equivs = equivsets(array, classes)
% You can expand that to get equivalences for each input element
equivsByValue = equivs(classes)
It's a lot more efficient to build the lists for each class first and then expand them out to match the input indexes. Not only do you have to do the work just once, but when you use the b = a(ix) to expand a small cell array to a larger one, Matlab's copy-on-write optimization will end up reusing the memory for the underlying numeric mxArrays so you get a more compact representation in memory.
This transformation pops up a lot when working with unique() or databases. For decision support systems and data warehouse style things I've worked with, it happens all over the place. I wish it were built in to Matlab. (And maybe it's been added to one of the db or timeseries toolboxes in recent years; I'm a few versions behind.)
Realistically, if performance of this is critical for your code, you might also look at dropping down to Java or C MEX functions and implementing it there. But if your data sets are low cardinality - that is, have a small number of classes/distinct values, like numel(unique(classes)) / numel(array) tends to be less than 0.1 or so - the M-code implementation will probably be just fine.
For the second question:
array = [1 1 1 2 3 3]; %// example data
Use diff to find the end of each run of equal values, and from that build the groups:
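ind = [0 find(diff([array NaN])~=0)];
groups = arrayfun(@(n) ind(n)+1:ind(n+1), 1:numel(ind)-1, 'uni', 0);
Same approach using unique (asking explicitly for the last occurrence of each value, and forcing a row vector, so it does not depend on the MATLAB version's defaults):
[~, ind] = unique(array, 'last');
ind = [0 ind(:).'];
groups = arrayfun(@(n) ind(n)+1:ind(n+1), 1:numel(ind)-1, 'uni', 0);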
I haven't tested if the complexity is O(n), though.

What is the difference between the data stored in a cell and the data stored as double in MATLAB?

I have two variables that look exactly the same to me, but one is <double> and the other is <cell>. In the code it seems that they are converted by cell2mat. I understand it is a question of data storage, but I just don't see the difference between cell and double, or how each is defined.
Adding to nrz's answer, it is noteworthy that there is an additional memory overhead when storing cell arrays. For instance, consider the following code:
A = 1:5
B = {A}
C = num2cell(A)
whos
which produces the following output:
A =
1 2 3 4 5
B =
[1x5 double]
C =
[1] [2] [3] [4] [5]
  Name      Size      Bytes  Class     Attributes
  A         1x5          40  double
  B         1x1         152  cell
  C         1x5         600  cell
As you can see from the first line, the basic 1-by-5 vector A of doubles takes 40 bytes in memory (each double takes 8 bytes).
The second line shows that just wrapping A with a single cell to produce B adds extra 112 bytes. That's the overhead of a single cell in MATLAB.
The third line confirms that, because C contains 5 cells and takes (112+8)×5 = 600 bytes.
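The same comparison can be made programmatically. This is a sketch using the struct returned by whos; the names infoA, infoB, infoC and overheadPerCell are illustrative, and the exact byte counts for cells can differ between MATLAB versions:

```matlab
A = 1:5;
B = {A};
C = num2cell(A);

% whos returns a struct with a bytes field when called with an output.
infoA = whos('A');
infoB = whos('B');
infoC = whos('C');

% Extra bytes paid for wrapping A in a single cell (version dependent).
overheadPerCell = infoB.bytes - infoA.bytes;
fprintf('A: %d bytes, B: %d bytes, C: %d bytes, per-cell overhead: %d bytes\n', ...
        infoA.bytes, infoB.bytes, infoC.bytes, overheadPerCell);
```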
Arrays and cell arrays are probably the two most commonly used data types in MATLAB.
1D and 2D arrays are matrices, just as in linear algebra. Arrays can also have more than two dimensions; such n-dimensional arrays are also called tensors, and MATLAB calls them multidimensional arrays. Further, MATLAB does not make any distinction between scalars and arrays, nor between vectors and other matrices. A scalar is just a 1x1 array in MATLAB, and vectors are Nx1 and 1xN arrays.
Some examples:
MyScalar = 1;
MyHorizVector = [ 1 2 3 ];
MyVertVector = [ 1 2 3 ]';
MyMatrix = [ 1, 2; 3, 4 ];
My4Darray = cat(4, [ 1 2; 3 4], [ 5 6; 7 8 ], [ 9 10; 11 12 ], [ 13 14; 15 16 ]);
class(MyScalar)
ans =
double
class(MyHorizVector)
ans =
double
class(MyVertVector)
ans =
double
class(MyMatrix)
ans =
double
class(My4Darray)
ans =
double
So, the class of all these 5 different arrays is double, as reported by class command. double means the numeric precision used (double-precision).
The cell array is a more abstract concept. A cell array can hold one or more arrays, it can also hold other types of variables that are not arrays. A cell array can also hold other cell arrays which can again hold whatever a cell array can hold. So, cell arrays can also be stored recursively inside one another.
Cell arrays are useful for combining different objects into a single variable that can, e.g., be passed to a function or handled with cellfun. Each cell array consists of one or more cells. Any array can be wrapped in a cell array using the { } operator; the result is a 1x1 cell array. There are also the mat2cell and num2cell commands.
MyCellArrayContainingMyScalar = { MyScalar };
MyCellArrayContainingMyHorizVector = { MyHorizVector };
MyCellArrayContainingMyCellArrayContainingMyScalar = { MyCellArrayContainingMyScalar };
All cell arrays created above are 1x1 cell arrays.
class(MyCellArrayContainingMyScalar)
ans =
cell
class(MyCellArrayContainingMyHorizVector)
ans =
cell
class(MyCellArrayContainingMyCellArrayContainingMyScalar)
ans =
cell
But not all cell arrays can be converted into matrices using cell2mat, because a single cell array can hold several different data types that cannot exist in the same array.
These do work:
cell2mat(MyCellArrayContainingMyScalar)
ans =
1
cell2mat(MyCellArrayContainingMyHorizVector)
ans =
1 2 3
But this fails:
cell2mat(MyCellArrayContainingMyCellArrayContainingMyScalar);
Error using cell2mat (line 53)
Cannot support cell arrays containing cell arrays or objects.
But let's try a different kind of a cell array consisting of different arrays:
MyCellArray{1} = [ 1 2 3 ];
MyCellArray{2} = 'This is the 2nd cell of MyCellArray!';
class(MyCellArray)
ans =
cell
This cell array cannot be converted to an array using cell2mat either:
cell2mat(MyCellArray)
Error using cell2mat (line 46)
All contents of the input cell array must be of the same data type.
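One way to avoid hitting this error is to inspect the cell contents before converting. The check below is only a sketch; classNames, sameClass and numericPart are hypothetical names:

```matlab
MyCellArray = {[1 2 3], 'This is the 2nd cell of MyCellArray!'};

% Class of each cell's content; UniformOutput=false because the results are strings.
classNames = cellfun(@class, MyCellArray, 'UniformOutput', false);
sameClass  = numel(unique(classNames)) == 1;   % false here: double and char are mixed

% Keep only the numeric cells, which cell2mat can combine.
numericPart = MyCellArray(cellfun(@isnumeric, MyCellArray));
M = cell2mat(numericPart);
```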
Hope this helps to get an idea.

How do concatenation and indexing differ for cells and arrays in MATLAB?

I am a little confused about the usage of cells and arrays in MATLAB and would like some clarification on a few points. Here are my observations:
An array can dynamically adjust its own memory to allow for a dynamic number of elements, while cells seem to not act in the same way:
a=[]; a=[a 1]; b={}; b={b 1};
Several elements can be retrieved from cells, but it doesn't seem like they can be from arrays:
a={'1' '2'}; figure; plot(...); hold on; plot(...); legend(a{1:2});
b=['1' '2']; figure; plot(...); hold on; plot(...); legend(b(1:2));
%# b(1:2) is an array, not its elements, so it is wrong with legend.
Are these correct? What are some other different usages between cells and array?
Cell arrays can be a little tricky since you can use the [], (), and {} syntaxes in various ways for creating, concatenating, and indexing them, although they each do different things. Addressing your two points:
To grow a cell array, you can use one of the following syntaxes:
b = [b {1}]; % Make a cell with 1 in it, and append it to the existing
% cell array b using []
b = {b{:} 1}; % Get the contents of the cell array as a comma-separated
% list, then regroup them into a cell array along with a
% new value 1
b{end+1} = 1; % Append a new cell to the end of b using {}
b(end+1) = {1}; % Append a new cell to the end of b using ()
When you index a cell array with (), it returns a subset of cells in a cell array. When you index a cell array with {}, it returns a comma-separated list of the cell contents. For example:
b = {1 2 3 4 5}; % A 1-by-5 cell array
c = b(2:4); % A 1-by-3 cell array, equivalent to {2 3 4}
d = [b{2:4}]; % A 1-by-3 numeric array, equivalent to [2 3 4]
For d, the {} syntax extracts the contents of cells 2, 3, and 4 as a comma-separated list, then uses [] to collect these values into a numeric array. Therefore, b{2:4} is equivalent to writing b{2}, b{3}, b{4}, or 2, 3, 4.
With respect to your call to legend, the syntax legend(a{1:2}) is equivalent to legend(a{1}, a{2}), or legend('1', '2'). Thus two arguments (two separate characters) are passed to legend. The syntax legend(b(1:2)) passes a single argument, which is a 1-by-2 string '12'.
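The comma-separated list behaviour is not specific to legend. As a brief sketch with made-up variable names, the same expansion can feed multiple outputs or multiple function arguments:

```matlab
b = {1 2 3 4 5};

[x, y, z] = b{2:4};     % the list b{2}, b{3}, b{4} is assigned to three outputs
biggest = max(b{2:3});  % expands to max(b{2}, b{3}), i.e. max(2, 3)
d = [b{2:4}];           % the same list collected into a numeric array
```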
Every cell array is an array! From this answer:
[] is an array-related operator. An array can be of any type - array of numbers, char array (string), struct array or cell array. All elements in an array must be of the same type!
Example: [1,2,3,4]
{} is a type. Imagine you want to put items of different type into an array - a number and a string. This is possible with a trick - first put each item into a container {} and then make an array with these containers - cell array.
Example: [{1},{'Hallo'}] with shorthand notation {1, 'Hallo'}

Resources