textscan() reading result is a nested cell array?

I have a data file containing 100 lines with the following format
0,device1,3
1,device2,33
2,device3,3
3,device4,34
...
99,device100,36
Now I wish to read them into a 100x3 cell array in MATLAB. I did the following:
allData = textscan(fID,'%s %s %f', 'delimiter', ',');
Then, I noticed that allData is a 1x3 cell array with each item being another 100x1 cell array. (The first two columns are string-type cell arrays, whereas the third column is double-type cell array)
In other words, the reading result is a nested array, which I don't want.
How may I achieve 100x3 cell array directly while reading?

With that textscan, the variable allData looks something like (just 4 rows) this:
allData =
{4x1 cell} {4x1 cell} [4x1 double]
You can only merge into a single cell array directly with textscan via the 'CollectOutput' option when all data has the same type.
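A minimal sketch of what 'CollectOutput' does, assuming textscan is given a character vector instead of a file identifier (the sample text and the variable names txt, mixed and nums are made up for illustration):

```matlab
% Hypothetical sample data; textscan also accepts a character vector
txt = sprintf('0,device1,3\n1,device2,33');

% Mixed types: consecutive columns of the same class are merged, others stay apart
mixed = textscan(txt, '%s %s %f', 'Delimiter', ',', 'CollectOutput', true);
% mixed{1} is a 2x2 cell array (both %s columns), mixed{2} is a 2x1 double

% All-numeric format: everything collapses into one matrix
nums = textscan(sprintf('1,2,3\n4,5,6'), '%f %f %f', ...
    'Delimiter', ',', 'CollectOutput', true);
% nums{1} is a 2x3 double
```

With the mixed format the result is still nested, which is why a post-processing step is needed for the data in the question.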
One possible workaround, which unfortunately merges all numeric columns into a single class (not a problem in your case, where the only numeric column is double):
C = cell(numel(allData{1}),numel(allData));
areCells = cellfun(@iscell,allData);
C(:,areCells) = [allData{areCells}];
C(:,~areCells) = num2cell([allData{~areCells}])
C =
'0' 'device1' [ 3]
'1' 'device2' [33]
'2' 'device3' [ 3]
'3' 'device4' [34]
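Again, the drawback of this is that the last statement concatenates all non-cell columns before splitting them into cells, which forces them into one common class (so columns of type uint8, char, etc. do not keep their own types). To avoid this possible conversion: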
% after copying cell array data (areCells) as above, but before ~areCells data
Cn = arrayfun(@(ii)num2cell(allData{ii}),find(~areCells),'uni',0);
C(:,~areCells) = [Cn{:}];
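As a rough check of the type-preserving variant, the following sketch runs it on a small hand-built stand-in for allData. The uint8 column is an assumption added for illustration; it is not part of the file in the question.

```matlab
% Hypothetical textscan-style result: one cell column, one uint8, one double
allData = {{'a'; 'b'}, uint8([1; 2]), [0.5; 1.5]};

C = cell(numel(allData{1}), numel(allData));
areCells = cellfun(@iscell, allData);
C(:,areCells) = [allData{areCells}];

% Convert each non-cell column on its own, so every column keeps its class
Cn = arrayfun(@(ii) num2cell(allData{ii}), find(~areCells), 'UniformOutput', false);
C(:,~areCells) = [Cn{:}];

% Inspect the class of every element in the first row
rowClasses = cellfun(@class, C(1,:), 'UniformOutput', false);
```

Here rowClasses should list char, uint8 and double, one per column.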

Code -
sz = numel(allData{1}); % Line count (100 for the file in the question)
out = cell(sz,size(allData,2));
for k = 1:size(allData,2)
    t2 = allData{k}; % Contents of the k-th column
    if isnumeric(t2) % Takes care of floats
        out(:,k) = num2cell(t2);
    else
        out(:,k) = t2;
    end
end
Thus, the first four lines would be shown as -
out =
'0' 'device1' [ 3]
'1' 'device2' [33]
'2' 'device3' [ 3]
'3' 'device4' [34]

Related

Filter MATLAB non-numerical array data based on criteria

Two questions, one fairly simple question (at least it seems it should be simple) and one that may take a bit more work. Feel free to contribute to either or both.
First, I'd like to create a string array based off of an existing string array based on a criteria. Take for example a similar operation with a double array:
>> nums = [ 1 2 1 2]
nums =
1 2 1 2
>> big_nums = (nums == 2) .* nums
big_nums =
0 2 0 2
I'd like to do something similar with a string array, however I don't know what function to use:
>> sizes = ["XL" "L" "XL" "L"]
sizes =
1×4 string array
"XL" "L" "XL" "L"
>> large_sizes = (sizes == "L") .* sizes
Undefined operator '.*' for input arguments of type 'string'.
I'd like the output to be
large_sizes =
1×4 string array
"" "L" "" "L"
Second question. Suppose I have a 2 dimensional cell array. I'd like to filter data based on criteria:
>> data = {"winter", 1; "spring", 2; "summer", 3; "fall", 4}
data =
4×2 cell array
["winter"] [1]
["spring"] [2]
["summer"] [3]
["fall" ] [4]
>> nice_weather = ( (data(1,:) == "fall") + (data(1,:) == "spring") ) .* data
Error using ==
Cell must be a cell array of character vectors.
I'd like a code that results in one of two arrays:
nice_weather =
4×2 cell array
[""] [1]
["spring"] [2]
[""] [3]
["fall"] [4]
----- OR -----
nice_weather =
2×2 cell array
["spring"] [2]
["fall"] [4]
For this question, I am also open to separating data into multiple arrays (for example, one array for strings and one array for numbers).
Thanks!
This solution uses the strcmpi function from MATLAB (no toolbox required) to compare two strings (insensitive to the case).
1D Cell Array:
sizes = {'XL' 'L' 'XL' 'L'}; % Changed " to ' & used cell array
idx = strcmpi(sizes,'L'); % Logical index
sizelist = {sizes{idx}}
Or you could try something like
sizes(~idx) = {"" ""} % manual just for example
For this to automatically adjust the number of blanks "", you could use repmat like this
sizes(~idx) = repmat({""},1,sum(~idx))
Output:
sizes = 1×4 cell array
{[""]} {'L'} {[""]} {'L'}
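If you would rather keep the string array from the question, logical indexing can blank the non-matching entries directly. A small sketch, assuming a release that supports string arrays with double-quoted literals (R2017a or later); large_sizes is the name used in the question:

```matlab
sizes = ["XL" "L" "XL" "L"];
large_sizes = sizes;                 % copy, so the original stays intact
large_sizes(sizes ~= "L") = "";      % blank every entry that is not "L"
```

This gives the 1×4 string array "" "L" "" "L" that the question asks for, without converting to a cell array.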
2D Cell Array:
data = {'winter', 1; 'spring', 2; 'summer', 3; 'fall', 4}; % Changed " to '
nicemo1 = 'spring';
nicemo2 = 'fall';
idx = strcmpi(data(:,1),nicemo1) | strcmpi(data(:,1),nicemo2); % Obtain logical index (case-insensitive for both names)
data(idx,:)
Output:
ans = 2×2 cell array
{'spring'} {[2]}
{'fall' } {[4]}
Tested with MATLAB R2018b.
Also beware of variable names like sizes: dropping one letter gives size, and a variable with that name would mask the useful built-in function.
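For the 2D case, ismember can stand in for the chained comparisons when the list of wanted names grows. This is only a sketch; wanted is a hypothetical variable name, and unlike strcmpi the match is case-sensitive.

```matlab
data = {'winter', 1; 'spring', 2; 'summer', 3; 'fall', 4};
wanted = {'spring', 'fall'};            % hypothetical list of names to keep
idx = ismember(data(:,1), wanted);      % logical index, case-sensitive
nice_weather = data(idx,:);             % 2x2 cell array
```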

Matlab- split cell array column by delimiter

I have many 33213168x1 cell arrays, where each cell contains an 85 x 1 column.
Each cell in the column is in the form
[0.55;0.25;0.75]
[0.33;0.66;0.99]
I want to split up this single column by the semi-colon delimiter so that each cell in the cell array is 85x3, like:
[0.55][0.25][0.75]
[0.33][0.66][0.99]
I've tried numerous techniques to solve this, but most commonly get the errors 'cell elements must be character arrays' or 'input must be a string.'
Some of the approaches I've tried:
splitcells = strsplit(regress_original_053108,';');
splitcells = cellfun(@(x) strsplit(regress_original_053108, ';'),regress_original_053108 , 'UniformOutput',0);
splitcells = regexp(regress_original_053108, ';', 'split');
splitcells = textscan(regress_original_053108, 'delimiter', ';');
Etc. Any feedback about how to do this would be appreciated.
Hope this solves your problem:
% Example input
input = {[0.55;0.25;0.75]};
cellArray(1:85,1) = input;
% Create array
doubleArray = zeros(85,3);
% Fill array
for i=1:85
doubleArray(i,:) = cellArray{i,1}';
end
The contents of your cells are numeric, not strings, hence you can't use strsplit. Use this approach:
for ii = 1:length(X) % Here X denotes your 33213168x1 cell array
    X{ii} = cell2mat(cellfun(@(y) y.', X{ii}, 'UniformOutput', false));
end
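The same reshaping can also be written with plain concatenation, since the 85 column vectors in each cell line up side by side. A small sketch under the assumption that every inner cell holds a 3x1 double; the outer array is shrunk to 4 entries here, and inner is a made-up helper name:

```matlab
inner = repmat({[0.55; 0.25; 0.75]}, 85, 1);   % 85x1 cell of 3x1 doubles
X = repmat({inner}, 4, 1);                      % small stand-in for the 33213168x1 array

for ii = 1:numel(X)
    % [X{ii}{:}] places the 85 columns side by side (3x85); transpose to 85x3
    X{ii} = [X{ii}{:}].';
end
```

Each X{ii} ends up as an 85x3 double matrix rather than an 85x3 cell array, which is usually more convenient for numeric work.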

How do I convert a cell array w/ different data formats to a matrix in Matlab?

So my main objective is to take a matrix of form
matrix = [a, 1; b, 2; c, 3]
and a list of identifiers in matrix[:,1]
list = [a; c]
and generate a new matrix
new_matrix = [a, 1;c, 3]
My problem is I need to import the data that would be used in 'matrix' from a tab-delimited text file. To get this data into Matlab I use the code:
matrix_open = fopen(fn_matrix, 'r');
matrix = textscan(matrix_open, '%c %d', 'Delimiter', '\t');
which outputs a cell array of two 3x1 arrays. I want to get this into one 3x2 matrix where the first column is a character, and the second column an integer (these data formats will be different in my implementation).
So far I've tried the code:
matrix_1 = cell2mat(matrix(1,1));
matrix_2 = cell2mat(matrix(1,2));
matrix = horzcat(matrix_1, matrix_2)
but this is returning a 3x2 matrix where the second column is empty.
If I just use
cell2mat(matrix)
it says it can't do it because of the different data formats.
Thanks!
This is the help of matlab for the cell2mat function:
cell2mat Convert the contents of a cell array into a single matrix.
M = cell2mat(C) converts a multidimensional cell array with contents of
the same data type into a single matrix. The contents of C must be able
to concatenate into a hyperrectangle. Moreover, for each pair of
neighboring cells, the dimensions of the cell's contents must match,
excluding the dimension in which the cells are neighbors. This constraint
must hold true for neighboring cells along all of the cell array's
dimensions.
From what I understand, the contents you want to put in a matrix must all be of the same type; otherwise a matrix is the wrong container, and you could simply create a new cell array.
It's not possible to have a normal matrix with characters and numbers. That's why cell2mat won't work here. But you can store different datatypes in a cell-array. Use cellstr for the strings/characters and num2cell for the integers to convert the contents of matrix. If you have other datatypes, use an appropriate function for this step. Then assign them to the columns of an empty cell-array.
Here is the code:
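fn_matrix = 'data.txt';
matrix_open = fopen(fn_matrix, 'r');
matrix = textscan(matrix_open, '%c %d', 'Delimiter', '\t');
fclose(matrix_open); % Release the file handle once the data is read
X = cell(size(matrix{1},1),2);
X(:,1) = cellstr(matrix{1}); % Characters -> cell array of character vectors
X(:,2) = num2cell(matrix{2}); % Integers -> one cell per value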
The result:
X =
'a' [1]
'b' [2]
'c' [3]
Now we can do the second part of the question. Extracting the entries where the letter matches with one of the list. Therefore you can use ismember and logical indexing like this:
list = ['a'; 'c'];
sel = ismember(X(:,1),list);
Y(:,1) = X(sel,1);
Y(:,2) = X(sel,2);
The result here:
Y =
'a' [1]
'c' [3]
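To try both steps without a file on disk, here is a self-contained sketch. It assumes the data is passed to textscan as a character vector, and it swaps '%c' for '%s' so that identifiers longer than one character would also work; txt and raw are made-up names.

```matlab
txt = sprintf('a\t1\nb\t2\nc\t3');                 % stand-in for the file contents
raw = textscan(txt, '%s %d', 'Delimiter', '\t');   % {3x1 cell, 3x1 int32}

X = [raw{1}, num2cell(raw{2})];                    % 3x2 cell array, mixed types
sel = ismember(X(:,1), {'a'; 'c'});                % rows whose identifier is listed
Y = X(sel,:);                                      % keep whole rows at once
```

Indexing with X(sel,:) copies both columns in one step, so the two separate column assignments are not needed.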

Get indices of string occurrences in cell-array

I have a cell array that contains a long list of strings. Most of the strings are in duplicates. I need the indices of instances of a string within the cell array.
I tried the following:
[bool,ind] = ismember(string,var);
This consistently returns a scalar ind, although there is clearly more than one index at which the contents of the cell array match string.
How can I have a list of indices that points to the locations in the cell array that contains string?
As an alternative to Divakar's comment, you could use strcmp. This works even if some cell doesn't contain a string:
>> strcmp('aaa', {'aaa', 'bb', 'aaa', 'c', 25, [1 2 3]})
ans =
1 0 1 0 0 0
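Wrapping that logical result in find gives the list of indices the question asks for. A short sketch with made-up sample data (list, target and ind are illustrative names):

```matlab
list = {'er', 'meh', 'nop', 'meh', 'ya', 'meh'};   % hypothetical sample data
target = 'meh';
ind = find(strcmp(list, target));                  % positions of exact matches
% ind is [2 4 6]
```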
Alternatively, you can ID each string and thus have representative numeric arrays corresponding to the input cell array and string. For IDing, you can use unique and then use find as you would with numeric arrays. Here's how you can achieve that -
var_ext = [var string]
[~,~,idx] = unique(var_ext)
out = find(idx(1:end-1)==idx(end))
Breakdown of the code:
var_ext = [var string]: Concatenate everything (string and var) into a single cell array, with the string ending up at the end (last element) of it.
[~,~,idx] = unique(var_ext): ID everything in that concatenated cell array.
find(idx(1:end-1)==idx(end)): idx(1:end-1) represents the numeric IDs for the cell array elements and idx(end) would be the ID for the string. Compare these IDs and use find to pick up the matching indices to give us the final output.
Sample run -
Inputs:
var = {'er','meh','nop','meh','ya','meh'}
string = 'meh'
Output:
out =
2
4
6
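regexp offers another approach. Note that concatenating the strings with [] yields a single character vector, so the result is the starting character positions of the matches, not cell indices:
string = ['my' 'bat' 'my' 'ball' 'my' 'score']
expression = ['my']
regexp(string,expression)
ans = 1 6 12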
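To get cell indices rather than character positions out of regexp, the cell array can be passed in directly. This is a sketch, not the approach of the answer above; the anchors ^ and $ are added so that only whole-string matches count.

```matlab
list = {'my', 'bat', 'my', 'ball', 'my', 'score'};
starts = regexp(list, '^my$');             % one cell per element, empty where no match
ind = find(~cellfun(@isempty, starts));    % indices of the matching cells
% ind is [1 3 5]
```

For plain exact matching strcmp is simpler; regexp pays off once the criterion is an actual pattern.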

What is the difference between the data stored in a cell and the data stored as double in MATLAB?

I have two variables who look exactly the same to me, but one is <double> and the other is <cell>. In the code it seems that they are converted by cell2mat. I understand it is a question of data storage but I just don't see the difference and the definition of cell and double for this.
Adding to nrz's answer, it is noteworthy that there is an additional memory overhead when storing cell arrays. For instance, consider the following code:
A = 1:5
B = {A}
C = num2cell(A)
whos
which produces the following output:
A =
1 2 3 4 5
B =
[1x5 double]
C =
[1] [2] [3] [4] [5]
Name Size Bytes Class Attributes
A 1x5 40 double
B 1x1 152 cell
C 1x5 600 cell
As you can see from the first line, the basic 1-by-5 vector A of doubles takes 40 bytes in memory (each double takes 8 bytes).
The second line shows that just wrapping A with a single cell to produce B adds extra 112 bytes. That's the overhead of a single cell in MATLAB.
The third line confirms that, because C contains 5 cells and takes (112+8)×5 = 600 bytes.
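The per-cell overhead can be measured on your own machine with the functional form of whos. The exact byte counts for cell arrays may differ between MATLAB releases, so treat the numbers above as an example rather than a constant.

```matlab
A = 1:5;
B = {A};
C = num2cell(A);

info = whos('A', 'B', 'C');    % struct array with fields name, bytes, class, ...
for k = 1:numel(info)
    fprintf('%s: %d bytes (%s)\n', info(k).name, info(k).bytes, info(k).class);
end

% Look the entries up by name instead of relying on their order
names = {info.name};
overheadPerCell = info(strcmp(names, 'B')).bytes - info(strcmp(names, 'A')).bytes;
```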
Arrays and cell arrays are probably the two most commonly used data types in MATLAB.
1D and 2D arrays are matrices, just as in linear algebra. Arrays can also have more than two dimensions (sometimes called tensors); MATLAB calls them multidimensional arrays. Further, MATLAB does not make any distinction between scalars and arrays, nor between vectors and other matrices. A scalar is just a 1x1 array in MATLAB, and vectors are Nx1 and 1xN arrays.
Some examples:
MyScalar = 1;
MyHorizVector = [ 1 2 3 ];
MyVertVector = [ 1 2 3 ]';
MyMatrix = [ 1, 2; 3, 4 ];
My4Darray = cat(4, [ 1 2; 3 4], [ 5 6; 7 8 ], [ 9 10; 11 12 ], [ 13 14; 15 16 ]);
class(MyScalar)
ans =
double
class(MyHorizVector)
ans =
double
class(MyVertVector)
ans =
double
class(MyMatrix)
ans =
double
class(My4Darray)
ans =
double
So, the class of all these 5 different arrays is double, as reported by the class command. double refers to the numeric precision used (double precision).
The cell array is a more abstract concept. A cell array can hold one or more arrays, it can also hold other types of variables that are not arrays. A cell array can also hold other cell arrays which can again hold whatever a cell array can hold. So, cell arrays can also be stored recursively inside one another.
Cell arrays are useful for combining different objects into a single variable that can, e.g., be passed to a function or handled with cellfun. Each cell array consists of 1 or more cells. Any array can be converted to a cell array using the { } operators; the result is a 1x1 cell array. There are also mat2cell and num2cell commands available.
MyCellArrayContainingMyScalar = { MyScalar };
MyCellArrayContainingMyHorizVector = { MyHorizVector };
MyCellArrayContainingMyCellArrayContainingMyScalar = { MyCellArrayContainingMyScalar };
All cell arrays created above are 1x1 cell arrays.
class(MyCellArrayContainingMyScalar)
ans =
cell
class(MyCellArrayContainingMyHorizVector)
ans =
cell
class(MyCellArrayContainingMyCellArrayContainingMyScalar)
ans =
cell
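The num2cell and mat2cell commands mentioned above split an array into several cells instead of wrapping it in one. A brief sketch; the block sizes passed to mat2cell are arbitrary choices for illustration:

```matlab
M = magic(4);

perElement = num2cell(M);             % 4x4 cell array, one scalar per cell
blocks = mat2cell(M, [2 2], [3 1]);   % 2x2 cell array of 2x3 and 2x1 blocks

restored = cell2mat(blocks);          % the blocks fit together again into M
```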
But not all cell arrays can be converted into matrices using cell2mat, because a single cell array can hold several different data types that cannot exist in the same array.
These do work:
cell2mat(MyCellArrayContainingMyScalar)
ans =
1
cell2mat(MyCellArrayContainingMyHorizVector)
ans =
1 2 3
But this fails:
cell2mat(MyCellArrayContainingMyCellArrayContainingMyScalar);
Error using cell2mat (line 53)
Cannot support cell arrays containing cell arrays or objects.
But let's try a different kind of a cell array consisting of different arrays:
MyCellArray{1} = [ 1 2 3 ];
MyCellArray{2} = 'This is the 2nd cell of MyCellArray!';
class(MyCellArray)
ans =
cell
This cell array cannot be converted to an array by using cell2mat either:
cell2mat(MyCellArray)
Error using cell2mat (line 46)
All contents of the input cell array must be of the same data type.
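One way to avoid this error is to inspect the classes first and convert only the part that fits. A sketch, with classes, sameClass and numericPart as made-up names:

```matlab
MyCellArray = {[1 2 3], 'This is the 2nd cell of MyCellArray!'};

% Collect the class of each cell and check whether they all agree
classes = cellfun(@class, MyCellArray, 'UniformOutput', false);
sameClass = numel(unique(classes)) == 1;

% Convert only the numeric cells; the text cell is left out
numericPart = cell2mat(MyCellArray(cellfun(@isnumeric, MyCellArray)));
```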
Hope this helps to get an idea.
