MATLAB - split cell array column by delimiter

I have many 33213168x1 cell arrays, where each cell contains an 85 x 1 column.
Each cell in the column is in the form
[0.55;0.25;0.75]
[0.33;0.66;0.99]
I want to split up this single column by the semi-colon delimiter so that each cell in the cell array is 85x3, like:
[0.55][0.25][0.75]
[0.33][0.66][0.99]
I've tried numerous techniques to solve this, but most commonly get the errors 'cell elements must be character arrays' or 'input must be a string.'
Some of the approaches I've tried:
splitcells = strsplit(regress_original_053108,';');
splitcells = cellfun(@(x) strsplit(regress_original_053108, ';'),regress_original_053108 , 'UniformOutput',0);
splitcells = regexp(regress_original_053108, ';', 'split');
splitcells = textscan(regress_original_053108, 'delimiter', ';');
Etc. Any feedback about how to do this would be appreciated.

Hope this solves your problem:
% Example input
input = {[0.55;0.25;0.75]};
cellArray(1:85,1) = input;
% Create array
doubleArray = zeros(85,3);
% Fill array
for i=1:85
    doubleArray(i,:) = cellArray{i,1}';
end

Each cell you have is not a string, so strsplit cannot be used. Use this approach instead:
for ii = 1:length(X) % Here X denotes your 33213168x1 cell array
    X{ii} = cell2mat(cellfun(@(y) y.', X{ii}, 'UniformOutput', false));
end
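To see the transpose-and-stack idea on something small, here is a minimal sketch with a 2x1 outer cell array whose cells each hold two 3x1 columns. The variable names and sizes are made up for illustration; the second form, [inner{:}].', is an alternative that is assumed to be equivalent only when every inner cell holds a column vector of the same length.

```matlab
% Toy stand-in for the real data: each outer cell holds a 2x1 cell of 3x1 doubles
inner = {[0.55;0.25;0.75]; [0.33;0.66;0.99]};
X = {inner; inner};

Y = cell(size(X));   % results via cell2mat + cellfun
Z = cell(size(X));   % results via comma-separated list expansion
for ii = 1:numel(X)
    % Transpose each 3x1 column to 1x3, then stack the rows into a 2x3 matrix
    Y{ii} = cell2mat(cellfun(@(y) y.', X{ii}, 'UniformOutput', false));
    % Alternative: concatenate the columns side by side (3x2), then transpose
    Z{ii} = [X{ii}{:}].';
end

disp(Y{1})   % one row per original cell, one column per value
```

If the result has to stay a cell array of scalars rather than a numeric matrix, num2cell(Y{ii}) converts each matrix back.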

Related

MATLAB Search Within a Cell Array of Cells

Setup:
I have a 21 x 3 cell array.
The first 2 columns are USUALLY strings or char arrays, but could be 1xn cells of strings or char arrays (if there are multiple alternate strings that mean the same thing in the context of my script). The 3rd element is a number.
I'm looking to return the index of any EXACT match with a string or char array (the type doesn't have to match) contained in column 1 of this cell array and, if column 1 doesn't match, then in column 2.
I can use the following:
find(strcmp( 'example', celllist(:,1) ))
find(strcmp( 'example', celllist(:,2) ))
And these will match the corresponding indices with any strings / char arrays in the top level cell array. This won't, of course, match any strings that are inside of cells of strings inside the top level cell array.
Is there an elegant way to match those strings (that is, without using a for, while, or similar loop)? I want it to return the index of the main cell array (1 through 21) if the cell contains the match OR the cell within the cell contains an exact match in ANY of its cells.
The cellstr function is your friend, since it converts all of the following to a cell array of chars:
chars e.g. cellstr( 'abc' ) => {'abc'}
cells of chars e.g. cellstr( {'abc','def'} ) => {'abc','def'}
strings e.g. cellstr( "abc" ) => {'abc'}
string arrays e.g. cellstr( ["abc", "def"] ) => {'abc','def'}
Then you don't have to care about variable types, and can just do an ismember check on every element, which we can assume is a cell of chars.
We can set up a test:
testStr = 'example';
arr = { 'abc', 'def', {'example','ghi'}, "jkl", "example" };
% Expected output is [0,0,1,0,1]
Doing this with a loop to better understand the logic would look like this:
isMatch = false(1,numel(arr)); % initialise output
for ii = 1:numel(arr) % loop over main array
    x = cellstr(arr{ii}); % convert to cellstr
    isMatch(ii) = any( ismember( testStr, x ) ); % check if any sub-element is match
end
If you want to avoid loops* then you can do this one-liner instead using cellfun
isMatch = cellfun( @(x) any( ismember( testStr, cellstr(x) ) ), arr );
% >> isMatch = [0 0 1 0 1]
So for your case, you could run this on both columns and apply some simple logic to select the one you want
isMatchCol1 = cellfun( @(x) any( ismember( testStr, cellstr(x) ) ), arr(:,1) );
isMatchCol2 = cellfun( @(x) any( ismember( testStr, cellstr(x) ) ), arr(:,2) );
If you want the row index instead of a logical array, you can wrap the output with the find function, i.e. isMatchIdx = find(isMatch);.
*This only avoids loops visually; cellfun is basically a looping device in disguise, but it does at least save us initialising the output.
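As a rough sketch of the "simple logic" step, the snippet below runs the check on both columns of a made-up 4x3 list and prefers column 1, falling back to column 2 only when column 1 has no match at all. That fallback rule is one possible reading of the question, and celllist, matchFcn and matchIdx are illustrative names.

```matlab
% Hypothetical 4x3 list standing in for the 21x3 array from the question
celllist = { 'abc',             'def',     1;
             {'example','ghi'}, 'jkl',     2;
             'mno',             "example", 3;
             'pqr',             'stu',     4 };
testStr = 'example';

% cellstr normalises chars, strings and nested cells to a cell array of chars
matchFcn = @(x) any( ismember( testStr, cellstr(x) ) );
isMatchCol1 = cellfun( matchFcn, celllist(:,1) );
isMatchCol2 = cellfun( matchFcn, celllist(:,2) );

% Assumed rule: use column 1 matches if there are any, otherwise column 2
if any(isMatchCol1)
    matchIdx = find(isMatchCol1);
else
    matchIdx = find(isMatchCol2);
end
disp(matchIdx)
```

If a match in either column should count, find(isMatchCol1 | isMatchCol2) would return every matching row instead.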

Add space before values in xticklabels (MATLAB)

I'm new to MATLAB and really struggling with their datatypes / conventions compared to other programming languages.
For instance, I have created a simple plot (e.g. using the peaks command) and simply want to include a padding space before all xticklabels. My MATLAB/pseudocode solution is thus:
labels = xticklabels; # Get labels
newlabels = xticklabels; # Create new array
i = 1
for label in labels # Loop through all labels
label = ' ' + label # Add single character pad
newlabels(i) = label # Update new labels array
i = i + 1
set(gca,'XTickLabel', {newlabels}) # Set plot to use new array
How can I achieve this please? I feel like it should be possible quite simply
Thanks!
PS: I have found the pad command in MATLAB 2017, but my xticklabels are not all the same length, so I only want to add a single space rather than fix the total string length using pad.
The simplest way, given a cell array of strings, is to use strcat:
labels = {'1','2','3','4'};
newlabels = strcat('x',labels); % append 'x' because it's more visible
Result:
newlabels =
{
[1,1] = x1
[1,2] = x2
[1,3] = x3
[1,4] = x4
}
Alternatively, you could loop through the cell array and concatenate to each char array:
newlabels = cell(size(labels)); % preallocate cell array
for k = 1:numel(labels)
    newlabels{k} = ['x', labels{k}]; % concatenate new char to existing label
end
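For the space padding the question actually asks about, one cautious option is bracket concatenation inside cellfun. The MATLAB documentation notes that strcat removes trailing whitespace from character array inputs, so this sketch sidesteps strcat rather than relying on how it treats a lone ' '. The label values are made up, and the xticklabels calls are left commented out so the snippet runs without a figure.

```matlab
% Stand-in for: labels = xticklabels;  (returns a cell array of char vectors)
labels = {'-2', '0', '2', '10'};

% Prepend one space to every label, whatever its length
newlabels = cellfun(@(s) [' ', s], labels, 'UniformOutput', false);

% To apply to the current axes (requires an open figure):
% xticklabels(newlabels);
disp(newlabels)
```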

How to transform 'double' into 'cell array'?

My code:
B = zeros(height(A),1);
col_names = A.Properties.VariableNames; % Replicate header names
for k = 1:height(A)
    % the following 'cellfun' compares each column to the values in A.L{k},
    % and returns a cell array of the result for each of them, then
    % 'cell2mat' converts it to logical array, and 'any' combines the
    % results for all elements in A.L{k} to one logical vector:
    C = any(cell2mat(...
        cellfun(@(x) strcmp(col_names,x),A.L{k},...
        'UniformOutput', false).'),1);
    % then a logical indexing is used to define the columns for summation:
    B(k) = sum(A{k,C});
end
generates the following error message.
Error using cellfun
Input #2 expected to be a cell array, was double instead.
How do I solve this error?
This is what table 'A' looks like:
A.L{1,1} contains:
C = any(cell2mat(...
    cellfun(@(x) strcmp(col_names,x),A.L{k},...
    'UniformOutput', false).'),1);
Here, A.L{k} gets the contents of the cell at the kth position of A.L. Using A.L(k) instead, you get the cell itself:
tmp = A.L(k);
C = any(cell2mat(...
    cellfun(@(x) strcmp(col_names,x),tmp{1},...
    'UniformOutput', false).'),1);
This is a bit hacky: you first need to get the cell at A.L(k) and then the contents of that cell, so you need a temporary variable.
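The error message suggests that at least one A.L{k} holds a double rather than a cell. That is an assumption, since the table contents are not shown. A minimal sketch of guarding against that case, using a made-up two-row column L and hypothetical column names:

```matlab
colNames = {'xx', 'yy', 'zz'};   % hypothetical column names
L = {{'xx','zz'}; 42};           % row 1 holds a cell of names, row 2 a double

C = false(numel(L), numel(colNames));
for k = 1:numel(L)
    entry = L{k};                % braces return the contents, which may not be a cell
    if iscell(entry)             % only cell arrays can be passed to cellfun like this
        hits = cellfun(@(x) strcmp(colNames, x), entry, 'UniformOutput', false);
        C(k,:) = any(cell2mat(hits.'), 1);
    end                          % rows with non-cell contents stay all-false
end
disp(C)
```

Whether skipping such rows is the right behaviour depends on what the double entries are supposed to mean in the real table.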
I'm not entirely sure quite what's going on here, but here's a fabricated example that I think is similar to what you're trying to achieve.
%% Setup - fabricate some data
colNames = {'xx', 'yy', 'zz', 'qq'};
h = 20;
% It looks like 'L' contains something related to the column names
% so I'm going to build something like that.
L = repmat(colNames, h, 1);
% Empty some rows out completely
L(rand(h,1) > 0.7, :) = {''};
% Empty some other cells out at random
L(rand(numel(L), 1) > 0.8) = {''};
A = table(L, rand(h,1), rand(h, 1), rand(h, 1), rand(h, 1), ...
'VariableNames', ['L', colNames]);
%% Attempt to process each row
varNames = A.Properties.VariableNames;
B = zeros(height(A), 1);
for k = 1:height(A)
    % I think this is what's required - work out which columns are
    % named in "A.L(k,:)". This can be done simply by using ISMEMBER
    % on the row of A.L.
    C = ismember(varNames, A.L(k,:));
    B(k) = sum(A{k, C});
end
If I'm completely off-course here, then perhaps you could give us an executable example.

How do I convert a cell array w/ different data formats to a matrix in Matlab?

So my main objective is to take a matrix of form
matrix = [a, 1; b, 2; c, 3]
and a list of identifiers in matrix[:,1]
list = [a; c]
and generate a new matrix
new_matrix = [a, 1;c, 3]
My problem is I need to import the data that would be used in 'matrix' from a tab-delimited text file. To get this data into Matlab I use the code:
matrix_open = fopen(fn_matrix, 'r');
matrix = textscan(matrix_open, '%c %d', 'Delimiter', '\t');
which outputs a cell array of two 3x1 arrays. I want to get this into one 3x2 matrix where the first column is a character, and the second column an integer (these data formats will be different in my implementation).
So far I've tried the code:
matrix_1 = cell2mat(matrix(1,1));
matrix_2 = cell2mat(matrix(1,2));
matrix = horzcat(matrix_1, matrix_2)
but this is returning a 3x2 matrix where the second column is empty.
If I just use
cell2mat(matrix)
it says it can't do it because of the different data formats.
Thanks!
This is the help of matlab for the cell2mat function:
cell2mat Convert the contents of a cell array into a single matrix.
M = cell2mat(C) converts a multidimensional cell array with contents of
the same data type into a single matrix. The contents of C must be able
to concatenate into a hyperrectangle. Moreover, for each pair of
neighboring cells, the dimensions of the cell's contents must match,
excluding the dimension in which the cells are neighbors. This constraint
must hold true for neighboring cells along all of the cell array's
dimensions.
From what I understand, the contents you want to put in a matrix should all be of the same type; otherwise, why do you want a matrix? You could simply create a new cell array.
It's not possible to have a normal matrix with characters and numbers. That's why cell2mat won't work here. But you can store different datatypes in a cell-array. Use cellstr for the strings/characters and num2cell for the integers to convert the contents of matrix. If you have other datatypes, use an appropriate function for this step. Then assign them to the columns of an empty cell-array.
Here is the code:
fn_matrix = 'data.txt';
matrix_open = fopen(fn_matrix, 'r');
matrix = textscan(matrix_open, '%c %d', 'Delimiter', '\t');
fclose(matrix_open); % release the file handle once the data is read
X = cell(size(matrix{1},1),2);
X(:,1) = cellstr(matrix{1});
X(:,2) = num2cell(matrix{2});
The result:
X =
'a' [1]
'b' [2]
'c' [3]
Now we can do the second part of the question: extracting the entries whose letter matches one of those in the list. For this you can use ismember and logical indexing, like this:
list = ['a'; 'c'];
sel = ismember(X(:,1),list);
Y(:,1) = X(sel,1);
Y(:,2) = X(sel,2);
The result here:
Y =
'a' [1]
'c' [3]
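The same two steps can be tried without a data file. In this sketch the letters and numbers variables are stand-ins for matrix{1} and matrix{2} (assumed here to be a 3x1 char array and a 3x1 int32 array), and the list is written as a cell array of chars.

```matlab
% Stand-ins for the two textscan outputs
letters = ['a'; 'b'; 'c'];        % 3x1 char array
numbers = int32([1; 2; 3]);       % 3x1 integer array

% Build the mixed-type 3x2 cell array in one step
X = [cellstr(letters), num2cell(numbers)];

% Keep only the rows whose first column appears in the list
list = {'a'; 'c'};
sel = ismember(X(:,1), list);
Y = X(sel, :);
disp(Y)
```

Indexing with X(sel, :) copies both columns at once, so Y does not need to be filled column by column.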

Get indices of string occurrences in cell-array

I have a cell array that contains a long list of strings. Most of the strings are in duplicates. I need the indices of instances of a string within the cell array.
I tried the following:
[bool,ind] = ismember(string,var);
This consistently returns a scalar ind, even though there is clearly more than one index at which the contents of the cell array match string.
How can I have a list of indices that points to the locations in the cell array that contains string?
As an alternative to Divakar's comment, you could use strcmp. This works even if some cell doesn't contain a string:
>> strcmp('aaa', {'aaa', 'bb', 'aaa', 'c', 25, [1 2 3]})
ans =
1 0 1 0 0 0
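Since the question asks for indices rather than a logical mask, the strcmp result can be wrapped in find. A small sketch reusing the same example cell array (the variable names c and idx are illustrative):

```matlab
c = {'aaa', 'bb', 'aaa', 'c', 25, [1 2 3]};

% strcmp returns false for non-text cells, so mixed contents are safe here
mask = strcmp('aaa', c);
idx = find(mask);   % positions in c whose contents equal 'aaa'
disp(idx)
```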
Alternatively, you can ID each string and thus have representative numeric arrays corresponding to the input cell array and string. For IDing, you can use unique and then use find as you would with numeric arrays. Here's how you can achieve that -
var_ext = [var string]
[~,~,idx] = unique(var_ext)
out = find(idx(1:end-1)==idx(end))
Breakdown of the code:
var_ext = [var string]: Concatenate everything (string and var) into a single cell array, with the string ending up at the end (last element) of it.
[~,~,idx] = unique(var_ext): ID everything in that concatenated cell array.
find(idx(1:end-1)==idx(end)): idx(1:end-1) represents the numeric IDs for the cell array elements and idx(end) would be the ID for the string. Compare these IDs and use find to pick up the matching indices to give us the final output.
Sample run -
Inputs:
var = {'er','meh','nop','meh','ya','meh'}
string = 'meh'
Output:
out =
2
4
6
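regexp offers another simple way to find the matches. Note that the square brackets below concatenate the pieces into one char vector, so the result gives character positions rather than cell indices:
string = ['my' 'bat' 'my' 'ball' 'my' 'score']
expression = ['my']
regexp(string,expression)
ans = 1 6 12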
