How to create a new group of character array in matlab? - arrays

I have data stored in text files.The data is in 'cell array of string' after read it using textscan and contains various of colour name. Below is the content of my data:
name of colour
'lavender'
'lavenderblush'
'lemonchiffon'
'lightblue'
'lightcoral'
'lightcyan'
I want to create new array to group all color characters into the main color only (red, blue, orange, brown,etc).
I am really struggling to solve this problem. Thank you in advance for any help.
load_data = fopen('result.txt', 'r');
C = textscan(load_data, ' %s ');
fclose(load_data);
name = C{1,1};
group = char(name)
if group{:,1} == lavender
fprintf('purple');
else
fprintf('nothing');
end
This is my code but if I run this, always get error
Cell contents reference from a non-cell array object.

From your comment above, I assume you're having trouble with your code, not actually with the sorting of colors, or how to categorize them.
group is not a Cell array, it's a Char array. So in order to access its values you should use group(:,1) instead of group{:,1}
Remember that the number of columns in group is the number of characters in that line, normalized to the number of characters of the largest string in that set. So 2 issues here:
You can't use group(:,1), as it will get the first character of all the strings in that array. You should get the entire line for that string group(1,:). NB: I say string for simplicity, it's actually a char array.
'lavender' has only 10 characters, but it will have 15, as per the largest string. So the string comparison doesn't quite work, unless you add the extra blank spaces to compensate
You can try out the code below:
load_data = fopen('result.txt', 'r');
C = textscan(load_data, ' %s ');
fclose(load_data);
name = C{1,1};
if name{1,:} == '''lavender'''
fprintf('purple');
else
fprintf('nothing');
end
I assume that your TXT file actually has the string 'lavender', in this case I used character escape '''lavender'''.

The error you are getting is fairly clear. Once you have called char(name), you are working with a regular character array (i.e. a string) and no longer a cell array. Braces (i.e. {}) are used to index cell arrays so you should instead be using parentheses (i.e. ()). Note this actually happens as soon as you index using {} so C{1,1} would actually return a string already and the char line is thus redundant:
if group(:,1) == lavender
I suspect you actually want something more like
name = C{1};
group = char(name) %this line is redundant because C{1} already extracts a string
if strcmp(group,lavender)
fprintf('purple');
else
fprintf('nothing');
end
but it's impossible to say since you did not define the variable lavender.
I would also question what data structure you intend to use. I'm going to assume you actually have some way of categorizing the colours? I'm going to assume this is manual in which case I would suggest converting your txt file to a csv file and putting your manual colour categories as a second column but I'll leave the implementation details to you.
Lets say you have 3 colour categories for now, 'purple', 'blue', and 'orange', my suggestion is to use a logical matrix that has 3 columns (1 per colour category) and n rows where n is the number of rows in your text file (i.e. the number of colours you need to categorize).
Now I'm going to assume you have some sort of mapping function that can categorize your colours so map('lavender') returns 'purple' and map('lightcyan') returns 'blue'
First we should make a cell array of categories that we can use to map the category string to its column number:
categories = {'purple'
'blue'
'orange'}
and the result will go in the logical matrix categorized
load_data = fopen('result.txt', 'r');
C = textscan(load_data, ' %s ');
fclose(load_data);
n = numel(C);
categories = {'purple'
'blue'
'orange'};
categorized = false(n, numel(categories)); %preallocation
for row = 1:n
colour = C{row};
category = map(colour); %you need to implement this map function yourself.
categorized(row, strcmp(category, categories)) = true;
end

Related

Finding specific instance in a list when the list starts with a comma

I'm uploading a spreadsheet and mapping the spreadsheet column headings to those in my database. The email column is the only one that is required. In StringB below, the ,,, simply indicates that a column was skipped/ignored.
The meat of my question is this:
I have a string of text (StringA) comes from a spreadsheet that I need to find in another string of text (StringB) which matches my database (this is not the real values, just made it simple to illustrate my problem so hopefully this is clear).
StringA: YR,MNTH,ANNIVERSARIES,FIRSTNAME,LASTNAME,EMAIL,NOTES
StringB: ,YEAR,,MONTH,LastName,Email,Comments <-- this list is dynamic
MNTH and MONTH are intentionally different;
excelColumnList = 'YR,MNTH,ANNIV,FIRST NAME,LAST NAME,EMAIL,NOTES';
mappedColumnList= ',YEAR,,MONTH,,First Name,Last Name,Email,COMMENTS';
mappedColumn= 'Last Name';
local.index = ListFindNoCase(mappedColumnList, mappedColumn,',', true);
local.returnValue = "";
if ( local.index > 0 )
local.returnValue = ListGetAt(excelColumnList, local.index);
writedump(local.returnValue); // dumps "EMAIL" which is wrong
The problem I'm having is the index returned when StringB starts with a , returns the wrong index value which affects the mapping later. If StringB starts with a word, the process works perfectly. Is there a better way to to get the index when StringB starts with a ,?
I also tried using listtoarray and then arraytolist to clean it up but the index is still off and I cannot reliably just add +1 to the index to identify the correct item in the list.
On the other hand, I was considering this mappedColumnList = right(mappedColumnList,len(mappedColumnList)-1) to remove the leading , which still throws my index values off BUT I could account for that by adding 1 to the index and this appears to be reliably at first glance. Just concerned this is a sort of hack.
Any advice?
https://cfdocs.org/listfindnocase
Here is a cfgist: https://trycf.com/gist/4b087b40ae4cb4499c2b0ddf0727541b/lucee5?theme=monokai
UPDATED
I accepted the answer using EDIT #1. I also added a comment here: Finding specific instance in a list when the list starts with a comma
Identify and strip the "," off the list if it is the first character.
EDIT: Changed to a while loop to identify multiple leading ","s.
Try:
while(left(mappedColumnList,1) == ",") {
mappedColumnList = right( mappedColumnList,(len(mappedColumnList)-1) ) ;
}
https://trycf.com/gist/64287c72d5f54e1da294cc2c10b5ad86/acf2016?theme=monokai
EDIT 2: Or even better, if you don't mind dropping back into Java (and a little Regex), you can skip the loop completely. Super efficient.
mappedColumnList = mappedColumnList.replaceall("^(,*)","") ;
And then drop the while loop completely.
https://trycf.com/gist/346a005cdb72b844a83ca21eacb85035/acf2016?theme=monokai
<cfscript>
excelColumnList = 'YR,MNTH,ANNIV,FIRST NAME,LAST NAME,EMAIL,NOTES';
mappedColumnList= ',,,YEAR,MONTH,,First Name,Last Name,Email,COMMENTS';
mappedColumn= 'Last Name';
mappedColumnList = mappedColumnList.replaceall("^(,*)","") ;
local.index = ListFindNoCase(mappedColumnList, mappedColumn,',', true);
local.returnValue = ListGetAt(excelColumnList,local.index,",",true) ;
writeDump(local.returnValue);
</cfscript>
Explanation of the Regex ^(,*):
^ = Start at the beginning of the string.
() = Capture this group of characters
,* = A literal comma and all consecutive repeats.
So ^(,*) says, start at the beginning of the string and capture all consecutive commas until reaching the next non-matched character. Then the replaceall() just replaces that set of matched characters with an empty string.
EDIT 3: I fixed a typo in my original answer. I was only using one list.
writeOutput(arraytoList(listtoArray(mappedColumnList))) will get rid of your leading commas, but this is because it will drop empty elements before it becomes an array. This throws your indexing off because you have one empty element in your original mappedColumnList string. The later string functions will both read and index that empty element. So, to keep your indexes working like you see to, you'll either need to make sure that your Excel and db columns are always in the same order or you'll have to create some sort of mapping for each of the column names and then perform the ListGetAt() on the string you need to use.
By default many CF list functions ignore empty elements. A flag was added to these function so that you could disable this behavior. If you have string ,,1,2,3 by default listToArray would consider that 3 elements but listToArray(listVar, ",", true) will return 5 with first two as empty strings. ListGetAt has the same "includeEmptyValues" flag so your code should work consistently when that is set to true.

Keeping values of cell arrays when exporting to Excel

I have a cell array. Some of the elements in this cell array contains zeros as the first character and the whole element is only numbers (double) as well. When exporting these to Excel (which I prefer), the zeros are deleted and converting it to a number.
Let's take an example to illustrate my problem. I have a cell array with 10 elements:
NodeID = {'0000006';
'0000011';
'000011R';
'000016R';
'000021R';
'B276_2';
'EB 7.55';
'EB2521';
'EllebaekOPlB1';
'EllebaekOplB10'};
The first two elements contains zeros until the number 6 and 11, respectively. Unlike the third element and so forth, where letters are involved. So when exporting NodeID to Excel, it returns this in a column (I use writetable command by the way):
6
11
000011R
000016R
000021R
B276_2
EB 7.55'
EB2521
EllebaekOPlB1
EllebaekOplB10
Notice the removal of zeros for the first two elements. Now I know that in Excel, it will keep all the content with the addition of a quote symbol ' in front of the cell, eg. '0000006 for the first element.
I have searched in many places to find a solution to this. But is there a good way to avoid this from happening? Either by somehow adding an extra ยด or some other magical trick which I have not seen?
Thank you in advance!
One alternative (if your values are in a cell, as you say they are):
filename = 'NodeID.xlsx';
NodeID2 = cellfun(#(C) ['''',C], NodeID,'UniformOutput', false)
xlswrite(filename, NodeID2)
This gives you:
NodeID2 =
''0000006'
''0000011'
''000011R'
''000016R'
''000021R'
''B276_2'
''EB 7.55'
''EB2521'
''EllebaekOPlB1'
''EllebaekOplB10'
And an Excel file looking like this:
The cellfun line is equivalent to:
for ii = 1:numel(NodeID)
NodeID2{ii,1} = ['''', NodeID{ii}];
end
The part ['''', NodeID{ii}] inserts a single quotation mark in front for the string. Relevant answer.

Apply a string value to several positions of a cell array

I am trying to create a string array which will be fed with string values read from a text file this way:
labels = textread(file_name, '%s');
Basically, for each string in each line of the text file file_name I want to put this string in 10 positions of a final string array, which will be later saved in another text file.
What I do in my code is, for each string in file_name I put this string in 10 positions of a temporary cell array and then concatenate this array with a final array this way:
final_vector='';
for i=1:size(labels)
temp_vector=cell(1,10);
temp_vector{1:10}=labels{i};
final_vector=horzcat(final_vector,temp_vector);
end
But when I run the code the following error appears:
The right hand side of this assignment has too few values to satisfy the left hand side.
Error in my_example_code (line 16)
temp_vector{1:10}=labels{i};
I am too rookie in cell strings in matlab and I don't really know what is happening. Do you know what is happening or even have a better solution to my problem?
Use deal and put the left hand side in square brackets:
labels{1} = 'Hello World!'
temp_vector = cell(10,1)
[temp_vector{1:10}] = deal(labels{1});
This works because deal can distribute one value to multiple outputs [a,b,c,...]. temp_vector{1:10} alone creates a comma-separated list and putting them into [] creates the output array [temp_vector{1}, temp_vector{2}, ...] which can then be populated by deal.
It is happening because you want to distribute one value to 10 cells - but Matlab is expecting that you like to assign 10 values to 10 cells. So an alternative approach, maybe more logic, but slower, would be:
n = 10;
temp_vector(1:n) = repmat(labels(1),n,1);
I also found another solution
final_vector='';
for i=1:size(labels)
temp_vector=cell(1,10);
temp_vector(:,1:10)=cellstr(labels{i});
final_vector=horzcat(final_vector,temp_vector);
end

Matlab join array of strings

In ruby and other languages, I can create an array, push an arbitrary number of strings and then join the array:
ary=[]
...
ary.push some_str
ary.push some_other_str
...
result = ary.join ""
How do I accomplish this in matlab?
User story: my plot legend is composed of a variable number of strings. The number of strings is determined runtime, so I want to declare the array, add strings dynamically and then join the array to the legend string in the end of the script.
In MATLAB, String joining happens like the following
a = 'ding';
b = 'dong';
c = [a ' ' b]; % Produces 'ding dong'
P.S. a typeof(c,'char') shows TRUE in MATLAB because it "joins" all characters into C.
Suppose you want to start with an empty char placeholder. You can do this.
a = ``; % Produces an empty character with 0x0 size.
Then you can keep adding to the end of it; like this:
a = [a 'newly added'] % produces a = "newly added"
To prove that it works, do this again:
a = [a ' appended more to the end.'] % produces a = "newly added appended more to the end."
You can always use the end keyword that points to the last index of an array, but in this case you need to append to end+X where X is the extra number of characters you are appending (annoyingly). I suggest you just use the [] operator to join/append.
There is also this strjoin(C, delim) function which joins a cell C of strings using a delim delimiter (could be whitespace or whatever). But cheap and dirty one is the one I showed above.

Efficient allocation of cell array in matlab

I have some which converts a cell array of strings into a cell array of characters.
Note. For a number of reasons, both the input (C) and the output (C_itemised) must be cell arrays.
The cell array of strings (C) is as follows:
>> C(1:10)
ans =
't1416933446'
''
't1416933446'
''
't1416933446'
''
't1416933446'
''
't1416933446'
''
I have only shown a portion of the array here. In reality it is ~28,000 rows in length.
I have some code which does this, although it is very inefficient. The cellstr function takes up 72% of the code's time, as it is currently called thousands of times. The code is as follows:
C_itemised=cell(length(C),500);
for i=3:length(C)
temp=char(C{i});
for j=1:length(temp)
C(i-2,j)=cellstr(temp(j));
end
end
I have a feeling that some minor modifications could take out the inner loop, thus cutting down the overall running time substantially. I have tried a number of ways to do this, but I think I keep getting confused about whether to use {} or (), and haven't been able to find anything online that can help me. Can anyone see a way to make the code more efficient?
Please also note that this function is used in conjunction with other functions, and does work, although it is running slower than would be ideal. Therefore, I do not wish to change the format of C_itemised.
EDIT:
(A sample of) the output of my current function is:
C_itemised(1,1:12)
ans =
Columns 1 through 12
't' '1' '4' '1' '6' '9' '3' '3' '4' '4' '6' []
One thing I can suggest is to use the undocumented function sprintfc. This function is hidden from normal use in MATLAB, but it is used internally with a variety of other functions. Mainly, if you tried doing help sprintfc, it'll say that there's no function found! It's cool to sniff around the source sometimes!
How sprintfc works is that you provide it a formatting string, much like printf, and the data you want printed. It will take each individual element in the data and place them into individual cell arrays. As an example, supposing I had a string D = 'abcdefg';, if we did:
out = sprintfc('%c', D);
We get:
>> celldisp(out)
out{1} =
a
out{2} =
b
out{3} =
c
out{4} =
d
out{5} =
e
out{6} =
f
out{7} =
g
As such, it takes each element in your string and places them as individual characters serving as individual elements in a new cell array. The %c formatting string means that we want to print a single character per element. Check out the link to Undocumented MATLAB that I posted above if you want to learn more!
Therefore, try simplifying your loop to this:
C_itemised=cell(length(C));
for i=1:length(C)
C_itemised{i} = sprintfc('%c', C{i});
end
C_itemised will be a cell array, where each element C_itemised{i} is another cell array, with each element in this cell array being a single character that is composed of the string C{i}.
Minor Note
You said you were confused about {} and () in MATLAB for cells. {} is used to access individual elements inside the cell. So doing C{1} for example will grab whatever is stored in the first element of the cell array. () is used to slice and index into the cells. For example, if you wanted to make another cell array that is a subset of the current one, you would do something like C(1:3). This will create a three element cell array which is composed of the first three cells in C.

Resources