Matlab join array of strings

In Ruby and other languages, I can create an array, push an arbitrary number of strings onto it and then join the array:
ary=[]
...
ary.push some_str
ary.push some_other_str
...
result = ary.join ""
How do I accomplish this in MATLAB?
User story: my plot legend is composed of a variable number of strings. The number of strings is determined at runtime, so I want to declare the array, add strings dynamically and then join the array into the legend string at the end of the script.

In MATLAB, strings (char arrays) are joined by concatenating them with square brackets:
a = 'ding';
b = 'dong';
c = [a ' ' b]; % Produces 'ding dong'
P.S. isa(c,'char') returns true in MATLAB because the brackets concatenate all the characters into the single char array c.
Suppose you want to start with an empty char placeholder. You can do this:
a = ''; % Produces an empty char array of size 0x0.
Then you can keep adding to the end of it, like this:
a = [a 'newly added'] % produces a = 'newly added'
To show that it works, do it again:
a = [a ' appended more to the end.'] % produces a = 'newly added appended more to the end.'
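The same build-by-appending pattern can be written as a rough Python analogue (Python, not MATLAB). It starts from an empty string and concatenates onto the end, which mirrors a = ''; a = [a '...'] above:

```python
# Python analogue of the MATLAB pattern  a = '';  a = [a '...'];
a = ""                                  # like a = '' (an empty char array)
a = a + "newly added"                   # like a = [a 'newly added']
a = a + " appended more to the end."    # like a = [a ' appended more to the end.']
print(a)  # newly added appended more to the end.
```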
You can also use the end keyword, which refers to the last index of an array. In that case you have to assign to a(end+1:end+X), where X is the number of characters you are appending, e.g. a(end+1:end+5) = 'extra' (annoyingly). I suggest you just use the [] operator to join/append.
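There is also the strjoin(C, delim) function, which joins a cell array C of strings using the delimiter delim (a space or whatever you like). For the legend use case you can start with C = {}, add each piece with C{end+1} = some_str, and call strjoin(C, '') at the end. The quick-and-dirty way is the [] concatenation shown above.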
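The following rough Python analogue (not MATLAB) shows the collect-then-join pattern the question asks about. The list plays the role of the cell array C, and str.join plays the role of strjoin(C, delim). The function name build_legend and the "run k" labels are hypothetical and serve only as illustration:

```python
def build_legend(labels, delim=", "):
    """Collect an arbitrary number of strings, then join them once at the end."""
    parts = []                  # plays the role of C = {} in MATLAB
    for label in labels:
        parts.append(label)     # plays the role of C{end+1} = label
    return delim.join(parts)    # plays the role of strjoin(C, delim)

# Hypothetical labels whose count is only known at runtime.
n = 3
legend_text = build_legend([f"run {k}" for k in range(1, n + 1)])
print(legend_text)  # run 1, run 2, run 3
```

Joining once at the end avoids rebuilding the whole string on every append, which is the same reason strjoin on a cell array is tidier than repeated [] concatenation.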

Related

matlab: logically comparing two cell arrays

I have an Excel file from which I obtained two string arrays: Titles, of size 6264x1, and Names, of size 45696x1. I want to create an output matrix of size 6264x45696 whose elements are 1 or 0: 1 if the title contains the name, 0 otherwise.
I think I want something along the lines of:
for (j in Names)
for (k in Titles)
if (Names[j] is in Titles[k])
write to excel
end
end
end
But I don't know which functions I should use to achieve this. Here is what I have come up with:
[~,Title] = xlsread('exp1.xlsx',1,'A3:A6266','basic');
[~,Name] = xlsread('exp1.xlsx',2,'B3:B45698','basic');
A = cellstr(Title);
GN = cellstr(Name);
BinaryMatrix = false(45696,6264);
for i=1:1:45696
for j=1:1:6264
if (~isempty(ismember(A,GN)))
BinaryMatrix(i,j)= true;
end
end
end
The problem with this code is that it never finishes running, although MATLAB reports no errors or warnings.
You can use the third output of unique to get a number corresponding to each string element, and then use bsxfun to compare the numbers:
GN = cellstr(Name);
A = cellstr(Title);
B = [ GN(:); A(:)];
[~,~,u]= unique(B);
BinaryMatrix = bsxfun(@eq, u(1:numel(GN)), u(numel(GN)+1:end).'); % numel(GN)-by-numel(A) logical matrix
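As a rough Python analogue of the unique-plus-bsxfun idea (not MATLAB), each distinct string can be mapped to an integer id, so that equal strings get equal ids. The id columns are then compared pairwise. The helper names unique_ids and equality_matrix are hypothetical, and the ids are 0-based here, unlike MATLAB's 1-based ones:

```python
def unique_ids(strings):
    """Mimic the third output of MATLAB's unique: map each string to the
    position of its value in the sorted list of distinct values (0-based)."""
    distinct = sorted(set(strings))
    index_of = {s: k for k, s in enumerate(distinct)}
    return [index_of[s] for s in strings]

def equality_matrix(names, titles):
    # Concatenate both lists so equal strings share one id,
    # just as B = [GN(:); A(:)] does before calling unique.
    u = unique_ids(list(names) + list(titles))
    name_ids, title_ids = u[:len(names)], u[len(names):]
    # Compare every name id with every title id, like bsxfun(@eq, col, row).
    return [[n == t for t in title_ids] for n in name_ids]

names = ["alpha", "beta", "gamma"]
titles = ["beta", "delta"]
m = equality_matrix(names, titles)
print(m)  # [[False, False], [True, False], [False, False]]
```

Like the MATLAB version, this tests exact equality of whole strings, not whether one string contains another.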
ismember can handle cell arrays of character vectors. Its second output gives you the information you need, and you can build the result from it using sparse (it could also be done by preallocating the matrix and using sub2ind):
[~, m] = ismember(Titles, Names);
BinaryMatrix = full(sparse(nonzeros(m), find(m), true, numel(Names), numel(Titles)));
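The next example is a rough Python analogue (not MATLAB) of the ismember-plus-sparse approach. It assumes that ismember's second output reports the lowest matching index in Names, or 0 when there is no match, as current MATLAB documentation describes. It then places a true value at (name index, title index), the same way sparse(nonzeros(m), find(m), true, ...) does. The helper names are hypothetical:

```python
def ismember_locations(a, b):
    """Like the second output of MATLAB's ismember: for each element of a,
    the 1-based index of its first occurrence in b, or 0 if absent."""
    first = {}
    for k, s in enumerate(b, start=1):
        first.setdefault(s, k)        # keep the lowest index for duplicates
    return [first.get(s, 0) for s in a]

def membership_matrix(titles, names):
    m = ismember_locations(titles, names)
    # Rows correspond to names and columns to titles, matching
    # sparse(nonzeros(m), find(m), true, numel(Names), numel(Titles)).
    result = [[False] * len(titles) for _ in names]
    for col, row in enumerate(m):
        if row:                       # skip titles not found in names (m == 0)
            result[row - 1][col] = True
    return result

titles = ["b", "x", "a"]
names = ["a", "b", "a"]
binary = membership_matrix(titles, names)
```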

How to store the multiple positions of a character in a string inside an array in free format RPGLE?

In fixed-format RPGLE, my code looks like this. The statement stores the positions of the commas in Data in the ComArr array.
C ',' Scan Data ComArr
I tried doing it in free format like this, but every element of the ComArr array is loaded with the position of the first comma in Data. This is because %Scan returns only one position, and assigning it to an array loads the whole array with that single value.
ComArr = %Scan(',':Data) ;
Is there any other way to do a SCAN in free-format RPGLE that behaves like the C-spec version? Basically, I want to split a string on a delimiter.
One possibility is to keep the C-spec as-is. If the code block needs an array of delimiter positions, and one line of code already does that, put a comment above the fixed-format spec describing what it does and leave it in there.
If /free is required and you don't want to replace the entire block of code, you will need to roll your own loop to build the array of delimiter positions.
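Here is a rough Python sketch of such a hand-rolled loop (not RPG). It repeatedly searches from just past the previous hit and records 1-based positions, the way the fixed-format SCAN fills an array result. In free-format RPG the equivalent would presumably call %SCAN with its optional start-position argument in a loop. The function name delimiter_positions is hypothetical:

```python
def delimiter_positions(data, delim=","):
    """Collect every 1-based position of delim in data, mimicking how
    the fixed-format SCAN opcode fills an array result."""
    positions = []
    hit = data.find(delim)
    while hit != -1:
        positions.append(hit + 1)          # RPG positions are 1-based
        hit = data.find(delim, hit + 1)    # resume just past the last hit
    return positions

print(delimiter_positions("a,bc,,d"))  # [2, 5, 6]
```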
I don't personally convert from fixed to /free unless I am re-writing the block of code to be functionally different. That is, I would almost certainly write a different algorithm in /free than I would have written in fixed. So the entire process of building an array of delimiter positions and then splitting the string based on that array is not something I would do in /free.
I would write a new sub-procedure that returns an array of strings given one delimited input string. The code inside that sub-procedure would make one pass through the input, looking for delimiters with %scan(), and for each one found, copy the substring before it into the next available output array element. This sort of algorithm needs no array of delimiter positions.
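The one-pass idea can be sketched roughly in Python (not RPG): find the next delimiter, emit the text before it, and continue after it. No list of positions is kept. The name split_delimited is hypothetical. For a single-character delimiter the result matches Python's built-in str.split:

```python
def split_delimited(text, delim=","):
    """One pass over text: find the next delimiter, emit the substring
    before it, and continue after it. No array of positions is stored."""
    items = []
    start = 0
    while True:
        hit = text.find(delim, start)
        if hit == -1:
            items.append(text[start:])    # the field after the last delimiter
            return items
        items.append(text[start:hit])
        start = hit + len(delim)

print(split_delimited("a,bc,,d"))  # ['a', 'bc', '', 'd']
```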
This is probably a little late, but if anyone else needs to split a string by a given delimiter, this code should do what you need.
If you assign a value to an array using wildcard eval array(*) = ..., it applies to every element of the array.
Declare the prototype in your source:
D split pr 1024a varying
D string 65535a varying const options(*varsize)
D delims 50a varying const
D pos 10i 0
Declare a couple of variables.
This assumes your input string is 1000 characters and holds at most 100 items, each at most 10 characters long:
D idx s 10i 0
D list s 1000a
D splitAry s 10a dim(100)
This is how you split the string.
The ',' argument tells the routine that your delimiter is a comma:
c eval idx = 0
c eval splitAry(*) = split(list:',':idx)
Define the procedure that does the work:
*-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
* split - Split delimited string
*-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Psplit b export
D split pi 1024a varying
D iString 65535a varying const options(*varsize)
D iDelims 50a varying const
D iPos 10i 0
*
D result s 1024a varying
D start s 10i 0
D char s 1a
*-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
c eval start = iPos + 1
c eval %len(result) = 0
*
c for iPos = start to %len(iString)
c eval char = %subst(iString:iPos:1)
c if %check(iDelims:char) = 1
c eval result = result + char
c else
c leave
c endif
c endfor
*
c return result
Psplit e
Don't forget to add dftactgrp(*no) to your H spec if you're defining and using this in the same module!
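The split procedure above can be mirrored roughly in Python (not RPG). A character is appended while it is not one of the delimiter characters, which is what %check(iDelims:char) = 1 tests, and the loop stops on the first delimiter. Python has no pass-by-reference parameters, so the updated position is returned alongside the token. split_all assumes, as the answer states, that splitAry(*) = split(list:',':idx) calls the procedure once per array element. Both names are hypothetical:

```python
def split_next(string, delims, pos):
    """Mirror of the RPG split procedure. pos is the 1-based position of the
    previous delimiter (0 before the first call). Returns (token, new_pos)."""
    result = []
    pos = pos + 1                       # like start = iPos + 1
    while pos <= len(string):           # like for iPos = start to %len(iString)
        char = string[pos - 1]
        if char not in delims:          # like %check(iDelims:char) = 1
            result.append(char)
            pos += 1
        else:
            break                       # like LEAVE: pos stays on the delimiter
    return "".join(result), pos

def split_all(string, delims, size):
    """Fill a fixed-size list, one call per element, like splitAry(*) = split(...)."""
    pos = 0
    out = []
    for _ in range(size):
        token, pos = split_next(string, delims, pos)
        out.append(token)
    return out

print(split_all("ab,c", ",", 4))  # ['ab', 'c', '', '']
```

Because iDelims is a list of characters rather than a single string, any of several delimiters can end a token, e.g. split_all("a;b,c", ",;", 3).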

How to take a substring of each string in a MATLAB cell array

I have a MATLAB cell array of 20x1 elements, and all the elements are strings like 'a12345.567'.
I want to substitute part of the string (start to 9th index) in all the cells,
so that each element becomes something like 'a12345.3'.
How can I do that?
You can use cellfun:
M = { 'a12345.567'; 'b12345.567' }; %// you have 20 entries like these
MM = cellfun( @(x) [x(1:7),'3'], M, 'UniformOutput', false )
Resulting in:
MM =
    'a12345.3'
    'b12345.3'
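As a rough Python analogue (not MATLAB), the cellfun call becomes a list comprehension. MATLAB's 1-based inclusive x(1:7) corresponds to Python's 0-based, end-exclusive x[:7]:

```python
M = ["a12345.567", "b12345.567"]   # stands in for the 20x1 cell array
MM = [x[:7] + "3" for x in M]      # like cellfun(@(x) [x(1:7),'3'], M, 'UniformOutput', false)
print(MM)  # ['a12345.3', 'b12345.3']
```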
For a more advanced string replacement functionality in Matlab, you might want to explore strrep, and regexprep.
Another method is to use regexprep: use a regular expression to find the digits that appear at the end of each string (after the . character) and replace them with whatever you wish. In this case:
M = { 'a12345.567'; 'b12345.567' }; %// you have 20 entries like these - Taken from Shai
MM = regexprep(M, '\d+$', '3');
MM =
'a12345.3'
'b12345.3'
A regular expression is a pattern used to find substrings within a larger string. In our case, \d matches a single digit (0-9), and + means one or more of the preceding item, so \d+ matches a run of consecutive digits. Finally, $ anchors the pattern to the end of the string. In other words, we want to find a run of digits at the end of each string. regexprep finds these matches, if they exist, and replaces them with whatever string you want; here we chose '3', as per your example.
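The same pattern works in Python's re module. This rough analogue (not MATLAB) applies it to each element, because re.sub takes one string at a time, whereas regexprep accepts the whole cell array:

```python
import re

M = ["a12345.567", "b12345.567"]
MM = [re.sub(r"\d+$", "3", s) for s in M]   # replace the trailing run of digits
print(MM)  # ['a12345.3', 'b12345.3']
```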

Efficient allocation of cell array in matlab

I have some code which converts a cell array of strings into a cell array of characters.
Note: for a number of reasons, both the input (C) and the output (C_itemised) must be cell arrays.
The cell array of strings (C) is as follows:
>> C(1:10)
ans =
't1416933446'
''
't1416933446'
''
't1416933446'
''
't1416933446'
''
't1416933446'
''
I have only shown a portion of the array here. In reality it is ~28,000 rows in length.
I have some code which does this, although it is very inefficient. The cellstr function takes up 72% of the code's time, as it is currently called thousands of times. The code is as follows:
C_itemised=cell(length(C),500);
for i=3:length(C)
temp=char(C{i});
for j=1:length(temp)
C_itemised(i-2,j)=cellstr(temp(j));
end
end
I have a feeling that some minor modifications could take out the inner loop, thus cutting down the overall running time substantially. I have tried a number of ways to do this, but I think I keep getting confused about whether to use {} or (), and haven't been able to find anything online that can help me. Can anyone see a way to make the code more efficient?
Please also note that this function is used in conjunction with other functions, and does work, although it is running slower than would be ideal. Therefore, I do not wish to change the format of C_itemised.
EDIT:
(A sample of) the output of my current function is:
C_itemised(1,1:12)
ans =
Columns 1 through 12
't' '1' '4' '1' '6' '9' '3' '3' '4' '4' '6' []
One thing I can suggest is to use the undocumented function sprintfc. This function is hidden from normal use in MATLAB, but it is used internally by a variety of other functions. In fact, if you try help sprintfc, MATLAB will say that no such function is found! It's cool to sniff around the source sometimes!
sprintfc works like this: you provide a formatting string, much like printf, and the data you want printed, and it places the formatted result for each individual element of the data into its own cell of a cell array. As an example, suppose I had a string D = 'abcdefg';. If we did:
out = sprintfc('%c', D);
We get:
>> celldisp(out)
out{1} =
a
out{2} =
b
out{3} =
c
out{4} =
d
out{5} =
e
out{6} =
f
out{7} =
g
As such, it takes each character in your string and places it in its own cell of a new cell array. The %c format means that we want to print a single character per element. Check out the Undocumented MATLAB website if you want to learn more!
Therefore, try simplifying your loop to this:
C_itemised = cell(size(C)); % one cell per string; cell(n) alone would create an n-by-n cell array
for i=1:length(C)
C_itemised{i} = sprintfc('%c', C{i});
end
C_itemised will be a cell array, where each element C_itemised{i} is another cell array, and each element of that inner cell array is a single character of the string C{i}.
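The two result layouts can be compared in a rough Python analogue (not MATLAB), with lists standing in for cell arrays. list(s) splits a string into single characters, much like sprintfc('%c', s) does here. itemise_padded is a hypothetical helper that rebuilds the question's original 2-D layout: one row per string, one column per character, padded with None where MATLAB shows []:

```python
D = "abcdefg"
out = list(D)                       # like sprintfc('%c', D): one character per element

C = ["t1416933446", "", "t1416933446", ""]
C_itemised = [list(s) for s in C]   # nested layout, like C_itemised{i} = sprintfc('%c', C{i})

def itemise_padded(strings, width):
    """Hypothetical helper keeping the question's 2-D layout:
    one row per string, padded with None where MATLAB would show []."""
    rows = []
    for s in strings:
        chars = list(s[:width])
        rows.append(chars + [None] * (width - len(chars)))
    return rows

grid = itemise_padded(C, 12)
print(grid[0])  # ['t', '1', '4', '1', '6', '9', '3', '3', '4', '4', '6', None]
```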
Minor Note
You said you were confused about {} and () in MATLAB for cells. {} is used to access the contents of individual cells. So doing C{1}, for example, will grab whatever is stored in the first element of the cell array. () is used to slice and index into the cells themselves. For example, if you wanted to make another cell array that is a subset of the current one, you would do something like C(1:3). This creates a three-element cell array composed of the first three cells in C.
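Python lists draw a loosely similar distinction, shown here as a rough analogue (not MATLAB). Indexing a single element returns the stored value itself, much like C{1}. Slicing returns a new, smaller list, much like C(1:3) returns a new cell array:

```python
C = ["t1416933446", "", "t1416933446", ""]

first = C[0]      # like C{1}: the contents, a plain string
subset = C[0:3]   # like C(1:3): a new list holding the first three elements
print(type(first).__name__, type(subset).__name__, len(subset))  # str list 3
```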

Matlab, find common elements of two cell arrays

I have two cell arrays, of sizes 1x20033 and 1x19. Let's call these two cell arrays A and B. I want to compare each cell of A with each cell of B to see if there is any common element.
Finally, I need to build a binary matrix and put one when there is a match.
I tried this:
BinaryMatrix=zeros(20033,19);
for i=1:1:20033
for j=1:1:19
match=find(ismember(A{i},B{j}));
if match==1
BinaryMatrix(i,j)= 1;
end
end
end
but I got this error: "Input A of class double and input B of class cell must be cell arrays of strings, unless one is a string."
What should I do to solve it?
The code that you have almost works. What I would recommend you do is split up the strings found in A and B by spaces. As such, A and B would then be cell arrays of elements where each element in A or B is a single word. The spaces will serve as delimiters for separating out the words.
Once you do this, use intersect to see if there are any common words between the words in A and the words in B. intersect works by considering two arrays (these can be numeric arrays, cell arrays, etc.) C and D as sets, and it returns the set intersection between these two arrays.
In our case, C and D would be the cell arrays of words obtained by splitting an element of A and an element of B on spaces. intersect(C,D) will return a cell array of strings where each element in the output is a string found in both C and D. As such, should this cell array be non-empty, we have found at least one common word between C and D. If this is the case, then set your binary flag at that location of your matrix to 1. In other words:
BinaryMatrix = false(20033,19);
for i=1:1:20033
    Asplit = strsplit(A{i});   % split A{i} once per row, not once per (i,j) pair
    for j=1:1:19
        Bsplit = strsplit(B{j});
        if (~isempty(intersect(Asplit, Bsplit)))
            BinaryMatrix(i,j)= true;
        end
    end
end
You'll notice that I have changed your matrix from zeros(20033,19) to false(20033,19). With zeros, the matrix is created in double precision, which uses 8 bytes per element. With false, it is a logical matrix instead, which uses 1 byte per element. Since you want BinaryMatrix to hold only true or false, don't use double; use logical. For a 20033x19 matrix (380,627 elements), that is about 3 MB as double versus about 380 KB as logical, cutting memory consumption by a factor of 8.
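The word-intersection test can be sketched roughly in Python (not MATLAB), with sets doing the work of intersect. A non-empty intersection means at least one shared word. Each entry of B is split once, up front. The name common_word_matrix and the sample strings are hypothetical:

```python
def common_word_matrix(A, B):
    """True at (i, j) when A[i] and B[j] share at least one space-separated word."""
    b_words = [set(s.split()) for s in B]      # split each B entry only once
    matrix = []
    for a in A:
        a_words = set(a.split())               # like strsplit(A{i})
        # A non-empty intersection plays the role of ~isempty(intersect(...)).
        matrix.append([bool(a_words & bw) for bw in b_words])
    return matrix

A = ["red apple", "green pear", "blue"]
B = ["apple pie", "pear"]
result = common_word_matrix(A, B)
print(result)  # [[True, False], [False, True], [False, False]]
```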
Minor Note
strsplit is only available in R2013a and later. If you have R2012b or earlier, replace strsplit with regexp; that is, replace the two strsplit lines in the loop with:
Asplit = regexp(A{i}, ' ', 'split');
Bsplit = regexp(B{j}, ' ', 'split');
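One difference between the two replacements is worth checking. strsplit collapses consecutive delimiters by default (its 'CollapseDelimiters' option is true), but regexp(str, ' ', 'split') does not. A double space therefore produces an empty '' token, and two strings that both contain a double space would "share" that empty word. The rough Python analogue below (not MATLAB) shows the effect. Splitting on the pattern '\s+' instead, e.g. regexp(A{i}, '\s+', 'split'), would avoid it:

```python
import re

s = "red  apple"                    # two spaces between the words
collapsed = s.split()               # ['red', 'apple']: runs of whitespace collapsed
literal = re.split(" ", s)          # ['red', '', 'apple']: empty token between the spaces
robust = re.split(r"\s+", s)        # ['red', 'apple']: a run of spaces is one delimiter

# Two strings that both contain a double space now "share" the empty word.
other = "green  pear"
print(set(literal) & set(re.split(" ", other)))   # {''}
```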

Resources