storing the longest string after strsplit - arrays

I am trying to store the longest resulting string after calling strsplit, but I am unable to do so.
For example, I have input strings such as
'R.DQDEGNFRRFPTNAVSMSADENSPFDLSNEDGAVYQRD.L' or
'L.TSNKDEEQRELLKAISNLLD'
I need to store only the part of the string between the dots (.).
If there is no dot, then I want the entire string.
Each string may have zero, one or two dots.
Part of the code I am using:
for i=1:700
x=regexprep(txt(i,1), '\([^\(\)]*\)','');
y=(strsplit(char(x),'.'));
for j=1:3
yValues(1,j)=y{1,j};
end
end
But yValues is not storing the values of y; instead I get the following error:
Assignment has more non-singleton rhs dimensions than non-singleton subscripts
What am I doing wrong and are there any suggestions on how to fix it?

The issue is that y is a cell array and each element contains an entire string, so it can't be assigned to a single element of a normal array such as yValues(1,j).
You need yValues to be a cell array, and then you can assign into it just fine.
yValues{j} = y{j};
Or more simply
% Outside of your loop
yValues = cell(1,3);
% Then inside of your loop
yValues(j) = y(j);
Alternately, if you just want the longest output of strsplit, you can just do something like this.
% Split the string
parts = strsplit(mystring, '.');
% Find the length of each piece and figure out which piece was the longest
[~, ind] = max(cellfun(@numel, parts));
% Grab just the longest part
longest = parts{ind};

Related

Efficient way to filter an array based on element index

I have an array of arrays. So each element of my first array contains a comma separated list of values. If I use the split function, I can get an array from this comma separated list. What I need to do is filter out this second array based on element position. For example only keep columns one, three, five and nine.
One way to do this is to loop through my first array and split each element to get my second array. Then loop through this second array, incrementing a counter to track the current element index. If the counter equals one of the columns I want to keep, concatenate the element onto a string variable.
This is very inefficient and takes forever to run on large arrays. Does anyone have any ideas on a better way to do this? I hope I explained this clearly.
There are some built-in array actions like “Filter” and “Join”, as you mention, but for something this specific I imagine you'll need to call some code (e.g. an Azure Function) to do the manipulation quickly and return the result.
For the first loop I don't know of any alternative, but for the second loop, instead of looping through the second array, you can simply access the elements at the indices you require.
This assumes each split array is long enough to contain those indices.
string[] Arr1 = new string[] { "0_zero,0_One,0_Two,0_Three,0_Four,0_Five,0_six,0_seven,0_eight,0_nine",
"1_zero,1_One,1_Two,1_Three,1_Four,1_Five,1_six,1_seven,1_eight,1_nine" };
string myString = string.Empty;
foreach(var a in Arr1)
{
var sp = a.Split(',');
myString = string.Concat(myString, sp[1], sp[3], sp[5], sp[9]);
}
Console.WriteLine(myString); //gives "0_One0_Three0_Five0_nine1_One1_Three1_Five1_nine"
If we're not sure of the length of each string, we can use an if-else ladder that checks from the largest index we want down to the smallest, like so:
foreach(var a in Arr1)
{
var sp = a.Split(',');
int len = sp.Length;
if (len >= 10) myString= string.Concat(myString, sp[1], sp[3], sp[5], sp[9]);
else if (len >= 6) myString = string.Concat(myString, sp[1], sp[3], sp[5]);
else if (len >= 4) myString = string.Concat(myString, sp[1], sp[3]);
else if (len >= 2) myString = string.Concat(myString, sp[1]);
}
This way we avoid an IndexOutOfRangeException.

Matlab concatenate variable string without curly braces

I'm trying to concatenate a series of strings in a loop into a variable array but the resulting strings are always within curly braces. Why does this happen, and how can I concatenate the string without them? Thanks
subs = {'abc001' 'abc002' 'abc003' 'abc004'};
for i = 1:size(subs,2)
subject = subs(i);
files_in(i).test = strcat('/home/data/','ind/',subject,'/test_ind_',subject,'.mat');
end
files_in(1)
% ans =
% test: {'/home/data/ind/abc001/test_ind_abc001.mat'}
I would like it to be:
test: '/home/data/ind/abc001/test_ind_abc001.mat'
subs is a cell array. If you index it using () notation, you will also get a cell array.
a = {'1', '2', '3'};
class(a(1))
% cell
To get the string inside the cell array you need to use {} notation to index into it.
class(a{1})
% char
When you use strcat with cell arrays, the result will be a cell array. When you use it with strings, the result will be a string. So if we replace subs(i) with subs{k}, we get what you expect.
for k = 1:numel(subs)
subject = subs{k};
files_in(k).test = strcat('/home/data/ind/', subject, '/test_ind_', subject, '.mat');
end
A few side notes:
Don't use i as a variable. i and j are used in MATLAB to indicate sqrt(-1).
It is recommended to use fullfile to construct file paths rather than strcat.

Error when copying a word to array character by character

I'm trying to copy an unknown number of characters into an array, but I keep getting an error. I'm getting this from a website converted to text. sites holds the position of the first character of each word (I want to copy 4 words), and result is the whole text file.
I keep getting this error:
Subscript indices must either be real positive integers or logicals.
for this line: webget = result(sites(i)+n);
for i = 0:3; %for finding first 4
webget = 'p'; %placeholder
website = []; %blank
while strcmp(webget,' ') == 0;
for n = 0:150; %letter by letter, arbitrary search length
webget = result(sites(i)+n);
website = strcat(website,webget);
end
end
website(i) = website;
end
Could anyone help?
MATLAB array indices start at 1, not 0. On your first loop iteration i = 0, so your request for sites(0) is not valid.
Consider using i = 1:4.

Attempting to create cell array from individual string values, receive subscript indices must be either positive integers or logicals..?

Here is my code
%This function takes an array of 128Bvalue numbers and returns the equivalent binary
%values in a string phrase
function [PhrasePattern] = GetPatternForValuesFinal(BarcodeNumberValues)
load code128B.mat;
Pattern = {};
%The corresponding binary sequence for each barcode value is
%identified and fills the cell array
for roll = 1:length(BarcodeNumberValues);
BarcodeRow = str2double(BarcodeNumberValues{1,roll}) + 1;
Pattern{1,roll} = code128B{BarcodeRow,3};
end
%The individual patterns are then converted into a single string
PhrasePattern = strcat(Pattern{1,1:length(Pattern)});
end
The intention of the function is to use an array of numbers, and convert them into a cell array of corresponding binary string values, then concatenate these strings.
The error comes from line 11 column 26,
Pattern{1,roll} = code128B{BarcodeRow,3}; subscript indices must be either positive integers or logicals
Does this mean I can't create a cell array of strings..?
You can create a cell array of strings.
The problem in your code is that BarcodeRow is not a positive integer. Your input BarcodeNumberValues is likely the problem. Make sure it does not contain decimal values.
As chappjc pointed out, the problem is probably that BarcodeNumberValues contains non-integer or non-positive values.
The following code tries to convert BarcodeNumberValues into an integer and aborts if it doesn't work.
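% Parse one integer; nextindex is the position just past the parsed text
[BarcodeRow, ~, ~, nextindex] = sscanf(BarcodeNumberValues{1,roll}, '%d', 1);
assert(nextindex == length(BarcodeNumberValues{1,roll}) + 1); % whole string was parsed
BarcodeRow = BarcodeRow + 1;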

Search Algorithm with Incomplete Input

I need an algorithm which will search an array for a string, but the string may not be exactly the same as one of the items in the array.
For example,
Array = {"Stack", "Over", "Flow", "Stake"}
input = "Sta"
It will need to recognize that Stack and Stake both match the parameters and then choose the one which is first in alphabetical order.
How can I do this?
I would use a List and run binarySearch on it.
List<String> arr = new ArrayList<>();
Then add the elements. While adding them, you can do the following to keep the list sorted:
int x = Collections.binarySearch(arr, key);
if(x < 0)
arr.add(-x-1, key);
// for n elements the searches take O(n log n) time in total
You can then do a binary search in the list. If the result of binarySearch is >= 0, the key exists in your list; otherwise (-x-1) is the position where the key would be inserted. From there, go through each element that begins with the input string.
For example, arr is your array and you are searching for input.
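List<String> arr = Arrays.asList("Flow", "Over", "Stack", "Stake"); // must be sorted
String input = "Sta";
int x = Collections.binarySearch(arr, input);
if (x < 0)
    x = -x - 1; // insertion point: index of the first element greater than input
// startsWith avoids an exception when the element is shorter than input
if (x < arr.size() && arr.get(x).startsWith(input))
    System.out.println(arr.get(x));
else
    System.out.println("there is no element starting with input string");
Time complexity is O(log n), where n is the array's length.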
Loop over the sorted array, compute the Levenshtein distance between each string and your target string, and if it is sufficiently small, return.
What constitutes "sufficiently small" is up to you. You'll probably have to do some testing.
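A minimal sketch of this Levenshtein idea in Java. The class name FuzzySearch, the method names and the maxDistance threshold are illustrative assumptions, not part of the original answer. Note that the distance between a short input such as "Sta" and "Stack" is 2 (the two missing letters), so the threshold has to leave room for the missing suffix.

```java
import java.util.Arrays;
import java.util.List;

class FuzzySearch {
    // Dynamic-programming Levenshtein distance, keeping only two rows of the table.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the first element of the sorted list within maxDistance of input, or null.
    static String firstClose(List<String> sorted, String input, int maxDistance) {
        for (String s : sorted) {
            if (levenshtein(s, input) <= maxDistance) return s;
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> sorted = Arrays.asList("Flow", "Over", "Stack", "Stake");
        System.out.println(firstClose(sorted, "Sta", 2));
    }
}
```

Because the list is scanned in sorted order, the first hit is also the alphabetically first one. Unlike a prefix test, this also tolerates typos in the input, at the cost of an O(n) scan.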
Simply loop through each element in the array and compare it to the input, determining if the input is contained in the element. Remove any element that does not meet this prerequisite. Finally go through the remaining elements and pick the one that is first alphabetically.
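This linear scan could be sketched in Java as follows. The class and method names are hypothetical. Instead of removing non-matching elements and then sorting, the sketch tracks the best match in a single pass, which should give the same result.

```java
import java.util.Arrays;
import java.util.List;

class LinearContainsSearch {
    // Returns the alphabetically first element that contains input, or null if none does.
    static String firstMatch(List<String> items, String input) {
        String best = null;
        for (String s : items) {
            // Keep s only if it matches and sorts before the best match so far.
            if (s.contains(input) && (best == null || s.compareTo(best) < 0)) {
                best = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("Stack", "Over", "Flow", "Stake");
        System.out.println(firstMatch(items, "Sta"));
    }
}
```

The input list does not need to be sorted. String.compareTo is case-sensitive, so mixed-case data may need String.CASE_INSENSITIVE_ORDER instead.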
Loop through all the index values of the array and find the substring match of the input. Find all the matches and print the one whose index value is the lowest.
For example, you will find a substring match for Array[0] and Array[3], so you have two matches, at indices 0 and 3. Compare the letter that follows the matched substring: at Array[0] the letter after Sta is 'c', but at Array[3] it is 'k'. Since c < k, the output is Array[0].
You may find the trie data structure useful. It is very efficient for finding all the words you need.
But the memory overhead can be significant if you have many words in the list.
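A small trie sketch in Java, assuming the goal is the alphabetically first word with a given prefix. The PrefixTrie class and its methods are made up for illustration. Children are kept in a TreeMap so that always following the smallest character leads to the smallest word.

```java
import java.util.Map;
import java.util.TreeMap;

class PrefixTrie {
    private static final class Node {
        // TreeMap keeps children ordered by character.
        final TreeMap<Character, Node> children = new TreeMap<>();
        boolean isWord;
    }

    private final Node root = new Node();

    void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isWord = true;
    }

    // Returns the alphabetically first stored word starting with prefix, or null.
    String firstWithPrefix(String prefix) {
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) return null; // no stored word has this prefix
        }
        StringBuilder sb = new StringBuilder(prefix);
        // A word ending here sorts before any longer word that shares the prefix.
        while (!node.isWord) {
            Map.Entry<Character, Node> first = node.children.firstEntry();
            if (first == null) return null; // only reachable on an empty trie
            sb.append(first.getKey());
            node = first.getValue();
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        PrefixTrie trie = new PrefixTrie();
        for (String w : new String[] {"Stack", "Over", "Flow", "Stake"}) trie.insert(w);
        System.out.println(trie.firstWithPrefix("Sta"));
    }
}
```

A lookup costs time proportional to the length of the returned word, independent of how many words are stored. The per-node maps are where the memory overhead mentioned above comes from.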
