How to split string based on pre-defined values from Array


I want to split a string based on an array that I define as a constant at the start:
class Query
OPERATOR = [':','=','<','>','<=','>=']
def initialize(params)
# Here I want to split each of the given params if it contains
# any of the operators from OPERATOR
end
end
Query.new(["Status<=xyz","Org=abc"])
How can I do this?

OPERATOR = ['<=','>=',':','=','<','>']
r = /\s*#{ Regexp.union(OPERATOR) }\s*/
#=> /\s*(?-mix:<=|>=|:|=|<|>)\s*/
str = "Now: is the =time for all <= to =>"
str.split(r)
#=> ["Now", "is the", "time for all", "to"]
Note that I reordered the elements of OPERATOR so that the two-character operators '<=' and '>=' (each made up of two one-character strings that are also in the array) come first. If that is not done, a one-character alternative matches first and the second character is then treated as a separate delimiter, leaving an empty string between them:
OPERATOR = [':','=','<','>','<=','>=']
r = /\s*#{ Regexp.union(OPERATOR) }\s*/
#=> /\s*(?-mix::|=|<|>|<=|>=)\s*/
str.split(r)
#=> ["Now", "is the", "time for all", "", "to"]
See Regexp::union.
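To tie this back to the Query class in the question, here is a minimal sketch. It assumes each parameter contains at most one operator and that you want to keep the operator alongside the two sides. The names OPERATOR_PATTERN and conditions are made up for illustration. It uses String#partition rather than split so that the matched operator is retained.

```ruby
class Query
  OPERATOR = [':', '=', '<', '>', '<=', '>='].freeze

  # Longest operators first, so '<=' is tried before '<' and '='.
  # (OPERATOR_PATTERN is a hypothetical name, not from the question.)
  OPERATOR_PATTERN = Regexp.union(OPERATOR.sort_by { |op| -op.size })

  attr_reader :conditions

  def initialize(params)
    # partition returns [text before the match, the match, text after it]
    @conditions = params.map do |param|
      param.partition(OPERATOR_PATTERN).map(&:strip)
    end
  end
end

query = Query.new(["Status<=xyz", "Org=abc"])
query.conditions
#=> [["Status", "<=", "xyz"], ["Org", "=", "abc"]]
```

If a parameter contains no operator, partition returns the whole string followed by two empty strings, so you may want to validate for that case.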

Related

Ruby - return boolean for matching one or both elements in an array element

I have an array that consists of strings with text separated by commas. I need to return a boolean that indicates if it is empty or if one or both of two other strings is the only value contained in the array element.
text1 = 'John'
text2 = 'Doe'
array1['element'] = 'John, Doe' #true
array2['element'] = 'Bob, Buck' #false
array3['element'] = 'John, Buck' #false
array4['element'] = 'John' #true
array5['element'] = 'John, John' #true
array6['element'] = '' #true
I can match one at a time or an empty element, but I'm not sure how to handle making sure only my matches are included and not other text.
foo = 'John,Doe,Buck'
if foo['John']
foo <= 'Set to Repeat'
elsif foo['Doe']
foo <= 'Set to Repeat'
elsif foo['John,Doe']
foo <= 'Set to Repeat'
elsif foo['']
foo <= 'Set to Repeat'
else foo
end
Using this code I get a match, but I need to reject it because of the presence of 'Buck'.

The question is a bit confusing because array1['element'] is invalid if array1 is an array. The method Array#[] must take one integer, two integers or a range as its argument(s). Giving it a string argument will raise an exception: TypeError (no implicit conversion of String into Integer).
Suppose:
text = ['John', 'Doe']
arr = ['John, Doe', 'Bob, Buck', 'John , Buck', 'John', 'John, John', '']
and you wish to know which elements of arr (strings) contain only the words in the array text. You can do that as follows.
arr.map { |s| s.split(/ *, +/).all? { |ss| text.include?(ss) } }
#=> [true, false, false, true, true, true]
If, for example, s = 'Bob, Buck', then
s.split(/ *, +/)
#=> ["Bob", "Buck"]
Similarly,
'Bob'.split(/ *, +/)
#=> ["Bob"]
''.split(/ *, +/)
#=> []
See String#split, Array#all? and Array#include?. The regular expression / *, +/ reads, "match zero or more (*) spaces followed by a comma followed by one or more (+) spaces". (If 'John,Doe' is to be permitted use / *, */.)
Alternatively, one could write
arr.map { |s| s.split(',').all? { |ss| text.include?(ss.strip) } }
#=> [true, false, false, true, true, true]
Here
'John , Buck'.split(',')
#=> ["John ", " Buck"]
but then
"John ".strip
#=> "John"
" Buck".strip
#=> "Buck"
See String#strip.
You may wonder why I used Array#all? when its receiver is an array containing at most two elements (e.g., a = ['John', 'Doe']). It is simply because it is easier to regard a as an array of arbitrary size than to have one statement that requires text to include a[0] and another that requires text to include a[1].
Lastly, another variant would be to use String#scan:
arr.map { |s| s.scan(/[^, ]+/).all? { |ss| text.include?(ss) } }
#=> [true, false, false, true, true, true]
scan takes an argument that is a regular expression that reads, "match one or more (+) characters, each of which is a character other than (^) a comma or a space". The brackets denote a character class, meaning that a character must match any character in the class. The ^ at the beginning of the class definition means "other than the characters that follow".
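As a further variation (not part of the answer above), the same check can be sketched with array difference instead of all?. The helper name only_allowed? is hypothetical, and the sketch assumes names are separated by commas with optional surrounding spaces.

```ruby
text = ['John', 'Doe']
arr  = ['John, Doe', 'Bob, Buck', 'John , Buck', 'John', 'John, John', '']

# Hypothetical helper: true when every comma-separated name in str
# appears in allowed. An empty string has no names, so it returns true.
def only_allowed?(str, allowed)
  names = str.split(',').map(&:strip)
  (names - allowed).empty?
end

results = arr.map { |s| only_allowed?(s, text) }
#=> [true, false, false, true, true, true]
```

Array#- removes every element that also appears in allowed, so anything left over is a name that was not permitted.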

Working with Transpose functions result in error

Consider the following array:
arr = [["Locator", "Test1", "string1","string2","string3","string4"],
["$LogicalName", "Create Individual Contact","value1","value2"]]
Desired result:
{"Test1"=>{"string1"=>"value1","string2"=>"value2","string3"=>"","string4"=>""}}
When I call transpose, it raises an error saying that the second inner array is not the same length as the first:
Uncaught exception: element size differs (2 should be 4)
Is there any way to add empty strings where elements are missing, so that I can perform the transpose and then create the hash shown above? The array may contain many inner arrays of different lengths; each one should be padded with empty strings to the size of the first inner array before transposing.

It sounds like you might want Array#zip, which pads shorter rows with nil instead of raising:
headers, *data_rows = arr
headers.zip(*data_rows)
# => [["Locator", "$LogicalName"], ["Test1", "Create Individual Contact"],
# ["string1", "value1"], ["string2", "value2"], ["string3", nil], ["string4", nil]]
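To get from the zipped pairs to the hash the question asks for, something like the following sketch could work. It assumes, as in the sample data, that the first two columns hold the labels and that the outer key is the second header ("Test1"). The variable names pairs, inner and result are illustrative.

```ruby
arr = [["Locator", "Test1", "string1", "string2", "string3", "string4"],
       ["$LogicalName", "Create Individual Contact", "value1", "value2"]]

keys, vals = arr

# Skip the two label columns, then pair each remaining key with its value.
# zip fills missing values with nil.
pairs = keys.drop(2).zip(vals.drop(2))

# nil.to_s is "", which turns the nil padding into empty strings.
inner = pairs.map { |k, v| [k, v.to_s] }.to_h

result = { keys[1] => inner }
#=> {"Test1"=>{"string1"=>"value1", "string2"=>"value2", "string3"=>"", "string4"=>""}}
```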

If you wish to transpose an array of arrays, each element of the array must be the same size. Here you would need to do something like the following.
arr = [["Locator", "Test1", "string1","string2","string3","string4"],
["$LogicalName", "Create Individual Contact","value1","value2"]]
keys, vals = arr
#=> [["Locator", "Test1", "string1", "string2", "string3", "string4"],
# ["$LogicalName", "Create Individual Contact", "value1", "value2"]]
idx = keys.index("Test1") + 1
#=> 2
{ "Test1" => [keys[idx..-1],
vals[idx..-1].
concat(['']*(keys.size - vals.size))].
transpose.
to_h }
#=> {"Test1"=>{"string1"=>"value1", "string2"=>"value2", "string3"=>"", "string4"=>""}}
It is not strictly necessary to define the variables keys and vals, but that avoids the need to create those arrays multiple times. It reads better as well, in my opinion.
The steps are as follows. Note keys.size #=> 6 and vals.size #=> 4.
a = vals[idx..-1]
#=> vals[2..-1]
#=> ["value1", "value2"]
b = [""]*(keys.size - vals.size)
#=> [""]*(6 - 4)
#=> ["", ""]
c = a.concat(b)
#=> ["value1", "value2", "", ""]
d = keys[idx..-1]
#=> ["string1", "string2", "string3", "string4"]
e = [d, c].transpose
#=> [["string1", "value1"], ["string2", "value2"], ["string3", ""], ["string4", ""]]
f = e.to_h
#=> {"string1"=>"value1", "string2"=>"value2", "string3"=>"", "string4"=>""}
{ "Test1" => f }
#=> {"Test1"=>{"string1"=>"value1", "string2"=>"value2", "string3"=>"", "string4"=>""}}

Find the longest inner array and make sure every other inner array has the same length: loop over them and append (max_length - element.length) empty strings ("") to each.
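The padding idea can be sketched as follows. This is only an illustration, using the sample array from the question; width and padded are made-up names.

```ruby
arr = [["Locator", "Test1", "string1", "string2", "string3", "string4"],
       ["$LogicalName", "Create Individual Contact", "value1", "value2"]]

# Length of the longest inner array.
width = arr.map(&:size).max

# Append as many empty strings as each row is missing.
# Array#+ builds a new array, so arr itself is left unchanged.
padded = arr.map { |row| row + [''] * (width - row.size) }

transposed = padded.transpose
#=> [["Locator", "$LogicalName"], ["Test1", "Create Individual Contact"],
#    ["string1", "value1"], ["string2", "value2"], ["string3", ""], ["string4", ""]]
```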

How do I combine elements in an array matching a pattern?

I have an array of strings
["123", "a", "cc", "dddd", "mi hello", "33"]
I want to join by a space consecutive elements that begin with a letter, have at least two characters, and do not contain a space. Applying that logic to the above would yield
["123", "a", "cc dddd", "mi hello", "33"]
Similarly if my array were
["mmm", "3ss", "foo", "bar", "foo", "55"]
I would want the result to be
["mmm", "3ss", "foo bar foo", "55"]
How do I do this operation?

There are many ways to solve this; Ruby is a highly expressive language. It would be most beneficial for you to show what you have tried so far, so that we can help debug/fix/improve your attempt.
For example, here is one possible implementation that I came up with:
def combine_words(array)
array
.chunk {|string| string.match?(/\A[a-z][a-z0-9]+\z/i) }
.flat_map {|concat, strings| concat ? strings.join(' ') : strings}
end
combine_words(["aa", "b", "cde", "f1g", "hi", "2j", "l3m", "op", "q r"])
# => ["aa", "b", "cde f1g hi", "2j", "l3m op", "q r"]
Note that I was a little unclear exactly how to interpret your requirement:
begin with a letter, have at least two characters, and do not contain a space
Can strings contain punctuation? Underscores? UTF-8 characters? I took it to mean "only a-z, A-Z or 0-9", but you may want to tweak this.
A literal interpretation of your requirement could be: /\A[[:alpha:]][^ ]+\z/, but I suspect that's not what you meant.
Explanation:
Enumerable#chunk will iterate through the array and collect terms by the block's response value. In this case, it will find sequential elements that match/don't match the required regex.
String#match? checks whether the string matches the pattern, and returns a boolean response. Note that if you were using Ruby 2.3 or below, you'd have needed a workaround such as !!string.match to force a boolean response.
Enumerable#flat_map then loops through each "result", joining the strings if necessary, and flattens the result to avoid returning any nested arrays.
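To make the intermediate steps concrete, here is a small trace using the first array from the question and the same regex as above. The variable names are illustrative only.

```ruby
words = ["123", "a", "cc", "dddd", "mi hello", "33"]

# chunk groups consecutive elements by the block's return value.
chunks = words.chunk { |s| s.match?(/\A[a-z][a-z0-9]+\z/i) }.to_a
#=> [[false, ["123", "a"]], [true, ["cc", "dddd"]], [false, ["mi hello", "33"]]]

# Join the groups flagged true; pass the others through unchanged.
combined = chunks.flat_map { |is_word, group| is_word ? group.join(' ') : group }
#=> ["123", "a", "cc dddd", "mi hello", "33"]
```

Note that "a" lands in a false group because the pattern requires at least two characters.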
Here is another, similar, solution:
def word?(string)
string.match?(/\A[a-z][a-z0-9]+\z/i)
end
def combine_words(array)
array
.chunk_while {|x, y| word?(x) && word?(y)}
.map {|group| group.join(' ')}
end
Or, here's a more "low-tech" solution - which only uses more basic language features. (I'm re-using the same word? method here):
def combine_words(array)
previous_was_word = false
result = []
array.each do |string|
if previous_was_word && word?(string)
result.last << " #{string}"
else
result << string.dup # copy, so the << above never mutates the caller's strings
end
previous_was_word = word?(string)
end
result
end

You can use Enumerable#chunk.
def chunk_it(arr)
arr.chunk { |s|
(s.size > 1) && (s[0].match?(/\p{Alpha}/)) && !s.include?(' ')}.
flat_map { |tf,a| tf ? a.join(' ') : a }
end
chunk_it(["123", "a", "cc", "dddd", "mi hello", "33"])
#=> ["123", "a", "cc dddd", "mi hello", "33"]
chunk_it ["mmm", "3ss", "foo", "bar", "foo", "55"]
#=> ["mmm", "3ss", "foo bar foo", "55"]

How to split string without defined delimiter

I have a string that looks like this:
bar = "Bar 01/12/15"
foo = "Foo02/15/87"
How can I split those variables so that the resulting arrays contain:
bar_array = ["Bar", "01/12/15"]
foo_array = ["Foo","02/15/87"]

r = /(?<=[[:alpha:]]) ?(?=\d)/
"Bar 01/12/15".split(r)
#=> ["Bar", "01/12/15"]
"Foo02/15/87".split(r)
#=> ["Foo", "02/15/87"]
The regular expression reads
match a letter in a positive lookbehind
match 0 or 1 spaces
match a digit in a positive lookahead
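The same pattern can also be written in free-spacing mode (the x flag), which lets each part carry its own comment. This is just a sketch of an equivalent spelling; SPLIT_AT_DATE is a made-up name. In this mode a literal space has to be escaped.

```ruby
SPLIT_AT_DATE = /
  (?<=[[:alpha:]])  # positive lookbehind: previous character is a letter
  \ ?               # zero or one literal space (escaped because of the x flag)
  (?=\d)            # positive lookahead: next character is a digit
/x

"Bar 01/12/15".split(SPLIT_AT_DATE)
#=> ["Bar", "01/12/15"]
"Foo02/15/87".split(SPLIT_AT_DATE)
#=> ["Foo", "02/15/87"]
```

Because the lookbehind and lookahead consume no characters, only the optional space is removed by the split.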

If your string always ends with a date in mm/dd/yy format, you can write a method that takes the last 8 characters as the date and returns both values (the remaining string and the date) as an array, something like this:
def to_array(string)
date = string[-8..-1]
# Slice off the date instead of calling String#delete, which removes every
# occurrence of the given characters rather than the substring itself.
[string[0...-8].strip, date]
end
to_array(bar)
#=> ["Bar", "01/12/15"]
to_array(foo)
# => ["Foo", "02/15/87"]

Given that the provided string (as in your examples):
Contains one word and a date (with zero or more spaces between them)
The date is formed with 8 characters (i.e. ##/##/## format)
The date is at the end of the string
You could do the following:
bar.sub(/(.{8})\z/, ' \1').split
#=> ["Bar", "01/12/15"]
sub(/(.{8})\z/, ' \1') will add a space before the date
split will split the string where a space (or more) is found

regex works
"a,b'c d".split /\s|'|,/
# => ["a", "b", "c", "d"]
here's some documentation on regular expressions
http://rubylearning.com/satishtalim/ruby_regular_expressions.html

Your variable bar = "Bar 01/12/15" includes a space " ".
If the variable foo also included a space, as in foo = "Foo 02/15/87",
you could just call .split on each string without passing a delimiter.
For bar it will return ["Bar", "01/12/15"] (remember to assign the result to your variable bar_array).
However, if you have a string like "1,2,3", you need to pass the delimiter ",": "1,2,3".split(",") returns ["1","2","3"]. Without it, split returns ["1,2,3"].

How about a regex to match the date form and whatever is before it:
bar = "Bar 01/12/15"
foo = "Foo02/15/87"
pattern = /^(.*?)([0-9]{2}\/[0-9]{2}\/[0-9]{2})/
bar.scan(pattern).flatten.map(&:strip)
#=> ["Bar", "01/12/15"]
foo.scan(pattern).flatten.map(&:strip)
#=> ["Foo", "02/15/87"]

Split string to array with ruby

I have the string: "how to \"split string\" to \"following array\"" (how to "split string" to "following array").
I want to get the following array:
["how", "to", "split string", "to", "following array"]
I tried split(' ') but the result is:
["how", "to", "\"split", "string\"", "to", "\"following", "array\""]

x.split('"').reject(&:empty?).flat_map do |y|
y.start_with?(' ') || y.end_with?(' ') ? y.split : y
end
Explanation:
split('"') will partition the string in a way that non-quoted strings will have a leading or trailing space and the quoted ones wouldn't.
The following flat_map will further split an individual string by space only if it falls in the non-quoted category.
Note that if there are two consecutive quoted strings, the space in between will be its own string after the first split and will completely disappear after the second. That is:
'foo "bar" "baz"'.split('"') # => ["foo ", "bar", " ", "baz"]
' '.split # => []
The reject(&:empty?) is needed in case we start with a quoted string as
'"foo"'.split('"') # => ["", "foo"]

With x as your string:
x.split(?").each_slice(2).flat_map{|n, q| a = n.split; (a << q if q) || a }
When you split on quotes, you know for certain that each string in the array goes: non-quoted, quoted, non-quoted, quoted, non-quoted etc...
If we group these into pairs then we get one of the following two scenarios:
[ "non-quoted", "quoted" ]
[ "non-quoted", nil ] (only ever for the last pair of an unbalanced string)
For scenario 1, we split n and append q
For scenario 2, we split n and discard q
i.e.: a = n.split; (a << q if q) || a
Then flat_map flattens the per-pair arrays into a single array (the flat part of flat_map)
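Here is the same approach broken into steps, as a sketch using the string from the question. The intermediate variable names are illustrative, and the input is assumed not to contain escaped quotes inside quoted sections.

```ruby
x = 'how to "split string" to "following array"'

# Splitting on the quote character alternates non-quoted and quoted parts.
parts = x.split('"')
#=> ["how to ", "split string", " to ", "following array"]

# Pair each non-quoted part with the quoted part that follows it.
pairs = parts.each_slice(2).to_a
#=> [["how to ", "split string"], [" to ", "following array"]]

# Split the non-quoted half on whitespace and keep the quoted half whole.
result = pairs.flat_map { |n, q| q ? (n.split << q) : n.split }
#=> ["how", "to", "split string", "to", "following array"]
```

When the string ends with non-quoted text, the last pair has no second element, so q is nil and only the split words are kept.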