How do I combine elements in an array matching a pattern? - arrays

I have an array of strings
["123", "a", "cc", "dddd", "mi hello", "33"]
I want to join by a space consecutive elements that begin with a letter, have at least two characters, and do not contain a space. Applying that logic to the above would yield
["123", "a", "cc dddd", "mi hello", "33"]
Similarly if my array were
["mmm", "3ss", "foo", "bar", "foo", "55"]
I would want the result to be
["mmm", "3ss", "foo bar foo", "55"]
How do I do this operation?

There are many ways to solve this; Ruby is a highly expressive language. It would help if you showed what you have tried so far, so that we can debug, fix, or improve your attempt.
For example, here is one possible implementation that I came up with:
def combine_words(array)
array
.chunk {|string| string.match?(/\A[a-z][a-z0-9]+\z/i) }
.flat_map {|concat, strings| concat ? strings.join(' ') : strings}
end
combine_words(["aa", "b", "cde", "f1g", "hi", "2j", "l3m", "op", "q r"])
# => ["aa", "b", "cde f1g hi", "2j", "l3m op", "q r"]
Note that I was a little unclear exactly how to interpret your requirement:
begin with a letter, have at least two characters, and do not contain a space
Can strings contain punctuation? Underscores? Utf-8 characters? I took it to mean "only a-z, A-Z or 0-9", but you may want to tweak this.
A literal interpretation of your requirement could be: /\A[[:alpha:]][^ ]+\z/, but I suspect that's not what you meant.
Explanation:
Enumerable#chunk will iterate through the array and collect terms by the block's response value. In this case, it will find sequential elements that match/don't match the required regex.
String#match? checks whether the string matches the pattern and returns a boolean. Note that on Ruby 2.3 or earlier, where String#match? does not exist, you would need a workaround such as !!string.match to force a boolean result.
Enumerable#flat_map then loops through each "result", joining the strings if necessary, and flattens the result to avoid returning any nested arrays.
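To make the two steps concrete, here is a small sketch that exposes the intermediate result of chunk before flat_map is applied. The helper name word_chunks is made up for illustration and is not part of the solution above.

```ruby
# Hypothetical helper (illustration only): returns the raw [key, elements] pairs
# that Enumerable#chunk produces for the regex used above.
def word_chunks(array)
  array.chunk { |string| string.match?(/\A[a-z][a-z0-9]+\z/i) }.to_a
end

pairs = word_chunks(["aa", "b", "cde", "f1g", "hi", "2j"])
# => [[true, ["aa"]], [false, ["b"]], [true, ["cde", "f1g", "hi"]], [false, ["2j"]]]

# flat_map joins the "true" groups and splices the "false" groups back in unchanged.
pairs.flat_map { |concat, strings| concat ? strings.join(' ') : strings }
# => ["aa", "b", "cde f1g hi", "2j"]
```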
Here is another, similar, solution:
def word?(string)
string.match?(/\A[a-z][a-z0-9]+\z/i)
end
def combine_words(array)
array
.chunk_while {|x, y| word?(x) && word?(y)}
.map {|group| group.join(' ')}
end
Or, here's a more "low-tech" solution - which only uses more basic language features. (I'm re-using the same word? method here):
def combine_words(array)
previous_was_word = false
result = []
array.each do |string|
if previous_was_word && word?(string)
result.last << " #{string}"
else
result << string.dup # copy, so the << above never modifies the caller's strings
end
previous_was_word = word?(string)
end
result
end

You can use Enumerable#chunk.
def chunk_it(arr)
arr.chunk { |s|
(s.size > 1) && (s[0].match?(/\p{Alpha}/)) && !s.include?(' ')}.
flat_map { |tf,a| tf ? a.join(' ') : a }
end
chunk_it(["123", "a", "cc", "dddd", "mi hello", "33"])
#=> ["123", "a", "cc dddd", "mi hello", "33"]
chunk_it ["mmm", "3ss", "foo", "bar", "foo", "55"]
#=> ["mmm", "3ss", "foo bar foo", "55"]

Related

Sorting an Array with Sort and Sort_by

I am trying to build a mental model around how the code below works and it is confusing me. I am trying to sort a string so every duplicate letter is together, but the capital letter comes first. It is solved, the method below will do it, but I want to know why do you have to sort it first? Does it keep the same position from the first sort? So when you call sort_by it then sorts by lowercase but the capital letters stay where they originally were? Can anyone break down step by step what is happening so I can understand this better?
def alpha(str)
str.chars.sort.sort_by { |ele| ele.downcase }.join
end
alpha("AaaaaZazzz") == "AaaaaaZzzz"
Let's rewrite your method as follows.
def alpha(str)
sorted_chars_by_case = str.chars.sort
puts "sorted_chars_by_case = #{sorted_chars_by_case}"
sorted_chars_by_downcased = sorted_chars_by_case.sort_by(&:downcase)
puts "sorted_chars_by_downcased = #{sorted_chars_by_downcased}"
sorted_chars_by_downcased.join
end
Then:
alpha("AaaaaZazzz")
sorted_chars_by_case = ["A", "Z", "a", "a", "a", "a", "a", "z", "z", "z"]
sorted_chars_by_downcased = ["A", "a", "a", "a", "a", "a", "Z", "z", "z", "z"]
#=> "AaaaaaZzzz"
As you see, the first step, after converting the string to an array of characters, is to form an array whose first elements are upper-case letters, in order, followed by lower-case letters, also ordered.1 The second step is to sort sorted_chars_by_case without reference to case. That array is then joined to return the desired string, "AaaaaaZzzz".
While this gives the desired result, it is only happenstance that it does. A different sorting method could well have returned, say, "aaaaAazZzz", because "A" is treated the same as "a" in the second sort.
What you want is a two-level sort; sort by characters, case-indifferent, then when there are ties ("A" and "a", for example), sort the upper-case letter first. You can do that by sorting two-element arrays.
def alpha(str)
str.each_char.sort_by { |ele| [ele.downcase, ele] }.join
end
Then
alpha("AaaaaZazzz")
#=> "AaaaaaZzzz"
When sorting arrays the method Array#<=> is used to order two arrays. Note in particular the third paragraph of that doc.
If "A" and "z" are being ordered, for example, Ruby compares the arrays
a1 = ["a", "A"]
a2 = ["z", "z"]
As a1.first < a2.first #=> true, we see that a1 <=> a2 #=> -1, so "A" precedes "z" in the sort. Here a1.last and a2.last are not examined.
Now suppose "z" and "Z" are being ordered. Ruby compares the arrays
a1 = ["z", "z"]
a2 = ["z", "Z"]
As a1.first equals a2.first, a1.last and a2.last are compared to break the tie. Since "z" > "Z" #=> true, a1 <=> a2 #=> 1, so "Z" precedes "z" in the sort.
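The two comparisons above can be checked directly. This is only a sketch; sort_key is a made-up helper that builds the same two-element key as the sort_by block.

```ruby
# Hypothetical helper (illustration only): the key used by
# sort_by { |ele| [ele.downcase, ele] }.
def sort_key(char)
  [char.downcase, char]
end

sort_key("A") <=> sort_key("z")  # => -1, decided by the first elements ("a" < "z")
sort_key("z") <=> sort_key("Z")  # => 1, first elements tie, so "z" vs "Z" decides
sort_key("a") <=> sort_key("a")  # => 0, both elements are equal
```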
Note that I replaced str.chars with str.each_char. It's generally a small thing, but String#chars returns an array of characters, whereas String#each_char returns an enumerator, and therefore is more space-efficient.
Sometimes you need an array, and therefore you must use chars. An example is str.chars.rotate, where you are chaining to the Array method rotate, which enumerators do not provide. On the other hand, if you are chaining to an Enumerator method (a method of the class Enumerator), you must use each_char, an example being str.each_char.with_object([]) ....
Often, however, you have a choice: str.chars.sort, using Array#sort, or str.each_char.sort, using Enumerable#sort. In those situations each_char is preferred because of the reduced memory requirement. The rule, therefore, is to use chars when you are chaining to an Array method, otherwise use each_char.
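A quick way to see the difference is to inspect what each call returns. The string "banana" is just an example value chosen for this sketch.

```ruby
str = "banana"

str.chars.class      # => Array
str.each_char.class  # => Enumerator

# Either one can feed a sort, and the results are the same:
str.chars.sort == str.each_char.sort  # => true

# Array-only methods such as rotate are not available on the enumerator:
str.chars.respond_to?(:rotate)      # => true
str.each_char.respond_to?(:rotate)  # => false
```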
1. sort_by(&:downcase) can be thought of as shorthand for sort_by { |ele| ele.downcase }.
You can't depend on the stability of sort in Ruby
This is an interesting question. Whether or not a sort preserves the order of equal elements is its "stability." A sort is stable if it is guaranteed to preserve the order of equal elements, and unstable if it has no such guarantee. An unstable sort may by chance return equal elements in their original order, or not.
In MRI 2.7.1, sort happens to be stable, but it is actually implementation defined whether or not it is. See https://stackoverflow.com/a/44486562/238886 for all the juicy details, including code you can run in your Ruby to find out if your sort happens to be stable. But whether or not your sort is stable, you should not depend on it.
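As a rough sketch of such a check (this is not the code from the linked answer, and the helper name is made up): give every element the same sort key and see whether the original order survives. A false result proves the sort is unstable; a true result only shows that it kept the order this time.

```ruby
# Hypothetical helper (illustration only). Every element gets the same key,
# so a stable sort has to return the elements in their original order.
def sort_by_kept_order?(size = 10_000)
  items = (0...size).to_a
  items.sort_by { 0 } == items
end

sort_by_kept_order?  # true or false, depending on your Ruby version and platform
```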
A stable sort does indeed return the result you are expecting, and it does so whether or not you include the .sort:
2.7.1 :035 > "AaaaaZazzz".chars.sort_by { |ele| ele.downcase }.join
=> "AaaaaaZzzz"
2.7.1 :036 > "AaaaaZazzz".chars.sort.sort_by { |ele| ele.downcase }.join
=> "AaaaaaZzzz"
But you can make sort act stable when you need it to
In order not to depend upon the stability of the sort, which could change when you move your code to another Ruby version or implementation, you can enforce stability like this:
"AaaaaZazzz".chars.sort_by.with_index { |ele, i| [ele.downcase, i] }.join
=> "AaaaaaZzzz"
How an unstable sort behaves
We can make Ruby 2.7.1's sort behave like an unstable one by adding a random number as a secondary sort key:
2.7.1 :040 > "AaaaaZazzz".chars.sort.sort_by { |ele| [ele.downcase, rand] }.join
=> "AaaaaaZzzz"
2.7.1 :041 > "AaaaaZazzz".chars.sort.sort_by { |ele| [ele.downcase, rand] }.join
=> "aaaaAazzZz"
Note how we got the same answer as stable sort the first time, but then a different answer? That's a demonstration of how an unstable sort can, by chance, give you the same results as a stable sort. But you can't count on it.
In this code you first sort all characters based on the collating sequence of the underlying encoding, and then you sort them again in a way that treats upper- and lower-case characters as equivalent. This cancels the effect of the first sort. Hence the output is equivalent to str.chars.sort_by(&:downcase), which would IMO be a more sensible way to write the expression.
The first sort has no effect and is therefore just a cycle stealer. BTW: Since the stability of Ruby's sort is unspecified, and in particular MRI Ruby is known to be unstable, you have no control over the relative order of individual characters that are considered equivalent in sort order. Note also that the result depends on the locale, because this decides whether - for instance - the letters Б and б are considered the same in sort order or different.

How to match a string from an array at the beginning of another string

I want to case-insensitively match a string from my array, TOKENS, at the beginning of another string followed by a space or the end of the line.
My tokens array looks like:
2.4.0 :013 > TOKENS = ["m", "o"]
=> ["m", "o"]
When I try to match each element from my array, it is picking out the wrong results:
2.4.0 :009 > data_col = ["M", "b", "Mabc", "abc m b"]
=> ["M", "b", "Mabc", "abc m b"]
...
2.4.0 :015 > data_col.select{|string| string =~ /^[#{Regexp.union(TOKENS)}]([[:space:]]|$)/i }
=> ["M", "b"]
This is matching both the "M" and the "b" entries even though "b" does not appear in my list of TOKENS. How do I modify my regular expression so that only the proper value, "M" will be matched?
I'm using Ruby 2.4.
I'd use:
TOKENS = ["m", "o"]
DATA_COL = ["M", "b", "Mabc", "abc m b"]
RE = /^(?:#{Regexp.union(TOKENS).source})(?: |$)/i
DATA_COL.select{ |string| string[RE] }
# => ["M"]
Breaking it down a bit:
Regexp.union(TOKENS).source # => "m|o"
/^(?:#{Regexp.union(TOKENS).source})(?: |$)/i # => /^(?:m|o)(?: |$)/i
/^[#{Regexp.union(TOKENS)}]([[:space:]]|$)/i isn't a good idea inside a loop. Each time through you force Ruby to create the pattern. Efficiency is important inside loops, especially big ones, so create the pattern outside the loop, then refer to it inside.
The next problem is that Regexp.union has a concept of the correct case it should match:
Regexp.union(TOKENS).to_s # => "(?-mix:m|o)"
The (?-mix: part is how the Regular Expression engine remembers the flags for the pattern. When the pattern is embedded inside another pattern they continue to know what they should look for, causing us to gnash our teeth and weep:
/#{Regexp.union(TOKENS)}/i # => /(?-mix:m|o)/i
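The trailing i is telling the pattern it should ignore case, but the embedded i is not set so it's honoring case. And that's what is breaking your pattern. The square brackets in your pattern make matters worse: they turn the interpolated text into a character class, in which ?-m is a range of characters that happens to include "b".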
The fix is to use source when embedding like I did above.
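The difference between embedding the Regexp object and embedding its source can be seen side by side. This is a minimal sketch; token_patterns is a made-up helper, and String#match? needs Ruby 2.4 or newer.

```ruby
# Hypothetical helper (illustration only): builds the same anchored pattern twice,
# once interpolating the Regexp object and once interpolating its source string.
def token_patterns(tokens)
  union = Regexp.union(tokens)
  {
    regexp_embedded: /^(?:#{union})(?: |$)/i,        # becomes (?-mix:m|o), case-sensitive inside
    source_embedded: /^(?:#{union.source})(?: |$)/i  # becomes m|o, so the outer /i applies
  }
end

patterns = token_patterns(["m", "o"])
patterns[:regexp_embedded].match?("M")     # => false
patterns[:source_embedded].match?("M")     # => true
patterns[:source_embedded].match?("o k")   # => true
patterns[:source_embedded].match?("Mabc")  # => false
patterns[:source_embedded].match?("b")     # => false
```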
See the Regex "options" section for more information.

Split string to array with ruby

I have the string: "how to \"split string\" to \"following array\"" (how to "split string" to "following array").
I want to get the following array:
["how", "to", "split string", "to", "following array"]
I tried split(' ') but the result is:
["how", "to", "\"split", "string\"", "to", "\"following", "array\""]
x.split('"').reject(&:empty?).flat_map do |y|
y.start_with?(' ') || y.end_with?(' ') ? y.split : y
end
Explanation:
split('"') will partition the string in a way that non-quoted strings will have a leading or trailing space and the quoted ones wouldn't.
The following flat_map will further split an individual string by space only if it falls in the non-quoted category.
Note that if there are two consecutive quoted strings, the space in between will be its own string after the first split and will completely disappear after the second. That is:
'foo "bar" "baz"'.split('"') # => ["foo ", "bar", " ", "baz"]
' '.split # => []
The reject(&:empty?) is needed in case we start with a quoted string as
'"foo"'.split('"') # => ["", "foo"]
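For convenience, the same expression can be wrapped in a method and tried on the cases discussed above. The method name split_quoted is made up for this sketch. Note that an unquoted segment is only split when it has a leading or trailing space, so a string without any quotes at all comes back as a single element.

```ruby
# Hypothetical wrapper (illustration only) around the expression shown above.
def split_quoted(x)
  x.split('"').reject(&:empty?).flat_map do |y|
    # Unquoted segments carry a leading or trailing space; quoted ones do not.
    y.start_with?(' ') || y.end_with?(' ') ? y.split : y
  end
end

split_quoted('how to "split string" to "following array"')
# => ["how", "to", "split string", "to", "following array"]
split_quoted('foo "bar" "baz"')
# => ["foo", "bar", "baz"]
split_quoted('"foo"')
# => ["foo"]
```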
With x as your string:
x.split(?").each_slice(2).flat_map{|n, q| a = n.split; (a << q if q) || a }
When you split on quotes, you know for certain that each string in the array goes: non-quoted, quoted, non-quoted, quoted, non-quoted etc...
If we group these into pairs then we get one of the following two scenarios:
[ "non-quoted", "quoted" ]
[ "non-quoted", nil ] (only ever for the last pair of an unbalanced string)
In case 1, we split n and append q
In case 2, we split n and discard q
i.e.: a = n.split; (a << q if q) || a
Then we join all the pairs back up (the flat part of flat_map)
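The pairing step is easier to follow when the intermediate pairs are printed. In this sketch, quote_pairs and split_by_pairs are made-up names; the second one simply wraps the one-liner above.

```ruby
# Hypothetical helper (illustration only): the pairs produced by each_slice(2).
def quote_pairs(x)
  x.split('"').each_slice(2).to_a
end

quote_pairs('how to "split string" to "following array"')
# => [["how to ", "split string"], [" to ", "following array"]]

# A trailing unquoted part ends up alone in the last slice, so q is nil in the block.
quote_pairs('how to "split string" now')
# => [["how to ", "split string"], [" now"]]

# Hypothetical wrapper around the one-liner from the answer.
def split_by_pairs(x)
  x.split('"').each_slice(2).flat_map { |n, q| a = n.split; (a << q if q) || a }
end

split_by_pairs('how to "split string" now')
# => ["how", "to", "split string", "now"]
```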

How can i get the value from array without using regex in ruby?

From an array of strings I need to get the string that starts with age- followed by a number of at most 2 digits and an optional '+' sign.
Ex: age-1, age-22, age-55, age-1+, age-15+
Following is my array:
arr = ["vintage-colllections","age-5"]
or
arr = ["vintage-colllections","age-51+"]
I want to extract "age-5" or "age-51+" from the array.
I tried following things:
arr.find {|e| e.include?"age-"}
This works well for other scenarios, but here the first element of the array also includes age- (inside vintage-), so it fails.
arr.find { |e| /age-\d*\+?/ =~ e}
Works fine but I am trying to avoid regex.
Is there a better approach?
Any suggestions are welcome.
Use start_with?:
arr.find { |e| e.start_with?("age-") }
I must grit my teeth to not use a regex, but here goes. I assume the question is as described in a comment I left on the question.
def find_str(arr)
arr.map { |str| str_match(str) }.compact
end
def str_match(str)
return nil unless str[0,4] == "age-"
last = str[-1] == '+' ? -2 : -1
Integer(str[4..last]) rescue return nil
str
end
find_str ["cat", "age-5"] #=> ["age-5"]
find_str ["cat", "age-51+"] #=> ["age-51+"]
find_str ["cat", "age-5t1+"] #=> []
find_str ["cat", "xage-51+"] #=> []
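Neither answer enforces the "at most 2 digits" part of the question. A regex-free sketch that does, assuming the rule is exactly "age-", then one or two ASCII digits, then an optional single "+", could look like this (age_tag? is a made-up name):

```ruby
# Hypothetical helper (illustration only). Assumes the format is
# "age-" + one or two ASCII digits + an optional single trailing "+".
def age_tag?(str)
  return false unless str.start_with?("age-")
  digits = str[4..-1].chomp("+")  # drop the prefix and at most one trailing "+"
  digits.size.between?(1, 2) && digits.each_char.all? { |c| c.between?("0", "9") }
end

arr = ["vintage-colllections", "age-51+"]
arr.find { |e| age_tag?(e) }  # => "age-51+"
age_tag?("age-5")             # => true
age_tag?("age-123")           # => false
age_tag?("age-5t1+")          # => false
```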

Sort an array that has normal string elements and "number-like" string elements

I have an array that includes float-like strings like "4.5", and regular strings like "Hello". I want to sort the array so that regular strings come at the end and the float-like strings come before them and are sorted by their float value.
I did:
#arr.sort {|a,b| a.to_f <=> b.to_f }
arr = ["21.4", "world", "6.2", "1.1", "hello"]
arr.sort_by { |s| Float(s) rescue Float::INFINITY }
#=> ["1.1", "6.2", "21.4", "world", "hello"]
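With the one-liner above, every regular string gets the same key (Float::INFINITY), so the order among them is not pinned down. If a deterministic order is wanted, one possible variation is a two-element key. This sketch assumes Ruby 2.6 or newer for Float(s, exception: false), and it assumes that sorting the regular strings alphabetically is acceptable; sort_numbers_first is a made-up name.

```ruby
# Hypothetical variation (illustration only): float-like strings first, by value,
# then the remaining strings in alphabetical order.
def sort_numbers_first(arr)
  arr.sort_by do |s|
    f = Float(s, exception: false)  # nil when s is not float-like (Ruby 2.6+)
    f ? [0, f] : [1, s]             # the leading 0/1 keeps floats and strings apart
  end
end

sort_numbers_first(["21.4", "world", "6.2", "1.1", "hello"])
# => ["1.1", "6.2", "21.4", "hello", "world"]
```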
sort in ruby 1.9+
["1.2", "World", "6.7", "3.4", "Hello"].sort
will return
["1.2", "3.4", "6.7", "Hello", "World"]
You can use @cary's solution for certain edge cases, e.g. ["10.0","3.2","hey","world"]
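The edge case is that a plain sort compares the strings character by character, so "10.0" lands before "3.2". A short check, using the array from the sentence above:

```ruby
arr = ["10.0", "3.2", "hey", "world"]

# String comparison: "1" sorts before "3", so the numeric order is wrong.
arr.sort
# => ["10.0", "3.2", "hey", "world"]

# Comparing by float value puts the numbers in the expected order.
arr.sort_by { |s| Float(s) rescue Float::INFINITY }.first(2)
# => ["3.2", "10.0"]
```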
Quick and dirty:
arry = ["1", "world", "6", "21", "hello"]
# separate "number" strings from other strings
tmp = arry.partition { |x| Float(x) rescue nil }
# sort the "numbers" by their numeric value
tmp.first.sort_by!(&:to_f)
# join them all in a single array
tmp.flatten!
Will probably suit your needs
