Split by multiple delimiters in Ruby - arrays

I need to make an array from a string, and I have to use multiple delimiters (apart from the space):
! # $ # % ^ & * ( ) - = _ + [ ] : ; , . / < > ? \ |
I read here and here, and the solution seems to be to use:
my_string.split(/[\s!#$#%^&*()-=_+[]:;,./<>?\|]/)
This is the exercise:
Given a sentence, return an array containing every other word.
Punctuation is not part of the word unless it is a contraction.
In order to not have to write an actual language parser, there won't be any punctuation too complex.
There will be no " ' " that is not part of a contraction.
Assume each of these characters is not to be considered:
! # $ # % ^ & * ( ) - = _ + [ ] : ; , . / < > ? \ |
Examples:
alternate_words("Lorem ipsum dolor sit amet.") # => ["Lorem", "dolor", "amet"]
alternate_words("Can't we all get along?") # => ["Can't", "all", "along"]
alternate_words("Elementary, my dear Watson!") # => ["Elementary", "dear"]
This is how I'm trying to do it:
def every_other_word(sentence)
my_words = []
words = sentence.split(/[\s!#$^&*()-=_+[\]:;,.\/#%<>?\|]/)
words.each_with_index do |w, i|
next if i.odd?
my_words << w
end
my_words
end
This is the error I get:
$ ruby ./session2/3-challenge/7_array.rb
./session2/3-challenge/7_array.rb:14: premature end of char-class: /[\s!#$^&*()-=_+[\]:;,.\/#%<>?\|]/

Most of the mentioned delimiting characters have special meaning in regular expression literals. For example, ] isn't the ] character but the end of a character class. The linked page should list all of them and explain their meaning.
Those characters need to be escaped in regular expression literals by preceding each with \. In this character class, -, [, ], / and \ need to be escaped (^ would need escaping only if it were the first character, and - needs it here because it is neither the first nor the last character):
/[\s!#$#%^&*()\-=_+\[\]:;,.\/<>?\\|]/
You can also let Ruby do the work with Regexp.escape (aka Regexp.quote). It escapes every special character but the resulting regular expression will be equivalent:
escaped_characters = Regexp.escape('!#$#%^&*()-=_+[]:;,./<>?\|')
/[\s#{escaped_characters}]/
By the way, \s isn't just a space, as it is within double-quoted string literals (a weird feature); in a regular expression it also matches the other ASCII whitespace characters (\n, \t, \r, \f and \v).
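To tie this back to the exercise, here is a minimal sketch of alternate_words built on the Regexp.escape approach. The constant names PUNCTUATION and DELIMITERS are my own, and the duplicate # from the list is dropped because it changes nothing inside a character class.

```ruby
# Characters from the exercise that should not count as part of a word.
# Single quotes keep the backslash literal; the apostrophe is deliberately
# absent so contractions such as "Can't" stay intact.
PUNCTUATION = '!#$%^&*()-=_+[]:;,./<>?\|'

# One or more whitespace or punctuation characters act as a single delimiter.
DELIMITERS = /[\s#{Regexp.escape(PUNCTUATION)}]+/

def alternate_words(sentence)
  sentence
    .split(DELIMITERS)
    .reject(&:empty?)  # a leading delimiter would yield an empty first field
    .each_slice(2)     # pair the words up
    .map(&:first)      # keep the first word of every pair
end

p alternate_words("Lorem ipsum dolor sit amet.")  # => ["Lorem", "dolor", "amet"]
p alternate_words("Can't we all get along?")      # => ["Can't", "all", "along"]
p alternate_words("Elementary, my dear Watson!")  # => ["Elementary", "dear"]
```

The trailing + in DELIMITERS is a design choice: it lets ", " count as one delimiter, so no empty strings appear between words.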

You have been told there are no apostrophes outside contractions and that you are to disregard:
BADDIES = '!#$#%^&*()-=_+[]:;,./<>?\|'
so why not:
remove the BADDIES with String#delete;
split the string into words with String#split;
group the words in pairs with Enumerable#each_slice; and
select the first word of each pair with Enumerable#first and Enumerable#map.
We can write:
str = "Now is the time for all good Rubyists to come to the aid of their " +
"fellow coders (except for Bob)! Is that not true?"
str.delete(BADDIES).split.each_slice(2).map(&:first)
#=> ["Now", "the", "for", "good", "to", "to", "aid", "their",
# "coders", "for", "Is", "not"]
Look, Ma! No regex!
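One caveat worth knowing before reusing this: String#delete does not take a plain list of characters. Like String#count and String#tr, it reads c1-c2 as a range, a leading ^ as negation, and \ as an escape. In BADDIES the sequence )-= is therefore a range, and that range happens to cover the digits. The sketch below shows the effect and one possible workaround; SAFE_BADDIES is a name I made up, and putting - first and \ last is just one way to keep both literal.

```ruby
BADDIES = '!#$#%^&*()-=_+[]:;,./<>?\|'

# ")-=" is read as the range ")".."=", which includes "0".."9",
# so digits silently disappear along with the punctuation.
p "Room 42!".delete(BADDIES)         # => "Room "

# Hypothetical variant: "-" comes first, so it cannot form a range, and the
# backslash comes last, where it is taken literally instead of as an escape.
SAFE_BADDIES = '-!#$%^&*()=_+[]:;,./<>?|\\'

p "Room 42!".delete(SAFE_BADDIES)    # => "Room 42"
p "well-known".delete(SAFE_BADDIES)  # => "wellknown"
```

The example sentence in the answer contains no digits, which is why the problem does not show up there.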

Related

How to split string without defined delimiter

I have a string that looks like this:
bar = "Bar 01/12/15"
foo = "Foo02/15/87"
How can I split those strings so that the resulting arrays contain:
bar_array = ["Bar", "01/12/15"]
foo_array = ["Foo","02/15/87"]
r = /(?<=[[:alpha:]]) ?(?=\d)/
"Bar 01/12/15".split(r)
#=> ["Bar", "01/12/15"]
"Foo02/15/87".split(r)
#=> ["Foo", "02/15/87"]
The regular expression reads
match a letter in a positive lookbehind
match 0 or 1 spaces
match a digit in a positive lookahead
If your string always ends with a date in that mm/dd/yy format, you can write a method that takes the last 8 characters of the string and returns both values (the remaining string and the date) as an array, something like this:
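def to_array(string)
  date = string[-8..-1]
  # Slice the date off rather than calling String#delete, which would remove
  # every occurrence of the date's characters from the rest of the string.
  [string[0...-8].strip, date]
end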
to_array(bar)
#=> ["Bar", "01/12/15"]
to_array(foo)
# => ["Foo", "02/15/87"]
Given that the provided string (as in your examples):
Contains one word and a date (with zero or more spaces between them)
The date is formed with 8 characters (i.e. ##/##/## format)
The date is at the end of the string
You could do the following:
bar.sub(/(.{8})\z/, ' \1').split
#=> ["Bar", "01/12/15"]
sub(/(.{8})\z/, ' \1') will add a space before the date
split will split the string where a space (or more) is found
A regex works:
"a,b'c d".split(/\s|'|,/)
# => ["a", "b", "c", "d"]
Here's some documentation on regular expressions:
http://rubylearning.com/satishtalim/ruby_regular_expressions.html
Your variable bar = "Bar 01/12/15" contains a space.
If foo also contained a space, as in foo = "Foo 02/15/87",
you could just call .split on each string without passing a delimiter.
For bar it returns ["Bar", "01/12/15"] (remember to assign the result to bar_array).
However, if you have a string like "1,2,3", you need to pass the delimiter ",": "1,2,3".split(",") returns ["1", "2", "3"], whereas "1,2,3".split returns ["1,2,3"].
How about a regex to match the date form and whatever is before it:
bar = "Bar 01/12/15"
foo = "Foo02/15/87"
pattern = /^(.*?)([0-9]{2}\/[0-9]{2}\/[0-9]{2})/
bar.scan(pattern).flatten.map(&:strip)
#=> ["Bar", "01/12/15"]
foo.scan(pattern).flatten.map(&:strip)
#=> ["Foo", "02/15/87"]

How do I replace consecutive occurrences of white space in each element of my array?

Using Ruby 2.4. I have an array of strings. I want to strip breaking and non-breaking spaces from the end of each item in the array, as well as replace multiple consecutive occurrences of white space with a single space. I thought the code below was the way, but I get an error:
> words = ["1", "HUMPHRIES \t\t\t\t\t\t\t\t\t\t\t\t\t\t, \t\t\t\t\t\t\t\t\t\t\t\t\tJASON", "328", "FAIRVIEW, OR (US)", "US", "M", " 27 ", "00:27:30.00 \t\t\t\t\t\t\t\t\t\t\t \n"]
> words.map{|word| word ? word.gsub!(/\A\p{Space}+|\p{Space}+\z/, '').gsub!(/[[:space:]]+/, ' ') : nil }
NoMethodError: undefined method `gsub!' for nil:NilClass
from (irb):4:in `block in irb_binding'
from (irb):4:in `map'
from (irb):4
from /Users/nataliab/.rvm/gems/ruby-2.4.0/gems/railties-5.0.2/lib/rails/commands/console.rb:65:in `start'
from /Users/nataliab/.rvm/gems/ruby-2.4.0/gems/railties-5.0.2/lib/rails/commands/console_helper.rb:9:in `start'
from /Users/nataliab/.rvm/gems/ruby-2.4.0/gems/railties-5.0.2/lib/rails/commands/commands_tasks.rb:78:in `console'
from /Users/nataliab/.rvm/gems/ruby-2.4.0/gems/railties-5.0.2/lib/rails/commands/commands_tasks.rb:49:in `run_command!'
from /Users/nataliab/.rvm/gems/ruby-2.4.0/gems/railties-5.0.2/lib/rails/commands.rb:18:in `<top (required)>'
from bin/rails:4:in `require'
from bin/rails:4:in `<main>'
How can I properly replace consecutive occurrences of white space as well as strip it off from each word in the array?
Do it with plain gsub, not gsub!:
words.map do |w|
  # respond_to?(:gsub) guards against elements that are not strings
  w.gsub(/(?<=[^\,\.])\s+|\A\s+/, '') if w.respond_to?(:gsub)
end
This is because gsub! returns nil when it makes no substitutions, so the chained second gsub! is then called on nil. That's why you get the undefined method `gsub!' for nil:NilClass error.
From gsub! explanation in ruby doc:
Performs the substitutions of String#gsub in place, returning str, or
nil if no substitutions were performed. If no block and no replacement
is given, an enumerator is returned instead.
As @CarySwoveland mentioned in the comments, \s doesn't match non-breaking spaces. To handle them, use [[:space:]] instead of \s.
You can use the following:
words.map { |w| w.gsub(/(?<=[^\,\.])\s+/,'') }
#=> ["1", "HUMPHRIES, JASON", "328", "FAIRVIEW, OR(US)", "US", "M",
#    " 27", "00:27:30.00"]
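Putting the two points above together (non-bang gsub, and [[:space:]] instead of \s), a minimal sketch of what the question asks for could look like this. The helper name normalize_space is hypothetical, and it trims both ends of each string, as the question's own attempt does.

```ruby
words = ["1", "HUMPHRIES \t\t, \t\tJASON", " M\u00A0 \u00A0", " 27 ",
         "00:27:30.00 \t \n", nil]

# Hypothetical helper: trim both ends, then squeeze inner runs to one space.
def normalize_space(word)
  return nil if word.nil?

  word
    .gsub(/\A[[:space:]]+|[[:space:]]+\z/, '')  # strip, including non-breaking spaces
    .gsub(/[[:space:]]+/, ' ')                  # collapse each run to a single space
end

p words.map { |w| normalize_space(w) }
# => ["1", "HUMPHRIES , JASON", "M", "27", "00:27:30.00", nil]

# The bang version returns nil when nothing changed, which is what broke the chain:
p "1".gsub!(/\A[[:space:]]+/, '')  # => nil
```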
I assume all whitespace and non-breaking spaces at the end of each string are to be removed and, of what's left, all substrings of whitespace characters and non-breaking spaces are to be replaced by one space. (Natalia, if that's not correct please let me know in a comment.)
words =
["1",
"HUMPHRIES \t\t\t\, \t\t\t\t\t\t\t\t\t\t\t\t\tJASON",
" M\u00A0 \u00A0",
" 27 ",
"00:27:30.00 \t\t\t\t\t\t\t\t\t\t\t \n"]
R = /
[[:space:]]      # match one whitespace character
(?=[[:space:]])  # that is followed by another whitespace character (positive lookahead)
|                # or
[[:space:]]+     # match one or more whitespace characters
\z               # at the end of the string
/x               # free-spacing regex definition mode
words.map { |w| w.gsub(R, '').gsub(/[[:space:]]/, ' ') }
#=> ["1", "HUMPHRIES , JASON", " M", " 27", "00:27:30.00"]
Note that the POSIX [[:space:]] includes ASCII whitespace and Unicode's non-breaking space character, \u00A0.
To see why the second gsub is needed, note that
words.map { |w| w.gsub(R, '') }
#=> ["1", "HUMPHRIES\t,\tJASON", " M", " 27", "00:27:30.00"]

Split string to array with ruby

I have the string: "how to \"split string\" to \"following array\"" (how to "split string" to "following array").
I want to get the following array:
["how", "to", "split string", "to", "following array"]
I tried split(' ') but the result is:
["how", "to", "\"split", "string\"", "to", "\"following", "array\""]
x.split('"').reject(&:empty?).flat_map do |y|
y.start_with?(' ') || y.end_with?(' ') ? y.split : y
end
Explanation:
split('"') will partition the string in a way that non-quoted strings will have a leading or trailing space and the quoted ones wouldn't.
The following flat_map will further split an individual string by space only if it falls in the non-quoted category.
Note that if there are two consecutive quoted strings, the space in between will be its own string after the first split and will disappear completely after the second. That is:
'foo "bar" "baz"'.split('"') # => ["foo ", "bar", " ", "baz"]
' '.split # => []
The reject(&:empty?) is needed in case we start with a quoted string as
'"foo"'.split('"') # => ["", "foo"]
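As a quick check, the approach above can be wrapped in a method and run against the string from the question. The method name split_keeping_quoted is made up for this sketch.

```ruby
# Hypothetical wrapper around the split/flat_map approach described above.
def split_keeping_quoted(str)
  str.split('"').reject(&:empty?).flat_map do |part|
    # Unquoted parts border a space and are split further;
    # quoted parts are kept whole.
    part.start_with?(' ') || part.end_with?(' ') ? part.split : part
  end
end

p split_keeping_quoted('how to "split string" to "following array"')
# => ["how", "to", "split string", "to", "following array"]
p split_keeping_quoted('foo "bar" "baz"')
# => ["foo", "bar", "baz"]
p split_keeping_quoted('"foo"')
# => ["foo"]
```

One limitation follows from the leading/trailing-space test: input with no quotes at all, such as 'foo bar', comes back as the single element "foo bar".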
With x as your string:
x.split(?").each_slice(2).flat_map{|n, q| a = n.split; (a << q if q) || a }
When you split on quotes, you know for certain that each string in the array goes: non-quoted, quoted, non-quoted, quoted, non-quoted etc...
If we group these into pairs then we get one of the following two scenarios:
[ "non-quoted", "quoted" ]
[ "non-quoted", nil ] (only ever for the last pair of an unbalanced string)
In the first case, we split n and append q.
In the second case, we split n and discard q.
i.e.: a = n.split; (a << q if q) || a
Then we join all the pairs back up (the flat part of flat_map)
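To make the pairing concrete, here is the same one-liner traced step by step on the string from the question. The intermediate variable names (parts, pairs, result) are only for illustration.

```ruby
x = 'how to "split string" to "following array"'

parts = x.split('"')
p parts   # => ["how to ", "split string", " to ", "following array"]

pairs = parts.each_slice(2).to_a
p pairs   # => [["how to ", "split string"], [" to ", "following array"]]

result = pairs.flat_map { |n, q| a = n.split; (a << q if q) || a }
p result  # => ["how", "to", "split string", "to", "following array"]

# With trailing unquoted text, the last slice has a single element,
# so q is nil inside the block and only the split words are kept.
tail = 'a "b c" d'.split('"').each_slice(2).flat_map { |n, q| a = n.split; (a << q if q) || a }
p tail    # => ["a", "b c", "d"]
```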

Append strings to array if found in paragraph using `.match` in Ruby

I'm attempting to search a paragraph for each word in an array, and then output a new array with only the words that could be found.
But I've been unable to get the desired output format so far.
paragraph = "Japan is a stratovolcanic archipelago of 6,852 islands.
The four largest are Honshu, Hokkaido, Kyushu and Shikoku, which make up about ninety-seven percent of Japan's land area.
The country is divided into 47 prefectures in eight regions."
words_to_find = %w[ Japan archipelago fishing country ]
words_found = []
words_to_find.each do |w|
paragraph.match(/#{w}/) ? words_found << w : nil
end
puts words_found
Currently the output I'm getting is a vertical list of printed words.
Japan
archipelago
country
But I would like something like, ['Japan', 'archipelago', 'country'].
I don't have much experience matching text in a paragraph and am not sure what I'm doing wrong here. Could anyone give some guidance?
This is because you are using puts to print the array: puts writes each element on its own line, appending "\n" after every word. Use print or p to see the array in its literal form:
#!/usr/bin/env ruby
def run_me
  paragraph = "Japan is a stratovolcanic archipelago of 6,852 islands.
the four largest are Honshu, Hokkaido, Kyushu and Shikoku, which make up about ninety-seven percent of Japan's land area.
the country is divided into 47 prefectures in eight regions."
  words_to_find = %w[ Japan archipelago fishing country ]
  find_words_from_a_text_file paragraph, words_to_find
end
# No splat on words_to_find: with *words_to_find the whole array would arrive
# wrapped in another array and be interpolated into the regex as one element.
def find_words_from_a_text_file(paragraph, words_to_find)
  words_found = []
  words_to_find.each do |w|
    paragraph.match(/#{w}/) ? words_found << w : nil
  end
  # print each element on its own line with each and puts
  words_found.each { |x| puts "with enum and puts : : #{x}" }
  # or just use print, which does not add a new line
  print "with print :"; print words_found, "\n"
  # or with p
  p words_found
end
run_me
outputs :
za:ruby_dir za$ ./fooscript.rb
with enum and puts : : Japan
with enum and puts : : archipelago
with enum and puts : : country
with print :["Japan", "archipelago", "country"]
["Japan", "archipelago", "country"]
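For comparison, a shorter sketch of the same idea: Array#select builds the filtered array directly, and p (or puts with inspect) prints it in bracketed form. String#include? is swapped in for the regex here; for plain words like these it finds the same substrings.

```ruby
paragraph = "Japan is a stratovolcanic archipelago of 6,852 islands.
The four largest are Honshu, Hokkaido, Kyushu and Shikoku, which make up about ninety-seven percent of Japan's land area.
The country is divided into 47 prefectures in eight regions."

words_to_find = %w[Japan archipelago fishing country]

# Keep only the words that occur somewhere in the paragraph.
words_found = words_to_find.select { |w| paragraph.include?(w) }

p words_found             # prints ["Japan", "archipelago", "country"]
puts words_found.inspect  # same output, using puts explicitly
```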
Here are a couple of ways to do that. Both are case-insensitive.
Use a regular expression
r = /
\b                                           # Match a word break
(?:#{ Regexp.union(words_to_find).source })  # Match any word in words_to_find
\b                                           # Match a word break
/xi                                          # Free-spacing regex definition mode (x)
                                             # and case-insensitive (i)
#=> /
#   \b                                       # Match a word break
#   (?:Japan|archipelago|fishing|country)    # Match any word in words_to_find
#   \b                                       # Match a word break
#   /ix                                      # Free-spacing regex definition mode (x)
#                                            # and case-insensitive (i)
Regexp#source is interpolated rather than the Regexp itself because an interpolated Regexp carries its own flags, as in (?-mix:Japan|archipelago|fishing|country), and that would switch case-insensitivity off for the word list.
paragraph.scan(r).uniq
#=> ["Japan", "archipelago", "country"]
Intersect two arrays
words_to_find_hash = words_to_find.each_with_object({}) { |w,h| h[w.downcase] = w }
#=> {"japan"=>"Japan", "archipelago"=>"archipelago", "fishing"=>"fishing",
#    "country"=>"country"}
words_to_find_hash.values_at(*paragraph.delete(".;:,?'").
downcase.
split.
uniq & words_to_find_hash.keys)
#=> ["Japan", "archipelago", "country"]

How to split a delimited string in Ruby and convert it to an array?

I have a string
"1,2,3,4"
and I'd like to convert it into an array:
[1,2,3,4]
How?
>> "1,2,3,4".split(",")
=> ["1", "2", "3", "4"]
Or for integers:
>> "1,2,3,4".split(",").map { |s| s.to_i }
=> [1, 2, 3, 4]
Or for later versions of Ruby (>= 1.9, as pointed out by Alex):
>> "1,2,3,4".split(",").map(&:to_i)
=> [1, 2, 3, 4]
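One thing to be aware of: to_i never fails, it returns 0 for text it cannot parse. If bad input should be rejected instead, Kernel#Integer is stricter. A minimal sketch follows; parse_int_list is a hypothetical helper name, and base 10 is passed explicitly so that a field like "08" is not read as octal.

```ruby
# Hypothetical helper: parse a comma-separated list, raising on bad fields.
def parse_int_list(str)
  str.split(",").map { |field| Integer(field.strip, 10) }
end

p parse_int_list("1,2,3,4")       # => [1, 2, 3, 4]
p parse_int_list("1, 08 ,3")      # => [1, 8, 3]

# to_i quietly turns the bad field into 0 ...
p "1,x,3".split(",").map(&:to_i)  # => [1, 0, 3]

# ... whereas Integer raises ArgumentError.
begin
  parse_int_list("1,x,3")
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```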
"1,2,3,4".split(",") as strings
"1,2,3,4".split(",").map { |s| s.to_i } as integers
For a string of digits without spaces, as strings:
arr = "12345"
arr.split('')
output: ["1","2","3","4","5"]
For a string of digits separated by spaces, as strings:
arr = "1 2 3 4 5"
arr.split(' ')
output: ["1","2","3","4","5"]
For a string of digits without spaces, as integers:
arr = "12345"
arr.split('').map(&:to_i)
output: [1,2,3,4,5]
For a string of letters:
arr = "abc"
arr.split('')
output: ["a","b","c"]
Explanation:
arr -> the string you want to operate on.
split() -> a method that splits its receiver and returns the pieces as an array.
'' or ' ' or ',' -> the delimiter; it marks where the string is split and is removed from the result.
The simplest way to convert a string that has a delimiter like a comma is just to use the split method:
"1,2,3,4".split(',') # => ["1", "2", "3", "4"]
You can find more info on how to use the split method in the Ruby docs:
Divides str into substrings based on a delimiter, returning an array
of these substrings.
If pattern is a String, then its contents are used as the delimiter
when splitting str. If pattern is a single space, str is split on
whitespace, with leading whitespace and runs of contiguous whitespace
characters ignored.
If pattern is a Regexp, str is divided where the pattern matches.
Whenever the pattern matches a zero-length string, str is split into
individual characters. If pattern contains groups, the respective
matches will be returned in the array as well.
If pattern is omitted, the value of $; is used. If $; is nil (which is
the default), str is split on whitespace as if ' ' were specified.
If the limit parameter is omitted, trailing null fields are
suppressed. If limit is a positive number, at most that number of
fields will be returned (if limit is 1, the entire string is returned
as the only entry in an array). If negative, there is no limit to the
number of fields returned, and trailing null fields are not
suppressed.
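A few one-liners illustrating the rules quoted above (single-space pattern, suppressed trailing fields, the limit parameter, capture groups, and a zero-length match):

```ruby
# A single space splits on runs of whitespace and ignores leading whitespace.
p " a  b ".split(" ")          # => ["a", "b"]

# Trailing empty fields are dropped unless limit is negative.
p "1,2,,".split(",")           # => ["1", "2"]
p "1,2,,".split(",", -1)       # => ["1", "2", "", ""]

# A positive limit caps the number of fields; the rest stays in the last one.
p "1,2,3,4".split(",", 2)      # => ["1", "2,3,4"]

# Groups in a Regexp pattern are returned in the array as well.
p "1, 2;3".split(/([,;])\s*/)  # => ["1", ",", "2", ";", "3"]

# A pattern that matches the empty string splits into characters.
p "abc".split(//)              # => ["a", "b", "c"]
```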
"12345".each_char.map(&:to_i)
each_char does basically the same as split(''): It splits a string into an array of its characters.
hmmm, I just realize now that in the original question the string contains commas, so my answer is not really helpful ;-(..
