How to split string without defined delimiter - arrays

I have strings that look like this:
bar = "Bar 01/12/15"
foo = "Foo02/15/87"
How can I split those variables so that the resulting arrays contain:
bar_array = ["Bar", "01/12/15"]
foo_array = ["Foo","02/15/87"]

r = /(?<=[[:alpha:]]) ?(?=\d)/
"Bar 01/12/15".split(r)
#=> ["Bar", "01/12/15"]
"Foo02/15/87".split(r)
#=> ["Foo", "02/15/87"]
The regular expression reads
match a letter in a positive lookbehind
match 0 or 1 spaces
match a digit in a positive lookahead
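To make the three parts easier to see, here is a sketch of the same pattern written in extended mode (the x flag), so each part can carry its own comment. The variable name split_point is made up for illustration. In extended mode plain whitespace in the pattern is ignored, so the optional space is written as the character class [ ].

```ruby
# Hypothetical name; intended to behave like r above, just spread over lines.
split_point = /
  (?<=[[:alpha:]])  # a letter must come right before (lookbehind, consumes nothing)
  [ ]?              # zero or one space (the only text the match consumes)
  (?=\d)            # a digit must come right after (lookahead, consumes nothing)
/x

p "Bar 01/12/15".split(split_point)
p "Foo02/15/87".split(split_point)
```

Because the lookarounds consume nothing, only the optional space is removed by split, which is why the letter and the digit both survive in the result.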

If your string will always have that mm/dd/yy date at the end, you can create a method that takes the last 8 characters of the string and returns both values (the remaining string and the date) as an array, something like this:
def to_array(string)
  date = string[-8..-1]
  # Slice off the date rather than using String#delete, which removes
  # every occurrence of the listed characters, not the substring
  [string[0...-8].strip, date]
end
to_array(bar)
#=> ["Bar", "01/12/15"]
to_array(foo)
# => ["Foo", "02/15/87"]

Given that the provided string (as in your examples):
Contains one word and a date (with zero or more spaces between them)
The date is formed with 8 characters (i.e. ##/##/## format)
The date is at the end of the string
You could do the following:
bar.sub(/(.{8})\z/, ' \1').split
#=> ["Bar", "01/12/15"]
sub(/(.{8})\z/, ' \1') will add a space before the date
split will split the string where a space (or more) is found

A regex works as the delimiter:
"a,b'c d".split(/\s|'|,/)
# => ["a", "b", "c", "d"]
Here is some documentation on regular expressions:
http://rubylearning.com/satishtalim/ruby_regular_expressions.html

Your variable bar = "Bar 01/12/15" includes a space " ".
If the variable foo also included a space, as in foo = "Foo 02/15/87", you could just call .split on each string without passing a delimiter.
It will return ["Bar", "01/12/15"] (remember to assign the result to your variable bar_array).
However, if you have a string like "1,2,3", you need to pass the delimiter ",": "1,2,3".split(",") returns ["1","2","3"]. Otherwise, it will return ["1,2,3"].

How about a regex to match the date form and whatever is before it:
bar = "Bar 01/12/15"
foo = "Foo02/15/87"
pattern = /^(.*?)([0-9]{2}\/[0-9]{2}\/[0-9]{2})/
bar.scan(pattern).flatten.map(&:strip)
=> ["Bar", "01/12/15"]
foo.scan(pattern).flatten.map(&:strip)
=> ["Foo", "02/15/87"]

Related

How do I combine elements in an array matching a pattern?

I have an array of strings
["123", "a", "cc", "dddd", "mi hello", "33"]
I want to join by a space consecutive elements that begin with a letter, have at least two characters, and do not contain a space. Applying that logic to the above would yield
["123", "a", "cc dddd", "mi hello", "33"]
Similarly if my array were
["mmm", "3ss", "foo", "bar", "foo", "55"]
I would want the result to be
["mmm", "3ss", "foo bar foo", "55"]
How do I do this operation?
There are many ways to solve this; Ruby is a highly expressive language. It would be most beneficial for you to show what you have tried so far, so that we can help debug/fix/improve your attempt.
For example, here is one possible implementation that I came up with:
def combine_words(array)
  array
    .chunk {|string| string.match?(/\A[a-z][a-z0-9]+\z/i) }
    .flat_map {|concat, strings| concat ? strings.join(' ') : strings}
end
combine_words(["aa", "b", "cde", "f1g", "hi", "2j", "l3m", "op", "q r"])
# => ["aa", "b", "cde f1g hi", "2j", "l3m op", "q r"]
Note that I was a little unclear exactly how to interpret your requirement:
begin with a letter, have at least two characters, and do not contain a space
Can strings contain punctuation? Underscores? Utf-8 characters? I took it to mean "only a-z, A-Z or 0-9", but you may want to tweak this.
A literal interpretation of your requirement could be: /\A[[:alpha:]][^ ]+\z/, but I suspect that's not what you meant.
Explanation:
Enumerable#chunk will iterate through the array and collect terms by the block's response value. In this case, it will find sequential elements that match/don't match the required regex.
String#match? checks whether the string matches the pattern, and returns a boolean response. Note that if you were using Ruby v2.3 or below, you'd have needed some workaround such as !!string.match, to force a boolean response.
Enumerable#flat_map then loops through each "result", joining the strings if necessary, and flattens the result to avoid returning any nested arrays.
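To see what each step produces, here is a small sketch that runs the two stages separately on a shortened input. The variable names (words, chunks, combined) are only for illustration, and String#match? assumes Ruby 2.4 or later.

```ruby
words = ["aa", "b", "cde", "f1g", "hi", "2j"]

# Step 1: chunk groups consecutive elements by the block's return value.
chunks = words.chunk { |s| s.match?(/\A[a-z][a-z0-9]+\z/i) }.to_a
p chunks
# => [[true, ["aa"]], [false, ["b"]], [true, ["cde", "f1g", "hi"]], [false, ["2j"]]]

# Step 2: join the groups flagged true, pass the others through unchanged.
combined = chunks.flat_map { |is_word, group| is_word ? group.join(' ') : group }
p combined
# => ["aa", "b", "cde f1g hi", "2j"]
```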
Here is another, similar, solution:
def word?(string)
  string.match?(/\A[a-z][a-z0-9]+\z/i)
end
def combine_words(array)
  array
    .chunk_while {|x, y| word?(x) && word?(y)}
    .map {|group| group.join(' ')}
end
Or, here's a more "low-tech" solution - which only uses more basic language features. (I'm re-using the same word? method here):
def combine_words(array)
  previous_was_word = false
  result = []
  array.each do |string|
    if previous_was_word && word?(string)
      result.last << " #{string}"
    else
      # Push a copy so that << above never mutates the caller's strings
      result << string.dup
    end
    previous_was_word = word?(string)
  end
  result
end
You can use Enumerable#chunk.
def chunk_it(arr)
  arr.chunk { |s|
    (s.size > 1) && (s[0].match?(/\p{Alpha}/)) && !s.include?(' ')}.
    flat_map { |tf,a| tf ? a.join(' ') : a }
end
chunk_it(["123", "a", "cc", "dddd", "mi hello", "33"])
#=> ["123", "a", "cc dddd", "mi hello", "33"]
chunk_it(["mmm", "3ss", "foo", "bar", "foo", "55"])
#=> ["mmm", "3ss", "foo bar foo", "55"]

Split string to array with ruby

I have the string: "how to \"split string\" to \"following array\"" (how to "split string" to "following array").
I want to get the following array:
["how", "to", "split string", "to", "following array"]
I tried split(' ') but the result is:
["how", "to", "\"split", "string\"", "to", "\"following", "array\""]
x.split('"').reject(&:empty?).flat_map do |y|
  y.start_with?(' ') || y.end_with?(' ') ? y.split : y
end
Explanation:
split('"') will partition the string in such a way that the non-quoted strings have a leading or trailing space and the quoted ones don't.
The following flat_map will further split an individual string by space only if it falls in the non-quoted category.
Note that if there are two consecutive quoted strings, the space in between will be its own string after the first split and will completely disappear after the second. That is:
'foo "bar" "baz"'.split('"') # => ["foo ", "bar", " ", "baz"]
' '.split # => []
The reject(&:empty?) is needed in case we start with a quoted string as
'"foo"'.split('"') # => ["", "foo"]
With x as your string:
x.split(?").each_slice(2).flat_map{|n, q| a = n.split; (a << q if q) || a }
When you split on quotes, you know for certain that each string in the array goes: non-quoted, quoted, non-quoted, quoted, non-quoted etc...
If we group these into pairs then we get one of the following two scenarios:
[ "non-quoted", "quoted" ]
[ "non-quoted", nil ] (only ever for the last pair of an unbalanced string)
In case 1, we split n and append q
In case 2, we split n and discard q
i.e.: a = n.split; (a << q if q) || a
Then we join all the pairs back up (the flat part of flat_map)
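Here is the same idea spelled out step by step on the string from the question, so the intermediate values are visible. The variable names (parts, pairs, result, unquoted, quoted) are only for illustration.

```ruby
x = 'how to "split string" to "following array"'

# Splitting on quotes alternates: non-quoted, quoted, non-quoted, quoted...
parts = x.split('"')
# => ["how to ", "split string", " to ", "following array"]

# Group into [non-quoted, quoted] pairs.
pairs = parts.each_slice(2).to_a
# => [["how to ", "split string"], [" to ", "following array"]]

# Split the non-quoted half on whitespace, keep the quoted half whole.
result = pairs.flat_map do |unquoted, quoted|
  words = unquoted.split
  quoted ? words << quoted : words
end
p result
# => ["how", "to", "split string", "to", "following array"]
```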

How to split string based on pre-defined values from Array

I want to split a string based on an array that I define as a constant at the start:
class Query
  OPERATOR = [':','=','<','>','<=','>=']
  def initialize(params)
    # Here I want to split given params if it contains any
    # of the operators from OPERATOR
  end
end
Query.new(["Status<=xyz","Org=abc"])
How can I do this?
OPERATOR = ['<=','=>',':','=','<','>']
r = /\s*#{ Regexp.union(OPERATOR) }\s*/
#=> /\s*(?-mix:<=|=>|:|=|<|>)\s*/
str = "Now: is the =time for all <= to =>"
str.split(r)
#=> ["Now", "is the", "time for all", "to"]
Note that I reordered the elements of OPERATOR so that '<=' and '=>' (each made up of two one-character strings that are also in the array) come first. If that is not done,
OPERATOR = [':','=','<','>','<=','>=']
r = /\s*#{ Regexp.union(OPERATOR) }\s*/
#=> /\s*(?-mix::|=|<|>|<=|>=)\s*/
str.split(r)
#=> ["Now", "is the", "time for all", "", "to"]
See Regexp::union.
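If you would rather not reorder the array by hand, one option is to sort the operators by length before building the union. This is only a sketch: the names operators, by_length, pattern and split_params are made up, and it uses the operator list from the question ('>=' rather than '=>').

```ruby
operators = [':', '=', '<', '>', '<=', '>=']

# Longest alternatives first, so '<=' is tried before '<' and '='.
by_length = operators.sort_by { |op| -op.length }
pattern = /\s*#{Regexp.union(by_length)}\s*/

# Hypothetical helper: split each "field<op>value" string from the question.
split_params = ->(params) { params.map { |param| param.split(pattern) } }

p split_params.call(["Status<=xyz", "Org=abc"])
# => [["Status", "xyz"], ["Org", "abc"]]
```

Something like this could go inside Query#initialize, but how the split pieces should be stored is left open here.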

Remove array elements that are present in another array

There is a list of words and list of banned words. I want to go through the word list and redact all the banned words. This is what I ended up doing (notice the catched boolean):
puts "Give input text:"
text = gets.chomp
puts "Give redacted word:"
redacted = gets.chomp
words = text.split(" ")
redacted = redacted.split(" ")
catched = false
words.each do |word|
  redacted.each do |redacted_word|
    if word == redacted_word
      catched = true
      print "REDACTED "
      break
    end
  end
  if catched == true
    catched = false
  else
    print word + " "
  end
end
Is there any proper/efficient way?
This also works:
words - redacted
The Array methods +, - and & are simple and useful:
irb(main):016:0> words = ["a", "b", "a", "c"]
=> ["a", "b", "a", "c"]
irb(main):017:0> redacted = ["a", "b"]
=> ["a", "b"]
irb(main):018:0> words - redacted
=> ["c"]
irb(main):019:0> words + redacted
=> ["a", "b", "a", "c", "a", "b"]
irb(main):020:0> words & redacted
=> ["a", "b"]
You can use .reject to exclude all banned words that are present in the redacted array:
words.reject {|w| redacted.include? w}
If you want to get the list of banned words that are present in the words array, you can use .select:
words.select {|w| redacted.include? w}
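Note that the loop in the question prints REDACTED in place of each banned word, whereas words - redacted and reject drop the banned words entirely. If you want the replacement behaviour, a sketch with map might look like this (the sample inputs are made up):

```ruby
words    = "the quick brown fox".split
redacted = "quick fox".split

# Swap each banned word for a marker instead of removing it.
output = words.map { |word| redacted.include?(word) ? "REDACTED" : word }.join(" ")
puts output
# => the REDACTED brown REDACTED
```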
This might be a bit more 'elegant'. Whether it's more or less efficient than your solution, I don't know.
puts "Give input text:"
original_text = gets.chomp
puts "Give redacted word:"
redacted = gets.chomp
redacted_words = redacted.split
print(
  redacted_words.inject(original_text) do |text, redacted_word|
    text.gsub(/\b#{redacted_word}\b/, 'REDACTED')
  end
)
So what's going on here?
I'm using String#split without an argument, because ' ' is the default, anyway.
With Array#inject, the following block (starting at do and ending at end) is executed for each element in the array, in this case our list of forbidden words.
In each round, the second argument to the block will be the respective element from the array
The first argument to the block will be the block's return value from the previous round. For the first round, the argument to the inject function (in our case original_text) will be used.
The block's return value from the last round will be used as return value of the inject function.
In the block, I replace all occurrences of the currently handled redacted word in the text.
String#gsub performs a global substitution
As the pattern to be substituted, I use a regexp literal (/.../). Except, it's not really a literal as I'm performing a string substitution (#{...}) on it to get the currently handled redacted word into it.
In the regexp, I'm surrounding the word to be redacted with \b word boundary matchers. They match the boundary between alphanumeric and non-alphanumeric characters (or vice versa), without matching any of the characters themselves. (They match the zero-length 'position' between the characters.) If a string starts or ends with alphanumeric characters, \b will also match the start or end of the string, respectively, so that we can use it to match whole words.
The result of inject (which is the result of the last execution of the block, i.e., the text when all the substitutions have taken place) is passed as an argument to print, which will output the now redacted text.
Note that, unlike your solution, mine will not consider punctuation as part of adjacent words.
Also note that my solution will be vulnerable to regex injection.
Example 1:
Give input text:
A fnord is a fnord.
Give redacted word:
ford fnord foo
My output:
A REDACTED is a REDACTED.
Your output:
A REDACTED is a fnord.
Example 2:
Give input text:
A fnord is a fnord.
Give redacted word:
fnord.
My output:
A REDACTEDis a fnord.
(Note how the . was interpreted to match any character.)
Your output:
A fnord is a REDACTED.
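One way to guard against the regex injection is to pass each word through Regexp.escape before interpolating it. A minimal sketch, using a made-up input word that contains a metacharacter:

```ruby
text = "A fnord is a fnord."
word = "f.ord" # made-up input containing a regex metacharacter

# Without escaping, "." matches any character, so "fnord" is redacted.
unescaped = text.gsub(/\b#{word}\b/, 'REDACTED')
# With escaping, only a literal "f.ord" would be redacted.
escaped = text.gsub(/\b#{Regexp.escape(word)}\b/, 'REDACTED')

puts unescaped # => A REDACTED is a REDACTED.
puts escaped   # => A fnord is a fnord.
```

Escaping does not change how \b behaves: a redacted word that ends in punctuation (like "fnord." in Example 2) still needs a word character after it for the trailing \b to match.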

How to split a delimited string in Ruby and convert it to an array?

I have a string
"1,2,3,4"
and I'd like to convert it into an array:
[1,2,3,4]
How?
>> "1,2,3,4".split(",")
=> ["1", "2", "3", "4"]
Or for integers:
>> "1,2,3,4".split(",").map { |s| s.to_i }
=> [1, 2, 3, 4]
Or for later versions of ruby (>= 1.9 - as pointed out by Alex):
>> "1,2,3,4".split(",").map(&:to_i)
=> [1, 2, 3, 4]
"1,2,3,4".split(",") as strings
"1,2,3,4".split(",").map { |s| s.to_i } as integers
For a string of digits without spaces, as strings:
arr = "12345"
arr.split('')
output: ["1","2","3","4","5"]
For a string of digits separated by spaces, as strings:
arr = "1 2 3 4 5"
arr.split(' ')
output: ["1","2","3","4","5"]
For a string of digits without spaces, as integers:
arr = "12345"
arr.split('').map(&:to_i)
output: [1,2,3,4,5]
For a string of letters:
arr = "abc"
arr.split('')
output: ["a","b","c"]
Explanation:
arr -> the string you are going to operate on.
split() -> a method that splits the input and returns the pieces as an array.
'' or ' ' or ',' -> the delimiter, which is removed from the given string.
The simplest way to convert a string that has a delimiter like a comma is just to use the split method:
"1,2,3,4".split(',') # ["1", "2", "3", "4"]
You can find more info on how to use the split method in the Ruby docs:
Divides str into substrings based on a delimiter, returning an array
of these substrings.
If pattern is a String, then its contents are used as the delimiter
when splitting str. If pattern is a single space, str is split on
whitespace, with leading whitespace and runs of contiguous whitespace
characters ignored.
If pattern is a Regexp, str is divided where the pattern matches.
Whenever the pattern matches a zero-length string, str is split into
individual characters. If pattern contains groups, the respective
matches will be returned in the array as well.
If pattern is omitted, the value of $; is used. If $; is nil (which is
the default), str is split on whitespace as if ' ' were specified.
If the limit parameter is omitted, trailing null fields are
suppressed. If limit is a positive number, at most that number of
fields will be returned (if limit is 1, the entire string is returned
as the only entry in an array). If negative, there is no limit to the
number of fields returned, and trailing null fields are not
suppressed.
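A few quick examples of the behaviours described in that excerpt (the sample strings and variable names are made up):

```ruby
# Trailing empty fields are dropped unless limit is negative.
default_limit  = "a,b,,c,,".split(",")     # => ["a", "b", "", "c"]
negative_limit = "a,b,,c,,".split(",", -1) # => ["a", "b", "", "c", "", ""]

# A positive limit caps the number of fields returned.
two_fields = "a,b,c,d".split(",", 2)       # => ["a", "b,c,d"]

# A single space splits on runs of whitespace and ignores leading whitespace.
on_space = "  a  b c ".split(" ")          # => ["a", "b", "c"]

# A pattern that matches the empty string splits into characters.
chars = "abc".split(//)                    # => ["a", "b", "c"]

# Groups in the pattern are returned in the array as well.
with_groups = "1,2;3".split(/([,;])/)      # => ["1", ",", "2", ";", "3"]

p default_limit, negative_limit, two_fields, on_space, chars, with_groups
```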
"12345".each_char.map(&:to_i)
each_char does basically the same as split(''): It splits a string into an array of its characters.
hmmm, I just realize now that in the original question the string contains commas, so my answer is not really helpful ;-(..

Resources