How to Iterate through multiple strings in an array

I have an array of strings.
a = "Apple Banana oranges grapes. free free free phones. deals deals time.black white grey"
b = a.split(/\./)
I want to check whether the word "free" or "deals" is present in each string, count the occurrences, and store the matching strings in a new array.
c = 0
d = 0
g = Array.new
t = Array.new
b.each do |i|
  if /free/.match(i)
    t.push i
    c = c + 1
  elsif /deals/.match(i)
    t.push i
    d = d + 1
  else
    g.push i
  end
end
p d
p c
p g
p t
But it doesn't show the exact count: d and c both print 1 instead of 2 and 3. Is there any other way to parse a string inside an array?

With a little variable renaming, and using scan instead of match, this might work:
a = "Apple Banana oranges grapes. free free free phones. deals deals time.black white grey"
sentences = a.split(/\./) # Split on `.`
# Initialize variables
free_count = 0
deal_count = 0
no_match = []
matches = []
sentences.each do |i| # Go through each sentence
  if (m = i.scan(/free/)) && m.any? # Try matching "free", and check whether there are any matches
    matches << i # Add match to matches array
    free_count += m.length # Count the occurrences of `free` spotted
  elsif (m = i.scan(/deals/)) && m.any? # Try matching "deals", and check whether there are any matches
    matches << i # Add match to matches array
    deal_count += m.length # Count the occurrences of `deals` spotted
  else
    no_match << i # Nothing spotted: keep the sentence in no_match
  end
end
p free_count #=> 3
p deal_count #=> 2
p no_match #=> ["Apple Banana oranges grapes", "black white grey"]
p matches #=> [" free free free phones", " deals deals time"]

Your approach counts the sentences, not the occurrences of a word in the matched sentences. Use c += i.split.count "free" to count the actual occurrences of the word "free". Avoid using single-letter variable names unless the meaning is clear.
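Applied to the question's loop, that suggestion could look like the sketch below. It keeps the original if/elsif structure (so a sentence containing both words is only counted for "free"), uses descriptive names, and assumes whole-word matching is wanted: split breaks on whitespace, so a word like "freedom" would not be counted.

```ruby
a = "Apple Banana oranges grapes. free free free phones. deals deals time.black white grey"
sentences = a.split(".")

free_count = 0
deals_count = 0
matches = []
no_match = []

sentences.each do |sentence|
  words = sentence.split # whitespace split, so leading spaces are ignored
  if words.include?("free")
    matches << sentence
    free_count += words.count("free") # count occurrences, not sentences
  elsif words.include?("deals")
    matches << sentence
    deals_count += words.count("deals")
  else
    no_match << sentence
  end
end

p free_count  # 3
p deals_count # 2
```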
However, this all seems like a bit of extra work; you can count, and select/reject items matching a pattern, using built-in array methods:
a = "Apple Banana oranges grapes. free free free phones. deals deals time.black white grey"
p a.split(".").grep(/\bfree\b|\bdeals\b/)
p a.split(".").reject { |e| e =~ /\bfree\b|\bdeals\b/ }
p a.split.count("free")
p a.split.count("deals")
Output:
[" free free free phones", " deals deals time"]
["Apple Banana oranges grapes", "black white grey"]
3
2
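If you want both lists from a single pass, Enumerable#partition splits an array by a predicate. A minimal sketch, assuming Ruby 2.4+ for String#match?; the grouped regex is equivalent to the alternation above:

```ruby
a = "Apple Banana oranges grapes. free free free phones. deals deals time.black white grey"

# partition returns [elements where the block is truthy, all the others]
matches, no_match = a.split(".").partition { |s| s.match?(/\b(?:free|deals)\b/) }

p matches  # [" free free free phones", " deals deals time"]
p no_match # ["Apple Banana oranges grapes", "black white grey"]
```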

A possible solution that follows your original code more closely:
a = "Apple Banana oranges grapes. free free free phones. deals deals time.black white grey"
b = a.split(/\W+/)
c = 0
d = 0
g = Array.new
t = Array.new
b.each do |i|
  if i == "free"
    t.push i
    c = c + 1
  elsif i == "deals"
    t.push i
    d = d + 1
  else
    g.push i
  end
end
puts "Original words: #{b}"
puts "Counts of 'free': #{c}"
puts "Counts of 'deals': #{d}"
puts "Array of matches: #{t}"
puts "Array of non matches: #{g}"
And the output is:
# Original words: ["Apple", "Banana", "oranges", "grapes", "free", "free", "free", "phones", "deals", "deals", "time", "black", "white", "grey"]
# Counts of 'free': 3
# Counts of 'deals': 2
# Array of matches: ["free", "free", "free", "deals", "deals"]
# Array of non matches: ["Apple", "Banana", "oranges", "grapes", "phones", "time", "black", "white", "grey"]
An example of counting using Ruby tools:
counts = {}
a.split(/\W+/).tap { |b| b.uniq.each { |w| counts[w.downcase] = b.count(w) } }
counts #=> {"apple"=>1, "banana"=>1, "oranges"=>1, "grapes"=>1, "free"=>3, "phones"=>1, "deals"=>2, "time"=>1, "black"=>1, "white"=>1, "grey"=>1}
Then you can access data, for example:
counts.keys #=> ["apple", "banana", "oranges", "grapes", "free", "phones", "deals", "time", "black", "white", "grey"]
counts['free'] #=> 3
counts['deals'] #=> 2
counts.select { |_, v| v > 1} #=> {"free"=>3, "deals"=>2}
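On Ruby 2.7 or later, Enumerable#tally builds the same kind of counting hash in one call. This sketch downcases before tallying, so differently cased spellings of a word are counted together:

```ruby
a = "Apple Banana oranges grapes. free free free phones. deals deals time.black white grey"

# tally maps each distinct element to the number of times it occurs (Ruby 2.7+)
counts = a.split(/\W+/).map(&:downcase).tally

p counts["free"]                      # 3
p counts["deals"]                     # 2
p counts.select { |_, v| v > 1 }.keys # ["free", "deals"]
```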

Related

How to iterate over an array a certain number of times?

array = ["apple", "orange"]
number = 4
Desired output:
apple
orange
apple
orange
So far, I have:
array.each do |x|
  puts x
end
I'm just not sure how to iterate over the array 4 times.
array = ["apple", "orange"]
iter_count = 4
array.cycle.take(iter_count).each do |x|
  puts x
end
array.cycle returns an infinite enumerator that repeats the elements of array. Then we take the first iter_count elements from it and iterate over those.
Enumerable has a ton of goodies that perform neat tasks like this. Once you familiarize yourself with the module, you'll find you can do a lot of array- and stream-oriented processing much more easily.
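A short sketch of how cycle behaves: without an argument the enumerator never ends, so only ask it for a finite prefix (take, first, or via lazy); with an argument n, cycle repeats the whole array n times and then stops.

```ruby
array = ["apple", "orange"]

# An infinite enumerator is fine as long as we only ask for a finite prefix
p array.cycle.first(4)                    # ["apple", "orange", "apple", "orange"]

# cycle(n) repeats the whole array n times, then stops
p array.cycle(2).to_a                     # ["apple", "orange", "apple", "orange"]

# lazy lets you transform the infinite sequence before taking a prefix
p array.cycle.lazy.map(&:upcase).first(3) # ["APPLE", "ORANGE", "APPLE"]
```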
ar = ["apple", "orange"]
n = 4
# Repeats the whole array n times (8 lines for n = 4); p also prints the quotes
n.times { ar.each { |a| p a } }
array = ["apple", "orange"]
numOfIteration = 4
for i in 0..numOfIteration - 1
  puts array[i % array.size]
end
A fun way to achieve this:
4.times { |n| p array[n % array.count] }
Definitely not the best: on every iteration we count the elements in array and compute n modulo that count. It's also not very readable, as it takes some cognitive effort to understand the statement.
A nicer way to achieve this:
print(array.cycle.take(4).join("\n"))
apple
orange
apple
orange
Another non-idiomatic for loop; the three-dot range 0...number excludes its end, which removes the need for explicit subtraction:
array = ['apple', 'orange']
number = 4
for i in 0...number
  puts array[i % array.size]
end
And some silliness with lambdas and recursion :D
array = ['apple', 'orange']
number = 4
loop = lambda do |list, count|
  return if count == number
  puts list[count % list.size]
  loop.(list, count + 1)
end
loop.(array, 0)

How to find which items in a MASSIVE array appear more than once?

This is a very simple question: which items appear in the list more than once?
array = ["mike", "mike", "mike", "john", "john", "peter", "clark"]
The correct answer is ["mike", "john"].
Seems like we can just do:
array.select { |e| array.count(e) > 1 }.uniq
Problem solved. But wait! What if the array is REALLY big:
1_000_000.times { array.concat("1234567890abcdefghijklmnopqrstuvwxyz".split('')) }
It just so happens I need to figure out how to do this in a reasonable amount of time. We're talking millions and millions of records.
For what it's worth, this massive array is actually a sum of 10-20 smaller arrays. If it's easier to compare those, let me know - I'm stumped.
We're talking 10,000 to 10,000,000 lines per file, hundreds of files.
Does something like
items = 30_000_000
array = items.times.map do
  rand(10_000_000)
end
puts "Done with seeding"
puts
puts "Checking what items appear more than once. Size: #{array.size}"
puts
t1 = Time.now
def more_than_once(array)
  counts = Hash.new(0)
  array.each do |item|
    counts[item] += 1
  end
  counts.select do |_, count|
    count > 1
  end.keys
end
res = more_than_once(array)
t2 = Time.now
p res.size
puts "Took #{t2 - t1}"
work for you?
The duration is about 40s on my machine.
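On Ruby 2.7+, Enumerable#tally builds the same counting hash in a single call. A sketch of the same approach; duplicates is a hypothetical method name, and its speed on 30 million items has not been measured here:

```ruby
# Returns the elements that occur more than once, in first-seen order (Ruby 2.7+)
def duplicates(array)
  array.tally.select { |_, count| count > 1 }.keys
end

p duplicates(["mike", "mike", "mike", "john", "john", "peter", "clark"]) # ["mike", "john"]
```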
Here are two more solutions with a benchmark comparison of these and @Pascal's method.
Use sets
require 'set'
def multi_set(arr)
  s1 = Set.new
  arr.each_with_object(Set.new) { |e, smulti| smulti.add(e) unless s1.add?(e) }.to_a
end
arr = ["mike", "mike", "mike", "john", "john", "peter", "clark"]
multi_set(arr)
#=> ["mike", "john"]
s1 is being built to include all distinct elements of arr. s1.add?(e) returns nil if s1 already contains e, in which case e is added to smulti if smulti does not already contain that element. (See Set#add?.) smulti, converted to an array, is returned by the method.
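A quick illustration of the Set#add? return values that multi_set relies on:

```ruby
require 'set'

seen = Set.new
# add? returns the set itself when the element is new...
p seen.add?("mike").equal?(seen) # true
# ...and nil when the element was already present
p seen.add?("mike")              # nil
```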
Use Array#difference
Array#difference is a method I've proposed be added to Ruby's core. See also my answer here.
class Array
  def difference(other)
    h = other.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }
    reject { |e| h[e] > 0 && h[e] -= 1 }
  end
end
def multi_difference(arr)
  arr.difference(arr.uniq).uniq
end
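One caveat: Ruby 2.6 added a built-in Array#difference with set semantics (like -, it removes every occurrence), so reopening Array as above replaces that core method for the whole program. A sketch of the same multiset logic as a plain method instead; multiset_difference is a hypothetical name:

```ruby
# Removes one occurrence from arr for each occurrence in other
# (multiset difference), without patching Array.
def multiset_difference(arr, other)
  remaining = other.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }
  arr.reject do |e|
    next false unless remaining[e] > 0 # nothing left to remove: keep e
    remaining[e] -= 1
    true                               # consume one occurrence: drop e
  end
end

arr = ["mike", "mike", "mike", "john", "john", "peter", "clark"]
p multiset_difference(arr, arr.uniq)      # ["mike", "mike", "john"]
p multiset_difference(arr, arr.uniq).uniq # ["mike", "john"]
```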
Benchmark
def more_than_once(arr)
  counts = Hash.new { |hash, key| hash[key] = 0 }
  arr.each do |item|
    counts[item] += 1
  end
  counts.select do |_, count|
    count > 1
  end.keys
end
require 'fruity'
items = 30_000_000
arr = items.times.map { rand 10_000_000 }
compare do
  Pascal { more_than_once(arr) }
  Set { multi_set(arr) }
  Difference { multi_difference(arr) }
end
Running each test once. Test will take about 4 minutes.
Pascal is faster than Set by 19.999999999999996% ± 10.0%
Set is faster than Difference by 30.000000000000004% ± 10.0%
Of course, difference, if part of the Ruby core, would be coded in C and optimized.

Append strings to array if found in paragraph using `.match` in Ruby

I'm attempting to search a paragraph for each word in an array, and then output a new array with only the words that could be found.
But I've been unable to get the desired output format so far.
paragraph = "Japan is a stratovolcanic archipelago of 6,852 islands.
The four largest are Honshu, Hokkaido, Kyushu and Shikoku, which make up about ninety-seven percent of Japan's land area.
The country is divided into 47 prefectures in eight regions."
words_to_find = %w[ Japan archipelago fishing country ]
words_found = []
words_to_find.each do |w|
  paragraph.match(/#{w}/) ? words_found << w : nil
end
puts words_found
Currently the output I'm getting is a vertical list of printed words.
Japan
archipelago
country
But I would like something like ['Japan', 'archipelago', 'country'].
I don't have much experience matching text in a paragraph and am not sure what I'm doing wrong here. Could anyone give some guidance?
This is because you are using puts, which prints each element of an array on its own line (appending "\n" after every word). To show the array itself, use print or p:
#!/usr/bin/env ruby
def run_me
  paragraph = "Japan is a stratovolcanic archipelago of 6,852 islands.
The four largest are Honshu, Hokkaido, Kyushu and Shikoku, which make up about ninety-seven percent of Japan's land area.
The country is divided into 47 prefectures in eight regions."
  words_to_find = %w[ Japan archipelago fishing country ]
  find_words_from_a_text_file paragraph, words_to_find
end
def find_words_from_a_text_file(paragraph, words_to_find)
  words_found = []
  words_to_find.each do |w|
    paragraph.match(/#{w}/) ? words_found << w : nil
  end
  # print each element with an enumerator and puts
  words_found.each { |x| puts "with enum and puts : : #{x}" }
  # or use print, which does not add a newline
  print "with print :"; print words_found, "\n"
  # or with p, which shows the array as a literal
  p words_found
end
run_me
Outputs:
za:ruby_dir za$ ./fooscript.rb
with enum and puts : : Japan
with enum and puts : : archipelago
with enum and puts : : country
with print :["Japan", "archipelago", "country"]
["Japan", "archipelago", "country"]
Here are a couple of ways to do that. Both are case-indifferent.
Use a regular expression
r = /
\b # Match a word break
#{ Regexp.union(words_to_find) } # Match any word in words_to_find
\b # Match a word break
/xi # Free-spacing regex definition mode (x)
# and case-indifferent (i)
#=> /
# \b # Match a word break
# (?-mix:Japan|archipelago|fishing|country) # Match any word in words_to_find
# \b # Match a word break
# /ix # Free-spacing regex definition mode (x)
# and case-indifferent (i)
paragraph.scan(r).uniq(&:itself)
#=> ["Japan", "archipelago", "country"]
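The question's own select-by-match idea also works; the problem was only how the result was printed. A sketch using select and String#match? (Ruby 2.4+), with \b word boundaries, Regexp.escape and the i flag added as assumptions so the check is whole-word and case-insensitive like the regex above:

```ruby
paragraph = "Japan is a stratovolcanic archipelago of 6,852 islands.
The four largest are Honshu, Hokkaido, Kyushu and Shikoku, which make up about ninety-seven percent of Japan's land area.
The country is divided into 47 prefectures in eight regions."
words_to_find = %w[Japan archipelago fishing country]

# Keep each word that appears as a whole word anywhere in the paragraph
words_found = words_to_find.select do |w|
  paragraph.match?(/\b#{Regexp.escape(w)}\b/i)
end

p words_found # ["Japan", "archipelago", "country"]
```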
Intersect two arrays
words_to_find_hash = words_to_find.each_with_object({}) { |w,h| h[w.downcase] = w }
#=> {"japan"=>"Japan", "archipelago"=>"archipelago", "fishing"=>"fishing",
#    "country"=>"country"}
words_to_find_hash.values_at(*paragraph.delete(".;:,?'").
downcase.
split.
uniq & words_to_find_hash.keys)
#=> ["Japan", "archipelago", "country"]

Ruby: match elements from the first array with the second array

I have two arrays. The first is an array of strings, each containing a name and an amount. The second is an array of letters.
a1 = ["ASLK 50", "BSKD 150", "ZZZZ 100", "BSDF 50"]
a2 = ["B", "Z"]
I want to create a third array by sorting the contents of a1 based on a2, and return the totals based on the information in the first array. Since a2 has "B" and "Z", I need to scan the first array for all entries starting with the letter B or Z and add up the numbers.
My ultimate goal is to return the sum on third array, something like this:
a3 = ["B = 200", "Z = 100"]
Since "A" was not on a2, it is not counted.
I was able to extract the information from a1:
arr = a1.map{|el| el[0] + " : " + el.gsub(/\D/, '\1')}
#=> ["A : 50", "B : 150", "Z : 100", "B : 50"]
I am having trouble comparing a1 with a2. I have tried different methods, such as:
a1.find_all{|i| i[0] == a2[0]} #=> only compares against the first element of a2. How can I iterate through a2?
alternatively,
i = 0
arr_result = []
while i < (arr.length + 1)
#(append to arr_result the number from a1 if a1[i][0] matches a2[i])
I think either would solve it, but I can't turn either idea into working code. How can I implement either method? Is there a more efficient way to do it?
Going by your requirements, you want to turn this:
a1 = ["ASLK 50", "BSKD 150", "ZZZZ 100", "BSDF 50"]
a2 = ["B", "Z"]
into this: a3 = ["B = 200", "Z = 100"]
a3 = a2.map do |char|
  sum = a1.reduce(0) do |total, item|
    name, price = item.split(" ")
    total += price.to_i if name[0].eql?(char)
    total
  end
  "#{char} = #{sum}"
end
Here is how I would do this:
a1 = ["ASLK 50", "BSKD 150", "ZZZZ 100", "BSDF 50"]
a2 = ["B", "Z"]
a3 = a1.each_with_object(Hash.new(0)) do |a, obj|
  obj[a[0]] += a.split.last.to_i if a2.include?(a[0])
end.map { |a| a.join(" = ") }
#=>["B = 200", "Z = 100"]
The first step adds them all into a Hash, summing the values by first letter for each letter contained in the second Array.
The second step produces the desired output.
If you want a Hash instead, just remove the last call to map and you'll have:
{"B" => 200, "Z" => 100}
Rather than defining an Array like ["B = 200", "Z = 100"], it would be more sensible to define this as a mapping - i.e. the following Hash object:
{"B" => 200, "Z" => 100}
As for the implementation, there are many ways to do it. Here is just one approach:
a1 = ["ASLK 50", "BSKD 150", "ZZZZ 100", "BSDF 50"]
a2 = ["B", "Z"]
result = a2.map do |letter|
  [
    letter,
    a1.select { |str| str[0] == letter }
      .inject(0) { |sum, str| sum += str[/\d+/].to_i }
  ]
end.to_h
puts result # => {"B"=>200, "Z"=>100}
Explanation:
At the top level, I've used Array#to_h to convert the array of pairs: [["B", 200], ["Z", 100]] into a Hash: {"B" => 200, "Z" => 100}.
a1.select {|str| str[0] == letter} selects only the elements from a1 whose first letter is that of the hash key.
inject(0) {|sum, str| sum += str[/\d+/].to_i} adds up all the numbers, with safeguards that default to zero (rather than having nil thrown around unexpectedly).
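On Ruby 2.4+, Array#sum with a block can replace the inject(0) step; like inject(0), it returns 0 when nothing is selected. A sketch of the same answer with only that swap:

```ruby
a1 = ["ASLK 50", "BSKD 150", "ZZZZ 100", "BSDF 50"]
a2 = ["B", "Z"]

result = a2.map do |letter|
  # sum adds up the block's value for each selected string; an empty selection sums to 0
  total = a1.select { |str| str[0] == letter }.sum { |str| str[/\d+/].to_i }
  [letter, total]
end.to_h

p result.map { |k, v| "#{k} = #{v}" } # ["B = 200", "Z = 100"]
```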
This is quite similar to @engineersmnky's answer.
r = /
\A[A-Z] # match an upper case letter at the beginning of the string
| # or
\d+ # match one or more digits
/x # free-spacing regex definition mode
a1.each_with_object(Hash.new(0)) do |s,h|
  start_letter, value = s.scan(r)
  h[start_letter] += value.to_i if a2.include?(start_letter)
end.map { |k,v| "#{k} = #{v}" }
#=> ["B = 200", "Z = 100"]
Hash.new(0) is often referred to as a "counting hash". See the doc for the class method Hash::new for an explanation.
The regex matches the first letter of the string or one or more digits. For example,
"ASLK 50".scan(r)
#=> ["A", "50"]
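A small illustration of how a counting hash behaves, using the "B" totals from this question:

```ruby
h = Hash.new(0) # 0 is returned for any missing key
h["B"] += 150   # expands to h["B"] = h["B"] + 150, i.e. 0 + 150
h["B"] += 50
p h["B"]        # 200
p h["A"]        # 0: the default value is returned...
p h.key?("A")   # false: ...but the key is not stored
```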

How to print a 2D array with fixed column width

I have an array:
animals = [
["cats", "dogs"],
["verrylongcat", "dog"],
["shortcat", "verrylongdog"],
["cat", "dog"]
]
And I would like to display it nicely. Is there an easy way to make the columns a fixed width so I get something like this:
cats         dogs
verrylongcat dog
shortcat     verrylongdog
cat          dog
animals is just an example; my array could also have 3, 4, or even more columns.
You are looking for String#ljust:
max_cat_size = animals.map(&:first).max_by(&:size).size
animals.each do |cat, dog|
  puts "#{cat.ljust(max_cat_size)} #{dog}"
end
If you want more than one space just add the corresponding amount in the interpolation.
Assuming your array is n × m and not 2 × m:
animal_max_sizes = animals.first.size.times.map do |index|
  animals.transpose[index].map(&:to_s).max_by(&:size).size
end
animals.map do |animal_line|
  animal_line.each.with_index.reduce('') do |line, (animal, index)|
    line + animal.to_s.ljust(animal_max_sizes[index].next)
  end
end.each { |animal_line_stringified| puts animal_line_stringified }
Note: The to_s calls are used in case your arrays contain nils, numbers, etc.
Another way to do this is with printf-style formatting. If you know you will always have exactly 2 words in each line then you can do this:
#!/usr/bin/env ruby
lines = [
' cats dogs',
' verrylongcat dog',
'shortcat verrylongdog ',
' cat dog ',
]
lines.map(&:strip).each do |line|
  puts "%-14s%s" % line.split
end
Outputs:
cats          dogs
verrylongcat  dog
shortcat      verrylongdog
cat           dog
If you need to calculate the column width based on the data, then you'd have to do a little more work:
# as @ndn showed:
first_col_width = lines.map(&:split).map(&:first).max_by(&:size).size + 2
lines.map(&:strip).each do |line|
  puts "%-#{first_col_width}s%s" % line.split
end
Here's another attempt for a variable number of columns. Given this array:
animals = [
['Cats', 'Dogs', 'Fish'],
['Mr. Tinkles', 'Buddy', 'Nemo'],
['Calico', 'Butch', 'Marlin'],
['Ginger', 'Ivy', 'Dory']
]
We can calculate the width of each column via transpose, map, length and max:
widths = animals.transpose.map { |x| x.map(&:length).max }
#=> [11, 5, 6]
Based on this, we can generate a format string that can be passed to sprintf (or its shortcut %):
row_format = widths.map { |w| "%-#{w}s" }.join(' ')
#=> "%-11s %-5s %-6s"
%s denotes a string argument, 11, 5 and 6 are our widths, and - left-justifies the result.
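A few one-liners showing the effect of the - flag and of passing an array to %:

```ruby
# "-" left-justifies: the padding goes on the right
p("%-6s|" % "Fish")                # "Fish  |"
# without "-", the value is right-justified
p("%6s|" % "Fish")                 # "  Fish|"
# several fields take an array of values, as with row_format % animals[0]
p("%-11s %-5s" % ["Cats", "Dogs"]) # "Cats        Dogs "
```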
Let's try it:
row_format % animals[0] #=> "Cats        Dogs  Fish  "
row_format % animals[1] #=> "Mr. Tinkles Buddy Nemo  "
row_format % animals[2] #=> "Calico      Butch Marlin"
That looks good. Now we can use a loop and wrap everything in a method:
def print_table(array)
  widths = array.transpose.map { |x| x.map(&:length).max }
  row_format = widths.map { |w| "%-#{w}s" }.join(' ')
  array.each do |row_values|
    puts row_format % row_values
  end
end
print_table(animals)
Output:
Cats        Dogs  Fish
Mr. Tinkles Buddy Nemo
Calico      Butch Marlin
Ginger      Ivy   Dory
More complex formatting
With a little tweaking, you can also output a MySQL style table:
def print_mysql_table(array)
  widths = array.transpose.map { |x| x.map(&:length).max }
  row_format = '|%s|' % widths.map { |w| " %-#{w}s " }.join('|')
  separator = '+%s+' % widths.map { |w| '-' * (w+2) }.join('+')
  header, *rows = array
  puts separator
  puts row_format % header
  puts separator
  rows.each do |row_values|
    puts row_format % row_values
  end
  puts separator
end
print_mysql_table(animals)
Output:
+-------------+-------+--------+
| Cats        | Dogs  | Fish   |
+-------------+-------+--------+
| Mr. Tinkles | Buddy | Nemo   |
| Calico      | Butch | Marlin |
| Ginger      | Ivy   | Dory   |
+-------------+-------+--------+
