Append strings to array if found in paragraph using `.match` in Ruby - arrays

I'm attempting to search a paragraph for each word in an array, and then output a new array with only the words that could be found.
But I've been unable to get the desired output format so far.
paragraph = "Japan is a stratovolcanic archipelago of 6,852 islands.
The four largest are Honshu, Hokkaido, Kyushu and Shikoku, which make up about ninety-seven percent of Japan's land area.
The country is divided into 47 prefectures in eight regions."
words_to_find = %w[ Japan archipelago fishing country ]
words_found = []
words_to_find.each do |w|
paragraph.match(/#{w}/) ? words_found << w : nil
end
puts words_found
Currently the output I'm getting is a vertical list of printed words.
Japan
archipelago
country
But I would like something like, ['Japan', 'archipelago', 'country'].
I don't have much experience matching text in a paragraph and am not sure what I'm doing wrong here. Could anyone give some guidance?

This is because you are using puts to print the array: puts writes each element on its own line, appending "\n" after every word. Use print or p instead:
#!/usr/bin/env ruby
def run_me
paragraph = "Japan is a stratovolcanic archipelago of 6,852 islands.
the four largest are Honshu, Hokkaido, Kyushu and Shikoku, which make up about ninety-seven percent of Japan's land area.
the country is divided into 47 prefectures in eight regions."
words_to_find = %w[ Japan archipelago fishing country ]
find_words_from_a_text_file paragraph , words_to_find
end
def find_words_from_a_text_file(paragraph, words_to_find)
words_found = []
words_to_find.each do |w|
paragraph.match(/#{w}/) ? words_found << w : nil
end
# print each element on its own line with each and puts
words_found.each { |x| puts "with enum and puts : : #{x}" }
# or just use print, which does not add a new line
print "with print :"; print words_found, "\n"
# or with p
p words_found
end
run_me
outputs :
za:ruby_dir za$ ./fooscript.rb
with enum and puts : : Japan
with enum and puts : : archipelago
with enum and puts : : country
with print :["Japan", "archipelago", "country"]
["Japan", "archipelago", "country"]
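For comparison, here is a shorter sketch of the same idea. It assumes Ruby 2.4 or later (for String#match?), and it adds Regexp.escape, which is not in the original code, so that words containing regex metacharacters are matched literally. The shortened paragraph is only sample data.

```ruby
paragraph = "Japan is a stratovolcanic archipelago of 6,852 islands. " \
            "The country is divided into 47 prefectures in eight regions."
words_to_find = %w[Japan archipelago fishing country]

# Keep only the words that occur somewhere in the paragraph.
words_found = words_to_find.select do |w|
  paragraph.match?(/#{Regexp.escape(w)}/)
end

# puts would print one element per line; inspect (or p) prints the array literal.
puts words_found.inspect
p words_found
```

Both of the last two lines print the array in its bracketed form, which is the format asked for in the question.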

Here are a couple of ways to do that. Both are case-insensitive.
Use a regular expression
r = /
\b # Match a word break
(?:#{ Regexp.union(words_to_find).source }) # Match any word in words_to_find
\b # Match a word break
/xi # Free-spacing regex definition mode (x)
# and case-insensitive (i)
#=> /
# \b # Match a word break
# (?:Japan|archipelago|fishing|country) # Match any word in words_to_find
# \b # Match a word break
# /ix
# The union's source is interpolated rather than the Regexp object itself: the object
# would be embedded as (?-mix:...), which switches the i flag off inside that group.
paragraph.scan(r).uniq
#=> ["Japan", "archipelago", "country"]
Intersect two arrays
words_to_find_hash = words_to_find.each_with_object({}) { |w,h| h[w.downcase] = w }
#=> {"japan"=>"Japan", "archipelago"=>"archipelago", "fishing"=>"fishing",
#    "country"=>"country"}
words_to_find_hash.values_at(*paragraph.delete(".;:,?'").
downcase.
split.
uniq & words_to_find_hash.keys)
#=> ["Japan", "archipelago", "country"]
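One detail worth checking in the regular-expression approach: as far as I know, interpolating a Regexp object into a regex literal embeds it together with its own flags, so an outer i flag may not reach the interpolated part. A small sketch of the difference (the two-word list and the test string are just sample data):

```ruby
words = %w[Japan country]

# Interpolating the Regexp object embeds it as (?-mix:...), which switches
# the outer i flag off inside that group.
embedded = /\b#{Regexp.union(words)}\b/i

# Interpolating only the pattern source lets the outer i flag apply.
from_source = /\b(?:#{Regexp.union(words).source})\b/i

p embedded.source
p "JAPAN is a COUNTRY".scan(embedded)
p "JAPAN is a COUNTRY".scan(from_source)
```

The first scan finds nothing, while the second returns both upper-case words, so the source-based form is the one to use when case-insensitive matching is wanted.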

Related

Is this the best way for the below needed Ruby application? + [Help in Arrays]

Please advise whether this code is correct or whether there is a better version, and also how to sort the result in descending order of count. The task is:
A program that takes a user’s input, then builds a hash from that input. Each key in the hash will be a word from the user; each value will be the number of times that word occurs. For example, if our program gets the string “the rain in Spain falls mainly on the plain,”
print "What is the sentence that needs to be analysed? "
sentence = gets.chomp
#puts sentence
#Text to Array
my_array = sentence.split
#Create the Hash
word_frequency = Hash.new(0)
#Iterate through the given array in the built Hash
my_array.each { |word|
word_frequency[word] += 1
}
puts word_frequency
Needed output when we run the code with this input:
I love coding to the moon and love web coding
love 2
coding 2
I 1
to 1
..... etc.
sentence = "Is this the best way for the below needed Ruby application?"
sentence.split.tally.sort_by { |_word, count| - count }.to_h
# => {"the"=>2, "Is"=>1, "this"=>1, "best"=>1, "way"=>1, "for"=>1, "below"=>1, "needed"=>1, "Ruby"=>1, "application?"=>1}
First tokenize the sentence into an array of words; then tally builds a hash where each key is a word and each value is its count; then sort in descending order of count.
You can also sort like this:
sentence.split.tally.sort_by(&:last).reverse.to_h
You can also normalize the case, for example with String#upcase, and use String#scan to avoid capturing punctuation:
sentence.upcase.scan(/[[:word:]-]+/).tally.sort_by(&:last).reverse.to_h
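Since sort_by is not guaranteed to be stable, words with equal counts can come out in any order. A minimal sketch that makes the order deterministic by sorting on the count (descending) and then on the word itself; word_counts is a hypothetical helper name, and tally needs Ruby 2.7 or later:

```ruby
# Count words case-insensitively, then sort by count (descending) and
# alphabetically among words with the same count.
def word_counts(sentence)
  sentence.downcase
          .scan(/[[:word:]-]+/)
          .tally
          .sort_by { |word, count| [-count, word] }
end

word_counts("I love coding to the moon and love web coding").each do |word, count|
  puts "#{word} #{count}"
end
```

This prints one "word count" pair per line, starting with the two words that occur twice.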
After many trials, and with help from @mechnicov, I arrived at the code below and would welcome reviews:
print 'What is the sentence that needs to be analysed? '
sentence = gets.chomp
# Convert Text to Array:
my_array = sentence.split
# Create the Hash called histogram
histogram = Hash.new(0) # default value 0, so a word's count can be incremented the first time it is seen
my_array.each do |word|
histogram[word] += 1
end
# Sort descending by values
sorted_histogram = histogram.sort_by { |_k, v| v }.reverse
# Print final output:
sorted_histogram.each do |word, count|
puts "#{word} #{count}"
end

ruby - How to make an array of arrays of letters (a-z) of varying lengths with maximum length five

So I'm trying to make an array of all possible permutations of the alphabet letters (all lowercase), in which the letters can repeat and vary in length from 1 to 5. So for example these are some possibilities that would be in the array:
['this','is','some','examp','le']
I tried this, and it gets all the variations of words 5 letters long, but I don't know how to find varying length.
("a".."z").to_a.repeated_permutation(5).map(&:join)
EDIT:
I'm trying to do this in order to crack a SHA1 encrypted string:
require 'digest'
def decrypt_string(hash)
("a".."z").to_a.repeated_permutation(5).map(&:join).find {|elem| Digest::SHA1.hexdigest(elem) == hash}
end
Hash being the SHA1 encryption of the word, such as 'e6fb06210fafc02fd7479ddbed2d042cc3a5155e'
You can modify your method slightly.
require 'digest'
def decrypt_string(hash)
arr = ("a".."z").to_a
(1..5).each do |n|
arr.repeated_permutation(n) do |a|
s = a.join
return s if Digest::SHA1.hexdigest(s) == hash
end
end
end
word = "cat"
hash = Digest::SHA1.hexdigest(word)
#=> "9d989e8d27dc9e0ec3389fc855f142c3d40f0c50"
decrypt_string(hash)
#=> "cat"
word = "zebra"
hash = Digest::SHA1.hexdigest(word)
#=> "38aa53de31c04bcfae9163cc23b7963ed9cf90f7"
decrypt_string(hash)
#=> "zebra"
Calculations for "cat" took well under one second on my 2020 MacBook Pro; those for "zebra" took about 15 seconds.
Note that join should be applied within repeated_permutation's block, as repeated_permutation(n).map(&:join) would create a temporary array having as many as 26**5 #=> 11,881,376 elements (for n = 5).
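The same point can be sketched with lazy enumerators, which generate candidates one at a time instead of materializing all 26 + 26**2 + ... + 26**5 = 12,356,630 strings. The names candidates and crack_sha1 and the keyword parameters are hypothetical, and the small alphabets in the usage lines are only there to keep the demonstration fast.

```ruby
require 'digest'

# Lazily yield every string over `letters` with length 1..max_len,
# shortest first, without building an intermediate array.
def candidates(letters, max_len)
  (1..max_len).lazy.flat_map do |n|
    letters.repeated_permutation(n).lazy.map(&:join)
  end
end

# Return the first candidate whose SHA1 digest equals `hash`, or nil.
def crack_sha1(hash, letters: ('a'..'z').to_a, max_len: 5)
  candidates(letters, max_len).find { |s| Digest::SHA1.hexdigest(s) == hash }
end

p candidates(%w[a b], 2).first(5)
# => ["a", "b", "aa", "ab", "ba"]
p crack_sha1(Digest::SHA1.hexdigest('cab'), letters: %w[a b c], max_len: 3)
# => "cab"
```

The search stops at the first match, so short words are found quickly while the worst case still visits every candidate.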
If you do not mind the possibility of repeated strings, then the following should work:
e = Enumerator.new do |y|
r = ('a'..'z').to_a * 5
loop do
y << r.shuffle.take(rand(1..5)).join
end
end
You can then call it as:
e.take(10)
#=> ["bz", "tnld", "jv", "s", "ngrm", "phiy", "ar", "zq", "ajjn", "cn"]
This:
Creates an Array of a through z repeated 5 times
Continually shuffles said Array
Then takes the first 1 to 5 ("random number") elements from the shuffled Array and joins them together
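A variation on the same idea, as a sketch only: drawing each letter independently with sample avoids shuffling a 130-element array on every iteration. LETTERS and random_words are hypothetical names. Like the enumerator above, this produces random strings and may repeat them, so it is not an exhaustive enumeration.

```ruby
LETTERS = ('a'..'z').to_a

# Infinite stream of random lowercase strings, each 1 to 5 letters long.
# Letters are drawn independently, so a letter may repeat within a string.
random_words = Enumerator.new do |y|
  loop { y << Array.new(rand(1..5)) { LETTERS.sample }.join }
end

p random_words.take(10)
```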

Deleting specific string lines of items in array Ruby

I have an array of 10 items, each of which is a multi-line string like
one string
two string
some string
any string
I want to delete the lines containing the words some and two. I wrote this code:
search_text_domain = %r{some|two}
groups_data.each do |line|
line.each_line do |num|
domain_users_name << (num) unless num =~ search_text_domain
end
end
It works fine, but it puts all the lines into one big array like
domain_users_name = ["one string", "any string", "big string", "another_s....] and I want to put them in an array of arrays like
domain_users_name = [["one string", "any string"], ["big string", "another_s...."], [........
I need a version that permanently modifies the groups_data array. Any ideas?
input = ["one string\ntwo string\nsome string\nany string",
"one string\ntwo string\nsome string\nany string"]
input.map { |a| a.split("\n").reject { |e| e =~ %r{some|two} } }
# or
# input.map { |a| a.each_line.map(&:strip).reject { |e| e =~ %r{some|two} } }
# or (smiley-powered version, see the method’s tail)
# input.map { |a| a.each_line.map(&:strip).reject(&%r{some|two}.method(:=~)) }
#⇒ [["one string", "any string"], ["one string", "any string"]]
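Since the question asks for groups_data itself to be modified, here is a minimal sketch using map!, which replaces each element in place. The sample strings and the name unwanted are made up for illustration; lines(chomp: true) assumes Ruby 2.4 or later.

```ruby
groups_data = [
  "one string\ntwo string\nsome string\nany string",
  "big string\nsome other string"
]
unwanted = /some|two/

# map! overwrites each multi-line string with an array of its kept lines,
# so groups_data itself ends up as an array of arrays.
groups_data.map! do |block|
  block.lines(chomp: true).grep_v(unwanted)
end

p groups_data
# => [["one string", "any string"], ["big string"]]
```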
So you want to delete a group if one of the group elements matches the filter regexp?
groups = [['some', 'word'], ['other', 'word'], ['unrelated', 'list', 'of', 'things']]
filter = %r{word|some}
filtered = groups.delete_if do |group|
group.any? do |word|
word =~ filter
end
end
p filtered
Does this do what you want?

Ruby parsing CSV rows in a loop

I'm trying to write a CSV parser. Each line has multiple fields that I need to process. Each line represents patient data, so I need each line processed by itself. Once I've finished processing a line, I need to go on to the next until the end of the file is reached.
I've successfully started writing the parser in Ruby. The data is getting imported and it's creating an array of arrays (each line is an array).
The problem I'm having is properly looping through the data line by line. So, right now I can successfully process the first line and parse each field. I start running into a problem when I add another line with new patient data. The second line gets processed and added to the new array that has been created. For example, line 1 and line 2 once processed, get added to one big array instead of an array of arrays. The data imported needs to output in the same structure.
Here is my code so far:
original_data = Array.new
converted_data = Array.new
Dir.chdir 'convert'
CSV.foreach('CAREPRODEMO.CSV') do |raw_file|
original_data << raw_file
end
# Needed at beginning of array for each patient
converted_data.insert(0, 'Acvite', 'ACT')
# Start processing fields
original_data.each do |o|
# BEGIN Check for nil in original data and replace with empty string
o.map! { |x| x ? x : ''}
converted_data << o.slice(0)
# Remove leading zeros from account number
converted_data[2].slice!(0)
if converted_data[2].slice(1) == '0'
converted_data[2].slice!(1)
end
# Setup patient name to be processed
patient_name = Array.new
patient_name << o.slice(3..4)
converted_data << patient_name.join(' ')
# Setup patient address to be processed
patient_address = Array.new
patient_address << o.slice(5)
converted_data << patient_address.join(' ')
# END Check for nil in converted data and replace with empty string
converted_data.map! { |x| x ? x : ''}
end
# For debugging
p converted_data
Output:
["Acvite", "ACT", "D65188596", "SILLS DALTON H", "16243 B L RD", "00D015188596", "BALLARD DAVE H", "243 H L RD", "", "", ""]
Wanted:
["Acvite", "ACT", "D65188596", "SILLS DALTON H", "16243 B L RD"]
["Acvite", "ACT", "D15188596", "BALLARD DAVE H", "243 H L RD"]
You need an array of arrays to store the results; you are using a single array, hence the output you mentioned.
Move the converted_data array inside the loop, and define a new array to collect the output of each iteration. A possible approach is shown below.
original_data = Array.new
# Changed the variable name from converted_data
final_data = Array.new
...
original_data.each do |o|
converted_data = Array.new
...
# END Check for nil in converted data and replace with empty string
converted_data.map! { |x| x ? x : ''}
final_data << converted_data
end
p final_data
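To make the shape of the result concrete, here is a self-contained sketch that builds one array per patient with map. It is not the asker's exact logic: the sample rows are invented, the column positions (account at 0, name parts at 3..4, address at 5) are taken from the question's code, and the leading-zero handling is simplified to stripping all leading zeros. PREFIX and convert_row are hypothetical names.

```ruby
require 'csv'

# Values the question places at the start of every patient row (copied as written there).
PREFIX = ['Acvite', 'ACT'].freeze

# Build a fresh array for one CSV row, so rows never share state.
def convert_row(row)
  fields = row.map(&:to_s)                  # nil fields become ''
  account = fields[0].to_s.sub(/\A0+/, '')  # simplified: strip leading zeros
  name = fields[3..4].to_a.join(' ').strip
  address = fields[5].to_s
  PREFIX + [account, name, address]
end

sample = <<~ROWS
  00D65188596,,,SILLS,DALTON H,16243 B L RD
  00D15188596,,,BALLARD,DAVE H,243 H L RD
ROWS

final_data = CSV.parse(sample).map { |row| convert_row(row) }
final_data.each { |patient| p patient }
```

Because convert_row returns a new array each time, final_data is an array of arrays with one entry per input line.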

Extract the contents from CSV into an array

I have a CSV file with contents:
John,1,2,4,67,100,41,234
Maria,45,23,67,68,300,250
I need to read this content and separate these data into two sections:
1.a Legend1 = John
1.b Legend2 = Maria
2.a Data_array1 = [1,2,4,67,100,41,234]
2.b Data_array2 = [45,23,67,a,67,300,250]
Here is my code; it reads the contents and separates the contents from ','.
testsample = CSV.read('samples/linechart.csv')
CSV.foreach('samples/linechart.csv') do |row|
puts row
end
Its output results in a class of array elements. I am stuck in pursuing it further.
I would recommend not using CSV.read for this; the data is simple enough not to need it. Instead, read the file with File.readlines and treat each line as one big string.
eg:
# this turns the file into an array of lines
# eg you now have: ["John,1,2,4,67,100,41,234", "Maria,45,23,67,a,67,300,250"]
lines = File.readlines('samples/linechart.csv')
# if you want to do this for each line, just iterate over this array:
lines.each do |line|
# now split each line by the commas to turn it into an array of strings
# (chomp first, because readlines keeps the trailing newline)
# eg you have: ["John","1","2","4","67","100","41","234"]
values = line.chomp.split(',')
# now, grab the first one as the name and the rest of them as an array of strings
legend = values[0] # "John"
data_array = values[1..-1] # ["1","2","4","67","100","41","234"]
# now do what you need to do with the name/numbers eg
puts "#{legend}: [#{data_array.join(',')}]"
# if you want the second array to be actual numbers instead of strings, you can convert them to numbers using to_i (or to_f if you want floats instead of integers)
# the following says "take each value and call to_i on it and return the set of new values"
data_array = data_array.map(&:to_i)
end # end of iterating over the array
First get the data out of csv like:
require 'csv'
csv_text = File.read('/tmp/a.csv')
csv = CSV.parse(csv_text)
# => [["John", "1", "2", "4", "67", "100", "41", "234"], ["Maria", "45", "23", "67", "a", "67", "300", "250"]]
Now you can format output as per your requirements. Eg:
csv.each.with_index(1){ |a, i|
puts "Legend#{i.to_s} = #{a[0]}"
}
# Legend1 = John
# Legend2 = Maria
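Building on the parsed rows, a minimal sketch that separates each legend from its numbers in one pass. The inline csv_text mirrors the rows shown in the answers above, the name series is hypothetical, and Integer(v, exception: false) assumes Ruby 2.6 or later; it turns a non-numeric field such as "a" into nil rather than silently converting it to 0 as to_i would.

```ruby
require 'csv'

csv_text = "John,1,2,4,67,100,41,234\nMaria,45,23,67,a,67,300,250\n"

# Map each legend (first column) to its values; non-numeric fields become nil.
series = CSV.parse(csv_text).to_h do |legend, *values|
  [legend, values.map { |v| Integer(v, exception: false) }]
end

series.each { |legend, data| puts "#{legend}: #{data.inspect}" }
```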
You may be looking for this:
csv = CSV.new(body)
csv.to_a
You can have a look at http://technicalpickles.com/posts/parsing-csv-with-ruby/
Reference this, too, if needed.
Over-engineered version ;)
class Lines
class Line
attr_reader :legend, :array
def initialize(line)
@line = line
parse
end
private
def parse
@legend, *array = @line.strip.split(",")
@array = array.map(&:to_i)
end
end
def self.parse(file_name)
File.readlines(file_name).map do |line|
Line.new(line)
end
end
end
Lines.parse("file_name.csv").each do |o|
p o.legend
p o.array
puts
end
# Result:
#
# "John"
# [1, 2, 4, 67, 100, 41, 234]
#
# "Maria"
# [45, 23, 67, 68, 300, 250]
Notes:
Basically, Lines.parse("file_name.csv") will give you an array of objects that will respond to the methods: legend and array; which holds the name and array of numbers respectively.
Jokes aside, I think OO will help maintainability.
