Using arrays in regular expressions? - arrays

Does anyone know if there is a way to use an array in a regular expression? Suppose I want to find out whether somefile.txt contains one of an array's elements. Obviously the code below doesn't work, but is there something similar that does?
array = %w(thing1 thing2 thing3)
file = File.open("somefile.txt")
file.each_line do |line|
  if /array/.match(line) # matches the literal text "array", not the array's elements
    puts line
  end
end
Basically I've got a file that contains a list of words that I need to use as search terms in another large file, and I'd like to avoid something like this:
($somefile =~ /(thing1|thing2|thing3)/)

You can use Regexp.union, which returns a Regexp that matches any of the given patterns. Each pattern can be either a String or a Regexp:
Regexp.union(%w(thing1 thing2 thing3))
#=> /thing1|thing2|thing3/
or
Regexp.union(/thing1/, /thing2/, /thing3/)
#=> /(?-mix:thing1)|(?-mix:thing2)|(?-mix:thing3)/
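A minimal sketch of how Regexp.union might be applied to the original problem. The terms and lines variables and their sample values are made up for illustration; in practice both would be read from files.

```ruby
# Hypothetical search terms and sample lines (assumptions, not from the question).
terms = %w(thing1 thing2 a.b)
lines = ["has thing1 here", "nothing relevant", "literal a.b inside", "axb should not match"]

# Regexp.union escapes string arguments, so "a.b" matches only a literal dot.
pattern = Regexp.union(terms)

# Keep only the lines that contain at least one of the terms.
matching = lines.select { |line| pattern.match?(line) }
matching.each { |line| puts line }
```

With a real file, File.foreach("somefile.txt") could take the place of the lines array.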

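Use:
x = ['qwe', 'asd', 'zxc']
file = File.open("somefile.txt")
# The terms are interpolated unescaped; wrap each one in Regexp.escape if it may contain regex metacharacters.
regexp = /(#{x.join '|'})/
file.each_line do |line|
  puts line if regexp.match(line)
end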

Related

Using flatten! on an array of arrays not working

I am building a script that takes in a column from a CSV that can contain zero or more ID numbers. I have created an array from the column. However, since some cells have no ID number and some have several, I have an array of arrays.
I want to create an array where each element is a single ID (i.e., split the IDs in each element so that each one becomes its own element).
Here is my code so far:
require 'csv'
class MetadataTherapyParser
  def initialize(csv)
    @csv = csv
  end
  def parse_csv
    therapy_array = []
    CSV.foreach(@csv) do |csv_row|
      therapy_array << csv_row[0]
    end
    therapy_array
  end
  def parse_therapies(therapy_array)
    parsed_therapy_array = therapy_array.flatten!
  end
end
metadata_parse = MetadataTherapyParser.new("my_path.csv")
therapy_array = metadata_parse.parse_csv
metadata_parse.parse_therapies(therapy_array)
p therapy_array
However, the output is still an array of arrays. I am thinking it may have something to do with nil values? I have tried looking for answers online to no avail.
If you could give me some advice as how to fix this problem, it would be greatly appreciated!
Thank you in advance.
EDIT
I have posted a snippet of my output below. It still appears to be a nested array.
[nil, nil, "57e923a0f5c3c85c9200052b, 58b828f4f5c3c806490046a6", "57e923a0f5c3c85c9200052b, 4ffaf15af758862fb10155e3, 58b828f4f5c3c806490046a6", "57e923a0f5c3c85c9200052b, 4ffaf15af758862fb10155e3, 58b828f4f5c3c806490046a6", nil, nil, nil, nil, nil, "5f9176e50cf19216d6da9289", "6082f6bd0cf19225863fc985", "6082f6fd0cf192258d3fce0e", "6082f69e0cf19225ac3fc551", "6082f6a60cf19225a23fd3e4, 6082f6d30cf192258d3fce0a, 6082f7fa0cf19225953fc77c"]
You say you have "an array of arrays" but your example array is "57e923a0f5c3c85c9200052b, 4ffaf15af758862fb10155e3, 58b828f4f5c3c806490046a6" ... that's not an array. That's a string. You probably want to split strings that have commas into separate array elements.
So instead of
therapy_array << csv_row[0]
try instead
therapy_array << csv_row[0].to_s.split(',').map(&:strip)
The flatten is working perfectly. The issue is that your output contains many strings with commas in them, not nested arrays.
Not having a copy of your CSV, I'm going to assume that it has been parsed correctly, and that you want to keep parse_csv storing the contents of the first cell as it is:
def parse_therapies(therapy_array)
  # Split each non-nil string on commas (and any following spaces), then flatten and drop the nils.
  parsed_therapy_array = therapy_array.map { |x| x && x.split(/,\s*/) }.flatten.compact
  therapy_array.replace(parsed_therapy_array)
end
This also removes all the nil elements via compact, assuming you don't want them.
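The same idea can be sketched on its own with flat_map, which maps and flattens one level in a single pass. The cells array below is shortened sample data in the shape of the question's output, not the real IDs.

```ruby
# Sample column values (made up): nil for empty cells,
# comma-separated IDs inside a single string otherwise.
cells = [nil, "57e9, 58b8", "5f91", nil, "6082a, 6082b, 6082c"]

# compact drops the nils; flat_map splits each string and flattens one level.
ids = cells.compact.flat_map { |cell| cell.split(',').map(&:strip) }

p ids
```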

Ruby, print all the hashes "subfield" in one row

I have a JSON array structured like this:
{"elements":[{"ECL001":{"description":"First Element", "max_level":3, "size":10}},{"ECL002":{"description":"Second Element", "max_level":4, "size":1}}]}
I'm parsing my structure and then printing the data if a condition is satisfied.
require 'json'
x = JSON.parse(File.read('data_elements.dat'))
elements = x["elements"]
elements.each do |elem_specific|
  elem_specific.each do |id, data|
    if data['max_level'] > 3
      puts "#{data['description']}, #{data['max_level']}, #{data['size']}"
    end
  end
end
It works correctly, but is there a shorter way to print the data?
I mean, is it possible to replace this
puts "#{data['description']}, #{data['max_level']}, #{data['size']}"
with something like
puts "#{data[*ALL]}"
I solved it!
I found this:
puts "#{data.values}" # Print all Values
puts "#{data.keys}" # Print all Keys
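Note that interpolating data.values prints the array in its inspected form, with brackets and quotes. A minimal sketch, assuming the comma-separated layout from the question is still wanted, joins the values instead. The JSON is inlined here rather than read from data_elements.dat, and the rows array is only there to collect the output.

```ruby
require 'json'

# Inline JSON with the same shape as the question's data file.
json = '{"elements":[{"ECL001":{"description":"First Element","max_level":3,"size":10}},{"ECL002":{"description":"Second Element","max_level":4,"size":1}}]}'

rows = []
JSON.parse(json)["elements"].each do |element|
  element.each_value do |data|
    # values returns the fields in insertion order; join puts them on one row.
    rows << data.values.join(', ') if data['max_level'] > 3
  end
end

puts rows
```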

Ruby Iterating Through Array for Method Names

I have an object that I want to output a bunch of methods' results from. I have an array of method names and want to iterate through them:
img = Magick::Image::read('/Users/rich/Projects/imagemagick/orig/IMG_4677.jpg')[0]
atts = "background_color base_columns base_filename base_rows bias black_point_compensation".split(' ')
atts.each do |i|
puts img.i # problem evaluating i
end
I have tried string interpolation and eval but I can't get it to realize it's a method call.
Is there a way I can apply an array as method names to an object?
Try using public_send:
atts = "upcase downcase".split
atts.each do |i|
puts 'hEllO'.public_send(i)
end
#HELLO
#hello
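A sketch closer to the question's situation, using a Struct as a hypothetical stand-in for the Magick::Image object (the attribute values are invented). It also skips names the object does not respond to, which is one possible way to avoid a NoMethodError on a misspelled attribute.

```ruby
# Hypothetical stand-in for the image object in the question.
fake_image = Struct.new(:base_columns, :base_rows, :base_filename)
img = fake_image.new(640, 480, "IMG_4677.jpg")

atts = %w(base_columns base_rows base_filename not_a_method)

# Collect each attribute's value, skipping names the object does not respond to.
results = atts.select { |name| img.respond_to?(name) }
              .map { |name| [name, img.public_send(name)] }
              .to_h

results.each { |name, value| puts "#{name}: #{value}" }
```

public_send only calls public methods, which makes it a safer default than send when the method names come from data.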

How do I split on a regex and get my elements into an array?

I'm using Ruby 2.4. Is there any way I can split on a regex and get the resulting elements in an array? I thought this was the way
2.4.0 :003 > word = "4.ARTHUR"
=> "4.ARTHUR"
2.4.0 :004 > word.split(/^\d+\./)
=> ["", "ARTHUR"]
but as you see, the first element of my array is an empty string despite the fact that the pattern matches. I would like the output to be
["4.", "ARTHUR"]
Note that split splits the string wherever a match is found. Here ^\d+\. matches 4. at the beginning of 4.ARTHUR, so the result is an empty string (everything before the match) followed by ARTHUR. To keep the matched text in the result, wrap the whole pattern in a capturing group. To get rid of the empty items, remove them afterwards:
word.split(/^(\d+\.)/).reject { |x| x.empty? }
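The effect of the capturing group can be seen by comparing the intermediate results. This is only an illustrative sketch; the variable names are arbitrary.

```ruby
word = "4.ARTHUR"

# Without a capturing group the matched separator is discarded.
without_group = word.split(/^\d+\./)

# With a capturing group the matched text is kept in the result.
with_group = word.split(/^(\d+\.)/)

p without_group
p with_group
p with_group.reject(&:empty?)
```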
Why not just do
irb(main):004:0> word = "4.ARTHUR"
=> "4.ARTHUR"
irb(main):005:0> word.split('.')
=> ["4", "ARTHUR"]
When you want to split up a string but keep all of its parts, String#scan is often a better fit than String#split:
word = "4.ARTHUR"
word.scan(/^\d+\.|.+/)
# => ["4.", "ARTHUR"]
See it on repl.it: https://repl.it/F90q
Try this
a, b = word.match(/^(\d+\.)?(.*)/).captures
The split method is meant to be used with a separator, such as a comma. What you are doing is splitting the string into two parts without a separator, so just use capture groups.
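Because the first group is optional, the capture-group approach also handles words without a numeric prefix. A small sketch, where split_prefix is a hypothetical helper name introduced only for this example:

```ruby
# Hypothetical helper: returns [prefix, rest]; prefix is nil when
# the word has no leading "digits and dot" part.
def split_prefix(word)
  word.match(/^(\d+\.)?(.*)/).captures
end

p split_prefix("4.ARTHUR")
p split_prefix("ARTHUR")
```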

How can I remove a numerical extension from an array of filenames?

I want to remove the last 11 characters of strings inside of an array. The array is:
["cool.mp3?3829483927", "wow.mp3?3872947629", "woa.mp3?8392748308"]
I want to convert the strings to this:
["cool.mp3", "wow.mp3", "woa.mp3"]
Is there a method specifically for this in Ruby? I know of chop and chomp, but nothing that can access each string in an array.
TL;DR
There are lots of ways to transform your string, including #slice, #split, #sub, and #partition to name a few. What you're really missing is the Array#map method, which applies a method or block to each element of an array.
Partition Your Filenames
One way to modify your array elements is to map the String#partition method onto each element, which splits each filename into an array of three components: the part before the separator, the separator itself, and the part after it. Ordinarily, this would return an array of arrays where each partitioned string is a sub-array, but you can have the #map block return just the components you want. In this case, what you want is the first element of each partitioned array.
This may sound complicated, but it's actually very simple. For example:
files = ['cool.mp3?3829483927', 'wow.mp3?3872947629', 'woa.mp3?8392748308']
files.map { |filename| filename.partition('?').first }
#=> ["cool.mp3", "wow.mp3", "woa.mp3"]
A Minified Version
If you value compactness over readability, you can get the same result as the solution above with:
files = %w(cool.mp3?3829483927 wow.mp3?3872947629 woa.mp3?8392748308)
files.map { |f| f.partition(??)[0] }
#=> ["cool.mp3", "wow.mp3", "woa.mp3"]
If you want what comes before the first '?', do this:
arr.map{|thing| thing.split('?')[0]}
If it's always 11 chars you can do:
arr=["cool.mp3?3829483927","wow.mp3?3872947629", "woa.mp3?8392748308"]
new=arr.map{|thing| thing[0...-11]}
String#slice can take a regex as an argument, so you could extract the names that you want (rather than dropping the last 11 characters):
arr = ["cool.mp3?3829483927","wow.mp3?3872947629", "woa.mp3?8392748308"]
arr.map! {|x| x.slice(/\w+\.mp3/) }
#=> ["cool.mp3", "wow.mp3", "woa.mp3"]
If arr is your array of strings:
arr.map { |s| s[/[^?]+/] }
# => ["cool.mp3", "wow.mp3", "woa.mp3"]
The regular expression /[^?]+/ matches one or more characters that are not (^ at the beginning of the character class) question marks. This uses the methods Array#map and String#[].
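The answers above differ in what they assume about the input. A short sketch of that difference, using an invented third filename that has no numeric suffix:

```ruby
# The last entry is made up to show a name without the "?digits" suffix.
files = ["cool.mp3?3829483927", "wow.mp3?3872947629", "longer_name.mp3"]

# Fixed-width slicing assumes every name ends with exactly 11 extra characters.
by_length = files.map { |f| f[0...-11] }

# Cutting at the first '?' leaves names without a suffix untouched.
by_marker = files.map { |f| f.partition('?').first }

p by_length
p by_marker
```

If the suffix is not guaranteed to be present or to have a fixed length, cutting at the '?' is the more forgiving choice.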
