I'm trying to write a program in Ruby that allows one array to receive information from another array. Basically, I have a multidimensional array called "student_array" that contains information on a few students:
student_array = [
  ["Mike", 13, "American", "male"],
  ["Grace", 12, "Canadian", "female"],
  ["Joey", 13, "American", "male"],
  ["Lily", 13, "American", "female"]
]
I also initialized two other arrays that will count nationalities:
nationality_array = Array.new
nationality_count = Array.new
The purpose of this program is to loop through student_array, count the different nationalities of the students, and create a CSV file containing a header for each nationality and a count for each one.
Expected output.csv
American, Canadian
3, 1
Here is the code I have so far
student_array.each do |student|
  # pushes the nationality string into the nationality array
  nationality_array.push(student[2])
end
so the nationality_array should currently look like this:
nationality_array # => ["American", "Canadian", "American", "American"]
nationality_array.uniq # => ["American", "Canadian"]
So I will have two headers - "American" and "Canadian"
Now I need a way to loop through the student_array, count up each instance of "American" and "Canadian", and somehow assign it back to the nationality array. I'm having a hard time visualizing how to go about this. This is what I have so far--
american_count = 0
canadian_count = 0
student_array.each do |student|
  if student[2] == "American"
    american_count += 1
  elsif student[2] == "Canadian"
    canadian_count += 1
  end
end
nationality_count.push(american_count)
nationality_count.push(canadian_count)
Okay, now I have those counts in the nationality_count array, but how can I pass it to a CSV, making sure that they are assigned to the right headers? I have a feeling that my code is very awkward and could be much more streamlined as well.
It would probably look something like this?
CSV.open("output/redemptions.csv", "wb") do |csv|
  csv << nationality_array
  csv << nationality_count
end
Can anyone provide any insight into a cleaner way to go about this?
You could use a Hash to group the counts by nationality instead of different arrays.
nationalities_count = student_array.each_with_object(Hash.new(0)) do |student, hash|
  nationality = student[2]
  hash[nationality] += 1
end
That will give you a Hash that would look like
{ "American" => 3, "Canadian" => 1 }
You could then use Hash#to_a and Array#transpose like so:
2.4.2 :001 > hsh = { "American" => 3, "Canadian" => 1 }
 => {"American"=>3, "Canadian"=>1}
2.4.2 :002 > hsh.to_a
 => [["American", 3], ["Canadian", 1]]
2.4.2 :003 > hsh.to_a.transpose
 => [["American", "Canadian"], [3, 1]]
Finally, to output the CSV file, all you need to do is write the arrays into the file:
nationalities_with_count = hsh.to_a.transpose
CSV.open("output/redemptions.csv", "wb") do |csv|
  csv << nationalities_with_count[0]
  csv << nationalities_with_count[1]
end
Enumerable#group_by in Ruby core and Hash#transform_values (in Ruby core since 2.4, and in ActiveSupport before that) are two very versatile methods that can be used here:
require 'active_support/all'
require 'csv'

student_array = [
  ["Mike", 13, "American", "male"],
  ["Grace", 12, "Canadian", "female"],
  ["Joey", 13, "American", "male"],
  ["Lily", 13, "American", "female"]
]

counts = student_array.group_by { |attrs| attrs[2] }.transform_values(&:length)
# => {"American"=>3, "Canadian"=>1}

CSV.open("output/redemptions.csv", "wb") do |csv|
  csv << counts.keys
  csv << counts.values
end

puts File.read "output/redemptions.csv"
# => American,Canadian
#    3,1
.group_by { |attrs| attrs[2] } turns the array into a hash, where keys are the unique values for attrs[2], and values are a list of elements that have that attrs[2]. At this point you can use transform_values to turn those values into numbers representing their length (meaning, how many elements have that specific attrs[2]). The keys and values can then be extracted from the hash as separate arrays.
You don't even need a CSV library here:
result =
  student_array.
    map { |a| a[2] }.            # get nationalities
    group_by { |e| e }.          # hash of nationality => occurrences
    map { |n, c| [n, c.count] }. # map values to counts
    transpose.                   # put data in rows
    map { |row| row.join ',' }.  # join values in a row
    join($/)                     # join rows
#⇒ American,Canadian
#  3,1
Now you have a string that is valid CSV; just write it out to the file.
Related
So, I have an array of hashes, let's use this as an example
arr = [
  {:series=>"GOT", :rating=>"Good", :type=>"Fantasy"},
  {:series=>"BB", :rating=>"Great", :type=>"Crime"},
  {:series=>"E", :rating=>"Poor", :type=>"Drama"}
]
I'm trying to loop over this array so that I can compare each member with all following members.
E.g. Hash 1 compares with Hash 2 and Hash 3, Hash 2 compares with Hash 3
The actual comparison function I already have written:
output = (data[X].keys & data[Y].keys).select { |k| data[X][k] == data[Y][k] }
X would be the current array and Y is the next element we are comparing to.
EDIT
Here's what I've got so far
for i in 0..data.length
  for j in i..data.length
    # puts data[j + 1]
    output = (data[j].keys & data[j+1].keys).select { |k| data[j][k] == data[j+1][k] }
    puts data[j]
    puts data[j+1]
    puts output
  end
  puts "*****"
end
My desired output is to print the hash and the other hash we are comparing with, as well as what keys they share a value for.
For example, these two hashes:
{:series=>"GOT", :rating=>"Great", :type=>"Fantasy"}
{:series=>"BB", :rating=>"Great", :type=>"Crime"}
Should print this:
{:series=>"GOT", :rating=>"Great", :type=>"Fantasy"}
{:series=>"BB", :rating=>"Great", :type=>"Crime"}
rating
If the key is nil, it should not compare. I think that's why I'm also getting this error when running the above code:
Traceback (most recent call last):
4: from match_users.rb:18:in `<main>'
3: from match_users.rb:18:in `each'
2: from match_users.rb:19:in `block in <main>'
1: from match_users.rb:19:in `each'
match_users.rb:21:in `block (2 levels) in <main>': undefined method `keys' for nil:NilClass (NoMethodError)
Note that you're not using "i" inside your loop, which looks like a bug. Also, your index "j+1" is going off the end of the array, resulting in accessing the nil element. Actually, even "j" goes off the end of the array. Arrays are accessed from 0...length-1, whereas "0..data.length" will access the element at index data.length. I'm guessing you mean something more like:
for i in 0..data.length-2
  for j in i+1..data.length-1
    output = (data[i].keys & data[j].keys).select { |k| data[i][k] == data[j][k] }
  end
end
end
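As a sketch of an alternative that sidesteps the index arithmetic altogether, Array#combination yields each pair of elements exactly once, which is exactly the "each member against all following members" pattern you describe (shown here with the shared rating from your expected-output example):

```ruby
data = [
  {series: "GOT", rating: "Great", type: "Fantasy"},
  {series: "BB", rating: "Great", type: "Crime"},
  {series: "E", rating: "Poor", type: "Drama"}
]

# combination(2) yields [hash1, hash2], [hash1, hash3], [hash2, hash3], ...
# so there is no j+1 to run off the end of the array
shared_keys = data.combination(2).map do |a, b|
  (a.keys & b.keys).select { |k| a[k] == b[k] }
end
# => [[:rating], [], []]
```

Since no index ever goes out of bounds, the NoMethodError on nil disappears as well.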
Create an Iterable Collection
First of all, the data you posted isn't a valid Ruby array of hashes; it's just a sequential list of Hash objects, so you can't iterate over it. You need to wrap your Hash objects into something iterable first, such as an Array. Here's an example:
titles =
[{:series=>"GOT", :rating=>"Good", :type=>"Fantasy"},
{:series=>"BB", :rating=>"Great", :type=>"Crime"},
{:series=>"E", :rating=>"Poor", :type=>"Drama"}]
Relative Comparisons of Consecutive Elements
Now that you have an iterable collection, you can use Enumerable#each_cons (which is already mixed into Array in Ruby's core) in order to iterate over each sequential pair of Hash objects. For demonstration purposes, I've chosen to store the relative comparisons as part of an Array within each title. For example, using the Array of Hash objects stored in titles as above:
STAR_MAPPINGS = {'great' => 5, 'good' => 4, 'fair' => 2,
                 'poor' => 1, 'unwatchable' => 0}.freeze
COMPARISON_MAP = {
  -1 => 'is worse than',
  0 => 'is the same as',
  1 => 'is better than'
}.freeze

def compare_ratings_for title_1, title_2
  fmt = '_%s_ %s _%s_'
  series_1, series_2 = title_1[:series], title_2[:series]
  ratings_1, ratings_2 =
    STAR_MAPPINGS[title_1[:rating].downcase],
    STAR_MAPPINGS[title_2[:rating].downcase]
  comparison_str = COMPARISON_MAP[ratings_1 <=> ratings_2]
  format fmt, series_1, comparison_str, series_2
end
titles.each_cons(2).each do |h1, h2|
  # Array#| returns an ordered, deduplicated union of keys
  matching_keys = (h1.keys | h2.keys).flatten.uniq
  next if matching_keys.none?
  # perform whatever comparisons you want here; this example
  # compares ratings by assigning stars to each rating
  h1[:comparisons] =
    h1.fetch(:comparisons, []) << compare_ratings_for(h1, h2)
  h2[:comparisons] =
    h2.fetch(:comparisons, []) << compare_ratings_for(h2, h1)
end
titles
The titles variable now holds and returns the following data:
[{:series=>"GOT", :rating=>"Good", :type=>"Fantasy", :comparisons=>["_GOT_ is worse than _BB_"]},
{:series=>"BB", :rating=>"Great", :type=>"Crime", :comparisons=>["_BB_ is better than _GOT_", "_BB_ is better than _E_"]},
{:series=>"E", :rating=>"Poor", :type=>"Drama", :comparisons=>["_E_ is worse than _BB_"]}]
Here's the same data again, but this time titles was pretty-printed with amazing_print for improved readability:
[
{
:series => "GOT",
:rating => "Good",
:type => "Fantasy",
:comparisons => [
"_GOT_ is worse than _BB_"
]
},
{
:series => "BB",
:rating => "Great",
:type => "Crime",
:comparisons => [
"_BB_ is better than _GOT_",
"_BB_ is better than _E_"
]
},
{
:series => "E",
:rating => "Poor",
:type => "Drama",
:comparisons => [
"_E_ is worse than _BB_"
]
}
]
See Also
Array#|
Array#<=>
I have some data that I need to export as csv. It is currently about 10,000 records and will keep growing hence I want an efficient way to do the iteration especially with regards to running several each loop, one after the other.
My question is: is there a way to avoid the many each loops I describe below, and if not, is there something else I can use besides Ruby's each/map to keep processing time from growing with data size?
For instance:
First I will loop through the whole data to flatten and rename the fields that hold array values, so that a field like issue that holds an array becomes issue_1 and issue_2 if it contains two items.
Next I will do another loop to get all the unique keys in the array of hashes.
Using the unique keys from step 2, I will do another loop to sort these unique keys using a different array that holds the order the keys should be arranged in.
Finally another loop to generate the CSV
So I have iterated over the data 4 times using Ruby's each/map every time, and the time to complete these loops will increase with data size.
Original data is in the form below:
def data
[
{"file"=> ["getty_883231284_200013331818843182490_335833.jpg"], "id" => "60706a8e-882c-45d8-ad5d-ae898b98535f", "date_uploaded" => "2019-12-24", "date_modified" => "2019-12-24", "book_title_1"=>"", "title"=> ["haha"], "edition"=> [""], "issue" => ["nov"], "creator" => ["yes", "some"], "publisher"=> ["Library"], "place_of_publication" => "London, UK"},
{"file" => ["getty_883231284_200013331818843182490_335833.jpg"], "id" => "60706a8e-882c-45d8-ad5d-ae898b98535f", "date_uploaded" => "2019-12-24", "date_modified"=>"2019-12-24", "book_title"=> [""], "title" => ["try"], "edition"=> [""], "issue"=> ["dec", 'ten'], "creator"=> ["tako", "bell", 'big mac'], "publisher"=> ["Library"], "place_of_publication" => "NY, USA"}]
end
Remapped data by flattening arrays and renaming the keys holding those arrays:
def csv_data
@csv_data = [
{"file_1"=>"getty_883231284_200013331818843182490_335833.jpg", "id"=>"60706a8e-882c-45d8-ad5d-ae898b98535f", "date_uploaded"=>"2019-12-24", "date_modified"=>"2019-12-24", "book_title_1"=>"", "title_1"=>"haha", "edition_1"=>"", "issue_1"=>"nov", "creator_1"=>"yes", "creator_2"=>"some", "publisher_1"=>"Library", "place_of_publication_1"=>"London, UK"},
{"file_1"=>"getty_883231284_200013331818843182490_335833.jpg", "id"=>"60706a8e-882c-45d8-ad5d-ae898b98535f", "date_uploaded"=>"2019-12-24", "date_modified"=>"2019-12-24", "book_title_1"=>"", "title_1"=>"try", "edition_1"=>"", "issue_1"=>"dec", "issue_2" => 'ten', "creator_1"=>"tako", "creator_2"=>"bell", 'creator_3' => 'big mac', "publisher_1"=>"Library", "place_of_publication_1"=>"NY, USA"}]
end
Sorting the headers for the above data:
def csv_header
  csv_order = ["id", "edition_1", "date_uploaded", "creator_1", "creator_2", "creator_3", "book_title_1", "publisher_1", "file_1", "place_of_publication_1", "journal_title_1", "issue_1", "issue_2", "date_modified"]
  headers_object = []
  sorted_header = []
  all_keys = csv_data.lazy.flat_map(&:keys).force.uniq.compact
  # resort using ordering by suffix eg creator_isni_1 comes before creator_isni_2
  all_keys = all_keys.sort_by { |name| [name[/\d+/].to_i, name] }
  csv_order.each { |k| all_keys.select { |e| sorted_header << e if e.start_with? k } }
  sorted_header.uniq
end
Then generate the CSV, which also involves more loops:
def to_csv
  data = csv_data
  sorted_header = csv_header
  CSV.generate(headers: true) do |csv|
    csv << sorted_header
    data.lazy.each do |hash|
      csv << hash.values_at(*sorted_header)
    end
  end
end
To be honest, I was more intrigued to see whether I could work out your desired logic without further description than by the programming part alone (though I enjoyed that as well; it has been ages since I wrote Ruby, so this was a good refresher). Since the mission is not clearly stated, it had to be "distilled" from your description, input data, and code.
I think what you should do is to keep everything in very basic and lightweight arrays and do the heavy lifting while reading the data in one single big step.
I also made the assumption that if a key ends with a number, or if a value is an array, you want it to be returned as {key}_{n}, even if there's only one value present.
So far I came up with this code (logic described in comments):
class CustomData
  # @keys array structure
  # 0: Key
  # 1: Maximum amount of values associated
  # 2: Is an array (found a {key}_n key in feed,
  #    or value in feed was an array)
  #
  # @data is a simple array of arrays
  attr_accessor :keys, :data

  CSV_ORDER = %w[
    id edition date_uploaded creator book_title publisher
    file place_of_publication journal_title issue date_modified
  ]

  def initialize(feed)
    @keys = CSV_ORDER.map { |key| [key, 0, false] }
    @data = []
    feed.each do |row|
      new_row = []
      # Sort keys in order to maintain the right order for {key}_{n} values
      row.sort_by { |key, _| key }.each do |key, value|
        is_array = false
        if key =~ /_\d+$/
          # If the key ends with a number, extract the base key
          # and remember it is an array for the output
          key, is_array = key[/^(.*)_\d+$/, 1], true
        end
        if value.is_a? Array
          # If the value is an array, even if the key did not end with a number,
          # we remember that for the output
          is_array = true
        else
          value = [value]
        end
        # Find position of key if it exists, or nil
        key_index = @keys.index { |a| a.first == key }
        if key_index
          # If you could have a combination of _n keys and array values
          # for a key in your feed, you need to change this portion here
          # to account for all previous values, which would add some complexity
          #
          # If the current amount of values is greater than the saved one, override
          @keys[key_index][1] = value.length if @keys[key_index][1] < value.length
          @keys[key_index][2] = true if is_array and not @keys[key_index][2]
        else
          # It is a new key in the @keys array
          key_index = @keys.length
          @keys << [key, value.length, is_array]
        end
        # Add value array at known key index
        # (will be padded with nil if idx is greater than array size)
        new_row[key_index] = value
      end
      @data << new_row
    end
  end

  def to_csv_data(headers = true)
    result, header, body = [], [], []
    if headers
      @keys.each do |key|
        if key[2]
          # If the key should hold multiple values, build the header string
          key[1].times { |i| header << "#{key[0]}_#{i + 1}" }
        else
          # Otherwise it is a singular value and the header goes unmodified
          header << key[0]
        end
      end
      result << header
    end
    @data.each do |row|
      new_row = []
      row.each_with_index do |value, index|
        # Use the value counter from @keys to pad with nil values,
        # if a value is not present
        @keys[index][1].times do |count|
          new_row << value[count]
        end
      end
      body << new_row
    end
    result << body
  end
end
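The per-row flattening step the question describes can also be done in a single pass with each_with_object; here is a minimal sketch (the helper name flatten_row is mine, not from the original code):

```ruby
# Expand array-valued fields into key_1, key_2, ... columns in one pass per row
def flatten_row(row)
  row.each_with_object({}) do |(key, value), flat|
    if value.is_a?(Array)
      # each array element becomes its own numbered column
      value.each_with_index { |v, i| flat["#{key}_#{i + 1}"] = v }
    else
      flat[key] = value
    end
  end
end

flat = flatten_row("id" => "1", "creator" => ["tako", "bell"], "issue" => ["dec"])
# => {"id"=>"1", "creator_1"=>"tako", "creator_2"=>"bell", "issue_1"=>"dec"}
```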
I have two arrays of hashes which are related by a common set of keys:
Array 1 is:
[
{0=>"pmet-add-install-module-timings.patch"},
{1=>"pmet-change-sample-data-load-order.patch"},
{2=>"pmet-configurable-recurring.patch"},
{3=>"pmet-consumers-run-staggered-by-sleep.patch"},
{4=>"pmet-dynamic-block-segment-display.patch"},
{5=>"pmet-fix-admin-label-word-breaking.patch"},
{6=>"pmet-fix-invalid-module-dependencies.patch"},
{7=>"pmet-fix-invalid-sample-data-module-dependencies.patch"},
{8=>"pmet-fix-module-loader-algorithm.patch"},
{9=>"pmet-fix-sample-data-code-generator.patch"},
{10=>"pmet-remove-id-requirement-from-layout-update-file.patch"},
{11=>"pmet-specify-store-id-for-order.patch"},
{12=>"pmet-staging-preview-js-fix.patch"},
{13=>"pmet-stop-catching-sample-data-errrors-during-install.patch"},
{14=>"pmet-visitor-segment.patch"}
]
Array 2 is:
[
{0=>"magento2-base"},
{1=>"magento/module-sample-data"},
{2=>"magento/module-configurable-sample-data"},
{3=>"magento/module-message-queue"},
{4=>"magento/module-banner"},
{5=>"magento/theme-adminhtml-backend"},
{6=>"magento/module-staging"},
{7=>"magento/module-gift-registry-sample-data"},
{8=>"magento2-base"},
{9=>"magento/module-downloadable-sample-data"},
{10=>"magento/module-catalog"},
{11=>"magento/module-sales-sample-data"},
{12=>"magento/module-staging"},
{13=>"magento2-base"},
{14=>"magento/module-customer"}
]
The hashes in these arrays have the same set of indexes, and the second array has duplicate values in keys 0, 8, and 13 as well as in 6 and 12.
My goal is to stitch the values from these two data sets together into a set of nested hashes. Wherever there is a duplicated value in Array 2, I need to collect its associated values from Array 1 and include them in a nested hash.
For example, take the magento2-base values from Array 2 and the key-associated values from Array 1. The hash structure in Ruby would look like:
hash = {
"magento2-base" => [
{0 => "m2-hotfixes/pmet-add-install-module-timings.patch"},
{8 => "m2-hotfixes/pmet-fix-module-loader-algorithm.patch"},
{13 => "m2-hotfixes/pmet-stop-catching-sample-data-errrors-during-install.patch"}
]
}
The same would hold true for any other duplicated values from Array 2, so, for example, magento/module-staging would be:
hash = {
"magento/module-staging" => [
{6 => "pmet-fix-invalid-module-dependencies.patch"},
{12 => "pmet-staging-preview-js-fix.patch"}
]
}
A larger excerpt of the resultant hash which combines these needs together would look like this:
hash = {
"magento2-base" =>
[
{0 => "m2-hotfixes/pmet-add-install-module-timings.patch"},
{8 => "m2-hotfixes/pmet-fix-module-loader-algorithm.patch"},
{13 => "m2-hotfixes/pmet-stop-catching-sample-data-errrors-during-install.patch"}
],
"magento/module-sample-data" =>
  {1 => "pmet-change-sample-data-load-order.patch"},
"magento/module-configurable-sample-data" =>
  {2 => "pmet-configurable-recurring.patch"},
"magento/module-message-queue" =>
  {3 => "pmet-consumers-run-staggered-by-sleep.patch"},
"magento/module-staging" =>
[
{6 => "pmet-fix-invalid-module-dependencies.patch"},
{12 => "pmet-staging-preview-js-fix.patch"}
],
...
}
I used a nested loop which combines both arrays to link up the keys, and attempted to pull out the duplicates from Array 2, and was thinking I'd need to maintain both an array of the duplicate values from array 2 as well as an array of their associated values from Array 1. Then, I'd use some array merging magic to put it all back together.
Here's what I have:
found_modules_array = []
duplicate_modules_array = []
duplicate_module_hash = {}
file_collection_array = []
modules_array.each do |module_hash|
  module_hash.each do |module_hash_key, module_hash_value|
    files_array.each do |file_hash|
      file_hash.each do |file_hash_key, file_hash_value|
        if module_hash_key == file_hash_key
          if found_modules_array.include?(module_hash_value)
            duplicate_module_hash = {
              module_hash_key => module_hash_value
            }
            duplicate_modules_array << duplicate_module_hash
          end
          found_modules_array << module_hash_value
        end
      end
    end
  end
end
In this code, files_array is Array 1 and modules_array is Array 2. found_modules_array is a bucket to hold any duplicates before pushing them into a duplicate_module_hash which would then be pushed into the duplicates_modules_array.
This solution:
Doesn't work
Doesn't take advantage of the power of Ruby
Isn't performant
EDIT
The path to the above data structure is explained in full detail in the following post: Using array values as hash keys to create nested hashes in Ruby
I'll summarize it below:
I have a directory of files. The majority of them are .patch files, although some of them are not. For each patch file, I need to scan the first line which is always a string and extract a portion of that line. With a combination of each file's name, that portion of each first line, and a unique identifier for each file, I need to create a hash which I will then convert to json and write to a file.
Here are examples:
Directory of Files:
|__ .gitkeep
|__ pmet-add-install-module-timings.patch
|__ pmet-change-sample-data-load-order.patch
First Line Examples:
File Name: `pmet-add-install-module-timings.patch`
First Line: `diff --git a/setup/src/Magento/Setup/Model/Installer.php b/setup/src/Magento/Setup/Model/Installer.php`
File Name: `pmet-change-sample-data-load-order.patch`
First Line: `diff --git a/vendor/magento/module-sample-data/etc/module.xml b/vendor/magento/module-sample-data/etc/module.xml`
File Name: `pmet-stop-catching-sample-data-errrors-during-install.patch`
First Line: `diff --git a/vendor/magento/framework/Setup/SampleData/Executor.php b/vendor/magento/framework/Setup/SampleData/Executor.php`
File Name: `pmet-fix-admin-label-word-breaking.patch`
First Line: `diff --git a/vendor/magento/theme-adminhtml-backend/web/css/styles-old.less b/vendor/magento/theme-adminhtml-backend/web/css/styles-old.less`
Example Json File:
{
"patches": {
"magento/magento2-base": {
"Patch 1": "m2-hotfixes/pmet-add-install-module-timings.patch"
},
"magento/module-sample-data": {
"Patch 2": "m2-hotfixes/pmet-change-sample-data-load-order.patch"
},
"magento/theme-adminhtml-backend": {
"Patch 3": "m2-hotfixes/pmet-fix-admin-label-word-breaking.patch"
},
"magento/framework": {
"Patch 4": "m2-hotfixes/pmet-stop-catching-sample-data-errrors-during-install.patch"
}
}
}
The problem I encountered is that while json allows for duplicate keys, ruby hashes don't, so items were removed from the json file because they were removed from the hash. To solve this, I assumed I needed to create the array structure I specified so as to keep the IDs as the consistent identifier between the files scraped and the corresponding data belonging to them so that I could put the data together in a different arrangement. Now I realize this isn't the case, so I have switched the approach to use the following:
files.each_with_index do |file, key|
  value = File.open(file, &:readline).split('/')[3]
  if value.match(/module-/) || value.match(/theme-/)
    result = "magento/#{value}"
  else
    result = "magento2-base"
  end
  file_array << file
  module_array << result
end
This yields the flat hashes that have been suggested below.
So first of all, the structure
arr1 = [
{0=>"pmet-add-install-module-timings.patch"},
{1=>"pmet-change-sample-data-load-order.patch"},
{2=>"pmet-configurable-recurring.patch"},
{3=>"pmet-consumers-run-staggered-by-sleep.patch"},
# etc
]
is a little odd. It's easier to work with as a flat hash, e.g.
h1 = {
0 => "pmet-add-install-module-timings.patch",
1 => "pmet-change-sample-data-load-order.patch",
2 => "pmet-configurable-recurring.patch",
3 =>"pmet-consumers-run-staggered-by-sleep.patch",
# etc
}
Fortunately it's quite easy to transform between the two:
h1 = arr1.reduce(&:merge)
h2 = arr2.reduce(&:merge)
From this point, Enumerable methods (in this case, the ever-useful map, group_by, and transform_values) will take you the rest of the way:
indexed_by_val = h2.
group_by { |k,v| v }.
transform_values { |vals| vals.map(&:first) }
Which gives you a map of val to indexes:
{
"magento2-base"=>[0, 8, 13],
"magento/module-sample-data"=>[1],
"magento/module-configurable-sample-data"=>[2],
# etc
}
and then we can replace those lists of indexes with the corresponding values in h1:
result = indexed_by_val.transform_values do |indexes|
  indexes.map do |idx|
    { idx => h1[idx] }
  end
end
which produces your desired data structure:
{
"magento2-base"=>[
{0=>"pmet-add-install-module-timings.patch"},
{8=>"pmet-fix-module-loader-algorithm.patch"},
{13=>"pmet-stop-catching-sample-data-errrors-during-install.patch"}
],
"magento/module-sample-data"=>[
{1=>"pmet-change-sample-data-load-order.patch"}
],
"magento/module-configurable-sample-data"=>[
{2=>"pmet-configurable-recurring.patch"}
],
# etc
}
I did notice that in the expected output you specified, the values are sometimes hashes and sometimes arrays. I would recommend against this practice; it's much better to have a uniform type for all of a hash's values. But if you really do want this for whatever reason, it's not too difficult:
# I am not advising this approach
result2 = result.transform_values do |arr|
  arr.length > 1 ? arr : arr[0]
end
By the way, I know this kind of functional programming / enumerable-chaining code can be a bit hard to decipher, so I would recommend running it line by line to build your understanding.
Assuming you're using the unified data structure I mentioned above, I would recommend calling .transform_values { |vals| vals.reduce(&:merge) } on your final result so that the values are single hashes instead of multiple hashes:
{
"magento2-base"=>{
0=>"pmet-add-install-module-timings.patch",
8=>"pmet-fix-module-loader-algorithm.patch",
13=>"pmet-stop-catching-sample-data-errrors-during-install.patch"
},
"magento/module-sample-data"=>{
  1=>"pmet-change-sample-data-load-order.patch"
},
"magento/module-configurable-sample-data"=>{
2=>"pmet-configurable-recurring.patch"
},
# etc
}
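Putting the merge, group_by, and transform_values steps together on a trimmed-down version of the data (the short patch and module names here are stand-ins for the real ones):

```ruby
arr1 = [{0 => "a.patch"}, {1 => "b.patch"}, {2 => "c.patch"}]
arr2 = [{0 => "base"},    {1 => "sample"},  {2 => "base"}]

# Collapse the arrays of single-pair hashes into flat hashes
h1 = arr1.reduce(&:merge)
h2 = arr2.reduce(&:merge)

# Group indexes by module name, then look up each index's patch in h1
result = h2.group_by { |_idx, mod| mod }
           .transform_values { |pairs| pairs.map { |idx, _mod| {idx => h1[idx]} }.reduce(&:merge) }
# => {"base"=>{0=>"a.patch", 2=>"c.patch"}, "sample"=>{1=>"b.patch"}}
```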
Let arr1 and arr2 be your two arrays. Because they are the same size and, for each index i, arr1[i][i] and arr2[i][i] are the values of the key i in the hashes arr1[i] and arr2[i], the desired result can be obtained quite easily:
arr2.each_with_index.with_object({}) do |(g,i),h|
(h[g[i]] ||= []) << arr1[i][i]
end
#=> {
# "magento2-base"=>[
# "pmet-add-install-module-timings.patch",
# "pmet-fix-module-loader-algorithm.patch",
# "pmet-stop-catching-sample-data-errrors-during-install.patch"
# ],
# "magento/module-sample-data"=>[
# "pmet-change-sample-data-load-order.patch"
# ],
# ...
# "magento/module-staging"=>[
# "pmet-fix-invalid-module-dependencies.patch",
# "pmet-staging-preview-js-fix.patch"
# ],
# "magento/module-customer"=>[
# "pmet-visitor-segment.patch"
# ]
# }
The fragment
h[g[i]] ||= []
is effectively expanded to
h[g[i]] = h[g[i]] || [] # *
If the hash h has no key g[i],
h[g[i]] #=> nil
so * becomes
h[g[i]] = nil || [] #=> []
after which
h[g[i]] << "cat"
#=> ["cat"]
(which works with "dog" as well). The above expression can instead be written:
arr2.each_with_index.with_object(Hash.new {|h,k| h[k]=[]}) do |(g,i),h|
h[g[i]] << arr1[i][i]
end
This uses the form of Hash::new that employs a block (here {|h,k| h[k]=[]}) that is called when the hash is accessed by a value that is not one of its keys.
An alternative method is:
arr2.each_with_index.with_object({}) do |(g,i),h|
h.update(g[i]=>[arr1[i][i]]) { |_,o,n| o+n }
end
This uses the form of Hash#update (aka merge!) that employs a block to determine the values of keys that are in both hashes being merged.
A third way is to use Enumerable#group_by:
arr2.each_with_index.group_by { |h,i| arr2[i][i] }.
transform_values { |a| a.map { |_,i| arr1[i][i] } }
I am looking for a way to select only duplicate entries from multiple Arrays of Hashes.
Say I have a project with an attribute called "exchange_rate":
project.exchange_rate #=>
[{"name"=>"USD", "rate"=>1.0},
{"name"=>"EUR", "rate"=>0.91},
{"name"=>"CNY", "rate"=>6.51},
{"name"=>"NOK", "rate"=>1},
{"name"=>"DKK", "rate"=>1},
{"name"=>"JPY", "rate"=>113.24}]
Now I have multiple projects with the same construct, just with a few more or fewer entries in the Array. The "rate" within the Hash isn't important at all. I just need to iterate over all projects and their exchange_rates and find those entries that are present in every one of the Arrays.
So to speak, if I had the following project_2:
project_2.exchange_rate #=>
[{"name"=>"USD", "rate"=>1.0},
{"name"=>"GBP", "rate"=>0.7},
{"name"=>"SGD", "rate"=>1.38},
{"name"=>"HKD", "rate"=>7.76},
{"name"=>"CNY", "rate"=>0.94},
{"name"=>"DE", "rate"=>0.86},
{"name"=>"JPY", "rate"=>113.24}]
After comparing these two entries, I'd like to end up with an Array that looks like so:
# => ["USD", "CNY", "JPY"]
Because these three names are in both of the projects. This should, of course, be dynamic and work with whatever number of projects and exchange_rates.
I can't seem to find a way of doing this.
I tried the following already:
er = projects.map { |e| e[:exchange_rate] }.inject(:+)
founds = er.find_all { |x| er.count(x) > 1 }.uniq
But it comes up with a huge Array that includes all kind of values, not just duplicates.
TL;DR:
I need to iterate over all projects and their exchange_rates
I need to find all duplicated entries of these
I need to end up with just the "name" value of these
I have an unknown amount of projects as well as exchange_rates bound to each project
Thank you very much in advance!
I figured this isn't exactly what I need, so I changed my mind and did it differently.
Still, the question might be viable for others to get answered. If you have an answer, go ahead and post it :)
My (completely off-topic) result:
names = projects.map{|p| p[:exchange_rates].map{|er| er["name"] } }
final = names.flatten.uniq
# from => [["USD", "EUR", "GBR"], [], ["MYR", "GBR"], ["USD"], ...]
# to ["USD", "EUR", "GBR", "MYR"]
You can simply use project_1.exchange_rate & project_2.exchange_rate, which gives you [{"name"=>"USD", "rate"=>1.0}, {"name"=>"JPY", "rate"=>113.24}], i.e. the common entries from both arrays whose key and value match in both.
But if you're looking to find only the common elements in terms of the "name" keys of the hashes in the two arrays, you can try something like this:
project_1.exchange_rate.map {|e| e["name"]} &
project_2.exchange_rate.map {|e| e["name"]}
#=> ["USD", "CNY", "JPY"]
If you have multiple arrays like you said, try something like this:
def get_duplicate_keys(*rates)
  all_rates = rates.inject([]) { |s, e| s + e }
  temp = all_rates.group_by { |e| e["name"] }
  temp.select { |k, v| v.count > 1 }.keys
end
r1 = [{"name"=>"USD", "rate"=>1.0},
{"name"=>"EUR", "rate"=>0.91},
{"name"=>"CNY", "rate"=>6.51},
{"name"=>"NOK", "rate"=>1},
{"name"=>"DKK", "rate"=>1},
{"name"=>"JPY", "rate"=>113.24}]
r2 = [{"name"=>"USD", "rate"=>1.0},
{"name"=>"GBP", "rate"=>0.7},
{"name"=>"SGD", "rate"=>1.38},
{"name"=>"HKD", "rate"=>7.76},
{"name"=>"CNY", "rate"=>0.94},
{"name"=>"DE", "rate"=>0.86},
{"name"=>"JPY", "rate"=>113.24}]
r3 = [{"name"=>"GBP", "rate"=>0.7},
{"name"=>"SGD", "rate"=>1.38}]
p get_duplicate_keys(r1, r2, r3)
#=> ["USD", "CNY", "JPY", "GBP", "SGD"]
You can try this solution:
duplicates = project.exchange_rate & project_2.exchange_rate
and then
duplicates.map{|er| er["name"]}
This returns:
=> ["USD", "CNY", "JPY"]
Or you can try the solution below.
First, build an array of names for each project:
proj1_names = []
project.exchange_rate.each { |er| proj1_names << er["name"] }
proj2_names = []
project_2.exchange_rate.each { |er| proj2_names << er["name"] }
This gives:
proj1_names = ["USD","EUR","CNY","NOK","DKK","JPY"]
proj2_names = ["USD","GBP","SGD","HKD","CNY","DE","JPY"]
Then:
proj1_names.select{|name| proj2_names.include?(name)}
This returns the duplicate names:
=> ["USD", "CNY", "JPY"]
Hope this helps.
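For an arbitrary number of projects, intersecting the per-project name arrays generalizes the two-project & approach and matches the "present in every array" requirement; a sketch with trimmed data:

```ruby
projects = [
  [{"name" => "USD", "rate" => 1.0},  {"name" => "EUR", "rate" => 0.91},
   {"name" => "CNY", "rate" => 6.51}, {"name" => "JPY", "rate" => 113.24}],
  [{"name" => "USD", "rate" => 1.0},  {"name" => "GBP", "rate" => 0.7},
   {"name" => "CNY", "rate" => 0.94}, {"name" => "JPY", "rate" => 113.24}]
]

# Map each project to its list of names, then fold with Array#& so only
# names present in every project survive
common = projects.map { |rates| rates.map { |er| er["name"] } }.reduce(:&)
# => ["USD", "CNY", "JPY"]
```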
I have a CSV file with contents:
John,1,2,4,67,100,41,234
Maria,45,23,67,68,300,250
I need to read this content and separate these data into two sections:
1.a Legend1 = John
1.b Legend2 = Maria
2.a Data_array1 = [1,2,4,67,100,41,234]
2.b Data_array2 = [45,23,67,68,300,250]
Here is my code; it reads the file and splits the contents on ','.
testsample = CSV.read('samples/linechart.csv')
CSV.foreach('samples/linechart.csv') do |row|
  puts row
end
Its output is an array of row arrays. I am stuck on how to proceed further.
I would recommend not using CSV.read for this; the format is too simple for that. Instead, use File.readlines and treat each line as a big string.
eg:
# this turns the file into an array of lines
# eg you now have: ["John,1,2,4,67,100,41,234", "Maria,45,23,67,68,300,250"]
lines = File.readlines('samples/linechart.csv')
# if you want to do this for each line, just iterate over this array:
lines.each do |line|
  # now split each line by the commas to turn it into an array of strings
  # (strip first so the trailing newline doesn't end up in the last value)
  # eg you have: ["John","1","2","4","67","100","41","234"]
  values = line.strip.split(',')
  # now, grab the first one as the name and the rest of them as an array of strings
  legend = values[0]          # "John"
  data_array = values[1..-1]  # ["1","2","4","67","100","41","234"]
  # now do what you need to do with the name/numbers eg
  puts "#{legend}: [#{data_array.join(',')}]"
  # if you want the second array to be actual numbers instead of strings, you can
  # convert them with to_i (or to_f if you want floats instead of integers)
  # the following says "take each value, call to_i on it and return the new values"
  data_array = data_array.map(&:to_i)
end # end of iterating over the array
First, get the data out of the CSV:
require 'csv'
csv_text = File.read('/tmp/a.csv')
csv = CSV.parse(csv_text)
# => [["John", "1", "2", "4", "67", "100", "41", "234"], ["Maria", "45", "23", "67", "68", "300", "250"]]
Now you can format output as per your requirements. Eg:
csv.each.with_index(1) { |a, i|
  puts "Legend#{i} = #{a[0]}"
}
# Legend1 = John
# Legend2 = Maria
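Once CSV.parse has produced the row arrays, splitting each into a legend and a numeric data array is a single map; a minimal sketch:

```ruby
require 'csv'

rows = CSV.parse("John,1,2,4,67,100,41,234\nMaria,45,23,67,68,300,250\n")

# first field is the legend; the rest are converted to integers
parsed = rows.map { |row| [row.first, row.drop(1).map(&:to_i)] }
# => [["John", [1, 2, 4, 67, 100, 41, 234]], ["Maria", [45, 23, 67, 68, 300, 250]]]
```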
You may be looking for this:
csv = CSV.new(File.read('samples/linechart.csv'))
csv.to_a
You can have a look at http://technicalpickles.com/posts/parsing-csv-with-ruby/
Over-engineered version ;)
class Lines
  class Line
    attr_reader :legend, :array

    def initialize(line)
      @line = line
      parse
    end

    private

    def parse
      @legend, *array = @line.strip.split(",")
      @array = array.map(&:to_i)
    end
  end

  def self.parse(file_name)
    File.readlines(file_name).map do |line|
      Line.new(line)
    end
  end
end
Lines.parse("file_name.csv").each do |o|
  p o.legend
  p o.array
  puts
end
# Result:
#
# "John"
# [1, 2, 4, 67, 100, 41, 234]
#
# "Maria"
# [45, 23, 67, 68, 300, 250]
Notes:
Basically, Lines.parse("file_name.csv") will give you an array of objects that respond to the methods legend and array, which hold the name and the array of numbers respectively.
Jokes aside, I think OO will help maintainability.