Find duplicated values in an Array of Hashes - arrays

I am looking for a way to select only duplicate entries from multiple Arrays of Hashes.
Say I have a project with an attribute called "exchange_rate":
project.exchange_rate #=>
[{"name"=>"USD", "rate"=>1.0},
{"name"=>"EUR", "rate"=>0.91},
{"name"=>"CNY", "rate"=>6.51},
{"name"=>"NOK", "rate"=>1},
{"name"=>"DKK", "rate"=>1},
{"name"=>"JPY", "rate"=>113.24}]
Now I have multiple projects which have the same construct, just with a little more/less entries in the Array. The "rate" within the Hash isn't important at all. I just need to iterate over all projects and their exchange_rates and find those entries that are in each and every of the Arrays.
So to speak, if I had the following project_2:
project_2.exchange_rate #=>
[{"name"=>"USD", "rate"=>1.0},
{"name"=>"GBP", "rate"=>0.7},
{"name"=>"SGD", "rate"=>1.38},
{"name"=>"HKD", "rate"=>7.76},
{"name"=>"CNY", "rate"=>0.94},
{"name"=>"DE", "rate"=>0.86},
{"name"=>"JPY", "rate"=>113.24}]
After comparing these two entries, I'd like to end up with an Array that looks like so:
# => ["USD", "CNY", "JPY"]
Because these three names are in both of the projects. This should, of course, be dynamic and work with whatever number of projects and exchange_rates.
I can't seem to find a way of doing this.
I tried the following already:
er = projects.map { |e| e[:exchange_rate] }.inject(:+)
founds = er.find_all { |x| er.count(x) > 1 }.uniq
But it comes up with a huge Array that includes all kind of values, not just duplicates.
TL;DR:
I need to iterate over all projects and their exchange_rates
I need to find all duplicated entries of these
I need to end up with just the "name" value of these
I have an unknown amount of projects as well as exchange_rates bound to each project
Thank you very much in advance!
I figured this isn't exactly what I need, so I changed my mind and did it differently.
Still, the question might be viable for others to get answered. If you have an answer, go ahead and post it :)
My (completely off-topic) result:
names = projects.map{|p| p[:exchange_rates].map{|er| er["name"] } }
final = names.flatten.uniq
# from => [["USD", "EUR", "GBR"], [], ["MYR", "GBR"], ["USD"], ...]
# to ["USD", "EUR", "GBR", "MYR"]

you can simply use project_1.exchange_rate & project_2.exchange_rate
, which gives you [{"name"=>"USD", "rate"=>1.0}, {"name"=>"JPY", "rate"=>113.24}], i.e common entries from both the arrays whose key and value match in both arrays.
But if you're looking for finding only the common elements in terms of keys of the hashes in the two arrays, you can try something like this
project_1.exchange_rate.map {|e| e["name"]} &
project_2.exchange_rate.map {|e| e["name"]}
#=> ["USD", "CNY", "JPY"]
If you have multiple arrays like you said, try something like this:
def get_duplicate_keys(*rates)
all_rates = rates.inject([]) { |s, e| s + e }
temp = all_rates.group_by { |e| e["name"] }
temp.select { |k,v| v.count > 1 }.keys
end
r1 = [{"name"=>"USD", "rate"=>1.0},
{"name"=>"EUR", "rate"=>0.91},
{"name"=>"CNY", "rate"=>6.51},
{"name"=>"NOK", "rate"=>1},
{"name"=>"DKK", "rate"=>1},
{"name"=>"JPY", "rate"=>113.24}]
r2 = [{"name"=>"USD", "rate"=>1.0},
{"name"=>"GBP", "rate"=>0.7},
{"name"=>"SGD", "rate"=>1.38},
{"name"=>"HKD", "rate"=>7.76},
{"name"=>"CNY", "rate"=>0.94},
{"name"=>"DE", "rate"=>0.86},
{"name"=>"JPY", "rate"=>113.24}]
r3 = [{"name"=>"GBP", "rate"=>0.7},
{"name"=>"SGD", "rate"=>1.38}]
p get_duplicate_keys(r1 + r2 + r3)
#=> ["USD", "CNY", "JPY", "GBP", "SGD"]

You can try this solution,
duplicates = project.exchange_rate & project_2.exchange_rate
and then
duplicates.map{|er| er["name"]}
This returns result
=> ["USD", "CNY", "JPY"]
OR You can try below solution.....
Firstly you find array of of names for both projects
proj1_names = []
project.exchange_rates.each{ |er| proj1_names << er["name"] }
proj2_names = []
project_2.exchange_rates.each{ |er| proj2_names << er["name"]}
this gives result like
proj1_names = ["USD","EUR","CNY","NOK","DKK","JPY"]
proj2_names = ["USD","GBP","SGD","HKD","CNY","DE","JPY"]
and then try below method
proj1_names.select{|name| proj2_names.include?(name)}
this returns duplicate names as result
i.e => ["USD", "CNY", "JPY"]
May this helps you..

Related

In Ruby, during iteration of an array, how do I send multiple array elements as values for one specific hash key?

I know how to iterate through an array and I know how to create hash keys and values. I do not know how to create an array for the value and pass it multiple elements. My desired value for hash below is:
{'abc' => [1, 2, 3] , 'def' => [4,5,6,7]}
How would I achieve this hash, while iterating through array a and b below using each?
a = [1,2,3,4,5,6,7]
c = [1,2,3]
b = ['abc', 'def']
hash = {}
From your guidelines given in the comment:
While iterating through array a, if the element of iteration is included in array c, it is passed to the array value within key 'abc'. Otherwise, it is passed to other array value in key 'def'
You can do this:
hash = {}
hash['abc'] = a.select { |x| c.include?(x) }
hash['def'] = a.reject{ |x| c.include?(x) }
See Enumerable#select and Enumerable#reject. Also can take a look at Enumerable#partition which would be another good choice here, where you want to split an array into two arrays based on some condition:
in_a, not_in_a = a.partition { |x| c.include?(x) }
hash = { 'abc' => in_a, 'def' => not_in_a }
You can also do it with regular each if these fancy enumerable methods are bit too much for you:
hash = { 'abc' => [], 'def' => [] }
a.each do |x|
if c.include?(x)
hash['abc'].push(x)
else
hash['def'].push(x)
end
end
Unfortunately this question turned out not to be as interesting as I was hoping. I was hoping that the problem was this:
Knowing the hash key and a value, how can I make sure the key's value is an array and that the given value is appended to that?
For instance, start with h being {}. I have a key name :k and a value 1. I want h[:k], if it doesn't already exist, to be [1]. But if it does already exist, then it's an array and I want to append 1 to that array; for instance, if h[:k] is already [3,2], now it should be [3,2,1].
I can think of various ways to ensure that, but one possibility is this:
(hash[key] ||= []) << value
To see that this works, let's make it a method:
def add_to_hash_array_value(hash:, key:, value:)
(hash[key] ||= []) << value
end
Now I'll just call that a bunch of times:
h = {}
add_to_hash_array_value(hash:h, key:"abc", value:1)
add_to_hash_array_value(hash:h, key:"abc", value:2)
add_to_hash_array_value(hash:h, key:"def", value:4)
add_to_hash_array_value(hash:h, key:"def", value:5)
add_to_hash_array_value(hash:h, key:"abc", value:3)
puts h #=> {"abc"=>[1, 2, 3], "def"=>[4, 5]}
We got the right answer.
This is nice because suppose I have some way of knowing, given a value, what key it should be appended to. I can just repeatedly apply that decision-making process.
However, the trouble with trying to apply that to the original question is that the original question seems not to know exactly how to decide, given a value, what key it should be appended to.

Passing elements from one array to another, generating CSV

I'm trying to write a program in Ruby that allows one array to receive information from another array. Basically, I have a multidimensional array called "student_array" that contains information on a few students
student_array = [["Mike", 13, "American", "male"],
["Grace", 12, "Canadian", "female"],
["Joey", 13, "American", "male"],
["Lily", 13, "American", "female"]
]
I also initialized two other arrays that will count nationalities:
nationality_array = Array.new
nationality_count = Array.new
The purpose of this program is to loop through student array, count the different nationalities of the students, and create a CSV file that will contain the headers of the different nationalities, and a count for each one.
Expected output.csv
American, Canadian
3, 1
Here is the code I have so far
student_array.each do |student|
#pushes the nationality string into the nationality array
nationality_array.push(student[2])
end
so the nationality_array should currently look like this:
nationality_array = ["American", "Canadian", "American", "American"];
nationality_array.uniq = ["American", "Canadian"];
So I will have two headers - "American" and "Canadian"
Now I need a way to loop through the student_array, count up each instance of "American" and "Canadian", and somehow assign it back to the nationality array. I'm having a hard time visualizing how to go about this. This is what I have so far--
american_count = 0;
canadian_count = 0;
student_array.each do |student|
if student[2] = "American"
american_count++
elsif student[2] = "Canadian"
canadian_count++
end
end
nationality_count.push(american_count);
nationality_count.push(canadian_count);
Okay, now I have those counts in the nationality_count array, but how can I pass it to a CSV, making sure that they are assigned to the right headers? I have a feeling that my code is very awkward and could be much more streamlined as well.
It would probably look something like this?
CSV.open("output/redemptions.csv", "wb") do |csv|
csv << [nationality_array]
csv << [nationality_count]
end
Can anyone provide any insight into a cleaner way to go about this?
You could use a Hash to group the counts by nationality instead of different arrays.
nationalities_count = student_array.each_with_object(Hash.new(0)) do |student, hash|
nationality = student[2]
hash[nationality] += 1
end
That will give you a Hash that would look like
{ "American" => 2, "Canadian" => 1 }
You could then use Hash#to_a and Array#transpose like so:
hsh = { "American" => 2, "Canadian" => 1 }
=> {"American"=>2, "Canadian"=>1}
2.4.2 :002 > hsh.to_a
=> [["American", 2], ["Canadian", 1]]
2.4.2 :003 > hsh.to_a.transpose
=> [["American", "Canadian"], [2, 1]]
Finally, to output the CSV file all you need to do is write the arrays into the file
nationalities_with_count = hash.to_a.transpose
CSV.open("output/redemptions.csv", "wb") do |csv|
csv << nationalities_with_count[0]
csv << nationalities_with_count[1]
end
Array#group_by in Ruby core and Hash#transform_values in ActiveSupport are two very versitile methods that can be used here:
require 'active_support/all'
require 'csv'
student_array = [
["Mike", 13, "American", "male"],
["Grace", 12, "Canadian", "female"],
["Joey", 13, "American", "male"],
["Lily", 13, "American", "female"]
]
counts = student_array.group_by { |attrs| attrs[2] }.transform_values(&:length)
# => => {"American"=>3, "Canadian"=>1}
CSV.open("output/redemptions.csv", "wb") do |csv|
csv << counts.keys
csv << counts.values
end
puts File.read "output/redemptions.csv"
# => American,Canadian
# 3,1
.group_by { |attrs| attrs[2] } turns the array into a hash, where keys are the unique values for attrs[2], and values are a list of elements that have that attrs[2]. At this point you can use transform_values to turn those values into numbers representing their length (meaning, how many elements have that specific attrs[2]). The keys and values can then be extracted from the hash as separate arrays.
You even don’t need a CSV tool here:
result =
student_array.
map { |a| a[2] }. # get nationalities
group_by { |e| e }. # hash
map { |n, c| [n, c.count] }. # map values to count
transpose. # put data in rows
map { |row| row.join ',' }. # join values in a row
join($/) # join rows
#⇒ American,Canadian
# 3,1
Now you have a string that is valid CSV, just spit it out to the file.

How to collapse a multi-dimensional array of hashes in Ruby?

Background:
Hey all, I am experimenting with external APIs and am trying to pull in all of the followers of a User from a site and apply some sorting.
I have refactored a lot of the code, HOWEVER, there is one part that is giving me a really tough time. I am convinced there is an easier way to implement this than what I have included and would be really grateful on any tips to do this in a much more eloquent way.
My goal is simple. I want to collapse an array of arrays of hashes (I hope that is the correct way to explain it) into one array of hashes.
Problem Description:
I have an array named f_collectionswhich has 5 elements. Each element is an array of size 200. Each sub-element of these arrays is a hash of about 10 key-value pairs. My best representation of this is as follows:
f_collections = [ collection1, collection2, ..., collection5 ]
collection1 = [ hash1, hash2, ..., hash200]
hash1 = { user_id: 1, user_name: "bob", ...}
I am trying to collapse this multi-dimensional array into one array of hashes. Since there are five collection arrays, this means the results array would have 1000 elements - all of which would be hashes.
followers = [hash1, hash2, ..., hash1000]
Code (i.e. my attempt which I do not want to keep):
I have gotten this to work with a very ugly piece of code (see below), with nested if statements, blocks, for loops, etc... This thing is a nightmare to read and I have tried my hardest to research ways to do this in a simpler way, I just cannot figure out how. I have tried flatten but it doesn't seem to work.
I am mostly just including this code to show I have tried very hard to solve this problem, and while yes I solved it, there must be a better way!
Note: I have simplified some variables to integers in the code below to make it more readable.
for n in 1..5 do
if n < 5
(0..199).each do |j|
if n == 1
nj = j
else
nj = (n - 1) * 200 + j
end
#followers[nj] = #f_collections[n-1].collection[j]
end
else
(0..199).each do |jj|
njj = (4) * 200 + jj
#followers[njj] = #f_collections[n-1].collection[jj]
end
end
end
Oh... so It is not an array objects that hold collections of hashes. Kind of. Lets give it another try:
flat = f_collection.map do |col|
col.collection
end.flatten
which can be shortened (and is more performant) to:
flat = f_collection.flat_map do |col|
col.collection
end
This works because the items in the f_collection array are objects that have a collection attribute, which in turn is an array.
So it is "array of things that have an array that contains hashes"
Old Answer follows below. I leave it here for documentation purpose. It was based on the assumption that the data structure is an array of array of hashes.
Just use #flatten (or #flatten! if you want this to be "inline")
flat = f_collections.flatten
Example
sub1 = [{a: 1}, {a: 2}]
sub2 = [{a: 3}, {a: 4}]
collection = [sub1, sub2]
flat = collection.flatten # returns a new collection
puts flat #> [{:a=>1}, {:a=>2}, {:a=>3}, {:a=>4}]
# or use the "inplace"/"destructive" version
collection.flatten! # modifies existing collection
puts collection #> [{:a=>1}, {:a=>2}, {:a=>3}, {:a=>4}]
Some recommendations for your existing code:
Do not use for n in 1..5, use Ruby-Style enumeration:
["some", "values"].each do |value|
puts value
end
Like this you do not need to hardcode the length (5) of the array (did not realize you removed the variables that specify these magic numbers). If you you want to detect the last iteration you can use each_with_index:
a = ["some", "home", "rome"]
a.each_with_index do |value, index|
if index == a.length - 1
puts "Last value is #{value}"
else
puts "Values before last: #{value}"
end
end
While #flatten will solve your problem you might want to see how DIY-solution could look like:
def flatten_recursive(collection, target = [])
collection.each do |item|
if item.is_a?(Array)
flatten_recursive(item, target)
else
target << item
end
end
target
end
Or an iterative solution (that is limited to two levels):
def flatten_iterative(collection)
target = []
collection.each do |sub|
sub.each do |item|
target << item
end
end
target
end

Add key value pair to Array of Hashes when unique Id's match

I have two arrays of hashes
sent_array = [{:sellersku=>"0421077128", :asin=>"B00ND80WKY"},
{:sellersku=>"0320248102", :asin=>"B00WTEF9FG"},
{:sellersku=>"0324823180", :asin=>"B00HXZLB4E"}]
active_array = [{:price=>39.99, :asin1=>"B00ND80WKY"},
{:price=>7.99, :asin1=>"B00YSN9QOG"},
{:price=>10, :asin1=>"B00HXZLB4E"}]
I want to loop through sent_array, and find where the value in :asin is equal to the value in :asin1 in active_array, then copy the key & value of :price to sent_array. Resulting in this:
final_array = [{:sellersku=>"0421077128", :asin=>"B00ND80WKY", :price=>39.99},
{:sellersku=>"0320248102", :asin=>"B00WTEF9FG"},
{:sellersku=>"0324823180", :asin=>"B00HXZLB4E", :price=>10}]
I tried this, but I get a TypeError - no implicit conversion of Symbol into Integer (TypeError)
sent_array.each do |x|
x.detect { |key, value|
if value == active_array[:asin1]
x[:price] << active_array[:price]
end
}
end
For reasons of both efficiency and readability, it makes sense to first construct a lookup hash on active_array:
h = active_array.each_with_object({}) { |g,h| h[g[:asin1]] = g[:price] }
#=> {"B00ND80WKY"=>39.99, "B00YSN9QOG"=>7.99, "B00HXZLB4E"=>10}
We now merely step through sent_array, updating the hashes:
sent_array.each { |g| g[:price] = h[g[:asin]] if h.key?(g[:asin]) }
#=> [{:sellersku=>"0421077128", :asin=>"B00ND80WKY", :price=>39.99},
# {:sellersku=>"0320248102", :asin=>"B00WTEF9FG"},
# {:sellersku=>"0324823180", :asin=>"B00HXZLB4E", :price=>10}]
Retrieving a key-value pair from a hash (h) is much faster, of course, than searching for a key-value pair in an array of hashes.
This does the trick. Iterate over your sent array and attempt to find a record in your active_array that has that :asin. If you find something, set the price and you are done.
Your code I believe used detect/find incorrectly. What you want out of that method is the hash that matches and then do something with that. You were trying to do everything inside of detect.
sent_array.each do |sent|
item = active_array.find{ |i| i.has_value? sent[:asin] }
sent[:price] = item[:price] if item
end
=> [{:sellersku=>"0421077128", :asin=>"B00ND80WKY", :price=>39.99}, {:sellersku=>"0320248102", :asin=>"B00WTEF9FG"}, {:sellersku=>"0324823180", :asin=>"B00HXZLB4E", :price=>10}]
I am assuming second element of both sent_array and active_array has B00WTEF9FG as asin and asin1 respectively. (seeing your final result)
Now:
a = active_array.group_by{|a| a[:asin1]}
b = sent_array.group_by{|a| a[:asin]}
a.map { |k,v|
v[0].merge(b[k][0])
}
# => [{:price=>39.99, :asin1=>"B00ND80WKY", :sellersku=>"0421077128", :asin=>"B00ND80WKY"}, {:price=>7.99, :asin1=>"B00WTEF9FG", :sellersku=>"0320248102", :asin=>"B00WTEF9FG"}, {:price=>10, :asin1=>"B00HXZLB4E", :sellersku=>"0324823180", :asin=>"B00HXZLB4E"}]
Why were you getting TypeError?
You are doing active_array[:asin1]. Remember active_array itself is an Array. Unless you iterate over it, you cannot look for keys.
Another issue with your approach is, you are using Hash#detect
find is implemented in terms of each. And each, when called on a
Hash, returns key-value pairs in form of arrays with 2 elements
each. That's why find returns an array.
source
Same is true for detect. So x.detect { |key, value| .. } is not going to work as you are expecting it to.
Solution without assumption
a.map { |k,v|
b[k] ? v[0].merge(b[k][0]) : v[0]
}.compact
# => [{:price=>39.99, :asin1=>"B00ND80WKY", :sellersku=>"0421077128", :asin=>"B00ND80WKY"}, {:price=>7.99, :asin1=>"B00YSN9QOG"}, {:price=>10, :asin1=>"B00HXZLB4E", :sellersku=>"0324823180", :asin=>"B00HXZLB4E"}]
Here since asin1 => "B00ND80WKY" has no match, it cannot get sellersku from other hash.

Modify hashes in an array based on another array

I have two arrays like this:
a = [{'one'=>1, 'two'=>2},{'uno'=>1, 'dos'=>2}]
b = ['english', 'spanish']
I need to add a key-value pair to each hash in a to get this:
a = [{'one'=>1, 'two'=>2, 'language'=>'english'},{'uno'=>1, 'dos'=>2, 'language'=>'spanish'}]
I attempted this:
(0..a.length).each {|c| a[c]['language']=b[c]}
and it does not work. With this:
a[1]['language']=b[1]
(0..a.length).each {|c| puts c}
an error is shown:
NoMethodError (undefined method '[]=' for nil:NilClass)
How can I fix this?
a.zip(b){|h, v| h["language"] = v}
a # => [
# {"one"=>1, "two"=>2, "language"=>"english"},
# {"uno"=>1, "dos"=>2, "language"=>"spanish"}
# ]
When the each iterator over your Range reaches the last element (i.e. a.length), you will attempt to access a nonexisting element of a.
In your example, a.length is 2, so on the last iteration of your each, you will attempt to access a[2], which doesn't exist. (a only contains 2 elements wich indices 0 and 1.) a[2] evaluates to nil, so you will now attempt to call nil['language']=b[2], which is syntactic sugar for nil.[]=('language', b[2]), and since nil doesn't have a []= method, you get a NoMethodError.
The immediate fix is to not iterate off the end of a, by using an exclusive Range:
(0...a.length).each {|c| a[c]['language'] = b[c] }
By the way, the code you posted:
(0..a.length).each {|c| puts c }
should clearly have shown you that you iterate till 2 instead of 1.
That's only the immediate fix, however. The real fix is to simply never iterate over a datastructure manually. That's what iterators are for.
Something like this, where Ruby will keep track of the index for you:
a.each_with_index do |hsh, i| hsh['language'] = b[i] end
Or, without fiddling with indices at all:
a.zip(b.zip(['language'].cycle).map(&:reverse).map(&Array.method(:[])).map(&:to_h)).map {|x, y| x.merge!(y) }
[Note: this last one doesn't mutate the original Arrays and Hashes unlike the other ones.]
The problem you're having is that your (0..a.length) is inclusive. a.length = 2 so you want to modify it to be 0...a.length which is exclusive.
On a side note, you could use Array#each_with_index like this so you don't have to worry about the length and so on.
a.each_with_index do |hash, index|
hash['language'] = b[index]
end
Here is another method you could use
b.each_with_index.with_object(a) do |(lang,i),obj|
obj[i]["language"] = lang
obj
end
#=>[
{"one"=>1, "two"=>2, "language"=>"english"},
{"uno"=>1, "dos"=>2, "language"=>"spanish"}
]
What this does is creates an Enumerator for b with [element,index] then it calls with_object using a as the object. It then iterates over the Enumerator passing in each language and its index along with the a object. It then uses the index from b to find the proper index in a and adds a language key to the hash that is equal to the language.
Please know this is a destructive method where the objects in a will mutate during the process. You could make it non destructive using with_object(a.map(&:dup)) this will dup the hashes in a and the originals will remain untouched.
All that being said I think YAML would be better suited for a task like this but I am not sure what your constraints are. As an example:
yml = <<YML
-
one: 1
two: 2
language: "english"
-
uno: 1
dos: 2
language: "spanish"
YML
require 'yaml'
YAML.load(yml)
#=>[
{"one"=>1, "two"=>2, "language"=>"english"},
{"uno"=>1, "dos"=>2, "language"=>"spanish"}
]
Although using YAML I would change the structure for numbers to be more like language => Array of numbers by index e.g. {"english" => ["zero","one","two"]}. That way you can can access them like ["english"][0] #=> "zero"

Resources