Compare two arrays of hashes with the same keys

I have 2 arrays of hashes with same keys but different values.
A = [{:a=>1, :b=>4, :c=>2},{:a=>2, :b=>1, :c=>3}]
B = [{:a=>1, :b=>1, :c=>2},{:a=>1, :b=>3, :c=>3}]
I'm trying to compare the 1st hash in A with the 1st hash in B, and so on, key by key, and to identify which keys and values differ when they do not match. Please help.
A.each_key do |key|
if A[key] == B[key]
puts "#{key} match"
else
puts "#{key} dont match"

I am not certain which comparisons you want to make, so I will show ways of answering different questions. You want to make pairwise comparisons of two arrays of hashes, but that's really no more difficult than just comparing two hashes, as I will show later. For now, suppose you merely want to compare two hashes:
h1 = {:a=>1, :b=>4, :c=>2, :d=>3 }
h2 = {:a=>1, :b=>1, :c=>2, :e=>5 }
What keys are in h1 or h2 (or both)?
h1.keys | h2.keys
#=> [:a, :b, :c, :d, :e]
See Array#|.
What keys are in both hashes?
h1.keys & h2.keys
#=> [:a, :b, :c]
See Array#&.
What keys are in h1 but not h2?
h1.keys - h2.keys
#=> [:d]
See Array#-.
What keys are in h2 but not h1?
h2.keys - h1.keys #=> [:e]
What keys are in one hash only?
(h1.keys - h2.keys) | (h2.keys - h1.keys)
#=> [:d, :e]
or
(h1.keys | h2.keys) - (h1.keys & h2.keys)
What keys are in both hashes and have the same values in both hashes?
(h1.keys & h2.keys).select { |k| h1[k] == h2[k] }
#=> [:a, :c]
See Array#select.
What keys are in both hashes and have different values in the two hashes?
(h1.keys & h2.keys).reject { |k| h1[k] == h2[k] }
#=> [:b]
Suppose now we had two arrays of hashes:
a1 = [{:a=>1, :b=>4, :c=>2, :d=>3 }, {:a=>2, :b=>1, :c=>3, :d=>4}]
a2 = [{:a=>1, :b=>1, :c=>2, :e=>5 }, {:a=>1, :b=>3, :c=>3, :e=> 6}]
and wished to compare the hashes pairwise. To do that first take the computation of interest above and wrap it in a method. For example:
def keys_in_both_with_different_values(h1, h2)
(h1.keys & h2.keys).reject { |k| h1[k] == h2[k] }
end
Then write:
a1.zip(a2).map { |h1,h2| keys_in_both_with_different_values(h1, h2) }
#=> [[:b], [:a, :b]]
See Array#zip.
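If you also need the mismatching values, not just the keys, the same zip approach can return both values per key. This is only a sketch: value_differences is a hypothetical helper name, and the lowercase a and b stand in for the question's A and B.

```ruby
# Hypothetical helper (not from the answer above): for keys present in both
# hashes, returns { key => [value_in_h1, value_in_h2] } where the values differ.
def value_differences(h1, h2)
  (h1.keys & h2.keys).each_with_object({}) do |k, diff|
    diff[k] = [h1[k], h2[k]] unless h1[k] == h2[k]
  end
end

a = [{ a: 1, b: 4, c: 2 }, { a: 2, b: 1, c: 3 }]
b = [{ a: 1, b: 1, c: 2 }, { a: 1, b: 3, c: 3 }]

# Compare the hashes pairwise: first with first, second with second.
differences = a.zip(b).map { |h1, h2| value_differences(h1, h2) }
#=> [{:b=>[4, 1]}, {:a=>[2, 1], :b=>[1, 3]}]
```

An empty hash in the result means that pair of hashes agrees on every shared key.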

Since you're comparing elements of arrays...
A.each_with_index do |hasha, index|
hashb = B[index]
hasha.each_key do |key|
if hasha[key] == hashb[key]
puts "in array #{index} the key #{key} matches"
else
puts "in array #{index} the key #{key} doesn't match"
end
end
end
edit - added a missing end!

When you are dealing with an array, you reference an element by putting its index in square brackets '[]', as in
A[index of the element you are looking for]
To access a value in a hash, you put the corresponding key in the square brackets, as in
hash[:a]
(referencing the value that corresponds to the key ':a', which is a symbol.)
In this case, the hashes are nested within arrays. So, for example, the expression B[0][:c] will give 2.
To compare the 1st hash in A with the 1st hash in B, the 2nd hash in A with the 2nd hash in B, and so forth, you can use the each_with_index method on an Array object, like so:
A = [{:a=>1, :b=>4, :c=>2},{:a=>2, :b=>1, :c=>3}]
B = [{:a=>1, :b=>1, :c=>2},{:a=>1, :b=>3, :c=>3}]
keys = [:a, :b, :c]
A.each_with_index do |hash_a, idx|
  # the hash at the same position in B
  hash_b = B[idx]
  keys.each do |key|
    if hash_a[key] == hash_b[key]
      puts "Match found! (key -- :#{key}, value -- #{hash_a[key]})"
    else
      puts "No match here."
    end
  end
end
This checks the values key by key in the following order: :a -> :b -> :c for the first pair of hashes, then :a -> :b -> :c for the second pair.
This will print out:
Match found! (key -- :a, value -- 1)
No match here.
Match found! (key -- :c, value -- 2)
No match here.
No match here.
Match found! (key -- :c, value -- 3)
The method each_with_index may look a little cryptic if you are not familiar with it.
If so, you might want to check:
http://apidock.com/ruby/Enumerable/each_with_index
Last but not least, don't forget to add an 'end' to close each block (i.e. the code between do/end) and each if statement in your code.
I hope it helps.

Related

Why is the method #each_index non existing for hashes in Ruby?

Why can I use #each and #each_with_index on both arrays and hashes, but the #each_index method can be used only on an array and not on a hash?
each_index
Iterates over array indexes.
Hashes don't have indexes. Hashes have keys. Use each_key.
Why do Hashes have each_with_index? Because that's part of the Enumerable module, which both Hashes and Arrays include. Enumerable methods use shared terminology. Hashes and Arrays predate the Enumerable module: Array#each_index (sometime before v1.0r2) came before Enumerable#each_with_index (Feb 5th, 1998), so there is some overlap between Array, Hash, and Enumerable for backwards compatibility.
Hashes don't have indexes. Since Ruby 1.9 a Hash preserves insertion order, but its elements are still addressed by key, not by position. You can iterate over keys and values with each.
hash = { a: 1, b: 2 }
hash.each do |key, val|
puts "#{key.inspect} => #{val.inspect}"
end
#=> :a => 1
#=> :b => 2
If you really need to iterate over a hash with a counter of some sort you can just chain each with with_index like so:
hash = { a: 1, b: 2 }
hash.each.with_index do |key_val, i|
key, val = key_val
puts "#{i}| #{key.inspect} => #{val.inspect}"
end
#=> 0| :a => 1
#=> 1| :b => 2

What's the 'Ruby way' to identify duplicate values in a hash?

I'm on Ruby 2.7.x on macOS Catalina.
I have up to 1 million key value tuples.
The keys are strings and are guaranteed unique.
The values are strings and may contain duplicates (or triplicates or more).
Given the uniqueness of the keys, it seems that a hash is a natural data structure for them.
So if I start with original_hash, containing all the key value tuples,
I'd like to end up with uniques_hash, containing all and only unique key value tuples,
and duplicates_hash, containing all the keys with duplicated values.
I am more interested in optimising for clarity and Ruby idiom than memory efficiency or speed - I don't expect to be running this code frequently, and I have plenty of RAM.
If I convert to two arrays, I can find the uniques in the values array - but how do I guarantee re-pairing with the correct key? And is that the right way to approach this problem?
Many thanks for any assistance!
Suppose
original_hash = {:a=>1, :b=>2, :c=>2, :d=>2, :e=>3, :f=>4, :g=>4}
If you were only interested in returning a hash uniques_hash that contained unique values, you could write the following.
uniques_hash = original_hash.invert.invert
#=> {:a=>1, :d=>2, :e=>3, :g=>4}
the intermediate step being
original_hash.invert
#=> {1=>:a, 2=>:d, 3=>:e, 4=>:g}
See Hash#invert. Note that uniques_hash, as defined, is not itself unique. It could be any of the following.
{:a=>1, :b=>2, :e=>3, :f=>4}
{:a=>1, :b=>2, :e=>3, :g=>4}
{:a=>1, :c=>2, :e=>3, :f=>4}
{:a=>1, :c=>2, :e=>3, :g=>4}
{:a=>1, :d=>2, :e=>3, :f=>4}
{:a=>1, :d=>2, :e=>3, :g=>4}
Another way of doing this is to use Enumerable#uniq and Array#to_h.
uniques_hash = original_hash.uniq(&:last).to_h
#=> {:a=>1, :b=>2, :e=>3, :f=>4}
the intermediate calculation being
original_hash.uniq(&:last)
#=> [[:a, 1], [:b, 2], [:e, 3], [:f, 4]]
which is shorthand for
original_hash.uniq { |_k,v| v }
Presumably, each key of duplicates_hash is a value in original_hash and the value of that key is an array of those keys k in original_hash for which original_hash[k] == v.
One way to compute duplicates_hash is as follows.
duplicates_hash = original_hash.each_with_object({}) do |(k,v),h|
h[v] = (h[v] || []) << k
end
#=> {1=>[:a], 2=>[:b, :c, :d], 3=>[:e], 4=>[:f, :g]}
This can also be written
duplicates_hash = original_hash.
each_with_object(Hash.new { |h,k| h[k] = [] }) { |(k,v),h| h[v] << k }
#=> {1=>[:a], 2=>[:b, :c, :d], 3=>[:e], 4=>[:f, :g]}
See Hash::new. Both forms are equivalent to
duplicates_hash = original_hash.each_with_object({}) do |(k,v),h|
h[v] = [] unless h.key?(v)
h[v] << k
end
Writing the block variables as |(k,v),h| makes use of array decomposition.
We have an enumerator that will generate values and pass them to its block.
enum = original_hash.each_with_object({})
#=> #<Enumerator: {:a=>1, :b=>2, :c=>2, :d=>2, :e=>3, :f=>4, :g=>4}:
# each_with_object({})>
Enumerators are instances of the class Enumerator.
The first value of the enumerator is generated and the block variables are assigned values like so:
(k,v),h = enum.next
#=> [[:a, 1], {}]
Array decomposition is seen to split this array of two elements as follows:
k #=> :a
v #=> 1
h #=> {}
Notice how the parentheses on the left correspond to the inner brackets on the right. The block calculation is then performed using these variables.
h[v] = (h[v] || []) << k
#=> [:a]
Now,
h #=> {1=>[:a]}
The next value is then generated by the enumerator and the block calculation is performed.
(k,v),h = enum.next
#=> [[:b, 2], {1=>[:a]}]
k #=> :b
v #=> 2
h #=> {1=>[:a]}
h[v] = (h[v] || []) << k
so now
h #=> {1=>[:a], 2=>[:b]}
This continues until
enum.next
#=> StopIteration (exception)
causing Ruby to return the value of h.
Note that by computing duplicates_hash first we could compute uniques_hash as follows.
keeper_keys = duplicates_hash.values.map(&:first)
#=> [:a, :b, :e, :f]
unique_keys = original_hash.slice(*keeper_keys)
#=> {:a=>1, :b=>2, :e=>3, :f=>4}
or
unique_keys = original_hash.slice(*duplicates_hash.values.map(&:first))
#=> {:a=>1, :b=>2, :e=>3, :f=>4}
See Hash#slice. If one feels guilty about favouring certain keys, one could instead write
unique_keys = original_hash.slice(*duplicates_hash.values.map(&:sample))
#=> {:a=>1, :b=>2, :e=>3, :g=>4}
See Array#sample.
It might or might not be the best way to go about this, but I've used the group_by and select methods to get a new hash that contains only the duplicates:
hash.group_by{|k,v| v}.select{|k,v| v.count > 1}
In this case, the returned hash maps each duplicated value to an array of its [key, value] pairs, a bit like:
{value => [[key, value], [key, value]]}
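To make the shape of that result concrete, here is a small sketch run against the sample hash used in the earlier answer. The variable names are only illustrative, and transform_values assumes Ruby 2.4 or later.

```ruby
original_hash = { a: 1, b: 2, c: 2, d: 2, e: 3, f: 4, g: 4 }

# Group the [key, value] pairs by value, then keep groups with more than one pair.
grouped    = original_hash.group_by { |_key, value| value }
duplicates = grouped.select { |_value, pairs| pairs.size > 1 }
#=> {2=>[[:b, 2], [:c, 2], [:d, 2]], 4=>[[:f, 4], [:g, 4]]}

# Keep only the keys if the repeated values are not needed.
duplicate_keys = duplicates.transform_values { |pairs| pairs.map(&:first) }
#=> {2=>[:b, :c, :d], 4=>[:f, :g]}
```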
Count the values using group_by and store the results in a hash. Use that hash to partition the original hash like so:
h = {a: 1, b: 2, c: 2, d: 2, e: 3, f: 4, g: 4}
cnt = Hash[h.values.group_by{ |i| i }.map { |k, v| [k, v.count] }]
h_uniq, h_dups = h.partition{ |k, v| cnt[v] == 1 }.map(&:to_h)
puts cnt
# {1=>1, 2=>3, 3=>1, 4=>2}
puts h_uniq.inspect
# {:a=>1, :e=>3}
puts h_dups.inspect
# {:b=>2, :c=>2, :d=>2, :f=>4, :g=>4}
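Since the question mentions Ruby 2.7, the counting step could probably be shortened with Enumerable#tally, which was added in that version. A sketch of the same partition, with illustrative variable names:

```ruby
h = { a: 1, b: 2, c: 2, d: 2, e: 3, f: 4, g: 4 }

# tally counts how often each value occurs (Ruby 2.7+).
counts = h.values.tally
#=> {1=>1, 2=>3, 3=>1, 4=>2}

# Split the pairs into those whose value occurs once and those whose value repeats.
uniques_hash, duplicates_hash = h.partition { |_key, value| counts[value] == 1 }.map(&:to_h)
# uniques_hash    #=> {:a=>1, :e=>3}
# duplicates_hash #=> {:b=>2, :c=>2, :d=>2, :f=>4, :g=>4}
```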
Thanks to all for the help on this! I thought it might help someone if I posted my draft code, and any comments /improvements are very welcome! (I'm in the process of refactoring Listing..)
#!/usr/bin/env ruby
# shebang to run the script from Terminal
# include shasum
require 'digest'
class Listing
  # class of arrays of file listings
  attr_reader :path
  def initialize(path)
    @path = path
    Dir.chdir(@path)
    @list_of_items = Dir['**/*']
    @list_of_folders = []
    @list_of_files = []
    @list_of_extensions = []
    @list_of_uniques = []
    @list_of_duplicates = []
  end
  def to_s
    "#{@path}"
  end
  def analyse_items
    @list_of_items.each do |f|
      if File.directory?(f)
        @list_of_folders << f
      else
        @list_of_files << f
      end
    end
    @list_of_folders.sort!
    @list_of_files.sort!
    @folder_count = @list_of_folders.count
    @files_count = @list_of_files.count
    @items_count = @list_of_items.count
    @count_check = (@items_count - (@folder_count + @files_count))
    # count_check should be zero
  end
  def identify_duplicates
    # Given an array of filepaths, this method divides it into an array of unique files and an array of duplicated files.
    source = {}
    uniques = {}
    @list_of_files.each do |f|
      digest = Digest::SHA512.hexdigest File.read f
      source.store(f, digest)
    end
    uniques = source.invert.invert
    @list_of_uniques = uniques.keys
    @list_of_duplicates = source.keys - uniques.keys
  end
  def tell_duplicates
    puts "dupes = #{@list_of_duplicates}"
  end
end
l = Listing.new("/Volumes/Things/Photos/")
l.analyse_items
l.identify_duplicates
l.tell_duplicates
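One possible variation, offered only as a sketch: grouping the paths by digest keeps every set of identical files together, rather than a flat list of "extra" copies. duplicate_groups is a hypothetical helper name, and the temporary files exist purely to make the example self-contained.

```ruby
require 'digest'
require 'tmpdir'

# Hypothetical helper: groups file paths by the SHA-512 digest of their content
# and returns only the groups that contain more than one path.
def duplicate_groups(paths)
  paths.group_by { |path| Digest::SHA512.file(path).hexdigest }
       .values
       .select { |group| group.size > 1 }
end

# Demonstration on throwaway files; the names and contents are made up.
groups = Dir.mktmpdir do |dir|
  contents = { 'a.txt' => 'same', 'b.txt' => 'same', 'c.txt' => 'other' }
  paths = contents.map do |name, content|
    path = File.join(dir, name)
    File.write(path, content)
    path
  end
  duplicate_groups(paths).map { |group| group.map { |path| File.basename(path) } }
end
#=> [["a.txt", "b.txt"]]
```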

Make an array from list where first element of the array is the first element in list

I am very new to Ruby, so excuse me if this is a basic question.
I have a list that looks like this:
[name, max, john, raj, sam]
And I want to make a new array that will look like this:
[[name, max], [name, john], [name, raj], [name, sam]]
Here is what i am trying to do:
row.xpath('td').each_with_index do |cell, index|
if index == 0
tarray['name'] << cell.text
else
tarray['values'] << cell.text
end
I know I am doing it wrong because when I have ['name'] it will not be logically possible to have ['values'].
Please advise me on how I can achieve this using the best method.
You can take the first value (:name) and get its product with every remaining element by using Array#product:
array = [:name, :max, :john, :raj, :sam]
p [array[0]].product(array[1..-1])
# [[:name, :max], [:name, :john], [:name, :raj], [:name, :sam]]
Array#product is better, but you can also use Array#zip:
ary = [:a, :b, :c, :d]
([ary.first]*(ary.size-1)).zip ary[1..]
#=> [[:a, :b], [:a, :c], [:a, :d]]
Or also:
ary.then { |a, *rest| ([a] * rest.size).zip rest }
But yes, product is cleaner:
ary.then { |a, *rest| [a].product rest }

Ruby : Choosing between each, map, inject, each_with_index and each_with_object

When I started writing Ruby many years ago, it took me a while to understand the difference between each and map. It only got worse when I discovered all the other Enumerable and Array methods.
With the help of the official documentation and many StackOverflow questions, I slowly began to understand what those methods did.
Here is what took me even longer to understand though :
Why should I use one method or another?
Are there any guidelines?
I hope this question isn't a duplicate : I'm more interested in the "Why?" than the "What?" or "How?", and I think it could help Ruby newcomers.
A more tl;dr answer:
How to choose between each, map, inject, each_with_index and each_with_object?
Use #each when you want "generic" iteration and don't care about the result. Example - you have numbers, you want to print the absolute value of each individual number:
numbers.each { |number| puts number.abs }
Use #map when you want a new list, where each element is somehow formed by transforming the original elements. Example - you have numbers, you want to get their squares:
numbers.map { |number| number ** 2 }
Use #inject when you want to somehow reduce the entire list to one single value. Example - you have numbers, you want to get their sum:
numbers.inject(&:+)
Use #each_with_index in the same situation as #each, except you also want the index with each element:
numbers.each_with_index { |number, index| puts "Number #{number} is at position #{index}" }
Uses for #each_with_object are more limited. The most common case is if you need something similar to #inject, but want a new collection (as opposed to singular value), which is not a direct mapping of the original. Example - number histogram (frequencies):
numbers.each_with_object({}) { |number, histogram| histogram[number] = histogram[number].to_i.next }
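The one-liners above leave numbers undefined, so here is a minimal sketch that runs them against a made-up sample array. The values and variable names are assumptions for illustration only.

```ruby
numbers = [3, -1, 3, 2]

# each: side effects only; the return value (the original array) is ignored.
numbers.each { |number| puts number.abs }

# map: a new array of the same size.
squares = numbers.map { |number| number**2 }
#=> [9, 1, 9, 4]

# inject: reduce the whole list to a single value.
total = numbers.inject(:+)
#=> 7

# each_with_index: like each, with the position available.
positions = []
numbers.each_with_index { |number, index| positions << [index, number] }
#=> [[0, 3], [1, -1], [2, 3], [3, 2]]

# each_with_object: build a new collection that is not a one-to-one mapping.
histogram = numbers.each_with_object(Hash.new(0)) { |number, counts| counts[number] += 1 }
#=> {3=>2, -1=>1, 2=>1}
```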
Which object can I use?
First, the object you're working with should be an Array, a Hash, a Set, a Range or any other object that responds to each. If it doesn't, it might be converted to something that will. You cannot call each directly on a String for example, because you need to specify if you'd like to iterate over each byte, character or line.
"Hello World".respond_to?(:each)
#=> false
"Hello World".each_char.respond_to?(:each)
#=> true
I want to calculate something with each element, just like with a for loop in C or Java.
If you want to iterate over each element, do something with it and not modify the original object, you can use each. Please keep reading though, in order to know if you really should.
array = [1,2,3]
#NOTE: i is a bound variable, it could be replaced by anything else (x, n, element). It's a good idea to use a descriptive name if you can
array.each do |i|
puts "La"*i
end
#=> La
# LaLa
# LaLaLa
It is the most generic iteration method, and you could write any of the other mentioned methods with it. We will actually, for pedagogical purposes only. If you spot a similar pattern in your code, you could probably replace it with the corresponding method.
It is basically never wrong to use each, but it is almost never the best choice. It is verbose and not Ruby-ish.
Note that each returns the original object, but this is rarely (never?) used. The logic happens inside the block, and should not modify the original object.
The only time I use each is:
when no other method would do. The more I learn about Ruby, the less often it happens.
when I write a script for someone who doesn't know Ruby, has some programming experience (e.g. C, Fortran, VBA) and would like to understand my code.
I want to get an Array out of my String/Hash/Set/File/Range/ActiveRecord::Relation
Just call object.to_a.
(1..10).to_a
#=> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
"Hello world".each_char.to_a
#=> ["H", "e", "l", "l", "o", " ", "w", "o", "r", "l", "d"]
{:a => 1, :b => 2}.to_a
#=> [[:a, 1], [:b, 2]]
Movie.all.to_a #NOTE: Probably very inefficient. Try to keep an ActiveRecord::Relation as Relation for as long as possible.
#=> [Citizen Kane, Trois couleurs: Rouge, The Grapes of Wrath, ....
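Some methods described below are not available on every Enumerable in older Ruby versions (e.g. Enumerable#uniq arrived in Ruby 2.4 and Enumerable#compact in Ruby 3.1), so converting to an Array first is a safe fallback.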
I want to get a modified Array based on the original object.
If you want to get an Array based on the original object, you can use map. The returned object will have the same size as the original one.
array = [1,2,3]
new_array = array.map do |i|
i**2
end
new_array
#=> [1, 4, 9]
#NOTE: map is often used in conjunction with other methods. Here is the corresponding one-liner, without creating a new variable :
array.map{|i| i**2}
#=> [1, 4, 9]
# EACH-equivalent (For pedagogical purposes only):
new_array = []
array.each do |i|
new_array << i**2
end
new_array
#=> [1, 4, 9]
The returned Array will not replace the original object.
This method is very widely used. It should be the first one you learn after each.
collect is a synonym of map. Pick one of the two and use it consistently in your projects.
I want to get a modified Hash based on the original Hash.
If your original object is a Hash, map will return an Array anyway. If you want a Hash back :
hash = {a: 1, b: 2}
hash.map{|key, value| [key, value*2]}.to_h
#=> {:a=>2, :b=>4}
# EACH-equivalent
hash = {a: 1, b: 2}
new_hash = {}
hash.each do |key,value|
new_hash[key]=value*2
end
new_hash
#=> {:a=>2, :b=>4}
I want to filter some elements.
I want to remove nil elements
You can call compact. It will return a new Array without the nil elements.
array = [1,2,nil,4,5]
#NOTE: array.map{|i| i*2} Would raise a NoMethodError
array.compact
# => [1, 2, 4, 5]
# EACH-equivalent
new_array = []
array.each do |integer_or_nil|
new_array << integer_or_nil unless integer_or_nil.nil?
end
new_array
I want to write some logic to determine if an element should be kept in the new Array
You can use select or reject.
integers = (1..10)
integers.select{|i| i.even?}
# => [2, 4, 6, 8, 10]
integers.reject{|i| i.odd?}
# => [2, 4, 6, 8, 10]
# EACH-equivalent
new_array = []
integers.each do |i|
new_array << i if i.even?
end
new_array
I want to remove duplicate elements from my Array
You can use uniq :
letters = %w(a b a b c)
letters.uniq
#=> ["a", "b", "c"]
# EACH-equivalent
uniq_letters = []
letters.each do |letter|
uniq_letters << letter unless uniq_letters.include?(letter)
end
uniq_letters
#TODO: Add find/detect/any?/all?/count
#TODO: Add group_by/sort/sort_by
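For the methods still listed as TODO, a rough sketch of what each one returns may help in the meantime. The sample data and variable names are made up for illustration.

```ruby
integers = (1..10)

# find (alias detect): the first element for which the block is true, or nil.
first_big = integers.find { |i| i > 6 }
#=> 7

# any? / all?: true or false for the whole collection.
any_even     = integers.any?(&:even?)
#=> true
all_positive = integers.all? { |i| i > 0 }
#=> true

# count: the number of elements for which the block is true.
even_count = integers.count(&:even?)
#=> 5

# group_by: a Hash from the block's result to the matching elements.
by_parity = integers.group_by { |i| i.even? ? :even : :odd }
#=> {:odd=>[1, 3, 5, 7, 9], :even=>[2, 4, 6, 8, 10]}

# sort uses the natural order; sort_by orders by the block's result.
words = %w(pear fig banana)
alphabetical = words.sort
#=> ["banana", "fig", "pear"]
by_length = words.sort_by(&:length)
#=> ["fig", "pear", "banana"]
```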
I want to iterate over all the elements while counting from 0 to n-1
You can use each_with_index :
letters = %w(a b c)
letters.each_with_index do |letter, i|
puts "Letter ##{i} : #{letter}"
end
#=> Letter #0 : a
# Letter #1 : b
# Letter #2 : c
#NOTE: There's a nice Ruby syntax if you want to use each_with_index with a Hash
hash = {:a=>1, :b=>2}
hash.each_with_index{|(key,value),i| puts "#{i} : #{key}->#{value}"}
# => 0 : a->1
# 1 : b->2
# EACH-equivalent
i = 0
letters.each do |letter|
puts "Letter ##{i} : #{letter}"
i+=1
end
each_with_index returns the original object.
I want to iterate over all the elements while setting a variable during each iteration and using it in the next iteration.
You can use inject :
gauss = (1..100)
gauss.inject{|sum, i| sum+i}
#=> 5050
#NOTE: You can specify a starting value with gauss.inject(0){|sum, i| sum+i}
# EACH-equivalent
sum = 0
gauss.each do |i|
sum = sum + i
end
puts sum
It returns the variable as defined by the last iteration.
reduce is a synonym. As with map/collect, choose one keyword and keep it.
I want to iterate over all the elements while keeping a variable available to each iteration.
You can use each_with_object :
letter_ids = (1..26)
letter_ids.each_with_object({}){|i,alphabet| alphabet[("a".ord+i-1).chr]=i}
#=> {"a"=>1, "b"=>2, "c"=>3, "d"=>4, "e"=>5, "f"=>6, "g"=>7, "h"=>8, "i"=>9, "j"=>10, "k"=>11, "l"=>12, "m"=>13, "n"=>14, "o"=>15, "p"=>16, "q"=>17, "r"=>18, "s"=>19, "t"=>20, "u"=>21, "v"=>22, "w"=>23, "x"=>24, "y"=>25, "z"=>26}
# EACH-equivalent
alphabet = {}
letter_ids.each do |i|
letter = ("a".ord+i-1).chr
alphabet[letter]=i
end
alphabet
It returns the variable as modified by the last iteration. Note that the order of the two block variables is reversed compared to inject.
If your variable is a Hash, you should probably prefer this method to inject, because h["a"]=1 returns 1, and it would require one more line in your inject block to return a Hash.
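To illustrate that last point, here is a small side-by-side sketch with made-up data. With inject the block has to hand the Hash back explicitly, because the assignment itself evaluates to the assigned value and the next iteration would otherwise receive that value instead of the Hash.

```ruby
letters = %w(a b c)

# inject: the block's return value becomes the accumulator for the next
# iteration, so the Hash must be returned on the last line.
with_inject = letters.inject({}) do |alphabet, letter|
  alphabet[letter] = letter.ord
  alphabet
end
#=> {"a"=>97, "b"=>98, "c"=>99}

# each_with_object: the same object is passed to every iteration,
# whatever the block returns.
with_object = letters.each_with_object({}) { |letter, alphabet| alphabet[letter] = letter.ord }
#=> {"a"=>97, "b"=>98, "c"=>99}
```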
I want something that hasn't been mentioned yet.
Then it's probably okay to use each ;)
Notes :
It's a work in progress, and I would appreciate any feedback. If it's interesting enough and fits on one page, I might extract a flowchart out of it.

Turning a multi-dimensional array into a hash without overwriting values

I have a multi-dimensional array such as:
array = [["stop", "halt"],["stop", "red"],["go", "green"],["go","fast"],["caution","yellow"]]
And I want to turn it into a hash like this:
hash = {"stop" => ["halt","red"], "go" => ["green","fast"], "caution" => "yellow"}
However, when I call array.to_h, the values overwrite one another and I get:
hash = {"stop" => "red", "go" => "fast", "caution" => "yellow"}
How do I get the desired hash?
This is one way. It uses Enumerable#each_with_object and the form of Hash#update (aka merge!) that employs a block to determine the values of keys that are present in both hashes being merged.
array << ["stop", "or I'll fire!"]
array.each_with_object({}) { |(f,l),h|
h.update(f=>l) { |_,ov,nv| ov.is_a?(Array) ? ov << nv : [ov, nv] } }
#=> {"stop"=>["halt", "red", "or I'll fire!"],
# "go"=>["green", "fast"],
# "caution"=>"yellow"}
The code is simplified if you want all values in the returned hash to be arrays (i.e., "caution"=>["yellow"]), which is generally more convenient for subsequent calculations:
array.each_with_object({}) { |(f,l),h| h.update(f=>[l]) {|_,ov,nv| ov+nv }}
#=> {"stop"=>["halt", "red", "or I'll fire!"],
# "go"=>["green", "fast"],
# "caution"=>["yellow"]}
One way to do it:
array.inject({}) {|r, (k, v)| r[k] &&= [*r[k], v]; r[k] ||= v; r }
That's pretty messy though. Written out, it looks like this:
def to_hash_with_duplicates(arr)
{}.tap do |r|
arr.each do |k, v|
r[k] &&= [*r[k], v] # key already present, turn into array and add value
r[k] ||= v # key not present, simply store value
end
end
end
Edit: Thinking a bit more, @cary-swoveland's update-with-block solution is better, because it handles nil and false values correctly.
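If it is acceptable for every value to be an array (i.e. "caution"=>["yellow"]), another option that is not taken from the answers above is to group the pairs by their first element. This is a sketch using the array from the question; transform_values assumes Ruby 2.4 or later.

```ruby
array = [["stop", "halt"], ["stop", "red"], ["go", "green"], ["go", "fast"], ["caution", "yellow"]]

# Group the pairs by their first element, then keep only the second element of each pair.
hash = array.group_by(&:first).transform_values { |pairs| pairs.map(&:last) }
#=> {"stop"=>["halt", "red"], "go"=>["green", "fast"], "caution"=>["yellow"]}
```

Because nothing here tests the stored values for truthiness, nil and false values are collected like any others.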

Resources