Most efficient way to count duplicated elements between two arrays

As part of a very basic program I am writing in Ruby, I am trying to find the total number of shared elements between two arrays of equal length, but I need to include repeats.
My current example code for this situation is as follows:
array_a = ["B","A","A","A","B"]
array_b = ["A","B","A","B","B"]
counter = 0
array_a.each_index do |i|
  if array_a.sort[i] == array_b.sort[i]
    counter += 1
  end
end
puts counter
I want the return value of this comparison in this instance to be 4, and not 2, as the two arrays share 2 duplicate characters ("A" twice, and "B" twice). This seems to work, but I am wondering if there are any more efficient solutions for this issue. Specifically whether there are any methods you would suggest looking into. I spoke with someone who suggested a different method, inject, but I really don't understand how that applies and would like to understand. I did quite a bit of reading on uses for it, and it still isn't clear to me how it is appropriate. Thank you.
Looking at my code, I have realized that it doesn't seem to work for the situation that I am describing.

Allow me to reiterate and explain what I think the OP's original intent was:
Given arrays of equal size
array_a = ["B","A","A","A","B"]
array_b = ["A","B","A","B","B"]
We need to show the total number of matching pairs of elements between the two arrays. In other words, each B in array_a will "use up" a B in array_b, and the same will be true for each A. As there are two B's in array_a and three in array_b, this leaves us with a count of 2 for B, and following the same logic, 2 for A, for a sum of 4.
(array_a & array_b).map { |e| [array_a.count(e), array_b.count(e)].min }.reduce(:+)
If we get the intersection of the arrays with &, the result is a list of the distinct values that exist in both arrays. We then iterate over each value and take the smaller of its counts in the two arrays; this is the maximum number of times the element can be "used". All that is left is to total the number of paired elements with reduce(:+).
Changing array_a to ["B", "A", "A", "B", "B"] results in a total of 5, as there are now enough of B to exhaust the supply of B in array_b.
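The same min-of-counts idea can also be sketched with one counting pass per array, which avoids calling count repeatedly on long arrays. This is only an illustration; the helper names element_counts and shared_count are made up for this example and do not come from the answer.

```ruby
# Count how many times each element occurs, in a single pass.
def element_counts(array)
  array.each_with_object(Hash.new(0)) { |e, counts| counts[e] += 1 }
end

# Hypothetical helper: for every distinct element of a, add the smaller
# of its two counts. Elements missing from b contribute 0 because
# counts_b has a default value of 0.
def shared_count(a, b)
  counts_a = element_counts(a)
  counts_b = element_counts(b)
  counts_a.sum { |element, n| [n, counts_b[element]].min }
end

array_a = ["B", "A", "A", "A", "B"]
array_b = ["A", "B", "A", "B", "B"]
shared_count(array_a, array_b)  #=> 4
```

Unlike reduce(:+) on an empty list, sum returns 0 when the arrays share nothing.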

If I understand the question correctly, you could do the following.
Code
def count_shared(arr1, arr2)
arr1.group_by(&:itself).
merge(arr2.group_by(&:itself)) { |_,ov,nv| [ov.size, nv.size].min }.
values.
reduce(0) { |t,o| (o.is_a? Array) ? t : t + o }
end
Examples
arr1 = ["B","A","A","A","B"]
arr2 = ["A","B","A","B","B"]
count_shared(arr1, arr2)
#=> 4 (2 A's + 2 B's)
arr1 = ["B", "A", "C", "C", "A", "A", "B", "D", "E", "A"]
arr2 = ["C", "D", "F", "F", "A", "B", "A", "B", "B", "G"]
count_shared(arr1, arr2)
#=> 6 (2 A's + 2 B's + 1 C + 1 D + 0 E's + 0 F's + 0 G's)
Explanation
The steps are as follows for a slightly modified version of the first example.
arr1 = ["B","A","A","A","B","C","C"]
arr2 = ["A","B","A","B","B","D"]
First apply Enumerable#group_by to both arr1 and arr2:
h0 = arr1.group_by(&:itself)
#=> {"B"=>["B", "B"], "A"=>["A", "A", "A"], "C"=>["C", "C"]}
h1 = arr2.group_by(&:itself)
#=> {"A"=>["A", "A"], "B"=>["B", "B", "B"], "D"=>["D"]}
Prior to Ruby v.2.2, when Object#itself was introduced, you would have to write:
arr.group_by { |e| e }
Continuing,
h2 = h0.merge(h1) { |_,ov,nv| [ov.size, nv.size].min }
#=> {"B"=>2, "A"=>2, "C"=>["C", "C"], "D"=>["D"]}
I will return shortly to explain the above calculation.
a = h2.values
#=> [2, 2, ["C", "C"], ["D"]]
a.reduce(0) { |t,o| (o.is_a? Array) ? t : t + o }
#=> 4
Here Enumerable#reduce (aka inject) merely sums the values of a that are not arrays. The arrays correspond to elements of arr1 that do not appear in arr2 or vice versa.
As promised, I will now explain how h2 is computed. I've used the form of Hash#merge that employs a block (here { |k,ov,nv| [ov.size, nv.size].min }) to compute the values of keys that are present in both hashes being merged. For example, when the first key-value pair of h1 ("A"=>["A", "A"]) is being merged into h0, since h0 also has a key "A", the array
["A", ["A", "A", "A"], ["A", "A"]]
is passed to the block and the three block variables are assigned values (using "parallel assignment", which is sometimes called "multiple assignment"):
k, ov, nv = ["A", ["A", "A", "A"], ["A", "A"]]
so we have
k #=> "A"
ov #=> ["A", "A", "A"]
nv #=> ["A", "A"]
k is the key, ov ("old value") is the value of "A" in h0 and nv ("new value") is the value of "A" in h1. The block calculation is
[ov.size, nv.size].min
#=> [3,2].min = 2
so the value of "A" is now 2.
Notice that the key, k, is not used in the block calculation (which is very common when using this form of merge). For that reason I've changed the block variable from k to _ (a legitimate local variable), both to reduce the chance of introducing a bug and to signal to the reader that the key is not used in the block. The other elements of h2 that use this block are computed similarly.
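A smaller illustration of the block form of Hash#merge may help here. The hashes stock and orders are invented for this sketch; the point is only that the block runs for keys present in both hashes, while other keys are copied over unchanged.

```ruby
stock  = { "apples" => 3, "pears" => 2 }
orders = { "pears" => 5, "plums" => 1 }

# The block is called only for "pears", the one key found in both hashes.
merged = stock.merge(orders) { |_key, old_value, new_value| [old_value, new_value].min }
# merged is {"apples"=>3, "pears"=>2, "plums"=>1}

# Without a block, the value from the argument hash simply wins.
plain = stock.merge(orders)
# plain is {"apples"=>3, "pears"=>5, "plums"=>1}
```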
Another way
It would be quite simple if we had available an Array method I've proposed be added to the Ruby core:
array_a = ["B","A","A","A","B"]
array_b = ["A","B","A","B","B"]
array_a.size - (array_a.difference(array_b)).size
#=> 4
or
array_a.size - (array_b.difference(array_a)).size
#=> 4
I've cited other applications in my answer here.
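One caveat when trying the expressions above: since Ruby 2.6, core Ruby has its own Array#difference, which behaves like the - operator and removes every occurrence, so on a current Ruby they evaluate to 5 rather than 4 unless the proposed method is defined. A minimal sketch of the repeat-respecting version follows, written as a standalone method under the hypothetical name multiset_difference so that it does not replace the built-in.

```ruby
# Hypothetical helper (not a core method): drops one element of a for
# each matching element of b, so repeated elements are respected.
def multiset_difference(a, b)
  remaining = b.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }
  a.reject do |e|
    if remaining[e] > 0
      remaining[e] -= 1
      true   # this element is "used up" by one copy in b
    else
      false  # no copies left in b, keep it
    end
  end
end

array_a = ["B", "A", "A", "A", "B"]
array_b = ["A", "B", "A", "B", "B"]

array_a.size - multiset_difference(array_a, array_b).size  #=> 4
array_a.size - multiset_difference(array_b, array_a).size  #=> 4
```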

This is a perfect job for Enumerable#zip and Enumerable#count:
array_a.zip(array_b).count do |a, b|
a == b
end
# => 2
The zip method pairs up elements, "zippering" them together, and the count method can take a block that decides whether each element should be counted.
The inject method is very powerful, but it's also the most low-level. Pretty much every other Enumerable method can be created with inject if you work at it, so it's quite flexible, but usually a more special-purpose method is better suited. It's still a useful tool if applied correctly.
In this case zip and count do a much better job and if you know what these methods do, this code is self explanatory.
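Since the question asked how inject could apply, here is a rough sketch of the same position-by-position count written with inject, next to the pairs that zip produces. It is meant only to show the mechanics; the variable names are illustrative.

```ruby
array_a = ["B", "A", "A", "A", "B"]
array_b = ["A", "B", "A", "B", "B"]

pairs = array_a.zip(array_b)
# pairs is [["B", "A"], ["A", "B"], ["A", "A"], ["A", "B"], ["B", "B"]]

# inject threads a running total through the pairs, starting at 0 and
# adding 1 whenever both members of a pair are equal.
matches = pairs.inject(0) { |total, (a, b)| a == b ? total + 1 : total }
# matches is 2, the same result as pairs.count { |a, b| a == b }
```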
Update:
If you need to count all overlapping letters regardless of order you need to do some grouping on them. Enumerable#group_by would do this (it is part of core Ruby, not only of Rails' ActiveSupport), but you can also build the counts by hand.
Here's an approach that counts up all the unique letters, grouping them using chunk:
# Convert each array into a map like { "A" => 2, "B" => 3 }
# with a default count of 0.
counts = [ array_a, array_b ].collect do |a|
Hash.new(0).merge(
Hash[a.sort.chunk { |v| v }.collect { |k, a| [ k, a.length ] }]
)
end
# Iterate over one of the maps key by key and count the minimum
# overlap between the two.
counts[0].keys.inject(0) do |sum, key|
sum + [ counts[0][key], counts[1][key] ].min
end

Related

Ruby, Delete elements (slices) from a multidimensional array

(Pretty new with Ruby)
I can remove a block of elements from a single-dimensional array
array1D = Array.new(6){|i| i*i}
array1D.slice!(2,2) #=> [4, 9]; array1D is now [0, 1, 16, 25]
len = array1D.length #=> 4
However,
Array(arrayd3d[0][0]).slice!(30000,8880)
on arrayd3d[1][1][38884],
I still get
len = array3D.length #=> 38884
1) What am I doing wrong?
2) How can I delete the same block of elements (30000,8880) from all arrayd3d[1..nDim1][1..nDim2]?

slice! returns the deleted object:
a = [ "a", "b", "c" ]
a.slice!(1) #=> "b"
a #=> ["a", "c"]
In general in Ruby we prefer not to alter the original object unless we're looking for some particular performance gain (which is very rare; for example, you may want to reduce the memory consumption of a very large array before moving on).
That's the reason for the exclamation symbol (! aka bang) which usually indicates some destructive behaviour.
Please consider using the non-bang version instead.
array1D = Array.new(6){ |i| i*i }
y = array1D.slice(2,2)
or
def some_method(input_array)
input_array.slice(2,2)
end
x = Array.new(6){ |i| i*i }
y = some_method(x)
This way your code becomes more predictable as you're not altering the value of your arguments.
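For the second part of the question, removing the same block from every innermost array, one possible sketch is shown below. It uses a tiny 2 x 2 x 6 stand-in for the real data, and the names array3d, start, and length are assumptions made for this example. Both a non-destructive and an in-place variant are shown.

```ruby
# A small stand-in for the 3-D array: 2 x 2 rows of 6 elements each.
array3d = Array.new(2) { Array.new(2) { Array.new(6) { |i| i * i } } }

start, length = 2, 2

# Non-destructive: build a new nested array whose rows omit the slice.
# to_a turns a nil tail (slice runs past the end of the row) into [].
trimmed = array3d.map do |plane|
  plane.map { |row| row[0, start] + row[(start + length)..-1].to_a }
end
# trimmed[0][0] is [0, 1, 16, 25]; array3d[0][0] still has 6 elements.

# In-place alternative: slice! has to be called on each innermost row.
array3d.each { |plane| plane.each { |row| row.slice!(start, length) } }
# array3d[1][1] is now [0, 1, 16, 25]
```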

Need a `values_at_if ` method to map values in ruby

I have two arrays of the same size
response = ["N","N","Y","Y","Y"]
mapping = ["A","B","C","D","E"]
I need to select the elements in mapping whose corresponding value in response, i.e., the element with the same index, is "Y", as below
["C","D","E"]
It reminds me of the values_at method. My solution is as follows
def values_at_if(response, mapping)
  result = []
  response.each_index do |k|
    # k is an index, so compare the element at that index
    if response[k] == "Y"
      result << mapping[k]
    end
  end
  result
end
I am not happy with it. Please let me know of a way to do it efficiently.

Update
The simplest solution I can come up with is:
mapping.select.with_index {|_,i| response[i] == "Y"}
#=>["C", "D", "E"]
This will select all the elements in mapping where the element at the corresponding index in response equals "Y".
Other options include:
mapping.values_at(*response.each_with_index.select {|v,_| v == "Y"}.map(&:last))
mapping.zip(response).map {|k,v| k if v == "Y"}.compact
The first uses each_with_index, which yields
[["N",0],["N",1],["Y",2],["Y",3],["Y",4]]
then we select the pairs where the first element is "Y" and map them to their indexes, which are passed to values_at.
The second version zips the mapping and the response together, creating
[["A", "N"], ["B", "N"], ["C", "Y"], ["D", "Y"], ["E", "Y"]]
Then we map to the first element only when the second element is "Y"; compact removes the nil values from the result.
There are a lot of other ways to accomplish this task if you have a look through the Enumerable module.

I would go with
mapping.zip(response).select { |_, r| r == 'Y' }.map(&:first)
#=> ["C", "D", "E"]

Convert response to an Enumerator; each without a block does that. Then use it in the select block. If the block returns true then the item is selected:
response = ["N","N","Y","Y","Y"]
mapping = ["A","B","C","D","E"]
enum_resp = response.each
mapping.select{ enum_resp.next == "Y" } # =>["C", "D", "E"]
Note that it would save memory if response consisted of true and false values, which would also make the comparison in the select block unnecessary.
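A quick sketch of that boolean variant, assuming the responses are stored as true/false (the array name flags is hypothetical):

```ruby
flags   = [false, false, true, true, true]  # hypothetical boolean form of response
mapping = ["A", "B", "C", "D", "E"]

enum_flags = flags.each
# Each call to next returns the flag for the current position, and
# select keeps the element when that flag is true.
selected = mapping.select { enum_flags.next }
# selected is ["C", "D", "E"]
```

This relies on both arrays having the same length; next raises StopIteration if flags runs out first.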

response.each_with_object([]).with_index do |(v, arr), i|
arr << mapping[i] if v == 'Y'
end
Or
mapping.each_with_object([]).with_index do |(v, arr), i|
arr << v if response[i] == 'Y'
end

How do I find the index of any element from an array within another array?

I have an array:
["a", "b", "c", "d"]
How do I figure out the index of the first element of the above array to occur within a second array:
["next", "last", "d", "hello", "a"]
The index of the first element from the first array to occur within the above array would be 2; "d" belongs to the first array and occurs at position 2.

There's a couple of ways to do this, but the naive approach might work well enough to get going:
tests = ["a", "b", "c", "d"]
in_array = ["next", "last", "d", "hello", "a"]
in_array.each_with_index.find do |e, i|
tests.include?(e)
end
# => ["d", 2]
You can speed this up by making tests a Set which avoids a lot of O(N) lookups:
tests = Set.new([ ... ])
The same code will work with include? but that's now much faster on longer lists.
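As a sketch of the Set variant using the arrays from the question: Array#index with a block returns just the position, which is what the question asks for. Older Ruby versions need the explicit require.

```ruby
require 'set'

tests    = Set.new(["a", "b", "c", "d"])
in_array = ["next", "last", "d", "hello", "a"]

# index with a block returns the position of the first element for
# which the block is truthy, or nil if there is no such element.
first_index = in_array.index { |e| tests.include?(e) }
# first_index is 2
```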

This approach, wrapped in a method, returns an array containing all indexes of common elements between two arrays.
def find_positions(original_array, look_up_array)
positions_array = []
original_array.each do |x|
if look_up_array.index(x) != nil
positions_array << look_up_array.index(x)
end
end
positions_array
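# positions_array.min => index of the earliest match in look_up_array
end
If you want only the earliest match in look_up_array, return positions_array.min instead. Note that positions_array.first is the position of the first element of original_array that has a match (4 for the arrays in the question, not 2), and that either way you do not avoid the extra lookups.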
PS: you could also use #collect and avoid the extra array (positions_array)
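A possible version of the #collect idea from the PS, shown with the arrays from the question. It looks up each element once and drops the misses with compact.

```ruby
def find_positions(original_array, look_up_array)
  # index returns nil for elements that are absent; compact removes them.
  original_array.collect { |x| look_up_array.index(x) }.compact
end

positions = find_positions(["a", "b", "c", "d"], ["next", "last", "d", "hello", "a"])
# positions is [4, 2] (in the order of original_array)
# positions.min is 2, the earliest match in the second array
```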

You can iterate through the array you want to compare and use the .select or .find iterator method. .find returns the first matching element, while .select returns all matching elements. If you want the index included in the result, add .each_with_index. '.index(a)' returns the index of the element if present, else nil; since 0 is truthy in Ruby, a match at index 0 still counts.
alphabet = %w(a b c d)
%w(next last d hello a).each_with_index.find {|a, _index| alphabet.index(a) }
=> ["d", 2]
%w(next last d hello a).each_with_index.select {|a, _index| alphabet.index(a) }[0]
=> ["d", 2]
# if you just need the index of the first match
%w(next last d hello a).index {|a| alphabet.index(a) }
=> 2

How can I generate a percentage for a regex string match in Ruby?

I'm trying to build a simple method to look at about 100 entries in a database for a last name and pull out all the ones that match above a specific percentage of letters. My current approach is:
1. Pull all 100 entries from the database into an array.
2. Iterate through them while performing the following actions.
3. Split the last name into an array of letters.
4. Subtract that array from another array that contains the letters for the name I am trying to match, which leaves only the letters that weren't matched.
5. Take the size of the result and divide by the original size of the array from step 3 to get a percentage.
6. If the percentage is above a predefined threshold, push that database object into a results array.
This works, but I feel like there must be some cool ruby/regex/active record method of doing this more efficiently. I have googled quite a bit but can't find anything.

To comment on the merit of the measure you suggested would require speculation, which is out-of-bounds at SO. I therefore will merely demonstrate how you might implement your proposed approach.
Code
First define a helper method:
class Array
def difference(other)
h = other.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
reject { |e| h[e] > 0 && h[e] -= 1 }
end
end
In short, if
a = [3,1,2,3,4,3,2,2,4]
b = [2,3,4,4,3,4]
then
a - b #=> [1]
whereas
a.difference(b) #=> [1, 3, 2, 2]
This method is elaborated in my answer to this SO question. I've found so many uses for it that I've proposed it be added to the Ruby Core.
The following method produces a hash whose keys are the elements of names (strings) and whose values are the fractions of the letters in the target string that are contained in each string in names.
def target_fractions(names, target)
target_arr = target.downcase.scan(/[a-z]/)
target_size = target_arr.size
names.each_with_object({}) do |s,h|
s_arr = s.downcase.scan(/[a-z]/)
target_remaining = target_arr.difference(s_arr)
h[s] = (target_size-target_remaining.size)/target_size.to_f
end
end
Example
Suppose the target is
target = "Jimmy S. Bond"
and the names you are comparing are given by
names = ["Jill Dandy", "Boomer Asad", "Josefine Simbad"]
then
target_fractions(names, target)
#=> {"Jill Dandy"=>0.5, "Boomer Asad"=>0.5, "Josefine Simbad"=>0.8}
Explanation
For the above values of names and target,
target_arr = target.downcase.scan(/[a-z]/)
#=> ["j", "i", "m", "m", "y", "s", "b", "o", "n", "d"]
target_size = target_arr.size
#=> 10
Now consider
s = "Jill Dandy"
h = {}
then
s_arr = s.downcase.scan(/[a-z]/)
#=> ["j", "i", "l", "l", "d", "a", "n", "d", "y"]
target_remaining = target_arr.difference(s_arr)
#=> ["m", "m", "s", "b", "o"]
h[s] = (target_size-target_remaining.size)/target_size.to_f
#=> (10-5)/10.0 => 0.5
h #=> {"Jill Dandy"=>0.5}
The calculations are similar for Boomer and Josefine.
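The last step of the approach in the question, keeping only names above a threshold, could then be sketched as a filter over the resulting hash. The hash below repeats the example result, and the cutoff of 0.6 is an arbitrary value chosen for illustration.

```ruby
fractions = { "Jill Dandy" => 0.5, "Boomer Asad" => 0.5, "Josefine Simbad" => 0.8 }
threshold = 0.6  # hypothetical cutoff

# Keep only the names whose fraction reaches the threshold.
matches = fractions.select { |_name, fraction| fraction >= threshold }.keys
# matches is ["Josefine Simbad"]
```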

Ruby on Rails - How to know how many times the same Object appears in Array using Active Record?

How to know how many times the same Object appears in Array?
I want to check how many times I found the object, like:
array = ['A','A','A','B','B','C','C','C','D']
So, A appeared three times, B twice, C three too, and only one for D.
I know that if I use "find_all", like:
array.find_all{ |e| array.count(e) > 1 }
I will get this answer
["A", "A", "A", "B", "B", "C", "C", "C"]
but how can I count this? I want something like:
A = 3, B = 2, C = 3, D = 1.

You can use inject on the array to iterate over the array and pass a hash into each iteration to store data. So to retrieve the count of the array you gave you would do this:
array = ["A", "A", "A", "B", "B", "C", "C", "C"]
array.inject(Hash.new(0)) do |hash, array_item|
hash[array_item] += 1
hash # this will be passed into the next iteration as the hash parameter
end
=> {"A"=>3, "B"=>2, "C"=>3}
Passing in Hash.new(0) rather than {} will mean that the default value for each key when you first encounter it will be 0.
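Two shorter spellings of the same count, as a sketch using the array from the question. each_with_object avoids having to return the hash from the block, and Enumerable#tally does the whole job but assumes Ruby 2.7 or later. The summary string at the end is just one way to get the "A = 3, ..." format the question mentions.

```ruby
array = ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'D']

# The hash is passed to every iteration automatically.
counts = array.each_with_object(Hash.new(0)) { |item, hash| hash[item] += 1 }
# counts is {"A"=>3, "B"=>2, "C"=>3, "D"=>1}

# Ruby 2.7+ only: tally builds the same hash in one call.
tallied = array.tally

summary = counts.map { |item, n| "#{item} = #{n}" }.join(", ")
# summary is "A = 3, B = 2, C = 3, D = 1"
```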