Counting matching elements in an array - arrays

Given two arrays of equal size, how can I find the number of matching elements disregarding the position?
For example:
[0,0,5] and [0,5,5] would return a match of 2 since there is one 0 and one 5 in common;
[1,0,0,3] and [0,0,1,4] would return a match of 3 since there are two matches of 0 and one match of 1;
[1,2,2,3] and [1,2,3,4] would return a match of 3.
I tried a number of ideas, but they all tend to get rather gnarly and convoluted. I'm guessing there is some nice Ruby idiom, or perhaps a regex, that would be an elegant solution to this problem.

You can accomplish it with count:
a.count{|e| index = b.index(e) and b.delete_at index }
or with inject:
a.inject(0){|count, e| count + ((index = b.index(e) and b.delete_at index) ? 1 : 0)}
or with select and length (or its alias, size):
a.select{|e| (index = b.index(e) and b.delete_at index)}.size
Note that each of these removes the matched elements from b, so pass in b.dup if you still need the original array afterwards.
Results:
a, b = [0,0,5], [0,5,5] output: => 2;
a, b = [1,2,2,3], [1,2,3,4] output: => 3;
a, b = [1,0,0,3], [0,0,1,4] output => 3.

(arr1 & arr2).map { |i| [arr1.count(i), arr2.count(i)].min }.inject(0, &:+)
Here (arr1 & arr2) returns the list of unique values that both arrays contain, and arr.count(i) counts the occurrences of i in an array.
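For larger arrays, the same min-of-counts idea can be sketched with Enumerable#tally (Ruby 2.7+), which counts each array once instead of rescanning both arrays for every shared value. The method name count_matches is only an illustrative choice, not something from the answers above:

```ruby
# Hypothetical helper: counts matching elements regardless of position.
def count_matches(arr1, arr2)
  counts1 = arr1.tally # e.g. [0, 0, 5] becomes {0=>2, 5=>1}
  counts2 = arr2.tally
  # A value can be matched only as often as it occurs in both arrays.
  counts1.sum { |value, n| [n, counts2.fetch(value, 0)].min }
end

count_matches([0, 0, 5], [0, 5, 5])       #=> 2
count_matches([1, 0, 0, 3], [0, 0, 1, 4]) #=> 3
count_matches([1, 2, 2, 3], [1, 2, 3, 4]) #=> 3
```

Unlike the count/delete_at variants, this sketch leaves both input arrays untouched.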

Another use for the mighty (and much needed) Array#difference, which I defined in my answer here. This method is similar to Array#-. The difference between the two methods is illustrated in the following example:
a = [1,2,3,4,3,2,4,2]
b = [2,3,4,4,4]
a - b #=> [1]
a.difference b #=> [1, 3, 2, 2]
For the present application:
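# Relies on the custom Array#difference mentioned above. The built-in
# Array#difference (Ruby 2.6+) behaves like Array#- and would give wrong counts.
def number_matches(a,b)
  left_in_b = b
  a.reduce(0) do |t,e|
    if left_in_b.include?(e)
      left_in_b = left_in_b.difference [e]
      t+1
    else
      t
    end
  end
end
number_matches [0,0,5], [0,5,5] #=> 2
number_matches [1,0,0,3], [0,0,1,4] #=> 3
number_matches [1,2,2,3], [1,2,3,4] #=> 3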

Using the multiset gem:
(Multiset.new(a) & Multiset.new(b)).size
Multiset is like Set, but allows duplicate values. & is the "set intersection" operator (return all things that are in both sets).

I don't think this is an ideal answer, because it's a bit complex, but...
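def count(arr)
  # Build a hash mapping each element to the number of times it appears
  arr.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
end
def matches(a1, a2)
  m = 0
  a1_counts = count(a1)
  a2_counts = count(a2)
  a1_counts.each do |e, c|
    # An element can only be matched as often as it appears in both arrays
    m += [c, a2_counts[e]].min
  end
  m
end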
Basically, first write a method that creates a hash from an array of the number of times each element appears. Then, use those to sum up the smallest number of times each element appears in both arrays.

Related

which solution is the most performing and why in order to find the number of duplicates in a complex list?

I have the following arrays:
a = [1, 1, 1, 1, 3]
b = [2, 3, 2, 3, 3]
c = [1, 1, 1, 1, 3]
my goal is to calculate the amount of extra repetitions for each column.
Meaning in this case that [1,2,1] appears twice, meaning 1 duplicate, and likewise for [1,3,1]
so in total the amount of duplicates is 2, once for [1,2,1] and once for [1,3,1].
I have developed the following 2 solutions but I don't know to be honest which one is the most performing and why:
Solution 1:
sum = 0
zip = a.zip(b, c)
zip.group_by { |e| e}
.select { |_, value| value.size > 1 }
.each_value { |value| sum += (value.size - 1) }
return sum
Solution 2:
zip = a.zip(b, c)
hash = Hash.new(0)
zip.each { |e| hash.store(e, hash[e]+1) }
hash.each{|e, _| hash[e] -= 1}
return hash.sum {|e, _| hash[e] }
thanks in advance
Illustrating with a benchmark:
require 'benchmark'
v1 = [1, 1, 1, 1]
v2 = [2, 3, 2, 3]
v3 = [1, 1, 1, 1 ]
def sol_1(a,b,c)
sum = 0
zip = a.zip(b, c)
zip.group_by { |e| e}
.select { |_, value| value.size > 1 }
.each_value { |value| sum += (value.size - 1) }
return sum
end
def sol_2(a,b,c)
zip = a.zip(b, c)
hash = Hash.new(0)
zip.each { |e| hash.store(e, hash[e]+1) }
hash.each{|e, _| hash[e] -= 1}
return hash.sum {|e, _| hash[e] }
end
n=1_000
Benchmark.bmbm do |x|
x.report("sol_1"){n.times{sol_1(v1, v2, v3)} }
x.report("sol_2"){n.times{sol_2(v1, v2, v3)} }
end
Results in:
Rehearsal -----------------------------------------
sol_1 0.011076 0.000000 0.011076 ( 0.011091)
sol_2 0.012276 0.000000 0.012276 ( 0.012355)
-------------------------------- total: 0.023352sec
user system total real
sol_1 0.007206 0.000000 0.007206 ( 0.007212)
sol_2 0.011452 0.000000 0.011452 ( 0.011453)
Just by reading them, both solutions are very similar in approach. I am not 100% sure what you mean by "most performing", but I'll guess that you mean the computational complexity of both solutions, that is, the computational cost for large inputs. When there are a lot of columns, the only part of the solution that takes time is iterating over the array of columns; everything else takes very little time in comparison.
In the first solution, you're iterating 3 times: once to group the columns, a second time to select the ones with duplicates, and a third time to count the repetitions (here, in the worst-case scenario, the array you iterate over has at most N/2 elements). So in total you have roughly 2.5 iterations over the array of columns.
In the second solution, you're also iterating 3 times: first over the array of columns to count how many times each appears, then over the result (which in the worst-case scenario has the same number of elements) to subtract one from each count, and finally to sum the counts. This gives roughly 3 iterations.
So the first solution might be slightly more performant. However, when dealing with complexity we look at the type of function and ignore the constant in front of it, and in this case both solutions are linear. Additionally, different methods are optimized in different ways in Ruby. So the only way to determine which one is more performant is to benchmark: repeating those algorithms 100 times for (the same) 10000 columns takes 10.5s for the first solution and 18s for the second solution.
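As a point of comparison, on Ruby 2.7+ the counting step can be sketched in a single pass with Enumerable#tally. This is only an illustration (the name extra_repetitions is made up here), and it has not been benchmarked against the two solutions above:

```ruby
# Hypothetical helper: counts how many columns are repeats of an earlier column.
def extra_repetitions(a, b, c)
  # zip builds the columns; tally maps each distinct column to its count.
  a.zip(b, c).tally.sum { |_column, n| n - 1 }
end

a = [1, 1, 1, 1, 3]
b = [2, 3, 2, 3, 3]
c = [1, 1, 1, 1, 3]
extra_repetitions(a, b, c) #=> 2
```

Each distinct column contributes its count minus one, so a column that appears only once adds nothing to the total.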
Here is a slightly (20%) faster solution to @steenslag's benchmark:
require 'matrix'
def sol_3(matrix)
Matrix.
columns(matrix).
to_a.
each_with_object({}) { |e, a|
digest = e.hash
a[digest] = a[digest].nil? ? 1 : a[digest] + 1
}.sum { |_, v| v > 1 ? 1 : 0 }
end
user system total real
sol_1 0.006908 0.000008 0.006916 ( 0.006917)
sol_2 0.011866 0.000018 0.011884 ( 0.011902)
sol_3 0.005532 0.000008 0.005540 ( 0.005555)
Complete script: https://gist.github.com/jaredbeck/edc708df10fcc0267db80bf1c31c8298

Find difference between two arrays considering duplicates [duplicate]

[1,2,3,3] - [1,2,3] produces the empty array []. Is it possible to retain duplicates so it returns [3]?
I am so glad you asked. I would like to see such a method added to the class Array in some future version of Ruby, as I have found many uses for it:
class Array
def difference(other)
h = other.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
reject { |e| h[e] > 0 && h[e] -= 1 }
end
end
A description of the method and links to some of its applications are given here.
By way of example:
a = [1,2,3,4,3,2,4,2]
b = [2,3,4,4,4]
a - b #=> [1]
a.difference b #=> [1, 3, 2, 2]
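Ruby v2.7 gave us the method Enumerable#tally, allowing us to replace the first line of the method with the following (tally returns a hash with no default value, so the default must be set to 0 for elements that are missing from other):
h = other.tally
h.default = 0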
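For readers who would rather not reopen Array, the same idea can be sketched as a standalone method on Ruby 2.7+. The name multiset_difference is an assumption made for this example, not part of Ruby or of the answer above:

```ruby
# Hypothetical standalone variant: removes one occurrence from a
# for each occurrence in b, keeping the order of a.
def multiset_difference(a, b)
  remaining = b.tally
  remaining.default = 0 # tally's hash has no default, so set one explicitly
  a.reject do |e|
    if remaining[e] > 0
      remaining[e] -= 1
      true # drop this occurrence of e
    else
      false # nothing left to cancel it against, so keep it
    end
  end
end

multiset_difference([1, 2, 3, 4, 3, 2, 4, 2], [2, 3, 4, 4, 4]) #=> [1, 3, 2, 2]
multiset_difference([1, 2, 3, 3], [1, 2, 3])                   #=> [3]
```

Because the counts live in a separate hash, neither input array is modified.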
As far as I know, you can't do this with a built-in operation. Can't see anything in the ruby docs either. Simplest way to do this would be to extend the array class like this:
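class Array
  def difference(array2)
    remaining = array2.dup # work on a copy so the caller's array is not modified
    final_array = []
    each do |item|
      if (index = remaining.index(item))
        remaining.delete_at(index)
      else
        final_array << item
      end
    end
    final_array # return the collected leftovers (each alone would return self)
  end
end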
For all I know, there's a more efficient way to do this as well.
EDIT:
As suggested by user2864740 in question comments, using Array#slice! is a much more elegant solution
def arr_sub(a,b)
a = a.dup #if you want to preserve the original array
b.each {|del| a.slice!(a.index(del)) if a.include?(del) }
return a
end
Credit:
My original answer
def arr_sub(a,b)
b = b.each_with_object(Hash.new(0)){ |v,h| h[v] += 1 }
a = a.each_with_object([]) do |v, arr|
arr << v if b[v] < 1
b[v] -= 1
end
end
arr_sub([1,2,3,3],[1,2,3]) # => [3]
arr_sub([1,2,3,3,4,4,4],[1,2,3,4,4]) # => [3, 4]
arr_sub([4,4,4,5,5,5,5],[4,4,5,5,5,5,6,6]) # => [4]

Using memoization for storing values in ruby array

For a short array the following function works well. It's supposed to return the first pair in the array whose sum is equal to a given integer. However, if the array has upwards of 10 million elements, the request times out, because (I think) it is storing a huge number of values in the variable I create in the first line. I know I have to use memoization (||=) but have no idea how to use it.
array1 = [1,2,3,4,5,6,7]
number = 3
array2 = [1,2,3.....n] # millions of elements
combos = array1.combination(2).to_a
(combos.select { |x,y| x + y == number }).sort.first
I need to gather all possible pairs in order to sort them, so I'm using select to go through the entire list rather than stopping at the first pair that returns true.
This is one of the possible solutions.
def sum_pairs(ints, s)
seen = {}
for i in ints do
return [s-i, i] if seen[s-i]
seen[i] = true
end
nil
end
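def find_smallest(arr, nbr)
  first, *rest = arr.sort
  until rest.empty?
    # Find-any mode: the block must return a number (0 when n is the target),
    # because a plain == test is not monotonic and can miss existing elements
    matching = rest.bsearch { |n| (nbr - first) <=> n }
    return [first, matching] unless matching.nil?
    first, *rest = rest
  end
  nil
end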
arr = [12, 7, 4, 5, 14, 9]
find_smallest(arr, 19) #=> [5, 14]
find_smallest(arr, 20) #=> nil
I've used the method Array#bsearch (rather than Enumerable#find) to speed up the search for an element equal to nbr - first (O(log rest.size) vs. O(rest.size)).
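To make the bsearch step concrete: when looking for an exact value, Array#bsearch is used in its find-any mode, where the block returns a number (positive if the target lies to the right, zero on a hit, negative if it lies to the left). The sketch below is one way to write that, with find_pair as a made-up name; it assumes the goal is the pair with the smallest first element:

```ruby
# Hypothetical helper: returns [x, y] with the smallest x such that
# x + y == target, or nil if no such pair exists.
def find_pair(arr, target)
  sorted = arr.sort
  sorted.each_with_index do |first, i|
    rest = sorted.drop(i + 1) # only look at elements after first
    # Find-any mode: <=> yields 1, 0 or -1 depending on where the target lies.
    match = rest.bsearch { |n| (target - first) <=> n }
    return [first, match] unless match.nil?
  end
  nil
end

arr = [12, 7, 4, 5, 14, 9]
find_pair(arr, 19) #=> [5, 14]
find_pair(arr, 20) #=> nil
```

Copying the tail with drop keeps the sketch short but costs O(n) per step, so on very large arrays the overall work is still quadratic in the worst case.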

Ruby - Efficient method of checking if sum of two numbers in array equal a value

Here's my problem: I have a list of 28,123 numbers to iterate through and an array of 6,965 other numbers. For each of the 28,123 numbers, I need to check whether it equals the sum of two numbers from the array (the two can be the same number). I want to put the matches in a new array or mark them as true / false. Any solutions I've come up with so far are extremely inefficient.
So a dumbed-down version of what I want is if I have the following: array = [1, 2, 5] and the numbers 1 to 5 would return result = [2, 3, 4] or the array of result = [false, true, true, true, false]
I read this SE question: Check if the sum of two different numbers in an array equal a variable number? but I need something more efficient in my case it seems, or maybe a different approach to the problem. It also doesn't seem to work for two of the same number being added together.
Any help is much appreciated!
non_abundant(n) is a function that returns the first n non_abundant numbers. It executes almost instantaneously.
My Code:
def contains_pair?(array, n)
!!array.combination(2).detect { |a, b| a + b == n }
end
result = []
array = non_abundant(6965)
(1..28123).each do |n|
if array.index(n) == nil
index = array.length - 1
else
index = array.index(n)
end
puts n
if contains_pair?( array.take(index), n)
result << n
end
end
numbers = [1, 2, 5]
results = (1..10).to_a
numbers_set = numbers.each_with_object({}){ |i, h| h[i] = true }
results.select do |item|
numbers.detect do |num|
numbers_set[item - num]
end
end
#=> [2, 3, 4, 6, 7, 10]
You can add some optimizations by sorting your numbers and checking if num is bigger than item/2.
The complexity is O(n*m) where n and m are the lengths of the two lists.
Another optimization: if the numbers list is much shorter than the results list (n << m), you can achieve O(n*n) complexity by calculating all possible sums in the numbers list first.
The most inefficient part of your algorithm is the fact that you are re-calculating many possible sums of combinations, 28123 times. You only need to do this once.
Here is a very simple improvement to your code:
array = non_abundant(6965)
combination_sums = array.combination(2).map {|comb| comb.inject(:+)}.uniq
result = (1..28123).select do |n|
combination_sums.include? n
end
The rest of your algorithm seems to be an attempt to compensate for that original performance mistake of re-calculating the sums - which is no longer needed.
There are further optimisations you could potentially make, such as using a binary search. But I'm guessing this improvement will already be sufficient for your needs.
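As a rough sketch of the "compute the sums once" idea, the version below stores the sums in a Set so that each lookup is a hash lookup rather than a scan of an array. It uses Array#repeated_combination so that a number may be paired with itself, as the question asks. The method name pair_sums and the limit parameter are assumptions made for this example:

```ruby
require 'set'

# Hypothetical helper: all sums x + y (x and y may be the same element)
# that do not exceed limit, collected into a Set for fast lookup.
def pair_sums(numbers, limit)
  numbers.repeated_combination(2).each_with_object(Set.new) do |(x, y), sums|
    total = x + y
    sums << total if total <= limit
  end
end

numbers = [1, 2, 5]
sums = pair_sums(numbers, 5)
(1..5).select { |n| sums.include?(n) } #=> [2, 3, 4]
(1..5).map { |n| sums.include?(n) }    #=> [false, true, true, true, false]
```

Discarding sums above the limit keeps the Set small, since only values up to 28123 are ever looked up in the original problem.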

Difference Between Arrays Preserving Duplicate Elements in Ruby

I'm quite new to Ruby, and was hoping to get the difference between two arrays.
I am aware of the usual method:
a = [...]
b = [...]
difference = (a-b)+(b-a)
But the problem is that this computes the set difference, because in Ruby the expression (a-b) gives the set complement of b relative to a.
This means [1,2,2,3,4,5,5,5,5] - [5] = [1,2,2,3,4], because it takes out all of occurrences of 5 in the first set, not just one, behaving like a filter on the data.
I want it to remove differences only once, so for example, the difference of [1,2,2,3,4,5,5,5,5], and [5] should be [1,2,2,3,4,5,5,5], removing just one 5.
I could do this iteratively:
a = [...]
b = [...]
complimentAbyB = a.dup
complimentBbyA = b.dup
b.each do |bValue|
complimentAbyB.delete_at(complimentAbyB.index(bValue) || complimentAbyB.length)
end
a.each do |aValue|
complimentBbyA.delete_at(complimentBbyA.index(aValue) || complimentBbyA.length)
end
difference = complimentAbyB + complimentBbyA
But this seems awfully verbose and inefficient. I have to imagine there is a more elegant solution than this. So my question is basically, what is the most elegant way of finding the difference of two arrays, where if one array has more occurrences of a single element than the other, they will not all be removed?
I recently proposed that such a method, Array#difference, be added to Ruby's core. For your example, it would be written:
a = [1,2,2,3,4,5,5,5,5]
b = [5]
a.difference b
#=> [1,2,2,3,4,5,5,5]
The example I've often given is:
a = [3,1,2,3,4,3,2,2,4]
b = [2,3,4,4,3,4]
a.difference b
#=> [1, 3, 2, 2]
I first suggested this method in my answer here. There you will find an explanation and links to other SO questions where I proposed use of the method.
As shown at the links, the method could be written as follows:
class Array
def difference(other)
h = other.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
reject { |e| h[e] > 0 && h[e] -= 1 }
end
end
.....
ha = a.group_by(&:itself).map{|k, v| [k, v.length]}.to_h
hb = b.group_by(&:itself).map{|k, v| [k, v.length]}.to_h
ha.merge(hb){|_, va, vb| (va - vb).abs}.inject([]){|a, (k, v)| a + [k] * v}
ha and hb are hashes with the element in the original array as the key and the number of occurrences as the value. The following merge puts them together and creates a hash whose value is the difference of the number of occurrences in the two arrays. inject converts that to an array that has each element repeated by the number given in the hash.
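On Ruby 2.7+ the same counting approach can be sketched with Enumerable#tally in place of group_by. The method name symmetric_difference below is an assumption for illustration only:

```ruby
# Hypothetical helper: for each value, keeps as many copies as the
# absolute difference between its counts in a and b.
def symmetric_difference(a, b)
  count_a = a.tally
  count_b = b.tally
  # Visit every value seen in either array, in first-seen order.
  (count_a.keys | count_b.keys).flat_map do |value|
    [value] * (count_a.fetch(value, 0) - count_b.fetch(value, 0)).abs
  end
end

symmetric_difference([1, 2, 2, 3, 4, 5, 5, 5, 5], [5]) #=> [1, 2, 2, 3, 4, 5, 5, 5]
symmetric_difference([1, 2], [2, 3])                   #=> [1, 3]
```

Using fetch with a default of 0 covers values that occur in only one of the two arrays.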
Another way:
ha = a.group_by(&:itself)
hb = b.group_by(&:itself)
ha.merge(hb){|k, va, vb| [k] * (va.length - vb.length).abs}.values.flatten

Resources