Find difference between two arrays considering duplicates [duplicate] - arrays

[1,2,3,3] - [1,2,3] produces the empty array []. Is it possible to retain duplicates so it returns [3]?

I am so glad you asked. I would like to see such a method added to the class Array in some future version of Ruby, as I have found many uses for it:
class Array
def difference(other)
h = other.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
reject { |e| h[e] > 0 && h[e] -= 1 }
end
end
A description of the method and links to some of its applications are given here.
By way of example:
a = [1,2,3,4,3,2,4,2]
b = [2,3,4,4,4]
a - b #=> [1]
a.difference b #=> [1, 3, 2, 2]
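Ruby v2.7 gave us the method Enumerable#tally, allowing us to replace the first line of the method with
h = other.tally.tap { |t| t.default = 0 }
(The hash returned by tally has no default value, so the default of 0 is set explicitly; otherwise h[e] would be nil for elements absent from other.)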
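For readers on Ruby 2.6 or later, note that core Ruby now ships its own Array#difference, which behaves like Array#- (it removes every occurrence), so reopening Array to define difference replaces that built-in. The sketch below avoids the clash by using a stand-alone helper; the name multiset_difference is an assumption for illustration and is not a core method.

```ruby
# Hypothetical helper: remove one occurrence from a for each occurrence in b.
def multiset_difference(a, b)
  remaining = b.tally    # Enumerable#tally needs Ruby 2.7+
  remaining.default = 0  # absent keys would otherwise give nil
  a.reject do |e|
    next false if remaining[e].zero? # nothing left to cancel: keep e
    remaining[e] -= 1                # cancel one occurrence of e
    true
  end
end

a = [1, 2, 3, 4, 3, 2, 4, 2]
b = [2, 3, 4, 4, 4]
multiset_difference(a, b)                    #=> [1, 3, 2, 2]
multiset_difference([1, 2, 3, 3], [1, 2, 3]) #=> [3]
```

Neither argument is mutated, and the order of the surviving elements of a is preserved.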

As far as I know, you can't do this with a built-in operation, and I can't see anything in the Ruby docs either. The simplest way to do this would be to extend the Array class like this:
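class Array
  def difference(array2)
    remaining = array2.dup # work on a copy so the caller's array is not modified
    final_array = []
    each do |item|
      if (index = remaining.index(item))
        remaining.delete_at(index) # cancel one occurrence
      else
        final_array << item
      end
    end
    final_array # return the result explicitly; each on its own returns self
  end
end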
For all I know, there's a more efficient way to do this as well.

EDIT:
As suggested by user2864740 in question comments, using Array#slice! is a much more elegant solution
def arr_sub(a,b)
a = a.dup #if you want to preserve the original array
b.each {|del| a.slice!(a.index(del)) if a.include?(del) }
return a
end
Credit:
My original answer
def arr_sub(a,b)
b = b.each_with_object(Hash.new(0)){ |v,h| h[v] += 1 }
a = a.each_with_object([]) do |v, arr|
arr << v if b[v] < 1
b[v] -= 1
end
end
arr_sub([1,2,3,3],[1,2,3]) # => [3]
arr_sub([1,2,3,3,4,4,4],[1,2,3,4,4]) # => [3, 4]
arr_sub([4,4,4,5,5,5,5],[4,4,5,5,5,5,6,6]) # => [4]

Related

How can I refactor this Ruby method to run faster?

The method below is supposed to take an array a and return the duplicated integer whose second index value is the lowest. The array will only include integers between 1 and a.length. With this example,
firstDuplicate([1,2,3,2,4,5,1])
the method returns 2.
def firstDuplicate(a)
num = 1
big_num_array = []
a.length.times do
num_array = []
if a.include?(num)
num_array.push(a.index(num))
a[a.index(num)] = "x"
if a.include?(num)
num_array.unshift(a.index(num))
num_array.push(num)
end
big_num_array.push(num_array) if num_array.length == 3
end
num += 1
end
if big_num_array.length > 0
big_num_array.sort![0][2]
else
-1
end
end
The code works, but seems longer than necessary and doesn't run fast enough. I am looking for ways to refactor this.
You could count the entries as you go and use Enumerable#find to stop iterating as soon as you find something again:
h = { }
a.find do |e|
h[e] = h[e].to_i + 1 # The `to_i` converts `nil` to zero without a bunch of noise.
h[e] == 2
end
You could also say:
h = Hash.new(0) # to auto-vivify with zeros
a.find do |e|
h[e] += 1
h[e] == 2
end
or use Hash#fetch with a default value:
h = { }
a.find do |e|
h[e] = h.fetch(e, 0) + 1
h[e] == 2
end
find will stop as soon as it finds an element that makes that block true so this should be reasonably efficient.
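Wrapped in a method, the Hash.new(0) variant might look like the sketch below. The name first_duplicate and the -1 fallback (borrowed from the question's own code, which returns -1 when nothing repeats) are assumptions layered on top of the snippets above.

```ruby
# Hypothetical wrapper around the find-based approach.
def first_duplicate(a)
  seen = Hash.new(0)
  # find stops at the first element whose count reaches 2;
  # it returns nil when no element repeats, so fall back to -1.
  a.find { |e| (seen[e] += 1) == 2 } || -1
end

first_duplicate([1, 2, 3, 2, 4, 5, 1]) #=> 2
first_duplicate([1, 2, 3])             #=> -1
```

The || -1 fallback is only safe because the question guarantees integer elements; an array that may contain nil or false would need an explicit check.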
Here are two ways that could be done quite simply.
Use a set
require 'set'
def first_dup(arr)
st = Set.new
arr.find { |e| st.add?(e).nil? }
end
first_dup [1,2,3,2,4,5,4,1,4]
#=> 2
first_dup [1,2,3,4,5]
#=> nil
See Set#add?.
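As a quick illustration of why this works: Set#add? returns the set itself when the element is new and nil when the element is already present, so the block passed to find is true only for a repeated element. A minimal sketch:

```ruby
require 'set'

seen = Set.new
first_try  = seen.add?(1) # element is new: returns the set itself (truthy)
second_try = seen.add?(1) # element already present: returns nil
```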
Use Array#difference
def first_dup(arr)
arr.difference(arr.uniq).first
end
first_dup [1,2,3,2,4,5,4,1,4]
#=> 2
first_dup [1,2,3,4,5]
#=> nil
I have found Array#difference to be sufficiently useful that I proposed it be added to the Ruby core (but it doesn't seem to be gaining traction). It is as follows:
class Array
def difference(other)
h = other.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
reject { |e| h[e] > 0 && h[e] -= 1 }
end
end
As explained at the link, it differs from Array#- as follows:
a = [1,2,2,3,3,2,2]
b = [2,2,3]
a - b
#=> [1]
a.difference(b)
#=> [1,3,2,2]
That is, difference "removes" one 2 in a for each 2 in b (similarly for 3), preserving the order of what's left of a. a is not mutated, however.
The steps in the example given above for the present problem are as follows.
arr = [1,2,3,2,4,5,4,1,4]
a = arr.uniq
#=> [1,2,3,4,5]
b = arr.difference(a)
#=> [2, 4, 1, 4]
b.first
#=> 2
If you are looking for top performance, Ruby is probably not the best language of choice. If you are looking for readability, here you go:
[1,2,3,2,4,5,1].
map. # or each (less readable, probably faster)
with_index.
group_by(&:shift). # or group_by(&:first)
min_by { |v, a| a[1] && a[1].last || Float::INFINITY }.
first
#⇒ 2

Return unique values of an array without using `uniq`

For a challenge, I'm trying to return the unique values of an array without using uniq. This is what I have so far, which doesn't work:
def unique
unique_arr = []
input_arr.each do |word|
if word != unique_arr.last
unique_arr.push word
end
end
puts unique_arr
end
input = gets.chomp
input_arr = input.split.sort
input_arr.unique
My reasoning here was that if I sorted the array first before I iterated through it with each, I could push it to unique_arr without repetition being a possibility considering if it's a duplicate, the last value pushed would match it.
Am I tackling this the wrong way?
Yes, you are making at least two mistakes.
If you want to call it as input_arr.unique with input_arr being an array, then you have to define the method on Array. You have input_arr within your method body, which comes from nowhere.
puts in the last line of your code outputs to the terminal, but makes the method return nil, which makes it behave differently from uniq.
It can be fixed as:
class Array
def unique
unique_arr = []
each do |word|
unique_arr.push(word) unless unique_arr.last == word
end
unique_arr
end
end
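One caveat worth keeping in mind: comparing only against the last pushed element removes adjacent repeats, so the result matches uniq only when the array has been sorted first, as the question does. A small sketch of the difference, where unique_adjacent is a hypothetical stand-alone version of the same logic:

```ruby
# Hypothetical stand-alone version of the last-element comparison.
def unique_adjacent(arr)
  arr.each_with_object([]) do |word, out|
    out << word unless out.last == word
  end
end

words = %w[b a b]
unique_adjacent(words)      #=> ["b", "a", "b"]
unique_adjacent(words.sort) #=> ["a", "b"]
```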
A unique array? That sounds like a Set to me:
require 'set'
Set.new([1,2,3,2,3,4]).to_a
#=> [1,2,3,4]
Here's a concise way to do it that doesn't explicitly use functionality from another class but probably otherwise misses the point of the challenge:
class Array
def unique
group_by(&:itself).keys
end
end
I tried these three options, just for the challenge:
class Array
def unique
self.each_with_object({}) { |k, h| h[k] = k }.keys
end
def unique2
self.each_with_object([]) { |k, a| a << k unless a.include?(k) }
end
def unique3
arr = []
self.map { |k| arr << k unless arr.include?(k) }
arr
end
end
Here is one more way to do this:
uniques = a.each.with_object([]) {|el, arr| arr << el if not arr.include?(el)}
That's so easy if you see it this way:
a = [1,1,2,3,4]
h = Hash.new
a.each{|q| h[q] = q}
h.values
and this will return:
[1, 2, 3, 4]

Difference Between Arrays Preserving Duplicate Elements in Ruby

I'm quite new to Ruby, and was hoping to get the difference between two arrays.
I am aware of the usual method:
a = [...]
b = [...]
difference = (a-b)+(b-a)
But the problem is that this computes the set difference, because in Ruby the expression (a-b) gives the relative complement of b in a.
This means [1,2,2,3,4,5,5,5,5] - [5] = [1,2,2,3,4], because it takes out all occurrences of 5 in the first array, not just one, behaving like a filter on the data.
I want it to remove each matching element only once, so for example, the difference of [1,2,2,3,4,5,5,5,5] and [5] should be [1,2,2,3,4,5,5,5], removing just one 5.
I could do this iteratively:
a = [...]
b = [...]
complimentAbyB = a.dup
complimentBbyA = b.dup
b.each do |bValue|
complimentAbyB.delete_at(complimentAbyB.index(bValue) || complimentAbyB.length)
end
a.each do |aValue|
complimentBbyA.delete_at(complimentBbyA.index(aValue) || complimentBbyA.length)
end
difference = complimentAbyB + complimentBbyA
But this seems awfully verbose and inefficient. I have to imagine there is a more elegant solution than this. So my question is basically: what is the most elegant way of finding the difference of two arrays, where if one array has more occurrences of a single element than the other, they will not all be removed?
I recently proposed that such a method, Array#difference, be added to Ruby's core. For your example, it would be written:
a = [1,2,2,3,4,5,5,5,5]
b = [5]
a.difference b
#=> [1,2,2,3,4,5,5,5]
The example I've often given is:
a = [3,1,2,3,4,3,2,2,4]
b = [2,3,4,4,3,4]
a.difference b
#=> [1, 3, 2, 2]
I first suggested this method in my answer here. There you will find an explanation and links to other SO questions where I proposed use of the method.
As shown at the links, the method could be written as follows:
class Array
def difference(other)
h = other.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
reject { |e| h[e] > 0 && h[e] -= 1 }
end
end
.....
ha = a.group_by(&:itself).map{|k, v| [k, v.length]}.to_h
hb = b.group_by(&:itself).map{|k, v| [k, v.length]}.to_h
ha.merge(hb){|_, va, vb| (va - vb).abs}.inject([]){|a, (k, v)| a + [k] * v}
ha and hb are hashes with the element in the original array as the key and the number of occurrences as the value. The following merge puts them together and creates a hash whose value is the difference of the number of occurrences in the two arrays. inject converts that to an array that has each element repeated by the number given in the hash.
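To make the merge step concrete, here is a minimal sketch of the same idea wrapped in a method and run on the question's example. The method name symmetric_multiset_difference and the counting lambda are assumptions for illustration; the counting is done with each_with_object rather than group_by, and flat_map stands in for the inject step.

```ruby
# Hypothetical helper: elements that occur more often in one array than the other,
# repeated by the size of the gap.
def symmetric_multiset_difference(a, b)
  count = ->(arr) { arr.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 } }
  ha = count.call(a) # element => number of occurrences in a
  hb = count.call(b) # element => number of occurrences in b
  # For keys present in both hashes, keep the absolute difference of the counts;
  # keys present in only one hash keep their own count.
  ha.merge(hb) { |_, va, vb| (va - vb).abs }.flat_map { |k, v| [k] * v }
end

symmetric_multiset_difference([1, 2, 2, 3, 4, 5, 5, 5, 5], [5])
#=> [1, 2, 2, 3, 4, 5, 5, 5]
symmetric_multiset_difference([1, 2, 2], [2, 3])
#=> [1, 2, 3]
```

Note that the result is grouped by element, so it does not necessarily preserve the original ordering of either array.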
Another way:
ha = a.group_by(&:itself)
hb = b.group_by(&:itself)
ha.merge(hb){|k, va, vb| [k] * (va.length - vb.length).abs}.values.flatten

Ruby: Find index of next match in array, or find with offset

I want to find further matches after Array#find_index { |item| block } matches for the first time. How can I search for the index of the second match, third match, and so on?
In other words, I want the equivalent of the pos argument to Regexp#match(str, pos) for Array#find_index. Then I can maintain a current-position index to continue the search.
I cannot use Enumerable#find_all because I might modify the array between calls (in which case, I will also adjust my current-position index to reflect the modifications). I do not want to copy part of the array, as that would increase the computational complexity of my algorithm. I want to do this without copying the array:
new_pos = pos + array[pos..-1].find_index do |elem|
elem.matches_condition?
end
The following are different questions. They only ask the first match in the array, plus one:
https://stackoverflow.com/questions/11300886/ruby-how-to-find-the-next-match-in-an-array
https://stackoverflow.com/questions/4596517/ruby-find-next-in-array
The following question is closer, but still does not help me, because I need to process the first match before continuing to the next (and this way also conflicts with modification):
https://stackoverflow.com/questions/9925654/ruby-find-in-array-with-offset
A simpler way to do it is just:
new_pos = pos
while new_pos < array.size and not array[new_pos].matches_condition?
new_pos += 1
end
new_pos = nil if new_pos == array.size
In fact, I think this is probably better than my other answer, because it's harder to get wrong, and there's no chance of future shadowing problems being introduced from the surrounding code. However, it's still clumsy.
And if the condition is more complex, then you end up needing to do something like this:
new_pos = pos
# this check is only necessary if pos may be == array.size
if new_pos < array.size
prepare_for_condition
end
while new_pos < array.size and not array[new_pos].matches_condition?
new_pos += 1
if new_pos < array.size
prepare_for_condition
end
end
new_pos = nil if new_pos == array.size
Or, God forbid, a begin ... end while loop (although then you run into trouble with the initial value of new_pos):
new_pos = pos - 1
begin
new_pos += 1
if new_pos < array.size
prepare_for_condition
end
end while new_pos < array.size and not array[new_pos].matches_condition?
new_pos = nil if new_pos == array.size
This may seem horrible. However, suppose prepare_for_condition is something that keeps being tweaked in small ways. Those tweaks will eventually get refactored; however, by that time, the output of the refactored code will also end up getting tweaked in small ways that don't belong with the old refactored code, but do not yet seem to justify refactoring of their own, and so on. Occasionally, someone will forget to change both places. This may seem pathological; however, in programming, as we all know, the pathological case has a habit of occurring only too often.
Here is one way this can be done. We can define a new method in the Array class that will allow us to find the indexes that match a given condition. The condition can be specified as a block that returns a boolean.
The new method returns an Enumerator, so we get the benefit of many of the Enumerator methods such as next, to_a, etc.
ary = [1,2,3,4,5,6]
class Array
def find_index_r(&block)
Enumerator.new do |yielder|
self.each_with_index{|i, j| yielder.yield j if block.call(i)}
end
end
end
e = ary.find_index_r { |r| r % 2 == 0 }
p e.to_a #=> [1, 3, 5]
p e.next
#=> 1
p e.next
#=> 3
ary[2]=10
p ary
#=> [1, 2, 10, 4, 5, 6]
p e.next
#=> 5
e.rewind
p e.next
#=> 1
p e.next
#=> 2
Note: I added a new method to the Array class for demonstration purposes. The solution can easily be adapted to work without monkey-patching.
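One possible adaptation without monkey-patching is sketched below. The helper name matching_indices is hypothetical; it takes the array as an argument instead of reopening Array, and is otherwise the same Enumerator-based idea.

```ruby
# Hypothetical stand-alone helper: lazily yields the indexes of matching elements.
def matching_indices(array, &block)
  Enumerator.new do |yielder|
    array.each_with_index do |elem, idx|
      yielder << idx if block.call(elem)
    end
  end
end

ary = [1, 2, 3, 4, 5, 6]
e = matching_indices(ary) { |r| r.even? }
e.to_a #=> [1, 3, 5]
e.next #=> 1
e.next #=> 3
```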
Of course, one way to do it would be:
new_pos = pos + (pos...array.size).find_index do |index|
elem = array[index]
elem.matches_condition?
end
However, this is clumsy and easy to get wrong. For example, you may forget to add pos. Also, you have to make sure elem isn't shadowing something. Both of these can lead to hard-to-trace bugs.
I find it hard to believe that an index argument to Array#find_index and Array#index still hasn't made it into the language. However, I notice Regexp#match(str,pos) wasn't there until version 1.9, which is equally surprising.
Suppose
arr = [9,1,4,1,9,36,25]
findees = [1,6,3,6,3,7]
proc = ->(n) { n**2 }
and for each element n in findees we want the index of the first unmatched element m of arr for which proc[n] == m. For example, if n=3, then proc[3] #==> 9, so the first matching index in arr would be 0. For the next n=3 in findees, the first unmatched match in arr is at index 4.
We can do this like so:
arr = [9,1,4,1,9,36,25]
findees = [1,6,3,6,3,7]
proc = ->(n) { n**2 }
h = arr.each_with_index.with_object(Hash.new { |h,k| h[k] = [] }) { |(n,i),h| h[n] << i }
#=> {9=>[0, 4], 1=>[1, 3], 4=>[2], 36=>[5], 25=>[6]}
findees.each_with_object([]) { |n,a| v=h[proc[n]]; a << v.shift if v }
#=> [1, 5, 0, nil, 4, nil]
We can generalize this into a handy Array method as follows:
class Array
def find_indices(*args)
h = each_with_index.with_object(Hash.new {|h,k| h[k] = []}) { |(n,i),h| h[n] << i }
args.each_with_object([]) { |n,a| v=h[yield n]; a << v.shift if v }
end
end
arr.find_indices(*findees) { |n| n**2 }
#=> [1, 5, 0, nil, 4, nil]
arr = [3,1,2,1,3,6,5]
findees = [1,6,3,6,3,7]
arr.find_indices(*findees, &:itself)
#=> [1, 5, 0, nil, 4, nil]
My approach is not much different from the others but perhaps packaged cleaner to be syntactically similar to Array#find_index . Here's the compact form.
def find_next_index(a,prior=nil)
(((prior||-1)+1)...a.length).find{|i| yield a[i]}
end
Here's a simple test case.
test_arr = %w(aa ab ac ad)
puts find_next_index(test_arr){|v| v.include?('a')}
puts find_next_index(test_arr,1){|v| v.include?('a')}
puts find_next_index(test_arr,3){|v| v.include?('a')}
# evaluates to:
# 0
# 2
# nil
And of course, with a slight rewrite you could monkey-patch it into the Array class.
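A minimal sketch of what that rewrite might look like follows. The method name and the convention that prior is the index of the previous match (nil to start from the beginning) are carried over from the function above; none of this is part of Ruby core.

```ruby
class Array
  # Hypothetical method: index of the first match after position prior.
  def find_next_index(prior = nil)
    start = prior.nil? ? 0 : prior + 1
    (start...length).find { |i| yield self[i] }
  end
end

test_arr = %w(aa ab ac ad)
pos = nil
found = []
# Keep asking for the next match until none is left.
while (pos = test_arr.find_next_index(pos) { |v| v.include?('a') })
  found << pos
end
found #=> [0, 1, 2, 3]
```

Because each call restarts from an explicit index, the array can be modified between calls as long as the caller adjusts pos accordingly.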

Counting matching elements in an array

Given two arrays of equal size, how can I find the number of matching elements disregarding the position?
For example:
[0,0,5] and [0,5,5] would return a match of 2 since there is one 0 and one 5 in common;
[1,0,0,3] and [0,0,1,4] would return a match of 3 since there are two matches of 0 and one match of 1;
[1,2,2,3] and [1,2,3,4] would return a match of 3.
I tried a number of ideas, but they all tend to get rather gnarly and convoluted. I'm guessing there is some nice Ruby idiom, or perhaps a regex, that would be an elegant solution to this problem.
You can accomplish it with count:
a.count{|e| index = b.index(e) and b.delete_at index }
Demonstration
or with inject:
a.inject(0){|count, e| count + ((index = b.index(e) and b.delete_at index) ? 1 : 0)}
Demonstration
or with select and length (or its alias, size):
a.select{|e| (index = b.index(e) and b.delete_at index)}.size
Demonstration
Results:
a, b = [0,0,5], [0,5,5] output: => 2;
a, b = [1,2,2,3], [1,2,3,4] output: => 3;
a, b = [1,0,0,3], [0,0,1,4] output: => 3.
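All three variants call delete_at on b, so b is modified as a side effect. If that matters, a sketch along the following lines works on a copy instead; the method name count_common is an assumption for illustration.

```ruby
# Hypothetical non-destructive variant of the count approach.
def count_common(a, b)
  pool = b.dup # copy, so the caller's b is left untouched
  a.count do |e|
    index = pool.index(e)
    pool.delete_at(index) if index # use up one matching element
    !index.nil?
  end
end

b = [0, 5, 5]
count_common([0, 0, 5], b) #=> 2
b                          #=> [0, 5, 5]
```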
(arr1 & arr2).map { |i| [arr1.count(i), arr2.count(i)].min }.inject(0, &:+)
Here (arr1 & arr2) returns the list of unique values that both arrays contain, and arr.count(i) counts the occurrences of i in the array.
Another use for the mighty (and much needed) Array#difference, which I defined in my answer here. This method is similar to Array#-. The difference between the two methods is illustrated in the following example:
a = [1,2,3,4,3,2,4,2]
b = [2,3,4,4,4]
a - b #=> [1]
a.difference b #=> [1, 3, 2, 2]
For the present application:
def number_matches(a,b)
left_in_b = b
a.reduce(0) do |t,e|
if left_in_b.include?(e)
left_in_b = left_in_b.difference [e]
t+1
else
t
end
end
end
number_matches [0,0,5], [0,5,5] #=> 2
number_matches [1,0,0,3], [0,0,1,4] #=> 3
number_matches [1,2,2,3], [1,2,3,4] #=> 3
Using the multiset gem:
(Multiset.new(a) & Multiset.new(b)).size
Multiset is like Set, but allows duplicate values. & is the "set intersection" operator (return all things that are in both sets).
I don't think this is an ideal answer, because it's a bit complex, but...
def count(arr)
arr.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
end
def matches(a1, a2)
m = 0
a1_counts = count(a1)
a2_counts = count(a2)
a1_counts.each do |e, c|
m += [c, a2_counts[e]].min # smaller of the two counts for element e (0 if absent from a2)
end
m
end
Basically, first write a method that creates a hash from an array of the number of times each element appears. Then, use those to sum up the smallest number of times each element appears in both arrays.
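On Ruby 2.7 or later, the same count-then-take-the-minimum idea can be sketched more compactly with Enumerable#tally. The method name count_matches is hypothetical.

```ruby
# Hypothetical helper: sum, over the elements of a, of the smaller occurrence count.
def count_matches(a, b)
  ta = a.tally # element => occurrences in a (Ruby 2.7+)
  tb = b.tally # element => occurrences in b
  ta.sum { |e, c| [c, tb.fetch(e, 0)].min }
end

count_matches([0, 0, 5], [0, 5, 5])       #=> 2
count_matches([1, 0, 0, 3], [0, 0, 1, 4]) #=> 3
count_matches([1, 2, 2, 3], [1, 2, 3, 4]) #=> 3
```

fetch(e, 0) is used because the hash returned by tally has no default value for missing keys.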
