Finding the common value in N number of arrays

I have n number of arrays and I want to work out if there is a common value in these arrays. If I knew the number of arrays I could do something like:
a = [1,2,3]
b = [2,4,5]
c = [2,6,7]
x = a & b & c
x == [2]
However, this isn't possible if you don't know the number of arrays. So far I've come up with this:
array_of_integers = [[1,2,3],[2,4,5]....]
values = []
array_of_integers.each_with_index do |array, index|
values = if index.zero?
array
else
values & array
end
end
# `values` will be an array of common values
However, this doesn't seem very efficient. Is there a better way?

However, this isn't possible if you don't know the number of arrays.
Actually, Enumerable#reduce can help with it:
[[1,2,3], [2,4,5], [2,6,7]].reduce(&:&) # => [2]
&:& looks interesting, but it's just:
[[1,2,3], [2,4,5], [2,6,7]].reduce { |memo, el| memo & el } # => [2]
Or it's also possible to do it as @Jagdeep suggested:
[[1,2,3], [2,4,5], [2,6,7]].reduce(:&) # => [2]
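One edge case worth keeping in mind: reduce without an initial value returns nil when the outer array is empty. A minimal sketch of a wrapper that makes that case explicit follows; the helper name common_values is made up here for illustration and is not part of the answers above.

```ruby
# Hypothetical helper: intersect any number of arrays.
def common_values(arrays)
  # reduce(:&) returns nil when `arrays` is empty, so fall back to [].
  arrays.reduce(:&) || []
end

common_values([[1, 2, 3], [2, 4, 5], [2, 6, 7]]) # => [2]
common_values([[1, 2, 3]])                       # => [1, 2, 3]
common_values([])                                # => []
```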

Related

Concatenating a variable number of arrays in Julia

Suppose there is a variable number of 2D arrays which I want to concatenate into a 3D array:
n = 10 # Number of arrays, can be changed to other integers
arrays = Dict()
for i in 1:n
arrays[i] = rand(2,2)
end
The syntax for concatenating arrays, as far as I know, is:
cat(arr1, arr2, arr3, ..., dims=3)
Since the number of arguments is variable, I can only think of the solution:
array_3d = arrays[1] # identifiers cannot start with a digit
for i in 2:n
array_3d = cat(array_3d, arrays[i], dims=3)
end
But how do I concatenate it in the direction dims=3 with one line only, without for loops, etc.?
Given the initial code:
n = 10 #random positive integer
arrays = Dict()
for i in 1:n
arrays[i] = rand(2,2)
end
Here are some options.
Using cat with splatting:
res1 = cat(values(arrays)...,dims=3) #values(dict) gives an iterable of all values stored
Using reduce with cat:
res2 = reduce((x,y)->cat(x,y,dims=3),values(arrays)) #using anonymous function to pass kwargs
I am going to assume that you also want the following equality to hold:
arrays[i] == res[:,:,i] # for i in 1:n
There is a problem here: Dicts are unordered, as you can see from the display:
julia> arrays
Dict{Any,Any} with 10 entries:
7 => [0.586479 0.280905; 0.805592 0.737151]
4 => [0.0214868 0.340997; 0.191425 0.271359]
9 => [0.060134 0.939555; 0.0896634 0.455099]
10 => [0.990368 0.214775; 0.224519 0.767086]
2 => [0.578315 0.109518; 0.794717 0.0584819]
3 => [0.106458 0.287653; 0.523525 0.277063]
5 => [0.372227 0.151974; 0.921043 0.238088]
8 => [0.690332 0.14813; 0.771126 0.320432]
⋮ => ⋮
How to solve this? Change the iterator.
cat with ordered splatting:
res3 = cat((arrays[i] for i in 1:n)...,dims=3) #using iterator syntax to return ordered values
reduce with ordered cat:
res4 = reduce((x,y)->cat(x,y,dims=3),(arrays[i] for i in 1:n))
Finally, not asked for, but my favorite: using broadcasting syntax to put those values into a preallocated array:
res5 = zeros(eltype(arrays[1]),2,2,n) #if you know the size beforehand
res5 = zeros(eltype(arrays[1]),size(arrays[1])...,n) #if you dont know
for i in 1:n
res5[:,:,i] .= arrays[i]
end
You can use reduce. Since arrays is a Dict, reduce over its values:
reduce((x,y) -> cat(x,y,dims = 3), values(arrays))

Checking if an element of an array is included in another array

I have arrays:
a = [1,3,4,5]
b = [1,2,3]
Is there any short way to check as follows?
a.include? b
It should return true as 3 is there.
We can do:
b.each do |bb|
puts true if a.include? bb
end
but this is not a good way to iterate over a big array. Or:
c = [2,4]
a.include? c
should return true without iteration.
You could intersect the arrays. If the intersection is non-empty, the arrays have common elements:
a = [1,2,3,4]
b = [2,4]
(a & b).any? # true
!(a & b).empty? # => true
This is quite efficient as it uses a temporary hash under the hood.
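A possible variation, sketched here under the assumption that stopping at the first shared element matters for large inputs: build a Set from one array and probe it with any?. The helper name share_any? is hypothetical. Unlike (a & b).any?, this also reports a match when the shared element is nil or false.

```ruby
require 'set'

# Hypothetical helper: true if `a` and `b` share at least one element.
def share_any?(a, b)
  lookup = b.to_set                 # hash-backed membership test
  a.any? { |e| lookup.include?(e) } # stops at the first hit
end

share_any?([1, 3, 4, 5], [1, 2, 3]) # => true
share_any?([1, 2, 3, 4], [5, 6])    # => false
share_any?([nil, 1], [nil])         # => true, whereas ([nil, 1] & [nil]).any? is false
```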
You may want something like intersect?, as provided by Set:
require 'set'
Set[1,3,4,5].intersect? Set[1,2,3] # => true
Here is another approach. b - a returns all elements that b has but a does not, so if a and b share any elements, b - a has fewer elements than b. I can therefore compare the size of the result against the original size:
a = [1,3,4,5]
b = [1,2,3]
b.size > (b - a).size
# => true
You can use array intersection:
a = [1,2,3,4]
b = [2,4]
c = [5,6]
It gives the following results:
(a & b).any?
# true
(a & c).any?
# false

Difference Between Arrays Preserving Duplicate Elements in Ruby

I'm quite new to Ruby, and was hoping to get the difference between two arrays.
I am aware of the usual method:
a = [...]
b = [...]
difference = (a-b)+(b-a)
But the problem is that this computes the set difference, because in Ruby the expression (a-b) is the relative complement of b in a.
This means [1,2,2,3,4,5,5,5,5] - [5] = [1,2,2,3,4], because it takes out all occurrences of 5 in the first array, not just one, behaving like a filter on the data.
I want it to remove each matching element only once, so for example, the difference of [1,2,2,3,4,5,5,5,5] and [5] should be [1,2,2,3,4,5,5,5], removing just one 5.
I could do this iteratively:
a = [...]
b = [...]
complimentAbyB = a.dup
complimentBbyA = b.dup
b.each do |bValue|
complimentAbyB.delete_at(complimentAbyB.index(bValue) || complimentAbyB.length)
end
a.each do |aValue|
complimentBbyA.delete_at(complimentBbyA.index(aValue) || complimentBbyA.length)
end
difference = complimentAbyB + complimentBbyA
But this seems awfully verbose and inefficient. I have to imagine there is a more elegant solution than this. So my question is basically, what is the most elegant way of finding the difference of two arrays, where if one array has more occurrences of a single element than the other, they will not all be removed?
I recently proposed that such a method, Array#difference, be added to Ruby's core. For your example, it would be written:
a = [1,2,2,3,4,5,5,5,5]
b = [5]
a.difference b
#=> [1,2,2,3,4,5,5,5]
The example I've often given is:
a = [3,1,2,3,4,3,2,2,4]
b = [2,3,4,4,3,4]
a.difference b
#=> [1, 3, 2, 2]
I first suggested this method in my answer here. There you will find an explanation and links to other SO questions where I proposed use of the method.
As shown at the links, the method could be written as follows:
class Array
def difference(other)
h = other.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
reject { |e| h[e] > 0 && h[e] -= 1 }
end
end
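As far as I know, Ruby 2.6 and later ship a built-in Array#difference with the same set semantics as Array#-, which reopening Array as above would override. The sketch below therefore uses standalone methods instead; the names remove_once and symmetric_multiset_difference are hypothetical. The second method combines both directions, which is what the question's (a-b)+(b-a) was after.

```ruby
# Hypothetical helper: remove one occurrence from `a` for each occurrence in `b`.
def remove_once(a, b)
  remaining = b.each_with_object(Hash.new(0)) { |e, counts| counts[e] += 1 }
  a.reject do |e|
    if remaining[e] > 0
      remaining[e] -= 1 # consume one pending removal
      true
    else
      false
    end
  end
end

# Hypothetical helper: leftovers from both sides.
def symmetric_multiset_difference(a, b)
  remove_once(a, b) + remove_once(b, a)
end

remove_once([1, 2, 2, 3, 4, 5, 5, 5, 5], [5])                   # => [1, 2, 2, 3, 4, 5, 5, 5]
remove_once([3, 1, 2, 3, 4, 3, 2, 2, 4], [2, 3, 4, 4, 3, 4])    # => [1, 3, 2, 2]
symmetric_multiset_difference([1, 2, 2, 3, 4, 5, 5, 5, 5], [5]) # => [1, 2, 2, 3, 4, 5, 5, 5]
```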
.....
ha = a.group_by(&:itself).map{|k, v| [k, v.length]}.to_h
hb = b.group_by(&:itself).map{|k, v| [k, v.length]}.to_h
ha.merge(hb){|_, va, vb| (va - vb).abs}.inject([]){|a, (k, v)| a + [k] * v}
ha and hb are hashes with the element in the original array as the key and the number of occurrences as the value. The following merge puts them together and creates a hash whose value is the difference of the number of occurrences in the two arrays. inject converts that to an array that has each element repeated by the number given in the hash.
Another way:
ha = a.group_by(&:itself)
hb = b.group_by(&:itself)
ha.merge(hb){|k, va, vb| [k] * (va.length - vb.length).abs}.values.flatten

Counting matching elements in an array

Given two arrays of equal size, how can I find the number of matching elements disregarding the position?
For example:
[0,0,5] and [0,5,5] would return a match of 2 since there is one 0 and one 5 in common;
[1,0,0,3] and [0,0,1,4] would return a match of 3 since there are two matches of 0 and one match of 1;
[1,2,2,3] and [1,2,3,4] would return a match of 3.
I tried a number of ideas, but they all tend to get rather gnarly and convoluted. I'm guessing there is some nice Ruby idiom, or perhaps a regex, that would be an elegant solution to this problem.
You can accomplish it with count:
a.count{|e| index = b.index(e) and b.delete_at index }
or with inject:
a.inject(0){|count, e| count + ((index = b.index(e) and b.delete_at index) ? 1 : 0)}
or with select and length (or its alias, size):
a.select{|e| (index = b.index(e) and b.delete_at index)}.size
Results:
a, b = [0,0,5], [0,5,5] # => 2
a, b = [1,2,2,3], [1,2,3,4] # => 3
a, b = [1,0,0,3], [0,0,1,4] # => 3
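Note that all three variants call delete_at on b, so b is modified as a side effect. A minimal sketch that works on a copy instead is shown below; the method name count_matches is hypothetical. It also counts a match when the matched element is nil or false, which the one-liners would miss because delete_at returns the deleted element.

```ruby
# Hypothetical helper: count matches regardless of position, leaving `b` untouched.
def count_matches(a, b)
  pool = b.dup # work on a copy so the caller's array is not modified
  a.count do |e|
    index = pool.index(e)
    pool.delete_at(index) if index # each element of b can match only once
    !index.nil?
  end
end

b = [0, 5, 5]
count_matches([0, 0, 5], b)                  # => 2
b                                            # => [0, 5, 5] (unchanged)
count_matches([1, 2, 2, 3], [1, 2, 3, 4])    # => 3
count_matches([1, 0, 0, 3], [0, 0, 1, 4])    # => 3
```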
(arr1 & arr2).map { |i| [arr1.count(i), arr2.count(i)].min }.inject(0, &:+)
Here (arr1 & arr2) returns the list of unique values that both arrays contain, and arr.count(i) counts the number of occurrences of i in the array.
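Wrapped in a method and run against the examples from the question, the same idea might look like the sketch below. The name match_count is made up here, and sum with a block assumes Ruby 2.4 or newer.

```ruby
# Hypothetical helper: for each shared value, count the smaller number of occurrences.
def match_count(arr1, arr2)
  (arr1 & arr2).sum { |v| [arr1.count(v), arr2.count(v)].min }
end

match_count([0, 0, 5], [0, 5, 5])       # => 2
match_count([1, 0, 0, 3], [0, 0, 1, 4]) # => 3
match_count([1, 2, 2, 3], [1, 2, 3, 4]) # => 3
```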
Another use for the mighty (and much needed) Array#difference, which I defined in my answer here. This method is similar to Array#-. The difference between the two methods is illustrated in the following example:
a = [1,2,3,4,3,2,4,2]
b = [2,3,4,4,4]
a - b #=> [1]
a.difference b #=> [1, 3, 2, 2]
For the present application:
def number_matches(a,b)
left_in_b = b
a.reduce(0) do |t,e|
if left_in_b.include?(e)
left_in_b = left_in_b.difference [e]
t+1
else
t
end
end
end
number_matches [0,0,5], [0,5,5] #=> 2
number_matches [1,0,0,3], [0,0,1,4] #=> 3
number_matches [1,2,2,3], [1,2,3,4] #=> 3
Using the multiset gem:
(Multiset.new(a) & Multiset.new(b)).size
Multiset is like Set, but allows duplicate values. & is the "set intersection" operator (return all things that are in both sets).
I don't think this is an ideal answer, because it's a bit complex, but...
def count(arr)
arr.each_with_object(Hash.new(0)) { |e,h| h[e] += 1 }
end
def matches(a1, a2)
m = 0
a1_counts = count(a1)
a2_counts = count(a2)
a1_counts.each do |e, c|
m += [c, a2_counts[e]].min # a2_counts[e] is 0 when e is absent from a2
end
m
end
Basically, first write a method that creates a hash from an array of the number of times each element appears. Then, use those to sum up the smallest number of times each element appears in both arrays.

Convert an array into an index hash in Ruby

I have an array, and I want to make a hash so I can quickly ask "is X in the array?".
In perl, there is an easy (and fast) way to do this:
my @array = qw( 1 2 3 );
my %hash;
@hash{@array} = undef;
This generates a hash that looks like:
{
1 => undef,
2 => undef,
3 => undef,
}
The best I've come up with in Ruby is:
array = [1, 2, 3]
hash = Hash[array.map {|x| [x, nil]}]
which gives:
{1=>nil, 2=>nil, 3=>nil}
Is there a better Ruby way?
EDIT 1
No, Array.include? is not a good idea. It's slow. It does a query in O(n) instead of O(1). My example array had three elements for brevity; assume the actual one has a million elements. Let's do a little benchmarking:
#!/usr/bin/ruby -w
require 'benchmark'
array = (1..1_000_000).to_a
hash = Hash[array.map {|x| [x, nil]}]
Benchmark.bm(15) do |x|
x.report("Array.include?") { 1000.times { array.include?(500_000) } }
x.report("Hash.include?") { 1000.times { hash.include?(500_000) } }
end
Produces:
                      user     system      total        real
Array.include?  46.190000   0.160000  46.350000 ( 46.593477)
Hash.include?    0.000000   0.000000   0.000000 (  0.000523)
If all you need the hash for is membership, consider using a Set:
Set
Set implements a collection of unordered values with no
duplicates. This is a hybrid of Array's intuitive inter-operation
facilities and Hash's fast lookup.
Set is easy to use with Enumerable objects (implementing
each). Most of the initializer methods and binary operators accept
generic Enumerable objects besides sets and arrays. An
Enumerable object can be converted to Set using the
to_set method.
Set uses Hash as storage, so you must note the following points:
Equality of elements is determined according to Object#eql? and Object#hash.
Set assumes that the identity of each element does not change while it is stored. Modifying an element of a set will render the set to an
unreliable state.
When a string is to be stored, a frozen copy of the string is stored instead unless the original string is already frozen.
Comparison
The comparison operators <, >, <= and >= are implemented as
shorthand for the {proper_,}{subset?,superset?} methods. However, the
<=> operator is intentionally left out because not every pair of
sets is comparable. ({x,y} vs. {x,z} for example)
Example
require 'set'
s1 = Set.new [1, 2] # -> #<Set: {1, 2}>
s2 = [1, 2].to_set # -> #<Set: {1, 2}>
s1 == s2 # -> true
s1.add("foo") # -> #<Set: {1, 2, "foo"}>
s1.merge([2, 6]) # -> #<Set: {1, 2, "foo", 6}>
s1.subset? s2 # -> false
s2.subset? s1 # -> true
[...]
Public Class Methods
new(enum = nil)
Creates a new set containing the elements of the given enumerable
object.
If a block is given, the elements of enum are preprocessed by the
given block.
try this one:
a=[1,2,3]
Hash[a.zip]
You can do this very handy trick:
Hash[*[1, 2, 3, 4].map {|k| [k, nil]}.flatten]
=> {1=>nil, 2=>nil, 3=>nil, 4=>nil}
If you want to quickly ask "is X in the array?" you should use Array#include?.
Edit (in response to addition in OP):
If you want speedy look up times, use a Set. Having a Hash that points to all nils is silly. Conversion is an easy process too with Array#to_set.
require 'benchmark'
require 'set'
array = (1..1_000_000).to_a
set = array.to_set
Benchmark.bm(15) do |x|
x.report("Array.include?") { 1000.times { array.include?(500_000) } }
x.report("Set.include?") { 1000.times { set.include?(500_000) } }
end
Results on my machine:
                      user     system      total        real
Array.include?  36.200000   0.140000  36.340000 ( 36.740605)
Set.include?     0.000000   0.000000   0.000000 (  0.000515)
You should consider just using a set to begin with, instead of an array so that a conversion is never necessary.
I'm fairly certain that there isn't a one-shot clever way to construct this hash. My inclination would be to just be explicit and state what I'm doing:
hash = {}
array.each{|x| hash[x] = nil}
It doesn't look particularly elegant, but it's clear, and does the job.
FWIW, your original suggestion (under Ruby 1.8.6 at least) doesn't seem to work. I get an "ArgumentError: odd number of arguments for Hash" error. Hash.[] expects a literal, even-lengthed list of values:
Hash[a, 1, b, 2] # => {a => 1, b => 2}
so I tried changing your code to:
hash = Hash[*array.map {|x| [x, nil]}.flatten]
but the performance is dire:
#!/usr/bin/ruby -w
require 'benchmark'
array = (1..100_000).to_a
Benchmark.bm(15) do |x|
x.report("assignment loop") {hash = {}; array.each{|e| hash[e] = nil}}
x.report("hash constructor") {hash = Hash[*array.map {|e| [e, nil]}.flatten]}
end
gives
                       user     system      total        real
assignment loop   0.440000   0.200000   0.640000 (  0.657287)
hash constructor  4.440000   0.250000   4.690000 (  4.758663)
Unless I'm missing something here, a simple assignment loop seems the clearest and most efficient way to construct this hash.
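For reference, the assignment loop can also be written as a single expression. The sketch below is only an illustration of that idea: the variable names are made up, true is used as the marker value instead of nil, and the to_h form with a block assumes Ruby 2.6 or newer. No claim is made here about how these compare in speed to the loop benchmarked above.

```ruby
array = [1, 2, 3]

# The explicit loop folded into one expression.
index = array.each_with_object({}) { |x, h| h[x] = true }

# Array#to_h with a block; assumes Ruby 2.6 or newer.
index2 = array.to_h { |x| [x, true] }

index.key?(2)   # => true
index.key?(42)  # => false
index == index2 # => true
```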
Rampion beat me to it. Set might be the answer.
You can do:
require 'set'
set = array.to_set
set.include?(x)
Your way of creating the hash looks good. I had a muck around in irb and this is another way
>> [1,2,3,4].inject(Hash.new) { |h,i| {i => nil}.merge(h) }
=> {1=>nil, 2=>nil, 3=>nil, 4=>nil}
I think chrismear's point on using assignment over creation is great. To make the whole thing a little more Ruby-esque, though, I might suggest assigning something other than nil to each element:
hash = {}
array.each { |x| hash[x] = 1 } # or true or something else "truthy"
...
if hash[376] # instead of if hash.has_key?(376)
...
end
The problem with assigning to nil is that you have to use has_key? instead of [], since [] gives you nil (your marker value) if the Hash doesn't have the specified key. You could get around this by using a different default value, but why go through the extra work?
# much less elegant than above:
hash = Hash.new(42)
array.each { |x| hash[x] = nil }
...
unless hash[376]
...
end
Maybe I am misunderstanding the goal here; If you wanted to know if X was in the array, why not do array.include?("X") ?
Doing some benchmarking on the suggestions so far gives that chrismear and Gaius's assignment-based hash creation is slightly faster than my map method (and assigning nil is slightly faster than assigning true). mtyaka and rampion's Set suggestion is about 35% slower to create.
As far as lookups go, hash.include?(x) is a very tiny amount faster than hash[x]; both are twice as fast as set.include?(x).
                 user     system      total        real
chrismear   6.050000   0.850000   6.900000 (  6.959355)
derobert    6.010000   1.060000   7.070000 (  7.113237)
Gaius       6.210000   0.810000   7.020000 (  7.049815)
mtyaka      8.750000   1.190000   9.940000 (  9.967548)
rampion     8.700000   1.210000   9.910000 (  9.962281)
                 user     system      total        real
times      10.880000   0.000000  10.880000 ( 10.921315)
set        93.030000  17.490000 110.520000 (110.817044)
hash-i     45.820000   8.040000  53.860000 ( 53.981141)
hash-e     47.070000   8.280000  55.350000 ( 55.487760)
Benchmarking code is:
#!/usr/bin/ruby -w
require 'benchmark'
require 'set'
array = (1..5_000_000).to_a
Benchmark.bmbm(10) do |bm|
bm.report('chrismear') { hash = {}; array.each{|x| hash[x] = nil} }
bm.report('derobert') { hash = Hash[array.map {|x| [x, nil]}] }
bm.report('Gaius') { hash = {}; array.each{|x| hash[x] = true} }
bm.report('mtyaka') { set = array.to_set }
bm.report('rampion') { set = Set.new(array) }
end
hash = Hash[array.map {|x| [x, true]}]
set = array.to_set
array = nil
GC.start
GC.disable
Benchmark.bmbm(10) do |bm|
bm.report('times') { 100_000_000.times { } }
bm.report('set') { 100_000_000.times { set.include?(500_000) } }
bm.report('hash-i') { 100_000_000.times { hash.include?(500_000) } }
bm.report('hash-e') { 100_000_000.times { hash[500_000] } }
end
GC.enable
If you're not bothered what the hash values are
irb(main):031:0> a=(1..1_000_000).to_a ; a.length
=> 1000000
irb(main):032:0> h=Hash[a.zip a] ; h.keys.length
=> 1000000
Takes a second or so on my desktop.
If you're looking for an equivalent of this Perl code:
grep {$_ eq $element} @array
You can just use the simple Ruby code:
array.include?(element)
Here's a neat way to cache lookups with a Hash:
a = (1..1000000).to_a
h = Hash.new{|hash,key| hash[key] = true if a.include? key}
This passes a default block to Hash.new: when a key is looked up for the first time, the block stores true in the cache if the key is in the array, and returns nil otherwise. This allows lazy loading into the cache, in case you don't use every element.
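A short usage sketch of this cache, using a smaller array than above to keep it quick. One thing it illustrates: because the block only assigns on a hit, misses are not stored, so every lookup of an absent key scans the array again.

```ruby
a = (1..1000).to_a
h = Hash.new { |hash, key| hash[key] = true if a.include?(key) }

h[500]       # => true  (found in the array, now cached)
h[5000]      # => nil   (not in the array, nothing is stored)
h.key?(500)  # => true
h.key?(5000) # => false (a later h[5000] scans the array again)
```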
This preserves 0's if your array was [0,0,0,1,0]:
hash = {}
arr.each_with_index{|el, idx| hash.merge!({(idx + 1 )=> el }) }
Returns :
# {1=>0, 2=>0, 3=>0, 4=>1, 5=>0}
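An equivalent sketch without merge!, assuming the goal is a hash from 1-based position to element: with_index(1) starts counting at 1, so the manual idx + 1 is not needed.

```ruby
arr = [0, 0, 0, 1, 0]

# Pair each element with its 1-based position, then build the hash.
hash = arr.each.with_index(1).map { |el, idx| [idx, el] }.to_h

hash # => {1=>0, 2=>0, 3=>0, 4=>1, 5=>0}
```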

Resources