How to check whether two arrays have an element in common? [duplicate]

This question already has answers here:
How can I check if a Ruby array includes one of several values?
(5 answers)
Closed 7 years ago.
For example:
a = [1,2,3,4,5,6,7,8]
b = [1,9,10,11,12,13,14,15]
Array a contains 1 and array b contains 1 too, so they have an element in common.
How can I compare them and return true or false in Ruby?

Check if a & b is empty:
a & b
# => [1]
(a & b).empty?
# => false

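If each array has many elements, computing the intersection (&) can be an expensive operation, since it builds the complete result even though a single common element is enough. I assume it would be quicker to go 'by hand' and stop at the first match: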
def have_same_element?(array1, array2)
  # Return true on the first element found that is in both array1 and array2
  # Return false if no such element is found
  array1.each do |elem|
    return true if array2.include?(elem)
  end
  false
end
a = [*1..100] # [1, 2, 3, ... , 100]
b = a.reverse.to_a # [100, 99, 98, ... , 1]
puts have_same_element?(a, b)
If you know more beforehand (e.g. "array1 contains many duplicates") you can further optimize the operation (e.g. by calling uniq or compact first, depending on your data).
It would be interesting to see actual benchmarks.
Edit:
require 'benchmark'
Benchmark.bmbm(10) do |bm|
  bm.report("by hand") { have_same_element?(a, b) }
  bm.report("set operation") { (a & b).empty? }
end
Rehearsal -------------------------------------------------
by hand         0.000000   0.000000   0.000000 (  0.000014)
set operation   0.000000   0.000000   0.000000 (  0.000095)
---------------------------------------- total: 0.000000sec
                    user     system      total        real
by hand         0.000000   0.000000   0.000000 (  0.000012)
set operation   0.000000   0.000000   0.000000 (  0.000131)
So in this case the "by hand" method does look faster, but this is a rather sloppy way of benchmarking with limited expressiveness: each report is a single run lasting microseconds, and the common element is found on the very first iteration.
Also see @CarySwoveland's excellent comments about using sets, proper benchmarking, and a snappier expression using find. (detect does the same and is more expressive IMHO, but be careful: it returns the value found, so if your arrays contain falsy values such as nil or false, the result is misleading. You generally want any? { } here.)
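Following the suggestion about sets and any?, here is a minimal sketch (not @CarySwoveland's actual code, which is not reproduced here). It builds a Set from the second array once and then stops at the first element of the first array that the Set contains; the method name have_common_element? is hypothetical. On Ruby 3.1 and later, Array#intersect? performs this check directly.

```ruby
require 'set'

# Hypothetical helper: true if array1 and array2 share at least one element.
# Building the Set is O(m) and each lookup is O(1) on average, so the whole
# check is O(n + m) instead of the O(n * m) of repeated Array#include?.
def have_common_element?(array1, array2)
  lookup = array2.to_set
  # any? stops at the first hit and returns a real boolean, so elements
  # such as nil or false are handled correctly (unlike find/detect).
  array1.any? { |elem| lookup.include?(elem) }
end

a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [1, 9, 10, 11, 12, 13, 14, 15]
p have_common_element?(a, b)            # prints true
p have_common_element?([nil, 2], [nil]) # prints true
p have_common_element?([2, 3], [4, 5])  # prints false
```

The Set costs some memory for the second array, so for very small arrays the plain & or include? versions may be just as fast.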

You can get the intersection of two arrays with the & operator. To find the elements two arrays have in common, take their intersection. Given
a = [1,2,3,4,5,6,7,8]
b = [1,9,10,11,12,13,14,15]
and taking the intersection:
u = a & b
p u
# [1]
u.empty?
# => false

Related

Perl 6: What is a quick way to de-select array or list elements?

Selecting multiple elements from an array in Perl 6 is easy: just use a list of indices:
> my @a = < a b c d e f g >;
> @a[ 1,3,5 ]
(b d f)
But to de-select those elements, I had to use Set:
> say @a[ (@a.keys.Set (-) (1,3,5)).keys.sort ]
(a c e g)
Is there an easier way? The arrays I use are often quite large.
sub infix:<not-at> ($elems, @not-ats) {
    my $at = 0;
    flat gather for @not-ats -> $not-at {
        when $at < $not-at { take $at++ xx $not-at - $at }
        NEXT { $at++ }
        LAST { take $at++ xx $elems - $not-at - 1 }
    }
}
my @a = < a b c d e f g >;
say @a[ * not-at (1, 3, 5) ]; # (a c e g)
I think the operator code is self-explanatory if you know each of the P6 constructs it uses. If anyone would appreciate an explanation of it beyond the following, let me know in the comments.
I'll start with the two aspects that generate the call to not-at.
* aka Whatever
From the Whatever doc page:
When * is used in term position, that is, as an operand, in combination with most operators, the compiler will transform the expression into a closure of type WhateverCode
* is indeed used in the above as an operand. In this case it's the left argument (corresponding to the $elems parameter) of the infix not-at operator that I've just created.
The next question is, will the compiler do the transform? The compiler decides based on whether the operator has an explicit * as the parameter corresponding to the * argument. If I'd written * instead of $elems then that would have made not-at one of the few operators that wants to directly handle the * and do whatever it chooses to do and the compiler would directly call it. But I didn't. I wrote $elems. So the compiler does the transform I'll describe next.
The transform builds a new WhateverCode around the enclosing expression and rewrites the Whatever as "it" aka the topic aka $_ instead. So in this case it turns this:
* not-at (1,3,5)
into this:
{ $_ not-at (1,3,5) }
What [...] as a subscript does
The [...] in @a[...] is a Positional (array/list) subscript. This imposes several evaluation aspects, of which two matter here:
"it" aka the topic aka $_ is set to the length of the list/array.
If the content of the subscript is a Callable it gets called. The WhateverCode generated as explained above is indeed a Callable so it gets called.
So this:
@a[ * not-at (1,3,5) ]
becomes this:
@a[ { $_ not-at (1,3,5) } ]
which turns into this:
@a[ { infix:<not-at>(7, (1,3,5)) } ]
Given the indexer wants the elements to extract, we could solve this by turning the list of elements to exclude into a list of ranges of elements to extract. That is, given:
1, 3, 5
We'd produce something equivalent to:
0..0, 2..2, 4..4, 6..Inf
Given:
my @exclude = 1, 3, 5;
We can do:
-1, |@exclude Z^..^ |@exclude, Inf
To break it down, zip (-1, 1, 3, 5) with (1, 3, 5, Inf), but using the range operator with exclusive endpoints. This results in, for the given example:
(-1^..^1 1^..^3 3^..^5 5^..^Inf)
Which is equivalent to the ranges I mentioned above. Then we stick this into the indexer:
my @a = <a b c d e f g>;
my @exclude = 1, 3, 5;
say @a[-1, |@exclude Z^..^ |@exclude, Inf].flat
Which gives the desired result:
(a c e g)
This approach is O(n + m). It will probably work out quite well if the array is long but the number of things to exclude is comparatively small, since it only produces the Range objects needed for the indexing, and indexing by range is relatively well optimized.
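The same zip-into-exclusive-ranges technique can be sketched in Ruby, which may make the index arithmetic easier to follow. This is an illustration, not a translation of the Raku code's exact semantics: it uses the array length where the Raku version uses Inf, and it assumes the excluded indices are sorted, unique, and within bounds. kept_ranges and select_except are hypothetical names.

```ruby
# Hypothetical helper: given an array length and a sorted, duplicate-free
# list of indices to exclude, return the ranges of indices to keep.
# Mirrors -1, |@exclude Z^..^ |@exclude, Inf with the length in place of Inf.
def kept_ranges(length, exclude)
  starts = [-1] + exclude      # e.g. -1, 1, 3, 5
  stops  = exclude + [length]  # e.g.  1, 3, 5, 7
  starts.zip(stops)
        .select { |lo, hi| lo + 1 < hi }   # drop empty gaps (adjacent exclusions)
        .map    { |lo, hi| (lo + 1)...hi } # exclusive on both ends, like ^..^
end

# Hypothetical helper: slice the array by each kept range and concatenate.
def select_except(array, exclude)
  kept_ranges(array.length, exclude).flat_map { |r| array[r] }
end

a = %w[a b c d e f g]
p kept_ranges(a.length, [1, 3, 5])  # prints [0...1, 2...3, 4...5, 6...7]
p select_except(a, [1, 3, 5])       # prints ["a", "c", "e", "g"]
```

Only one Range is built per gap, so the work is proportional to the number of exclusions plus the size of the result.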
Finally, should the flat on the outside be considered troublesome, it's also possible to move it inside:
@a[{ flat -1, |@exclude Z^..^ |@exclude, $_ }]
Which works because the block is passed the number of elements in @a.
Here's another option:
my @a = < a b c d e f g >;
say @a[@a.keys.grep(none(1, 3, 5))];
But all in all, arrays aren't optimized for this use case. They are optimized for working with a single element, or all elements, and slices provide a shortcut for (positively) selecting several elements by key.
If you tell us about the underlying use case, maybe we can recommend a more suitable data structure.
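The keys.grep(none(...)) option is simply "walk every index and keep the ones not excluded". A rough Ruby sketch of that idea, assuming the excluded indices are put in a Set so each membership test is O(1) on average; reject_indices is a hypothetical name, and unlike the range-based approach the excluded indices need not be sorted.

```ruby
require 'set'

# Hypothetical helper: keep every element whose index is not excluded.
# Like @a.keys.grep(none(1, 3, 5)), this visits every index once.
def reject_indices(array, exclude)
  excluded = exclude.to_set
  array.each_index.reject { |i| excluded.include?(i) }.map { |i| array[i] }
end

p reject_indices(%w[a b c d e f g], [1, 3, 5])  # prints ["a", "c", "e", "g"]
```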
This might be slow for big arrays, but it's logically closer to what you're looking for:
my @a = <a b c d>;
say (@a ⊖ @a[0,1]).keys; # (c d)
It's basically the same solution you proposed at the beginning, using set difference, except it's using it on the whole array instead of on the indices. Also, in some cases you might use the set directly; it depends on what you want to do.
@raiph's solution combined with @Jonathan Worthington's:
The operator should be very efficient for huge numbers and large @not-ats lists, as it returns a list of ranges and even creates that list lazily. For @not-ats it supports integers and ranges with included or excluded bounds, including infinite ones; they must be in ascending order.
$elems can be a Range or an Int; it is interpreted via .Int, as in Jonathan Worthington's solution. This keeps supporting the existing usage (although applying the result to an array slice needs a .flat, the price of the lazy operator's performance; using flat gather instead of lazy gather in the operator's second line would change that):
@a[ (* not-at (1, 3, 5)).flat ];
as well as new forms such as:
@a[ (* not-at (1, 3^..5, 8..8, 10, 14..^18, 19..*)).flat ];
The performance improvements show up when you don't slice the whole array at once but operate on parts of it, ideally with multithreading.
sub infix:<not-at> ($elems, @not-ats) {
    lazy gather {
        my $at = 0;
        for @not-ats {                                  # iterate over @not-ats ranges
            my ($stop, $continue) = do given $_ {
                when Int            { succeed $_, $_ }               # 5
                when !.infinite     { succeed .int-bounds }          # 3..8 | 2^..8 | 3..^9 | 2^..^9
                when !.excludes-min { succeed .min, $elems.Int }     # 4..*
                default             { succeed .min + 1, $elems.Int } # 3^..*
            }
            take $at .. $stop - 1 if $at < $stop;       # output Range before current $not-at range
            $at = $continue + 1;                        # continue after current $not-at range
        }
        take $at .. $elems.Int - 1 if $at < $elems;     # output Range with remaining elements
    }
}

Sorted version of in

I have an array of times, event_times, and I want to check whether t in event_times. However, I know that event_times is sorted. Is there a way to make use of that to make the search faster?
An idiomatic Julian way would be an elaboration of:
struct SortedVector{T,V<:AbstractVector} <: AbstractVector{T}
    v::V
    SortedVector{T,V}(v::AbstractVector{T}) where {T, V} = new(v)
    # check sorted in inner constructor??
end
SortedVector(v::AbstractVector{T}) where T = SortedVector{T,typeof(v)}(v)
@inline Base.size(sv::SortedVector) = size(sv.v)
@inline Base.getindex(sv::SortedVector,i) = sv.v[i]
@inline Base.in(e::T,sv::SortedVector{T}) where T = !isempty(searchsorted(sv.v,e))
And then:
julia> v = SortedVector(sort(rand(1:10,10)))
10-element SortedVector{Int64,Array{Int64,1}}:
1
4
5
5
6
6
6
7
7
10
julia> 3 in v
false
julia> 1 in v
true
If I recall correctly, David Sanders had an implementation with this name. Looking at https://github.com/JuliaIntervals/IntervalOptimisation.jl/blob/889bf43e8a514e696869baaa6af1300ace87b90b/src/SortedVectors.jl might promote reuse.
Following @ColinTBowers's hint, you can use the fact that searchsorted returns a range which is empty iff t is not in event_times. Thus !isempty(searchsorted(event_times, t)) is a fast way to get the answer.
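The same idea, binary search on an already sorted array, is available in Ruby through Array#bsearch. A minimal sketch; sorted_include? is a hypothetical name, and it assumes the array is sorted in ascending order (nothing checks this, just as the Julia constructor above leaves the check open).

```ruby
# Hypothetical helper: membership test on an ascending sorted array in
# O(log n) comparisons, instead of the O(n) scan done by Array#include?.
def sorted_include?(sorted, t)
  # In find-minimum mode, bsearch returns the first element >= t, or nil.
  found = sorted.bsearch { |x| x >= t }
  !found.nil? && found == t
end

event_times = [1, 4, 5, 5, 6, 6, 6, 7, 7, 10]
p sorted_include?(event_times, 3)  # prints false
p sorted_include?(event_times, 1)  # prints true
```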

How to know if multiple arrays in a list are the same

I would like to know whether all arrays within a list are the same.
== compares two arrays, but I want to know if there is any library method to tell if all arrays within a list are the same.
You can traverse the list of arrays just once, comparing the first array with all the other arrays. If the first one is equal to all the others, then all arrays in the list are equal. Something like this will work:
arrays = [[1,3],[1,3],[1,3]]
array0 = arrays.first
arrays[1..-1].all? { |a| array0 == a }
# => true
arrays = [[1,3],[1,3],[1,4]]
array0 = arrays.first
arrays[1..-1].all? { |a| array0 == a }
# => false
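One caveat with arrays[1..-1]: for an empty list it returns nil, so calling .all? on it raises NoMethodError. A minimal sketch of the same "first vs others" comparison using drop(1), which returns an empty array instead; all_same? is a hypothetical name, and treating an empty list as "all the same" is an assumption (it follows from all? returning true on an empty collection).

```ruby
# Hypothetical helper: true if every array in the list equals the first one.
# drop(1) returns [] for an empty list (where arrays[1..-1] returns nil),
# and all? on [] is true, so [] and [[1, 3]] both count as "all the same".
def all_same?(arrays)
  first = arrays.first
  arrays.drop(1).all? { |a| a == first }
end

p all_same?([[1, 3], [1, 3], [1, 3]])  # prints true
p all_same?([[1, 3], [1, 3], [1, 4]])  # prints false
p all_same?([])                        # prints true
```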
I was curious about the performance of each of the solutions here. Feel free to edit this post with your own results.
In my tests, the difference between the approaches grew with the length of the list of arrays, so I measured a long list of relatively short arrays. I always did a few runs to remove the possible influence of GC.
require 'benchmark'
n = 10
n_arrays = 1000000
arrays = [(1..n).to_a] * n_arrays
Benchmark.bm(14) do |bm|
  bm.report("1st vs others:") do
    array0 = arrays.first
    arrays[1..-1].all? { |a| array0 == a }
  end
  bm.report("uniq:") { arrays.uniq.size == 1 }
  bm.report("each_cons:") { arrays.each_cons(2).all? { |x, y| x == y } }
end
The results suggest that the each_cons approach takes roughly twice as long as the "1st vs others" approach, while the one using uniq is much slower (more than 20 times slower in this run).
                     user     system      total        real
1st vs others:   0.080000   0.000000   0.080000 (  0.080872)
uniq:            1.810000   0.000000   1.810000 (  1.807646)
each_cons:       0.180000   0.000000   0.180000 (  0.174251)
[[1,3],[1,3],[1,3]].uniq.size == 1
#=> true
[[1,3],[1,3],[1,4]].uniq.size == 1
#=> false
arrays.each_cons(2).all? { |x, y| x == y }

How to check if two data frames are equal [duplicate]

This question already has an answer here:
regarding matrix comparison in R
(1 answer)
Closed 9 years ago.
Say I have large datasets in R and I just want to know whether two of them are the same. I do this often when I'm experimenting with different algorithms to achieve the same result. For example, say we have the following datasets:
df1 <- data.frame(num = 1:5, let = letters[1:5])
df2 <- df1
df3 <- data.frame(num = c(1:5, NA), let = letters[1:6])
df4 <- df3
So this is what I do to compare them:
table(x == y, useNA = 'ifany')
Which works great when the datasets have no NAs:
> table(df1 == df2, useNA = 'ifany')
TRUE
10
But not so much when they have NAs:
> table(df3 == df4, useNA = 'ifany')
TRUE <NA>
11 1
In the example, it's easy to dismiss the NA as not a problem since we know that both dataframes are equal. The problem is that NA == <anything> yields NA, so whenever one of the datasets has an NA, it doesn't matter what the other one has on that same position, the result is always going to be NA.
So using table() to compare datasets doesn't seem ideal to me. How can I better check if two data frames are identical?
P.S.: Note this is not a duplicate of R - comparing several datasets, Comparing 2 datasets in R or Compare datasets in R
Look up all.equal. It has some caveats, but it might work for you.
all.equal(df3,df4)
# [1] TRUE
all.equal(df2,df1)
# [1] TRUE
As Metrics pointed out, one could also use identical() to compare the datasets. The difference between this approach and that of Codoremifa is that identical() just yields TRUE or FALSE, depending on whether the objects being compared are identical, whereas all.equal() returns either TRUE or hints about the differences between the objects. For instance, consider the following:
> identical(df1, df3)
[1] FALSE
> all.equal(df1, df3)
[1] "Attributes: < Component 2: Numeric: lengths (5, 6) differ >"
[2] "Component 1: Numeric: lengths (5, 6) differ"
[3] "Component 2: Lengths: 5, 6"
[4] "Component 2: Attributes: < Component 2: Lengths (5, 6) differ (string compare on first 5) >"
[5] "Component 2: Lengths (5, 6) differ (string compare on first 5)"
Moreover, from what I've tested identical() seems to run much faster than all.equal().

Multiple assignment in Scala without using Array?

I have an input something like this: "1 2 3 4 5".
What I would like to do, is to create a set of new variables, let a be the first one of the sequence, b the second, and xs the rest as a sequence (obviously I can do it in 3 different lines, but I would like to use multiple assignment).
A bit of search helped me by finding the right-ignoring sequence patterns, which I was able to use:
val Array(a, b, xs @ _*) = "1 2 3 4 5".split(" ")
What I don't understand is why it doesn't work when I try it with a tuple. I get an error for this:
val (a, b, xs @ _*) = "1 2 3 4 5".split(" ")
The error message is:
<console>:1: error: illegal start of simple pattern
Are there any alternatives for multiple-assignment without using Array?
I have just started playing with Scala a few days ago, so please bear with me :-) Thanks in advance!
Other answers tell you why you can't use tuples, but arrays are awkward for this purpose. I prefer lists:
val a :: b :: xs = "1 2 3 4 5".split(" ").toList
Simple answer
val Array(a, b, xs @ _*) = "1 2 3 4 5".split(" ")
The syntax you are seeing here is a simple pattern-match. It works because "1 2 3 4 5".split(" ") evaluates to an Array:
scala> "1 2 3 4 5".split(" ")
res0: Array[java.lang.String] = Array(1, 2, 3, 4, 5)
Since the right-hand side is an Array, the pattern on the left-hand side must also be an Array.
The left-hand side can be a tuple only if the right-hand side evaluates to a tuple as well:
val (a, b, xs) = (1, 2, Seq(3,4,5))
More complex answer
Technically, what's happening here is that the pattern-match syntax invokes the unapplySeq method on the Array object, which looks like this:
def unapplySeq[T](x: Array[T]): Option[IndexedSeq[T]] =
  if (x == null) None else Some(x.toIndexedSeq)
Note that the method accepts an Array. This is what Scala must see on the right-hand side of the assignment. And it returns a Seq, which allows for the @ _* syntax you used.
Your version with the tuple doesn't work because Tuple3's unapply is defined with a Product3 as its parameter, not an Array:
def unapply[T1, T2, T3](x: Product3[T1, T2, T3]): Option[Product3[T1, T2, T3]] =
  Some(x)
You can actually write "extractors" like this that do whatever you want by simply creating an object and writing an unapply or unapplySeq method.
The answer is:
val a :: b :: c = "1 2 3 4 5".split(" ").toList
It's worth clarifying that in some cases one may want to bind just the first n elements of a list, ignoring the elements that aren't matched. To do that, just add a trailing underscore:
val a :: b :: c :: _ = "1 2 3 4 5".split(" ").toList
That way:
c = "3" vs. c = List("3","4","5")
I'm not an expert in Scala by any means, but I think this might have to do with the fact that tuples in Scala are just syntactic sugar for classes ranging from Tuple2 to Tuple22.
That means tuples in Scala aren't flexible structures like in Python or similar languages, so Scala can't really create a tuple whose size isn't known in advance.
We can use a regular-expression pattern match to extract the values from the string and assign them to multiple variables. This requires two lines, though.
The pattern says there are three digits ([0-9]) separated by spaces. After the third digit there may or may not be more text, which we don't care about (.*).
val pat = "([0-9]) ([0-9]) ([0-9]).*".r
val (a,b,c) = "1 2 3 4 5" match { case pat(a,b,c) => (a,b,c) }
Output
a: String = 1
b: String = 2
c: String = 3
