Julia: does an Array contain a specific sub-array - arrays

In Julia we can check if an array contains a value, like so:
> 6 in [4,6,5]
true
However, this returns false when attempting to check for a sub-array in a specific order:
> [4,6] in [4,6,5]
false
What is the correct syntax to verify if a specific sub-array exists in an array?

I think it is worth mentioning that in Julia 1.0 you have the function issubset
> issubset([4,6], [4,6,5])
true
You can also quite conveniently call it using the ⊆ operator, typed as \subseteq followed by Tab:
> [4,6] ⊆ [4,6,5]
true
This looks pretty optimized to me:
> using Random, BenchmarkTools
> x, y = randperm(10^3)[1:10^2], randperm(10^3);
> @btime issubset(x, y);
16.153 μs (12 allocations: 45.96 KiB)
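Keep in mind that issubset tests set membership only, so it cannot tell a contiguous, ordered sub-array apart from the same values reordered or scattered. A small illustration (the input values are made up for this example):

```julia
haystack = [4, 6, 5]

# Same elements in reversed order: still reported as a subset
reversed = issubset([6, 4], haystack)      # true

# Elements that are present but not adjacent: still a subset
nonadjacent = issubset([4, 5], haystack)   # true

# Repetitions are ignored as well
repeated = issubset([4, 4, 4], haystack)   # true
```

If order and adjacency matter, one of the sliding-window approaches in the other answers is needed.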

It takes a little bit of code to make a function that performs well, but this is much faster than the issubvec version given in another answer:
function subset2(x,y)
    lenx = length(x)
    first = x[1]
    if lenx == 1
        return findnext(y, first, 1) != 0
    end
    leny = length(y)
    lim = length(y) - length(x) + 1
    cur = 1
    while (cur = findnext(y, first, cur)) != 0
        cur > lim && break
        beg = cur
        @inbounds for i = 2:lenx
            y[beg += 1] != x[i] && (beg = 0 ; break)
        end
        beg != 0 && return true
        cur += 1
    end
    false
end
Note: it would also be much more useful if the function actually returned the position of the beginning of the subarray if found, or 0 if not, similarly to the findfirst/findnext functions.
Timing information (the second one is using my subset2 function):
0.005273 seconds (65.70 k allocations: 4.073 MB)
0.000086 seconds (4 allocations: 160 bytes)
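A position-returning variant, as suggested in the note, might look like the sketch below. It is written against Julia 1.x, where findnext takes a predicate and returns nothing (rather than 0) when there is no match. The name findsubarray and the handling of an empty needle are assumptions made for this example, not part of Base.

```julia
# Hypothetical helper: index in `haystack` where `needle` first occurs as a
# contiguous run, or `nothing` if it does not occur.
function findsubarray(needle::AbstractVector, haystack::AbstractVector)
    n = length(needle)
    # Convention chosen for this sketch: an empty needle matches at the start
    n == 0 && return firstindex(haystack)
    first_elem = first(needle)
    last_start = lastindex(haystack) - n + 1
    start = firstindex(haystack)
    while start <= last_start
        # Jump to the next position whose element equals the needle's first element
        cur = findnext(isequal(first_elem), haystack, start)
        (cur === nothing || cur > last_start) && return nothing
        # Compare the candidate window without copying it
        view(haystack, cur:cur+n-1) == needle && return cur
        start = cur + 1
    end
    return nothing
end

findsubarray([4, 6], [4, 6, 5])   # 1
findsubarray([6, 5], [4, 6, 5])   # 2
findsubarray([6, 4], [4, 6, 5])   # nothing
```

A boolean test then reduces to `findsubarray(x, y) !== nothing`.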

For the third condition, i.e. the vector [4,6] appears as a sub-vector of [4,6,5], the following function is suggested:
issubvec(v,big) =
    any([v == slice(big,i:(i+length(v)-1)) for i=1:(length(big)-length(v)+1)])
For the second condition, that is, to get a boolean for each element of the vector els that appears in the vector set, the following is suggested:
function vecin(els,set)
    res = zeros(Bool,size(els))
    res[findin(els,set)]=true
    res
end
With the vector in the OP, these result in:
julia> vecin([4,6],[4,6,5])
2-element Array{Bool,1}:
true
true
julia> issubvec([4,6],[4,6,5])
true

Note that you can now vectorize in with a dot:
julia> in([4,6,5]).([4, 6])
2-element BitArray{1}:
true
true
and chain with all to get your answer:
julia> all(in([4,6,5]).([4, 6]))
true
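As a possible refinement, all also accepts a predicate as its first argument, so the broadcast (and the temporary Bool array it builds) can be skipped. A minimal sketch with made-up inputs:

```julia
haystack = [4, 6, 5]

# all(f, itr) stops at the first element for which f returns false
all_present = all(in(haystack), [4, 6])     # true
some_missing = all(in(haystack), [4, 7])    # false
```

Like issubset, this only checks membership, not order or adjacency.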

I used this recently to find subsequences in arrays of integers. It's not as good or as fast as @scott's subset2(x,y)... but it returns the indices.
function findsequence(arr::Array{Int64}, seq::Array{Int64})
    indices = Int64[]
    i = 1
    n = length(seq)
    if n == 1
        while true
            occurrence = findnext(arr, seq[1], i)
            if occurrence == 0
                break
            else
                push!(indices, occurrence)
                i = occurrence +1
            end
        end
    else
        while true
            occurrence = Base._searchindex(arr, seq, i)
            if occurrence == 0
                break
            else
                push!(indices, occurrence)
                i = occurrence +1
            end
        end
    end
    return indices
end
julia> @time findsequence(rand(1:9, 1000), [2,3])
0.000036 seconds (29 allocations: 8.766 KB)
16-element Array{Int64,1}:
80
118
138
158
234
243
409
470
539
589
619
629
645
666
762
856

Here is a more up-to-date implementation using findall:
function issubsequence(A, B)
    B1inA = findall(isequal(B[1]), A) # indices where the first element of B occurs in A
    matchFirstIndex = [] # saves the first index in A of each occurrence
    for i in B1inA
        if length(A[i:end]) < length(B) continue end
        if A[i:i + length(B) - 1] == B
            push!(matchFirstIndex, i)
        end
    end
    return matchFirstIndex
end
I get a similar runtime to @daycaster:
@time issubsequence(rand(1:9, 1000), [2,3])
0.000038 seconds (111 allocations: 20.844 KiB)
7-element Vector{Any}:
57
427
616
644
771
855
982

There is no standard Julia library function to determine if a particular sequence occurs as a subsequence of another. Probably this is because this is actually a fairly tricky problem (known as string searching), and quite how to do it depends on whether you'll be searching repeatedly, whether you want to do multiple matches, have multiple patterns, want fuzzy matches, etc.
The other answers here give reasonable answers but some are old and Julia has improved, and I wanted to offer a slightly more idiomatic solution.
function issubarray(needle, haystack)
    getView(vec, i, len) = view(vec, i:i+len-1)
    ithview(i) = getView(haystack, i, length(needle))
    return any(i -> ithview(i) == needle, 1:length(haystack)-length(needle)+1)
end
This is lightning fast and requires almost no memory - Julia's view is lightweight and efficient. And, as always with Julia, the solution is generally to simply define more functions.
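The same view-based window comparison can be combined with findall to collect every starting position instead of a single boolean. This is only a sketch; the name findallsubarrays is made up for the example.

```julia
# Hypothetical helper: all start indices at which `needle` occurs in `haystack`
function findallsubarrays(needle, haystack)
    n = length(needle)
    # The range is empty when the needle is longer than the haystack,
    # in which case findall returns an empty vector
    return findall(i -> view(haystack, i:i+n-1) == needle,
                   1:length(haystack)-n+1)
end

positions = findallsubarrays([2, 3], [1, 2, 3, 2, 3])   # [2, 4]
```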

Related

Array subsetting in Julia

With the Julia Language, I defined a function to sample points uniformly inside the sphere of radius 3.14 using rejection sampling as follows:
function spherical_sample(N::Int64)
    # generate N points uniformly distributed inside sphere
    # using rejection sampling:
    points = pi*(2*rand(5*N,3).-1.0)
    ind = sum(points.^2,dims=2) .<= pi^2
    ## ideally I wouldn't have to do this:
    ind_ = dropdims(ind,dims=2)
    return points[ind_,:][1:N,:]
end
I found a hack for subsetting arrays:
ind = sum(points.^2,dims=2) .<= pi^2
## ideally I wouldn't have to do this:
ind_ = dropdims(ind,dims=2)
But, in principle array indexing should be a one-liner. How could I do this better in Julia?
The problem is that you are creating a 2-dimensional (N×1) index array rather than a vector. You can avoid it by using eachrow:
ind = sum.(eachrow(points.^2)) .<= pi^2
So that your full answer would be:
function spherical_sample(N::Int64)
    points = pi*(2*rand(5*N,3).-1.0)
    ind = sum.(eachrow(points.^2)) .<= pi^2
    return points[ind,:][1:N,:]
end
Here is a one-liner:
points[(sum(points.^2,dims=2) .<= pi^2)[:],:][1:N, :]
Note that [:] flattens the single-column BitArray into a vector, so it can be used for row indexing.
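Another way to get a one-dimensional mask, sketched here on a tiny hand-made matrix, is vec, which reshapes the single-column result into a vector. Passing abs2 to sum also avoids allocating the points.^2 temporary.

```julia
# Three sample rows: the first and third lie inside the ball of radius π,
# the second lies outside (its squared norm is 27 > π^2)
points = [0.0 0.0 0.0;
          3.0 3.0 3.0;
          1.0 2.0 1.0]

# sum(abs2, ...; dims=2) yields a 3×1 matrix; vec turns the comparison
# result into a length-3 Bool vector usable as a row mask
mask = vec(sum(abs2, points; dims=2) .<= pi^2)
inside = points[mask, :]
```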
This does not answer your question directly (as you already got two suggestions), but I rather thought to hint how you could implement the whole procedure differently if you want it to be efficient.
The first point is to avoid generating 5*N rows of data: it is wasteful, and it still does not guarantee N valid samples. The probability of a valid sample in your model is ~50%, so it is possible (if unlikely for large N) that there will not be enough points to choose from, in which case the [1:N, :] selection will throw an error.
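The ~50% figure can be checked from geometry: it is the volume of the ball divided by the volume of the cube it is sampled from. A short sketch of that arithmetic (variable names are chosen for this example):

```julia
# Ratio of the ball's volume to the volume of its bounding cube
r = π
p_accept = (4 / 3 * π * r^3) / (2r)^3   # simplifies to π/6, independent of r

# Average number of uniform draws needed per accepted point
draws_per_sample = 1 / p_accept

println(round(p_accept, digits = 4))          # 0.5236
println(round(draws_per_sample, digits = 2))  # 1.91
```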
Below is the code I would use that avoids this problem:
using Random # provides rand!

function spherical_sample(N::Integer) # no need to require Int64 only here
    points = pi .* (2 .* rand(N, 3) .- 1.0) # uniform in [-pi, pi]; all operations are vectorized to avoid excessive allocations
    while N > 0 # we will run the code until we have N valid rows
        v = @view points[N, :] # use view to avoid allocating
        if sum(x -> x^2, v) <= pi^2 # sum accepts a transformation function as a first argument
            N -= 1 # row is valid - move to the previous one
        else
            rand!(v) # row is invalid - resample it in place
            @. v = pi * (2 * v - 1.0) # again - do the computation in place via broadcasting
        end
    end
    return points
end
This one is pretty fast, and uses StaticArrays. You can probably also implement something similar with ordinary tuples:
using StaticArrays
function sphsample(N)
    T = SVector{3, Float64}
    v = Vector{T}(undef, N)
    n = 1
    while n <= N
        p = rand(T) .- 0.5
        @inbounds v[n] = p .* 2π
        n += (sum(abs2, p) <= 0.25)
    end
    return v
end
On my laptop it is ~9x faster than the solution with views.

Reducer for non-parallel for loops/multiline comprehensions

Julia has a parallel macro for for loops, which allows things like:
s = @sync @parallel vcat for i in 1:9
    k = iseven(i) ? i÷2 : 3i+1
    k^2
end
and since the reducer specified is vcat, we get back an array of numbers.
Is it possible to do something like this with a normal for loop (without having to explicitly initialize and push! into the array)?
Since I'm only looking to reduce using vcat, another way to ask this question is: is there a neat, readable multiline form of array comprehensions? It's possible to stretch the usual comprehension syntax like this:
s = [
    (k = iseven(i) ? i÷2 : 3i+1;
    k^2)
    for i in 1:9
]
but that seems messy and less readable compared to the @parallel vcat for syntax. Is there a better way of doing multiline comprehensions?
Extending on @Gnimuc's answer, I think mapreduce plus do-syntax is pretty nice:
julia> mapreduce(vcat, 1:9) do i
           k = iseven(i) ? i÷2 : 3i+1
           k^2
       end
9-element Array{Int64,1}:
16
1
100
4
256
9
484
16
784
The short answer is to write multiline functions (or do-blocks, as @phg reminds) with a single-line array comprehension or map/mapreduce:
s = [
    (k = iseven(i) ? i÷2 : 3i+1;
    k^2)
    for i in 1:9
]
This example is a pure comprehension; no reducer is involved. Array comprehensions are usually written in one line, for example, s = [(iseven(i) ? i÷2 : 3i+1) |> x->x^2 for i in 1:9]. The parentheses are needed because |> binds more tightly than the ternary operator. As @phg suggested, multi-line functions can be enclosed in a do-block:
julia> map(1:9) do x
           k = iseven(x) ? x÷2 : 3x+1
           k^2
       end
No reducer such as vcat is needed in this case. However, if the output of f in the above example is a vector:
julia> function f(x)
           k = iseven(x) ? x÷2 : 3x+1
           [k^2]
       end
f (generic function with 1 method)
julia> s = [f(i) for i in 1:9]
9-element Array{Array{Int64,1},1}:
[16]
[1]
[100]
[4]
[256]
[9]
[484]
[16]
[784]
array comprehension will give you an array of vectors. This time you need to use mapreduce instead:
julia> mapreduce(f, vcat, 1:9)
9-element Array{Int64,1}:
16
1
100
4
256
9
484
16
784
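For the multiline comprehension asked about in the question, one more option worth sketching is a begin ... end block as the comprehension body, which avoids both the parenthesised `;` form and a separate function:

```julia
# The value of the begin block (its last expression) becomes the element
s = [begin
         k = iseven(i) ? i ÷ 2 : 3i + 1
         k^2
     end for i in 1:9]
```

This produces the same nine values as the map and mapreduce versions shown earlier.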

Killing a For loop in Julia array comprehension

I have the following line of code in Julia:
X=[(i,i^2) for i in 1:100 if i^2%5==0]
Basically, it returns a list of tuples (i,i^2) for i from 1 to 100 whenever the remainder of i^2 divided by 5 is zero. What I want to do is, in the array comprehension, break out of the for loop if i^2 becomes larger than 1000. However, if I implement
X=[(i,i^2) for i in 1:100 if i^2%5==0 else break end]
I get the error: syntax: expected "]".
Is there any way to easily break out of this for loop inside the array? I've tried looking online, but nothing came up.
It's a "fake" for-loop, so you can't break it. Take a look at the lowered code below:
julia> foo() = [(i,i^2) for i in 1:100 if i^2%5==0]
foo (generic function with 1 method)
julia> @code_lowered foo()
LambdaInfo template for foo() at REPL[0]:1
:(begin
nothing
#1 = $(Expr(:new, :(Main.##1#3)))
SSAValue(0) = #1
#2 = $(Expr(:new, :(Main.##2#4)))
SSAValue(1) = #2
SSAValue(2) = (Main.colon)(1,100)
SSAValue(3) = (Base.Filter)(SSAValue(1),SSAValue(2))
SSAValue(4) = (Base.Generator)(SSAValue(0),SSAValue(3))
return (Base.collect)(SSAValue(4))
end)
The output shows that array comprehension is implemented via Base.Generator which takes an iterator as input. It only supports the [if cond(x)::Bool] "guard" for now, so there is no way to use break here.
For your specific case, a workaround is to use isqrt:
julia> X=[(i,i^2) for i in 1:isqrt(1000) if i^2%5==0]
6-element Array{Tuple{Int64,Int64},1}:
(5,25)
(10,100)
(15,225)
(20,400)
(25,625)
(30,900)
I don't think so. You could always just move the whole condition into a helper function:
tmp(i) = (j = i^2; j > 1000 ? false : j%5==0)
X=[(i,i^2) for i in 1:100 if tmp(i)]
Using a for loop is considered idiomatic in Julia and could be more readable in this instance. Also, it could be faster.
Specifically:
julia> using BenchmarkTools
julia> tmp(i) = (j = i^2; j > 1000 ? false : j%5==0)
julia> X1 = [(i,i^2) for i in 1:100 if tmp(i)];
julia> @btime [(i,i^2) for i in 1:100 if tmp(i)];
471.883 ns (7 allocations: 528 bytes)
julia> X2 = [(i,i^2) for i in 1:isqrt(1000) if i^2%5==0];
julia> @btime [(i,i^2) for i in 1:isqrt(1000) if i^2%5==0];
281.435 ns (7 allocations: 528 bytes)
julia> function goodsquares()
           res = Vector{Tuple{Int,Int}}()
           for i=1:100
               if i^2%5==0 && i^2<=1000
                   push!(res,(i,i^2))
               elseif i^2>1000
                   break
               end
           end
           return res
       end
julia> X3 = goodsquares();
julia> @btime goodsquares();
129.123 ns (3 allocations: 304 bytes)
So, another 2x improvement is nothing to disregard and the long function gives plenty of room for illuminating comments.
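On newer Julia versions (1.4 and later), Iterators.takewhile offers a way to stop early while keeping the comprehension form. This is a sketch of that alternative and was not benchmarked against the versions above:

```julia
# takewhile stops iterating at the first i for which the predicate is false,
# so no values of i beyond 32 are ever visited
X = [(i, i^2) for i in Iterators.takewhile(i -> i^2 <= 1000, 1:100) if i^2 % 5 == 0]
```

The result matches the isqrt workaround, but the stopping condition can be any predicate rather than one that has to be inverted by hand.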

ruby - Faster prime gap search

I'm working on creating a method, gap(g, m, n).
All 3 parameters are integers.
gap finds the first 2 consecutive prime numbers, between m and n, that have a difference of g.
For example, gap(2, 1, 10) => [3, 5]
In the range from m to n, 1..10,
the first 2 consecutive prime numbers with a gap of 2 is [3,5].
If instead, it was gap(1, 1, 10) => [2,3]
and if it was gap(6, 1, 10) => nil
https://repl.it/BbGo/1
# method to check if a number is prime
def prime?(num)
  return false if num < 2 # 0 and 1 are not prime
  (2..Math.sqrt(num).floor).each do |m|
    if num % m == 0
      return false
    end
  end
  true
end
This method works by iterating through each number from 2, the smallest prime, to the square root of the parameter, checking to see if the parameter is evenly divisible by anything in that range. If it is, the method returns false.
# gap method
def gap(g, m, n)
  if g.odd? && g > 1
    return nil
  end
  primes = (m..n).select do |num|
    num.odd? && prime?(num)
  end
  first = primes[0..-2].find_index do |x|
    primes[primes.index(x) + 1] - x == g
  end
  [
    primes[first],
    primes[first+1]
  ] unless first.nil?
end
gap(2, 10000000, 11000000)
Every prime greater than 2 is odd, so the gap between consecutive primes is always even. The only exception is the gap of 1 between 2 and 3.
So if the gap argument, g, is a number like 3, which is both odd and greater than 1, the method immediately returns nil because no such prime gap exists.
Problem
My issue is that the method is too slow. By using the replit link above, you get a time of about 20 seconds. Apparently it is possible to get it to about 1 second.
I've tried optimizing by filtering out the even numbers already from m..n, which helped. But I'm just not sure how I can get this to go even faster.
What I'm thinking is before finding out every single prime number in m..n, I should check each iteration if the gap is correct, so once I find it I can just terminate the method without looking at unnecessary primes, but I'm not sure how to implement this.
Thanks for the help, and any general criticism of my code is welcome as well.
Answer using Jordan's advice:
def gap(g, m, n)
  if g.odd? && g > 1
    return nil
  end
  recent = 0
  current = 0
  (m..n).each do |num|
    if num.odd? && prime?(num)
      current = num
      if current - recent == g
        break
      else
        recent = current
      end
    end
  end
  [recent, current] unless current - recent != g
end
gap(2, 10000000, 11000000)
#=> [10000139, 10000141]
completes in 3 ms.
https://repl.it/BbGo/3
Thanks!

Vector search Algorithm

I have the following problem. Say I have a vector:
v = [1,2,3,4,5,1,2,3,4,...]
I want to sequentially sample points from the vector that have an absolute magnitude difference of at least a threshold from the previously sampled point. So say my threshold is 2.
I start at the index 1, and sample the first point 1. Then my condition is met at v[3], and I sample 3 (since 3-1 >= 2). Then 3, the new sampled point becomes the reference, that I check against. The next sampled point is 5 which is v[5] (5-3 >= 2). Then the next point is 1 which is v[6] (abs(1-5) >= 2).
Unfortunately, my code in R is taking too long. Basically, I am scanning the array repeatedly and looking for matches. I think that this approach is naive, though. I have a feeling that I can accomplish this task in a single pass through the array, but I don't know how. Any help appreciated. I guess the problem I am running into is that the location of the next sample point can be anywhere in the array, and I need to scan the array from the current point to the end to find it.
Thanks.
I don't see a way this can be done without a loop, so here is one:
my.sample <- function(x, thresh) {
  out <- x
  i <- 1
  for (j in seq_along(x)[-1]) {
    if (abs(x[i]-x[j]) >= thresh) {
      i <- j
    } else {
      out[j] <- NA
    }
  }
  out[!is.na(out)]
}
my.sample(x = c(1:5,1:4), thresh = 2)
# [1] 1 3 5 1 3
You can do this without a loop using a bit of recursion:
vsearch = function(v, x, fun=NULL) {
  # v: input vector
  # x: threshold level
  if (!length(v) > 0) return(NULL)
  y = v-rep(v[1], times=length(v))
  if (!is.null(fun)) y = fun(y)
  i = which(y >= x)
  if (!length(i) > 0) return(NULL)
  i = i[1]
  return(c(v[i], vsearch(v[-(1:(i-1))], x, fun=fun)))
}
With your vector above:
> vsearch(c(1,2,3,4,5,1,2,3,4), 2, abs)
[1] 3 5 1 3
