Using Julia, I want to select the first x rows of an array per group.
In the following example, I want the first two rows where the second column is equal to 1.0, then the first two rows where the second column is equal to 2.0, etc.
XX = [repeat([1.0], 6) vcat(repeat([1.0], 3), repeat([2.0], 3))]
XX2 = [repeat([2.0], 6) vcat(repeat([3.0], 3), repeat([4.0], 3))]
beg = [XX;XX2]
> 12×2 Matrix{Float64}:
> 1.0 1.0
> 1.0 1.0
> 1.0 1.0
> 1.0 2.0
> 1.0 2.0
> 1.0 2.0
> 2.0 3.0
> 2.0 3.0
> 2.0 3.0
> 2.0 4.0
> 2.0 4.0
> 2.0 4.0
The final array would look like this:
8×2 Matrix{Float64}:
1.0 1.0
1.0 1.0
1.0 2.0
1.0 2.0
2.0 3.0
2.0 3.0
2.0 4.0
2.0 4.0
I use the following code, but is there a simpler and more efficient way to do this, ideally with a single function?
x = []
for val in unique(beg[:,2])
x = append!(x, findfirst(beg[:,2].==val))
end
idx = sort([x; x.+1])
final = beg[idx, :]
Assuming your data:
is sorted (i.e. groups form contiguous blocks)
each group is guaranteed to have at least two elements
(your code assumes both)
then you can generate the idx filter you want as follows:
idx = [i for i in axes(beg, 1) if i < 3 || beg[i, 2] != beg[i-1, 2] || beg[i, 2] != beg[i-2, 2]]
If either assumption does not hold, please comment and I can show a more general solution.
EDIT
Here is an example without using any external packages:
julia> using Random
julia> XX = [repeat([1.0], 6) vcat(repeat([1.0], 3), repeat([2.0], 3))]
6×2 Matrix{Float64}:
1.0 1.0
1.0 1.0
1.0 1.0
1.0 2.0
1.0 2.0
1.0 2.0
julia> XX2 = [repeat([2.0], 7) vcat(repeat([3.0], 3), repeat([4.0], 3), 5.0)] # last group has length 1
7×2 Matrix{Float64}:
2.0 3.0
2.0 3.0
2.0 3.0
2.0 4.0
2.0 4.0
2.0 4.0
2.0 5.0
julia> beg = [XX;XX2][randperm(13), :] # shuffle groups so they are not in order
13×2 Matrix{Float64}:
2.0 3.0
1.0 2.0
2.0 4.0
2.0 3.0
2.0 4.0
2.0 5.0
2.0 3.0
1.0 2.0
1.0 2.0
1.0 1.0
1.0 1.0
2.0 4.0
1.0 1.0
julia> x = Dict{Float64, Vector{Int}}() # this will store indices per group
Dict{Float64, Vector{Int64}}()
julia> for (i, v) in enumerate(beg[:, 2]) # collect the indices
push!(get!(x, v, Int[]), i)
end
julia> x
Dict{Float64, Vector{Int64}} with 5 entries:
5.0 => [6]
4.0 => [3, 5, 12]
2.0 => [2, 8, 9]
3.0 => [1, 4, 7]
1.0 => [10, 11, 13]
julia> idx = sort!(mapreduce(x -> first(x, 2), vcat, values(x))) # get first two indices per group in ascending order
9-element Vector{Int64}:
1
2
3
4
5
6
8
10
11
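The index-collecting idea above can also be wrapped into a small reusable helper. What follows is only a sketch: the name first_k_per_group and its arguments are hypothetical, and it swaps the dictionary of index vectors for a per-group counter, so rows are visited once and keep their original order. It needs neither sorted data nor a minimum group size.

```julia
# Hypothetical helper (not part of Base): keep the first `k` rows of `A`
# for each distinct value found in column `col`, preserving row order.
function first_k_per_group(A::AbstractMatrix, col::Integer, k::Integer)
    counts = Dict{eltype(A), Int}()   # rows kept so far, per group value
    idx = Int[]                       # row indices to keep
    for i in axes(A, 1)
        seen = get(counts, A[i, col], 0)
        if seen < k
            push!(idx, i)
            counts[A[i, col]] = seen + 1
        end
    end
    return A[idx, :]
end

# Data from the question
XX  = [repeat([1.0], 6) vcat(repeat([1.0], 3), repeat([2.0], 3))]
XX2 = [repeat([2.0], 6) vcat(repeat([3.0], 3), repeat([4.0], 3))]
beg = [XX; XX2]

result = first_k_per_group(beg, 2, 2)
```

Here result should be the 8×2 matrix requested in the question; groups shorter than k are simply kept whole.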
Say I have the following function
function foo(p :: Int, x :: Real)
return p*x
end
And I want to call it for the following arrays:
P = [1, 2, 3]
X = [5.0, 6.0, 7.0, 8.0]
If I want to call foo and store the values in a single array of size (length(P), length(X)), I could just loop over both P and X:
R = zeros(length(P), length(X))
for i in 1:length(P), j in 1:length(X)
R[i, j] = foo(P[i], X[j])
end
However, is there another way to do this while explicitly avoiding the double loop?
Maybe something like?
R = foo.(collect(zip(P, X)))
Of course, the above does not work as foo cannot handle ::Tuple{Int64, Float64}, but I have not found a way to broadcast over different sized arrays for the moment. Any tips would be appreciated, thanks!
If you want to use broadcasting you can do it like this:
julia> foo.(P, permutedims(X))
3×4 Matrix{Float64}:
5.0 6.0 7.0 8.0
10.0 12.0 14.0 16.0
15.0 18.0 21.0 24.0
or
julia> foo.(P, reshape(X, 1, :))
3×4 Matrix{Float64}:
5.0 6.0 7.0 8.0
10.0 12.0 14.0 16.0
15.0 18.0 21.0 24.0
or
julia> (((x, y),) -> foo(x, y)).(Iterators.product(P, X))
3×4 Matrix{Float64}:
5.0 6.0 7.0 8.0
10.0 12.0 14.0 16.0
15.0 18.0 21.0 24.0
or
julia> Base.splat(foo).(Iterators.product(P, X))
3×4 Matrix{Float64}:
5.0 6.0 7.0 8.0
10.0 12.0 14.0 16.0
15.0 18.0 21.0 24.0
Note that adjoint (') will not work here in general, as it is recursive:
julia> x = ["a", "b"]
2-element Vector{String}:
"a"
"b"
julia> permutedims(x)
1×2 Matrix{String}:
"a" "b"
julia> x'
1×2 adjoint(::Vector{String}) with eltype Union{}:
Error showing value of type LinearAlgebra.Adjoint{Union{}, Vector{String}}:
ERROR: MethodError: no method matching adjoint(::String)
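As a further alternative to the broadcasting forms above, a two-dimensional comprehension builds the same matrix without an explicit nested loop. This is a minimal sketch that reuses foo, P and X from the question.

```julia
# `foo`, `P` and `X` as defined in the question
foo(p::Int, x::Real) = p * x

P = [1, 2, 3]
X = [5.0, 6.0, 7.0, 8.0]

# The first iterator runs along rows, the second along columns,
# so R has size (length(P), length(X)).
R = [foo(p, x) for p in P, x in X]
```

The comprehension leaves foo untouched and makes the row/column roles of P and X explicit.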
Is there a np.nanquantile equivalent in Julia? I have a 2D array and calculate a quantile along one axis but the array contains NaN-values. My current code block:
using Statistics
quantiles = Array{Float32}(undef, size(array, 1), 2)
p = 0.1
quantiles[:, 1] = mapslices(x -> quantile(x, p), array, dims = 2)
quantiles[:, 2] = mapslices(x -> quantile(x, 1 - p), array, dims = 2)
The simplest thing to do is to use the following:
x -> quantile(filter(!isnan, x), p)
e.g.
julia> array = [1 NaN 3 4
NaN 2 3 4]
2×4 Matrix{Float64}:
1.0 NaN 3.0 4.0
NaN 2.0 3.0 4.0
julia> mapslices(x -> quantile(filter(!isnan, x), 0.5), array, dims = 2)
2×1 Matrix{Float64}:
3.0
3.0
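The filtering idea above can be folded into a small helper and applied to both quantiles from the question. This is a sketch under assumptions: nanquantile is a hypothetical name (Statistics does not provide a function with that name), and returning NaN for an all-NaN slice is a design choice made here, because quantile raises an error on an empty collection.

```julia
using Statistics

# Hypothetical helper: quantile of `v` ignoring NaN entries.
# Returns NaN when nothing is left after filtering (assumed behaviour).
function nanquantile(v, p)
    w = filter(!isnan, v)
    return isempty(w) ? NaN : quantile(w, p)
end

array = [1   NaN 3   4
         NaN 2   3   4
         NaN NaN NaN NaN]   # last row is all NaN
p = 0.1

# One row per row of `array`: column 1 holds the p quantile,
# column 2 holds the (1 - p) quantile.
quantiles = hcat(
    vec(mapslices(v -> nanquantile(v, p), array; dims = 2)),
    vec(mapslices(v -> nanquantile(v, 1 - p), array; dims = 2)),
)
```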
I want to use a for loop so that each iteration fills a pair of columns: when i = 1 it should fill the 1st and 2nd columns, and when i = 2 the 3rd and 4th columns.
using Random
m = [2 -3; 4 6]
al = [1 3; -2 -4]
l = [1 0; 2 4]
Random.seed!(1234)
d = [rand(2:20, 10) rand(-1:10, 10)]
Random.seed!(1234)
c = [rand(0:30, 10) rand(-1:20, 10)]
n,g=size(w)
mx=zeros(n,2*g)
for i =1:g
mx[ : ,i:i+1] = m[:,i]' .+ (d[:,i]' .* al[:,i])' .+ (l[:,i]' .* c[:,i])
end
return mx
I got the following, which is wrong when compared with doing the process manually:
julia> mx
10×4 Array{Float64,2}:
8.0 -6.0 10.0 0.0
22.0 3.0 18.0 0.0
40.0 18.0 38.0 0.0
9.0 9.0 22.0 0.0
51.0 3.0 18.0 0.0
10.0 18.0 34.0 0.0
52.0 12.0 30.0 0.0
36.0 21.0 42.0 0.0
44.0 24.0 42.0 0.0
33.0 24.0 42.0 0.0
The first two and the last two columns of mx should match the following results, in the same order:
mx1_2=m[:,1]' .+ (d[:,1]' .* al[:,1])' .+ (l[:,1]' .* c[:,1])
mx2_4=m[:,2]' .+ (d[:,2]' .* al[:,2])' .+ (l[:,2]' .* c[:,2])
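One likely cause of the mismatch is the column range: i:i+1 gives columns 1:2 for i = 1 but 2:3 for i = 2, so the second iteration overwrites column 2 and never reaches column 4 (which is why it stays at zero in the output above). A sketch of a fix uses the range 2i-1:2i instead. It assumes that the undefined w in the question has the same size as d, and it drops the top-level return.

```julia
using Random

m  = [2 -3; 4 6]
al = [1 3; -2 -4]
l  = [1 0; 2 4]
Random.seed!(1234)
d = [rand(2:20, 10) rand(-1:10, 10)]
Random.seed!(1234)
c = [rand(0:30, 10) rand(-1:20, 10)]

n, g = size(d)        # assumption: `w` in the question has the size of `d`
mx = zeros(n, 2g)
for i in 1:g
    cols = (2i - 1):(2i)   # i = 1 -> columns 1:2, i = 2 -> columns 3:4
    mx[:, cols] = m[:, i]' .+ (d[:, i]' .* al[:, i])' .+ (l[:, i]' .* c[:, i])
end
```

With this range, each 10×2 block produced by the right-hand side lands in its own pair of columns.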
I want to assign some computation into a pair of arrays, with the top portion going into array x, and the bottom portion going into y. I attempted the following, but neither x nor y were updated:
x = zeros(2)
y = zeros(3)
[x;y] .= [1.2, 4.5, 2.3, 4.5, 5.6]
In general [1], the .= operator just assigns into whatever the left-hand side evaluates to, and in this case the result is a brand new array with the contents of x and y vertically concatenated. You can see that [x; y] creates a new array decoupled from x and y by trying it by itself:
x = zeros(2)
y = zeros(3)
r = [x;y]
r[1] = 1
julia> r
5-element Array{Float64,1}:
1.0
0.0
0.0
0.0
0.0
julia> x
2-element Array{Float64,1}:
0.0
0.0
julia> y
3-element Array{Float64,1}:
0.0
0.0
0.0
julia> r .= [1.2, 4.5, 2.3, 4.5, 5.6] # just changes `r`, not `x` or `y`
5-element Array{Float64,1}:
1.2
4.5
2.3
4.5
5.6
julia> all(iszero, x) && all(iszero, y)
true
Now, you can update x and y if they're put into a special "lazy" container from LazyArrays.jl that emulates a concatenation operation:
julia> using LazyArrays
julia> ApplyArray(vcat, x, y) .= [1.2, 4.5, 2.3, 4.5, 5.6]
5-element ApplyArray{Float64,1,typeof(vcat),Tuple{Array{Float64,1},Array{Float64,1}}}:
1.2
4.5
2.3
4.5
5.6
julia> x
2-element Array{Float64,1}:
1.2
4.5
julia> y
3-element Array{Float64,1}:
2.3
4.5
5.6
[1] There's one important exception to this general rule: we support indexed assignment with multiple selected indices in combination with .= to update the original array. For example, the syntax y[1:2] .= [3.4, 5.6] will indeed update the first two elements of y, even though y[1:2] elsewhere will allocate a brand new 2-element array decoupled from y. That is, when you use indexing on the left-hand side of .=, it automatically uses a view when necessary.
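If adding a package is not desirable, the same effect can be had by assigning each target array from its own slice of the source. The sketch below shows this together with the indexed-assignment exception from the footnote; vals and z are names introduced here purely for illustration.

```julia
x = zeros(2)
y = zeros(3)
vals = [1.2, 4.5, 2.3, 4.5, 5.6]   # illustrative name for the computed values

# Fill each target in place from the matching slice of `vals`.
nx = length(x)
x .= view(vals, 1:nx)
y .= view(vals, (nx + 1):length(vals))

# Indexed assignment with .= writes into the original array.
z = zeros(3)
z[1:2] .= [3.4, 5.6]
```

After this runs, x and y hold the top and bottom portions of vals, and only the first two elements of z have changed.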
Here is a question related to a previous question of mine, which I prefer to submit as a new question. Suppose this time we have only the following 2 arrays in Julia:
[5.0 3.5
6.0 3.6
7.0 3.0]
and
[5.0 4.5
6.0 4.7
8.0 3.0]
I want to obtain an array that holds the difference between elements of the second column (the first array minus the second array, in that order), but only for values of the first column that are common to both arrays. The resulting array must then be the following:
[5.0 -1
6.0 -1.1]
How can this last array be computed in Julia?
Assume:
x = [5.0 3.5
6.0 3.6
7.0 3.0]
y = [5.0 4.5
6.0 4.7
8.0 3.0]
Again there are many ways to do it. Using DataFrames you can write:
using DataFrames
df = innerjoin(DataFrame(x, [:id, :x]), DataFrame(y, [:id, :y]), on=:id)
df = [df.id df.x-df.y]
## 2×2 Matrix{Float64}:
## 5.0 -1.0
## 6.0 -1.1
You could also convert original arrays to dictionaries and work with them:
dx = Dict(x[i,1] => x[i,2] for i in 1:size(x, 1))
dy = Dict(y[i,1] => y[i,2] for i in 1:size(y, 1))
ks = sort!(collect(intersect(keys(dx), keys(dy))))
[ks [dx[k]-dy[k] for k in ks]]
## 2×2 Matrix{Float64}:
## 5.0 -1.0
## 6.0 -1.1
The difference between those two methods is how they handle duplicates in the first column of either x or y. The first will produce all combinations; the second will store only the last value for each key.
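The dictionary behaviour is easy to check on a small input with a repeated key. The array xdup below is made up for this illustration.

```julia
# Made-up input in which the key 5.0 appears twice
xdup = [5.0 3.5
        5.0 9.9
        6.0 3.6]

# Later pairs overwrite earlier ones that share the same key
dx = Dict(xdup[i, 1] => xdup[i, 2] for i in axes(xdup, 1))
```

Here dx ends up with two entries, and the value stored for 5.0 comes from the second row.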
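A solution without DataFrames.jl, which compares rows position by position and therefore assumes both arrays have the same number of rows and that common keys sit at the same row index, is
julia> idx = findall(x[:,1] .== y[:,1]) # row positions where the first columns match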
2-element Vector{Int64}:
1
2
julia> [x[idx,1] (x-y)[idx,2]]
2×2 Matrix{Float64}:
5.0 -1.0
6.0 -1.1
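When the common keys are not at the same row positions, a position-independent lookup with indexin from Base can be used instead. This is a sketch, assuming keys are unique within y; the row order of y is shuffled here on purpose to show that position no longer matters.

```julia
x = [5.0 3.5
     6.0 3.6
     7.0 3.0]
y = [8.0 3.0      # same rows as in the question, in a different order
     6.0 4.7
     5.0 4.5]

# For each key of x: the row of y holding that key, or `nothing` if absent
pos = indexin(x[:, 1], y[:, 1])

keep = findall(!isnothing, pos)     # rows of x whose key also occurs in y
ypos = Int[pos[i] for i in keep]    # matching rows of y, as plain integers

result = [x[keep, 1] x[keep, 2] .- y[ypos, 2]]
```

The rows of result follow the order of x, with the first column holding the common keys and the second the differences.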