apply sum along subsets of array 3rd dimension - arrays

I have the following objects:
A: 1 array with x,y,z, dimensions -> containing a variable (Temperature)
B & C: 2 arrays with x,y dimensions -> containing the indices of vectors along A's z dimension
A <- array(rnorm(n = 4*3*5), dim = c(4,3,5))
dimnames(A) <- list("x" = c(1:4), "y" = c(1:3), "z" = c(1:5))
B <- matrix(rep(c(2:1), 6), nrow = 4)
dimnames(B) <- list("x" = c(1:4), "y" = c(1:3))
C <- matrix(rep(c(4:5), 6), nrow = 4)
dimnames(C) <- list("x" = c(1:4), "y" = c(1:3))
I'm looking for a way to apply sum of A across the z dimension only between the indices indicated by B and C.
If instead of the 3D-array I had a vector I would solve it like this:
> A <- round(c(rnorm(5)), 1)
> B <- 2 #index of first value to sum
> C <- 4 #index of last value to sum
> vindex <- seq(B,C,1)
> A
[1] 0.0 -0.9 -1.1 -1.7 -0.4
> vindex
[1] 2 3 4
> sum(A[vindex])
[1] -3.7
>
# or better with a function
> foo <- function(x, start_idx, end_idx) {
+ vidx <- seq(start_idx, end_idx, 1)
+ return(sum(x[vidx]))
+ }
>
> foo(A,B,C)
[1] -3.7
Unfortunately seq() does not accept vectors as its from/to arguments, so it is not straightforward to use this function with apply. Going back to the 3D array A[x,y,z] and the matrices B[x,y] and C[x,y]:
> apply(A,c(1,2),foo,B,C)
Error in seq.default(start_idx, end_idx, 1) : 'from' must be of length 1
Called from: seq.default(start_idx, end_idx, 1)
It would be great if anybody knew how to make this function workable with apply or with other clean solutions.
Thanks a lot!

This is not a very nice task for base R, and I would prefer to implement it in C++ in the absence of a package that already does so (?).
Logically speaking, a plain but vectorized solution to your problem could be structured as:
# initialize index array
D <- array(
1,
dim = c(4,3,5),
dimnames = list(x = letters[1:4], y = letters[1:3], z = letters[1:5])
)
# set indices out of bounds to zero
E <- rep(1:5, each = 4*3)
BB <- rep(B, times = 5)
D[E < BB] <- 0
CC <- rep(C, times = 5)
D[E > CC] <- 0
# multiply with index array and sum
apply(A * D, c(1,2), sum)
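A variant of the same masking idea, as a minimal sketch: it builds the mask with slice.index() instead of rep(), and sums over z with rowSums(..., dims = 2). It assumes, as in the question, that B and C have the same x,y dimensions as A and that B <= C everywhere; set.seed() and the names z, mask and res are only there for illustration.

```r
set.seed(1)
A <- array(rnorm(4 * 3 * 5), dim = c(4, 3, 5))
B <- matrix(rep(2:1, 6), nrow = 4)  # first z index to include, per (x, y)
C <- matrix(rep(4:5, 6), nrow = 4)  # last z index to include, per (x, y)

# z index of every cell of A, as an array with the same shape as A
z <- slice.index(A, 3)

# as.vector(B) is recycled along z in column-major order, so each cell of z
# is compared with the B (and C) value of its own (x, y) position
mask <- z >= as.vector(B) & z <= as.vector(C)

# zero out everything outside [B, C] and sum over the 3rd dimension
res <- rowSums(A * mask, dims = 2)
res
```

Like the answer above, this touches every cell of A, so it trades some wasted arithmetic for not looping over x and y.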


equivalent of numpy.c_ in julia

Hi I am going through the book https://nnfs.io/ but using JuliaLang (it's a self-challenge to get to know the language better and use it more often.. rather than doing the same old same in Python..)
I have come across a part of the book in which they use a custom-written function, and I need to recreate it in JuliaLang...
source: https://cs231n.github.io/neural-networks-case-study/
python
N = 100 # number of points per class
D = 2 # dimensionality
K = 3 # number of classes
X = np.zeros((N*K,D)) # data matrix (each row = single example)
y = np.zeros(N*K, dtype='uint8') # class labels
for j in range(K):
    ix = range(N*j,N*(j+1))
    r = np.linspace(0.0,1,N) # radius
    t = np.linspace(j*4,(j+1)*4,N) + np.random.randn(N)*0.2 # theta
    X[ix] = np.c_[r*np.sin(t), r*np.cos(t)]
    y[ix] = j
# lets visualize the data:
plt.scatter(X[:, 0], X[:, 1], c=y, s=40, cmap=plt.cm.Spectral)
plt.show()
my julia version so far....
N = 100 # Number of points per class
D = 2 # Dimensionality
K = 3 # Number of classes
X = zeros((N*K, D))
y = zeros(UInt8, N*K)
# See https://docs.julialang.org/en/v1/base/math/#Base.range
for j in range(0,length=K)
ix = range(N*(j), length = N+1)
radius = LinRange(0.0, 1, N)
theta = LinRange(j*4, (j+1)*4, N) + randn(N)*0.2
X[ix] = ????????
end
notice the ??????? area because I am now trying to decipher if Julia has an equivalent for this numpy function
https://numpy.org/doc/stable/reference/generated/numpy.c_.html
Any help is appreciated.. or just tell me if I need to write something myself
numpy.c_ is a special object that provides nice syntax for column concatenation. In Julia this is built into the language, so you can simply do:
julia> a=[1,2,3];
julia> b=[4,5,6];
julia> [a b]
3×2 Matrix{Int64}:
1 4
2 5
3 6
For your case the Julian equivalent of np.c_[r*np.sin(t), r*np.cos(t)] should be:
[r .* sin.(t) r .* cos.(t)]
To understand Python's motivation you can also have a look at this related question:
numpy.r_ is not a function. What is it?
The equivalent of numpy.c_ would seem to be horizontal concatenation, which you can do with either the hcat function or with (e.g.) simply [a b]. Fixing a few other issues with the translation so far, we end up with
N = 100 # Number of points per class
D = 2 # Dimensionality
K = 3 # Number of classes
X = zeros(N*K, D)
y = zeros(UInt8, N*K)
for j in range(0,length=K)
ix = (N*j+1):(N*(j+1))
radius = LinRange(0.0, 1, N)
theta = LinRange(j*4, (j+1)*4, N) + randn(N)*0.2
X[ix,:] .= [radius.*sin.(theta) radius.*cos.(theta)]
y[ix] .= j
end
# visualize the data:
using Plots
scatter(X[:,1], X[:,2], zcolor=y, framestyle=:box)

filtering 3D array using data in similar array

I have two 3-D arrays, one of which contains data and the other contains metadata. The metadata is a date signature, so an example can be produced with the following:
datamatrix <- array(data = c(rep(0,9), rep(0,9),(sample(0:100, 9)/1000), (sample(30:50, 9)/100), (sample(70:80,9)/100), (sample(30:50,9)/100), rep(0,9), rep(0,9)), dim = c(3,3,8))
timematrix <- array(data = c(sample(1:20, 9), sample(30:50, 9), sample(70:90, 9), sample(110:130,9), sample(150:170,9), sample(190:210,9), sample(230:250,9), sample(260:280,9)), dim = c(3,3,8))
I wish to construct a new 3D array filled with the data from the first matrix (datamatrix) and a bunch of NA's such that the element i in the datamatrix falls into its corresponding date (derived from the corresponding metadata in timematrix) in a final workingdata 3D array like so:
workingdata <- array(data = NA,
dim = c(3,3,365))
for (i in 1:length(datamatrix)){
location <- i
locationguide <- location%%9
locationfinal <- locationguide%%3
if (locationfinal == 0){
a <- 3
b <- 3
}
if (locationfinal == 1){
a <- 1
b <- 1
}
if (locationfinal == 2){
a <- 1
b <- 2
}
if (locationfinal == 3){
a <- 1
b <- 3
}
if (locationfinal == 4){
a <- 2
b <- 1
}
if (locationfinal == 5){
a <- 2
b <- 2
}
if (locationfinal == 6){
a <- 2
b <- 3
}
if (locationfinal == 7){
a <- 3
b <- 1
}
if (locationfinal == 8){
a <- 3
b <- 2
}
value <- datamatrix[i]
day <- timematrix[i]
workingdata[a,b,day] <- datamatrix[i]
}
The dataset I'm working with is thousands of columns wide and equivalently long-rowed. The current method does the job, however it would take forever using a for loop in the actual data, and coding it would be ridiculous because of all of the if's this requires. Does anyone know of a better method for filtering data such as this?
For a viewer-friendly concept of what I want, an image from ESRI best sums it up:
http://pro.arcgis.com/en/pro-app/tool-reference/space-time-pattern-mining/GUID-42A31756-6518-41E9-A900-2C892AF4023A-web.png
I'm shooting for a z-dimension for time, with one block per day, where observations fall into their appropriate row on the z-axis but remain in their original locations in the x- and y-dimension.
I'm not sure your for loop does what you intend. For example, check the result of ((1:30)%%9)%%3: it only ever yields 0, 1 or 2, so the branches for 3 through 8 are never reached.
But, from your description of the problem, you might want to do something like this:
workingdata <- array(data = NA, dim = c(3,3,365))
# loop over x (dimension 1) and y (dimension 2); the whole z vector is assigned at once
for (i in 1:dim(datamatrix)[1]) {
  for (j in 1:dim(datamatrix)[2]) {
    workingdata[i, j, timematrix[i, j, ]] <- datamatrix[i, j, ]
  }
}
Note that this won't work when your timematrix contains days equal to 0 (like in your example data), since R uses 1-based indexing.
Final answer: Axeman's solution works in 3 dimensions with the following approach:
workingdata <- array(data = NA, dim = c(3,3,365))
for (i in 1:dim(datamatrix)[1]) {
for (j in 1:dim(datamatrix)[2]) {
for(k in 1:dim(datamatrix)[3]){
workingdata[i, j, timematrix[i, j, k]] <- datamatrix[i, j, k]
}
}
}
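Since the question mentions that loops are too slow on the real data, here is a loop-free sketch of the same assignment using matrix indexing: a three-column matrix of (x, y, day) triples indexes workingdata directly. The example data below is made up for illustration (it is not the asker's data) and assumes every day is between 1 and 365 and that no (x, y) cell has the same day twice.

```r
set.seed(42)
nx <- 3; ny <- 3; nt <- 8

# illustrative data: values plus one day per cell, each z slice drawing
# from its own 20-day window so days never repeat within an (x, y) cell
datamatrix <- array(runif(nx * ny * nt), dim = c(nx, ny, nt))
timematrix <- array(
  unlist(lapply(seq_len(nt), function(k) sample((k - 1) * 40 + 1:20, nx * ny))),
  dim = c(nx, ny, nt)
)

workingdata <- array(NA_real_, dim = c(nx, ny, 365))

# one row per cell of datamatrix: its x index, its y index, and its target day
idx <- cbind(
  as.vector(slice.index(datamatrix, 1)),
  as.vector(slice.index(datamatrix, 2)),
  as.vector(timematrix)
)

# matrix indexing: each row of idx addresses one cell of workingdata
workingdata[idx] <- as.vector(datamatrix)
```

If two observations of the same (x, y) cell shared a day, the later one would silently overwrite the earlier one, just as in the loop version.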

How to merge two differently ordered arrays by columns

I am trying to cbind two differently ordered named arrays into a dataframe.
x = c("a" = 1, "z" = 10)
y = c("z" = 10, "a" = 1)
# Expected output:
# x y
# a 1 1
# z 10 10
I've tried the following and all ignored the arrays' names:
# Unexpected outputs:
cbind(x,y)
merge(as.data.frame(x),as.data.frame(y))
library(dplyr); bind_cols(as.data.frame(x),as.data.frame(y))
In principle, I know that I could transform the arrays into dataframes and then bind by row names, or I could match the names and index the arrays during binding.
I was wondering if there is a more straightforward way for such a straightforward task.
I came up with
x <- c("a" = 1, "z" = 10)
y <- c("z" = 10, "a" = 1)
cbind(x, "y"=y[names(x)])
#    x  y
# a  1  1
# z 10 10
May not be optimal but maybe it is enough for your purposes...
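If the two vectors might not carry exactly the same set of names, the same name-based indexing can be driven by the union of the names instead of names(x). A minimal sketch; nms and res are illustrative names, and a name missing from one vector ends up as NA in that column.

```r
x <- c("a" = 1, "z" = 10)
y <- c("z" = 10, "a" = 1)

# every name that occurs in either vector, in order of first appearance
nms <- union(names(x), names(y))

# index both vectors by name so the rows line up, then label the rows once
res <- data.frame(
  x = unname(x[nms]),
  y = unname(y[nms]),
  row.names = nms
)
res
```

This returns a data frame rather than the matrix that cbind() produces, which matches the "into a dataframe" wording of the question.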

R - avoid nested for loops

I have the following function which takes 4 vectors. The T vector has a given length, and the 3 other vectors (pga, Sa5Hz and Sa1Hz) all share a common length, which is not necessarily equal to that of T.
The output is a matrix with length(T) rows and length(pga) columns.
My code below seems like the perfect example of what NOT to do, however, I could not figure out a way to optimize it using an apply function. Can anyone help?
designSpectrum <- function (T, pga, Sa5Hz, Sa1Hz){
Ts <- Sa1Hz / Sa5Hz
#By convention, if Sa5Hz is null, set Ts as 0.
Ts[is.nan(Ts)] <- 0
res <- matrix(NA, nrow = length(T), ncol = length(pga))
for (i in 1:nrow(res))
{
for (j in 1:ncol(res))
{
res[i,j] <- if(T[i] <= 0) {pga[j]}
else if (T[i] <= 0.2 * Ts[j]) {pga[j] + T[i] * (Sa5Hz[j] - pga[j]) / (0.2 * Ts[j])}
else if (T[i] <= Ts[j]) {Sa5Hz[j]}
else Sa1Hz[j] / T[i]
}
}
return(res)
}
Instead of doing a double for loop and processing each i and j value separately, you could use the outer function to process all of them in one shot. Since you're now processing multiple i and j values simultaneously, you could switch to the vectorized ifelse statement instead of the non-vectorized if and else statements:
designSpectrum2 <- function (T, pga, Sa5Hz, Sa1Hz) {
Ts <- Sa1Hz / Sa5Hz
Ts[is.nan(Ts)] <- 0
outer(1:length(T), 1:length(pga), function(i, j) {
ifelse(T[i] <= 0, pga[j],
ifelse(T[i] <= 0.2 * Ts[j], pga[j] + T[i] * (Sa5Hz[j] - pga[j]) / (0.2 * Ts[j]),
ifelse(T[i] <= Ts[j], Sa5Hz[j], Sa1Hz[j] / T[i])))
})
}
identical(designSpectrum(T, pga, Sa5Hz, Sa1Hz), designSpectrum2(T, pga, Sa5Hz, Sa1Hz))
# [1] TRUE
Data:
T <- -1:3
pga <- 1:3
Sa5Hz <- 2:4
Sa1Hz <- 3:5
You can see the efficiency gains by testing on rather large vectors (here I'll use an output matrix with 1 million entries):
# Larger vectors
set.seed(144)
T2 <- runif(1000, -1, 3)
pga2 <- runif(1000, -1, 3)
Sa5Hz2 <- runif(1000, -1, 3)
Sa1Hz2 <- runif(1000, -1, 3)
# Runtime comparison
all.equal(designSpectrum(T2, pga2, Sa5Hz2, Sa1Hz2), designSpectrum2(T2, pga2, Sa5Hz2, Sa1Hz2))
# [1] TRUE
system.time(designSpectrum(T2, pga2, Sa5Hz2, Sa1Hz2))
# user system elapsed
# 4.038 1.011 5.042
system.time(designSpectrum2(T2, pga2, Sa5Hz2, Sa1Hz2))
# user system elapsed
# 0.517 0.138 0.652
The approach with outer is roughly 8x faster (5.042 s versus 0.652 s elapsed).
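The reason the plain if/else has to become ifelse is that outer does not call the function once per (i, j) pair: it expands both index vectors to full length and calls the function a single time. A small sketch to check this; the counter calls and the toy function are purely illustrative.

```r
calls <- 0

# toy function: records how often it is invoked and encodes (i, j) as i*10 + j
res <- outer(1:3, 1:2, function(i, j) {
  calls <<- calls + 1
  i * 10 + j
})

calls     # number of times the function was invoked
dim(res)  # one entry per (i, j) combination
res[2, 1] # entry for i = 2, j = 1
```

Because i and j arrive as whole vectors, a scalar if () inside the function would only ever look at one condition, whereas ifelse evaluates the condition element by element.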

creating an array where every element is a list of varying lengths in R

I wish to create an array. Each element will be assigned to be a list, and each list will be of a different length (unknown before the script is executed). A simple example would be to let a[1] be the list q and a[2] be the list w. Is there a construct that I can use, perhaps different from array, that would allow for such assignments?
q <- c(1,2,3,4,5)
w <- c(6,7,8)
a <- array(2)
a[1] <- q
Warning message:
In a[1] <- q : number of items to replace is not a multiple of replacement length
Since you want an array of lists, try:
a[1] <- list(q)
As has been pointed out in the comments, you are likely looking for list and not array (the latter being more akin to a multi-dimensional matrix or mathematical vector.)
However, in addition to that there is the issue of indexing:
In R there is a major difference between a[1] <- q and a[[1]] <- q
Try the following to spot the diff:
a <- list()
a[[1]] <- q
a[[2]] <- w
a
Compare with
a <- list()
a[1] <- q
a[2] <- w
a
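To make the difference concrete, here is a small sketch comparing the two forms; the names a_double, a_single and a_wrapped are illustrative only. With [[ ]] the whole vector is stored as one element, while with [ ] R tries to spread the values over the selected slots and keeps only the first one (with a warning, suppressed here).

```r
q <- c(1, 2, 3, 4, 5)
w <- c(6, 7, 8)

# [[ ]] stores each vector whole, as a single list element
a_double <- list()
a_double[[1]] <- q
a_double[[2]] <- w
lengths(a_double)  # element lengths: 5 and 3

# [ ] with a bare vector keeps only its first value per slot
a_single <- list()
suppressWarnings(a_single[1] <- q)
suppressWarnings(a_single[2] <- w)
lengths(a_single)  # element lengths: 1 and 1

# [ ] does work if the right-hand side is itself wrapped in a list
a_wrapped <- list()
a_wrapped[1] <- list(q)
a_wrapped[2] <- list(w)
identical(a_wrapped, a_double)
```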
I think what you want is a list of vectors.
q <- c(1,2,3,4,5)
w <- c(6,7,8)
a <- list()
a[[1]] <- q
list works - thanks! it allows me to partition an array of positions into a list of lists according to some separation cutoff.
delta <- 200
pcls <- list(nrow=pctot)
v <- posvec[1]
pcind <- 0
jtest <- 0
for (j in 2:nr) {
dist <- posvec[j]-posvec[j-1]
if (dist <= delta) {
v <- c(v,posvec[j])
jtest <- 1
}
if (dist > delta) {
if (jtest > 0) {
pcind <- pcind + 1
pcls[[pcind]] <- v
v <- posvec[j]
}
jtest <- 0
}
}
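For comparison, the same kind of partitioning can be sketched without an explicit loop by starting a new group wherever the gap to the previous position exceeds the cutoff. This is a swapped-in idiom, not the code above: the example posvec values are made up, posvec is assumed to be sorted, and unlike the loop it also keeps isolated positions as one-element groups.

```r
posvec <- c(100, 150, 900, 1000, 1100, 5000)  # made-up, sorted positions
delta <- 200                                  # separation cutoff

# TRUE marks the start of a new group; cumsum turns the marks into group ids
group <- cumsum(c(TRUE, diff(posvec) > delta))

# one list element per group, each holding that group's positions
pcls <- unname(split(posvec, group))
pcls
```

Groups of length one can be dropped afterwards with pcls[lengths(pcls) > 1] if only true clusters are wanted.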
