I need to convert a dataframe to an array of 3 dimensions. All columns in the dataframe are numeric. What is an elegant and/or efficient way to accomplish this?
Example:
x <- 1:3
y <- 1:3
g <- t(vapply(x, function(x) {
  vapply(y, function(y) {
    as.numeric(paste(x, y, sep = "."))
  }, numeric(1))
}, numeric(3)))
gdf <- data.frame( cbind(rep(1:3,each=3), rbind(g, g*2, g*3)) )
I want to convert "gdf" to an array where gdf$X1 defines the third dimension. The result would look like this:
ga <- array( c(g, g*2, g*3), dim=c(3,3,3) )
Thanks!
This works with your example, I hope it will be general enough for you:
gb <- aperm(array(unlist(gdf[, -1]), c(3, 3, 3)), c(1, 3, 2))
identical(ga, gb)
# [1] TRUE
I also found this way using the package abind:
library(abind)
# drop X1 from the data so it is used only for grouping
abind(split(gdf[, -1], gdf$X1), along = 3)
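If the dimensions should not be hard-coded, the same idea can be wrapped in a small helper. This is only a sketch in base R: df_to_array and its group_col argument are hypothetical names, and it assumes every group has the same number of rows.

```r
# Rebuild the example data from the question.
g <- t(vapply(1:3, function(x) {
  vapply(1:3, function(y) as.numeric(paste(x, y, sep = ".")), numeric(1))
}, numeric(3)))
gdf <- data.frame(cbind(rep(1:3, each = 3), rbind(g, g * 2, g * 3)))
ga <- array(c(g, g * 2, g * 3), dim = c(3, 3, 3))

# Hypothetical helper: one slice of the third dimension per group value.
df_to_array <- function(df, group_col = 1) {
  slices <- split(df[, -group_col, drop = FALSE], df[[group_col]])
  # Assumption: all groups have the same number of rows.
  stopifnot(length(unique(vapply(slices, nrow, integer(1)))) == 1)
  mats <- lapply(slices, function(s) unname(as.matrix(s)))
  array(unlist(mats, use.names = FALSE),
        dim = c(dim(mats[[1]]), length(mats)))
}

ga2 <- df_to_array(gdf)
dim(ga2)
all.equal(ga, ga2)
```

The slices are stacked in the sorted order of the grouping values, which matches the example because X1 is already sorted.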
I have a very large array (1955x2417x1) in R where each position stores a list of two vectors (named "max" and "min") of length 5.
I would like to find a simple way to create a multidimensional array (dim 1955x2417x5) where each position holds a single value from the vector "max".
I have looked at answers such as "array of lists in r", but so far without success.
I know I can access the list in each position of the array using
myarray[posX, PosY][[1]][["max"]]
but how to apply that to the whole Array?
So far I have tried
newArray <- array( unlist(myarray[][[1]][["max"]]), c(1955, 2417, 5))
and
NewArray <-parApply(cl, myarray, c(1:2), function(x) {
a=x[[1]][["max"]]
} )
but the results are not right.
Do you have any suggestion?
Let
e <- list(min = 1:3, max = 4:6)
arr <- array(list(e)[rep(1, 8)], c(2, 4))
dim(arr)
# [1] 2 4
Then one option is
res <- apply(arr, 1:2, function(x) x[[1]][["max"]])
dim(res)
# [1] 3 2 4
and, if the order of dimensions matters,
dim(aperm(res, c(2, 3, 1)))
# [1] 2 4 3
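To check that the values land where expected, here is a small sketch in which every cell holds a different "max" vector. The cells object and its contents are made up for illustration.

```r
# Made-up data: cell i stores max = i + 1:3, so cells are distinguishable.
cells <- lapply(1:8, function(i) list(min = i - 1:3, max = i + 1:3))
arr <- array(cells, c(2, 4))

# Pull "max" out of every cell; apply() puts the vector on the first dimension.
res <- apply(arr, 1:2, function(x) x[[1]][["max"]])

# Move the vector dimension to the end: rows, columns, then vector position.
out <- aperm(res, c(2, 3, 1))
dim(out)
out[1, 2, ]            # values extracted for cell [1, 2]
arr[1, 2][[1]][["max"]]  # the vector originally stored there
```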
I recently started working with R for a research project, and I'm trying to create a multidimensional array that would contain data frame rows.
I have a large data frame containing many columns that are either numeric or strings. For the sake of simplicity, let's work with 3 columns:
thread_id: an integer number between 1 and 10100.
user_id: an integer number given to users.
post_name: a string that gives us the title of the post
I would like to create a data structure, preferably a two-dimensional array, where the first dimension is the thread_id and the second is a row from the data frame.
So, for example, for
DataSet[1][1], I'd get thread_id: 1, user_id: 100, post_name: "some name 1"
DataSet[1][2], I'd get thread_id: 1, user_id: 101, post_name: "some name 2"
DataSet[5][10], I'd get thread_id: 5, user_id: 900, post_name: "some name 3"
Is this possible to do in R? My only previous experience is with Java, where this can be solved with an array of Objects.
Thanks for all the help!
If, say, thread_id took on values 1 to 5, you could use:
mylist <- list()
for (i in 1:5)
  mylist[[i]] <- myData[myData$thread_id == i, ]
You could of course use max(myData$thread_id) instead of 5...
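The loop above can also be written with split(), which builds the list in one call. A minimal sketch, using a made-up three-row myData that mirrors the columns from the question:

```r
# Made-up sample data with the three columns described in the question.
myData <- data.frame(
  thread_id = c(1, 1, 5),
  user_id   = c(100, 101, 900),
  post_name = c("some name 1", "some name 2", "some name 3"),
  stringsAsFactors = FALSE
)

# One data frame per thread_id; list names are the thread ids as strings.
by_thread <- split(myData, myData$thread_id)

names(by_thread)
by_thread[["1"]][2, ]  # second row belonging to thread 1
```

Note that the list is indexed by name here ("1", "5"), so thread ids that do not occur simply have no entry.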
Here is an alternative for you.
Assumption: df is a data.frame
convert.to.str <- function(df) {
  df_col <- names(df)                    # column names of the row
  val <- unlist(df)                      # values of the row
  ans <- paste(df_col, val, sep = ": ")  # "name: value" pairs
  paste(ans, collapse = ", ")            # one string per row
}
int_ans <- data.frame(thread_id = df$thread_id, ans = apply(df,1,convert.to.str), nrow2=1:nrow(df))
library(reshape2)
int_ans2 <- dcast(int_ans,thread_id ~ nrow2,value.var='ans')
DataSet <- int_ans2[2:ncol(int_ans2)]
dimnames(DataSet)[[1]] <- int_ans2$thread_id
I want to multiply every element of an array by the vector entry that matches its index along the second dimension. When I try, my array is converted to a matrix and things get squirrelly. I can only do the proper multiplication longhand.
What a mouthful...
It's easier to explain with code...
Arr <- array(runif(10*5*3), dim = c(10,5,3))
dim(Arr)
Vect <- c(1:5)
Arr[,1,1] <- Arr[,1,1]*Vect[1]
Arr[,1,2] <- Arr[,1,2]*Vect[1]
Arr[,1,3] <- Arr[,1,3]*Vect[1]
Arr[,2,1] <- Arr[,2,1]*Vect[2]
Arr[,2,2] <- Arr[,2,2]*Vect[2]
Arr[,2,3] <- Arr[,2,3]*Vect[2]
Arr[,3,1] <- Arr[,3,1]*Vect[3]
Arr[,3,2] <- Arr[,3,2]*Vect[3]
Arr[,3,3] <- Arr[,3,3]*Vect[3]
Arr[,4,1] <- Arr[,4,1]*Vect[4]
Arr[,4,2] <- Arr[,4,2]*Vect[4]
Arr[,4,3] <- Arr[,4,3]*Vect[4]
Arr[,5,1] <- Arr[,5,1]*Vect[5]
Arr[,5,2] <- Arr[,5,2]*Vect[5]
Arr[,5,3] <- Arr[,5,3]*Vect[5]
How do I clean this up to be one command?
Try:
sweep(Arr,2,Vect,FUN="*")
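A quick way to convince yourself that this matches the longhand version is to compare it against an explicit loop over the second dimension. This is just a check sketch; the names expected and swept are made up here.

```r
set.seed(1)
Arr <- array(runif(10 * 5 * 3), dim = c(10, 5, 3))
Vect <- 1:5

# Longhand reference: scale every [, j, ] slab by Vect[j].
expected <- Arr
for (j in seq_along(Vect)) {
  expected[, j, ] <- Arr[, j, ] * Vect[j]
}

# sweep() applies Vect along margin 2 (the second dimension).
swept <- sweep(Arr, 2, Vect, FUN = "*")

dim(swept)
all.equal(swept, expected)
```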
Cast Vect into an array first, then element multiply:
varr <- aperm(array(Vect, dim = c(5L, 10L, 3L)), perm = c(2L, 1L, 3L))
Arr <- varr * Arr
(of course we don't need to store varr if you want this in one command)
(also, turns out this is basically what sweep does under the hood...)
The aaply() function from the plyr package does exactly what you're looking for. It can operate on arrays of any dimension and split them however you like. In this case you're splitting by rows so:
library(plyr)
Arr2 <- aaply(Arr, 1, function(x,y){x*y}, Vect)
We can also replicate 'Vect' and multiply it with 'Arr'. col() is a convenient function that returns the column index of each element of a matrix.
res1 <- Arr * Vect[col(Arr[,,1])]
Or we explicitly do the rep
res2 <- Arr* rep(Vect, each=dim(Arr)[1])
identical(res1, res2)
#[1] TRUE
I have an array with dimensions (i,j,k) and a matrix with dimensions (j,q). How can I multiply each slice (,,k) by that matrix? An example makes more sense.
A <- array(c(rep(1,20), rep(2,20), rep(3,20)),dim = c(10,2,3))
B <- matrix(c(1:10), nrow = 2)
# multiply each A[,,i]%*%B
C <- array(NA, dim=c(nrow(A), ncol(B), 3))
C[] <- apply(A, 3, function(x) x%*%B)
I could get the results in this way, but I am looking for a more efficient way, for example with the ATensor package. I hope someone could help me with this problem.
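One base R possibility, offered as a sketch rather than a benchmarked answer, is to flatten the array so that a single matrix product covers all pages, then restore the shape. The names A2, M and C2 are made up here, and whether this is actually faster than apply() would need to be measured on the real data.

```r
A <- array(c(rep(1, 20), rep(2, 20), rep(3, 20)), dim = c(10, 2, 3))
B <- matrix(1:10, nrow = 2)

# Reference result, computed as in the question.
C <- array(NA_real_, dim = c(nrow(A), ncol(B), dim(A)[3]))
C[] <- apply(A, 3, function(x) x %*% B)

# Move the page dimension next to the rows, then flatten to (i*k) x j.
A2 <- aperm(A, c(1, 3, 2))
dim(A2) <- c(dim(A)[1] * dim(A)[3], dim(A)[2])

# One matrix product handles every page at once.
M <- A2 %*% B
dim(M) <- c(dim(A)[1], dim(A)[3], ncol(B))

# Restore the (i, q, k) layout.
C2 <- aperm(M, c(1, 3, 2))
all.equal(C, C2)
```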
I have an array that can have one or more pages or sheets (my names for the third dimension). I am attempting to perform operations on the array. When there is only one sheet or page the result of the operation is a matrix. I would like the result to be an array. Is there a way to retain the class array even when the result of the operation has only 1 sheet or page?
Here is an example. I would like my.var.2 and my.var.3 to be arrays. The variable my.pages is set to 1 here, which seems to be causing the problem. However, my.pages can be >1. If my.pages <- 2 then my.var.2 and my.var.3 are arrays.
set.seed(1234)
my.rows <- 10
my.columns <- 4
my.pages <- 1
my.var.1 <- array( rnorm((my.rows*my.columns*my.pages), 10, 2),
c(my.rows,my.columns,my.pages))
my.var.1
my.var.2 <- 2 * my.var.1[,-my.columns,]
my.var.3 <- 10 * my.var.1[,-1,]
class(my.var.2)
class(my.var.3)
my.var.2 <- as.array(my.var.2)
my.var.3 <- as.array(my.var.3)
class(my.var.2)
class(my.var.3)
my.var.2 <- as.array( 2 * my.var.1[,-my.columns,])
my.var.3 <- as.array(10 * my.var.1[,-1,] )
class(my.var.2)
class(my.var.3)
The switch to matrix causes problems when I try to use my.var.1 and my.var.2 in nested for-loops.
The following if statement seems to solve the problem, but also seems a little clunky. Is there a more elegant solution?
if(my.pages == 1) {my.var.2 <- array(my.var.2, c(my.rows,(my.columns-1),my.pages))}
From help("["):
Usage:
x[i, j, ... , drop = TRUE]
...
drop: For matrices and arrays. If 'TRUE' the result is coerced to
the lowest possible dimension (see the examples). This only
works for extracting elements, not for the replacement. See
'drop' for further details.
Your code, revisited:
set.seed(1234)
my.rows <- 10
my.columns <- 4
my.pages <- 1
my.var.1 <- array( rnorm((my.rows*my.columns*my.pages), 10, 2),
c(my.rows,my.columns,my.pages))
my.var.2 <- 2 * my.var.1[,-my.columns,,drop=FALSE]
my.var.3 <- 10 * my.var.1[,-1,,drop=FALSE]
class(my.var.2)
## [1] "array"
class(my.var.3)
## [1] "array"