I need to convert a 3 dimensional array into a data.frame.
Example:
#create fake data
#two vectors
vec1 = c(2,13,22,98,4,8,8,1,10)
vec2 = c(2,4,6,7,1,55,32,12,1)
#3 dim array
result = array(c(vec1,vec2),dim = c(3,3,2))
print(result)
, , 1
[,1] [,2] [,3]
[1,] 2 98 8
[2,] 13 4 1
[3,] 22 8 10
, , 2
[,1] [,2] [,3]
[1,] 2 7 32
[2,] 4 1 12
[3,] 6 55 1
How can I get the following 9-column data.frame, where the letters are column names (default names would also be fine) and each row represents one result[,,i] slice:
a b c d e f g h i
2 13 22 98 4 8 8 1 10
2 4 6 7 1 55 32 12 1
My real array has dim = c(140, 200, 20000).
Thanks.
Arrays and matrices are just vectors with a dim attribute, so recast the array as a matrix and then convert it to a data.frame:
data.frame(matrix(result, nrow=2, byrow=TRUE))
# more generally:
data.frame(matrix(result, nrow=dim(result)[3], byrow=TRUE))
# X1 X2 X3 X4 X5 X6 X7 X8 X9
#1 2 13 22 98 4 8 8 1 10
#2 2 4 6 7 1 55 32 12 1
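As a minimal sketch of the same idea with the letter column names from the question: the variable names d, m and df are my own, and using letters is an assumption that only works for up to 26 columns (the real 140 x 200 x 20000 array would need the default names).

```r
# Same fake data as in the question
vec1 <- c(2, 13, 22, 98, 4, 8, 8, 1, 10)
vec2 <- c(2, 4, 6, 7, 1, 55, 32, 12, 1)
result <- array(c(vec1, vec2), dim = c(3, 3, 2))

d <- dim(result)
# One row per slice result[, , i]: with byrow = TRUE every row receives
# prod(d[1:2]) consecutive values, which is exactly one slice
m <- matrix(result, nrow = d[3], byrow = TRUE)
df <- as.data.frame(m)
# Assumption: letters as names, only valid for at most 26 columns
names(df) <- letters[seq_len(ncol(df))]
df
```

For a very large array it is probably worth stopping at the matrix m, since a data.frame with tens of thousands of columns tends to be slow to work with.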
You could try mapply:
as.data.frame(mapply(function(x, y) c(x, y), result[,,1], result[,,2]))
# V1 V2 V3 V4 V5 V6 V7 V8 V9
#1 2 13 22 98 4 8 8 1 10
#2 2 4 6 7 1 55 32 12 1
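The mapply call above names the two slices explicitly, so it does not extend to more slices as written. A possible generalisation, sketched here with apply over the third margin (flat and df are illustrative names, not part of the original answer):

```r
result <- array(c(2, 13, 22, 98, 4, 8, 8, 1, 10,
                  2, 4, 6, 7, 1, 55, 32, 12, 1), dim = c(3, 3, 2))

# Flatten every slice result[, , i] with c(); apply() returns one
# column per slice, i.e. a 9 x 2 matrix here
flat <- apply(result, 3, c)
# Transpose so that each slice becomes a row
df <- as.data.frame(t(flat))
df
```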
I have the array A
A
,,A
[,1] [,2] [,3]
[1,] 3 7 8
[2,] 4 11 9
[3,] 2 12 4.3
,,B
[,1] [,2] [,3]
[1,] 31 7 8
[2,] 4.2 4 9.5
[3,] 1 1 7
,,C
[,1] [,2] [,3]
[1,] 4 71 8.3
[2,] 4 41 9
[3,] 11 0 73
,,D
[,1] [,2] [,3]
[1,] 7 7 8.3
[2,] 3 4.1 9
[3,] 1 0.5 73
dim(A)
3 3 4
dimnames(A)[3]
A B C D
and I have the data.frame df
df
X Y Z
2 1 A
3 2 D
I would like to add a new column to df holding the values of array A, looked up using the df index values X (row of the array), Y (column of the array) and Z (third dimension). My expected result is:
df
X Y Z Res
2 1 A 4 # Res is the value of array A in A[2,1,"A"]
3 2 D 0.5 # Res is the value of array A in A[3,2,"D"]
I tried this code:
df$Res <- NA
if (df$Z == dimnames(A)[3]){
for (i in 1:nrow(df)){
df[i,4] <- A[df[i,1],df[i,2],df[i,3]]
}
}
But it doesn't work well...
Any idea how to match the dimnames of the array's third dimension with the data frame index values?
P.S. This is a simple example. My true array is:
dim(A)
137 93 227
and
dim(df)
6080 3
P.S. 2: I would prefer not to use merge or similar functions, because of allocation problems.
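One possible approach, sketched under the assumption that every value in Z is present in dimnames(A)[[3]]: translate Z into positions with match and then index the array with a three-column matrix, which picks one element per row without a loop or a merge. The array is rebuilt from the printed slices above; k is an illustrative name.

```r
# Array rebuilt from the slices printed in the question (column-major)
A <- array(c(3, 4, 2, 7, 11, 12, 8, 9, 4.3,      # slice "A"
             31, 4.2, 1, 7, 4, 1, 8, 9.5, 7,     # slice "B"
             4, 4, 11, 71, 41, 0, 8.3, 9, 73,    # slice "C"
             7, 3, 1, 7, 4.1, 0.5, 8.3, 9, 73),  # slice "D"
           dim = c(3, 3, 4),
           dimnames = list(NULL, NULL, c("A", "B", "C", "D")))
df <- data.frame(X = c(2, 3), Y = c(1, 2), Z = c("A", "D"),
                 stringsAsFactors = FALSE)

# Position of each Z label along the third dimension;
# as.character() guards against Z being stored as a factor
k <- match(as.character(df$Z), dimnames(A)[[3]])
# A matrix with one column per array dimension selects one element per row
df$Res <- A[cbind(df$X, df$Y, k)]
df
```

If some Z value has no match, k contains NA and the corresponding Res is NA.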
I am working on a partial differential equations project where some N-dimensional objects are required. I got stuck padding an N-dim object by replicating its edge values along each of its dimensions. Here is the function:
padreplicate <- function(a, padSize) {
# Pad an array by replicating values.
numDims <- length(padSize)
idx <- vector("list", numDims)
for (k in 1:numDims) {
M <- dim(a)[k] # 32
onesVector <- ones(1, padSize[k])
idx[[k]] <- c(onesVector, 1:M, M * onesVector)
}
# return(a[ unlist(idx[1]), unlist(idx[2]) ]) # this works for 2D
# return(a[ idx[[1]], idx[[2]] ]) # this also works for 2D
# return(a[apply(idx, 1, function(x) unlist[x])]) #:( doesn't work
# a[sapply(apply(idx, 1:length(dim(idx)), function(x) eval(parse(text=x))), unlist)] #:(
# 2D: "a[idx[[1]], idx[[2]]]" 3D: "a[idx[[1]], idx[[2]]], idx[[3]]]"
dim_text = paste0("a", "[ ",
paste0(sapply(1:length(idx), function(x)
paste0("idx", "[[", x, "]]")), collapse = ", ")," ]")
eval(parse(text=dim_text)) # this works
}
The first argument is the N-dim object; it could be a matrix, a 3D array or higher dimensional array.
An example for a 2D object or matrix would be:
# pad a matrix 4x3 with c(1,1)
set.seed(123456)
mx = matrix(sample.int(9, size = 9*100, replace = TRUE), nrow = 4, ncol = 3)
mx
# [,1] [,2] [,3]
# [1,] 8 4 9
# [2,] 7 2 2
# [3,] 4 5 8
# [4,] 4 1 6
padreplicate(mx, c(1,1))
The padded matrix looks like this:
[,1] [,2] [,3] [,4] [,5]
[1,] 8 8 4 9 9
[2,] 8 8 4 9 9
[3,] 7 7 2 2 2
[4,] 4 4 5 8 8
[5,] 4 4 1 6 6
[6,] 4 4 1 6 6
For a 3-D array, the call, the input array and the padded array look like this:
ar = array(sample.int(9, size = 9*100, replace = TRUE), dim = c(3, 3, 1))
ar
padreplicate(ar, c(1,1,1))
# input 3x3x1 array
, , 1
[,1] [,2] [,3]
[1,] 3 4 6
[2,] 7 2 9
[3,] 3 4 8
# padded 5x5x3 array
, , 1
[,1] [,2] [,3] [,4] [,5]
[1,] 3 3 4 6 6
[2,] 3 3 4 6 6
[3,] 7 7 2 9 9
[4,] 3 3 4 8 8
[5,] 3 3 4 8 8
, , 2
[,1] [,2] [,3] [,4] [,5]
[1,] 3 3 4 6 6
[2,] 3 3 4 6 6
[3,] 7 7 2 9 9
[4,] 3 3 4 8 8
[5,] 3 3 4 8 8
, , 3
[,1] [,2] [,3] [,4] [,5]
[1,] 3 3 4 6 6
[2,] 3 3 4 6 6
[3,] 7 7 2 9 9
[4,] 3 3 4 8 8
[5,] 3 3 4 8 8
These are all correct results.
My question is this: is there a better way of doing this N-dim padding operation?
EDIT 1
Thanks to Frank, the N-dimensional padreplicate function is now:
padreplicate <- function(a, padSize) {
# Pad an array by replicating values.
numDims <- length(padSize)
idx <- vector("list", numDims)
for (k in 1:numDims) {
M <- dim(a)[k] # 32
onesVector <- ones(1, padSize[k])
idx[[k]] <- c(onesVector, 1:M, M * onesVector)
}
do.call( `[`, c(list(a), idx))
}
EDIT 2
This version uses matrix instead of my own function ones. Sorry about that.
padreplicate <- function(a, padSize) {
# Pad an array by replicating values.
numDims <- length(padSize)
idx <- vector("list", numDims)
for (k in 1:numDims) {
M <- dim(a)[k] # 32
onesVector <- matrix(1, 1, padSize[k])
idx[[k]] <- c(onesVector, 1:M, M * onesVector)
}
do.call( `[`, c(list(a), idx))
}
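A slightly more compact variant of the same idea, as a sketch rather than a replacement: it builds the index vectors with rep and lapply, and passes drop = FALSE so that dimensions of extent 1 survive the subsetting. The name pad_replicate and the length check are my additions, not part of the original function.

```r
pad_replicate <- function(a, pad_size) {
  # Assumption: exactly one pad size per dimension of a
  stopifnot(length(pad_size) == length(dim(a)))
  idx <- lapply(seq_along(pad_size), function(k) {
    m <- dim(a)[k]
    # Repeat the first index, walk through 1..m, repeat the last index
    c(rep(1L, pad_size[k]), seq_len(m), rep(m, pad_size[k]))
  })
  # Equivalent to a[idx[[1]], idx[[2]], ..., drop = FALSE]
  do.call(`[`, c(list(a), idx, list(drop = FALSE)))
}

mx <- matrix(1:12, nrow = 4, ncol = 3)
padded <- pad_replicate(mx, c(1, 1))
dim(padded)

ar <- array(1:9, dim = c(3, 3, 1))
dim(pad_replicate(ar, c(1, 1, 1)))
```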
I need to create a 3D array filled by row, from left to right and then top to bottom.
x <- 100
I have tried with this:
b <- array(1:96, dim= c(8,4,3))
but it fills down the columns first. Using aperm(b) doesn't work either.
The result I want is this:
, , 1
1 2 3 4 5
6 7 8 9 10
11 12 13 14
15 16 17 18
19 20 21 22
By default, array fills values along the 1st dimension, then the 2nd, then the 3rd. What you are looking for is a fill in the order (2nd, 1st, 3rd). You can initialize the array with the extents of the 1st and 2nd dimensions switched and then use aperm on it:
b <- aperm(array(1:96, dim= c(4,8,3)), c(2,1,3))
# ^ ^ ^ ^ switch the dimension twice here
b
, , 1
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 5 6 7 8
[3,] 9 10 11 12
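The same trick can be wrapped in a small helper. This is only a sketch for the 3-D case, and fill_by_row is a made-up name:

```r
fill_by_row <- function(values, dims) {
  # Build the array with the first two extents swapped, so the default
  # column-first fill runs along what will become the rows ...
  swapped <- array(values, dim = dims[c(2, 1, 3)])
  # ... then swap the first two dimensions back
  aperm(swapped, c(2, 1, 3))
}

b <- fill_by_row(1:96, c(8, 4, 3))
dim(b)
b[1, , 1]  # first row of the first slice
b[1, , 2]  # first row of the second slice
```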
Edit: This was my first try, but @Psidom's answer is the right way to do this.
One way is to build 3 transposed matrices and then combine them into an array. In the code below I used 96*i/3 to make it flexible for more than 3 matrices to be combined.
b <- array( c( aperm(array(1:(96*1/3), dim = c(4,8))),
aperm(array(33:(96*2/3), dim = c(4,8))),
aperm(array(65:(96*3/3), dim = c(4,8))) ) ,
dim = c(8, 4, 3))
This will be the output:
b[, , 1]
# [,1] [,2] [,3] [,4]
# [1,] 1 2 3 4
# [2,] 5 6 7 8
# [3,] 9 10 11 12
# [4,] 13 14 15 16
# [5,] 17 18 19 20
# [6,] 21 22 23 24
# [7,] 25 26 27 28
# [8,] 29 30 31 32
I have a 3D array in R constructed as (although the names don’t seem to show up):
v.arr <- array(1:18, c(2,3,3), dimnames = c("A", "B", "X",
"Y","Z","P","Q","R"))
and it shows up like this when printed to the screen:
, , 1
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
, , 2
[,1] [,2] [,3]
[1,] 7 9 11
[2,] 8 10 12
, , 3
[,1] [,2] [,3]
[1,] 13 15 17
[2,] 14 16 18
I write it out to a file using:
write.table(v.arr, file = "Test Data")
I then read it back in with:
test.data <- read.table("Test Data")
and I get this:
X1 X2 X3 X4 X5 X6 X7 X8 X9
1 1 3 5 7 9 11 13 15 17
2 2 4 6 8 10 12 14 16 18
Obviously, I need to do something to either structure the file before writing or restructure it on the read-back to get back the 3D array. I can always restructure the data that I get from reading. Is that the best approach? Thanks in advance.
The issue is that write.table is meant for tabular data: it coerces anything that is not a matrix or data frame to a data frame, which flattens your array into 2 rows and 9 columns. If you just want to save the array and don't mind an R-specific format, you can use the save and load functions:
save(v.arr,file = "~/Desktop/v.arr.RData")
rm(list=ls())
load("~/Desktop/v.arr.RData")
v.arr
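Two alternatives, sketched with temporary files (the paths and the names restored_rds, flat and restored_txt are illustrative). saveRDS/readRDS store a single object and let you choose the name on read-back. If the text file has to stay, the array can be rebuilt from the flattened table, provided the original dimensions are known, because the column-major order is preserved:

```r
v.arr <- array(1:18, dim = c(2, 3, 3))

# Option 1: serialise one object; readRDS returns it with dim intact
rds_path <- tempfile(fileext = ".rds")
saveRDS(v.arr, rds_path)
restored_rds <- readRDS(rds_path)

# Option 2: keep the text file and restructure on read-back.
# Assumption: the dimensions c(2, 3, 3) are known or stored elsewhere.
txt_path <- tempfile(fileext = ".txt")
write.table(v.arr, file = txt_path)
flat <- read.table(txt_path)   # 2 x 9 data frame
restored_txt <- array(as.matrix(flat), dim = c(2, 3, 3))
```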
First, I have a matrix like this:
x <- matrix(rnorm(260 * 4), 260)
and then an array:
lst <- lapply(seq(1,length(x[,1]), by=52), function(i) x[i:(i+51),])
Data_array <- array(unlist(lst), dim=c(52,length(x[1,]),(length(x[,1])/52)))
This array splits the data frame into consecutive blocks of 52 rows (weeks); it is for a weekly temporal analysis.
I would like to compute an ecdf function on this array.
, , 1
[,1] [,2] [,3]
[1,] **0.66319631** 0.01004290 0.02133477
[2,] -1.64273648 0.23105503 1.02862145
[3,] 1.17083363 -0.49700717 -0.01119745
, , 2
[,1] [,2] [,3]
[1,] **-0.79365987** 1.28394049 -0.547763434
[2,] -0.09221301 1.07676841 0.570294731
[3,] 0.20293308 1.00182888 0.247373981
, , 3
[,1] [,2] [,3]
[1,] **1.03862172** -0.961678683 1.25334651
[2,] 0.58476540 0.745250484 -0.06183788
[3,] 0.24057690 1.226575038 0.23363005
I want to compute the ecdf for each cell, for a weekly seasonal analysis.
For example, compute the quantiles for the time series marked with **: 0.66319631; -0.79365987; 1.03862172
For the mean it works:
array_lag_sum<-apply(Data_array,c(1,2),FUN=function(x){mean(x,na.rm=TRUE)})
I tried a similar call with ecdf, but it doesn't work:
percent_array<-apply(Data_array,c(1,2),FUN=function(u){ecdf(u)(u)})
After that, I would like to reshape this array back into the original layout of the data frame x (like an rbind, but on an array).
Thank you so much for your help.
Edit:
Sorry, I don't know if I was clear enough. Arrays are certainly complicated for me.
With your method, if I have this simple data frame:
B <- matrix(seq(1,20), 20, 3)
> B
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 2 2 2
[3,] 3 3 3
[4,] 4 4 4
[5,] 5 5 5
[6,] 6 6 6
[7,] 7 7 7
[8,] 8 8 8
[9,] 9 9 9
[10,] 10 10 10
[11,] 11 11 11
[12,] 12 12 12
[13,] 13 13 13
[14,] 14 14 14
[15,] 15 15 15
[16,] 16 16 16
[17,] 17 17 17
[18,] 18 18 18
[19,] 19 19 19
[20,] 20 20 20
Your method gives:
Data_array <- array( B, dim=c(10,3,5))
, , 1
[,1] [,2] [,3]
[1,] 1 11 1
[2,] 2 12 2
[3,] 3 13 3
[4,] 4 14 4
[5,] 5 15 5
[6,] 6 16 6
[7,] 7 17 7
[8,] 8 18 8
[9,] 9 19 9
[10,] 10 20 10
, , 2
[,1] [,2] [,3]
[1,] 11 1 11
[2,] 12 2 12
[3,] 13 3 13
[4,] 14 4 14
[5,] 15 5 15
[6,] 16 6 16
[7,] 17 7 17
[8,] 18 8 18
[9,] 19 9 19
[10,] 20 10 20
whereas I would like something more like this:
,,1
[,1] [,2] [,3]
[1,] 1 1 1
[2,] 2 2 2
[3,] 3 3 3
[4,] 4 4 4
[5,] 5 5 5
[6,] 6 6 6
[7,] 7 7 7
[8,] 8 8 8
[9,] 9 9 9
[10,] 10 10 10
,,2
[,1] [,2] [,3]
[1,] 11 11 11
[2,] 12 12 12
[3,] 13 13 13
[4,] 14 14 14
[5,] 15 15 15
[6,] 16 16 16
[7,] 17 17 17
[8,] 18 18 18
[9,] 19 19 19
[10,] 20 20 20
and get as a result a table holding the percentile values of each time series:
the percentile values of 1 and 11, 2 and 12, and so on, for each column and each row (I know the values are not meaningful, it's just an example).
Sorry if my last question was not understandable.
The answer is:
ecdf_mat <- apply( Data_array, 1:2, ecdf)
This passes the values for each combination of the first two indices to the function ecdf. Each of those calls returns a function, which is stored in the corresponding matrix location. You are getting something most people will not be able to use without a bit of coaching: one 52 x 4 matrix of functions. The functions are held in lists, which are valid matrix or array elements:
> dim(apply( Data_array, 1:2, ecdf) )
[1] 52 4
To access them you need to first pull them out of the matrix with standard "[" indexing but then pull them out of the list container with a call to "[[1]]":
> str(apply( Data_array, 1:2, ecdf)[1,1] )
List of 1
$ :function (v)
..- attr(*, "class")= chr [1:3] "ecdf" "stepfun" "function"
..- attr(*, "call")= language FUN(newX[, i], ...)
> apply( Data_array, 1:2, ecdf)[1,1][[1]]
Empirical CDF
Call: FUN(newX[, i], ...)
x[1:5] = -0.92217, -0.37471, 0.058284, 0.28502, 0.44391
> apply( Data_array, 1:2, ecdf)[1,1][[1]](0)
[1] 0.4
Edit:------
It appears you don't want the ecdfs themselves (despite getting no response to my efforts at getting you to recognize the distinction), but rather an identically shaped array holding the percentile values, with each i-j position treated as an individual length-k sequence. I can think of two ways to do this. The first would use the matrix of ecdf functions I built and demonstrated above, but I believe that is the more baroque method, so here is a more direct route. I've taken the liberty of making this more manageable by making the long first dimension only 10 long.
x <- matrix(rnorm(50 * 4), 50)
# split into blocks of 10 rows, giving a 10 x 4 x 5 array
lst <- lapply(seq(1, nrow(x), by = 10), function(i) x[i:(i + 9), ])
Data_array <- array(unlist(lst), dim = c(10, ncol(x), nrow(x) / 10))
pctiles2 <- apply( Data_array, 1:2, function(x) ecdf(x)(x) )
> str(pctiles2)
num [1:5, 1:10, 1:4] 0.8 0.4 0.6 0.2 1 0.4 1 0.2 0.6 0.8 ...
They aren't actually percentiles, but that could be easily remedied by slipping a 100* in front of the ecdf call (or multiplying the result by 100). You will notice that the structure has been permuted so that the quantile/percentile sequences now run along the first dimension. That is because apply always delivers its results in column-major order, with each function result laid along the first dimension. There is a function aperm which allows you to re-arrange these into the original order:
re_pctiles <- aperm(pctiles2, c(2,3,1) )
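To check the permutation on a small case, and to get back to the 2-D layout the question asks for, here is a self-contained sketch. The 10 x 4 x 5 dimensions, the seed and the names pct, re_pct and stacked are illustrative assumptions; the last step stacks the slices with rbind, which mirrors how the array was built from consecutive row blocks.

```r
set.seed(1)
Data_array <- array(rnorm(10 * 4 * 5), dim = c(10, 4, 5))

# ecdf evaluated at its own sample, per (i, j) cell across the 3rd dimension;
# apply() puts each length-5 result along the FIRST dimension: 5 x 10 x 4
pct <- apply(Data_array, 1:2, function(v) ecdf(v)(v))

# Move that dimension to the end to restore the 10 x 4 x 5 layout
re_pct <- aperm(pct, c(2, 3, 1))

# Stack the slices on top of each other, like an rbind over the array:
# 50 x 4, the same layout as the matrix the array was cut from
stacked <- do.call(rbind, lapply(seq_len(dim(re_pct)[3]),
                                 function(k) re_pct[, , k]))
dim(stacked)
```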