Transform two arrays in to one data frame in R - arrays

I have two arrays coming from a postgreSQL database as following.
iarray
{9.467182035,9.252423958,9.179368178,9.142931845,9.118895803,9.098669713,9.093398102,9.092035392,9.091328028,9.090594437,9.090000456,9.089253543......keeps going on}
varray
{-1.025945126,-0.791203874,-0.506481774,-0.255416444,-0.028424464,0.188855034,0.390787963,0.579327969,0.761521769 ...keeps going on}
Both arrays have equal number of entries. I want to convert these to a data frame hence I can draw a graph of i over v.
How should I proceed?
I tried n<-gsub("^\\{+(.+)\\}+$", '\\1', iarray) to remove the {} and
n2 <- strsplit(n, ",") to remove the commas.

Assuming you are getting iarray & varray as strings :
iarray = "{9.467182035,9.252423958,9.179368178,9.142931845}"
varray = "{-1.025945126,-0.791203874,-0.506481774,-0.255416444}"
n<-gsub("^\\{+(.+)\\}+$", '\\1', iarray)
n1 <- strsplit(n,",")
n1 <- unlist(n1)
df <- as.data.frame(n1)
n<-gsub("^\\{+(.+)\\}+$", '\\1', varray)
n2 <- strsplit(n,",")
n2 <- unlist(n2)
df <- cbind(df,n2)

This seems one of the few occasions to correctly use eval(parse()):
df<-list(iarray,varray)
df<-data.frame(lapply(df,
function(x) eval(parse(text=sub("\\}$",")",sub("^\\{","c(",x))))
))
names(df)<-c("iarray","varray")
We just replace the { with (, add a c at the beginning and iarray and varray become command lines to create vectors which we parse and eval.

Related

How to bind arrays by columns in R...?

Looking for the smartest and shortest way to bind the following two arrays in R. Excepted output is also given...
#Creating two arrays
(a <- array(c(1:9,19:27), dim = c(3,3,2)))
(b <- array(c(10:18,28:36), dim = c(3,3,2)))
#Wanted array by binding a and b
(wanted <- array(1:36, dim = c(3,6,2)))
Thanks a lot...
We can use abind
library(abind)
res <- abind(a, b, along = 2)
attr(res, "dimnames") <- NULL
identical(res, wanted)
#[1] TRUE

Optimizing function speed on 3D array

I am applying a user-defined function to individual cells of a 3D array. The contents of each cell are one of the following possibilities, all of which are character vectors because of prior formatting:
"N"
"A"
""
"1"
"0"
I want to create a new 3D array of the same dimensions, where cells contain either NA or a numeric vector containing 1 or 0. Thus, I wrote a function named Numericize and used aaply to apply it to the entire array. However, it takes forever to apply it.
Numericize <- function(x){
if(!is.na(x)){
x[x=="N"] <- NA; x
x[x=="A"] <- NA; x
x[x==""] <- NA; x
x <- as.integer(x)
}
return(x)
}
The dimensions original array are 480x866x366. The function takes forever to apply using the following code:
Final.Daily.Array <- aaply(.data = Complete.Daily.Array,
.margins = c(1,2,3),
.fun = Numericize,
.progress = "text")
I am unsure if the speed issue comes from an inefficient Numericize, an inefficient aaply, or something else entirely. I considered trying to set up parallel computing using the plyr package but I wouldn't think that such a simple command would require parallel processing.
On one hand I am concerned that I created a stack overflow for myself (see this for more), but I have applied other functions to similar arrays without problems.
ex.array <- array(dim = c(3,3,3))
ex.array[,,1] <- c("N","A","","1","0","N","A","","1")
ex.array[,,2] <- c("0","N","A","","1","0","N","A","")
ex.array[,,3] <- c("1","0","N","A","","1","0","N","A")
desired.array <- array(dim = c(3,3,3))
desired.array[,,1] <- c(NA,NA,NA,1,0,NA,NA,NA,1)
desired.array[,,2] <- c(0,NA,NA,NA,1,0,NA,NA,NA)
desired.array[,,3] <- c(1,0,NA,NA,NA,1,0,NA,NA)
ex.array
desired.array
Any suggestions?
You can just use a vectorized approach:
ex.array[ex.array %in% c("", "N", "A")] <- NA
storage.mode(ex.array) <- "integer"
You can simply use the second line and it will introduce NAs by coercion.

R Multiply second dimension of 3D Array by a Vector for each of the 3rd dimension

When trying to multiply the first dimension of an array by each index of a vector by the second dimension, my array is converted to a matrix and things get squirrelly. I can only do the proper multiplication long-hand.
What a mouth full...
It's easier to explain with code...
Arr <- array(runif(10*5*3), dim = c(10,5,3))
dim(Arr)
Vect <- c(1:5)
Arr[,1,1] <- Arr[,1,1]*Vect[1]
Arr[,1,2] <- Arr[,1,2]*Vect[1]
Arr[,1,3] <- Arr[,1,3]*Vect[1]
Arr[,2,1] <- Arr[,2,1]*Vect[2]
Arr[,2,2] <- Arr[,2,2]*Vect[2]
Arr[,2,3] <- Arr[,2,3]*Vect[2]
Arr[,3,1] <- Arr[,3,1]*Vect[3]
Arr[,3,2] <- Arr[,3,2]*Vect[3]
Arr[,3,3] <- Arr[,3,3]*Vect[3]
Arr[,4,1] <- Arr[,4,1]*Vect[4]
Arr[,4,2] <- Arr[,4,2]*Vect[4]
Arr[,4,3] <- Arr[,4,3]*Vect[4]
Arr[,5,1] <- Arr[,5,1]*Vect[5]
Arr[,5,2] <- Arr[,5,2]*Vect[5]
Arr[,5,3] <- Arr[,5,3]*Vect[5]
How do I clean this up to be one command?
Try:
sweep(Arr,2,Vect,FUN="*")
Cast Vect into an array first, then element multiply:
varr <- aperm(array(Vect, dim = c(5L, 10L, 3L)), perm = c(2L, 1L, 3L))
Arr <- varr * Arr
(of course we don't need to store varr if you want this in one command)
(also, turns out this is basically what sweep does under the hood...)
The aaply() function from the plyr package does exactly what you're looking for. It can operate on arrays of any dimension and split them however you like. In this case you're splitting by rows so:
library(plyr)
Arr2 <- aaply(Arr, 1, function(x,y){x*y}, Vect)
We can also replicate the 'Vect' and multiply with 'Arr'. The col is a convenient function that gives the numeric index of columns.
res1 <- Arr * Vect[col(Arr[,,1])]
Or we explicitly do the rep
res2 <- Arr* rep(Vect, each=dim(Arr)[1])
identical(res1, res2)
#[1] TRUE

How to create sub-arrays access the i-th dimension of an array within for()?

In a for-loop, I run in i over an array which I would like to sub-index in dimension i. How can this be done? So a minimal example would be
(A <- array(1:24, dim = 2:4))
A[2,,] # i=1
A[,1,] # i=2
A[,,3] # i=3
where I index 'by foot'. I tried something along the lines of this but wasn't successful. Of course one could could create "2,," as a string and then eval & parse the code, but that's ugly. Also, inside the for loop (over i), I could use aperm() to permute the array such that the new first dimension is the former ith, so that I can simply access the first component. But that's kind of ugly too and requires to permute the array back. Any ideas how to do it more R-like/elegantly?
The actual problem is for a multi-dimensional table() object, but I think the idea will remain the same.
Update
I accepted Rick's answer. I just present it with a for loop and simplified it further:
subindex <- c(2,1,3) # in the ith dimension, we would like to subindex by subindex[i]
for(i in seq_along(dim(A))) {
args <- list(1:2, 1:3, 1:4)
args[i] <- subindex[i]
print(do.call("[", c(list(A), args)))
}
#Build a multidimensional array
A <- array(1:24, dim = 2:4)
# Select a sub-array
indexNumber = 2
indexSelection = 1
# Build a parameter list indexing all the elements of A
parameters <- list(A, 1:2, 1:3, 1:4)
# Modify the appropriate list element to a single value
parameters[1 + indexNumber] <- indexSelection
# select the desired subarray
do.call("[", parameters)
# Now for something completely different!
#Build a multidimensional array
A <- array(1:24, dim = 2:4)
# Select a sub-array
indexNumber = 2
indexSelection = 1
reduced <- A[slice.index(A, indexNumber) == indexSelection]
dim(reduced) <- dim(A)[-indexNumber]
# Also works on the left-side
A[slice.index(A, 2)==2] <- -1:-8

How to read multiple files into a multi-dimensional array

I want to make array in 3 dimension.
Here is what I tried:
z<-c(160,720,420)
first_data_set <-array(dim = length(file_1), dimnames = z)
Data that I am reading is in one level. (only x and y)
There are other data in the same format, and I need to put them in the same array with the first data. So once I finish reading all data, all of them are in the same array but there is no overwriting.
So I think array has to be 3 dimensions; otherwise I cannot keep all data that I read in loop.
Say that you have two matrices of size 3x4:
m1 <- matrix(rnorm(12), nrow = 3, ncol = 4)
m2 <- matrix(rnorm(12), nrow = 3, ncol = 4)
If you want to place them in an array, first make an array of NA's:
A <- array(as.numeric(NA), dim = c(3,4,2))
Then populate the layers with data:
A[,,1] <- m1
A[,,2] <- m2
As suggested by #Justin, you could also just put the matrices together in a list:
A2 <- list()
A2[['m1']] <- m1
A2[['m2']] <- m2
To read matrices from files: using a list makes it easier to get these matrices from files in a directory, without having to specify the dimensions in advance. Assume you want all files with extension csv:
myfiles <- dir(pattern = ".csv")
for (i in 1:length(myfiles)){
A2[[myfiles[i]]] <- read.table(myfiles[i], sep = ',')
}

Resources