How can I access components of list elements in R - arrays

I have a ragged list that I would like to work with; i.e., I would like to use an apply function to quickly and simply pull out elements from the list. The following code approximates my situation:
vec1 <- c("B","D","E","NA")
vec2 <- c("B","D","E","NA")
vec3 <- c("B","C","E","NA")
write.table(vec1, file="./vec1.csv", sep=",", quote=F)
write.table(vec2, file="./vec2.csv", sep=",", quote=F)
write.table(vec3, file="./vec3.csv", sep=",", quote=F)
vectors.files <- list.files(path=getwd(), recursive=FALSE, pattern="\\.csv$")
vectors.list <- lapply(vectors.files, read.csv)
How would I then be able to create a new object that is, for example, the second row of each list element in vectors.list?
Thanks,
Matt

It's not really clear what you're after as the final output format, but you might want to try variations on the following template:
lapply(vectors.list, function(x) x[2, , drop = FALSE])
# [[1]]
# x
# 2 D
#
# [[2]]
# x
# 2 D
#
# [[3]]
# x
# 2 C
Here, we've passed an anonymous function (function(x)) to lapply, which applies it to each item of your "vectors.list". In this case, we've used basic subsetting with [ to extract the second row. The drop = FALSE retains the data.frame structure; without it, a single-row subset of a one-column data.frame simplifies to a vector.
Note that the data.frames in the resulting list still have all the original levels for the "x" factor. Use droplevels if you want to retain only the factor levels actually present in that row.
Compare:
str(lapply(vectors.list, function(x) x[2, , drop = FALSE]))
# List of 3
# $ :'data.frame': 1 obs. of 1 variable:
# ..$ x: Factor w/ 3 levels "B","D","E": 2
# $ :'data.frame': 1 obs. of 1 variable:
# ..$ x: Factor w/ 3 levels "B","D","E": 2
# $ :'data.frame': 1 obs. of 1 variable:
# ..$ x: Factor w/ 3 levels "B","C","E": 2
str(lapply(vectors.list, function(x) droplevels(x[2, , drop = FALSE])))
# List of 3
# $ :'data.frame': 1 obs. of 1 variable:
# ..$ x: Factor w/ 1 level "D": 1
# $ :'data.frame': 1 obs. of 1 variable:
# ..$ x: Factor w/ 1 level "D": 1
# $ :'data.frame': 1 obs. of 1 variable:
# ..$ x: Factor w/ 1 level "C": 1
You may also want to explore as.character(unlist(x[2, ])).
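For instance, a quick sketch (assuming vectors.list was built exactly as in the question) that collapses the second rows into one plain character vector:
# collapse the second row of each data.frame into a character vector
sapply(vectors.list, function(x) as.character(unlist(x[2, ])))
# [1] "D" "D" "C"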

If you store your vectors in a data frame you can subset directly. (This works here because all the list elements have the same number of rows; with a truly ragged list, data.frame(vectors.list) would fail.)
> df <- data.frame(vectors.list)
> row2 <- df[2,]
> row2
x x.1 x.2
2 D D C

Related

Arranging a 3 dimensional contingency table in R in order to run a Cochran-Mantel-Haenszel analysis?

I am attempting to run a Mantel-Haenszel analysis in R to determine whether or not a comparison of proportions test is still significant when accounting for a 'diagnosis' ratio within groups. This test is available in the stats package.
library(stats)
mantelhaen.test(x)
Having done some reading, I've found that this test can perform an odds ratio test on a contingency table that is n x n x k, as opposed to simply n x n. However, I am having trouble arranging my data in the proper way, as I am fairly new to R. I have created some example data...
ex.label <- c("A","A","A","A","A","A","A","B","B","B")
ex.status <- c("+","+","-","+","-","-","-","+","+","-")
ex.diag <- c("X","X","Z","Y","Y","Y","X","Y","Z","Z")
ex.data <- data.frame(ex.label,ex.diag,ex.status)
Which looks like this...
ex.label ex.diag ex.status
1 A X +
2 A X +
3 A Z -
4 A Y +
5 A Y -
6 A Y -
7 A X -
8 B Y +
9 B Z +
10 B Z -
I was originally able to use a simple N-1 chi-squared test to run a comparison of proportions of + to - for just A and B, but now I want to account for ex.diag as well; essentially, I want to compare the significance of the +/- ratio in each column while stratifying by diagnosis.
I tried to use the ftable() function to arrange my data in a way that would work.
ex.ftable <- ftable(ex.data)
Which looks like this...
ex.status - +
ex.label ex.diag
A X 1 2
Y 2 1
Z 1 0
B X 0 0
Y 0 1
Z 1 1
However, when I run mantelhaen.test(ex.ftable), I get the error 'x' must be a 3-dimensional array. How can I arrange my data in such a way that I can actually run this test?
In mantelhaen.test, the last dimension of the 3-dimensional contingency table x needs to be the stratification variable (ex.diag). This array can be generated as follows:
ex.label <- c("A","A","A","A","A","A","A","B","B","B")
ex.status <- c("+","+","-","+","-","-","-","+","+","-")
ex.diag <- c("X","X","Z","Y","Y","Y","X","Y","Z","Z")
# Now ex.diag is in the first column
ex.data <- data.frame(ex.diag, ex.label, ex.status)
# The flat table
( ex.ftable <- ftable(ex.data) )
# ex.status - +
# ex.diag ex.label
# X A 1 2
# B 0 0
# Y A 2 1
# B 0 1
# Z A 1 0
# B 1 1
The 3D array can be generated using aperm.
# Transform the ftable into a 2 x 2 x 3 array
# First dimension: ex.label
# Second dimension: ex.status
# Third dimension: ex.diag
( mtx3D <- aperm(array(t(as.matrix(ex.ftable)),c(2,2,3)),c(2,1,3)) )
# , , 1
#
# [,1] [,2]
# [1,] 1 2
# [2,] 0 0
#
# , , 2
#
# [,1] [,2]
# [1,] 2 1
# [2,] 0 1
#
# , , 3
#
# [,1] [,2]
# [1,] 1 0
# [2,] 1 1
Now the Cochran-Mantel-Haenszel chi-squared test can be performed.
# Cochran-Mantel-Haenszel chi-squared test of the null that
# two nominal variables are conditionally independent in each stratum
#
mantelhaen.test(mtx3D, exact=FALSE)
The result of the test is:
Mantel-Haenszel chi-squared test with continuity correction
data: mtx3D
Mantel-Haenszel X-squared = 0.23529, df = 1, p-value = 0.6276
alternative hypothesis: true common odds ratio is not equal to 1
95 percent confidence interval:
NaN NaN
sample estimates:
common odds ratio
Inf
Given the low number of cases, it is preferable to compute an exact conditional test (option exact=TRUE).
mantelhaen.test(mtx3D, exact=T)
# Exact conditional test of independence in 2 x 2 x k tables
#
# data: mtx3D
# S = 4, p-value = 0.5
# alternative hypothesis: true common odds ratio is not equal to 1
# 95 percent confidence interval:
# 0.1340796 Inf
# sample estimates:
# common odds ratio
# Inf
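As an aside (my own sketch, not part of the original answer): the same 3-dimensional array can be built directly with table(), avoiding the aperm() step entirely, as long as the stratification variable comes last.
# Build the label x status x diag array directly from the raw data
mtx3D2 <- table(ex.data$ex.label, ex.data$ex.status, ex.data$ex.diag)
mantelhaen.test(mtx3D2, exact = TRUE)
Since the per-stratum 2 x 2 counts are identical, this should reproduce the exact test result above.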

How to find Consecutive Numbers Among multiple Arrays?

I'll give an example right away.
Suppose I have 3 arrays a, b, c such as:
a = c(3,5)
b = c(6,1,8,7)
c = c(4,2,9)
I want to extract consecutive triplets among them, i.e.,
c(1,2,3),c(4,5,6)
But this is just an example; I will have a larger data set with more than 10 arrays, and hence must be able to find consecutive series of length ten.
So could anyone provide an algorithm to find, in general, a consecutive series of length 'n' among 'n' arrays?
I am doing this in R, so it's preferable if you give your code in R, but an algorithm in any language is more than welcome.
Reorganize the data first into a list containing value and array number.
Sort the list; you'd have something like:
1-2
2-3
3-1 (i.e. "there's a three in array 1")
4-3
5-1
6-2
7-2
8-2
9-3
Then loop over the list, checking whether each window of n entries contains n consecutive numbers, and whether those entries come from n different arrays. A minimal sketch of this follows.
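A minimal R sketch of the algorithm above (my own illustration of the pseudocode; it assumes values are distinct across arrays):
find_consecutive <- function(vecs) {
  n <- length(vecs)
  # merge all values with the id of the array they came from
  dd <- data.frame(value = unlist(vecs),
                   array = rep(seq_along(vecs), lengths(vecs)))
  dd <- dd[order(dd$value), ]
  out <- list()
  # slide a window of length n over the sorted values
  for (i in seq_len(nrow(dd) - n + 1)) {
    win <- dd[i:(i + n - 1), ]
    # keep windows of n consecutive numbers from n different arrays
    if (all(diff(win$value) == 1) && length(unique(win$array)) == n)
      out[[length(out) + 1]] <- win$value
  }
  out
}
find_consecutive(list(a = c(3, 5), b = c(6, 1, 8, 7), c = c(4, 2, 9)))
# [[1]]
# [1] 1 2 3
#
# [[2]]
# [1] 4 5 6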
Here's one approach. It assumes there are no breaks in the sequence of observations across the groups. Here's the data:
N <- 3
a <- c(3,5)
b <- c(6,1,8,7)
c <- c(4,2,9)
Then I combine them and order by the observed values:
dd <- lattice::make.groups(a,b,c)
dd <- dd[order(dd$data),]
Now I look for rows in this table where all three groups are represented
idx <- apply(embed(as.numeric(dd$which), N), 1, function(x) {
  length(unique(x)) == N
})
Then we can see the triplets with
lapply(which(idx), function(i) {
  dd[i:(i + N - 1), ]
})
# [[1]]
# data which
# b2 1 b
# c2 2 c
# a1 3 a
#
# [[2]]
# data which
# c1 4 c
# a2 5 a
# b1 6 b
Here is a brute force method with expand.grid and three vectors as in the example
# get all combinations
df <- expand.grid(a,b,c)
Use combn to calculate the absolute difference for each pairwise combination of columns.
# get all pairwise differences between columns
myDiffs <- combn(names(df), 2, FUN=function(x) abs(df[[x[1]]] - df[[x[2]]]))
# subset data using `rowSums` and `which`
df[which(rowSums(myDiffs == 1) == ncol(myDiffs)-1), ]
Var1 Var2 Var3
2 5 6 4
11 3 1 2
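The pairwise-difference test above is written for triplets. Here is a sketch of the same brute-force idea generalised to n vectors (my own extension, not part of the original answer): a row of the grid is a consecutive n-tuple exactly when its sorted values increase by 1. Note that the grid grows as the product of the vector lengths, so this will not scale to many long vectors.
vecs <- list(a = c(3, 5), b = c(6, 1, 8, 7), c = c(4, 2, 9))
df <- do.call(expand.grid, vecs)
# a row qualifies if its sorted values form an unbroken run
hits <- apply(df, 1, function(r) all(diff(sort(r)) == 1))
df[hits, ]
#    a b c
# 2  5 6 4
# 11 3 1 2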
I have hacked together a little recursive function that will find all the consecutive triplets amongst as many vectors as you pass it (need to pass at least three). It is probably a little crude, but seems to work.
The function uses the ellipsis, ..., for passing arguments. Hence it will take however many arguments (i.e. numeric vectors) you provide and put them in the list items. Then the smallest value amongst each passed vector is located, along with its index.
Then the indices of the vectors corresponding to the smallest triplet are created and iterated through using a for() loop, where the output values are passed to the output vector out. The input vectors in items are pruned and passed again into the function in a recursive fashion.
Only when all vectors are NA, i.e. there are no more values left in the vectors, does the function return the final result.
library(magrittr)
# define function to find the triplets
tripl <- function(...){
  items <- list(...)
  # find the smallest number in each passed vector, along with its index
  # output is an n-by-2 matrix, where n is the number of passed arguments
  triplet.id <- lapply(items, function(x){
    if(is.na(x) %>% prod) id <- c(NA, NA)
    else id <- c(which(x == min(x)), x[which(x == min(x))])
  }) %>% unlist %>% matrix(., ncol=2, byrow=T)
  # find the smallest triplet from the passed vectors
  index <- order(triplet.id[,2])[1:3]
  # create empty vector for output
  out <- vector()
  # go through the smallest triplet's indices
  for(i in index){
    # .. append the corresponding item from the input vector to the out vector
    # .. and remove the value from the input vector
    if(length(items[[i]]) == 1) {
      out <- append(out, items[[i]])
      # .. if the input vector has no value left, fill with NA
      items[[i]] <- NA
    }
    else {
      out <- append(out, items[[i]][triplet.id[i,1]])
      items[[i]] <- items[[i]][-triplet.id[i,1]]
    }
  }
  # recurse until all vectors are empty (NA)
  if(!prod(unlist(is.na(items)))) out <- append(list(out),
                                                do.call("tripl", c(items), quote = F))
  else(out <- list(out))
  # return result
  return(out)
}
The function can be called by passing the input vectors as arguments.
# input vectors
a = c(3,5)
b = c(6,1,8,7)
c = c(4,2,9)
# find all the triplets using our function
y <- tripl(a,b,c)
The result is a list which contains all the necessary information, albeit unordered.
print(y)
# [[1]]
# [1] 1 2 3
#
# [[2]]
# [1] 4 5 6
#
# [[3]]
# [1] 7 9 NA
#
# [[4]]
# [1] 8 NA NA
Ordering everything can be done using sapply():
# put everything in order
sapply(y, function(x){x[order(x)]}) %>% t
# [,1] [,2] [,3]
# [1,] 1 2 3
# [2,] 4 5 6
# [3,] 7 9 NA
# [4,] 8 NA NA
The caveat is that it uses only one value per vector to find each triplet.
It will therefore not find the consecutive triplet c(6,7,8) among e.g. c(6,7,11), c(8,9,13) and c(10,12,14).
In this instance it would return c(6,8,10) (see below).
a<-c(6,7,11)
b<-c(8,9,13)
c<-c(10,12,14)
y <- tripl(a,b,c)
sapply(y, function(x){x[order(x)]}) %>% t
# [,1] [,2] [,3]
# [1,] 6 8 10
# [2,] 7 9 12
# [3,] 11 13 14

How can I scale an array to another length, preserving its approximate values, in R

I have two arrays with different lengths
value <- c(1,1,1,4,4,4,1,1,1)
time <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)
How can I resize the value array to make it the same length as the time array while preserving its approximate values?
The approx() function complains that the lengths differ.
I want to get value array to be like
value <- c(1,1,1,1,1,4,4,4,4,4,4,1,1,1,1)
time <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)
so lengths are equal
UPD
Okay, the main goal is to calculate the correlation between v1 and v2, where
v1 is inside the data.frame (v1, t1), and v2 is inside the data.frame (v2, t2).
The (v1, t1) and (v2, t2) data frames have different lengths, but we know that t1 and t2 cover the same time period, so we can overlay them.
For t1 we have 1,3,5,7,9 and for t2 we have 1,2,3,4,5,6,7,8,9,10.
The problem is that the two data frames were recorded separately but simultaneously, so I need to rescale one of them to overlay the other data.frame. Then I can calculate how v1 correlates with v2.
That's why I need to scale v1 to t2's length.
I'm sorry, I don't know how to state the goal correctly in English.
You may use the xout argument in approx
"xout: an optional set of numeric values specifying where interpolation is to take place.".
# create some fake data, which I _think_ may resemble the data you described in edit.
set.seed(123)
# "for t1 we have 1,3,5,7,9"
df1 <- data.frame(time = c(1, 3, 5, 7, 9), value = sample(1:10, 5))
df1
# "for t2 we have 1,2,3,4,5,6,7,8,9,10", the 'full time series'.
df2 <- data.frame(time = 1:10, value = sample(1:10))
# interpolate using approx and the xout argument
# The time values for 'full time series', df2$time, is used as `xout`.
# default values of arguments (e.g. linear interpolation, no extrapolation)
interpol1 <- with(df1, approx(x = time, y = value, xout = df2$time))
# some arguments you may wish to check
# extrapolation rules
interpol2 <- with(df1, approx(x = time, y = value, xout = df2$time,
rule = 2))
# interpolation method ('last observation carried forward")
interpol3 <- with(df1, approx(x = time, y = value, xout = df2$time,
rule = 2, method = "constant"))
df1
# time value
# 1 1 3
# 2 3 8
# 3 5 4
# 4 7 7
# 5 9 6
interpol1
# $x
# [1] 1 2 3 4 5 6 7 8 9 10
#
# $y
# [1] 3.0 5.5 8.0 6.0 4.0 5.5 7.0 6.5 6.0 NA
interpol3
# $x
# [1] 1 2 3 4 5 6 7 8 9 10
#
# $y
# [1] 3 3 8 8 4 4 7 7 6 6
# correlation between a vector of inter-(extra-)polated values
# and the 'full' time series
cor.test(interpol3$y, df2$value)
This little function tries to pad the values in the shorter vector out as evenly as possible and is generalisable. Haven't thought too much about edge cases, and I am sure there are many that break it. Plus it seems like it could be simplified, but is this what you are looking to do...
pad <- function(x, y){
  fill <- length(y) - length(x)
  run <- rle(x)
  add <- fill %/% length(run$lengths)
  pad <- diff(c(0, as.integer(seq(add, fill, length.out = length(run$lengths)))))
  rep(run$values, times = run$lengths + pad)
}
pad(value,time)
[1] 1 1 1 1 1 4 4 4 4 4 1 1 1 1 1
Or e.g.
value <- 1:2
time <- 1:10
pad(value,time)
[1] 1 1 1 1 1 2 2 2 2 2
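For comparison (my own sketch, not part of this answer): a similar stretch-and-pad effect can be had from approx() itself with method = "constant", by mapping the short index range onto the long one. The run lengths at the boundaries may differ slightly from pad()'s output because of how the query points round down.
value <- c(1,1,1,4,4,4,1,1,1)
time  <- 1:15
# 15 equally spaced query points across the 9 original positions
approx(seq_along(value), value,
       xout = seq(1, length(value), length.out = length(time)),
       method = "constant", rule = 2)$y
# [1] 1 1 1 1 1 1 4 4 4 4 4 1 1 1 1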

R: JSON Package - importing data & missing values / null

I am reading in data with the JSON package.
Basically, the data has the following format:
{"a":1,"b":2,"c":3}
{"a": null,"b":2,"c":3}
I am storing the data as follows in R:
# a JSON package providing fromJSON() is assumed to be loaded (e.g. rjson or RJSONIO)
library(data.table)
DAT <- data.table(read.csv("D:/file.csv"))
# initialise containers for the loops below
OUT <- list()
vnames <- NULL
i <- 1
# create unified variable names
while (i <= nrow(DAT)) {
  OUT[[i]] <- fromJSON(as.character(DAT[i]$results))
  vnames <- c(vnames, names(OUT[[i]]))
  i <- i + 1
}
# create the corresponding content
content <- NULL
Applicant <- NULL
i <- 1
while (i <= nrow(DAT)) {
  temp <- fromJSON(as.character(DAT[i]$results))
  laenge <- length(temp)
  for (j in 1:laenge) {
    content_new <- as.character(temp[[j]])
    content <- c(content, content_new)
  }
  i <- i + 1
}
Then I want to join the lists via (in order to have the data in the typical format):
assets_mren = data.frame(asset_class=vnames, value=content)
Yet I receive an error message stating that vnames and content imply differing numbers of rows. I believe the problem is the "null" in the data being read in. Do you have an idea how to read in "null" above, or how to better read in the data?
Yes, the problem is null: you get a different structure for each row.
ll <- '{"a":1,"b":2,"c":3}
{"a": null,"b":2,"c":3}'
res <- lapply(readLines(textConnection(ll)), function(x) str(fromJSON(x)))
Named num [1:3] 1 2 3 ## named vector for the first line
- attr(*, "names")= chr [1:3] "a" "b" "c"
List of 3
$ a: NULL ## list for the second line
$ b: num 2
$ c: num 3
So you have to homogenise the output for each line. Here are two options:
1- Replace null with a dummy value (0 or -1, for example):
ll <- readLines(textConnection(gsub("null",-1,ll)))
do.call(rbind, lapply(ll, function(x) fromJSON(x)))
a b c
[1,] 1 2 3
[2,] -1 2 3 ## res[res==-1] <- NA to replace dummy value
2- Keep the null, but use rbind.fill to get a data.frame:
ll <- '{"a":1,"b":2,"c":3}
{"a": null,"b":2,"c":3}'
ll <- readLines(textConnection(ll))
res <- lapply(ll, function(x)
  as.data.frame(t(as.matrix(unlist(fromJSON(x))))))
library(plyr)
rbind.fill(res)
a b c
1 1 2 3
2 NA 2 3
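As a side note (my addition, and a different package from the one used above): if switching packages is an option, the jsonlite package can stream newline-delimited JSON straight into a data.frame, turning null into NA along the way.
library(jsonlite)
ll <- '{"a":1,"b":2,"c":3}
{"a": null,"b":2,"c":3}'
# stream_in() reads one JSON object per line and binds them row-wise
stream_in(textConnection(ll))
#   a b c
# 1 1 2 3
# 2 NA 2 3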

R: Combine a list to a data frame

I have an array of names and a function that returns a data frame. I want to combine this array and the data frames. For example:
>mynames<-c("a", "b", "c")
>df1 <- data.frame(val0=c("d", "e"),val1=4:5)
>df2 <- data.frame(val1=c("e", "f"),val2=5:6)
>df3 <- data.frame(val2=c("f", "g"),val3=6:7)
What I want is a data frame that joins this array with the data frames: df1 corresponds to "a", df2 corresponds to "b", and so on. The final data frame should look like this:
Names Var Val
a d 4
a e 5
b e 5
b f 6
c f 6
c g 7
Can someone help me on this?
Thanks.
This answers this particular question, but I'm not sure how much help it will be for your actual problem:
myList <- list(df1, df2, df3)
do.call(rbind,
lapply(seq_along(mynames), function(x)
cbind(Names = mynames[x], setNames(myList[[x]],
c("Var", "Val")))))
# Names Var Val
# 1 a d 4
# 2 a e 5
# 3 b e 5
# 4 b f 6
# 5 c f 6
# 6 c g 7
Here, we create a list of your data.frames, and in our lapply call, we add in the new "Names" column and rename the existing columns so that we can use rbind to put them all together.
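An equivalent formulation (a sketch of the same idea, not a different method) pairs each name with its data.frame explicitly via Map(), which avoids indexing by position:
myList <- list(df1, df2, df3)
do.call(rbind, Map(function(nm, d) {
  cbind(Names = nm, setNames(d, c("Var", "Val")))
}, mynames, myList))
The result is the same apart from the automatically generated row names.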
