I'm slightly confused as to whether lapply works on a list or on a vector. See the two examples below.
Here, the mean function is applied over a whole array of numbers, i.e. 1 to 5:
x = list(a=1:5, b=rnorm(10))
x
#$a
#[1] 1 2 3 4 5
#
#$b
#[1] -0.57544290 0.51035240 0.43143241 -0.97971957 -0.99845378
#[6] 0.77389008 -0.08464382 0.68420547 1.64793617 -0.39688809
lapply(x,mean)
#$a
#[1] 3
#
#$b
#[1] 0.1012668
But here, the runif function is applied over each individual element of the array
x = 1:4
lapply(x,runif)
#[[1]]
#[1] 0.5914268
#[[2]]
#[1] 0.6762355 0.3072287
#[[3]]
#[1] 0.8439318 0.8488374 0.1158645
#[[4]]
#[1] 0.8519037 0.8384169 0.2894639 0.4066553
My question is: what exactly does lapply work on, an array or an individual element? And how does it choose correctly depending on the function?
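lapply works on the top-level elements of the R object it is given: it calls the function once per element of a list, or once per value of an atomic vector (which it first coerces with as.list). It does not choose based on the function.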
If I have 4 individual integers, lapply will work on each integer:
x <- 1:4
lapply(x, identity)
#[[1]]
#[1] 1
#
#[[2]]
#[1] 2
#
#[[3]]
#[1] 3
#
#[[4]]
#[1] 4
If, however, I have a list of length 2 with each element containing 2 values, lapply will work on each list element:
x <- list(1:2,3:4)
lapply(x, identity)
#[[1]]
#[1] 1 2
#
#[[2]]
#[1] 3 4
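One way to check this, as a minimal sketch: lapply should treat an atomic vector the same way it treats the list returned by as.list, so the two calls below are expected to agree. The helper name describe is made up for this illustration.

```r
# hypothetical helper: report how many values the function receives per call
describe <- function(el) length(el)

# an atomic vector is iterated value by value ...
from_vector <- lapply(1:4, describe)
# ... as if it had first been turned into a list
from_list <- lapply(as.list(1:4), describe)
identical(from_vector, from_list)

# a list is iterated element by element, whatever each element holds
per_element <- lapply(list(a = 1:5, b = rnorm(10)), describe)
unlist(per_element)
```

In the first case each call sees a single value, in the second each call sees a whole vector, which is why mean and runif appear to behave differently in the question.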
I have a 1D array with an odd number of rows, 2435 rows. I want to split the array into smaller arrays and each time perform a small test.
Firstly, I want to split the big array into two smaller arrays.
Then I would like to split my array into 4 smaller arrays, then into 8 small arrays and so on.
Can anyone help with that?
An example is the following:
A<-1:2435
A1
1,2,3,4,...,1237
A2
1238, 1239,...,2435
Thanks in advance
Why not simply use the split() function? For example (using an odd length will return warnings but that's fine):
split(x = 1:11, f = 1:2) # to split into 2 distinct list elements
#$`1`
#[1] 1 3 5 7 9 11
#
#$`2`
#[1] 2 4 6 8 10
split(x = 1:11, f = 1:4)
#$`1`
#[1] 1 5 9
#
#$`2`
#[1] 2 6 10
#
#$`3`
#[1] 3 7 11
#
#$`4`
#[1] 4 8
And if you are really keen on splitting to 2, and then again by 2, you can always use the lapply() function which works on each element of a list:
lapply(split(x = 1:11, f = 1:2), split, f = 1:2)
#$`1`
#$`1`$`1`
#[1] 1 5 9
#
#$`1`$`2`
#[1] 3 7 11
#
#
#$`2`
#$`2`$`1`
#[1] 2 6 10
#
#$`2`$`2`
#[1] 4 8
The nested structure is a little bit of a pain but there are other methods for dealing with that, for example:
L <- split(x = 1:11, f = 1:2) # the main (first) split
names(L) <- letters[seq_along(L)] # names the main split a and b
LL <- lapply(L, split, f = 1:2) # split the main split
unlist(LL, recursive = FALSE)
#$a.1
#[1] 1 5 9
#
#$a.2
#[1] 3 7 11
#
#$b.1
#[1] 2 6 10
#
#$b.2
#[1] 4 8
If you want to split the data through the middle of the array, you can also use the split function:
a <- 1:2435
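divide <- function(x, n = 2)
{
  # chunk size, rounded up so that at most n groups are produced
  i <- ceiling(length(x)/n)
  # group by position, not by value, so this works for any data
  split(x, (seq_along(x) - 1) %/% i + 1)
}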
divide(a)
and with more parts you can use
divide(a, n = 4)
Or, in two iterations, use
lapply(divide(a,2),function(x) divide(x,2))
With a higher value of n, the sizes will no longer be equal because of rounding, which warrants the use of the nested approach.
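As a sketch of the successive halving asked about in the question, the groups can also be computed from each element's position, which keeps the chunk sizes within one element of each other. The helper name chunk and the grouping formula are assumptions made for illustration, not part of the answers above.

```r
# hypothetical helper: split x into n contiguous chunks by position
chunk <- function(x, n) {
  # position i goes to group ceiling(i * n / length(x)), a value in 1..n
  split(x, ceiling(seq_along(x) * n / length(x)))
}

A <- 1:2435
# split into 2, then 4, then 8 pieces
pieces <- lapply(c(2, 4, 8), function(n) chunk(A, n))
# chunk sizes for each round
lapply(pieces, lengths)
```

Each round is computed from the original vector rather than from the previous round, so there is no nested list to flatten afterwards.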
Hi there, I was wondering if there is a way to store a vector in an element of an array or matrix.
For example,
array1<-array(dim=c(1,2))
vector1<-as.vector(1:5)
vector2<-as.vector(6:10)
array1[1,1]<-vector1
array1[1,2]<-vector2
so that when I call
array1[1,1]
I will receive
[1] 1 2 3 4 5
I've tried doing what I did above, and I get the error
number of items to replace is not a multiple of replacement length
Is there a way to get around this?
Also, the problem I face is that I do not know the vector lengths in advance, and the vectors could have different lengths as well,
i.e. vector 1 could have a length of 6 and vector 2 a length of 7.
Thanks!
Try with a list:
my_list <- list()
my_list[[1]] <- c(1:5)
my_list[[2]] <- c(6:11)
A list allows you to store vectors of varying length. The vectors can be retrieved by addressing the element of the list:
> my_list[[1]]
#[1] 1 2 3 4 5
You can use a matrix of lists:
m <- matrix(list(), 2, 2)
m[1,1][[1]] <- 1:2
m[1,2][[1]] <- 1:3
m[2,1][[1]] <- 1:4
m[2,2][[1]] <- 1:5
m
# [,1] [,2]
#[1,] Integer,2 Integer,3
#[2,] Integer,4 Integer,5
m[1, 2]
#[[1]]
#[1] 1 2 3
m[1, 2][[1]]
#[1] 1 2 3
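Applied to the 1-by-2 array from the question, a minimal sketch might look like the following. It assumes that indexing a list-matrix with [[i, j]] reads and writes the stored vector directly, which avoids the m[i, j][[1]] form; the vector contents are arbitrary.

```r
# a 1 x 2 matrix whose cells are list elements (initially NULL)
array1 <- matrix(vector("list", 2), nrow = 1, ncol = 2)

# each cell can hold a vector of any length
array1[[1, 1]] <- 1:5
array1[[1, 2]] <- 6:12

array1[[1, 1]]   # the stored vector itself
lengths(array1)  # number of values held in each cell
```

Because the cells are list elements, the vectors do not need to share a length or even a type.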
Let me give an example right away.
Suppose I have 3 arrays a, b, c such as
a = c(3,5)
b = c(6,1,8,7)
c = c(4,2,9)
I must be able to extract the consecutive triplets among them, i.e.,
c(1,2,3),c(4,5,6)
But this was just an example; I will have a larger data set with more than 10 arrays, and so must be able to find consecutive series of length ten.
So could anyone provide an algorithm to find, in general, the consecutive series of length 'n' among 'n' arrays?
I am actually doing this in R, so it's preferable if you give your code in R, but an algorithm in any language is more than welcome.
First, reorganize the data into a list of (value, array number) pairs.
Sort the list by value; you'd have something like:
1-2
2-3
3-1 (i.e. "there's a three in array 1")
4-3
5-1
6-2
7-2
8-2
9-3
Then loop over the list, check whether there are n consecutive numbers, and then check whether they come from n different arrays.
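The steps above could be sketched in R roughly as follows. The function name find_runs is hypothetical, and the sketch assumes that no value appears in more than one array, so a sliding window over the sorted values is enough.

```r
# hypothetical helper: find runs of n consecutive values, one from each of n vectors
find_runs <- function(vectors) {
  n <- length(vectors)
  # value and array number, kept as two parallel vectors
  values <- unlist(vectors, use.names = FALSE)
  source_id <- rep(seq_along(vectors), lengths(vectors))
  # sort both by value
  ord <- order(values)
  values <- values[ord]
  source_id <- source_id[ord]
  hits <- list()
  if (length(values) < n) return(hits)
  # slide a window of n sorted values over the data
  for (start in seq_len(length(values) - n + 1)) {
    window <- start:(start + n - 1)
    consecutive <- all(diff(values[window]) == 1)
    distinct_sources <- length(unique(source_id[window])) == n
    if (consecutive && distinct_sources) {
      hits[[length(hits) + 1]] <- values[window]
    }
  }
  hits
}

a <- c(3, 5)
b <- c(6, 1, 8, 7)
c <- c(4, 2, 9)
res <- find_runs(list(a, b, c))
res
```

For the example data this returns the two triplets from the question, 1 2 3 and 4 5 6.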
Here's one approach. This assumes there are no breaks in the sequence of observations (the code checks group membership only, not that the values are consecutive). Here is the data:
N <- 3
a <- c(3,5)
b <- c(6,1,8,7)
c <- c(4,2,9)
Then I combine them together and order by the observations:
dd <- lattice::make.groups(a,b,c)
dd <- dd[order(dd$data),]
Now I look for rows in this table where all three groups are represented
idx <- apply(embed(as.numeric(dd$which),N), 1, function(x) {
  length(unique(x))==N
})
Then we can see the triplets with
lapply(which(idx), function(i) {
  dd[i:(i+N-1),]
})
# [[1]]
# data which
# b2 1 b
# c2 2 c
# a1 3 a
#
# [[2]]
# data which
# c1 4 c
# a2 5 a
# b1 6 b
Here is a brute force method with expand.grid and three vectors as in the example
# get all combinations
df <- expand.grid(a,b,c)
Use combn to calculate the absolute difference for each pairwise combination of columns.
# get all pairwise differences
myDiffs <- combn(names(df), 2, FUN=function(x) abs(df[[x[1]]]-df[[x[2]]]))
# subset data using `rowSums` and `which`
df[which(rowSums(myDiffs == 1) == ncol(myDiffs)-1), ]
Var1 Var2 Var3
2 5 6 4
11 3 1 2
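The pairwise-difference test above is tied to three vectors. As a sketch of how the same brute-force idea might extend to any number of vectors, a row of the grid is a consecutive run when its values are all distinct and span exactly n - 1. This assumes integer-valued data; the names grid and is_run are made up for the illustration.

```r
a <- c(3, 5)
b <- c(6, 1, 8, 7)
c <- c(4, 2, 9)

# every combination of one value per vector
grid <- expand.grid(a, b, c)
n <- ncol(grid)

# n distinct integers whose range is n - 1 must be consecutive
is_run <- apply(grid, 1, function(r) {
  !anyDuplicated(r) && max(r) - min(r) == n - 1
})
grid[is_run, ]
```

Like the original, this enumerates every combination, so the grid grows multiplicatively with the vector lengths and is only practical for small inputs.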
I have hacked together a little recursive function that will find all the consecutive triplets amongst as many vectors as you pass it (need to pass at least three). It is probably a little crude, but seems to work.
The function uses the ellipsis, ..., for passing arguments. Hence it will take however many arguments (i.e. numeric vectors) you provide and put them in the list items. Then the smallest value amongst each passed vector is located, along with its index.
Then the indices of the vectors corresponding to the smallest triplet are created and iterated through using a for() loop, where the output values are passed to the output vector out. The input vectors in items are pruned and passed again into the function in a recursive fashion.
Only when all vectors are NA, i.e. there are no more values in the vectors, does the function return the final result.
library(magrittr)
# define function to find the triplets
tripl <- function(...){
  items <- list(...)
  # find the smallest number in each passed vector, along with its index
  # output is a matrix of n-by-2, where n is the number of passed arguments
  triplet.id <- lapply(items, function(x){
    if(is.na(x) %>% prod) id <- c(NA, NA)
    else id <- c(which(x == min(x)), x[which(x == min(x))])
  }) %>% unlist %>% matrix(., ncol=2, byrow=TRUE)
  # find the smallest triplet from the passed vectors
  index <- order(triplet.id[,2])[1:3]
  # create empty vector for output
  out <- vector()
  # go through the smallest triplet's indices
  for(i in index){
    # .. append the corresponding item from the input vector to the out vector
    # .. and remove the value from the input vector
    if(length(items[[i]]) == 1) {
      out <- append(out, items[[i]])
      # .. if the input vector has no value left fill with NA
      items[[i]] <- NA
    }
    else {
      out <- append(out, items[[i]][triplet.id[i,1]])
      items[[i]] <- items[[i]][-triplet.id[i,1]]
    }
  }
  # recurse until all vectors are empty (NA)
  if(!prod(unlist(is.na(items)))) out <- append(list(out),
                                                do.call("tripl", c(items), quote = FALSE))
  else(out <- list(out))
  # return result
  return(out)
}
The function can be called by passing the input vectors as arguments.
# input vectors
a = c(3,5)
b = c(6,1,8,7)
c = c(4,2,9)
# find all the triplets using our function
y <- tripl(a,b,c)
The result is a list, which contains all the necessary information, albeit unordered.
print(y)
# [[1]]
# [1] 1 2 3
#
# [[2]]
# [1] 4 5 6
#
# [[3]]
# [1] 7 9 NA
#
# [[4]]
# [1] 8 NA NA
Ordering everything can be done using sapply():
# put everything in order
sapply(y, function(x){x[order(x)]}) %>% t
# [,1] [,2] [,3]
# [1,] 1 2 3
# [2,] 4 5 6
# [3,] 7 9 NA
# [4,] 8 NA NA
The catch is that it uses only one value per vector to find each triplet.
It will therefore not find the consecutive triplet c(6,7,8) among e.g. c(6,7,11), c(8,9,13) and c(10,12,14).
In this instance it would return c(6,8,10) (see below).
a<-c(6,7,11)
b<-c(8,9,13)
c<-c(10,12,14)
y <- tripl(a,b,c)
sapply(y, function(x){x[order(x)]}) %>% t
# [,1] [,2] [,3]
# [1,] 6 8 10
# [2,] 7 9 12
# [3,] 11 13 14
I have a ragged list that I would like to work with, i.e. I would like to use an apply function to quickly and simply pull out elements from the lists. The following code attempts to approximate my situation:
vec1 <- c("B","D","E","NA")
vec2 <- c("B","D","E","NA")
vec3 <- c("B","C","E","NA")
write.table(vec1, file="./vec1.csv", sep=",", quote=F)
write.table(vec2, file="./vec2.csv", sep=",", quote=F)
write.table(vec3, file="./vec3.csv", sep=",", quote=F)
vectors.files <- list.files(path=getwd(),recursive=F, pattern=paste("*.csv",sep=""))
vectors.list <- lapply(vectors.files, read.csv)
How would I then be able to create a new object that was for example the second row of each list element in vectors.list?
Thanks,
Matt
It's not really clear what you're after as the final output format, but you might want to try variations on the following template:
lapply(vectors.list, function(x) x[2, , drop = FALSE])
# [[1]]
# x
# 2 D
#
# [[2]]
# x
# 2 D
#
# [[3]]
# x
# 2 C
Here, we've just passed an anonymous function (function(x)) to the items in your "vectors.list". In this case, we've used basic subsetting using [ to extract the second row. The drop = FALSE is to retain the data.frame structure since the result is a single-column data.frame (which normally simplifies to a vector).
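Note that the data.frames in the resulting list still have all the original levels for the "x" factor. Use droplevels if you want to retain only the level actually present in that row. (The factor output shown here relies on read.csv converting strings to factors, which was the default before R 4.0.0; with the newer default, "x" is a character column and there are no levels to drop.)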
Compare:
str(lapply(vectors.list, function(x) x[2, , drop = FALSE]))
# List of 3
# $ :'data.frame': 1 obs. of 1 variable:
# ..$ x: Factor w/ 3 levels "B","D","E": 2
# $ :'data.frame': 1 obs. of 1 variable:
# ..$ x: Factor w/ 3 levels "B","D","E": 2
# $ :'data.frame': 1 obs. of 1 variable:
# ..$ x: Factor w/ 3 levels "B","C","E": 2
str(lapply(vectors.list, function(x) droplevels(x[2, , drop = FALSE])))
# List of 3
# $ :'data.frame': 1 obs. of 1 variable:
# ..$ x: Factor w/ 1 level "D": 1
# $ :'data.frame': 1 obs. of 1 variable:
# ..$ x: Factor w/ 1 level "D": 1
# $ :'data.frame': 1 obs. of 1 variable:
# ..$ x: Factor w/ 1 level "C": 1
You may also want to explore as.character(unlist(x[2, ])).
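If a plain character vector is the goal, one possible variation is sketched below. The data frames are built inline as stand-ins for the files read from disk, and the name second_row is made up for the example.

```r
# stand-ins for the data frames read from the CSV files
vectors.list <- list(
  data.frame(x = c("B", "D", "E", NA)),
  data.frame(x = c("B", "D", "E", NA)),
  data.frame(x = c("B", "C", "E", NA))
)

# second row of each data frame, returned as one character vector
second_row <- vapply(vectors.list,
                     function(d) as.character(d[2, 1]),
                     character(1))
second_row
```

vapply is used here instead of lapply so that the result is checked to be exactly one string per data frame, whether "x" is a factor or a character column.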
If you store your vectors in a data frame, you can subset by row:
> df <- data.frame(vectors.list)
> row2 <- df[2,]
> row2
x x.1 x.2
2 D D C
Suppose I have an array such that:
temp<-array(0, dim=c(100,10,4))
I can merge matrices 1 & 2 from the array into a single matrix using cbind:
temp.merge<-cbind(temp[,,1],temp[,,2])
Is there a way to merge all n matrices (in this case 4) into a single one without having to write out the position of each one as above?
R stores arrays in column-major order, so if your array is laid out as above you can just reset the dimensions and it will work.
dim(temp) <- c(100, 40)
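A small check of that claim, as a sketch: on an array with distinguishable values, resetting dim should give the same matrix as binding the slices side by side. The array size and the names by_cbind and reshaped are chosen only for illustration.

```r
# small array so the result is easy to inspect: 2 rows, 3 columns, 4 slices
temp <- array(seq_len(2 * 3 * 4), dim = c(2, 3, 4))

# reference result: bind the slices side by side
by_cbind <- do.call(cbind, lapply(seq_len(dim(temp)[3]), function(k) temp[, , k]))

# same data, only the dim attribute changes: rows stay, the rest is flattened
reshaped <- temp
dim(reshaped) <- c(dim(temp)[1], prod(dim(temp)[-1]))

all(by_cbind == reshaped)
```

Computing the new dimensions from dim(temp) avoids hard-coding the number of slices.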
If @Neal's answer works, definitely use it.
This also works:
# generates 100 X 40 array
do.call("cbind",lapply(1:4,function(x){return(temp[,,x])}))
You would think that:
do.call("cbind",list(temp[,,1:4])) ## generates 4000 X 1 array
would work, but it does not...
Also:
as.matrix(as.data.frame(temp))
Example:
> temp <- array(1:8, dim=c(2,2,2))
> temp
#, , 1
#
# [,1] [,2]
#[1,] 1 3
#[2,] 2 4
#
#, , 2
#
# [,1] [,2]
#[1,] 5 7
#[2,] 6 8
as.matrix(as.data.frame(temp))
# V1 V2 V3 V4
#[1,] 1 3 5 7
#[2,] 2 4 6 8