How to find consecutive numbers among multiple arrays?

Let me give an example right away.
Suppose I have 3 arrays a, b, c such as
a = c(3,5)
b = c(6,1,8,7)
c = c(4,2,9)
I must be able to extract the consecutive triplets among them, i.e.,
c(1,2,3), c(4,5,6)
But this was just an example; I will be working with a larger data set of more than 10 arrays, and hence must be able to find consecutive series of length ten.
So could anyone provide an algorithm to generally find a consecutive series of length 'n' among 'n' arrays?
I am actually doing this in R, so it's preferable if you give your code in R, but an algorithm in any language is more than welcome.

Reorganize the data first into a list of (value, array number) pairs.
Sort the list by value; you'd have something like:
1-2
2-3
3-1 (i.e. "there's a three in array 1")
4-3
5-1
6-2
7-2
8-2
9-3
Then loop over the list, check whether there are actually n consecutive numbers, then check whether these came from n different arrays.
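A minimal R sketch of this sort-and-scan idea (my own sketch, using the example vectors a, b and c from the question; variable names are mine):

```r
# pair each value with the number of the array it came from, then sort by value
vals <- c(a, b, c)
grp  <- rep(1:3, times = c(length(a), length(b), length(c)))
o    <- order(vals)
vals <- vals[o]
grp  <- grp[o]

# slide a window of length n over the sorted values
n <- 3
hits <- sapply(1:(length(vals) - n + 1), function(i) {
  win <- i:(i + n - 1)
  all(diff(vals[win]) == 1) &&      # the n values are consecutive
    length(unique(grp[win])) == n   # and come from n different arrays
})
lapply(which(hits), function(i) vals[i:(i + n - 1)])
# [[1]]
# [1] 1 2 3
#
# [[2]]
# [1] 4 5 6
```

Sorting once and scanning windows keeps this at O(m log m) in the total number of values m, instead of enumerating all combinations.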

Here's one approach. This assumes there are no breaks in the sequence of observations across the groups. Here is the data.
N <- 3
a <- c(3,5)
b <- c(6,1,8,7)
c <- c(4,2,9)
Then I combine them together and order by the observations:
dd <- lattice::make.groups(a,b,c)
dd <- dd[order(dd$data),]
Now I look for windows of N consecutive rows in this table where all three groups are represented:
idx <- apply(embed(as.numeric(dd$which), N), 1, function(x) {
  length(unique(x)) == N
})
Then we can see the triplets with
lapply(which(idx), function(i) {
  dd[i:(i+N-1),]
})
# [[1]]
# data which
# b2 1 b
# c2 2 c
# a1 3 a
#
# [[2]]
# data which
# c1 4 c
# a2 5 a
# b1 6 b

Here is a brute force method with expand.grid and the three vectors from the example.
# get all combinations
df <- expand.grid(a, b, c)
Using combn to calculate the difference for each pairwise combination of columns. (combn is applied to the column names, so the columns are looked up inside the function.)
# get all pairwise differences
myDiffs <- combn(names(df), 2, FUN=function(x) abs(df[[x[1]]] - df[[x[2]]]))
# subset data using `rowSums` and `which`
df[which(rowSums(myDiffs == 1) == ncol(myDiffs) - 1), ]
   Var1 Var2 Var3
2     5    6    4
11    3    1    2
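As a side note (my own generalization, not part of the answer above): for the full problem with n vectors, the same brute-force idea can be written with a list and do.call, keeping only combinations whose sorted values form a run of consecutive integers:

```r
# a hedged sketch for the general case of n vectors
vecs <- list(a = c(3,5), b = c(6,1,8,7), c = c(4,2,9))
df <- do.call(expand.grid, vecs)
# a combination qualifies if its sorted values increase by exactly 1
ok <- apply(df, 1, function(x) all(diff(sort(x)) == 1))
df[ok, ]
# keeps the rows (5, 6, 4) and (3, 1, 2)
```

Note that expand.grid enumerates every combination, so this grows multiplicatively with the number of vectors and is only practical for small inputs.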

I have hacked together a little recursive function that will find all the consecutive triplets among as many vectors as you pass it (you need to pass at least three). It is probably a little crude, but it seems to work.
The function uses the ellipsis, ..., for passing arguments. Hence it will take however many arguments (i.e. numeric vectors) you provide and put them in the list items. Then the smallest value in each passed vector is located, along with its index.
Then the indices of the vectors corresponding to the smallest triplet are created and iterated through using a for() loop, where the output values are appended to the output vector out. The input vectors in items are pruned and passed again into the function in a recursive fashion.
Only when all vectors are NA, i.e. there are no more values in the vectors, does the function return the final result.
library(magrittr)
# define function to find the triplets
tripl <- function(...){
  items <- list(...)
  # find the smallest number in each passed vector, along with its index
  # output is a matrix of n-by-2, where n is the number of passed arguments
  triplet.id <- lapply(items, function(x){
    if(is.na(x) %>% prod) id <- c(NA, NA)
    else id <- c(which(x == min(x)), x[which(x == min(x))])
  }) %>% unlist %>% matrix(., ncol=2, byrow=T)
  # find the smallest triplet from the passed vectors
  index <- order(triplet.id[,2])[1:3]
  # create empty vector for output
  out <- vector()
  # go through the smallest triplet's indices
  for(i in index){
    # .. append the corresponding item from the input vector to the out vector
    # .. and remove the value from the input vector
    if(length(items[[i]]) == 1) {
      out <- append(out, items[[i]])
      # .. if the input vector has no value left, fill with NA
      items[[i]] <- NA
    }
    else {
      out <- append(out, items[[i]][triplet.id[i,1]])
      items[[i]] <- items[[i]][-triplet.id[i,1]]
    }
  }
  # recurse until all vectors are empty (NA)
  if(!prod(unlist(is.na(items)))) out <- append(list(out),
                                                do.call("tripl", c(items), quote = F))
  else(out <- list(out))
  # return result
  return(out)
}
The function can be called by passing the input vectors as arguments.
# input vectors
a = c(3,5)
b = c(6,1,8,7)
c = c(4,2,9)
# find all the triplets using our function
y <- tripl(a,b,c)
The result is a list, which contains all the necessary information, albeit unordered.
print(y)
# [[1]]
# [1] 1 2 3
#
# [[2]]
# [1] 4 5 6
#
# [[3]]
# [1] 7 9 NA
#
# [[4]]
# [1] 8 NA NA
Ordering everything can be done using sapply():
# put everything in order
sapply(y, function(x){x[order(x)]}) %>% t
# [,1] [,2] [,3]
# [1,] 1 2 3
# [2,] 4 5 6
# [3,] 7 9 NA
# [4,] 8 NA NA
Note, however, that the function uses only one value per vector to find triplets.
It will therefore not find the consecutive triplet c(6,7,8) among e.g. c(6,7,11), c(8,9,13) and c(10,12,14).
In that instance it would return c(6,8,10) (see below).
a<-c(6,7,11)
b<-c(8,9,13)
c<-c(10,12,14)
y <- tripl(a,b,c)
sapply(y, function(x){x[order(x)]}) %>% t
# [,1] [,2] [,3]
# [1,] 6 8 10
# [2,] 7 9 12
# [3,] 11 13 14

Related

How do I calculate the correlation coefficient of matrices inside an array?

I have a relatively large array (242x240x2922). The first two dimensions are latitude and longitude, and the third dimension is time (daily satellite images).
I need to extract the correlation coefficient for a subset of that array that corresponds to data within a 6° radius of each of the (lon, lat) pairs. First, I created a loop that calculates the circle polygon for each of the lon, lat pairs. Then, I checked which points are inside the circle (with the point.in.polygon function), and extracted the subset of the larger array.
I know I could build a second, nested loop that calculates the correlation of the time series at each lon, lat with the rest of the time series of the "sub-array" (those that fall within the circle), but it would take too long... Is there any straightforward way to calculate the correlation coefficient of a vector of size "L" with each of the vectors of an array that has NxMxL dimensions? For example, in a loop, the first round would calculate cor(myvector, myarray[,,1]).
I tried using apply(myarray, dim=3, cor), but I'm struggling to understand the results.
Thanks a lot in advance.
#define dimensions
M = 3; N = 4; L = 5
myarray <- array(data = rnorm(M*N*L), dim=c(M,N,L))
myvector <- myarray[1,1, ]
# Use apply function to cycle through all the vectors in 3rd dimension:
result <- apply(myarray, c(1,2), FUN=function(x)cor(myvector,x))
result
# [,1] [,2] [,3] [,4]
#[1,] 1.00000000 0.73804476 0.7356366 -0.1583484
#[2,] 0.03820936 -0.07797187 0.3798744 -0.4925700
#[3,] -0.52827708 -0.09036006 0.1895361 -0.2860481
# For testing compare with the loop result (which will be much slower for larger arrays):
for (i in 1:dim(myarray)[1])
  for (j in 1:dim(myarray)[2])
    print( cor(myvector, myarray[i,j,]) )
# [1] 1
# [1] 0.7380448
# [1] 0.7356366
# [1] -0.1583484
# [1] 0.03820936
# [1] -0.07797187
# [1] 0.3798744
# [1] -0.49257
# [1] -0.5282771
# [1] -0.09036006
# [1] 0.1895361
# [1] -0.2860481

Arranging a 3 dimensional contingency table in R in order to run a Cochran-Mantel-Haenszel analysis?

I am attempting to run a Mantel-Haenszel analysis in R to determine whether or not a comparison of proportions test is still significant when accounting for a 'diagnosis' ratio within groups. This test is available in the stats package.
library(stats)
mantelhaen.test(x)
Having done some reading, I've found that this test can perform an odds ratio test on a contingency table that is n x n x k, as opposed to simply n x n. However, I am having trouble arranging my data in the proper way, as I am fairly new to R. I have created some example data...
ex.label <- c("A","A","A","A","A","A","A","B","B","B")
ex.status <- c("+","+","-","+","-","-","-","+","+","-")
ex.diag <- c("X","X","Z","Y","Y","Y","X","Y","Z","Z")
ex.data <- data.frame(ex.label,ex.diag,ex.status)
Which looks like this...
ex.label ex.diag ex.status
1 A X +
2 A X +
3 A Z -
4 A Y +
5 A Y -
6 A Y -
7 A X -
8 B Y +
9 B Z +
10 B Z -
I was originally able to use a simple N-1 chi-squared test to compare the proportions of + to - for A and B only, which is basically comparing the ratio in each column. I was able to do this, but I now want to be able to account for ex.diag as well.
I tried to use the ftable() function to arrange my data in a way that would work.
ex.ftable <- ftable(ex.data)
Which looks like this...
ex.status - +
ex.label ex.diag
A X 1 2
Y 2 1
Z 1 0
B X 0 0
Y 0 1
Z 1 1
However, when I run mantelhaen.test(ex.ftable), I get the error 'x' must be a 3-dimensional array. How can I arrange my data in such a way that I can actually run this test?
In mantelhaen.test, the last dimension of the 3-dimensional contingency table x needs to be the stratification variable (ex.diag). This array can be generated as follows:
ex.label <- c("A","A","A","A","A","A","A","B","B","B")
ex.status <- c("+","+","-","+","-","-","-","+","+","-")
ex.diag <- c("X","X","Z","Y","Y","Y","X","Y","Z","Z")
# Now ex.diag is in the first column
ex.data <- data.frame(ex.diag, ex.label, ex.status)
# The flat table
( ex.ftable <- ftable(ex.data) )
# ex.status - +
# ex.diag ex.label
# X A 1 2
# B 0 0
# Y A 2 1
# B 0 1
# Z A 1 0
# B 1 1
The 3D array can be generated using aperm.
# Transform the ftable into a 2 x 2 x 3 array
# First dimension: ex.label
# Second dimension: ex.status
# Third dimension: ex.diag
( mtx3D <- aperm(array(t(as.matrix(ex.ftable)),c(2,2,3)),c(2,1,3)) )
# , , 1
#
# [,1] [,2]
# [1,] 1 2
# [2,] 0 0
#
# , , 2
#
# [,1] [,2]
# [1,] 2 1
# [2,] 0 1
#
# , , 3
#
# [,1] [,2]
# [1,] 1 0
# [2,] 1 1
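As an aside (my addition, not part of the original answer): the same 2 x 2 x 3 array, complete with dimnames, can also be built directly from the raw vectors with table(), skipping the ftable/aperm step:

```r
# build the 3-D contingency table directly; the stratification variable goes last
mtx3D2 <- table(ex.label, ex.status, ex.diag)
mantelhaen.test(mtx3D2, exact = FALSE)
```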
Now the Cochran-Mantel-Haenszel chi-squared test can be performed.
# Cochran-Mantel-Haenszel chi-squared test of the null that
# two nominal variables are conditionally independent in each stratum
#
mantelhaen.test(mtx3D, exact=FALSE)
The result of the test is:
Mantel-Haenszel chi-squared test with continuity correction
data: mtx3D
Mantel-Haenszel X-squared = 0.23529, df = 1, p-value = 0.6276
alternative hypothesis: true common odds ratio is not equal to 1
95 percent confidence interval:
NaN NaN
sample estimates:
common odds ratio
Inf
Given the low number of cases, it is preferable to compute an exact conditional test (option exact=TRUE).
mantelhaen.test(mtx3D, exact=T)
# Exact conditional test of independence in 2 x 2 x k tables
#
# data: mtx3D
# S = 4, p-value = 0.5
# alternative hypothesis: true common odds ratio is not equal to 1
# 95 percent confidence interval:
# 0.1340796 Inf
# sample estimates:
# common odds ratio
# Inf

Storing a vector in a matrix in R with unknown vector length

Hi there, I was wondering if there is a way to store a vector into an array or matrix.
For example,
array1 <- array(dim=c(1,2))
vector1 <- as.vector(1:5)
vector2 <- as.vector(6:10)
array1[1,1] <- vector1
array1[1,2] <- vector2
so that when I call
array1[1,1]
I will receive
[1] 1 2 3 4 5
I've tried doing what I did above, and I get the error
number of items to replace is not a multiple of replacement length
Is there a way to get around this?
Also, the problem I face is that I do not know the vector lengths, and the vectors could have different lengths as well,
i.e. vector 1 can have a length of 6 and vector 2 a length of 7.
Thanks!
Try with a list:
my_list <- list()
my_list[[1]] <- c(1:5)
my_list[[2]] <- c(6:11)
A list allows you to store vectors of varying length. The vectors can be retrieved by addressing the element of the list:
> my_list[[1]]
#[1] 1 2 3 4 5
You can use a matrix of lists:
m <- matrix(list(), 2, 2)
m[1,1][[1]] <- 1:2
m[1,2][[1]] <- 1:3
m[2,1][[1]] <- 1:4
m[2,2][[1]] <- 1:5
m
# [,1] [,2]
#[1,] Integer,2 Integer,3
#[2,] Integer,4 Integer,5
m[1, 2]
#[[1]]
#[1] 1 2 3
m[1, 2][[1]]
#[1] 1 2 3

Sorting an array of lists and counting duplicate elements in R

I have a vector of lists (effectively a 2D array). The lists contain certain IDs, and the number of IDs varies from list to list. I want to sort the vector based on the lists (first ID -> second ID -> ... and so on). I also want to find the number of duplicates occurring in the vector. (Duplicates would be the same IDs in separate lists, in any permutation.)
For example:
vec = c( list(c(1,2)),list(c(1,2,3)),list(c(1,2)),list(c(2,3)),list(c(1,3,2)) )
vec
[[1]]
[1] 1 2
[[2]]
[1] 1 2 3
[[3]]
[1] 1 2
[[4]]
[1] 2 3
[[5]]
[1] 1 3 2
I want the output to sort the lists and provide the number of duplicates. Hence, the output must be in the order:
[[1]] -> [[2]] -> [[4]] with frequencies (2,2,1).
We can try
l1 <- lapply(vec, sort)                        # sort the IDs within each list element
l2 <- l1[!duplicated(l1)]                      # keep only the unique (sorted) lists
l3 <- lapply(l2, `length<-`, min(lengths(l2))) # truncate to a common length for ordering
i4 <- order(as.numeric(sapply(l3, paste, collapse='')))  # order by the concatenated IDs
l2[i4]
To get the frequencies
table(sapply(l1, paste, collapse=''))[i4]
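For the example vec above, the two snippets give the ordered unique lists and their frequencies (note that the paste-based ordering implicitly assumes single-digit IDs):

```r
l2[i4]
# [[1]]
# [1] 1 2
#
# [[2]]
# [1] 1 2 3
#
# [[3]]
# [1] 2 3

table(sapply(l1, paste, collapse=''))[i4]
#  12 123  23
#   2   2   1
```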

incorrect number of dimensions and incorrect number of subscripts in array

I am new to using R and thus my question might be a simple one, but nonetheless I have spent a lot of time trying to figure out what I am doing wrong and to no avail. I have discovered a lot of help on this site in the past week searching through other questions/answers (thank you!) but as someone new, it is often difficult to interpret other people's code.
I am trying to build a 3-dimensional array of multiple data files, each one with the same dimensions 57x57.
# read in 100 files
Files = lapply(Sys.glob('File*.txt'), read.table, sep='\t', as.is=TRUE)
# convert to dataframes
Files = lapply(Files[1:100], as.data.frame)
# check dimensions of first file (it's the same for all)
dim(Files[[1]])
[1] 57 57
# build empty array
Array = array(dim=c(57,57,100))
# read in the first data frame
Array[,,1] = Files[1]
# read in the second data frame
Array[,,2] = Files[2]
Error in Array[, , 2] = Files[2] : incorrect number of subscripts
# if I check...
Array[,,1] = Files[1]
Error in Array[, , 1] : incorrect number of dimensions
# The same thing happens when I do it in a loop:
x = 0
for(i in 1:100){
  Array[,,x+1] = Files[[i]]
  x = x + 1
}
Error in Array[, , 1] = Files[[1]] :
incorrect number of subscripts
You need to convert your data frames into matrices before you do the assignment:
l <- list(data.frame(x=1:2, y=3:4), data.frame(x=5:6, y=7:8))
arr <- array(dim=c(2, 2, 2))
arr[,,1] <- as.matrix(l[[1]])
arr[,,2] <- as.matrix(l[[2]])
arr
# , , 1
#
# [,1] [,2]
# [1,] 1 3
# [2,] 2 4
#
# , , 2
#
# [,1] [,2]
# [1,] 5 7
# [2,] 6 8
You can actually build the array in one line with the unlist function applied to a list of the matrices you want to combine:
arr2 <- array(unlist(lapply(l, as.matrix)), dim=c(dim(l[[1]]), length(l)))
all.equal(arr, arr2)
# [1] TRUE
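A related shortcut (my addition, not part of the answer): base R's simplify2array performs the same stacking; it keeps the dimnames produced by as.matrix, so the values match arr even though the attributes differ:

```r
arr3 <- simplify2array(lapply(l, as.matrix))
all.equal(arr, arr3, check.attributes = FALSE)
# [1] TRUE
```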
