Permutations of 3 elements within 6 positions: one equal neighbour - arrays

Following up on the answers to the post "Permutations of 3 elements within 6 positions", I think it's worth opening a new discussion about how to order the elements.
The first condition was that sequences always alternate, i.e. no two neighbouring elements are equal:
# Var1 Var2 Var3 Var4 Var5 Var6 V7
# 1 b c a b c a bcabca
# 2 c a b c a b cabcab
# 3 a b c a b c abcabc
# 4 b a b c a b babcab
# 5 c b c a b c cbcabc
# 6 a c a b c a acabca
However, the remaining permutations could also be useful if exactly one violation of the like-neighbour restriction is allowed, i.e. one pair of equal neighbours. For instance:
# Var1 Var2 Var3 Var4 Var5 Var6 Coincidence
# 1 b b a b c a -->[bb]
# 2 c c b c a b -->[cc]
# 3 a b c a a c -->[aa]
# 4 b a c c a b -->[cc]
Is it possible to use expand.grid for that too?

If it's "only one more", then I suggest the simplest way to allow it is to force it.
Using the start from the previous question:
r <- replicate(6, seq_len(length(abc)-1), simplify=FALSE)
r[[1]] <- c(r[[1]], length(abc))
We now make copies of this single list (the one that is passed to expand.grid) and, in each copy, replace one of the 2nd-through-last elements with 0. Recall that we are using these numbers with cumsum to change from the previous value, so replacing 1:2 with 0 means that we are forcing that element to be the same as the one before it.
rs <- lapply(seq_len(length(r)-1) + 1, function(i) { r[[i]] <- 0; r; })
# ^^^^^^^^^^^^^^^^^^^^^^^^ or: seq_len(length(r))[-1]
str(rs[1:2])
# List of 2
# $ :List of 6
# ..$ : int [1:3] 1 2 3
# ..$ : num 0 <--- the second letter will repeat
# ..$ : int [1:2] 1 2
# ..$ : int [1:2] 1 2
# ..$ : int [1:2] 1 2
# ..$ : int [1:2] 1 2
# $ :List of 6
# ..$ : int [1:3] 1 2 3
# ..$ : int [1:2] 1 2
# ..$ : num 0 <--- the third letter will repeat
# ..$ : int [1:2] 1 2
# ..$ : int [1:2] 1 2
# ..$ : int [1:2] 1 2
### other rs's are similar
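To see why a 0 forces a repeat, it can help to trace a single row of increments by hand. This is a minimal sketch; the `steps` vector is one hypothetical row of the expanded grid, chosen for illustration rather than taken from the answer's output.

```r
abc <- c("a", "b", "c")

# One hypothetical row of increments: the 0 in position 2 means "no change"
steps <- c(1, 0, 2, 1, 1, 2)

# Same transformation as in the answer: running sum, wrap around, shift to 1-based
idx <- cumsum(steps) %% length(abc) + 1
word <- paste(abc[idx], collapse = "")
word
# [1] "bbabcb"
```

The only equal neighbours are the first two letters, which is exactly where the 0 sits.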
We can verify that this works as we think it should:
# rs[[1]] repeats the first 2
m <- t(apply(do.call(expand.grid, rs[[1]]), 1, cumsum) %% length(abc) + 1)
m[] <- abc[m]
head(as.data.frame(cbind(m, apply(m, 1, paste, collapse = ""))), n=3)
# Var1 Var2 Var3 Var4 Var5 Var6 V7
# 1 b b c a b c bbcabc
# 2 c c a b c a ccabca
# 3 a a b c a b aabcab
# rs[[3]] repeats the 3rd-4th
m <- t(apply(do.call(expand.grid, rs[[3]]), 1, cumsum) %% length(abc) + 1)
m[] <- abc[m]
head(as.data.frame(cbind(m, apply(m, 1, paste, collapse = ""))), n=3)
# Var1 Var2 Var3 Var4 Var5 Var6 V7
# 1 b c a a b c bcaabc
# 2 c a b b c a cabbca
# 3 a b c c a b abccab
From here, let's automate it by putting all of these into one list and lapplying them.
rs <- c(list(r), rs)
rets <- do.call(rbind.data.frame, c(stringsAsFactors=FALSE, lapply(rs, function(r) {
m <- t(apply(do.call(expand.grid, r), 1, cumsum) %% length(abc) + 1)
m[] <- abc[m]
as.data.frame(cbind(m, apply(m, 1, paste, collapse = "")), stringsAsFactors=FALSE)
})))
head(rets)
# Var1 Var2 Var3 Var4 Var5 Var6 V7
# 1 b c a b c a bcabca
# 2 c a b c a b cabcab
# 3 a b c a b c abcabc
# 4 b a b c a b babcab
# 5 c b c a b c cbcabc
# 6 a c a b c a acabca
tail(rets)
# Var1 Var2 Var3 Var4 Var5 Var6 V7
# 331 b c b a c c bcbacc
# 332 c a c b a a cacbaa
# 333 a b a c b b abacbb
# 334 b a c b a a bacbaa
# 335 c b a c b b cbacbb
# 336 a c b a c c acbacc
Walkthrough of additional steps:
rs <- c(list(r), rs) makes the first (non-repeating r) an enclosed list, then prepends it to the rs list.
lapply(rs, function(r) ...) does the ... from the previous question once for each element in the rs list. I named it r inside the anon-function to make it perfectly clear (inside the function) that each time it gets a new r, it does exactly the same steps as the last question.
do.call(rbind.data.frame, c(stringsAsFactors=FALSE, ... because each return from the lapply will be a data.frame, and we want to combine them into a single frame. I prefer no factors, but you can choose otherwise if you need. (Instead of rbind.data.frame, you could use data.table::rbindlist or dplyr::bind_rows, both without stringsAsFactors.)
Now the first 96 rows have no repeats, then the remaining five batches of 48 rows each (total 336 rows) have one repeat each. We "know" that 48 is the right number for each of the repeat-once lists, since by changing one of the positions from "1 2" possible to "0" (from 2 to 1 possible value) we halve the total number of possible combinations (96 / 2 == 48).
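The counting argument can be checked directly. The sketch below rebuilds the same lists, computes each batch size as the product of the number of choices per position, and counts equal neighbours in every generated row; the variable names `batch_sizes` and `n_repeats` are introduced here for illustration only.

```r
abc <- c("a", "b", "c")

# Rebuild the lists of increments exactly as in the answer
r <- replicate(6, seq_len(length(abc) - 1), simplify = FALSE)
r[[1]] <- c(r[[1]], length(abc))
rs <- lapply(seq_len(length(r))[-1], function(i) { r[[i]] <- 0; r })
rs <- c(list(r), rs)

# Rows produced by expand.grid = product of the number of choices per position
batch_sizes <- sapply(rs, function(x) prod(lengths(x)))
batch_sizes        # 96 48 48 48 48 48
sum(batch_sizes)   # 336

# Count equal neighbours in every generated index sequence
n_repeats <- unlist(lapply(rs, function(x) {
  m <- t(apply(do.call(expand.grid, x), 1, cumsum) %% length(abc) + 1)
  apply(m, 1, function(row) sum(diff(row) == 0))
}))
table(n_repeats)   # 96 rows with 0 repeats, 240 rows with exactly 1
```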
If for some reason your next question asks how to expand this to allow two repeats ... then I wouldn't necessarily recommend brute-forcing this aspect of it: there are 6 or 10 possible combinations (depending on if "aaa" is allowed) of repeats, and I would much prefer to go to a more programmatic handling than this brute-force appending of the one-constraint.
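One possible way to make the two-repeat case more programmatic, offered only as a sketch and not as the answer's method, is to let combn pick the positions whose increment is forced to 0. The names `pairs`, `no_triples`, `rs2` and `two_repeats` are hypothetical.

```r
abc <- c("a", "b", "c")
r <- replicate(6, seq_len(length(abc) - 1), simplify = FALSE)
r[[1]] <- c(r[[1]], length(abc))

# Every pair of positions (2..6) whose increment could be forced to 0
pairs <- combn(seq_along(r)[-1], 2, simplify = FALSE)
length(pairs)        # 10 pairs when runs like "aaa" are allowed

# Dropping adjacent pairs forbids three equal letters in a row
no_triples <- Filter(function(p) diff(p) > 1, pairs)
length(no_triples)   # 6 pairs

# Build one increment list per pair, then expand each one as before
rs2 <- lapply(no_triples, function(p) { r[p] <- list(0); r })
two_repeats <- do.call(rbind, lapply(rs2, function(x) {
  m <- t(apply(do.call(expand.grid, x), 1, cumsum) %% length(abc) + 1)
  m[] <- abc[m]
  m
}))
dim(two_repeats)     # 144 rows (6 batches of 3 * 2^3 = 24), 6 columns
```

Swapping `no_triples` for `pairs` in the lapply call would include the "aaa"-style runs as well.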

Related

Permutations of 3 elements within 6 positions

I'm looking to permute (or combine) c("a","b","c") within six positions, under the condition that the sequences always alternate (no two equal neighbours), e.g. abcbab.
All permutations can easily be obtained with:
library(gtools)
abc<-c("a","b","c")
permutations(n=3,r=6,v=abc,repeats.allowed=TRUE)
I don't think it's possible to apply that condition with gtools alone, and I've been trying to design a function for it, even though I suspect one may already exist.
Since you're looking for permutations, expand.grid can work as well as permutations. But since you don't want like-neighbors, we can shorten the dimensionality of it considerably. I think this is legitimate random-wise!
Up front:
r <- replicate(6, seq_len(length(abc)-1), simplify=FALSE)
r[[1]] <- c(r[[1]], length(abc))
m <- t(apply(do.call(expand.grid, r), 1, cumsum) %% length(abc) + 1)
m[] <- abc[m]
dim(m)
# [1] 96 6
head(as.data.frame(cbind(m, apply(m, 1, paste, collapse = ""))))
# Var1 Var2 Var3 Var4 Var5 Var6 V7
# 1 b c a b c a bcabca
# 2 c a b c a b cabcab
# 3 a b c a b c abcabc
# 4 b a b c a b babcab
# 5 c b c a b c cbcabc
# 6 a c a b c a acabca
Walk-through:
since you want all recycled permutations of it, we can use gtools::permutations, or we can use expand.grid ... I'll use the latter, I don't know if it's much faster, but it does a short-cut I need (more later)
when dealing with constraints like this, I like to expand on the indices of the vector of values
however, since we don't want neighbors to be the same, I thought that instead of each row of values being the straight index, we cumsum them; by using this, we can control the ability of the cumulative sum to re-reach the same value ... by removing 0 and length(abc) from the list of possible steps, we remove the possibility of (a) staying the same, and (b) advancing by exactly one full vector-length (which, after the modulus, also repeats the same value); as a walk-through:
head(expand.grid(1:3, 1:2, 1:2, 1:2, 1:2, 1:2), n = 6)
# Var1 Var2 Var3 Var4 Var5 Var6
# 1 1 1 1 1 1 1
# 2 2 1 1 1 1 1
# 3 3 1 1 1 1 1
# 4 1 2 1 1 1 1
# 5 2 2 1 1 1 1
# 6 3 2 1 1 1 1
Since the first value can be all three values, it's 1:3, but each additional is intended to be 1 or 2 away from it.
head(t(apply(expand.grid(1:3, 1:2, 1:2, 1:2, 1:2, 1:2), 1, cumsum)), n = 6)
# Var1 Var2 Var3 Var4 Var5 Var6
# [1,] 1 2 3 4 5 6
# [2,] 2 3 4 5 6 7
# [3,] 3 4 5 6 7 8
# [4,] 1 3 4 5 6 7
# [5,] 2 4 5 6 7 8
# [6,] 3 5 6 7 8 9
okay, that doesn't seem that useful (since it goes beyond the length of the vector), so we can invoke the modulus operator and a shift (since modulus returns 0-based, we want 1-based):
head(t(apply(expand.grid(1:3, 1:2, 1:2, 1:2, 1:2, 1:2), 1, cumsum) %% 3 + 1), n = 6)
# Var1 Var2 Var3 Var4 Var5 Var6
# [1,] 2 3 1 2 3 1
# [2,] 3 1 2 3 1 2
# [3,] 1 2 3 1 2 3
# [4,] 2 1 2 3 1 2
# [5,] 3 2 3 1 2 3
# [6,] 1 3 1 2 3 1
To verify this works, we can do a diff across each row and look for 0:
m <- t(apply(expand.grid(1:3, 1:2, 1:2, 1:2, 1:2, 1:2), 1, cumsum) %% 3 + 1)
any(apply(m, 1, diff) == 0)
# [1] FALSE
to automate this to an arbitrary vector, we enlist the help of replicate to generate the list of possible vectors:
r <- replicate(6, seq_len(length(abc)-1), simplify=FALSE)
r[[1]] <- c(r[[1]], length(abc))
str(r)
# List of 6
# $ : int [1:3] 1 2 3
# $ : int [1:2] 1 2
# $ : int [1:2] 1 2
# $ : int [1:2] 1 2
# $ : int [1:2] 1 2
# $ : int [1:2] 1 2
and then do.call to expand it.
once you have the matrix of indices,
head(m)
# Var1 Var2 Var3 Var4 Var5 Var6
# [1,] 2 3 1 2 3 1
# [2,] 3 1 2 3 1 2
# [3,] 1 2 3 1 2 3
# [4,] 2 1 2 3 1 2
# [5,] 3 2 3 1 2 3
# [6,] 1 3 1 2 3 1
and then replace each index with the vector's value:
m[] <- abc[m]
head(m)
# Var1 Var2 Var3 Var4 Var5 Var6
# [1,] "b" "c" "a" "b" "c" "a"
# [2,] "c" "a" "b" "c" "a" "b"
# [3,] "a" "b" "c" "a" "b" "c"
# [4,] "b" "a" "b" "c" "a" "b"
# [5,] "c" "b" "c" "a" "b" "c"
# [6,] "a" "c" "a" "b" "c" "a"
and then we cbind the united string (via apply and paste)
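The steps of the walk-through can be collected into one function. This is a sketch under stated assumptions: the function name `no_repeat_sequences` and its arguments are hypothetical, and it assumes at least two values and at least two positions.

```r
# Hypothetical helper (name and arguments are illustrative, not from the answer)
no_repeat_sequences <- function(values, n) {
  k <- length(values)
  stopifnot(k >= 2, n >= 2)
  r <- replicate(n, seq_len(k - 1), simplify = FALSE)  # steps 1..(k-1): never "stay"
  r[[1]] <- seq_len(k)                                  # first position: any value
  # cumulative sum of steps, wrapped with the modulus and shifted to 1-based indices
  m <- t(apply(do.call(expand.grid, r), 1, cumsum) %% k + 1)
  m[] <- values[m]                                      # replace indices with values
  out <- as.data.frame(m, stringsAsFactors = FALSE)
  out$united <- apply(m, 1, paste, collapse = "")       # the united string
  out
}

res <- no_repeat_sequences(c("a", "b", "c"), 6)
nrow(res)                                       # 3 * 2^5 = 96
any(grepl("(.)\\1", res$united, perl = TRUE))   # FALSE: no equal neighbours
```

In general the row count is k * (k - 1)^(n - 1), since the first position has k choices and every later position has k - 1.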
Performance:
library(microbenchmark)
library(dplyr)
library(tidyr)
library(stringr)
microbenchmark(
tidy1 = {
gtools::permutations(n = 3, r = 6, v = abc, repeats.allowed = TRUE) %>%
data.frame() %>%
unite(united, sep = "", remove = FALSE) %>%
filter(!str_detect(united, "([a-c])\\1"))
},
tidy2 = {
filter(unite(data.frame(gtools::permutations(n = 3, r = 6, v = abc, repeats.allowed = TRUE)),
united, sep = "", remove = FALSE),
!str_detect(united, "([a-c])\\1"))
},
base = {
r <- replicate(6, seq_len(length(abc)-1), simplify=FALSE)
r[[1]] <- c(r[[1]], length(abc))
m <- t(apply(do.call(expand.grid, r), 1, cumsum) %% length(abc) + 1)
m[] <- abc[m]
},
times=10000
)
# Unit: microseconds
# expr min lq mean median uq max neval
# tidy1 1875.400 2028.8510 2446.751 2165.651 2456.051 12790.901 10000
# tidy2 1745.402 1875.5015 2284.700 2000.051 2278.101 50163.901 10000
# base 796.701 871.4015 1020.993 919.801 1021.801 7373.901 10000
I tried the nested-call (non-%>%) tidy2 version just for kicks, and though I was confident it would theoretically be faster, I didn't realize it would shave over 7% off the median run-time. (The 50163 max is likely R garbage-collecting, not "real".) That is the price we pay for the pipe's readability/maintainability.
There are probably cleaner methods, but here ya go:
abc <- letters[1:3]
library(tidyverse)
res <- gtools::permutations(n = 3, r = 6, v = abc, repeats.allowed = TRUE) %>%
data.frame() %>%
unite(united, sep = "", remove = FALSE) %>%
filter(!str_detect(united, "([a-c])\\1"))
head(res)
united X1 X2 X3 X4 X5 X6
1 ababab a b a b a b
2 ababac a b a b a c
3 ababca a b a b c a
4 ababcb a b a b c b
5 abacab a b a c a b
6 abacac a b a c a c
If you want a vector, you can use res$united or add %>% pull(united) as an additional step at the end of the pipes above.
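The filter relies on the backreference pattern "([a-c])\\1", which matches any of a, b or c immediately followed by the same letter. As a small illustration, the sketch below uses base R's grepl with perl = TRUE as a stand-in for stringr's str_detect; the `words` vector is made up for the example.

```r
words <- c("abcabc", "abbcab", "acacac", "ccabab")

# "([a-c])\\1": capture one letter, then require the same letter again
has_repeat <- grepl("([a-c])\\1", words, perl = TRUE)
has_repeat
# [1] FALSE  TRUE FALSE  TRUE

# Keep only the words without equal neighbours
words[!has_repeat]
# [1] "abcabc" "acacac"
```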

Melt a array and make numeric values character

I have an array and I want to melt it based on the dimnames. The problem is that the dimension names are large numeric values, so converting them to character afterwards produces the wrong ID. See the example:
test <- array(1:18, dim = c(3,3,2), dimnames = list(c(00901291282245454545454,329293929929292,2929992929922929),
c("a", "b", "c"),
c("d", "e")))
library(reshape2)
library(data.table)
test2 <- data.table(melt(test))
test2[, Var1 := as.character(Var1)]
> test2
Var1 Var2 Var3 value
1: 9.01291282245455e+20 a d 1
2: 329293929929292 a d 2
3: 2929992929922929 a d 3
4: 9.01291282245455e+20 b d 4
5: 329293929929292 b d 5
6: 2929992929922929 b d 6
7: 9.01291282245455e+20 c d 7
8: 329293929929292 c d 8
9: 2929992929922929 c d 9
10: 9.01291282245455e+20 a e 10
11: 329293929929292 a e 11
12: 2929992929922929 a e 12
13: 9.01291282245455e+20 b e 13
14: 329293929929292 b e 14
15: 2929992929922929 b e 15
16: 9.01291282245455e+20 c e 16
17: 329293929929292 c e 17
18: 2929992929922929 c e 18
How could I make the first column with the large IDs character? What I am currently doing is pasting a letter onto the dimnames, melting, converting to character and then taking a substring, which is really inefficient. The solution needs to be efficient because the dataset has millions of rows. There are two problems: first, leading 0's in an ID are dropped, and second, the ID is converted to an e+20 (scientific notation) string.
You need to define your dimnames as character and then slightly modify melt.array, which is what gets called when you melt your array:
test <- array(1:18, dim = c(3,3,2), dimnames = list(c("00901291282245454545454", "329293929929292", "2929992929922929"),
c("a", "b", "c"),
c("d", "e")))
Customise melt.array by adding a parameter that lets you decide whether or not you want the conversion:
melt.array2 <- function (data, varnames = names(dimnames(data)), conv=TRUE, ...)
{
values <- as.vector(data)
dn <- dimnames(data)
if (is.null(dn))
dn <- vector("list", length(dim(data)))
dn_missing <- sapply(dn, is.null)
dn[dn_missing] <- lapply(dim(data), function(x) 1:x)[dn_missing]
if(conv){ # conv is the new parameter to know if conversion needs to be done
char <- sapply(dn, is.character)
dn[char] <- lapply(dn[char], type.convert)
}
indices <- do.call(expand.grid, dn)
names(indices) <- varnames
data.frame(indices, value = values)
}
Try the new function on your example (with conv=FALSE):
head(melt.array2(test, conv=FALSE))
# X1 X2 X3 value
# 1 00901291282245454545454 a d 1
# 2 329293929929292 a d 2
# 3 2929992929922929 a d 3
# 4 00901291282245454545454 b d 4
# 5 329293929929292 b d 5
# 6 2929992929922929 b d 6
EDIT
In the development version of reshape2 (devtools::install_github("hadley/reshape")), melt.array is defined differently and you can use the as.is parameter to avoid the conversion:
melt(test, as.is=TRUE)
will give you the same result as above (with Var1 etc instead of X1 etc).
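If installing a development version is not an option, a base R route may also work. The sketch below is an alternative that the answer does not mention: it calls as.data.frame.table directly on the array and asks it not to convert strings to factors. No claim is made here about its speed on millions of rows.

```r
test <- array(1:18, dim = c(3, 3, 2),
              dimnames = list(c("00901291282245454545454", "329293929929292", "2929992929922929"),
                              c("a", "b", "c"),
                              c("d", "e")))

# as.data.frame.table() expands the dimnames into columns;
# stringsAsFactors = FALSE keeps them as untouched character strings
long <- as.data.frame.table(test, responseName = "value", stringsAsFactors = FALSE)
head(long, 3)
str(long$Var1)   # character, leading zeros preserved
```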

Replace corresponding parts of one array with another array in R

I have an array / named vector that looks like this:
d f g
1 2 3
I want to fill up the empty slots, meaning I want this:
a b c d e f g
0 0 0 1 0 2 3
Is there an elegant way of doing this, without having to write loops and conditionals? In my actual problem, instead of abcd as my array names, it's numbers. Not sure if that makes a difference. Figured alphabet is easier to understand for a reproducible example.
Create a vector of the final names, nms and then create a named vector of zeros from it using sapply and replace the elements corresponding to input names with the input values.
v <- c(d = 1, f = 2, g = 3) # input
nms <- letters[letters <= max(names(v))] # names on output vector, i.e. letters[1:7]
replace(sapply(nms, function(x) 0), names(v), v) ##
giving:
a b c d e f g
0 0 0 1 0 2 3
If in your actual vector the names are not letters then just set nms yourself. For example, nms <- c("dogs", "cats", "d", "elephants", "f", "g") would work with the same line marked ## above.
2) An alternative is to replace the line marked ## above with:
unlist(modifyList(as.list(setNames(numeric(length(nms)), nms)), as.list(v)))
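A further variation on the same idea, shown here only as a sketch, is to build the named vector of zeros first and then assign into it by name. The name `out` is introduced for illustration.

```r
v <- c(d = 1, f = 2, g = 3)    # input
nms <- letters[1:7]            # full set of names wanted in the output

out <- setNames(numeric(length(nms)), nms)  # all zeros, already named
out[names(v)] <- v                          # overwrite the slots present in the input
out
# a b c d e f g
# 0 0 0 1 0 2 3
```

Because names are always stored as character, the same assignment works when the names look like numbers (for example "10", "20"), as long as they are used as strings.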
Data
x <- c(d=1L,f=2L,g=3L);
x;
## d f g
## 1 2 3
Solution 1: First match new names into x and extract values, then replace NAs with zero.
x <- setNames(x[match(letters[1:7],names(x))],letters[1:7]);
x[is.na(x)] <- 0L;
x;
## a b c d e f g
## 0 0 0 1 0 2 3
Solution 2: One-liner, using nomatch argument of match().
setNames(c(x,0L)[match(letters[1:7],names(x),nomatch=length(x)+1L)],letters[1:7]);
## a b c d e f g
## 0 0 0 1 0 2 3

How can I access components of list elements in R

I have a ragged list that I would like to work with. i.e. I would like to use an apply function to quickly and simply pull out elements from the lists. The following code attempts to approximate my situation:
vec1 <- c("B","D","E","NA")
vec2 <- c("B","D","E","NA")
vec3 <- c("B","C","E","NA")
write.table(vec1, file="./vec1.csv", sep=",", quote=F)
write.table(vec2, file="./vec2.csv", sep=",", quote=F)
write.table(vec3, file="./vec3.csv", sep=",", quote=F)
vectors.files <- list.files(path=getwd(),recursive=F, pattern=paste("*.csv",sep=""))
vectors.list <- lapply(vectors.files, read.csv)
How would I then be able to create a new object that was for example the second row of each list element in vectors.list?
Thanks,
Matt
It's not really clear what you're after as the final output format, but you might want to try variations on the following template:
lapply(vectors.list, function(x) x[2, , drop = FALSE])
# [[1]]
# x
# 2 D
#
# [[2]]
# x
# 2 D
#
# [[3]]
# x
# 2 C
Here, we've just passed an anonymous function (function(x)) to the items in your "vectors.list". In this case, we've used basic subsetting using [ to extract the second row. The drop = FALSE is to retain the data.frame structure since the result is a single-column data.frame (which normally simplifies to a vector).
Note that the data.frames in the resulting list still have all the original levels for the "x" factor. Use droplevels if you want to retain only the specific factor in that row.
Compare:
str(lapply(vectors.list, function(x) x[2, , drop = FALSE]))
# List of 3
# $ :'data.frame': 1 obs. of 1 variable:
# ..$ x: Factor w/ 3 levels "B","D","E": 2
# $ :'data.frame': 1 obs. of 1 variable:
# ..$ x: Factor w/ 3 levels "B","D","E": 2
# $ :'data.frame': 1 obs. of 1 variable:
# ..$ x: Factor w/ 3 levels "B","C","E": 2
str(lapply(vectors.list, function(x) droplevels(x[2, , drop = FALSE])))
# List of 3
# $ :'data.frame': 1 obs. of 1 variable:
# ..$ x: Factor w/ 1 level "D": 1
# $ :'data.frame': 1 obs. of 1 variable:
# ..$ x: Factor w/ 1 level "D": 1
# $ :'data.frame': 1 obs. of 1 variable:
# ..$ x: Factor w/ 1 level "C": 1
You may also want to explore as.character(unlist(x[2, ])).
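As a sketch of that last suggestion, the second row of every element can be collapsed into a plain character vector with sapply. To keep the example self-contained, `vectors.list` is rebuilt in memory here instead of being read from the CSV files, which is an assumption of this sketch.

```r
# Stand-in for vectors.list, built in memory instead of via read.csv
vectors.list <- list(
  data.frame(x = factor(c("B", "D", "E", NA))),
  data.frame(x = factor(c("B", "D", "E", NA))),
  data.frame(x = factor(c("B", "C", "E", NA)))
)

# Convert the second row of each data.frame to character and simplify to a vector
second <- sapply(vectors.list, function(x) as.character(unlist(x[2, ])))
second
# [1] "D" "D" "C"
```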
If you store your vectors in a data frame you can subset.
> df <- data.frame(vectors.list)
> row2 <- df[2,]
> row2
x x.1 x.2
2 D D C

R: Combine a list to a data frame

I have an array of names and a function that returns a data frame. I want to combine this array with the data frames. For example:
mynames <- c("a", "b", "c")
df1 <- data.frame(val0=c("d", "e"),val1=4:5)
df2 <- data.frame(val1=c("e", "f"),val2=5:6)
df3 <- data.frame(val2=c("f", "g"),val3=6:7)
What I want is a data frame that joins this array with the data frames: df1 corresponds to "a", df2 corresponds to "b" and so on. So, the final data frame looks like this:
Names Var Val
a d 4
a e 5
b e 5
b f 6
c f 6
c g 7
Can someone help me on this?
Thanks.
This answers this particular question, but I'm not sure how much help it will be for your actual problem:
myList <- list(df1, df2, df3)
do.call(rbind,
lapply(seq_along(mynames), function(x)
cbind(Names = mynames[x], setNames(myList[[x]],
c("Var", "Val")))))
# Names Var Val
# 1 a d 4
# 2 a e 5
# 3 b e 5
# 4 b f 6
# 5 c f 6
# 6 c g 7
Here, we create a list of your data.frames, and in our lapply call, we add in the new "Names" column and rename the existing columns so that we can use rbind to put them all together.
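The same steps can also be written with Map, which walks the names and the data.frames in parallel rather than indexing both by position. This is a sketch of an equivalent formulation; `pieces` and `combined` are names introduced here.

```r
mynames <- c("a", "b", "c")
df1 <- data.frame(val0 = c("d", "e"), val1 = 4:5)
df2 <- data.frame(val1 = c("e", "f"), val2 = 5:6)
df3 <- data.frame(val2 = c("f", "g"), val3 = 6:7)
myList <- list(df1, df2, df3)

# Pair each name with its data.frame, give all pieces the same column names, then stack
pieces <- Map(function(nm, d) cbind(Names = nm, setNames(d, c("Var", "Val"))),
              mynames, myList)
combined <- do.call(rbind, pieces)
rownames(combined) <- NULL   # Map() names the pieces, which rbind would turn into row names
combined
```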

Resources