I have a vector of elements
p1 p2 p3 p4 ...
and I'm trying to build, from them, the following matrix
p1 p2 p3 1 1 1 1 1 1 1 1 1 ...
p1 p2 p3 p4 p5 p6 1 1 1 1 1 1 ...
p1 p2 p3 p4 p5 p6 p7 p8 p9 1 1 1 ...
p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 ...
and so on. I tried to use blockmatrix. For example:
p<-1:6
A<-head(p,3)
B<-c(1,1,1)
D<-tail(p,3)
blockmatrix(names=c("A","B","C","D"),A=A,C=B,B=A,D=D,dim=c(2,2))
The problem is extending this to larger block matrices. (This code works only for two blocks; for three blocks I would have to add more labels to the "names" argument, and so on for four blocks, five, ...)
This is another solution, written after reading the answers:
x<-2:10
mat<- t(replicate(3, x))
mat[col(mat)>3*row(mat)] <- 1
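The same masking idea can be wrapped up for any block width. This is only a minimal sketch: the helper name staircase and its arguments block and fill are made up for illustration, and it assumes length(p) is a multiple of block.

```r
# Hypothetical helper: row i keeps the first block * i elements of p,
# and every later position is overwritten with `fill`.
staircase <- function(p, block = 3, fill = 1) {
  stopifnot(length(p) %% block == 0)            # assumption: whole blocks only
  n_rows <- length(p) %/% block
  mat <- matrix(p, nrow = n_rows, ncol = length(p), byrow = TRUE)
  mat[col(mat) > block * row(mat)] <- fill      # mask everything right of the staircase
  mat
}

m <- staircase(1:12, block = 3)
```

Here m has 4 rows and 12 columns; the last row is the full vector and the first row keeps only its first three elements.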
This will give you a data frame as output, but you can convert it to a matrix.
library(dplyr)
library(tidyr)
# example vector
x = c(10,11,12,13,14,15)
expand.grid(id_row=1:length(x), x=x) %>% # combine vector values and a sequence of numbers (id = row positions)
group_by(id_row) %>% # for each row position
mutate(id_col = row_number()) %>% # create a vector of column positions (needed for reshaping later)
ungroup() %>% # forget the grouping
mutate(x = ifelse(id_col > id_row, 1, x), # replace values with 1 where necessary
id_col = paste0("Col_", id_col)) %>% # update names of this variable
spread(id_col, x) %>% # reshape data
select(-id_row) # remove unnecessary column
# # A tibble: 6 x 6
# Col_1 Col_2 Col_3 Col_4 Col_5 Col_6
# * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 10 1 1 1 1 1
# 2 10 11 1 1 1 1
# 3 10 11 12 1 1 1
# 4 10 11 12 13 1 1
# 5 10 11 12 13 14 1
# 6 10 11 12 13 14 15
Not all of the conditions are clear, but here is a base R option:
m1 <- t(replicate(length(x), x))
m1[upper.tri(m1)] <- 1
m1
# [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] 10 1 1 1 1 1
#[2,] 10 11 1 1 1 1
#[3,] 10 11 12 1 1 1
#[4,] 10 11 12 13 1 1
#[5,] 10 11 12 13 14 1
#[6,] 10 11 12 13 14 15
data
x <- c(10,11,12,13,14,15)
This should give you the desired output in base R:
a<- c("p1","p2","p3","p4","p5","p6","p7","p8","p9","p10","p11","p12")
lena <- length(a)
b<- matrix(data=rep(a,lena%/%3),nrow=lena%/%3,ncol = lena,byrow=TRUE)
for (i in (1:(nrow(b)-1)))
{
for (j in ((3*i+1):ncol(b)))
{
b[i,j] <- 1
}
}
I'm looking to permute (or combine) c("a","b","c") across six positions, with the condition that neighbouring elements always differ, e.g. abcbab.
All permutations with repetition can easily be obtained with:
abc<-c("a","b","c")
gtools::permutations(n=3,r=6,v=abc,repeats.allowed=TRUE)
I don't think the constraint can be imposed with gtools directly, so I've been trying to design a function for it, even though I suspect one already exists.
Since you're looking for permutations, expand.grid can work as well as permutations. But since you don't want like-neighbors, we can shorten the dimensionality of it considerably. I think this is legitimate random-wise!
Up front:
r <- replicate(6, seq_len(length(abc)-1), simplify=FALSE)
r[[1]] <- c(r[[1]], length(abc))
m <- t(apply(do.call(expand.grid, r), 1, cumsum) %% length(abc) + 1)
m[] <- abc[m]
dim(m)
# [1] 96 6
head(as.data.frame(cbind(m, apply(m, 1, paste, collapse = ""))))
# Var1 Var2 Var3 Var4 Var5 Var6 V7
# 1 b c a b c a bcabca
# 2 c a b c a b cabcab
# 3 a b c a b c abcabc
# 4 b a b c a b babcab
# 5 c b c a b c cbcabc
# 6 a c a b c a acabca
Walk-through:
Since you want all recycled permutations, we can use gtools::permutations or expand.grid. I'll use the latter; I don't know if it's much faster, but it allows a short-cut I need (more on that later).
When dealing with constraints like this, I like to expand on the indices of the vector of values.
However, since we don't want neighbors to be the same, instead of treating each row of values as the straight index, we cumsum them. This lets us control whether the cumulative sum can land on the same value again: by removing 0 and length(abc) from the list of possible increments, we rule out (a) staying the same and (b) increasing by exactly one vector length (which, after the modulus, repeats the same value). As a walk-through:
head(expand.grid(1:3, 1:2, 1:2, 1:2, 1:2, 1:2), n = 6)
# Var1 Var2 Var3 Var4 Var5 Var6
# 1 1 1 1 1 1 1
# 2 2 1 1 1 1 1
# 3 3 1 1 1 1 1
# 4 1 2 1 1 1 1
# 5 2 2 1 1 1 1
# 6 3 2 1 1 1 1
Since the first value can be all three values, it's 1:3, but each additional is intended to be 1 or 2 away from it.
head(t(apply(expand.grid(1:3, 1:2, 1:2, 1:2, 1:2, 1:2), 1, cumsum)), n = 6)
# Var1 Var2 Var3 Var4 Var5 Var6
# [1,] 1 2 3 4 5 6
# [2,] 2 3 4 5 6 7
# [3,] 3 4 5 6 7 8
# [4,] 1 3 4 5 6 7
# [5,] 2 4 5 6 7 8
# [6,] 3 5 6 7 8 9
okay, that doesn't seem that useful (since it goes beyond the length of the vector), so we can invoke the modulus operator and a shift (since modulus returns 0-based, we want 1-based):
head(t(apply(expand.grid(1:3, 1:2, 1:2, 1:2, 1:2, 1:2), 1, cumsum) %% 3 + 1), n = 6)
# Var1 Var2 Var3 Var4 Var5 Var6
# [1,] 2 3 1 2 3 1
# [2,] 3 1 2 3 1 2
# [3,] 1 2 3 1 2 3
# [4,] 2 1 2 3 1 2
# [5,] 3 2 3 1 2 3
# [6,] 1 3 1 2 3 1
To verify this works, we can do a diff across each row and look for 0:
m <- t(apply(expand.grid(1:3, 1:2, 1:2, 1:2, 1:2, 1:2), 1, cumsum) %% 3 + 1)
any(apply(m, 1, diff) == 0)
# [1] FALSE
to automate this to an arbitrary vector, we enlist the help of replicate to generate the list of possible vectors:
r <- replicate(6, seq_len(length(abc)-1), simplify=FALSE)
r[[1]] <- c(r[[1]], length(abc))
str(r)
# List of 6
# $ : int [1:3] 1 2 3
# $ : int [1:2] 1 2
# $ : int [1:2] 1 2
# $ : int [1:2] 1 2
# $ : int [1:2] 1 2
# $ : int [1:2] 1 2
and then do.call to expand it.
Once you have the matrix of indices,
head(m)
# Var1 Var2 Var3 Var4 Var5 Var6
# [1,] 2 3 1 2 3 1
# [2,] 3 1 2 3 1 2
# [3,] 1 2 3 1 2 3
# [4,] 2 1 2 3 1 2
# [5,] 3 2 3 1 2 3
# [6,] 1 3 1 2 3 1
and then replace each index with the vector's value:
m[] <- abc[m]
head(m)
# Var1 Var2 Var3 Var4 Var5 Var6
# [1,] "b" "c" "a" "b" "c" "a"
# [2,] "c" "a" "b" "c" "a" "b"
# [3,] "a" "b" "c" "a" "b" "c"
# [4,] "b" "a" "b" "c" "a" "b"
# [5,] "c" "b" "c" "a" "b" "c"
# [6,] "a" "c" "a" "b" "c" "a"
and then we cbind the united string (via apply and paste)
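Put together, the steps above can be sketched as a single function. The name alt_sequences is hypothetical, and the sketch assumes at least two distinct values in v and r >= 2. It also checks the result against a brute-force filter of all length(v)^r strings.

```r
# Hypothetical wrapper around the cumsum/modulus trick described above.
alt_sequences <- function(v, r) {
  n <- length(v)
  steps <- replicate(r, seq_len(n - 1), simplify = FALSE)  # increments 1..(n-1)
  steps[[1]] <- seq_len(n)                                 # first position: any value
  idx <- t(apply(do.call(expand.grid, steps), 1, cumsum) %% n + 1)
  # map indices back to values, then collapse each row into one string
  apply(matrix(v[idx], nrow = nrow(idx)), 1, paste, collapse = "")
}

abc <- c("a", "b", "c")
s <- alt_sequences(abc, 6)

# Brute force for comparison: every string of length 6, minus those with a doubled letter
all6 <- apply(expand.grid(rep(list(abc), 6), stringsAsFactors = FALSE),
              1, paste, collapse = "")
brute <- all6[!grepl("(.)\\1", all6, perl = TRUE)]
```

Both approaches should yield the same 3 * 2^5 = 96 strings, in a different order.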
Performance:
library(microbenchmark)
library(dplyr)
library(tidyr)
library(stringr)
microbenchmark(
tidy1 = {
gtools::permutations(n = 3, r = 6, v = abc, repeats.allowed = TRUE) %>%
data.frame() %>%
unite(united, sep = "", remove = FALSE) %>%
filter(!str_detect(united, "([a-c])\\1"))
},
tidy2 = {
filter(unite(data.frame(gtools::permutations(n = 3, r = 6, v = abc, repeats.allowed = TRUE)),
united, sep = "", remove = FALSE),
!str_detect(united, "([a-c])\\1"))
},
base = {
r <- replicate(6, seq_len(length(abc)-1), simplify=FALSE)
r[[1]] <- c(r[[1]], length(abc))
m <- t(apply(do.call(expand.grid, r), 1, cumsum) %% length(abc) + 1)
m[] <- abc[m]
},
times=10000
)
# Unit: microseconds
# expr min lq mean median uq max neval
# tidy1 1875.400 2028.8510 2446.751 2165.651 2456.051 12790.901 10000
# tidy2 1745.402 1875.5015 2284.700 2000.051 2278.101 50163.901 10000
# base 796.701 871.4015 1020.993 919.801 1021.801 7373.901 10000
I tried the nested (non-%>%) tidy2 version just for kicks, and though I was confident it would theoretically be faster, I didn't realize it would shave over 7% off the run-times. (The 50163 max is likely R garbage-collecting, not "real".) That is the price we pay for readability/maintainability.
There are probably cleaner methods, but here ya go:
abc <- letters[1:3]
library(tidyverse)
res <- gtools::permutations(n = 3, r = 6, v = abc, repeats.allowed = TRUE) %>%
data.frame() %>%
unite(united, sep = "", remove = FALSE) %>%
filter(!str_detect(united, "([a-c])\\1"))
head(res)
united X1 X2 X3 X4 X5 X6
1 ababab a b a b a b
2 ababac a b a b a c
3 ababca a b a b c a
4 ababcb a b a b c b
5 abacab a b a c a b
6 abacac a b a c a c
If you want a vector, you can use res$united or add %>% pull(united) as an additional step at the end of the pipes above.
I need to convert a 3 dimensional array into a data.frame.
Example:
#create fake data
#two vectors
vec1 = c(2,13,22,98,4,8,8,1,10)
vec2 = c(2,4,6,7,1,55,32,12,1)
#3 dim array
result = array(c(vec1,vec2),dim = c(3,3,2))
print(result)
, , 1
[,1] [,2] [,3]
[1,] 2 98 8
[2,] 13 4 1
[3,] 22 8 10
, , 2
[,1] [,2] [,3]
[1,] 2 7 32
[2,] 4 1 12
[3,] 6 55 1
How can I get the following 9 col data.frame, where letters are colnames (it could be also default values..) and each row represents the result[,,i] slice:
a b c d e f g h i
2 13 22 98 4 8 8 1 10
2 4 6 7 1 55 32 12 1
my real array dim = 140, 200, 20000
thanks
arrays and matrices are vectors with dimensions, so recast to a matrix, then data.frame:
data.frame(matrix(result, nrow=2, byrow=TRUE))
# generalisably:
data.frame(matrix(result, nrow=dim(result)[3], byrow=TRUE))
# X1 X2 X3 X4 X5 X6 X7 X8 X9
#1 2 13 22 98 4 8 8 1 10
#2 2 4 6 7 1 55 32 12 1
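The reason this works is that an array is stored column-major, so each result[,,i] slice is one contiguous run of values, and byrow=TRUE places each run on its own row. A sketch that also adds the letter column names from the question (it assumes at most 26 columns; the names flat and df are just for illustration):

```r
vec1 <- c(2, 13, 22, 98, 4, 8, 8, 1, 10)
vec2 <- c(2, 4, 6, 7, 1, 55, 32, 12, 1)
result <- array(c(vec1, vec2), dim = c(3, 3, 2))

d <- dim(result)
# one row per slice along the third dimension, one column per cell of a slice
flat <- matrix(result, nrow = d[3], ncol = d[1] * d[2], byrow = TRUE)
colnames(flat) <- letters[seq_len(ncol(flat))]   # assumption: <= 26 columns
df <- as.data.frame(flat)
```

For the real 140 x 200 x 20000 array, the values alone are about 4.5 GB as doubles and the result would have 28000 columns, so it may be worth keeping it as a matrix rather than converting to a data.frame.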
You could try mapply:
as.data.frame(mapply(function(x, y) c(x, y), result[,,1], result[,,2]))
# V1 V2 V3 V4 V5 V6 V7 V8 V9
#1 2 13 22 98 4 8 8 1 10
#2 2 4 6 7 1 55 32 12 1
For example, if we have a matrix or array like the one below,
how can we find the indices of the rows or columns that contain only numbers between 10 and 20?
M = array(c(1,1,12,34,0,19,15,1,0,17,12,0,21,1,11,1), dim=c(4,4))
And, also, I am not allowed to use for or while loops to do this.
Also, the matrix or array may have more than 2 dimensions, so a method that also works on multi-dimensional arrays would be better for me. Thanks.
Instead of finding the indices of the individual elements that qualify, I need to find the rows or columns in which all the elements lie within the interval.
In this example, I would expect a result telling me that row 3 is a row in which all the numbers are between 10 and 20.
Use which(..., arr.ind = TRUE). Here I assume "between" excludes 10 and 20.
which(M > 10 & M < 20, arr.ind = TRUE)
# row col
# [1,] 3 1
# [2,] 2 2
# [3,] 3 2
# [4,] 2 3
# [5,] 3 3
# [6,] 3 4
This will also work on 3-dimensional arrays (and higher).
## Three dimensions
dim(M) <- c(2, 4, 2)
which(M > 10 & M < 20, arr.ind = TRUE)
# dim1 dim2 dim3
# [1,] 1 2 1
# [2,] 2 3 1
# [3,] 1 4 1
# [4,] 2 1 2
# [5,] 1 2 2
# [6,] 1 4 2
## Four dimensions
dim(M) <- rep(2, 4)
which(M > 10 & M < 20, arr.ind = TRUE)
# dim1 dim2 dim3 dim4
# [1,] 1 2 1 1
# [2,] 2 1 2 1
# [3,] 1 2 2 1
# [4,] 2 1 1 2
# [5,] 1 2 1 2
# [6,] 1 2 2 2
## ... and so on
Note: To include 10 and 20, just use M >= 10 & M <= 20
Data:
M <- structure(c(1, 1, 12, 34, 0, 19, 15, 1, 0, 17, 12, 0, 21, 1,
11, 1), .Dim = c(4L, 4L))
Update: From your edit, you can find the row numbers for which all values are between 10 and 20 with
which(rowSums(M >= 10 & M <= 20) == ncol(M))
# [1] 3
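The rowSums test is specific to rows of a matrix. For columns, or for any margin of a higher-dimensional array, one loop-free option is apply() with all(). A minimal sketch; the helper name all_between and its arguments are made up for illustration:

```r
M <- array(c(1, 1, 12, 34, 0, 19, 15, 1, 0, 17, 12, 0, 21, 1, 11, 1),
           dim = c(4, 4))

# Hypothetical helper: indices along `margin` whose elements all lie in [lo, hi]
all_between <- function(a, margin, lo = 10, hi = 20) {
  which(apply(a >= lo & a <= hi, margin, all))
}

rows_ok <- all_between(M, 1)   # row 3 for the 4 x 4 example
cols_ok <- all_between(M, 2)   # no column qualifies here
```

For a 3-dimensional array, all_between(a, 3) would test each a[,,k] slice in the same way.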
I have an Excel file (.csv) with a sorted column of variable names such as "QW1I1K5" and numerical values against them.
The list runs over
W from 1 to 15
I from 1 to 4
K from 1 to 30
total elements = 15*4*30 = 1800
I want to store the numerical values for these variables in an array whose indices are derived from the variable name.
For example, QW1I1K5 has the value 11. This must be stored in the array element Q[1,1,5] = 11 (the index set [1,1,5] corresponds to W1, I1, K5).
Maybe this helps:
Q <- array(dat$Col2, dim=c(15,4,30))
dat$Col2[dat$Col1=='QW1I1K5']
#[1] 34
Q[1,1,5]
#[1] 34
dat$Col2[dat$Col1=='QW4I3K8']
#[1] 38
Q[4,3,8]
#[1] 38
If you want the index along with the values
library(reshape2)
d1 <- melt(Q)
head(d1,3)
# Var1 Var2 Var3 value
#1 1 1 1 12
#2 2 1 1 9
#3 3 1 1 29
Q[1,1,1]
#[1] 12
Q[3,1,1]
#[1] 29
Update
Suppose your data is ordered as you described in the comments (dat1 in the data section below):
indx <- read.table(text=gsub('[^0-9]+', ' ', dat1$Col1), header=FALSE)
dat2 <- dat1[do.call(order, indx[,3:1]),]
Q1 <- array(dat2$Col2,dim=c(15,4,30))
Q1[1,1,2]
#[1] 20
dat2$Col2[dat2$Col1=='QW1I1K2']
#[1] 20
data
Col1 <- do.call(paste,c(expand.grid('QW', 1:15, 'I', 1:4, 'K',1:30),
list(sep='')))
set.seed(24)
dat <- data.frame(Col1, Col2=sample(1:40, 1800,replace=TRUE))
dat1 <- dat[order(as.numeric(gsub('[^0-9]+', '', dat$Col1))),]
row.names(dat1) <- NULL
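Both versions above depend on the rows being in a particular order before array() is called. An order-independent alternative is to parse the three numbers out of each name and assign through a three-column index matrix. This is a sketch on made-up data, and it assumes every name has exactly the form QW<w>I<i>K<k>:

```r
# Made-up data in shuffled order (Col2 values are arbitrary)
grid <- expand.grid(W = 1:15, I = 1:4, K = 1:30)
dat <- data.frame(Col1 = sprintf("QW%dI%dK%d", grid$W, grid$I, grid$K),
                  Col2 = seq_len(nrow(grid)),
                  stringsAsFactors = FALSE)
set.seed(1)
dat <- dat[sample(nrow(dat)), ]

# Extract the three integers from each name: one row (W, I, K) per variable
idx <- do.call(rbind, regmatches(dat$Col1, gregexpr("[0-9]+", dat$Col1)))
storage.mode(idx) <- "integer"

Q <- array(NA_real_, dim = c(15, 4, 30))
Q[idx] <- dat$Col2          # matrix indexing: each row of idx addresses one cell
```

Because the cell is addressed by the parsed indices, Q[1,1,5] holds the value of QW1I1K5 regardless of how the rows of dat are sorted.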
I would suggest looking at using "data.table" and setting your key to the split columns. You can use cSplit from my "splitstackshape" package to easily split the column.
Sample Data:
df <- data.frame(
V1 = c("QW1I1K1", "QW1I1K2", "QW1I1K3",
"QW1I1K4", "QW2I1K5", "QW2I3K2"),
V2 = c(15, 20, 5, 6, 7, 9))
df
# V1 V2
# 1 QW1I1K1 15
# 2 QW1I1K2 20
# 3 QW1I1K3 5
# 4 QW1I1K4 6
# 5 QW2I1K5 7
# 6 QW2I3K2 9
Splitting the column:
library(splitstackshape)
out <- cSplit(df, "V1", "[A-Z]+", fixed = FALSE)
setnames(out, c("V2", "W", "I", "K"))
setcolorder(out, c("W", "I", "K", "V2"))
setkey(out, W, I, K)
out
# W I K V2
# 1: 1 1 1 15
# 2: 1 1 2 20
# 3: 1 1 3 5
# 4: 1 1 4 6
# 5: 2 1 5 7
# 6: 2 3 2 9
Extracting rows:
out[J(1, 1, 4)]
# W I K V2
# 1: 1 1 4 6
out[J(2, 3, 2)]
# W I K V2
# 1: 2 3 2 9
How can I convert A
A <- c(1,2,3,4,5,6,7,8,9)
to B
B <- c(0,0,1,2,3,0,0,4,5,6,0,0,7,8,9)
I tried this:
A <-c(1,2,3,4,5,6,7,8,9)
rows <- length(A)/3
dim(a) <- c(rows,3)
B <- matrix(0,rows,2+3)
B[,3:5] <- A
c(B)
but it doesn't work.
Why not build B from a template and fill it with A:
b <- rep(c(0,0,1,1,1),times=length(A)/3)
b[b==1] <- A
b
[1] 0 0 1 2 3 0 0 4 5 6 0 0 7 8 9
Assuming that A has nonzero length and has length divisible by 3:
> c(vapply(seq(length(A)/3)-1,
function(x) c(0,0,A[(x*3+1):(x*3+3)]),
numeric(3+2)
)
)
[1] 0 0 1 2 3 0 0 4 5 6 0 0 7 8 9
It's not the prettiest line of code, but something like this should work too:
as.vector(sapply(split(A, rep(1:(length(A)/3), each = 3)),
function(x) c(0, 0, x)))
# [1] 0 0 1 2 3 0 0 4 5 6 0 0 7 8 9
First, we split the vector into sets of 3, then append two zeros to each set, and reconvert it to a vector.
Update
Here is an approach that is probably similar to the process you were thinking of:
A <- c(1,2,3,4,5,6,7,8,9) # Your vector
dim(A) <- c(3, 3) # As a matrix
B <- matrix(0, nrow=5, ncol=3) # An empty matrix to fill
B[c(3:5), ] <- A # We only want to fill these rows
dim(B) <- NULL # Remove the dims to get back to a vector
B # View your handiwork
# [1] 0 0 1 2 3 0 0 4 5 6 0 0 7 8 9
Build a receiving vector that is 5/3 the length of the original and fill in the correct entries, calculated with modulo arithmetic:
> bb <- vector(length= length(A)*5/3) # Will initially be logical vector
> bb[ !seq_along(bb) %% 5 %in% 1:2 ] <- A # FALSE entries coerced to 0
> bb
[1] 0 0 1 2 3 0 0 4 5 6 0 0 7 8 9
The positions with remainder 3, 4, and 0 modulo 5 get filled sequentially with the values in A.
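To make the modulo step easier to follow, here is the same idea spelled out with explicit parentheses and a numeric (rather than logical) receiving vector. The names n_out, pos and keep are only for illustration, and the sketch assumes length(A) is a multiple of 3:

```r
A <- 1:9
n_out <- length(A) / 3 * 5          # every 3 values become 5 (2 zeros + 3 values)
pos <- seq_len(n_out)
# positions 1 and 2 of each group of five stay zero; the rest receive A
keep <- !((pos %% 5) %in% 1:2)
B <- numeric(n_out)
B[keep] <- A
B
# [1] 0 0 1 2 3 0 0 4 5 6 0 0 7 8 9
```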
Here's another vectorized solution that is probably more in the spirit of your earlier efforts:
> c( rbind( matrix(0, nrow=2, ncol=length(A)/3),
matrix(A, nrow=3) ) )
[1] 0 0 1 2 3 0 0 4 5 6 0 0 7 8 9
And this shows that your solution was correct except for using column indexing when you should have been using row indexing on the LHS of the matrix assignment:
> B <- matrix(0, 5, 3)
> B[3:5, ] <- A
> B
[,1] [,2] [,3]
[1,] 0 0 0
[2,] 0 0 0
[3,] 1 4 7
[4,] 2 5 8
[5,] 3 6 9
> c(B)
[1] 0 0 1 2 3 0 0 4 5 6 0 0 7 8 9
Silly function I wrote for this, with some customizable features:
PastingZeros <- function (divide = 3, data = A, amountofzeros = 2) {
B <- NULL
n <- length(data)
index <- 1:n
for (i in 1:(n / divide)) {
B <- c(B, rep(0, amountofzeros), data[index[1:divide]])
index <- index[-1:-divide]
}
return(B)
}
PastingZeros(3, A, 2)
# [1] 0 0 1 2 3 0 0 4 5 6 0 0 7 8 9
I don't see a solution that does not assume that the length of A is a multiple of 3, so I'll throw one in:
insert.every <- function(x, insert, every)
unlist(lapply(split(x, (seq_along(x)-1) %/% every), append, x = insert),
use.names = FALSE)
insert.every(1:9, c(0,0), 3)
# [1] 0 0 1 2 3 0 0 4 5 6 0 0 7 8 9
insert.every(1:9, c(0,0), 4)
# [1] 0 0 1 2 3 4 0 0 5 6 7 8 0 0 9