I have a data set grouped by clusters and would like to
find the covariance matrix of each pair of clusters.
For example:
df %>%
filter(clust %in% c(1,2)) %>%
cov()
Is there a way to do so for many clusters and
save the output in a tibble?
The following code runs, but it is not pretty:
cv12 <-
df %>%
filter(clust %in% c(1,2)) %>%
cov()
cv13 <-
df %>%
filter(clust %in% c(1,3)) %>%
cov()
cv14 <-
df %>%
filter(clust %in% c(1,4)) %>%
cov()
cv <- rbind(cv12,cv13,cv14)
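One possible way to avoid the repetition, offered as a sketch rather than a definitive answer: enumerate the cluster pairs with combn() and keep each covariance matrix in a list-column of a tibble. The data below (columns x and y, four clusters) are hypothetical stand-ins for df, and dropping clust before calling cov() is an assumption; the code in the question leaves it in.

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical data: a cluster id plus two numeric columns
set.seed(1)
df <- tibble(clust = rep(1:4, each = 5), x = rnorm(20), y = rnorm(20))

# One row per unordered pair of cluster ids
ids <- sort(unique(df$clust))
pr <- t(combn(ids, 2))

cov_tbl <- tibble(clust_a = pr[, 1], clust_b = pr[, 2]) %>%
  mutate(
    # Covariance matrix of the rows belonging to either cluster in the pair
    # (assumption: the clust column itself is excluded)
    cov = map2(clust_a, clust_b,
               ~ df %>% filter(clust %in% c(.x, .y)) %>% select(-clust) %>% cov())
  )

cov_tbl
```

Each element of cov_tbl$cov is an ordinary matrix, so cov_tbl$cov[[1]] gives the result for clusters 1 and 2.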
I have seven variables, X1, ..., X7.
I need to cross-tabulate X1 against each of the other six.
Is this possible with tbl_cross, and if so, how?
AGR %>%
tbl_cross(
row=X1,
col =X2,
percent = "row",
digits = c(0, 1))
Is this what you're after?
library(gtsummary)
library(tidyverse)
packageVersion("gtsummary")
#> [1] '1.6.0'
tbl <-
c("stage", "grade") %>%
map(
~ trial %>%
tbl_cross(row = all_of(.x), col = "trt", margin = "col") %>%
bold_labels()
) %>%
tbl_stack()
Created on 2022-05-20 by the reprex package (v2.0.1)
I would like to run several t-tests to check whether the math scores differ between the levels of base, separately for each level of quest, and then do the same for the english and science scores. By hand, I would have to go through the dataset and write out each test like this:
ds %>% filter(quest == "age_10") %>% {t.test(math ~ base, data = .)$p.value}
ds %>% filter(quest == "age_10") %>% {t.test(english ~ base, data = .)$p.value}
ds %>% filter(quest == "age_10") %>% {t.test(science ~ base, data = .)$p.value}
ds %>% filter(quest == "age_12") %>% {t.test(math ~ base, data = .)$p.value}
ds %>% filter(quest == "age_12") %>% {t.test(english ~ base, data = .)$p.value}
ds %>% filter(quest == "age_12") %>% {t.test(science ~ base, data = .)$p.value}
(etc)
My attempt was almost there:
ds %>%
select(quest, base, math:science) %>%
pivot_longer(cols = -c(quest, base))%>%
group_by(quest) %>%
summarise(pout = list(broom::tidy(t.test(value ~ base, data = .)$p.value))) %>%
unnest(pout) %>%
as.data.frame()
Code to reproduce the data:
ds <- data.frame(quest = rep(c("age_10","age_12","age_14","age_16"), each=10),
base = c("base1","base2"),
math = rnorm(80,10,2),
english = rnorm(80,8,1),
science = rnorm(80,13,1))
In case someone is looking for this answer: I figured out the solution.
Use the following code:
ds %>%
select(quest, base, math:science) %>%
pivot_longer(cols = -c(quest, base)) %>%
group_by(quest, name) %>%
nest() %>%
mutate(p = map(data, ~t.test(.x$value ~ .x$base)$p.value)) %>%
unnest(p) %>%
select(-data)
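A more compact variant is sketched below, assuming the same ds layout as in the question. It skips nest() and computes one p-value per quest and subject directly in summarise(). The column names subject and score are my own choices, and the two groups are passed to t.test() as separate vectors, which runs the same default Welch test as the formula form.

```r
library(dplyr)
library(tidyr)

set.seed(42)  # seed added only so the sketch is reproducible
ds <- data.frame(quest = rep(c("age_10", "age_12", "age_14", "age_16"), each = 10),
                 base = c("base1", "base2"),
                 math = rnorm(80, 10, 2),
                 english = rnorm(80, 8, 1),
                 science = rnorm(80, 13, 1))

p_tbl <- ds %>%
  pivot_longer(cols = c(math, english, science),
               names_to = "subject", values_to = "score") %>%
  group_by(quest, subject) %>%
  # Welch two-sample t-test of base1 vs base2 within each quest/subject group
  summarise(p = t.test(score[base == "base1"], score[base == "base2"])$p.value,
            .groups = "drop")

p_tbl
```

The result has one row per combination of quest and subject, with the p-value in column p.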
I fitted a logistic regression model with 10-fold cross-validation. I can use the pROC package to get the AUC, but it does not seem to be the AUC for the 10-fold CV, because the cvAUC library gave a different value. I suspect the AUC from pROC is for a single fold. How can I extract the joint AUC across the 10 folds using the pROC library?
data(iris)
data <- iris[which(iris$Species=="setosa" | iris$Species=="versicolor"),]
data$ID <- seq.int(nrow(data))
table(data$Species)
data$Species <-as.factor(data$Species)
confusion_matrices <- list()
accuracy <- c()
for (i in c(1:10)) {
set.seed(3456)
folds <- caret::createFolds(data$Species, k = 10)
test <- data[data$ID %in% folds[[i]], ]
train <- data[data$ID %in% unlist(folds[-i]), ]
model1 <- glm(as.factor(Species)~ ., family = binomial, data = train)
summary(model1)
pred <- predict(model1, newdata = test, type = "response")
predR <- as.factor( pred >= 0.5)
df <- data.frame(cbind(test$Species, predR))
df_list <- lapply(df, as.factor)
confusion_matrices[[i]] <- caret::confusionMatrix(df_list[[2]], df_list[[1]])
accuracy[[i]] <- confusion_matrices[[i]]$overall["Accuracy"]
}
library(pander)
library(dplyr)
names(accuracy) <- paste("Fold", 1:10)
accuracy %>%
pander::pandoc.table()
mean(accuracy)
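One way to get a single AUC over all folds with pROC, sketched under some assumptions: store the out-of-fold predicted probability for every row, then call roc() once on the pooled predictions. The fold assignment below uses a plain random split instead of caret::createFolds, and the ID column is left out of the model; both are my simplifications. A pooled AUC will generally differ from an average of per-fold AUCs, which may explain the mismatch with cvAUC.

```r
library(pROC)

data(iris)
dat <- droplevels(iris[iris$Species %in% c("setosa", "versicolor"), ])

set.seed(3456)
k <- 10
# Hypothetical fold assignment: a random, roughly equal split into k folds
fold_id <- sample(rep(seq_len(k), length.out = nrow(dat)))

oof_pred <- numeric(nrow(dat))  # out-of-fold predicted probabilities
for (i in seq_len(k)) {
  test_idx <- fold_id == i
  # The two species are perfectly separable, so glm() warns about fitted
  # probabilities of 0 or 1; the warnings are silenced for this sketch only
  fit <- suppressWarnings(glm(Species ~ ., family = binomial, data = dat[!test_idx, ]))
  oof_pred[test_idx] <- suppressWarnings(
    predict(fit, newdata = dat[test_idx, ], type = "response")
  )
}

# One ROC curve over all 10 folds; the predicted value is P(versicolor)
pooled_roc <- roc(response = dat$Species, predictor = oof_pred,
                  levels = c("setosa", "versicolor"), direction = "<", quiet = TRUE)
pooled_auc <- as.numeric(auc(pooled_roc))
pooled_auc
```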
I have many data frames of stock indices. Each has a price column.
Here is a simple example:
index1 <- data.frame(price = c(2,3,5))
index2 <- data.frame(price = c(3,5,6))
I put them in a list and I want to create their log return with the index name as the prefix. I can only create the log return column with:
indexList <- list(index1,index2)
indexList <- lapply(indexList, function(i) {
i <- i %>% mutate(log.ret = c(diff(log(price)),NA))
} )
What I want is
index1.log.ret
index2.log.ret
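A possible approach, as a sketch: give the list names and use purrr::imap(), which passes each element together with its name, so the name can be spliced into the new column name. Naming the list elements index1 and index2 is an assumption on my part, since an unnamed list carries no index name to use as a prefix.

```r
library(dplyr)
library(purrr)

index1 <- data.frame(price = c(2, 3, 5))
index2 <- data.frame(price = c(3, 5, 6))

# Assumption: the list is named, and the names become the column prefixes
indexList <- list(index1 = index1, index2 = index2)

indexList <- imap(indexList, function(d, nm) {
  # "{nm}.log.ret" := ... builds the column name from the list name
  d %>% mutate("{nm}.log.ret" := c(diff(log(price)), NA))
})

names(indexList$index1)
names(indexList$index2)
```

If many data frames are involved, tibble::lst(index1, index2) builds a list that is named automatically after its arguments.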
I need to convert a dataframe to an array of 3 dimensions. All columns in the dataframe are numeric. What is an elegant and/or efficient way to accomplish this?
Example:
x <- 1:3
y <- 1:3
g <- t(vapply(x, function(x){
vapply(y, function(y){
as.numeric(paste(x,y,sep="."))}, numeric(1))}, numeric(3)))
gdf <- data.frame( cbind(rep(1:3,each=3), rbind(g, g*2, g*3)) )
I want to convert "gdf" to an array where gdf$X1 defines the third dimension. The result would look like this:
ga <- array( c(g, g*2, g*3), dim=c(3,3,3) )
Thanks!
This works with your example, I hope it will be general enough for you:
gb <- aperm(array(unlist(gdf[, -1]), c(3, 3, 3)), c(1, 3, 2))
identical(ga, gb)
# [1] TRUE
I also found this way using the package abind (dropping the grouping column X1 first, so that each slice is 3 x 3):
abind::abind(split(gdf[, -1], gdf$X1), along = 3)
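For data where the sizes are not known in advance, a base R generalisation might look like the sketch below. The helper name df_to_array is hypothetical, and it assumes every group has the same number of rows and that all remaining columns are numeric.

```r
# Rebuild the example data from the question
x <- 1:3
y <- 1:3
g <- t(vapply(x, function(x) {
  vapply(y, function(y) as.numeric(paste(x, y, sep = ".")), numeric(1))
}, numeric(3)))
gdf <- data.frame(cbind(rep(1:3, each = 3), rbind(g, g * 2, g * 3)))
ga <- array(c(g, g * 2, g * 3), dim = c(3, 3, 3))

# Hypothetical helper: one matrix slice per value of group_col
df_to_array <- function(df, group_col) {
  value_cols <- setdiff(names(df), group_col)
  slices <- split(df[, value_cols, drop = FALSE], df[[group_col]])
  # simplify2array() stacks equally sized matrices along a new last dimension
  arr <- simplify2array(lapply(slices, as.matrix))
  dimnames(arr) <- NULL
  arr
}

gc <- df_to_array(gdf, "X1")
all.equal(ga, gc)
```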