Error: unsupported use of matrix or array for column indexing - arrays

I have a character vector of variable names, "comorbid_names", and I want to select the people who have those comorbidities into "comorbidities". However, I only want the variable names for which the value is true.
For example, patient 1 has "chd" only, so only that should be displayed as TRUE.
comorbid_names
[1] "chd" "heart_failure" "stroke"
[4] "hypertension" "diabetes" "copd"
[7] "epilepsy" "hypothyroidism" "cancer"
[10] "asthma" "ckd_stage3" "ckd_stage4"
[13] "ckd_stage5" "atrial_fibrilation" "learning_disability"
[16] "peripheral_arterial_disease" "osteoporosis"
class(comorbid_names)
[1] "character"
comorbidities <- names(p[, comorbid_names][p[, comorbid_names] == 1])
At this point I get this error
Error: Unsupported use of matrix or array for column indexing
I am not entirely sure why, but I think it's to do with comorbid_names being character
Does anyone have any advice?

If p is a tibble as opposed to or in addition to a data.frame, you might be dealing with the following:
https://blog.rstudio.org/2016/03/24/tibble-1-0-0/
Look at the bottom of the post:
Interacting with legacy code
A handful of functions don’t work with tibbles because they expect df[, 1] to return a vector, not a data frame. If you encounter one of these functions, use as.data.frame() to turn a tibble back to a data frame:
class(as.data.frame(tbl_df(iris)))
You might get along by doing p <- as.data.frame(p) as well.
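To see the difference in behaviour described above, here is a minimal sketch. The helper drops_to_vector and the toy data frame are made up for illustration, and the tibble part only runs if the tibble package is installed:

```r
# Hypothetical helper: does [, 1] drop down to a plain vector?
drops_to_vector <- function(x) is.atomic(x[, 1])

df <- data.frame(a = 1:3, b = 4:6)
print(drops_to_vector(df))  # base data frames simplify a single column to a vector

# The tibble part only runs if the tibble package is installed.
if (requireNamespace("tibble", quietly = TRUE)) {
  tb <- tibble::as_tibble(df)
  print(drops_to_vector(tb))                 # a tibble keeps a one-column tibble
  print(drops_to_vector(as.data.frame(tb)))  # converting back restores the base behaviour
}
```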

Simply using p[, comorbid_names] == 1 will give you the table of TRUE/FALSE values for your selected morbidities. To add the patient names or IDs to that table, use cbind, like this: cbind(p["patient_id"], p[, comorbid_names] == 1) where "patient_id" is the name of the column that identifies patients.
Here's a complete reproducible example:
comorbid_names <- c("chd", "heart_failure","stroke", "hypertension",
"diabetes", "copd", "epilepsy", "hypothyroidism",
"cancer", "asthma", "ckd_stage3", "ckd_stage4",
"ckd_stage5", "atrial_fibrilation", "learning_disability",
"peripheral_arterial_disease", "osteoporosis")
all_morbidities <- c("chd", "heart_failure","stroke", "hypertension",
"diabetes", "copd", "epilepsy", "hypothyroidism",
"cancer", "asthma", "ckd_stage3", "ckd_stage4",
"ckd_stage5", "atrial_fibrilation", "learning_disability",
"peripheral_arterial_disease", "osteoporosis",
"hairyitis", "jellyitis", "transparency")
# Create dummy data frame "p" with patient ids and whether or not they suffer from each condition
patients <- data.frame(patient_id = 1:20)
conditions <- matrix(sample(0:1, nrow(patients)*length(all_morbidities), replace=TRUE),
nrow(patients),
length(all_morbidities))
p <- cbind(patients, conditions)
names(p) <- c(names(patients), all_morbidities)
# Final step: get patient IDs and whether they suffer from specific morbidities
comorbidities <- cbind(p["patient_id"], p[, comorbid_names] == 1)
If you want to select only those patients that suffer from at least one of the morbidities, do this:
comorbidities[rowSums(comorbidities[-1]) != 0, ]
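If what you are after is the names of the conditions that are TRUE for each patient (as in the "patient 1 has chd only" example), one possible sketch is below. The three-patient data frame is made up for illustration and only stands in for your "p":

```r
# Small made-up data set standing in for "p" (values are illustrative only)
p <- data.frame(
  patient_id = 1:3,
  chd        = c(1, 0, 1),
  stroke     = c(0, 0, 1),
  diabetes   = c(0, 1, 1)
)
comorbid_names <- c("chd", "stroke", "diabetes")

# Logical matrix: one row per patient, one column per condition
has_condition <- as.matrix(p[, comorbid_names]) == 1

# For each patient, keep only the names of the conditions that are TRUE
present <- lapply(seq_len(nrow(p)), function(i) comorbid_names[has_condition[i, ]])
names(present) <- p$patient_id

print(present)
```

The result is a list with one character vector per patient, which may be easier to work with than a wide TRUE/FALSE table when patients have different numbers of conditions.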


Extraction of matched dataset from MatchThem

I have browsed almost all possible pages on the subject and I still can't find a way to extract a matched data dataset with the MatchThem package.
By analogy, MatchIt's match.data() function extracts the dataset of matched data, for example for 3:1 matching. Although MatchThem's complete() function is the equivalent, it apparently cannot extract only the observations that are both imputed AND matched.
Here is an example of multiple imputation with 3:1 matching from which I am trying to extract multiple matched datasets:
library(mice)
library(MatchThem)
#Multiple imputations
mids_object <- mice(data, maxit = 5, m=3, seed= 20211022, printFlag = F) # m=3 is voluntarily low for this example.
#Matching
mimids_object <- matchthem(primary_subtype ~ age + bmi + ps, data = mids_object, approach = "within" ,ratio= 3, method = "optimal")
#Details of matched data
print(mimids_object)
Printing | dataset: #1
A matchit object
method: Variable ratio 3:1 optimal pair matching
distance: Propensity score
- estimated with logistic regression
number of obs: 761 (original), 177 (matched)
target estimand: ATT
covariates: age, bmi, ps
#Extracting matched dataset
complete(mimids_object, action = "long") -> complete_mi_matched
#Summary of extracted dataset to check correct number of match
summary(complete_mi_matched$primary_subtype)
classic ADK         SRC
        702          59
It should show the matched proportion 3:1 with 177 matched (177 classic ADK and 59 SRC)
I am missing something. Thanks in advance for your help or suggestions.

Contains value pairs across two fields - Groovy

I have a data set with two fields: Department_ID and Vendor_ID.
I want to flag any record where Department_ID equals a certain value and Vendor_ID equals a certain value. However, I have many pairs I'd like to flag. Without having to have a long script that repeats information redundantly, I want something short and easy.
Old logic:
${(DPT=='10'&& VND=='234' || DPT=='9'&& VND=='13' || DPT=='200'&& VND=='4987' || DPT=='598' && VND=='123')?"Yes":"NO"}
Preferred logic:
${['10','9','200','598'].contains(DPT)&&['234','13','4987','123'].contains(VND)?"Yes":"No"}
The issue with the preferred logic is that it would flag records that meet any combination of those values being present. ie, it would flag a record that had DPT=='10'&& VND=='13'... I need the contain feature but have it only look for certain pairs.
You can put the pairs into a set and then check whether the set contains a given pair. E.g.
def known = [
// DPT, VND
["10","123"],
["20","456"],
].toSet()
assert known.contains(["30","789"])==false
assert known.contains(["10","789"])==false
assert known.contains(["30","123"])==false
assert known.contains(["10","123"])==true
You could check for the presence of a pair:
${[['10', '234'], ['9', '13'], ['200', '4987'], ['598', '123']].contains([DPT, VND])?"Yes":"No"}

Is there a more effective way to combine 24 columns into a single column as an array in R

The code below takes 24 columns (hours) of data and combines them into a single column array for each row in a dataframe:
# Adds all of the values into column twentyfourhours with "," as the separator.
agg_bluetooth_data$twentyfourhours <- paste(agg_bluetooth_data[,1],
agg_bluetooth_data[,2], agg_bluetooth_data[,3], agg_bluetooth_data[,4],
agg_bluetooth_data[,5], agg_bluetooth_data[,6], agg_bluetooth_data[,7],
agg_bluetooth_data[,8], agg_bluetooth_data[,9], agg_bluetooth_data[,10],
agg_bluetooth_data[,11], agg_bluetooth_data[,12], agg_bluetooth_data[,13],
agg_bluetooth_data[,14], agg_bluetooth_data[,15], agg_bluetooth_data[,16],
agg_bluetooth_data[,17], agg_bluetooth_data[,18], agg_bluetooth_data[,19],
agg_bluetooth_data[,20], agg_bluetooth_data[,21], agg_bluetooth_data[,22],
agg_bluetooth_data[,23], agg_bluetooth_data[,24], sep=",")
However, after this I still have to write more lines of code to remove spaces, add brackets around it, and delete the columns. None of this is difficult to do, but I feel like there should be a shorter/cleaner code to use to get the results I am looking for. Does anyone have any suggestions?
There is a built-in function to do rowSums. It looks like you want an analogous rowPaste function. We can do this with apply:
# create example dataset
df <- data.frame(
v=1:10,
x=letters[1:10],
y=letters[6:15],
z=letters[11:20],
stringsAsFactors = FALSE
)
# rowPaste columns 2 through 4
apply(df[, 2:4], 1, paste, collapse=",")
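If you also want the brackets and want to drop the source columns in the same pass, a vectorised variant is sketched below. It uses do.call(paste, ...) on the columns directly instead of apply; the column name "combined" and the choice of columns are assumptions for illustration:

```r
# Same example data as above
df <- data.frame(
  v = 1:10,
  x = letters[1:10],
  y = letters[6:15],
  z = letters[11:20],
  stringsAsFactors = FALSE
)

cols <- c("x", "y", "z")  # columns to combine (assumed for this example)

# paste() is called once with each column as an argument, so it works row by row
joined <- do.call(paste, c(df[cols], sep = ","))

# Wrap each row's string in brackets, then drop the source columns
df$combined <- paste0("[", joined, "]")
df[cols] <- NULL

print(head(df, 3))
```

For the 24-hour case, cols could be the vector of the 24 column names or positions, so nothing has to be typed out column by column.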
Another option, using @Dan Y's data (might be helpful if you posted a subset of your data using dput though).
library(tidyr)
library(dplyr)
df %>%
unite('new_col', v, x, y, z, sep = ',')
new_col
1 1,a,f,k
2 2,b,g,l
3 3,c,h,m
4 4,d,i,n
5 5,e,j,o
6 6,f,k,p
7 7,g,l,q
8 8,h,m,r
9 9,i,n,s
10 10,j,o,t
You can then perform the necessary edits with mutate. There's also a fair amount of flexibility in the column selections within the unite call. Check out the "Useful Functions" section of the select documentation.

Prepping for apriori

Need to further prep my data set in order to apply apriori algorithm
There are only two columns:
First column as the transaction_id.
Second column is item_name and is formatted as c("" "a" "b" "c"...)
I run:
rules <- apriori(nz.mb, parameter = list(supp = 0.001, conf = 0.8))
I get an error:
Error in asMethod(object) :
column(s) 2 not logical or a factor. Discretize the columns first.
So I run:
nz.mb$item_name <- discretize(nz.mb$item_name)
I get another error:
Error in min(x, na.rm = TRUE) : invalid 'type' (list) of argument
What is my next step with item_name so that's it's formatted correctly for apriori?
Most Apriori implementations support datasets like this:
a b c d
1 1 1 0 means a,b,c are there
1 0 0 1 means a,d are there
Either use this form or go to documentation and say the supported data for
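As a rough sketch of getting from a list column of item names to the one-column-per-item form shown above, in base R only: the two transactions are made up to match the a/b/c/d example, and whether your apriori implementation accepts the resulting matrix directly is something to check in its documentation.

```r
# Made-up list column: one character vector of item names per transaction
transactions <- list(
  c("", "a", "b", "c"),
  c("a", "d")
)

# All distinct item names, ignoring empty strings
items <- setdiff(sort(unique(unlist(transactions))), "")

# One row per transaction, one logical column per item
incidence <- t(vapply(transactions, function(tx) items %in% tx, logical(length(items))))
colnames(incidence) <- items

print(incidence * 1L)  # shown as 0/1 for comparison with the table above
```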

Using R to save results of a lm model to a database

I'm trying to take the results of a linear regression performed in R and store those results in a database.
Specifically, what I'm after is the data in coef(summary(myModel)). I can turn that into a dataframe and use sqlSave(), but the coefficient names are not a column in the dataframe. How do I get the coefficients and the variable names into a single dataframe that can be saved using sqlSave()?
For clarity, I'm trying to store the data in a database table that has the columns:
VariableName, Estimate, StdError, tValue, pValue
Is there an easier way to prepare this data to be stored in a database? As an example here's what the results of coef(summary(myModel)) gives:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 51.52729727 2.623035966 19.64414439 1.941150e-58
factor(person)507 -0.73663931 2.627215539 -0.28038785 7.793456e-01
factor(person)713 -5.18612049 3.317899029 -1.56307363 1.189390e-01
TransCnt 0.02658798 0.005682853 4.67863266 4.132888e-06
factor(Month)5 0.67908563 1.119655304 0.60651312 5.445673e-01
factor(Month)6 2.09595623 1.169658148 1.79193915 7.400639e-02
factor(Month)7 2.91204838 1.333483558 2.18379024 2.964109e-02
datOut <- summary(myModel)$coef
datOut <- cbind(VariableName=rownames(datOut), datOut)
rownames(datOut) <- NULL
If you want to add your own column names:
colnames(datOut) <- c("VariableName", "Estimate", "StdError", "tValue", "pValue")
datOut
The table produced by summary.lm is a matrix. You can coerce it to a data frame with as.data.frame:
df.coef <- as.data.frame( coef(summary(myModel)) )
The column names should be coerced to column names that have no spaces or quotes.
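Putting the two answers together, here is one possible sketch that keeps the estimates numeric. Note that cbind() on a character vector and a numeric matrix yields a character matrix, so building a data frame directly may be preferable. The model on the built-in mtcars data is only a stand-in for myModel:

```r
# Stand-in model on a built-in data set (your own myModel goes here)
myModel <- lm(mpg ~ wt + hp, data = mtcars)

coefs <- coef(summary(myModel))  # numeric matrix, variable names in the row names

# Move the row names into a real column; the numeric columns stay numeric
datOut <- data.frame(
  VariableName = rownames(coefs),
  coefs,
  row.names = NULL,
  stringsAsFactors = FALSE
)

# Column names without spaces or special characters, matching the target table
names(datOut) <- c("VariableName", "Estimate", "StdError", "tValue", "pValue")

str(datOut)
# datOut can then be handed to sqlSave() with an open connection (not run here).
```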
