I have read nearly every page I could find on the subject, and I still can't find a way to extract a matched dataset with the MatchThem package.
By analogy, MatchIt's match.data() function extracts the matched dataset (for example, for 3:1 matching). MatchThem's complete() function is supposed to be the equivalent, but it apparently does not return only the imputed AND matched observations.
Here is an example of multiple imputation with 3:1 matching, from which I am trying to extract the multiple matched datasets:
library(mice)
library(MatchThem)
# Multiple imputation
mids_object <- mice(data, maxit = 5, m = 3, seed = 20211022, printFlag = FALSE) # m = 3 is deliberately low for this example.
# Matching
mimids_object <- matchthem(primary_subtype ~ age + bmi + ps, data = mids_object, approach = "within", ratio = 3, method = "optimal")
# Details of matched data
print(mimids_object)
Printing | dataset: #1
A matchit object
method: Variable ratio 3:1 optimal pair matching
distance: Propensity score
- estimated with logistic regression
number of obs: 761 (original), 177 (matched)
target estimand: ATT
covariates: age, bmi, ps
# Extracting the matched dataset
complete_mi_matched <- complete(mimids_object, action = "long")
# Summary of the extracted dataset, to check the number of matched observations
summary(complete_mi_matched$primary_subtype)
classic ADK         SRC
        702          59
I expected it to show the matched 3:1 proportion, with 177 matched (177 classic ADK and 59 SRC).
I must be missing something. Thanks in advance for your help or suggestions.
I have a data set with two fields: Department_ID and Vendor_ID.
I want to flag any record where Department_ID equals a certain value and Vendor_ID equals a certain value. However, I have many pairs I'd like to flag. I'd like something short and easy rather than a long script that repeats the same logic over and over.
Old logic:
${(DPT=='10' && VND=='234' || DPT=='9' && VND=='13' || DPT=='200' && VND=='4987' || DPT=='598' && VND=='123')?"Yes":"No"}
Preferred logic:
${['10','9','200','598'].contains(DPT)&&['234','13','4987','123'].contains(VND)?"Yes":"No"}
The issue with the preferred logic is that it flags records that match any combination of those values, i.e., it would flag a record with DPT=='10' && VND=='13'. I need the contains feature, but it should only look for specific pairs.
You can put the pairs into a set and then check whether the set contains the pair in question. E.g.:
def known = [
// DPT, VND
["10","123"],
["20","456"],
].toSet()
assert known.contains(["30","789"])==false
assert known.contains(["10","789"])==false
assert known.contains(["30","123"])==false
assert known.contains(["10","123"])==true
You could check for the presence of a pair:
${[['10', '234'], ['9', '13'], ['200', '4987'], ['598', '123']].contains([DPT, VND])?"Yes":"No"}
I have the code below, which takes 24 columns (hours) of data and combines them into a single comma-separated value for each row in a data frame:
# Adds all of the values into column twentyfourhours with "," as the separator.
agg_bluetooth_data$twentyfourhours <- paste(agg_bluetooth_data[,1],
agg_bluetooth_data[,2], agg_bluetooth_data[,3], agg_bluetooth_data[,4],
agg_bluetooth_data[,5], agg_bluetooth_data[,6], agg_bluetooth_data[,7],
agg_bluetooth_data[,8], agg_bluetooth_data[,9], agg_bluetooth_data[,10],
agg_bluetooth_data[,11], agg_bluetooth_data[,12], agg_bluetooth_data[,13],
agg_bluetooth_data[,14], agg_bluetooth_data[,15], agg_bluetooth_data[,16],
agg_bluetooth_data[,17], agg_bluetooth_data[,18], agg_bluetooth_data[,19],
agg_bluetooth_data[,20], agg_bluetooth_data[,21], agg_bluetooth_data[,22],
agg_bluetooth_data[,23], agg_bluetooth_data[,24], sep=",")
However, after this I still have to write more lines of code to remove spaces, add brackets around it, and delete the columns. None of this is difficult to do, but I feel like there should be a shorter/cleaner code to use to get the results I am looking for. Does anyone have any suggestions?
There is a built-in rowSums function for row-wise sums. It looks like you want an analogous rowPaste function. We can do this with apply:
# create example dataset
df <- data.frame(
v=1:10,
x=letters[1:10],
y=letters[6:15],
z=letters[11:20],
stringsAsFactors = FALSE
)
# rowPaste columns 2 through 4
apply(df[, 2:4], 1, paste, collapse=",")
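Building on this, the remaining cleanup from the question (removing spaces, adding brackets, dropping the original columns) could be sketched in base R as below. The data frame agg and the columns h1 to h3 are made-up stand-ins for the 24 hourly columns. do.call(paste, ...) is swapped in for apply here because apply converts the data frame to a matrix first, which can pad numbers with spaces when the columns are of mixed types.

```r
# Toy data standing in for the 24 hourly columns (hypothetical names h1..h3).
agg <- data.frame(h1 = c(1, 10), h2 = c(2, 20), h3 = c(3, 30))
hour_cols <- names(agg)

# Paste the columns row-wise; do.call passes each column to paste() as its own argument.
joined <- do.call(paste, c(agg[hour_cols], sep = ","))

# Strip any whitespace and wrap each row's string in brackets.
agg$twentyfourhours <- paste0("[", gsub("\\s+", "", joined), "]")

# Drop the original hourly columns, keeping only the combined one.
agg[hour_cols] <- NULL

print(agg)
```

With the real data, hour_cols would presumably be names(agg_bluetooth_data)[1:24], computed before the new column is added.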
Another option, using @Dan Y's data (it would be helpful if you posted a subset of your data using dput, though).
library(tidyr)
library(dplyr)
df %>%
unite('new_col', v, x, y, z, sep = ',')
    new_col
1   1,a,f,k
2   2,b,g,l
3   3,c,h,m
4   4,d,i,n
5   5,e,j,o
6   6,f,k,p
7   7,g,l,q
8   8,h,m,r
9   9,i,n,s
10 10,j,o,t
You can then perform the necessary edits with mutate. There's also a fair amount of flexibility in how columns are selected within the unite call. Check out the "Useful Functions" section of the select documentation.
I need to prepare my data set further before I can apply the apriori algorithm.
There are only two columns:
The first column is transaction_id.
The second column is item_name, and each entry is formatted as c("" "a" "b" "c"...)
I run:
rules <- apriori(nz.mb, parameter = list(supp = 0.001, conf = 0.8))
I get an error:
Error in asMethod(object) :
column(s) 2 not logical or a factor. Discretize the columns first.
So I run:
nz.mb$item_name <- discretize(nz.mb$item_name)
I get another error:
Error in min(x, na.rm = TRUE) : invalid 'type' (list) of argument
What is my next step with item_name so that it's formatted correctly for apriori?
Most Apriori implementations support a dataset like this:
a b c d
1 1 1 0 means a,b,c are there
1 0 0 1 means a,d are there
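A minimal base R sketch of getting from a list column to that 0/1 layout is below. The item_name list is a made-up stand-in for the real column, and dropping the empty strings is an assumption about what the "" entries in the question mean.

```r
# Hypothetical stand-in for the list column item_name
# (one character vector of items per transaction).
item_name <- list(c("", "a", "b", "c"), c("a", "d"))

# All distinct items across transactions, ignoring empty strings.
all_items <- setdiff(sort(unique(unlist(item_name))), "")

# One row per transaction, one logical column per item.
incidence <- t(sapply(item_name, function(x) all_items %in% x))
colnames(incidence) <- all_items

# Multiply by 1L to display the logical matrix as 0/1.
print(incidence * 1L)
```

A logical matrix like incidence is the kind of input that can then be tried with apriori, possibly after coercing it with as(incidence, "transactions") from arules.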
Either use this form or go to the documentation and see the supported data for
I'm trying to take the results of a linear regression performed in R and store those results in a database.
Specifically, what I'm after is the data in coef(summary(myModel)). I can turn that into a data frame and use sqlSave(), but the coefficient names are not a column in the data frame. How do I get the coefficients and the variable names into a single data frame that can be saved using sqlSave()?
For clarity, I'm trying to store the data in a database table that has the columns:
VariableName, Estimate, StdError, tValue, pValue
Is there an easier way to prepare this data to be stored in a database? As an example, here is what coef(summary(myModel)) returns:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 51.52729727 2.623035966 19.64414439 1.941150e-58
factor(person)507 -0.73663931 2.627215539 -0.28038785 7.793456e-01
factor(person)713 -5.18612049 3.317899029 -1.56307363 1.189390e-01
TransCnt 0.02658798 0.005682853 4.67863266 4.132888e-06
factor(Month)5 0.67908563 1.119655304 0.60651312 5.445673e-01
factor(Month)6 2.09595623 1.169658148 1.79193915 7.400639e-02
factor(Month)7 2.91204838 1.333483558 2.18379024 2.964109e-02
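datOut <- coef(summary(myModel))
# Use data.frame() rather than cbind(): cbind() on a matrix would turn every value into a string.
datOut <- data.frame(VariableName = rownames(datOut), datOut, stringsAsFactors = FALSE)
rownames(datOut) <- NULL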
If you want to add your own column names:
colnames(datOut) <- c("VariableName", "Estimate", "StdError", "tValue", "pValue")
datOut
The table produced by summary.lm is a matrix. You can coerce it to a data frame with as.data.frame:
df.coef <- as.data.frame( coef(summary(myModel)) )
The column names should be coerced to column names that have no spaces or quotes.
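Putting the pieces together, here is a small self-contained sketch. It uses the built-in mtcars data in place of the real model, and the commented sqlSave() call at the end is only an assumed follow-up: the channel ch and the table name ModelCoefs are hypothetical.

```r
# Fit a small model on a built-in dataset (mtcars stands in for the real data).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# coef(summary(fit)) is a numeric matrix with the term names as row names.
coefs <- as.data.frame(coef(summary(fit)))

# Rename to database-friendly names (no spaces or special characters).
names(coefs) <- c("Estimate", "StdError", "tValue", "pValue")

# Move the row names into a regular column, then reset the row names.
coefs <- data.frame(VariableName = rownames(coefs), coefs, stringsAsFactors = FALSE)
rownames(coefs) <- NULL

str(coefs)

# Assumed follow-up (not run here), given an open RODBC channel `ch`:
# sqlSave(ch, coefs, tablename = "ModelCoefs", rownames = FALSE, append = TRUE)
```

The numeric columns stay numeric this way, which matters if the database table expects numeric types for the estimates.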