Combining data of two vectors in a time series in R - arrays

I am a research assistant and have collected eye-movement data, which I am now trying to analyze using R.
The eye-tracker I use marks every sample as belonging to a saccade (meaning the eye is moving) or not, and as belonging to a blink or not. When someone starts to blink, the eye-tracker first identifies a saccade and only later identifies a blink. To be able to substitute all eye-movement samples (lines in my data file) that belong to a blink, I need to create a variable that marks every saccade containing a blink. A simple example is the following:
I have the data:
Data <- data.frame(Blink=c(0,0,0,1,1,0,0,0,1,1,0,0,0,0,0), Saccade=c(0,1,1,1,1,1,0,1,1,1,1,0,1,1,0))
I would like a variable like this as a result:
Data$Saccade_containing_blink <- c(0,1,1,1,1,1,0,1,1,1,1,0,0,0,0)
Which function would give me that result using R?

# example data
Data <- data.frame(Blink   = c(0,0,0,1,1,0,0,0,1,1,0,0,0,0,0),
                   Saccade = c(0,1,1,1,1,1,0,1,1,1,1,0,1,1,0))

library(dplyr)

Data %>%
  group_by(group = cumsum(Saccade == 0)) %>%    # group your saccades
  mutate(Saccade_containing_blink = max(Blink), # if there's a blink, update all rows within that saccade
         Saccade_containing_blink = ifelse(Saccade == 0, 0, Saccade_containing_blink)) %>% # set the 0s that separate saccades back to 0
  ungroup() %>%   # ungroup data
  select(-group)  # remove grouping column
# # A tibble: 15 x 3
#    Blink Saccade Saccade_containing_blink
#    <dbl>   <dbl>                    <dbl>
#  1     0       0                        0
#  2     0       1                        1
#  3     0       1                        1
#  4     1       1                        1
#  5     1       1                        1
#  6     0       1                        1
#  7     0       0                        0
#  8     0       1                        1
#  9     1       1                        1
# 10     1       1                        1
# 11     0       1                        1
# 12     0       0                        0
# 13     0       1                        0
# 14     0       1                        0
# 15     0       0                        0
The idea behind this approach is to group by saccade and check whether at least one row within each saccade contains a blink. I assume that saccades are separated by a 0 in the Saccade column.
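If you prefer base R, a minimal sketch of the same grouping idea (using the Data example above; ave() propagates the maximum Blink value within each cumsum-defined saccade block, and the multiplication sets the separating rows back to 0):
grp <- cumsum(Data$Saccade == 0)   # run id for each saccade block
Data$Saccade_containing_blink <- ave(Data$Blink, grp, FUN = max) * (Data$Saccade != 0)
Data$Saccade_containing_blink
# [1] 0 1 1 1 1 1 0 1 1 1 1 0 0 0 0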

Related

Nurbs curve parameters in Maya?

I have a Maya .ma file and I want to understand the parameters for nurbsCurve. The file has contents like this:
createNode nurbsCurve; # create a nurbsCurve
setAttr -k off ".v"; # set attribute about knots?
setAttr ".cc" -type "nurbsCurve" # attribute setting
3 1 0 no 3
6 0 0 0 1 1 1
4 # 4 stands for the 4 coordinates below
7.82436 0.545707 8.54539
7.86896 0.545707 9.61357
7.28368 0.53563 9.8433
6.06638 0.53563 9.89412
;
...
I don't understand what the lines 3 1 0 no 3 and 6 0 0 0 1 1 1 stand for. Does anybody know what these lines mean?
Here's what I know so far, which covers only the first three figures:
[3] [1] [0]
corresponds to:
[degree] [span] [index of the form: open/closed/periodic]

Drop columns from a data frame but I keep getting this error below

[screenshots of the error messages were attached here]
No matter how I try to code this in R, I still cannot drop my columns so that I can build my logistic regression model. I tried to run it two different ways
cols<-c("EmployeeCount","Over18","StandardHours")
Trainingmodel1 <- DAT_690_Attrition_Proj1EmpAttrTrain[-cols,]
Error in -cols : invalid argument to unary operator
cols<-c("EmployeeCount","Over18","StandardHours")
Trainingmodel1 <- DAT_690_Attrition_Proj1EmpAttrTrain[!cols,]
Error in !cols : invalid argument type
This may solve your problem:
Trainingmodel1 <- DAT_690_Attrition_Proj1EmpAttrTrain[ , !colnames(DAT_690_Attrition_Proj1EmpAttrTrain) %in% cols]
Please note that if you want to drop columns, you should put your code inside [ on the right side of the comma, not on the left side.
So [, your_code] not [your_code, ].
Here is an example of dropping columns using the code above.
cols <- c("cyl", "hp", "wt")
mtcars[, !colnames(mtcars) %in% cols]
# mpg disp drat qsec vs am gear carb
# Mazda RX4 21.0 160.0 3.90 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 160.0 3.90 17.02 0 1 4 4
# Datsun 710 22.8 108.0 3.85 18.61 1 1 4 1
# Hornet 4 Drive 21.4 258.0 3.08 19.44 1 0 3 1
# Hornet Sportabout 18.7 360.0 3.15 17.02 0 0 3 2
# Valiant 18.1 225.0 2.76 20.22 1 0 3 1
#...
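Equivalent alternatives exist as well; these are just sketches and not part of the original answer (the dplyr variant assumes the dplyr package is installed):
# base R: keep every column whose name is not in cols
mtcars[, setdiff(colnames(mtcars), cols)]
# dplyr equivalent
# dplyr::select(mtcars, -dplyr::all_of(cols))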
Edit to Reproduce the Error
The error message you got indicates that there is a column that has only one identical value in all rows.
To show this, let's try a logistic regression using a subset of the mtcars data that has only a single value in its cyl column, and then use that column as a predictor.
mtcars_cyl4 <- mtcars |> subset(cyl == 4)
mtcars_cyl4
# mpg cyl disp hp drat wt qsec vs am gear carb
# Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
# Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
# Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
# Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
# Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
# Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
# Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
# Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
# Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
# Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
# Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
glm(am ~ as.factor(cyl) + mpg + disp, data = mtcars_cyl4, family = "binomial")
#Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
# contrasts can be applied only to factors with 2 or more levels
Now, compare it with the same logistic regression using the full mtcars data, which has various values in the cyl column.
glm(am ~ as.factor(cyl) + mpg + disp, data = mtcars, family = "binomial")
# Call: glm(formula = am ~ as.factor(cyl) + mpg + disp, family = "binomial",
# data = mtcars)
#
# Coefficients:
# (Intercept) as.factor(cyl)6 as.factor(cyl)8 mpg disp
# -5.08552 2.40868 6.41638 0.37957 -0.02864
#
# Degrees of Freedom: 31 Total (i.e. Null); 27 Residual
# Null Deviance: 43.23
# Residual Deviance: 25.28 AIC: 35.28
It is likely that, even though you have dropped the three columns that each had a single identical value in all rows, there is another column in Trainingmodel1 that still has only one distinct value. That column probably ended up constant while the data frame was filtered and split into training and test groups. It is better to check by using summary(Trainingmodel1).
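A quick programmatic way to spot such constant columns (just a sketch, using Trainingmodel1 from your question):
# flag columns with only one distinct value
constant_cols <- sapply(Trainingmodel1, function(col) length(unique(col)) <= 1)
names(Trainingmodel1)[constant_cols]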
Further edit
I have checked the summary(Trainingmodel1) result, and it is clear that EmployeeNumber has only one value (called a "level" for a factor) in all rows. To run your regression properly, either drop it from your model, or, if EmployeeNumber has other levels that you want to include, make sure the training data contains at least two of them. You can achieve that during splitting by repeating the random sampling until the randomly selected EmployeeNumber samples contain at least two levels, for example by looping with for, while, or repeat. It is possible, but I don't know how appropriate repeated sampling is for your study.
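A minimal sketch of that resampling idea, assuming a hypothetical full data frame EmpAttr and an illustrative 70/30 split (both the name and the ratio are assumptions, not from your question):
repeat {
  idx   <- sample(nrow(EmpAttr), size = floor(0.7 * nrow(EmpAttr)))
  train <- EmpAttr[idx, ]
  test  <- EmpAttr[-idx, ]
  # stop once the training rows contain at least two EmployeeNumber levels
  if (length(unique(train$EmployeeNumber)) >= 2) break
}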
As for your question about subsetting on more than one variable, you can use subset() with logical conditions. For example, to get a subset of mtcars that has cyl == 4 and mpg > 20:
mtcars |> subset(cyl == 4 & mpg > 20 )
If you want a subset that has cyl == 4 or mpg > 20:
mtcars |> subset(cyl == 4 | mpg > 20 )
You can also subset by using more columns as subset criteria:
mtcars |> subset((cyl > 4 & cyl <8) | (mpg > 20 & gear > 4 ))

How to move inside an array transformed with as.data.frame?

I have written the following code to declare an array as a data frame:
b=as.data.frame(array(0,dim=c(NF,29,1,T+1),
                      dimnames=list(NULL,c(…..varnames))))
Now I am not able to move inside the array. For instance, if I need to show the matrices at array position [,,1,1], what do I need to write?
I have tried code like:
b$[].1.1
b$,1.1
b[,,1,1]
but, of course, it does not work.
Thank you very much for your help!
from ?as.data.frame :
Arrays can be converted to data frames. One-dimensional arrays are
treated like vectors and two-dimensional arrays like matrices. Arrays
with more than two dimensions are converted to matrices by
‘flattening’ all dimensions after the first and creating suitable
column labels.
array1 <- array(1:8,dim = c(2,2,2),dimnames = split(paste0(rep(letters[1:2],each=3),1:3),1:3))
# , , 3 = a3
#
#     2
# 1    a2 b2
#   a1  1  3
#   b1  2  4
#
# , , 3 = b3
#
#     2
# 1    a2 b2
#   a1  5  7
#   b1  6  8
#
df1 <- as.data.frame(array1)
#    a2.a3 b2.a3 a2.b3 b2.b3
# a1     1     3     5     7
# b1     2     4     6     8
df1$b2.a3
# [1] 3 4
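To pull out all columns that came from one slice of the original array, one option (a sketch built on the example above) is to match the flattened column labels:
df1[, grep("a3$", names(df1))]   # columns a2.a3 and b2.a3, i.e. array1[, , "a3"]
#    a2.a3 b2.a3
# a1     1     3
# b1     2     4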
I need to create a data frame, starting from an array whose dimensions are (2,3,1,3):
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
Hence, the output that I need is:
debt loan stock debt loan stock debt loan stock
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
Is the following code correct?
b=array(0, dim=c(3,3,1,4), dimnames=list(NULL,c("debt","loan","stock")))
output=as.data.frame(b)
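A hedged sketch based on the behaviour quoted from ?as.data.frame, using the 2 x 3 x 1 x 3 shape described above rather than the dim = c(3, 3, 1, 4) in the attempt (note that dimnames must have one entry per dimension):
b <- array(0, dim = c(2, 3, 1, 3),
           dimnames = list(NULL, c("debt", "loan", "stock"), NULL, NULL))
output <- as.data.frame(b)
dim(output)   # 2 rows and 3 * 1 * 3 = 9 columns
Because a data frame needs unique column names, as.data.frame() builds its labels from the dimnames of the flattened dimensions, so the headers will not literally repeat "debt loan stock", but each column is still identifiable.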

How to get the best subset for a multinomial regression in R?

I am a new R user and I'm fitting a multinomial regression (i.e. a logistic regression with a response variable that has more than 2 classes) with the function vglm in R. In my dataset there are 11 continuous predictors and 1 categorical response variable with 3 classes.
I want to get the best subset of predictors for my regression but I don't know how to do it. Is there a function for this, or must I do it manually? The functions for linear models don't seem suitable.
I have tried the bestglm function, but its results don't seem suitable for a multinomial regression.
I have also tried a shrinkage method, glmnet, which is related to the lasso. It keeps all the variables in the model, yet the multinomial regression using vglm reports some variables as insignificant.
I've searched a lot on the Internet, including this website, but haven't found a good answer, so I'm asking here because I really need help with this.
Thanks
There are a few basic steps involved to get what you want:
define the model grid of all potential predictor combinations
run the model for all potential combinations of predictors
use a criterion (or a set of criteria) to select the best subset of predictors
The model grid can be defined with the following function:
# define model grid for best subset regression
# defines which predictors are on/off; all combinations presented
model.grid <- function(n){
  n.list <- rep(list(0:1), n)
  expand.grid(n.list)
}
For example, with 4 variables we get 2^n = 16 combinations. A value of 1 indicates the model predictor is on and a value of zero indicates the predictor is off:
model.grid(4)
Var1 Var2 Var3 Var4
1 0 0 0 0
2 1 0 0 0
3 0 1 0 0
4 1 1 0 0
5 0 0 1 0
6 1 0 1 0
7 0 1 1 0
8 1 1 1 0
9 0 0 0 1
10 1 0 0 1
11 0 1 0 1
12 1 1 0 1
13 0 0 1 1
14 1 0 1 1
15 0 1 1 1
16 1 1 1 1
I provide another function below that will run all model combinations. It will also create a sorted dataframe table that ranks the different model fits using 5 criteria. The predictor combo at the top of the table is the "best" subset given the training data and the predictors supplied:
# function for best subset regression
# ranks predictor combos using 5 selection criteria
best.subset <- function(y, x.vars, data){
  # y      character string, name of the dependent variable
  # x.vars character vector with the names of the predictors
  # data   training data containing the y and x.vars observations
  require(dplyr)
  require(purrr)
  require(magrittr)
  require(forecast)
  length(x.vars) %>%
    model.grid %>%
    apply(1, function(x) which(x > 0, arr.ind = TRUE)) %>%
    map(function(x) x.vars[x]) %>%
    .[2:dim(model.grid(length(x.vars)))[1]] %>%
    map(function(x) tslm(paste0(y, " ~ ", paste(x, collapse = "+")), data = data)) %>%
    map(function(x) CV(x)) %>%
    do.call(rbind, .) %>%
    cbind(model.grid(length(x.vars))[-1, ], .) %>%
    arrange(., AICc)
}
You'll see that the tslm() function is specified; others such as vglm() could be used instead. Simply swap in the model function you want.
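For the multinomial case in the question, a hedged sketch of that swap (assuming the VGAM package; not taken from the original answer) would replace the tslm() line inside best.subset() with something like:
map(function(x) VGAM::vglm(as.formula(paste0(y, " ~ ", paste(x, collapse = "+"))),
                           family = VGAM::multinomial(), data = data)) %>%
Note that forecast::CV() only works with lm-type fits, so the ranking step would then need a criterion that vglm does support, such as AIC() or BIC().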
The function requires 4 installed packages. It simply configures the data and uses the map() function to iterate across all model combinations (i.e. no for loop). The forecast package then supplies the cross-validation function CV(), which provides the 5 metrics or selection criteria used to rank the predictor subsets.
Here is an application example lifted from the book "Forecasting: Principles and Practice". The example also uses data from the book, which is found in the fpp2 package.
library(fpp2)
# test the function
y <- "Consumption"
x.vars <- c("Income", "Production", "Unemployment", "Savings")
best.subset(y, x.vars, uschange)
The resulting table, which is sorted on the AICc metric, is shown below. The best subset minimizes the value of the metrics (CV, AIC, AICc, and BIC), maximizes adjusted R-squared and is found at the top of the list:
   Var1 Var2 Var3 Var4     CV    AIC   AICc    BIC   AdjR2
1     1    1    1    1 0.1163 -409.3 -408.8 -389.9 0.74859
2     1    0    1    1 0.1160 -408.1 -407.8 -391.9 0.74564
3     1    1    0    1 0.1179 -407.5 -407.1 -391.3 0.74478
4     1    0    0    1 0.1287 -388.7 -388.5 -375.8 0.71640
5     1    1    1    0 0.2777 -243.2 -242.8 -227.0 0.38554
6     1    0    1    0 0.2831 -237.9 -237.7 -225.0 0.36477
7     1    1    0    0 0.2886 -236.1 -235.9 -223.2 0.35862
8     0    1    1    1 0.2927 -234.4 -234.0 -218.2 0.35597
9     0    1    0    1 0.3002 -228.9 -228.7 -216.0 0.33350
10    0    1    1    0 0.3028 -226.3 -226.1 -213.4 0.32401
11    0    0    1    1 0.3058 -224.6 -224.4 -211.7 0.31775
12    0    1    0    0 0.3137 -219.6 -219.5 -209.9 0.29576
13    0    0    1    0 0.3138 -217.7 -217.5 -208.0 0.28838
14    1    0    0    0 0.3722 -185.4 -185.3 -175.7 0.15448
15    0    0    0    1 0.4138 -164.1 -164.0 -154.4 0.05246
Only 15 predictor combinations are profiled in the output since the model combination with all predictors off has been dropped. Looking at the table, the best subset is the one with all predictors on. However, the second row uses only 3 of 4 variables and the performance results are roughly the same. Also note that after row 4, the model results begin to degrade. That's because income and savings appear to be the key drivers of consumption. As these two variables are dropped from the predictors, model performance drops significantly.
The performance of the custom function is solid since the results presented here match those of the book referenced.
A good day to you.

R arrays of 3 dimensions with different inner size

I have an R list (docs) whose first 2 elements are as follows:
1. A. 1 2 5 6
B. 5 6 2
C. 7 8 1 2 3 5
2. A. 4 5 3
B. 1 2 3 5 4 7 8
What I want to achieve is another list with equal sizes but with zeros instead:
1. A. 0 0 0 0
B. 0 0 0
C. 0 0 0 0 0 0
2. A. 0 0 0
B. 0 0 0 0 0 0 0
I have tried:
sapply(docs, function(x) rep(0, length(x)))
but the behaviour is not the intended one because it considers the length of the outer list. Could you please help me?
It appears that you have a list of lists; that is, docs is a list containing the lists 1 and 2, which in turn contain numeric vectors. If this is the case, try the following:
# create test list
temp <- list("v1"=list("A"=1:4,"B"=5:7,"C"=1:8), "v2"=list("A"=1:3,"B"=5:10,"C"=3:8))
# get a list of zeros with the same dimension
answer <- lapply(temp, function(x) sapply(x, function(y) rep(0, length(y))))
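If the inner vectors can end up sharing a length (where sapply() might simplify the result to a matrix), or the nesting goes deeper, a recursive alternative is rapply() with how = "replace" (a sketch, not part of the original answer):
# replace every leaf vector with zeros of the same length, keeping the structure
answer2 <- rapply(temp, function(y) rep(0, length(y)), how = "replace")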
