Drop columns from a data frame but I keep getting this error below

No matter how I try to code this in R, I cannot drop my columns so that I can build my logistic regression model. I tried it two different ways:
cols<-c("EmployeeCount","Over18","StandardHours")
Trainingmodel1 <- DAT_690_Attrition_Proj1EmpAttrTrain[-cols,]
Error in -cols : invalid argument to unary operator
cols<-c("EmployeeCount","Over18","StandardHours")
Trainingmodel1 <- DAT_690_Attrition_Proj1EmpAttrTrain[!cols,]
Error in !cols : invalid argument type

The unary operators - and ! work on numeric or logical vectors, not on a character vector of column names, which is why both attempts fail. This may solve your problem:
Trainingmodel1 <- DAT_690_Attrition_Proj1EmpAttrTrain[ , !colnames(DAT_690_Attrition_Proj1EmpAttrTrain) %in% cols]
Please note that to drop columns, your code goes inside [ ] on the right side of the comma, not the left side.
So [, your_code], not [your_code, ].
Here is an example of dropping columns using the code above.
cols <- c("cyl", "hp", "wt")
mtcars[, !colnames(mtcars) %in% cols]
# mpg disp drat qsec vs am gear carb
# Mazda RX4 21.0 160.0 3.90 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 160.0 3.90 17.02 0 1 4 4
# Datsun 710 22.8 108.0 3.85 18.61 1 1 4 1
# Hornet 4 Drive 21.4 258.0 3.08 19.44 1 0 3 1
# Hornet Sportabout 18.7 360.0 3.15 17.02 0 0 3 2
# Valiant 18.1 225.0 2.76 20.22 1 0 3 1
#...
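For completeness, two other common ways to drop the same columns give the same result: base R's subset() and dplyr's select() (the latter assumes you have dplyr installed).
# base R: subset() with a negative select
subset(mtcars, select = -c(cyl, hp, wt))
# dplyr: drop columns named in a character vector
library(dplyr)
mtcars %>% select(-all_of(cols))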
Edit to Reproduce the Error
The error message you got indicates that some column contains only a single, identical value in all rows.
To show this, let's run a logistic regression on a subset of the mtcars data in which the cyl column contains only one value, and use that column as a predictor.
mtcars_cyl4 <- mtcars |> subset(cyl == 4)
mtcars_cyl4
# mpg cyl disp hp drat wt qsec vs am gear carb
# Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
# Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
# Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
# Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
# Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
# Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
# Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
# Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
# Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
# Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
# Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
glm(am ~ as.factor(cyl) + mpg + disp, data = mtcars_cyl4, family = "binomial")
#Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
# contrasts can be applied only to factors with 2 or more levels
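You can confirm the culprit directly: the factor has a single level.
nlevels(as.factor(mtcars_cyl4$cyl))
# [1] 1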
Now compare it with the same logistic regression on the full mtcars data, whose cyl column contains several distinct values.
glm(am ~ as.factor(cyl) + mpg + disp, data = mtcars, family = "binomial")
# Call: glm(formula = am ~ as.factor(cyl) + mpg + disp, family = "binomial",
# data = mtcars)
#
# Coefficients:
# (Intercept) as.factor(cyl)6 as.factor(cyl)8 mpg disp
# -5.08552 2.40868 6.41638 0.37957 -0.02864
#
# Degrees of Freedom: 31 Total (i.e. Null); 27 Residual
# Null Deviance: 43.23
# Residual Deviance: 25.28 AIC: 35.28
It is likely that, even though you have dropped the three columns whose rows all contain a single identical value, there is another column in Trainingmodel1 with the same problem. The identical values probably resulted from filtering the data frame and splitting the data into training and test groups. Better to check with summary(Trainingmodel1).
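If the data frame is wide, scanning the summary() output by eye is tedious. A small check like this sketch (assuming Trainingmodel1 is your training data frame) lists the constant columns directly:
# flag columns that contain a single identical value in every row
constant_cols <- names(Trainingmodel1)[sapply(Trainingmodel1, function(x) length(unique(x)) < 2)]
constant_cols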
Further edit
I have checked the summary(Trainingmodel1) result, and it is clear that EmployeeNumber has only one identical value (called a "level" for a factor) in all rows. To run your regression properly, either drop it from your model, or, if EmployeeNumber has other levels and you want to include it in your model, make sure the training data contains at least two of them. One way to achieve that during splitting is to repeat the random sampling until the randomly selected EmployeeNumber samples contain at least two levels, looping with for, while, or repeat. It is possible, but I cannot say how appropriate repeated sampling is for your study.
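As a rough illustration of that resampling idea (a sketch only; full_data and the 70/30 split are placeholders for your actual setup):
set.seed(42) # for reproducibility
repeat {
  idx   <- sample(nrow(full_data), size = floor(0.7 * nrow(full_data)))
  train <- full_data[idx, ]
  # stop once the training split contains at least two EmployeeNumber levels
  if (length(unique(train$EmployeeNumber)) >= 2) break
}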
As for your question about subsetting on more than one variable, you can use subset() with logical conditions. For example, to get the subset of mtcars that has cyl == 4 and mpg > 20:
mtcars |> subset(cyl == 4 & mpg > 20 )
If you want a subset that has cyl == 4 or mpg > 20:
mtcars |> subset(cyl == 4 | mpg > 20 )
You can also subset by using more columns as subset criteria:
mtcars |> subset((cyl > 4 & cyl <8) | (mpg > 20 & gear > 4 ))
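Note that subset() can also drop or keep columns in the same call through its select argument, which ties back to the original column-dropping question:
# keep rows with cyl == 4 and mpg > 20, and drop the cyl, hp, and wt columns
mtcars |> subset(cyl == 4 & mpg > 20, select = -c(cyl, hp, wt))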

Related

awk lookup table, blank column replacement

I'm trying to use a lookup table to do a search and replace for two specific columns and keep getting a blank column as output. I've followed the syntax for several examples of lookup tables that I've found on stack, but no joy. Here is a snippet from each of the files.
Sample lookup table -- want to search for instances of column 1 in my data file and replace them with the corresponding value in column 2 (first row is a header):
#xyz type
N 400
C13 401
13A 402
13B 402
13C 402
C14 405
The source file to be substituted has the following format:
1 N 0.293000 2.545000 16.605000 0 2 6 10 14
2 C13 0.197000 2.816000 15.141000 0 1
3 13A 1.173000 2.887000 14.676000 0
4 13B -0.319000 3.756000 14.937000 0
5 13C -0.351000 1.998000 14.678000 0
6 C14 0.749000 3.776000 17.277000 0 1
The corresponding values in column 2 of the lookup table will replace the values in column 6 of my source file (currently all zeroes). Here's the awk one-liner that I thought should work:
awk -v OFS='\t' 'NR==1 { next } FNR==NR { a[$1]=$2; next } $2 in a { $6=a[$1] }1' lookup.txt source.txt
But my output essentially deletes the entire entry for column 6:
1 N 0.293000 2.545000 16.605000 2 6 10 14
2 C13 0.197000 2.816000 15.141000 1
3 13A 1.173000 2.887000 14.676000
4 13B -0.319000 3.756000 14.937000
5 13C -0.351000 1.998000 14.678000
6 C14 0.749000 3.776000 17.277000 1
The sixth column should be 400 to 405. I considered using sed, but I have duplicate values in the source and output columns of my lookup table, so that won't work in this case. What's frustrating is that I had this one-liner working on almost the exact same source file the other week, but now I can only get this behavior. I'd love to be able to modify my awk call to do lookups of two different columns simultaneously, but wanted to start simple for now. Thanks!
You have $6=a[$1] instead of $6=a[$2] in your script.
$ awk -v OFS='\t' 'NR==FNR{map[$1]=$2; next} {$6=map[$2]} 1' file1 file2
1 N 0.293000 2.545000 16.605000 400 2 6 10 14
2 C13 0.197000 2.816000 15.141000 401 1
3 13A 1.173000 2.887000 14.676000 402
4 13B -0.319000 3.756000 14.937000 402
5 13C -0.351000 1.998000 14.678000 402
6 C14 0.749000 3.776000 17.277000 405 1
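One caveat about this fixed one-liner: map[$2] is empty for any $2 value missing from the lookup table, which would blank column 6 for unmapped rows, much like the symptom in the question. If that matters for your data, guarding the assignment preserves the original value (an untested sketch):
awk -v OFS='\t' 'NR==FNR { map[$1]=$2; next } $2 in map { $6=map[$2] } 1' lookup.txt source.txt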

Apply a custom (weighted) dictionary to text based on sentiment analysis

I am looking to adjust this code so that I can assign each of these modal verbs a different weight. The idea is to use something similar to the NRC lexicon, with the "numbers" 1-5 representing categories rather than plain numbers.
modals <- data_frame(word = c("must", "will", "shall", "should", "may", "can"),
                     modal = c("5", "4", "4", "3", "2", "1"))
My problem is that when I run the following code, five "may"s count the same as one "must". What I want is for each word to have a different weight, so that when I run this analysis I can see the concentration of uses of the stronger "must" versus, say, the much weaker "can" (with tidy.DF being my corpus, and School and Target being the column names).
MODAL <- tidy.DF %>%
  inner_join(modals) %>%
  count(School, Target, modal, index = wordnumber %/% 50, modal) %>%
  spread(modal, n, fill = 0)
ggplot(MODAL, aes(index, 5, fill = Target)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~Target, ncol = 2, scales = "free_x")
Here's a suggestion for a better approach, using the quanteda package instead. The approach:
1. Create a named vector of weights, corresponding to your "dictionary".
2. Create a document-feature matrix, selecting only the terms in the dictionary.
3. Weight the observed counts.
# set modal values as a named numeric vector
modals <- c(5, 4, 4, 3, 2, 1)
names(modals) <- c("must", "will", "shall", "should", "may", "can")
library("quanteda", warn.conflicts = FALSE)
## Package version: 1.4.0
## Parallel computing: 2 of 12 threads used.
## See https://quanteda.io for tutorials and examples.
I'll use the most recent inaugural speeches as a reproducible example here.
dfmat <- data_corpus_inaugural %>%
corpus_subset(Year > 2000) %>%
dfm() %>%
dfm_select(pattern = names(modals))
This produces the raw counts.
dfmat
## Document-feature matrix of: 5 documents, 6 features (26.7% sparse).
## 5 x 6 sparse Matrix of class "dfm"
## features
## docs will must can should may shall
## 2001-Bush 23 6 6 1 0 0
## 2005-Bush 22 6 7 1 3 0
## 2009-Obama 19 8 13 0 3 3
## 2013-Obama 20 17 7 0 4 0
## 2017-Trump 40 3 1 1 0 0
Weighting this now is as simple as calling dfm_weight() to reweight the counts by the values of your weight vector. The function will automatically apply the weights using fixed matching of the vector element names to the dfm features.
dfm_weight(dfmat, weight = modals)
## Document-feature matrix of: 5 documents, 6 features (26.7% sparse).
## 5 x 6 sparse Matrix of class "dfm"
## features
## docs will must can should may shall
## 2001-Bush 92 30 6 3 0 0
## 2005-Bush 88 30 7 3 6 0
## 2009-Obama 76 40 13 0 6 12
## 2013-Obama 80 85 7 0 8 0
## 2017-Trump 160 15 1 3 0 0
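If a single modal-strength score per speech is more convenient than the per-word matrix, one option (my addition, not part of the original answer) is to sum each row of the weighted dfm:
# collapse the weighted counts into one score per document (sketch)
dfmat_weighted <- dfm_weight(dfmat, weight = modals)
rowSums(dfmat_weighted)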

Combining data of two vectors in a time series in R

I am a research assistant and have collected eye-movement data, which I am now trying to analyze in R.
In the eye-tracker I use, every sample (line in my data file) is marked as belonging to a saccade (the eye is moving) or not, and as belonging to a blink or not. When someone starts to blink, the eye-tracker first identifies a saccade and only later identifies a blink. To be able to substitute all eye-movement samples that belong to a blink, I need to create a variable that marks every saccade containing a blink. A simple example is the following:
I have the data:
Data <- data.frame(Blink=c(0,0,0,1,1,0,0,0,1,1,0,0,0,0,0), Saccade=c(0,1,1,1,1,1,0,1,1,1,1,0,1,1,0))
I would like a variable like this as a result:
Data$Saccade_containing_blink <- c(0,1,1,1,1,1,0,1,1,1,1,0,0,0,0)
Which function would give me that result using R?
# example data
Data <- data.frame(Blink   = c(0,0,0,1,1,0,0,0,1,1,0,0,0,0,0),
                   Saccade = c(0,1,1,1,1,1,0,1,1,1,1,0,1,1,0))
library(dplyr)
Data %>%
  group_by(group = cumsum(Saccade == 0)) %>% # group your saccades
  mutate(Saccade_containing_blink = max(Blink), # if there's a blink, update all rows within that saccade
         Saccade_containing_blink = ifelse(Saccade == 0, 0, Saccade_containing_blink)) %>% # zero the 0s that separate saccades
  ungroup() %>% # ungroup the data
  select(-group) # remove the grouping column
# # A tibble: 15 x 3
# Blink Saccade Saccade_containing_blink
# <dbl> <dbl> <dbl>
# 1 0 0 0
# 2 0 1 1
# 3 0 1 1
# 4 1 1 1
# 5 1 1 1
# 6 0 1 1
# 7 0 0 0
# 8 0 1 1
# 9 1 1 1
# 10 1 1 1
# 11 0 1 1
# 12 0 0 0
# 13 0 1 0
# 14 0 1 0
# 15 0 0 0
The idea of this approach is to group the rows by saccade and check whether there's a Blink in at least one of the rows within each saccade. I assume that saccades are separated by a 0 in the Saccade column.
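For reference, the same idea works in base R without dplyr (a sketch using the Data object above): ave() computes the per-run maximum of Blink, and multiplying by the Saccade indicator zeroes out the separator rows.
grp <- cumsum(Data$Saccade == 0) # runs of Saccade, split at each 0
Data$Saccade_containing_blink <- ave(Data$Blink, grp, FUN = max) * (Data$Saccade != 0)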

Adding count to a vector while looping through dataframe

I am relatively new to R. I'm working on a project with a column of IDs (PMID), a column of MESH terms, which are summarized biomedical terms (MH), and a column for year, organized sequentially (EDAT_Year). My goal is to create a vector that holds, for each year, the count of rows whose MESH terms contain a particular word. Basically, if a row contains the word (its presence, not how many times it appears in the row), it should be counted, separated by year in the vector.
Here is an example. Suppose this is the dataframe:
PMID MH EDAT_Year
1 Male, Lung, Heart, Aneurysm 1978
2 Male, Male, Anemia, Lung 1978
3 Heart, Anemia, Adult 1980
4 Female, Heart, Blood, Acute 1980
5 Male, Blood, Adult, Lung 1980
6 Male, Kidney, Brain, Heart 1983
7 Male, Lung, Blood, Male 1983
Then, if I were to test "Male", I would want the output to be
2 1 2
to represent that there are 2 observations in 1978 that contain "Male", 1 in 1980, and 2 in 1983 (regardless of how many times it has appeared).
I am currently working with 3 years (1978, 1980, and 1983) but hope to expand to more. I was able to do this manually by creating, for each year, a column that contains only that year's MESH terms:
# count occurrences in the three years
disease_78 <- length(grep("\\Male\\>", total$MH_78))
disease_80 <- length(grep("\\Male\\>", total$MH_80))
disease_83 <- length(grep("\\Male\\>", total$MH_83))
But now I am trying to write a function so that if I were to enter a phrase, I would get all the occurrences in one vector, instead of manually having to copy and paste or having hundreds of columns for each year. This is what I have so far:
# function to count occurrences
count_fxn <- function(x)
{
  # read in the argument as a character string
  phrase_to_count <- deparse(substitute(x))
  # create a vector to store count values
  count_occur <- numeric(0)
  # a vector for how many years there are
  num_years <- seq(1, 3, 1)
  # loop through the entire data frame
  for (i in 1:length(total$PMID))
  {
    # loop through the three years
    for (j in 1:length(num_years))
    {
      # if at least one occurrence appears in the row's cell, increment the count
      if (length(grep(phrase_to_count, total$MH[i]) > 0))
      {
        count_occur[j] <- count_occur[j] + 1
      }
      # if the next row's year differs from the current one's, move to the
      # next spot in the vector for the next year
      if (total$EDAT_Year[i] != total$EDAT_Year[i+1])
      {
        j <- j + 1
      }
      # increment so we read the next line of data
      i <- i + 1
    }
  }
  return(count_occur)
}
# using the function
count_fxn(Male)
But this is the error I keep getting:
Error in if (total$EDAT_Year[i] != total$EDAT_Year[i + 1]) { :
missing value where TRUE/FALSE needed
When I change
if (total$EDAT_Year[i] != total$EDAT_Year[i + 1])
to
if (total$EDAT_Year[j] != total$EDAT_Year[j + 1])
I don't get any errors, but instead, the output is
NA NA NA
when it should be something like
3453 2343 5235
to represent how many observations contained "Male" in them, in the years 1978, 1980, and 1983 respectively.
Please advise. I'm not the strongest coder yet, and I've been working on this for 2 hours when I'm sure it could've been done in much less time.
You could use by().
with(df, lengths(by(MH, EDAT_Year, grep, pattern="Male")))
# EDAT_Year
# 1978 1980 1983
# 2 1 2
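Wrapped into the kind of function you were after (a sketch; the \< and \> word boundaries keep "Male" from matching longer terms that merely contain it):
count_fxn <- function(term, data) {
  pattern <- paste0("\\<", term, "\\>") # whole-word match
  with(data, lengths(by(MH, EDAT_Year, grep, pattern = pattern)))
}
count_fxn("Male", df) # gives 2 1 2 on the example data above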
If you want to calculate the number of occurrences of every "word" in MH for every year, without having to type out each word or create a list of words, you can do so as follows:
DF <- read.table(text="PMID MH EDAT_Year
1 Male,Lung,Heart,Aneurysm 1978
2 Male,Male,Anemia,Lung 1978
3 Heart,Anemia,Adult 1980
4 Female,Heart,Blood,Acute 1980
5 Male,Blood,Adult,Lung 1980
6 Male,Kidney,Brain,Heart 1983
7 Male,Lung,Blood,Male 1983", header=T)
DF <- DF %>%
  # convert the MH column to a nested list
  dplyr::mutate(MH = strsplit(as.character(MH), ",")) %>%
  # reshape the data into tidy format
  tidyr::unnest(MH) %>%
  # eliminate duplicates so PMIDs with multiple identical MH entries count once
  unique() %>%
  # count entries for each value in MH by year
  reshape2::dcast(EDAT_Year ~ MH)
DF
Results in:
EDAT_Year Acute Adult Anemia Aneurysm Blood Brain Female Heart Kidney Lung Male
1 1978 0 0 1 1 0 0 0 1 0 2 2
2 1980 1 2 1 0 2 0 1 2 0 1 1
3 1983 0 0 0 0 1 1 0 1 1 1 2
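If you prefer tidyr over reshape2 for the last step, the dcast() call can be replaced by count() plus pivot_wider() (a sketch; tidy_df stands for the data as it looks after the unnest() and unique() steps above):
tidy_df %>%
  dplyr::count(EDAT_Year, MH) %>%
  tidyr::pivot_wider(names_from = MH, values_from = n, values_fill = 0)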

Obtaining LRR and BAF values from affymetrix SNP array

I am trying to extract the LRR and BAF values from an Affymetrix SNP chip, so far without success, using Linux-based tools. I tried a small subset in a Windows-only program called Axiom™ CNV Summary Tools Software and it works perfectly. The problem is that I have a huge dataset, and it would be impossible to run it on a Windows machine powerful enough.
Let me lay out my steps up to this point. First, I obtained the five tab-delimited files which the Linux and/or Windows pipelines require (1-3 obtained with the Affymetrix APT software).
1 - The Axiom calls.txt or genotype file:
calls <- 'probeset_id sample_1 sample_2 sample_3
AX-100010998 2 2 2
AX-100010999 1 0 1
AX-100011005 0 1 2
AX-100011007 2 2 1
AX-100011008 1 1 2
AX-100011010 2 2 2
AX-100011011 0 1 0
AX-100011012 0 1 0
AX-100011016 0 0 1
AX-100011017 0 0 2'
calls <- read.table(text=calls, header=T)
2 - The confidences.txt file:
conf<- 'probeset_id sample_1 sample_2 sample_3
AX-100010998 0.00001 0.0002 0.00006
AX-100010999 0.00001 0.00001 0.00001
AX-100011005 0.00007 0.00017 0.00052
AX-100011007 0.00001 0.00001 0.00001
AX-100011008 0.001 0.00152 0.00001
AX-100011010 0.00001 0.00001 0.00002
AX-100011011 0.00004 0.00307 0.00002
AX-100011012 0.00001 0.00001 0.00001
AX-100011016 0.00003 0.00001 0.00001
AX-100011017 0.00003 0.01938 0.00032'
conf <- read.table(text=conf, header=T)
3 - The summary.txt file:
summ <- 'probeset_id sample_1 sample_2 sample_3
AX-100010998-A 740.33229 655.41465 811.98053
AX-100010998-B 1139.25679 1659.55079 917.7128
AX-100010999-A 1285.67306 1739.03296 1083.48455
AX-100010999-B 1403.51265 341.85893 1237.48577
AX-100011005-A 1650.03408 1274.57594 485.5324
AX-100011005-B 430.3122 2674.70182 4070.90727
AX-100011007-A 411.28952 449.76345 2060.7136
AX-100011007-B 4506.77692 4107.12982 2065.58516
AX-100011008-A 427.78263 439.63541 333.86312
AX-100011008-B 1033.41335 1075.31617 1623.69271
AX-100011010-A 390.12996 350.54456 356.63156
AX-100011010-B 1183.29912 1256.01391 1650.82396
AX-100011011-A 3593.93578 2902.34079 2776.2503
AX-100011011-B 867.33447 2252.54552 961.31596
AX-100011012-A 2250.44699 1192.46116 1927.70581
AX-100011012-B 740.31957 1721.70283 662.1414
AX-100011016-A 1287.9221 1367.95468 1037.98191
AX-100011016-B 554.8795 666.93132 1487.2143
AX-100011017-A 2002.40468 1787.42982 490.28802
AX-100011017-B 849.92775 1025.44417 1429.96567'
summ <- read.table(text=summ, header=T)
4 - The gender.txt:
gender <- 'cel_files gender
sample_1 female
sample_2 female
sample_3 female'
gender <- read.table(text=gender, header=F)
And finally the map file, map.db on Windows (not human-readable) or map.txt on Linux, as follows:
map <- 'Name Chr Position
AX-100010998 Z 70667736
AX-100010999 4 36427048
AX-100011005 26 4016045
AX-100011007 6 25439800
AX-100011008 2 147800617
AX-100011010 1 98919397
AX-100011011 Z 66652642
AX-100011012 7 28180218
AX-100011016 1A 33254907
AX-100011017 5 1918020'
map <- read.table(text=map, header=T)
This is my Windows-based result for sample_1:
Name Chr Position sample_1.GType sample_1.Log R Ratio sample_1.B Allele Freq
AX-100010998 Z 70667736 BB Infinity 0.675637419295063
AX-100010999 4 36427048 AB 0.101639462657534 0.531373516807123
AX-100011005 26 4016045 AA -0.111910305454305 0
AX-100011007 6 25439800 BB 0.148781943283483 1
AX-100011008 2 147800617 AB -0.293273363654622 0.609503132331127
AX-100011010 1 98919397 BB -0.283993308525307 0.960031843823016
AX-100011011 Z 66652642 AA Infinity 0.00579049667757003
AX-100011012 7 28180218 AA 0.0245684274744242 0.032174599843476
AX-100011016 1A 33254907 AA -0.265925457515035 0
AX-100011017 5 1918020 AA -0.0091211520536838 0
The values from the Windows-based tool seem to be correct, but that is not the case for the Linux output. I am following the steps described for the PennCNV software (http://penncnv.openbioinformatics.org/en/latest/user-guide/input/): I log2-transformed my summary.txt and did the quantile normalization with the limma package using normalizeBetweenArrays(x), finishing with corrsummary.txt:
corrsum <- 'probeset_id sample_1 sample_2 sample_3
AX-100010998-A 9.804932 9.285738 9.530882
AX-100010998-B 10.249239 10.528922 9.804932
AX-100010999-A 10.528922 10.641862 10.134816
AX-100010999-B 10.641862 8.472829 10.249239
AX-100011005-A 10.804446 10.249239 8.816931
AX-100011005-B 8.835381 11.186266 12.045852
AX-100011007-A 8.542343 8.835381 11.039756
AX-100011007-B 12.045852 12.045852 11.186266
AX-100011008-A 8.816931 8.816931 8.472829
AX-100011008-B 10.134816 9.910173 10.592867
AX-100011010-A 8.472829 8.542343 8.542343
AX-100011010-B 10.374032 10.134816 10.641862
AX-100011011-A 11.593784 11.593784 11.593784
AX-100011011-B 10.012055 11.039756 9.910173
AX-100011012-A 11.186266 10.012055 10.804446
AX-100011012-B 9.530882 10.592867 9.285738
AX-100011016-A 10.592867 10.374032 10.012055
AX-100011016-B 9.285738 9.530882 10.528922
AX-100011017-A 11.039756 10.804446 8.835381
AX-100011017-B 9.910173 9.804932 10.374032'
corrsum <- read.table(text=corrsum, header=T)
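A sketch of that transformation, for reproducibility (assuming a plain log2 followed by limma's quantile normalization, applied to the summ object from above):
library(limma)
m <- log2(as.matrix(summ[, -1])) # log2-transform the intensities
m_norm <- normalizeBetweenArrays(m, method = "quantile") # quantile-normalize across samples
corrsum <- data.frame(probeset_id = summ$probeset_id, m_norm)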
Thus I applied:
./generate_affy_geno_cluster.pl calls.txt confidences.txt corrsummary.txt --locfile map.txt --sexfile gender.txt --output gencluster
and
./normalize_affy_geno_cluster.pl --locfile map.txt gencluster calls.txt --output lrrbaf.txt
And my Linux-based result (lrrbaf.txt), which should contain the LRR and BAF information, looks like this:
output <- 'Name Chr Position sample_1.LogRRatio sample_1.BAlleleFreq sample_2.LogRRatio sample_2.BAlleleFreq sample_3.LogRRatio sample_3.BAlleleFreq
AX-100010999 4 36427048 -1952.0739 2 -1953.0739 2 -1952.0739 2
AX-100011005 26 4016045 -2245.1784 2 -2244.1784 2 -2243.1784 2
AX-100011007 6 25439800 -4433.4661 2 -4433.4661 2 -4434.4661 2
AX-100011008 2 147800617 -1493.2287 2 -1493.2287 2 -1492.2287 2
AX-100011011 Z 66652642 -4088.2311 2 -4087.2311 2 -4088.2311 2
AX-100011012 7 28180218 -2741.2623 2 -2740.2623 2 -2741.2623 2
AX-100011016 1A 33254907 -2117.7005 2 -2117.7005 2 -2116.7005 2
AX-100011017 5 1918020 -3067.4077 2 -3067.4077 2 -3065.4077 2'
output <- read.table(text=output, header=T)
As shown above, the Linux result is completely different from the Windows-based result (and makes much less sense); in addition, it does not contain the GType column. Sorry for composing such a long question, but my intention was to make it as reproducible as possible. I would be grateful for any light on this problem, as well as any important remarks about this kind of data that I may have forgotten.
