Row-wise operation with sf

I am trying to obtain a list of nearby counties in another column. I am using sf with county boundaries from the USAboundaries package. I wrote the following code, but I feel there must be a computationally more efficient way. Any ideas?
Thank you in advance.
library(USAboundaries)
library(sf)
library(dplyr)
library(stringr)
county <- us_counties() %>%
  filter(str_detect(statefp, "02|72|78|15|66") == FALSE) %>%
  mutate(geoid = as.integer(geoid))
county$neighbor <- NA_character_
for (i in 1:nrow(county)) {
  nb <- unlist(st_is_within_distance(county[i, ], county, 70000, sparse = TRUE))
  county$neighbor[i] <- paste(county$geoid[nb], collapse = "|")
}

A more efficient approach may be:
- buffer the county layer by the neighboring threshold you choose (e.g. 70,000 m),
- use st_intersects on the "buffered" layer against the original layer to find which counties each "buffered" county intersects, and
- paste each list of IDs into a single character column.
Here is a reproducible example:
library(USAboundaries)
library(sf)
library(dplyr)
library(stringr)
county <- us_counties() %>%
  filter(str_detect(statefp, "02|72|78|15|66") == FALSE) %>%
  mutate(geoid = as.integer(geoid))
county = st_transform(county, 2163)             # Transform to EPSG:2163 (a planar CRS in metres)
x = st_buffer(county, 70000)                    # Buffer each county by 70,000 m
int = st_intersects(x, county)                  # Find which counties each buffer intersects
int = lapply(int, function(i) county$geoid[i])  # Translate intersection indices to IDs (i.e., geoid)
int = sapply(int, paste, collapse = "|")        # Paste each ID vector into a single string
county$neighbors = int                          # Bind to the county layer
The result is printed below. The new neighbors column, at the end, contains the neighbor geoids:
head(county)
## Simple feature collection with 6 features and 13 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 95466.62 ymin: -1693172 xmax: 1475761 ymax: 24168.64
## epsg (SRID): 2163
## proj4string: +proj=laea +lat_0=45 +lon_0=-100 +x_0=0 +y_0=0 +a=6370997 +b=6370997 +units=m +no_defs
## statefp countyfp countyns affgeoid geoid name lsad
## 1 39 131 01074078 0500000US39131 39131 Pike 06
## 2 46 003 01266983 0500000US46003 46003 Aurora 06
## 3 55 035 01581077 0500000US55035 55035 Eau Claire 06
## 4 48 259 01383915 0500000US48259 48259 Kendall 06
## 5 40 015 01101795 0500000US40015 40015 Caddo 06
## 6 19 093 00465235 0500000US19093 19093 Ida 06
## aland awater state_name state_abbr jurisdiction_type
## 1 1140324458 9567612 Ohio OH state
## 2 1834813753 11201379 South Dakota SD state
## 3 1652211310 18848512 Wisconsin WI state
## 4 1715747531 1496797 Texas TX state
## 5 3310745124 30820525 Oklahoma OK state
## 6 1117599859 1406461 Iowa IA state
## geometry
## 1 MULTIPOLYGON (((1424415 -50...
## 2 MULTIPOLYGON (((95466.62 -1...
## 3 MULTIPOLYGON (((656691.6 17...
## 4 MULTIPOLYGON (((104718.2 -1...
## 5 MULTIPOLYGON (((124977.8 -1...
## 6 MULTIPOLYGON (((348649.9 -2...
## neighbors
## 1 39131|39047|39027|39097|21043|39129|39045|54053|39023|39049|54011|39165|21135|21069|39025|39057|39079|39087|39145|21019|39141|21205|39073|21161|39015|21023|54099|39001|39009|39163|21089|39127|39071|39053|39105
## 2 46003|46043|46067|46077|46015|31015|46061|46087|46073|31103|31089|46135|46009|46035|46005|46111|46059|46069|46053|46085|46097|31107|46023|46065|46017|46123
## 3 55035|55107|55141|55017|55005|27169|27109|55119|55081|55121|55033|55011|55053|55019|55093|55099|55057|55063|55095|55091|27157|55073|27049|55109
## 4 48259|48163|48171|48385|48463|48029|48055|48453|48265|48187|48325|48493|48013|48019|48319|48209|48267|48299|48053|48031|48091
## 5 40015|40149|40011|40109|40049|40141|40039|40073|40043|40031|40067|40009|40083|40051|40087|40017|40027|40093|40033|40075|40137|40065|40129|40055|40019
## 6 19093|19147|19165|19009|19193|19141|19077|19027|19041|31043|19133|19035|19021|19167|19151|31173|19085|19149|19047|19025|31177|31021|46127|19073|19161
Just to make sure the result makes sense, the following code also calculates neighbor count:
county = st_transform(county, 2163)
x = st_buffer(county, 70000)
int = st_intersects(x, county)
county$n = sapply(int, length)
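As a further sanity check (a sketch, not run above): buffering by a distance d and intersecting is equivalent to testing st_is_within_distance with dist = d on projected coordinates, so the two approaches should agree. Spot-checking a few counties:
idx <- sample(nrow(county), 5)                            # a few random counties
wd <- st_is_within_distance(county[idx, ], county, dist = 70000)
all(mapply(setequal, wd, int[idx]))                       # should be TRUE if the methods agree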
And here is a plot of neighbor count:
plot(county[, "n"])


Drop columns from a data frame but I keep getting this error below

No matter how I try to code this in R, I still cannot drop my columns so that I can build my logistic regression model. I tried running it two different ways:
cols<-c("EmployeeCount","Over18","StandardHours")
Trainingmodel1 <- DAT_690_Attrition_Proj1EmpAttrTrain[-cols,]
Error in -cols : invalid argument to unary operator
cols<-c("EmployeeCount","Over18","StandardHours")
Trainingmodel1 <- DAT_690_Attrition_Proj1EmpAttrTrain[!cols,]
Error in !cols : invalid argument type
This may solve your problem:
Trainingmodel1 <- DAT_690_Attrition_Proj1EmpAttrTrain[ , !colnames(DAT_690_Attrition_Proj1EmpAttrTrain) %in% cols]
Please note that to drop columns, your selection code should go inside [ on the right side of the comma (the column index), not on the left side (the row index).
So [, your_code], not [your_code, ].
Here is an example of dropping columns using the code above.
cols <- c("cyl", "hp", "wt")
mtcars[, !colnames(mtcars) %in% cols]
# mpg disp drat qsec vs am gear carb
# Mazda RX4 21.0 160.0 3.90 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 160.0 3.90 17.02 0 1 4 4
# Datsun 710 22.8 108.0 3.85 18.61 1 1 4 1
# Hornet 4 Drive 21.4 258.0 3.08 19.44 1 0 3 1
# Hornet Sportabout 18.7 360.0 3.15 17.02 0 0 3 2
# Valiant 18.1 225.0 2.76 20.22 1 0 3 1
#...
Edit to Reproduce the Error
The error message you got indicates that some column contains only one identical value in all rows.
To reproduce this, let's run a logistic regression on a subset of the mtcars data in which the cyl column holds a single value, using that column as a predictor.
mtcars_cyl4 <- mtcars |> subset(cyl == 4)
mtcars_cyl4
# mpg cyl disp hp drat wt qsec vs am gear carb
# Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
# Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
# Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
# Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
# Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
# Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
# Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
# Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
# Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
# Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
# Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
glm(am ~ as.factor(cyl) + mpg + disp, data = mtcars_cyl4, family = "binomial")
#Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
# contrasts can be applied only to factors with 2 or more levels
Now compare it with the same logistic regression on the full mtcars data, whose cyl column takes several values.
glm(am ~ as.factor(cyl) + mpg + disp, data = mtcars, family = "binomial")
# Call: glm(formula = am ~ as.factor(cyl) + mpg + disp, family = "binomial",
# data = mtcars)
#
# Coefficients:
# (Intercept) as.factor(cyl)6 as.factor(cyl)8 mpg disp
# -5.08552 2.40868 6.41638 0.37957 -0.02864
#
# Degrees of Freedom: 31 Total (i.e. Null); 27 Residual
# Null Deviance: 43.23
# Residual Deviance: 25.28 AIC: 35.28
It is likely that, even though you have dropped the three columns whose values are identical in all rows, there is another column in Trainingmodel1 with a single identical value. That constant column probably arose while filtering the data frame and splitting the data into training and test groups. It is better to check with summary(Trainingmodel1).
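As a quick programmatic check (a sketch), the following lists every column of Trainingmodel1 with at most one distinct value; these are the likely culprits behind the contrasts error:
names(Trainingmodel1)[sapply(Trainingmodel1, function(x) length(unique(x)) <= 1)]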
Further edit
I have checked the summary(Trainingmodel1) result, and it is now clear that EmployeeNumber has one identical value (called a "level" for a factor) in all rows. To run your regression properly, either drop it from your model or, if EmployeeNumber has other levels that you want to include, make sure the training data contains at least two of them. One way to achieve that is to repeat the random sampling during splitting until the sampled EmployeeNumber values contain at least two levels, looping with for, while, or repeat (see the sketch below). It is possible, but I am not sure how appropriate such repeated sampling is for your study.
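Here is a minimal sketch of that resampling idea; full_data and the 70/30 split are placeholders for your own data frame and split ratio:
set.seed(1)  # for reproducibility
repeat {
  train_idx <- sample(nrow(full_data), size = round(0.7 * nrow(full_data)))
  Trainingmodel1 <- full_data[train_idx, ]
  # keep resampling until the training set contains at least two EmployeeNumber levels
  if (length(unique(Trainingmodel1$EmployeeNumber)) >= 2) break
}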
As for your question about subsetting on more than one variable, you can use subset with logical conditions. For example, to get the subset of mtcars with cyl == 4 and mpg > 20:
mtcars |> subset(cyl == 4 & mpg > 20)
If you want a subset with cyl == 4 or mpg > 20:
mtcars |> subset(cyl == 4 | mpg > 20)
You can also subset using more columns as criteria:
mtcars |> subset((cyl > 4 & cyl < 8) | (mpg > 20 & gear > 4))

Apply a custom (weighted) dictionary to text based on sentiment analysis

I am looking to adjust this code so that I can assign each of these modal verbs a different weight. The idea is to use something similar to the NRC lexicon, where the "numbers" 1-5 represent categories rather than plain counts.
modals <- data_frame(word = c("must", "will", "shall", "should", "may", "can"),
                     modal = c("5", "4", "4", "3", "2", "1"))
My problem is that when I run the following code, five "may"s count the same as one "must". What I want is for each word to carry a different weight, so that the analysis shows the concentration of uses of the stronger "must" versus, say, the much weaker "can". (Here tidy.DF is my corpus, and School and Target are column names.)
MODAL <- tidy.DF %>%
  inner_join(modals) %>%
  count(School, Target, modal, index = wordnumber %/% 50) %>%
  spread(modal, n, fill = 0)
ggplot(MODAL, aes(index, 5, fill = Target)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~Target, ncol = 2, scales = "free_x")
Here's a suggestion for a better approach, using the quanteda package instead. The approach:
- Create a named vector of weights, corresponding to your "dictionary".
- Create a document-feature matrix, selecting only the terms in the dictionary.
- Weight the observed counts.
# set modal values as a named numeric vector
modals <- c(5, 4, 4, 3, 2, 1)
names(modals) <- c("must", "will", "shall", "should", "may", "can")
library("quanteda", warn.conflicts = FALSE)
## Package version: 1.4.0
## Parallel computing: 2 of 12 threads used.
## See https://quanteda.io for tutorials and examples.
I'll use the most recent inaugural speeches as a reproducible example here.
dfmat <- data_corpus_inaugural %>%
  corpus_subset(Year > 2000) %>%
  dfm() %>%
  dfm_select(pattern = names(modals))
This produces the raw counts.
dfmat
## Document-feature matrix of: 5 documents, 6 features (26.7% sparse).
## 5 x 6 sparse Matrix of class "dfm"
## features
## docs will must can should may shall
## 2001-Bush 23 6 6 1 0 0
## 2005-Bush 22 6 7 1 3 0
## 2009-Obama 19 8 13 0 3 3
## 2013-Obama 20 17 7 0 4 0
## 2017-Trump 40 3 1 1 0 0
Weighting this now is as simple as calling dfm_weight() to reweight the counts by the values of your weight vector. The function will automatically apply the weights using fixed matching of the vector element names to the dfm features.
dfm_weight(dfmat, weight = modals)
## Document-feature matrix of: 5 documents, 6 features (26.7% sparse).
## 5 x 6 sparse Matrix of class "dfm"
## features
## docs will must can should may shall
## 2001-Bush 92 30 6 3 0 0
## 2005-Bush 88 30 7 3 6 0
## 2009-Obama 76 40 13 0 6 12
## 2013-Obama 80 85 7 0 8 0
## 2017-Trump 160 15 1 3 0 0
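If you then want a single "modal strength" score per speech, one option (a sketch; dfm objects support Matrix-style row sums) is to sum the weighted counts across features:
dfmat_w <- dfm_weight(dfmat, weight = modals)
rowSums(dfmat_w)  # total weighted modal score per document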

Fastest way to fill a 3D array from another 3D arrray in R

I am using the code below to fill a 3D array from another 3D array. I have used the sapply function to apply the code to each individual (the 3rd dimension), as in Efficient way to fill a 3D array.
Here is my code.
library(tictoc)  # for tic()/toc() timing
ind <- 1000
individuals <- as.character(seq(1, ind, by = 1))
maxCol <- 7
col <- 4
line <- 0
a <- 0
b <- 0
c <- 0
col_array <- c("year", "time", "ID", "age",
               as.vector(outer(c(paste(seq(0, 1, by = 1), "year", sep = "_"),
                                 paste(seq(2, maxCol, by = 1), "years", sep = "_")),
                               c("S_F", "I_F", "R_F"), paste, sep = "_")))
array1 <- array(sample(1:100, length(col_array), replace = T),
                dim = c(2, length(col_array), ind),
                dimnames = list(NULL, col_array, individuals)) ## 3rd dimension = individual ID
## print(array1)
col_array <- c("year", "time", "ID", "age",
               as.vector(outer(c(paste(seq(0, 1, by = 1), "year", sep = "_"),
                                 paste(seq(2, maxCol, by = 1), "years", sep = "_")),
                               c("S_M", "I_M", "R_M"), paste, sep = "_")))
array2 <- array(NA, dim = c(2, length(col_array), ind),
                dimnames = list(NULL, col_array, individuals)) ## 3rd dimension = individual ID
## print(array2)
tic("array2")
array2 <- sapply(individuals, function(i) {
  ## Fill the first columns
  array2[line + 1, c("year", "time", "ID", "age"), i] <- c(a, b, i, c)
  ## Define column indexes for individuals S
  col_start_S_F <- which(colnames(array1[, , i]) == "0_year_S_F")
  col_end_S_F   <- which(colnames(array1[, , i]) == paste(maxCol, "years_S_F", sep = "_"))
  col_start_S_M <- which(colnames(array2[, , i]) == "0_year_S_M")
  col_end_S_M   <- which(colnames(array2[, , i]) == paste(maxCol, "years_S_M", sep = "_"))
  ## Fill the columns for individuals S
  p_S_M <- sapply(0:maxCol, function(x) pnorm(x, 4, 1))
  array2[line + 1, col_start_S_M:col_end_S_M, i] <- round(as.numeric(as.vector(array1[line + 1, col_start_S_F:col_end_S_F, i])) * p_S_M)
  ## Define column indexes for individuals I
  col_start_I_F <- which(colnames(array1[, , i]) == "0_year_I_F")
  col_end_I_F   <- which(colnames(array1[, , i]) == paste(maxCol, "years_I_F", sep = "_"))
  col_start_I_M <- which(colnames(array2[, , i]) == "0_year_I_M")
  col_end_I_M   <- which(colnames(array2[, , i]) == paste(maxCol, "years_I_M", sep = "_"))
  ## Fill the columns for individuals I
  p_I_M <- sapply(0:maxCol, function(x) pnorm(x, 2, 1))
  array2[line + 1, col_start_I_M:col_end_I_M, i] <- round(as.numeric(as.vector(array1[line + 1, col_start_I_F:col_end_I_F, i])) * p_I_M)
  ## Define column indexes for individuals R
  col_start_R_M <- which(colnames(array2[, , i]) == "0_year_R_M")
  col_end_R_M   <- which(colnames(array2[, , i]) == paste(maxCol, "years_R_M", sep = "_"))
  ## Fill the columns for individuals R
  array2[line + 1, col_start_R_M:col_end_R_M, i] <- as.numeric(as.vector(array2[line + 1, col_start_S_M:col_end_S_M, i])) +
    as.numeric(as.vector(array2[line + 1, col_start_I_M:col_end_I_M, i]))
  return(array2[, , i])
}, simplify = "array")
## print(array2)
toc()
Is there a way to increase the performance/speed of my code (i.e., < 1 sec)? There are 500,000 observations in the 3rd dimension. Any suggestions?
TL;DR: Here's a tidyverse solution that transforms the sample array into a dataframe and applies the requested changes. EDIT: I've added steps 1+2 to transform the original post's sample data into the format I used in step 3. The actual calculation in Step 3 is very fast (<0.1 sec), but the bottleneck is step 2, which takes 10 seconds for 500k rows.
Step 0: Create sample data for 500k individuals
ind <- 500000
individuals <- as.character(seq(1, ind, by = 1))
maxCol <- 7
col <- 4
line <- 0
a <- 0
b <- 0
c <- 0
col_array <- c("year","time", "ID", "age", as.vector(outer(c(paste(seq(0, 1, by = 1), "year", sep="_"), paste(seq(2, maxCol, by = 1), "years", sep="_")), c("S_F", "I_F", "R_F"), paste, sep="_")))
array1 <- array(sample(1:100, length(col_array), replace = T), dim=c(2, length(col_array), ind), dimnames=list(NULL, col_array, individuals)) ## 3rd dimension = individual ID
dim(array1)
# [1] 2 28 500000 # Two rows x 28 measures x 500k individuals
Step 1: Subset array and convert to data frame.
library(tidyverse)
# OP only uses first line of array1. If other rows needed, replace with "array1 %>%"
# and adjust renaming below to account for different Var1.
array1_dt <- array1[1, , ] %>%
  as.data.frame.table(stringsAsFactors = FALSE)
Step 2: Break out the stats into separate columns, with one row per individual-year. This is the slowest step (especially the spread line): it takes 0.05 sec for 1,000 individuals but 10 seconds for 500k. I expect a data.table solution could make it much faster if needed (a sketch of that idea follows the code below).
array1_dt_reshape <- array1_dt %>%
  rename(stat = Var1, ID = Var2) %>%
  filter(!stat %in% c("year", "time", "ID", "age")) %>%
  mutate(year = stat %>% str_sub(end = 1),
         col = stat %>% str_sub(start = -3)) %>%
  select(-stat) %>%
  spread(col, Freq) %>%
  arrange(ID)
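Here is a hedged sketch of the data.table idea mentioned above (untested; it assumes the same array1_dt input as Step 1). dcast() usually reshapes far faster than spread():
library(data.table)
dt <- as.data.table(array1_dt)
setnames(dt, c("Var1", "Var2", "Freq"), c("stat", "ID", "value"))
dt <- dt[!stat %in% c("year", "time", "ID", "age")]
dt[, `:=`(year = substr(stat, 1, 1),
          col  = substr(stat, nchar(stat) - 2, nchar(stat)))]
array1_dt_reshape_dt <- dcast(dt, ID + year ~ col, value.var = "value")
setorder(array1_dt_reshape_dt, ID)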
Step 3: Apply the requested transformation. This function calculates the distribution function with two sets of parameters and uses the results to scale the input table's columns. It takes 0.03 sec for 500k individuals.
array_transform <- function(input_data = array1_dt_reshape,
                            max_yr = 7, S_M_mean = 4, I_M_mean = 2) {
  tictoc::tic()
  # First calculate the distribution function values to apply to all individuals,
  # depending on year.
  p_S_M_vals <- sapply(0:max_yr, function(x) pnorm(x, S_M_mean, 1))
  p_I_M_vals <- sapply(0:max_yr, function(x) pnorm(x, I_M_mean, 1))
  # For each year, scale S_M + I_M by the respective distribution functions.
  # This solution relies on the fact that each ID has 8 rows every time,
  # so we can recycle the 8 values in the distribution functions.
  output <- input_data %>%
    # group_by(ID) %>% <-- Not needed
    mutate(S_M = S_F * p_S_M_vals,
           I_M = I_F * p_I_M_vals,
           R_M = S_M + I_M) # %>% ungroup <-- Not needed
  tictoc::toc()
  return(output)
}
array1_output <- array_transform(array1_dt_reshape)
Results
head(array1_output)
ID year I_F R_F S_F S_M I_M R_M
1 1 0 16 76 23 7.284386e-04 0.3640021 0.3647305
2 1 1 46 96 80 1.079918e-01 7.2981417 7.4061335
3 1 2 27 57 76 1.729010e+00 13.5000000 15.2290100
4 1 3 42 64 96 1.523090e+01 35.3364793 50.5673837
5 1 4 74 44 57 2.850000e+01 72.3164902 100.8164902
6 1 5 89 90 64 5.384606e+01 88.8798591 142.7259228
7 1 6 23 16 44 4.299899e+01 22.9992716 65.9982658
8 1 7 80 46 90 8.987851e+01 79.9999771 169.8784862
9 2 0 16 76 23 7.284386e-04 0.3640021 0.3647305
10 2 1 46 96 80 1.079918e-01 7.2981417 7.406133

Assigning a tuple from a dictionary to an array Python

I have a pivot table array with factors and X and Y coordinates, such as the one below, and I have a lookup table with 64 colours that have RGB values. I have assigned a colour to each factor combination using a dictionary of tuples, but I am having a hard time figuring out how to compare the keys of my dictionary (the different combinations of factors) to my array, so that each row with that factor combination can be assigned the colour given in the dictionary.
This is an example of the Pivot Table:
A B C D Xpoint Ypoint
0 1 0 0 20 20
0 1 1 0 30 30
0 1 0 0 40 40
1 0 1 0 50 50
1 0 1 0 60 60
EDIT: This is an example of the LUT:
R G B
0 0 0
1 0 103
0 21 68
95 173 58
and this is an example of the dictionary that was made:
{
(0, 1, 0, 0): (1, 0, 103),
(0, 1, 1, 0): (12, 76, 161),
(1, 0, 1, 0): (0, 0, 0)
}
This is the code that I have used:
import numpy as np
from PIL import Image, ImageDraw
## load in LUT of 64 colours ##
LUT = np.loadtxt('LUT64.csv', skiprows=1, delimiter=',')
print LUT
## load in XY COordinates ##
PivotTable = np.loadtxt('PivotTable_2017-07-13_001.txt', skiprows=1, delimiter='\t')
print PivotTable
## Bring in image ##
IM = Image.open("mothTest.tif")
#bring in number of factors
numFactors = 4
#assign colour vectors to factor combos
iterColours = iter(LUT)
colour_dict = dict() # size will tell you how many colours will be used
for entry in PivotTable:
key = tuple(entry[0:numBiomarkers])
if key not in colour_dict:
colour_dict[key] = next(iterColours)
print(colour_dict)
Is there a way to compare the tuples in this dictionary to the rows in the pivot table array, or maybe there is a better way of doing this? Any help would be greatly appreciated!
If your goal is, as I supposed in my comment above, to trace the colours back to each factor tuple, then you have already done everything you need. I do not see, however, what role the tif file plays... Please note I corrected the reference to the non-existent numBiomarkers variable.
import numpy as np
from PIL import Image, ImageDraw
## load in LUT of 64 colours ##
LUT = np.loadtxt('LUT64.csv', skiprows=1, delimiter=',')
print LUT
## load in XY COordinates ##
PivotTable = np.loadtxt('PivotTable_2017-07-13_001.txt', skiprows=1, delimiter=',')
print PivotTable
## Bring in image ##
IM = Image.open("Lenna.tif")
#bring in number of factors
numFactors = 4
#assign colour vectors to factor combos
iterColours = iter(LUT)
colour_dict = dict() # size will tell you how many colours will be used
for entry in PivotTable:
key = tuple(entry[0:numFactors])
if key not in colour_dict:
colour_dict[key] = next(iterColours)
print(colour_dict)
print '===='
for entry in PivotTable:
key = tuple(entry[0:numFactors])
print str(entry) + ' ' + str(colour_dict[key])
Can you please add a short example of LUT64.csv and of PivotTable_2017-07-13_001.txt? Maybe for the latter you should also use a different delimiter than \t, to ensure the portability of your examples.
Regards

How can I access components of list elements in R

I have a ragged list that I would like to work with. i.e. I would like to use an apply function to quickly and simply pull out elements from the lists. The following code attempts to approximate my situation:
vec1 <- c("B","D","E","NA")
vec2 <- c("B","D","E","NA")
vec3 <- c("B","C","E","NA")
write.table(vec1, file="./vec1.csv", sep=",", quote=F)
write.table(vec2, file="./vec2.csv", sep=",", quote=F)
write.table(vec3, file="./vec3.csv", sep=",", quote=F)
vectors.files <- list.files(path = getwd(), recursive = FALSE, pattern = "\\.csv$")
vectors.list <- lapply(vectors.files, read.csv)
How would I then be able to create a new object that is, for example, the second row of each list element in vectors.list?
Thanks,
Matt
It's not really clear what you're after as the final output format, but you might want to try variations on the following template:
lapply(vectors.list, function(x) x[2, , drop = FALSE])
# [[1]]
# x
# 2 D
#
# [[2]]
# x
# 2 D
#
# [[3]]
# x
# 2 C
Here, we've just passed an anonymous function (function(x)) to the items in your "vectors.list". In this case, we've used basic subsetting using [ to extract the second row. The drop = FALSE is to retain the data.frame structure since the result is a single-column data.frame (which normally simplifies to a vector).
Note that the data.frames in the resulting list still have all the original levels for the "x" factor. Use droplevels if you want to retain only the specific factor in that row.
Compare:
str(lapply(vectors.list, function(x) x[2, , drop = FALSE]))
# List of 3
# $ :'data.frame': 1 obs. of 1 variable:
# ..$ x: Factor w/ 3 levels "B","D","E": 2
# $ :'data.frame': 1 obs. of 1 variable:
# ..$ x: Factor w/ 3 levels "B","D","E": 2
# $ :'data.frame': 1 obs. of 1 variable:
# ..$ x: Factor w/ 3 levels "B","C","E": 2
str(lapply(vectors.list, function(x) droplevels(x[2, , drop = FALSE])))
# List of 3
# $ :'data.frame': 1 obs. of 1 variable:
# ..$ x: Factor w/ 1 level "D": 1
# $ :'data.frame': 1 obs. of 1 variable:
# ..$ x: Factor w/ 1 level "D": 1
# $ :'data.frame': 1 obs. of 1 variable:
# ..$ x: Factor w/ 1 level "C": 1
You may also want to explore as.character(unlist(x[2, ])).
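For example, to pull the second row of each file as a plain character vector (one element per file, using the sample data above):
res <- sapply(vectors.list, function(x) as.character(unlist(x[2, ])))
unname(res)
# [1] "D" "D" "C"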
If you store your vectors in a data frame you can subset.
> df <- data.frame(vectors.list)
> row2 <- df[2,]
> row2
x x.1 x.2
2 D D C
