How to use puts with multiple variables in one argument in Tcl?

I have a series of lists like a1, a2, a3, a4, ....
I want to print them in a loop, so both a and i are variable parts of the name. How can I do that?
This doesn't work.
for {set i 1}....
puts $a$i
}
Thanks in advance

Ideally you'd have them in an associative array, but otherwise…
To read from a variable whose name is not constant, use the set command with just one argument instead of the $ syntax. (This use of set predates $.)
for {set i 1} {$i <= 3} {incr i} {
puts [set a$i]
}
However, it would be better to use an array (so the variables would effectively be named a(1), a(2), etc instead of a1, a2, etc). But it does mean a change to the code that creates the variables too.
for {set i 1} {$i <= 3} {incr i} {
puts $a($i)
}
The other realistic option for accessing variable variables is the upvar command. It's not really recommended in the global namespace, but in a procedure it's very effective.
proc printThree {} {
for {set i 1} {$i <= 3} {incr i} {
upvar 1 a$i vbl
puts $vbl
}
}
This works particularly well when doing more complex operations than just printing the value.

As list:
set a1 [list 1 2 3 4]
set a2 [list 10 20 30 40]
set a3 [list 100 200 300 400]
set a4 [list 1000 2000 3000 4000]
foreach i {1 2 3 4} { puts "\$a$i: [set a$i]" }
# $a1: 1 2 3 4
# $a2: 10 20 30 40
# $a3: 100 200 300 400
# $a4: 1000 2000 3000 4000
As array:
array set a {
1 {1 2 3 4}
2 {10 20 30 40}
3 {100 200 300 400}
4 {1000 2000 3000 4000}
}
parray a
# a(1) = 1 2 3 4
# a(2) = 10 20 30 40
# a(3) = 100 200 300 400
# a(4) = 1000 2000 3000 4000
As dict:
set a [dict create \
a1 {1 2 3 4} \
a2 {10 20 30 40} \
a3 {100 200 300 400} \
a4 {1000 2000 3000 4000}]
dict for {k v} $a { puts "$k: $v" }
# a1: 1 2 3 4
# a2: 10 20 30 40
# a3: 100 200 300 400
# a4: 1000 2000 3000 4000
As matrix:
package require struct::matrix
::struct::matrix xdata
xdata add columns 4
xdata add rows 4
xdata set rect 0 0 {
{1 2 3 4}
{10 20 30 40}
{100 200 300 400}
{1000 2000 3000 4000}
}
puts [join [xdata get rect 0 0 end end] \n]
# 1 2 3 4
# 10 20 30 40
# 100 200 300 400
# 1000 2000 3000 4000
# cleanup
xdata destroy

Related

Drop columns from a data frame but I keep getting this error below

No matter how I try to code this in R, I still cannot drop my columns so that I can build my logistic regression model. I tried to run it in two different ways:
cols<-c("EmployeeCount","Over18","StandardHours")
Trainingmodel1 <- DAT_690_Attrition_Proj1EmpAttrTrain[-cols,]
Error in -cols : invalid argument to unary operator
cols<-c("EmployeeCount","Over18","StandardHours")
Trainingmodel1 <- DAT_690_Attrition_Proj1EmpAttrTrain[!cols,]
Error in !cols : invalid argument type
This may solve your problem:
Trainingmodel1 <- DAT_690_Attrition_Proj1EmpAttrTrain[ , !colnames(DAT_690_Attrition_Proj1EmpAttrTrain) %in% cols]
Please note that to drop columns, the selection goes inside [ on the right side of the comma (the column index), not on the left side (the row index).
So [, your_code], not [your_code, ].
Here is an example of dropping columns using the code above.
cols <- c("cyl", "hp", "wt")
mtcars[, !colnames(mtcars) %in% cols]
# mpg disp drat qsec vs am gear carb
# Mazda RX4 21.0 160.0 3.90 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 160.0 3.90 17.02 0 1 4 4
# Datsun 710 22.8 108.0 3.85 18.61 1 1 4 1
# Hornet 4 Drive 21.4 258.0 3.08 19.44 1 0 3 1
# Hornet Sportabout 18.7 360.0 3.15 17.02 0 0 3 2
# Valiant 18.1 225.0 2.76 20.22 1 0 3 1
#...
Edit to Reproduce the Error
The error message you got indicates that there is a column with only one identical value in all rows.
To show this, let's try a logistic regression on a subset of the mtcars data in which the cyl column holds a single identical value, and then use that column as a predictor.
mtcars_cyl4 <- mtcars |> subset(cyl == 4)
mtcars_cyl4
# mpg cyl disp hp drat wt qsec vs am gear carb
# Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
# Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
# Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
# Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
# Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
# Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
# Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
# Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
# Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
# Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
# Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
glm(am ~ as.factor(cyl) + mpg + disp, data = mtcars_cyl4, family = "binomial")
#Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
# contrasts can be applied only to factors with 2 or more levels
Now compare it with the same logistic regression using the full mtcars data, in which the cyl column has several distinct values.
glm(am ~ as.factor(cyl) + mpg + disp, data = mtcars, family = "binomial")
# Call: glm(formula = am ~ as.factor(cyl) + mpg + disp, family = "binomial",
# data = mtcars)
#
# Coefficients:
# (Intercept) as.factor(cyl)6 as.factor(cyl)8 mpg disp
# -5.08552 2.40868 6.41638 0.37957 -0.02864
#
# Degrees of Freedom: 31 Total (i.e. Null); 27 Residual
# Null Deviance: 43.23
# Residual Deviance: 25.28 AIC: 35.28
It is likely that, even though you have dropped the three columns that hold a single identical value in all rows, there is another column in Trainingmodel1 that also has only one value. Those constant values probably resulted from filtering the data frame and splitting the data into training and test groups. It is better to check by using summary(Trainingmodel1).
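As a quick complement to summary(), here is a sketch that flags constant columns programmatically (assuming Trainingmodel1 is the training data frame from your code):
constant_cols <- sapply(Trainingmodel1, function(x) length(unique(x)) <= 1)
names(Trainingmodel1)[constant_cols]
# any names printed here are single-valued columns that glm cannot use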
Further edit
I have checked the summary(Trainingmodel1) result, and it is now clear that EmployeeNumber has a single identical value (called a "level" for a factor) in all rows. To run your regression properly, either drop it from your model or, if EmployeeNumber has other levels that you want to include, make sure the training data contains at least two of them. You can achieve that during splitting by repeating the random sampling (with for, while, or repeat) until the sampled EmployeeNumber values contain at least two levels; a sketch follows. It is possible, but I am not sure how appropriate repeated sampling is for your study.
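A minimal sketch of that repeat-sampling idea; full_data and the 70/30 split ratio are hypothetical stand-ins, the two-level check is the point:
repeat {
  idx <- sample(nrow(full_data), size = floor(0.7 * nrow(full_data)))  # hypothetical split
  train <- full_data[idx, ]
  # accept the split only once EmployeeNumber keeps at least two levels
  if (length(unique(train$EmployeeNumber)) >= 2) break
}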
As for your question about subsetting on more than one variable, you can use subset with logical conditions. For example, to get the subset of mtcars that has cyl == 4 and mpg > 20:
mtcars |> subset(cyl == 4 & mpg > 20 )
If you want a subset that has cyl == 4 or mpg > 20:
mtcars |> subset(cyl == 4 | mpg > 20 )
You can also subset by using more columns as subset criteria:
mtcars |> subset((cyl > 4 & cyl < 8) | (mpg > 20 & gear > 4))

Permutations of 3 elements within 6 positions

I'm looking to permute (or combine) c("a","b","c") within six positions under the condition that no two adjacent elements are equal, e.g. abcbab.
Permutations can easily be obtained with:
abc <- c("a", "b", "c")
library(gtools)
permutations(n = 3, r = 6, v = abc, repeats.allowed = TRUE)
I think it is not possible to do that with gtools, and I've been trying to design a function for it, even though I suspect one may already exist.
Since you're looking for permutations, expand.grid can work as well as permutations. But since you don't want like-neighbors, we can shorten the dimensionality of it considerably.
Up front:
r <- replicate(6, seq_len(length(abc)-1), simplify=FALSE)
r[[1]] <- c(r[[1]], length(abc))
m <- t(apply(do.call(expand.grid, r), 1, cumsum) %% length(abc) + 1)
m[] <- abc[m]
dim(m)
# [1] 96 6
head(as.data.frame(cbind(m, apply(m, 1, paste, collapse = ""))))
# Var1 Var2 Var3 Var4 Var5 Var6 V7
# 1 b c a b c a bcabca
# 2 c a b c a b cabcab
# 3 a b c a b c abcabc
# 4 b a b c a b babcab
# 5 c b c a b c cbcabc
# 6 a c a b c a acabca
Walk-through:
since you want all recycled permutations of it, we can use gtools::permutations, or we can use expand.grid ... I'll use the latter, I don't know if it's much faster, but it does a short-cut I need (more later)
when dealing with constraints like this, I like to expand on the indices of the vector of values
however, since we don't want neighbors to be the same, instead of treating each row of values as straight indices we cumsum them; by removing 0 and length(abc) from the set of possible steps, we remove both ways a value could repeat: a step of 0 stays on the same value, and a step of a full vector length wraps around to the same value. As a walk-through:
head(expand.grid(1:3, 1:2, 1:2, 1:2, 1:2, 1:2), n = 6)
# Var1 Var2 Var3 Var4 Var5 Var6
# 1 1 1 1 1 1 1
# 2 2 1 1 1 1 1
# 3 3 1 1 1 1 1
# 4 1 2 1 1 1 1
# 5 2 2 1 1 1 1
# 6 3 2 1 1 1 1
Since the first value can be any of the three values, its column is 1:3, but each additional column is intended to be 1 or 2 away from the previous value.
head(t(apply(expand.grid(1:3, 1:2, 1:2, 1:2, 1:2, 1:2), 1, cumsum)), n = 6)
# Var1 Var2 Var3 Var4 Var5 Var6
# [1,] 1 2 3 4 5 6
# [2,] 2 3 4 5 6 7
# [3,] 3 4 5 6 7 8
# [4,] 1 3 4 5 6 7
# [5,] 2 4 5 6 7 8
# [6,] 3 5 6 7 8 9
okay, that doesn't seem that useful (since it goes beyond the length of the vector), so we can invoke the modulus operator and a shift (since modulus returns 0-based, we want 1-based):
head(t(apply(expand.grid(1:3, 1:2, 1:2, 1:2, 1:2, 1:2), 1, cumsum) %% 3 + 1), n = 6)
# Var1 Var2 Var3 Var4 Var5 Var6
# [1,] 2 3 1 2 3 1
# [2,] 3 1 2 3 1 2
# [3,] 1 2 3 1 2 3
# [4,] 2 1 2 3 1 2
# [5,] 3 2 3 1 2 3
# [6,] 1 3 1 2 3 1
To verify this works, we can do a diff across each row and look for 0:
m <- t(apply(expand.grid(1:3, 1:2, 1:2, 1:2, 1:2, 1:2), 1, cumsum) %% 3 + 1)
any(apply(m, 1, diff) == 0)
# [1] FALSE
to automate this to an arbitrary vector, we enlist the help of replicate to generate the list of possible vectors:
r <- replicate(6, seq_len(length(abc)-1), simplify=FALSE)
r[[1]] <- c(r[[1]], length(abc))
str(r)
# List of 6
# $ : int [1:3] 1 2 3
# $ : int [1:2] 1 2
# $ : int [1:2] 1 2
# $ : int [1:2] 1 2
# $ : int [1:2] 1 2
# $ : int [1:2] 1 2
and then do.call to expand it.
once you have the matrix of indices,
head(m)
# Var1 Var2 Var3 Var4 Var5 Var6
# [1,] 2 3 1 2 3 1
# [2,] 3 1 2 3 1 2
# [3,] 1 2 3 1 2 3
# [4,] 2 1 2 3 1 2
# [5,] 3 2 3 1 2 3
# [6,] 1 3 1 2 3 1
and then replace each index with the vector's value:
m[] <- abc[m]
head(m)
# Var1 Var2 Var3 Var4 Var5 Var6
# [1,] "b" "c" "a" "b" "c" "a"
# [2,] "c" "a" "b" "c" "a" "b"
# [3,] "a" "b" "c" "a" "b" "c"
# [4,] "b" "a" "b" "c" "a" "b"
# [5,] "c" "b" "c" "a" "b" "c"
# [6,] "a" "c" "a" "b" "c" "a"
and then we cbind the united string (via apply and paste), as restated below.
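A minimal restatement of that final step from the up-front code:
res <- cbind(m, apply(m, 1, paste, collapse = ""))
head(res[, 7])
# [1] "bcabca" "cabcab" "abcabc" "babcab" "cbcabc" "acabca"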
Performance:
library(microbenchmark)
library(dplyr)
library(tidyr)
library(stringr)
microbenchmark(
tidy1 = {
gtools::permutations(n = 3, r = 6, v = abc, repeats.allowed = TRUE) %>%
data.frame() %>%
unite(united, sep = "", remove = FALSE) %>%
filter(!str_detect(united, "([a-c])\\1"))
},
tidy2 = {
filter(unite(data.frame(gtools::permutations(n = 3, r = 6, v = abc, repeats.allowed = TRUE)),
united, sep = "", remove = FALSE),
!str_detect(united, "([a-c])\\1"))
},
base = {
r <- replicate(6, seq_len(length(abc)-1), simplify=FALSE)
r[[1]] <- c(r[[1]], length(abc))
m <- t(apply(do.call(expand.grid, r), 1, cumsum) %% length(abc) + 1)
m[] <- abc[m]
},
times=10000
)
# Unit: microseconds
# expr min lq mean median uq max neval
# tidy1 1875.400 2028.8510 2446.751 2165.651 2456.051 12790.901 10000
# tidy2 1745.402 1875.5015 2284.700 2000.051 2278.101 50163.901 10000
# base 796.701 871.4015 1020.993 919.801 1021.801 7373.901 10000
I tried the nested (non-%>%) tidy2 version just for kicks; though I was confident it would theoretically be faster, I didn't expect it to shave over 7% off the median run-time. (The 50163 maximum is likely R garbage collection, not "real".) That difference is the price we pay for readability/maintainability.
There are probably cleaner methods, but here ya go:
abc <- letters[1:3]
library(tidyverse)
res <- gtools::permutations(n = 3, r = 6, v = abc, repeats.allowed = TRUE) %>%
data.frame() %>%
unite(united, sep = "", remove = FALSE) %>%
filter(!str_detect(united, "([a-c])\\1"))
head(res)
united X1 X2 X3 X4 X5 X6
1 ababab a b a b a b
2 ababac a b a b a c
3 ababca a b a b c a
4 ababcb a b a b c b
5 abacab a b a c a b
6 abacac a b a c a c
If you want a vector, you can use res$united or add %>% pull(united) as an additional step at the end of the pipes above.

Merging two file based on columns and sorting

I have two files, FILE1 and FILE2, that have a different number of
columns and some columns in common. In both files the first column is
a row identifier. I want to merge the two files (FILE1 and FILE2)
without changing the order of the columns, and where a value is missing, insert the value '5'.
For example FILE1 (first column is the row ID, A1 is the first row, A2
the second, ...):
A1 1 2 5 1
A2 0 2 1 1
A3 1 0 2 2
The column names for FILE1 are (these are specified in another file):
Affy1
Affy3
Affy4
Affy5
which is to say that the value in row A1, column Affy1 is 1
and the value in row A3, column Affy5 is 2
v~~~~~ Affy3
A1 1 2 5 1
A2 0 2 1 1
A3 1 0 2 2
^~~~ Affy1
Similarly for FILE2
B1 1 2 0
B2 0 1 1
B3 5 1 1
And its column names,
Affy1
Affy2
Affy3
Meaning that
v~~~~~ Affy2
B1 1 2 0
B2 0 1 1
B3 5 1 1
^~~~ Affy1
I want to merge and sort the columns based on the column names and put a '5' for missing values, so the merged result would be as follows:
A1 1 5 2 5 1
A2 0 5 2 1 1
A3 1 5 0 2 2
B1 1 2 0 5 5
B2 0 1 1 5 5
B3 5 1 1 5 5
And the columns:
Affy1
Affy2
Affy3
Affy4
Affy5
Which is to say,
v~~~~~~~ Affy2
A1 1 5 2 5 1
A2 0 5 2 1 1
A3 1 5 0 2 2
B1 1 2 0 5 5
B2 0 1 1 5 5
B3 5 1 1 5 5
^~~~ Affy1
In reality I have over 700K columns and over 2K rows in each file. Thanks in advance!
The difficult part is ordering the headers when some of them appear in only one file. The best way I know is to build a directed graph using the Graph module and sort the elements topologically.
Once that's done, it's simply a matter of assigning the values from each file to the correct columns and filling in the blanks with 5s.
I've incorporated the headers as the first line of each data file, so this program works with this data:
file1.txt
ID Affy1 Affy3 Affy4 Affy5
A1 1 2 5 1
A2 0 2 1 1
A3 1 0 2 2
file2.txt
ID Affy1 Affy2 Affy3
B1 1 2 0
B2 0 1 1
B3 5 1 1
And here's the code
consolidate_columns.pl
use strict;
use warnings 'all';

use Graph::Directed;

my @files = qw/ file1.txt file2.txt /;

# Make an array of two file handles
#
my @fh = map {
    open my $fh, '<', $_ or die qq{Unable to open "$_" for input: $!};
    $fh;
} @files;

# Make an array of two lists of header names
#
my @file_heads = map { [ split ' ', <$_> ] } @fh;

# Use a directed graph to sort all of the header names so that they're
# still in the order that they were at the top of both files
#
my @ordered_headers = do {
    my $g = Graph::Directed->new;
    for my $f ( 0, 1 ) {
        my $file_heads = $file_heads[$f];
        $g->add_edge($file_heads->[$_], $file_heads->[$_+1]) for 0 .. $#$file_heads-1;
    }
    $g->topological_sort;
};

# Form a hash converting header names to column indexes for output
#
my %ordered_headers = map { $ordered_headers[$_] => $_ } 0 .. $#ordered_headers;

# Print the header and the re-formed records from each file. Use the hash to
# convert the header names into column indexes
#
print "@ordered_headers\n";

for my $i ( 0 .. $#fh ) {
    my $fh = $fh[$i];
    my @file_heads = @{ $file_heads[$i] };
    my @splice = map { $ordered_headers{$_} } @file_heads;

    while ( <$fh> ) {
        next unless /\S/;
        my @columns;
        @columns[@splice] = split;
        $_ //= 5 for @columns[0 .. $#ordered_headers];
        print "@columns\n";
    }
}
output
ID Affy1 Affy2 Affy3 Affy4 Affy5
A1 1 5 2 5 1
A2 0 5 2 1 1
A3 1 5 0 2 2
B1 1 2 0 5 5
B2 0 1 1 5 5
B3 5 1 1 5 5
For the fun of it -- HTH
#!/usr/bin/perl
use warnings;
use strict;

use constant {A => 1, B => 2, BOTH => 3};

# I don't read the data from a file
my @columns = qw(Affy1 Affy2 Affy3 Affy4 Affy5);
my @locations = (BOTH, B, BOTH, A, A);
my @contentA = (["A1", 1, 2, 5, 1],
                ["A2", 0, 2, 1, 1],
                ["A3", 1, 0, 2, 2]);
my @contentB = (["B1", 1, 2, 0],
                ["B2", 0, 1, 1],
                ["B3", 5, 1, 1]);

# I assume both files have the same number of lines
my @ares = ();
my @bres = ();
for (my $i = 0; $i < @contentA; ++$i) {
    # this uses a lot of memory with huge amounts of data;
    # maybe write the results to two temporary files and cat them
    # together at the end.
    # another alternative would be to iterate first over
    # file A and then over file B
    my @row_a = ();
    my @row_b = ();
    push @row_a, shift @{$contentA[$i]};    # id
    push @row_b, shift @{$contentB[$i]};    # id
    foreach my $loc (@locations) {
        if (A == $loc) {
            push @row_a, shift @{$contentA[$i]};
            push @row_b, 5;
        }
        if (B == $loc) {
            push @row_a, 5;
            push @row_b, shift @{$contentB[$i]};
        }
        if (BOTH == $loc) {
            push @row_a, shift @{$contentA[$i]};
            push @row_b, shift @{$contentB[$i]};
        }
    }
    push @ares, \@row_a;
    push @bres, \@row_b;
}
foreach my $ar (@ares) {
    print join " ", @{$ar};
    print "\n";
}
foreach my $br (@bres) {
    print join " ", @{$br};
    print "\n";
}
print join("\n", @columns);
print "\n";

How to find a row or column under specific conditions in a matrix in R [duplicate]

This question already has answers here:
given value of matrix, getting it's coordinate
(2 answers)
Closed 7 years ago.
For example, say we have a matrix or array in the following format.
How can we find the index of rows or columns which only have numbers between 10 and 20 inside?
M = array(c(1,1,12,34,0,19,15,1,0,17,12,0,21,1,11,1), dim=c(4,4))
And, also, I am not allowed to use for or while loops to do this.
Another thing is that the matrix or array may have more than 2 dimensions. If the method can also be applied to a multi-dimensional matrix or array, that would be better for me. Thanks.
Instead of trying to find the indices of qualifying single elements, I need to find the rows or columns in which all the elements lie within the interval.
In this example, I hope to get a result telling me that row number 3 is a row whose numbers are all between 10 and 20.
Use which(..., arr.ind = TRUE). Here I assume "between" means 10 and 20 are non-inclusive:
which(M > 10 & M < 20, arr.ind = TRUE)
# row col
# [1,] 3 1
# [2,] 2 2
# [3,] 3 2
# [4,] 2 3
# [5,] 3 3
# [6,] 3 4
This will also work on 3-dimensional arrays (and higher).
## Three dimensions
dim(M) <- c(2, 4, 2)
which(M > 10 & M < 20, arr.ind = TRUE)
# dim1 dim2 dim3
# [1,] 1 2 1
# [2,] 2 3 1
# [3,] 1 4 1
# [4,] 2 1 2
# [5,] 1 2 2
# [6,] 1 4 2
## Four dimensions
dim(M) <- rep(2, 4)
which(M > 10 & M < 20, arr.ind = TRUE)
# dim1 dim2 dim3 dim4
# [1,] 1 2 1 1
# [2,] 2 1 2 1
# [3,] 1 2 2 1
# [4,] 2 1 1 2
# [5,] 1 2 1 2
# [6,] 1 2 2 2
## ... and so on
Note: To include 10 and 20, just use M >= 10 & M <= 20
Data:
M <- structure(c(1, 1, 12, 34, 0, 19, 15, 1, 0, 17, 12, 0, 21, 1,
11, 1), .Dim = c(4L, 4L))
Update: From your edit, you can find the row numbers for which all values are between 10 and 20 with
which(rowSums(M >= 10 & M <= 20) == ncol(M))
# [1] 3
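The same idea extends to columns by swapping rowSums for colSums and ncol for nrow:
which(colSums(M >= 10 & M <= 20) == nrow(M))
# integer(0)  -- no column of this M qualifies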

storing value against variable name "QW1I5K20" in an array element Q[1,5,20] using R

I have an Excel file (.csv) with a sorted column of variable names such as "QW1I1K5" and numerical values against them.
The list covers
W from 1 to 15
I from 1 to 4
K from 1 to 30
for a total of 15*4*30 = 1800 elements.
I want to store the numerical values against these variables in an array whose indices are derived from the variable name.
For example, QW1I1K5 has the value 11. This must be stored in the array element Q[1,1,5] = 11 (the index set [1,1,5] corresponds to W1, I1, K5).
Maybe this helps:
Q <- array(dat$Col2, dim=c(15,4,30))
dat$Col2[dat$Col1=='QW1I1K5']
#[1] 34
Q[1,1,5]
#[1] 34
dat$Col2[dat$Col1=='QW4I3K8']
#[1] 38
Q[4,3,8]
#[1] 38
If you want the index along with the values
library(reshape2)
d1 <- melt(Q)
head(d1,3)
# Var1 Var2 Var3 value
#1 1 1 1 12
#2 2 1 1 9
#3 3 1 1 29
Q[1,1,1]
#[1] 12
Q[3,1,1]
#[1] 29
Update
Suppose your data is in the order you described in the comments; call it dat1.
indx <- read.table(text=gsub('[^0-9]+', ' ', dat1$Col1), header=FALSE)
dat2 <- dat1[do.call(order, indx[,3:1]),]
Q1 <- array(dat2$Col2,dim=c(15,4,30))
Q1[1,1,2]
#[1] 20
dat2$Col2[dat2$Col1=='QW1I1K2']
#[1] 20
data
Col1 <- do.call(paste,c(expand.grid('QW', 1:15, 'I', 1:4, 'K',1:30),
list(sep='')))
set.seed(24)
dat <- data.frame(Col1, Col2=sample(1:40, 1800,replace=TRUE))
dat1 <- dat[order(as.numeric(gsub('[^0-9]+', '', dat$Col1))),]
row.names(dat1) <- NULL
I would suggest looking at using "data.table" and setting your key to the split columns. You can use cSplit from my "splitstackshape" package to easily split the column.
Sample Data:
df <- data.frame(
V1 = c("QW1I1K1", "QW1I1K2", "QW1I1K3",
"QW1I1K4", "QW2I1K5", "QW2I3K2"),
V2 = c(15, 20, 5, 6, 7, 9))
df
# V1 V2
# 1 QW1I1K1 15
# 2 QW1I1K2 20
# 3 QW1I1K3 5
# 4 QW1I1K4 6
# 5 QW2I1K5 7
# 6 QW2I3K2 9
Splitting the column:
library(splitstackshape)
out <- cSplit(df, "V1", "[A-Z]+", fixed = FALSE)
setnames(out, c("V2", "W", "I", "K"))
setcolorder(out, c("W", "I", "K", "V2"))
setkey(out, W, I, K)
out
# W I K V2
# 1: 1 1 1 15
# 2: 1 1 2 20
# 3: 1 1 3 5
# 4: 1 1 4 6
# 5: 2 1 5 7
# 6: 2 3 2 9
Extracting rows:
out[J(1, 1, 4)]
# W I K V2
# 1: 1 1 4 6
out[J(2, 3, 2)]
# W I K V2
# 1: 2 3 2 9
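For completeness, here is a base-R sketch that parses each name into its W/I/K numbers and fills the array by matrix indexing, so it does not depend on the rows being pre-sorted (it assumes every name matches the QW<w>I<i>K<k> pattern; df is the sample data frame from above):
nums <- regmatches(as.character(df$V1), gregexpr("[0-9]+", as.character(df$V1)))
idx <- do.call(rbind, lapply(nums, as.integer))   # one row of W, I, K per value
Q <- array(NA_real_, dim = c(15, 4, 30))
Q[idx] <- df$V2   # matrix indexing assigns each value to its [W, I, K] slot
Q[2, 3, 2]
# [1] 9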
