Nested for-loop: error "variable already defined"

I have a nested loop in Stata with four levels of foreach statements. With this loop, I am trying to create a new variable named strata that ranges from 1 to 40.
foreach x in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 {
foreach r in 1 2 3 4 5 {
foreach s in 1 2 {
foreach a in 1 2 3 4 {
gen strata= `x' if race==`r' & sex==`s' & age==`a'
}
}
}
}
I get the error:
"variable strata already defined"
Even with the error, the loop does assign strata = 1 to the matching observations, but none of the other strata; all other cells are missing.
Example data:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(age sex race)
1 2 2
1 2 1
1 1 1
1 1 1
1 2 1
2 2 1
2 2 1
4 2 1
1 2 1
4 2 1
3 2 1
2 2 1
4 2 1
4 2 2
3 2 1
4 1 3
4 2 1
4 2 1
2 1 2
4 2 1
2 2 1
3 2 1
3 2 1
1 2 3
4 2 1
1 2 5
4 2 1
4 2 1
4 2 2
4 2 1
2 2 1
4 1 1
3 2 1
1 2 1
2 2 1
4 2 1
1 2 2
2 2 3
1 1 3
4 2 1
2 2 3
1 2 1
1 1 1
2 2 3
1 2 1
1 1 3
1 2 1
2 2 1
3 2 1
1 2 1
4 2 1
1 2 2
1 2 1
2 2 1
4 2 1
4 2 1
1 2 1
1 2 1
4 2 1
2 2 1
4 2 1
1 2 1
1 1 3
2 2 1
1 1 1
4 1 1
3 2 1
2 2 1
1 2 1
1 1 1
2 2 3
4 2 2
2 2 1
2 2 1
3 2 1
2 2 2
3 2 1
2 1 1
1 1 1
3 2 1
1 2 3
4 2 1
4 2 1
2 2 1
1 2 1
1 1 1
3 2 1
4 2 1
2 2 3
1 2 3
4 2 1
3 2 1
2 2 1
4 2 1
3 2 1
2 1 1
1 2 1
2 2 1
2 2 3
1 1 1
end
label values sex sex
label def sex 1 "male (1)", modify
label def sex 2 "female (2)", modify
label values race race
label def race 1 "non-Hispanic white (1)", modify
label def race 2 "black (2)", modify
label def race 3 "AAPI/other (3)", modify
label def race 5 "Hispanic (5)", modify

generate is only for creating new variables. The first pass through the loops creates strata; the second time your code reaches the generate statement, strata already exists, so the command fails for the reason given.
The fix is to generate your variable once, outside the loops, and then replace inside them.
There are other reasons to rewrite your code, which can be done in stages.
First, integer sequences can be more easily and efficiently specified with forvalues, which can be abbreviated; I tend to write forval.
gen strata = .
forval x = 1/40 {
    forval r = 1/5 {
        forval s = 1/2 {
            forval a = 1/4 {
                replace strata = `x' if race==`r' & sex==`s' & age==`a'
            }
        }
    }
}
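Second, the code is flawed anyway. Everything ends up as 40! For each value of x, the inner loops run over every combination of race, sex and age, so every observation is overwritten on each pass and keeps the last value assigned, which is 40.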
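The overwriting is easy to reproduce outside Stata. The following is a minimal sketch in Python (not Stata) that mimics the four nested loops on three made-up observations; the `rows` list and its dictionary keys are illustrative assumptions, not part of the original data.

```python
# Mimic the nested Stata loops: every pass of x rewrites every matching row.
rows = [
    {"race": 1, "sex": 1, "age": 1},
    {"race": 2, "sex": 2, "age": 4},
    {"race": 5, "sex": 1, "age": 3},
]

for row in rows:
    row["strata"] = None  # plays the role of: gen strata = .

for x in range(1, 41):
    for r in range(1, 6):
        for s in range(1, 3):
            for a in range(1, 5):
                for row in rows:
                    # plays the role of: replace strata = x if race==r & sex==s & age==a
                    if (row["race"], row["sex"], row["age"]) == (r, s, a):
                        row["strata"] = x

strata_values = [row["strata"] for row in rows]
print(strata_values)  # [40, 40, 40]
```

Because x never appears in the condition, each value of x touches every observation, and only the final pass survives.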
Third, you can do allocations much more directly, say by
gen strata = 8 * (race - 1) + 4 * (sex - 1) + age
This is a self-contained reproducible demonstration:
clear
set obs 5
gen race = _n
expand 2
bysort race : gen sex = _n
expand 4
bysort race sex : gen age = _n
gen strata = 8 * (race - 1) + 4 * (sex - 1) + age
isid strata
Clearly you can and should vary the recipe for a different preferred scheme.
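As a cross-check of the arithmetic, here is a small Python sketch (again not Stata) that applies the same formula to all 5 x 2 x 4 combinations and confirms the codes are exactly 1 to 40 with no duplicates. The function name `strata_id` is a hypothetical helper introduced for illustration.

```python
from itertools import product

def strata_id(race, sex, age):
    """Map (race, sex, age) to one code: 8*(race-1) + 4*(sex-1) + age."""
    return 8 * (race - 1) + 4 * (sex - 1) + age

# All combinations of race 1..5, sex 1..2, age 1..4
cells = list(product(range(1, 6), range(1, 3), range(1, 5)))
codes = [strata_id(r, s, a) for r, s, a in cells]

print(sorted(codes) == list(range(1, 41)))  # True
```

The multipliers work because there are 4 age groups within each sex and 2 x 4 = 8 cells within each race.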

Related

Error: Error in dimnames(x) <- dn : length of 'dimnames' [2] not equal to array extent

I have the following dataset about the choices of different car brands and their attributes. I would like to create a matrix based on each attribute of the cars.
RespNum Task Concept Make Exterior.Design Interior.design
1 100086500 1 1 3 2 3
2 100086500 1 2 1 3 2
3 100086500 1 3 4 1 1
4 100086500 1 4 0 0 0
5 100086500 2 1 1 3 2
6 100086500 2 2 5 1 3
Driving.performance Driving.attributes Comfort Practibility Safety
1 1 1 1 3 3
2 3 3 3 2 1
3 2 2 2 1 2
4 0 0 0 0 0
5 3 2 1 1 3
6 1 3 3 3 2
Quality Equipment Sustainability Economy Price Response
1 2 1 1 3 1 0
2 1 3 3 1 3 0
3 3 2 2 2 2 1
4 0 0 0 0 0 0
5 3 2 1 1 4 0
6 1 3 3 3 8 0
I am using the function:
Make = attribcoding(6,4,'Other')
The first input (6) is the number of levels, the second (4) is the column position in the dataset, and the last ('Other') is the name of the outside option. However, I get the following error message:
Error in dimnames(x) <- dn :
length of 'dimnames' [2] not equal to array extent

Unique Columns Across an Array?

I have an array structured like so:
a = [1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 4 4 4 4 5 5 5 5;
1 1 1 1 2 2 2 2 2 2 2 1 1 1 1 2 2 3 3 1 1 1 2 3 4 4 4 1 1 1 1 2 2 3 3];
Pretty much, it's a 2-by-n matrix with no real pattern (I've reduced the number of columns here for simplicity's sake). I want to be able to find the number of unique columns. In this simplified example I can count by hand (though it takes a while) and see that my matrix of unique columns, b, is:
b= 1 1 2 2 2 3 3 3 3 4 5 5
1 2 1 2 3 1 2 3 4 1 2 3
In MATLAB, I can do something like
size(b,2)
To get the number of unique columns. In this example
size(b,2) = 12
My question is, how do I go from matrix a to matrix b computationally, so that I can do this for the very large matrices (large n) that I have?
Use unique:
a = [1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 4 4 4 4 5 5 5 5;
1 1 1 1 2 2 2 2 2 2 2 1 1 1 1 2 2 3 3 1 1 1 2 3 4 4 4 1 1 1 1 2 2 3 3];
% Transpose to leverage the rows flag, then transpose back
b = unique(a.', 'rows').';
Which returns:
b =
1 1 2 2 2 3 3 3 3 4 5 5
1 2 1 2 3 1 2 3 4 1 2 3
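For readers outside MATLAB, the same transpose, deduplicate, transpose-back idea can be sketched in plain Python. This is only an illustration using the standard library (tuples in a set), not a claim about how `unique` works internally.

```python
a = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2,
     3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5],
    [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 3, 3,
     1, 1, 1, 2, 3, 4, 4, 4, 1, 1, 1, 1, 2, 2, 3, 3],
]

columns = list(zip(*a))                # transpose: one tuple per column
unique_columns = sorted(set(columns))  # drop duplicates, sort lexicographically
b = [list(row) for row in zip(*unique_columns)]  # transpose back to 2-by-k

print(len(unique_columns))  # 12
```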

How to concatenate submatrix into a bigger matrix in Octave

I'm trying to solve the following issue: I have a 3x3x4 array like this:
A(:,:,1) = A(:,:,2) = A(:,:,3) = A(:,:,4) =
1 1 1 2 2 2 3 3 3 4 4 4
1 1 1 2 2 2 3 3 3 4 4 4
1 1 1 2 2 2 3 3 3 4 4 4
I would like to produce a 6x6 matrix like the following:
B =
1 1 1 3 3 3
1 1 1 3 3 3
1 1 1 3 3 3
2 2 2 4 4 4
2 2 2 4 4 4
2 2 2 4 4 4
My first thought was to use something like the reshape function, but since it operates columnwise, the result is not what I want.
Do you have any ideas to perform it efficiently?
Thanks in advance
This is for a general case of converting a 3D array into such a 2D array -
m = 2; %// number of 3D slices to be vertically concatenated to form the rows
m1 = size(A,1)*m;
m2 = size(A,3)/m;
B = reshape(permute(reshape(permute(A,[1 3 2]),m1,m2,[]),[1 3 2]),m1,[])
Sample run -
A(:,:,1) =
1 1 7
1 9 1
1 7 2
A(:,:,2) =
3 9 2
9 4 7
9 3 7
A(:,:,3) =
2 6 8
4 8 4
1 8 4
A(:,:,4) =
1 1 7
8 3 4
1 9 8
A(:,:,5) =
7 9 2
6 8 5
4 1 6
A(:,:,6) =
3 2 8
4 9 1
4 4 4
B =
1 1 7 2 6 8 7 9 2
1 9 1 4 8 4 6 8 5
1 7 2 1 8 4 4 1 6
3 9 2 1 1 7 3 2 8
9 4 7 8 3 4 4 9 1
9 3 7 1 9 8 4 4 4
Since your sub-matrices are all of the same size you can assign them directly into B:
clear
B = zeros(6);
A(:,:,1) = ones(3);
A(:,:,2) = 2*ones(3);
A(:,:,3) = 3*ones(3);
A(:,:,4) = 4*ones(3);
B = [A(:,:,1) A(:,:,3); A(:,:,2) A(:,:,4)]
B =
1 1 1 3 3 3
1 1 1 3 3 3
1 1 1 3 3 3
2 2 2 4 4 4
2 2 2 4 4 4
2 2 2 4 4 4
This becomes cumbersome if you have many more sub-matrices, though it could be automated.
Using permute (à la Divakar) or manually slicing into a 2D array (à la Benoit) is much more efficient, but I'll add something to the mix for future readers. One way I can suggest is to take each plane and place it into a 1D cell array, reshape the cell array into a 2 x 2 grid, then convert the 2 x 2 grid into a final matrix. Something like:
B = arrayfun(@(x) A(:,:,x), 1:4, 'uni', 0);
B = reshape(B, 2, 2);
B = cell2mat(B)
B =
1 1 1 3 3 3
1 1 1 3 3 3
1 1 1 3 3 3
2 2 2 4 4 4
2 2 2 4 4 4
2 2 2 4 4 4
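The layout all three answers produce is the same: slices fill the block grid column by column, with m slices per block column. A minimal Python sketch of that indexing, using nested lists rather than Octave arrays, is below; `tile_slices` is a hypothetical helper name, and it assumes the number of slices is divisible by m and that all slices have the same size.

```python
def tile_slices(slices, m):
    """Arrange equally sized 2D slices into a grid with m block rows,
    filling the grid column by column."""
    n_block_cols = len(slices) // m
    out = []
    for block_row in range(m):
        # Slices in this block row: block_row, block_row + m, block_row + 2m, ...
        group = [slices[block_row + m * c] for c in range(n_block_cols)]
        for i in range(len(group[0])):
            # Concatenate row i of each slice in the group side by side
            out.append([value for sl in group for value in sl[i]])
    return out

# Four 3x3 slices filled with 1, 2, 3 and 4, as in the question
A = [[[k] * 3 for _ in range(3)] for k in range(1, 5)]
B = tile_slices(A, 2)

for row in B:
    print(row)
```

With m = 2 this stacks slices 1 and 2 in the first block column and slices 3 and 4 in the second, which matches the 6x6 matrix requested in the question.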

how to vectorize the following for loop?

Can anyone help me vectorize this loop?
I have a large matrix and I want to replace all the pixel values whose count (number of occurrences) is less than some threshold value. For simplicity, let's say
a = randi([1 5],10,10);
for i = 1:length(a)
someMat=a(a==i);
if length(someMat)<20
a(a==i)=0;
end
end
but it's killing me.
Example:
a = randi([1 5],10,10)
a =
5 2 1 5 5 5 2 2 3 2
3 3 5 4 4 4 3 1 1 5
5 1 3 5 3 3 4 1 3 1
3 1 5 3 2 5 1 1 5 1
1 1 4 3 4 3 4 4 5 1
1 4 3 5 1 1 2 2 2 1
3 3 5 2 4 1 1 3 2 4
4 1 5 3 4 5 3 4 3 3
5 3 5 5 4 3 1 3 4 1
4 1 1 3 5 5 1 3 3 5
Result for threshold 20:
5 0 1 5 5 5 0 0 3 0
3 3 5 0 0 0 3 1 1 5
5 1 3 5 3 3 0 1 3 1
3 1 5 3 0 5 1 1 5 1
1 1 0 3 0 3 0 0 5 1
1 0 3 5 1 1 0 0 0 1
3 3 5 0 0 1 1 3 0 0
0 1 5 3 0 5 3 0 3 3
5 3 5 5 0 3 1 3 0 1
0 1 1 3 5 5 1 3 3 5
The count of pixel value 4 was 17.
The count of pixel value 2 was 10.
I tried something like
[nVal Index] = histc(a(:),unique(a)); %
nVal(nVal>20) = 1; % just some threshold value and assigning by some Number may be zero as well
But I don't know how to replace the values at the corresponding pixel indices and then reshape the result back to its original form. I'm not even sure that reshape would give me back the same matrix. Please help.
Thanks
I think this does what you want:
threshold_length = 20;
replace_value = 0;
u = unique(a); %// values of a
h = histc(a(:), u); %// count for each value
r = u(h<threshold_length); %// values to be removed
a(ismember(a,r)) = replace_value; %// remove those values
I see @LuisMendo arrived at mostly the same solution quicker than I did, but an alternative to using ismember is to use more of what unique gives you:
threshold = 20;
[vals, ~, ix] = unique(a); % capture the values and their indices
counts = histc(a(:), vals); % count the occurrences of each value
vals(counts<threshold) = 0; % zero the values that aren't common enough
a(:) = vals(ix); % recreate the matrix with updated values
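Both answers follow the same recipe: count each value once, pick out the values below the threshold, then replace them in a single pass. As a language-neutral illustration, here is that recipe in plain Python on the example matrix from the question. This is a sketch using `collections.Counter` in place of `histc`, not a MATLAB solution.

```python
from collections import Counter

a = [
    [5, 2, 1, 5, 5, 5, 2, 2, 3, 2],
    [3, 3, 5, 4, 4, 4, 3, 1, 1, 5],
    [5, 1, 3, 5, 3, 3, 4, 1, 3, 1],
    [3, 1, 5, 3, 2, 5, 1, 1, 5, 1],
    [1, 1, 4, 3, 4, 3, 4, 4, 5, 1],
    [1, 4, 3, 5, 1, 1, 2, 2, 2, 1],
    [3, 3, 5, 2, 4, 1, 1, 3, 2, 4],
    [4, 1, 5, 3, 4, 5, 3, 4, 3, 3],
    [5, 3, 5, 5, 4, 3, 1, 3, 4, 1],
    [4, 1, 1, 3, 5, 5, 1, 3, 3, 5],
]
threshold = 20

# Count every value once over the flattened matrix
counts = Counter(value for row in a for value in row)
# Values that occur fewer than `threshold` times
rare = {value for value, n in counts.items() if n < threshold}
# Replace rare values with 0; the shape is preserved, so no reshape is needed
result = [[0 if value in rare else value for value in row] for row in a]

print(counts[4], counts[2])  # 17 10
print(result[0])             # [5, 0, 1, 5, 5, 5, 0, 0, 3, 0]
```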

Reshape acast() remove missing values

I have this dataframe:
df <- data.frame(subject = c(rep("one", 20), c(rep("two", 20))),
score1 = sample(1:3, 40, replace=T),
score2 = sample(1:6, 40, replace=T),
score3 = sample(1:3, 40, replace=T),
score4 = sample(1:4, 40, replace=T))
subject score1 score2 score3 score4
1 one 2 4 2 2
2 one 3 3 1 2
3 one 1 2 1 3
4 one 3 4 1 2
5 one 1 2 2 3
6 one 1 5 2 4
7 one 2 5 3 2
8 one 1 5 1 3
9 one 3 5 2 2
10 one 2 3 3 4
11 one 3 2 1 3
12 one 2 5 2 1
13 one 2 4 1 4
14 one 2 2 1 3
15 one 1 3 1 4
16 one 1 6 1 3
17 one 3 4 2 2
18 one 3 2 1 3
19 one 2 5 3 1
20 one 3 6 2 1
21 two 1 6 3 4
22 two 1 2 1 2
23 two 3 2 1 2
24 two 1 2 2 1
25 two 2 3 1 3
26 two 1 5 3 3
27 two 2 4 1 4
28 two 2 6 2 4
29 two 1 6 2 2
30 two 1 5 1 4
31 two 2 1 2 4
32 two 3 6 1 1
33 two 1 1 3 1
34 two 2 4 2 3
35 two 2 1 3 2
36 two 2 3 1 3
37 two 1 2 3 4
38 two 3 5 2 2
39 two 2 1 3 4
40 two 2 1 1 3
Note that the scores have different ranges of values. Score 1 ranges from 1-3, score 2 from 1-6, score 3 from 1-3, score 4 from 1-4.
I'm trying to reshape data like this:
library(reshape2)
dfMelt <- melt(df, id.vars="subject")
acast(dfMelt, subject ~ value ~ variable)
Aggregation function missing: defaulting to length
, , score1
1 2 3 4 5 6
one 6 7 7 0 0 0
two 8 9 3 0 0 0
, , score2
1 2 3 4 5 6
one 0 5 3 4 6 2
two 5 4 2 2 3 4
, , score3
1 2 3 4 5 6
one 10 7 3 0 0 0
two 8 6 6 0 0 0
, , score4
1 2 3 4 5 6
one 3 6 7 4 0 0
two 3 5 5 7 0 0
Note that the output array includes scores as "0" if they are missing. Is there any way to stop these missing scores being outputted by acast?
In this case, you might do better sticking to base R's table feature. I'm not sure that you can have an irregular array like you are looking for.
For example:
> lapply(df[-1], function(x) table(df[[1]], x))
$score1
x
1 2 3
one 9 6 5
two 11 4 5
$score2
x
1 2 3 4 5 6
one 2 5 4 3 3 3
two 4 2 2 3 4 5
$score3
x
1 2 3
one 9 5 6
two 4 11 5
$score4
x
1 2 3 4
one 4 4 8 4
two 2 6 5 7
Or, using your "long" data:
with(dfMelt, by(dfMelt, variable,
FUN = function(x) table(x[["subject"]], x[["value"]])))
Since each "score" subset is going to have a different shape, you will not be able to preserve the array structure. One option is to use lists of two-dimensional arrays or data.frames, e.g.:
# your original acast call
res <- acast(dfMelt, subject ~ value ~ variable)
# remove any columns that are all zero
apply(res, 3, function(x) x[, apply(x, 2, sum)!=0] )
Which gives:
$score1
1 2 3
one 7 8 5
two 6 8 6
$score2
1 2 3 4 5 6
one 4 2 6 4 1 3
two 2 5 3 4 3 3
$score3
1 2 3
one 5 10 5
two 5 11 4
$score4
1 2 3 4
one 5 4 4 7
two 4 6 6 4
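The underlying point, that each score needs its own table because the set of observed levels differs, can be sketched in plain Python as follows. The `records` data and the helper name `table_for` are made up for illustration, and this is not R or reshape2 code.

```python
from collections import Counter

# Tiny made-up dataset: score1 and score2 take different sets of values
records = [
    {"subject": "one", "score1": 1, "score2": 5},
    {"subject": "one", "score1": 3, "score2": 2},
    {"subject": "two", "score1": 1, "score2": 6},
    {"subject": "two", "score1": 1, "score2": 2},
]

def table_for(records, score):
    """Cross-tabulate subject by the levels actually observed for one score."""
    counts = Counter((rec["subject"], rec[score]) for rec in records)
    levels = sorted({rec[score] for rec in records})
    subjects = sorted({rec["subject"] for rec in records})
    # Counter returns 0 for (subject, level) pairs that never occur
    return {s: {lvl: counts[(s, lvl)] for lvl in levels} for s in subjects}

# One table per score, each with its own columns
tables = {score: table_for(records, score) for score in ("score1", "score2")}

print(tables["score1"])  # {'one': {1: 1, 3: 1}, 'two': {1: 2, 3: 0}}
```

A dictionary of tables plays the role of the R list here: the tables are free to have different numbers of columns, which a single 3D array cannot accommodate.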
