Conditional array to calculate percentiles

I have some data as follows:
val crit perc
0.415605498 1 perc1
0.475426007 1 perc1
0.418621318 1 perc1
0.51608229 1 perc1
0.452307882 1 perc1
0.496691416 1 perc1
0.402689126 1 perc1
0.494381345 1 perc1
0.532406777 1 perc1
0.839352016 2 perc2
0.618221702 2 perc2
0.83947033 2 perc2
0.621734007 2 perc2
0.548656662 2 perc2
0.711919796 2 perc2
0.758178085 2 perc2
0.820954467 2 perc2
0.478645786 2 perc2
0.848323655 2 perc2
0.844986383 2 perc2
0.418155292 2 perc2
1.182637063 3 perc3
1.248876472 3 perc3
1.218368809 3 perc3
0.664934398 3 perc3
0.951692853 3 perc3
0.848111264 3 perc3
0.58887439 3 perc3
0.931530464 3 perc3
0.676314176 3 perc3
1.270797783 3 perc3
I'm trying to use the PERCENTILE.INC() function to calculate the 5th percentile for each level of crit (since I have categorized the variable val into classes).
I've tried {=PERCENTILE.INC(IF($B$2:$B$32=1,$A$2:$A$32,IF($B$2:$B$32=2,$A$2:$A$32,IF($B$2:$B$32=3,$A$2:$A$32,""))),0.05)}, but it just calculates the percentile for the whole array instead of returning the conditional percentiles.
Any help would be most welcome (and FYI, I've got to do this on 26000 rows with 20 levels of crit)!

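This worked for me. Your nested IF returns the value from column A whenever crit is 1, 2, or 3, which is every row, so the percentile is always taken over the whole column. Test against one level at a time instead.
With the values in column A, the crit levels in column B, and the level to test in F3, I used the following formula in G3:
=PERCENTILE.INC(IF(B:B=F3,A:A),0.05)
This is an array formula, so enter it with Ctrl+Shift+Enter.
Fill down as needed, one row per crit level.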
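For readers who want to cross-check the spreadsheet result outside Excel, here is a minimal Python sketch of the same idea: filter by one crit level at a time, then take the percentile. The helper names (percentile_inc, percentile_by_group) are hypothetical, and the interpolation rule (rank = p * (n - 1) on the sorted values) is my understanding of what PERCENTILE.INC computes, not something stated in the thread. Only a subset of the posted rows is used.

```python
from collections import defaultdict


def percentile_inc(values, p):
    """Inclusive percentile with linear interpolation at rank p * (n - 1).

    Assumption: this mirrors Excel's PERCENTILE.INC definition.
    """
    data = sorted(values)
    if not data or not 0 <= p <= 1:
        raise ValueError("need at least one value and 0 <= p <= 1")
    pos = p * (len(data) - 1)      # fractional zero-based rank
    lower = int(pos)               # index of the value just below the rank
    frac = pos - lower             # how far to move towards the next value
    if lower + 1 < len(data):
        return data[lower] + frac * (data[lower + 1] - data[lower])
    return data[lower]


def percentile_by_group(rows, p):
    """Group (val, crit) pairs by crit and compute one percentile per group."""
    groups = defaultdict(list)
    for val, crit in rows:
        groups[crit].append(val)
    return {crit: percentile_inc(vals, p) for crit, vals in groups.items()}


# A few (val, crit) rows taken from the question's data.
rows = [
    (0.415605498, 1), (0.475426007, 1), (0.418621318, 1),
    (0.839352016, 2), (0.618221702, 2), (0.83947033, 2),
]
result = percentile_by_group(rows, 0.05)
print(result)
```

Grouping once and computing every level in a single pass avoids rescanning the full column for each level, which may matter with 26000 rows and 20 levels.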

Error: Error in dimnames(x) <- dn : length of 'dimnames' [2] not equal to array extent

I have the following dataset about the choices of different car brands and their attributes. I would like to create a matrix based on each attribute of the cars.
RespNum Task Concept Make Exterior.Design Interior.design
1 100086500 1 1 3 2 3
2 100086500 1 2 1 3 2
3 100086500 1 3 4 1 1
4 100086500 1 4 0 0 0
5 100086500 2 1 1 3 2
6 100086500 2 2 5 1 3
Driving.performance Driving.attributes Comfort Practibility Safety
1 1 1 1 3 3
2 3 3 3 2 1
3 2 2 2 1 2
4 0 0 0 0 0
5 3 2 1 1 3
6 1 3 3 3 2
Quality Equipment Sustainability Economy Price Response
1 2 1 1 3 1 0
2 1 3 3 1 3 0
3 3 2 2 2 2 1
4 0 0 0 0 0 0
5 3 2 1 1 4 0
6 1 3 3 3 8 0
I am using the function:
Make = attribcoding(6,4,'Other')
The first input (6) is the number of levels, the second (4) is the column position in the dataset, and the last ('Other') is the name of the outside option. However, I get the following error message:
Error in dimnames(x) <- dn :
length of 'dimnames' [2] not equal to array extent

Nested for-loop: error variable already defined

I have a nested loop in Stata with four levels of foreach statements. With this loop, I am trying to create a new variable named strata that ranges from 1 to 40.
foreach x in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 {
foreach r in 1 2 3 4 5 {
foreach s in 1 2 {
foreach a in 1 2 3 4 {
gen strata= `x' if race==`r' & sex==`s' & age==`a'
}
}
}
}
I get an error:
"variable strata already defined"
Even with the error, the loop does assign strata = 1, but not the rest of the strata. All other cells are missing/empty.
Example data:
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(age sex race)
1 2 2
1 2 1
1 1 1
1 1 1
1 2 1
2 2 1
2 2 1
4 2 1
1 2 1
4 2 1
3 2 1
2 2 1
4 2 1
4 2 2
3 2 1
4 1 3
4 2 1
4 2 1
2 1 2
4 2 1
2 2 1
3 2 1
3 2 1
1 2 3
4 2 1
1 2 5
4 2 1
4 2 1
4 2 2
4 2 1
2 2 1
4 1 1
3 2 1
1 2 1
2 2 1
4 2 1
1 2 2
2 2 3
1 1 3
4 2 1
2 2 3
1 2 1
1 1 1
2 2 3
1 2 1
1 1 3
1 2 1
2 2 1
3 2 1
1 2 1
4 2 1
1 2 2
1 2 1
2 2 1
4 2 1
4 2 1
1 2 1
1 2 1
4 2 1
2 2 1
4 2 1
1 2 1
1 1 3
2 2 1
1 1 1
4 1 1
3 2 1
2 2 1
1 2 1
1 1 1
2 2 3
4 2 2
2 2 1
2 2 1
3 2 1
2 2 2
3 2 1
2 1 1
1 1 1
3 2 1
1 2 3
4 2 1
4 2 1
2 2 1
1 2 1
1 1 1
3 2 1
4 2 1
2 2 3
1 2 3
4 2 1
3 2 1
2 2 1
4 2 1
3 2 1
2 1 1
1 2 1
2 2 1
2 2 3
1 1 1
end
label values sex sex
label def sex 1 "male (1)", modify
label def sex 2 "female (2)", modify
label values race race
label def race 1 "non-Hispanic white (1)", modify
label def race 2 "black (2)", modify
label def race 3 "AAPI/other (3)", modify
label def race 5 "Hispanic (5)", modify

generate is for generating new variables. The second time your code reaches a generate statement, the code fails for the reason given.
One answer is that you need to generate your variable outside the loops and then replace inside.
For other reasons your code can be rewritten in stages.
First, integer sequences can be more easily and efficiently specified with forvalues, which can be abbreviated: I tend to write forval.
gen strata = .
forval x = 1/40 {
    forval r = 1/5 {
        forval s = 1/2 {
            forval a = 1/4 {
                replace strata = `x' if race==`r' & sex==`s' & age==`a'
            }
        }
    }
}
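Second, the code is flawed anyway. Everything ends up as 40! For each value of x, the three inner loops visit every combination of race, sex and age, so every observation is overwritten with the current x, and the last x is 40.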
Third, you can do allocations much more directly, say by
gen strata = 8 * (race - 1) + 4 * (sex - 1) + age
This is a self-contained reproducible demonstration:
clear
set obs 5
gen race = _n
expand 2
bysort race : gen sex = _n
expand 4
bysort race sex : gen age = _n
gen strata = 8 * (race - 1) + 4 * (sex - 1) + age
isid strata
Clearly you can and should vary the recipe for a different preferred scheme.
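The two points above (the loop assigns 40 everywhere, while the formula gives each cell its own stratum) can be checked with a small Python sketch. It is an illustration only, assuming race runs 1 to 5, sex 1 to 2 and age 1 to 4 as in the loops; the function names are hypothetical.

```python
from itertools import product

RACES = range(1, 6)   # 1..5
SEXES = range(1, 3)   # 1..2
AGES = range(1, 5)    # 1..4


def strata_from_loops():
    """Mimic the nested loops: each x overwrites every (race, sex, age) cell."""
    strata = {}
    for x in range(1, 41):
        for cell in product(RACES, SEXES, AGES):
            strata[cell] = x   # the last x (40) wins for every cell
    return strata


def strata_from_formula():
    """Direct allocation: 8 * (race - 1) + 4 * (sex - 1) + age."""
    return {
        (r, s, a): 8 * (r - 1) + 4 * (s - 1) + a
        for r, s, a in product(RACES, SEXES, AGES)
    }


looped = strata_from_loops()
direct = strata_from_formula()
print(set(looped.values()))             # a single value
print(sorted(direct.values()) == list(range(1, 41)))
```

The multipliers 8 and 4 are the number of sex-by-age cells per race and the number of age cells per sex, so a different number of levels needs different multipliers.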

How to remove reverse rows in a permutation matrix?

I'm looking for a quick way in MATLAB to do the following:
Given a permutation matrix of a vector, say [1, 2, 3], I would like to remove all duplicate reverse rows.
So the matrix P = perms([1, 2, 3])
3 2 1
3 1 2
2 3 1
2 1 3
1 3 2
1 2 3
becomes
3 2 1
3 1 2
2 3 1

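Notice that a row and its reverse swap their first and last elements, so exactly one row of each pair has a first element larger than its last. Keeping only those rows removes the reverse duplicates:
n = 4; % row size
x = perms(1:n) % all permutations
p = x(x(:,1)>x(:,n),:) % one row per reverse pair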
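You can also notice that the number of rows in p is size(x,1)/2, i.e. n!/2, which follows a known OEIS sequence. Since perms outputs the permutations in reverse lexicographic order, it is tempting to simply take the first half of x, but that shortcut only holds for n <= 3: for n = 4 the rows 4 1 2 3 and 3 2 1 4 are reverses of each other and both fall in the first half. Use the count as a sanity check instead:
n = 4; % row size
x = perms(1:n); % all permutations
p = x(x(:,1)>x(:,n),:); % one row per reverse pair
assert(size(p,1) == size(x,1)/2) % n!/2 rows remain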
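As a language-neutral check of the reasoning in this answer, the following Python sketch applies the "first element larger than last" filter and also tests whether the first half of the reverse lexicographic listing is free of reverse pairs. The function names are hypothetical, and itertools.permutations over a descending input is used here only to reproduce the row order shown in the question; it is not MATLAB's perms.

```python
from itertools import permutations


def perms_reverse_lex(n):
    """All permutations of 1..n in reverse lexicographic order."""
    return list(permutations(range(n, 0, -1)))


def drop_reverse_duplicates(rows):
    """Keep the row of each reverse pair whose first element exceeds its last."""
    return [row for row in rows if row[0] > row[-1]]


def has_reverse_pair(rows):
    """True if some row and its reverse are both present.

    Rows of distinct elements (length >= 2) are never their own reverse.
    """
    present = set(rows)
    return any(row[::-1] in present for row in rows)


x = perms_reverse_lex(4)
p = drop_reverse_duplicates(x)
first_half = x[: len(x) // 2]

print(len(p), has_reverse_pair(p))   # filtered rows, and whether any pair survives
print(has_reverse_pair(first_half))  # whether the first-half shortcut leaves a pair
```

For n = 4 the filter leaves 12 rows with no reverse pair, whereas the first half of the listing still contains (4, 1, 2, 3) together with (3, 2, 1, 4).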

You can use MATLAB's fliplr function to flip the array left to right, and then use ismember to locate each row of P in the flipped version. Finally, iterate over all locations and, for each row that is still kept, mark its reverse for removal.
Here's some code (tested with Octave 5.2.0 and MATLAB Online):
a = [1, 2, 3];
P = perms(a)
% For each row of P, find the index of the matching row in fliplr(P), i.e. the index of its reverse in P.
[~, Locb] = ismember(P, fliplr(P), 'rows');
% Set up logical vector to store indices to take from P.
n = length(Locb);
idx = true(n, 1);
% Iterate all locations and set already found row to false.
for I = 1:n
  if (idx(I))
    idx(Locb(I)) = false;
  end
end
% Generate result matrix.
P_star = P(idx, :)
Your example:
P =
3 2 1
3 1 2
2 3 1
2 1 3
1 3 2
1 2 3
P_star =
3 2 1
3 1 2
2 3 1
Added 4 to the example:
P =
4 3 2 1
4 3 1 2
4 2 3 1
4 2 1 3
4 1 3 2
4 1 2 3
3 4 2 1
3 4 1 2
3 2 4 1
3 2 1 4
3 1 4 2
3 1 2 4
2 4 3 1
2 4 1 3
2 3 4 1
2 3 1 4
2 1 4 3
2 1 3 4
1 4 3 2
1 4 2 3
1 3 4 2
1 3 2 4
1 2 4 3
1 2 3 4
P_star =
4 3 2 1
4 3 1 2
4 2 3 1
4 2 1 3
4 1 3 2
4 1 2 3
3 4 2 1
3 4 1 2
3 2 4 1
3 1 4 2
2 4 3 1
2 3 4 1
As requested in your question (at least as I understand it), rows are kept in top-to-bottom order: the first row of each reverse pair stays and the later one is dropped.

Here's another approach:
result = P(all(~triu(~pdist2(P,P(:,end:-1:1)))),:);
pdist2 computes the pairwise distances between rows of P and rows of P(:,end:-1:1), which is P with its columns reversed; a distance of zero means one row is the reverse of the other. (pdist2 requires the Statistics and Machine Learning Toolbox.)
~ negates the result, so that true corresponds to coincident pairs.
triu keeps only the upper triangular part of the matrix, so that only one of the two rows of a coincident pair is removed.
~ negates back, so that true corresponds to non-coincident pairs.
all gives a row vector with true for rows that should be kept (because they are not the reverse of any previous row).
This is used as a logical index to select rows of P.
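The rule described in the steps above (keep a row unless an earlier kept row is its reverse) can also be sketched without a pairwise distance matrix. This Python version is an illustration with a hypothetical function name; it swaps the distance computation for a set lookup, so it is a different technique that aims at the same result.

```python
from itertools import permutations


def keep_first_of_each_reverse_pair(rows):
    """Keep rows in order, skipping any row whose reverse was already kept."""
    kept = []
    kept_set = set()
    for row in rows:
        row = tuple(row)
        if row[::-1] in kept_set:
            continue            # its reverse appeared earlier, so drop this row
        kept.append(row)
        kept_set.add(row)
    return kept


# Same row order as P in the question.
P = list(permutations((3, 2, 1)))
P_star = keep_first_of_each_reverse_pair(P)
print(P_star)
```

The set lookup keeps the work linear in the number of rows, whereas a full distance matrix grows quadratically, which may matter because the number of permutations grows as n!.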

Unique Columns Across an Array?

I have an array structured like so:
a = [1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 4 4 4 4 5 5 5 5;
1 1 1 1 2 2 2 2 2 2 2 1 1 1 1 2 2 3 3 1 1 1 2 3 4 4 4 1 1 1 1 2 2 3 3];
Essentially, it's a 2-by-n matrix (I reduced the number of columns here for simplicity's sake) with no real pattern. I want to find the number of unique columns. In this simplified example I can count by hand (though it takes a while) and see that my matrix of unique columns, b, is:
b= 1 1 2 2 2 3 3 3 3 4 5 5
1 2 1 2 3 1 2 3 4 1 2 3
In MATLAB, I can do something like
size(b,2)
To get the number of unique columns. In this example
size(b,2) = 12
My question is, how do I go from matrix a to matrix b so that I can do this computationally for very large n dimensional matrices that I have?

Use unique:
a = [1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 4 4 4 4 5 5 5 5;
1 1 1 1 2 2 2 2 2 2 2 1 1 1 1 2 2 3 3 1 1 1 2 3 4 4 4 1 1 1 1 2 2 3 3];
% Transpose to leverage the rows flag, then transpose back
b = unique(a.', 'rows').';
Which returns:
b =
1 1 2 2 2 3 3 3 3 4 5 5
1 2 1 2 3 1 2 3 4 1 2 3
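The transpose, unique rows, transpose-back pattern can be mirrored in plain Python as a rough cross-check. This is a sketch with a hypothetical function name (unique_columns), using a set of column tuples rather than MATLAB's unique; like unique with the rows flag, it returns the columns in sorted order.

```python
def unique_columns(matrix):
    """Return the distinct columns of a row-major matrix, sorted, as rows."""
    columns = set(zip(*matrix))                   # each column becomes a tuple
    return [list(row) for row in zip(*sorted(columns))]


a = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2,
     3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5],
    [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 3, 3,
     1, 1, 1, 2, 3, 4, 4, 4, 1, 1, 1, 1, 2, 2, 3, 3],
]
b = unique_columns(a)
print(b)
print(len(b[0]))   # number of unique columns
```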

Matching two matrices depending on diagonal elements of two matrices in matlab

Below are two adjacency matrices. I have to find which row of matrix1 corresponds to which row of matrix2 based on the diagonal values. In the example below (matrix1 row = matrix2 row):
1st row=1st row(diagonal value=4)
2nd row=5th row(diagonal value=5)
3rd row=4th row(diagonal value=1)
4th row=2nd row(diagonal value=3)
5th row=3rd row(diagonal value=2)
matrix1 =
4 4 1 3 2
4 5 1 3 2
1 1 1 1 1
3 3 1 3 2
2 2 1 2 2
matrix2 =
4 3 2 1 4
3 3 2 1 3
2 2 2 1 2
1 1 1 1 1
4 3 2 1 5
How can this be done in MATLAB?

Use the second output of ismember:
[~, result] = ismember(diag(matrix1), diag(matrix2))
In your example, this returns
result =
1
5
4
2
3
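To make the lookup explicit, here is a small Python sketch of the same matching: for each diagonal value of the first matrix, find the 1-based position of that value on the diagonal of the second. The function names are hypothetical; returning the first match and 0 for a missing value is an assumption chosen to resemble the second output of ismember.

```python
def diagonal(matrix):
    """Main diagonal of a square matrix given as a list of rows."""
    return [matrix[i][i] for i in range(len(matrix))]


def match_rows_by_diagonal(m1, m2):
    """For each row of m1, the 1-based row of m2 with the same diagonal value."""
    first_index = {}
    for i, value in enumerate(diagonal(m2), start=1):
        first_index.setdefault(value, i)      # remember the first occurrence only
    return [first_index.get(value, 0) for value in diagonal(m1)]  # 0 = no match


matrix1 = [
    [4, 4, 1, 3, 2],
    [4, 5, 1, 3, 2],
    [1, 1, 1, 1, 1],
    [3, 3, 1, 3, 2],
    [2, 2, 1, 2, 2],
]
matrix2 = [
    [4, 3, 2, 1, 4],
    [3, 3, 2, 1, 3],
    [2, 2, 2, 1, 2],
    [1, 1, 1, 1, 1],
    [4, 3, 2, 1, 5],
]
result = match_rows_by_diagonal(matrix1, matrix2)
print(result)
```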

Assuming mat1 and mat2 to be the first and second matrices respectively and that you are looking to find the first match of diagonal values, try this:
[~,ind] = max(bsxfun(@eq,diag(mat2),diag(mat1)'))
or
[~,ind] = max(bsxfun(@eq,diag(mat1),diag(mat2)'),[],2)
If you are certain that there are always unique matches, you can use find too:
[ind,~] = find(bsxfun(@eq,diag(mat2),diag(mat1)'))