Table sort and lookup - arrays

I have an Excel table (25x25) that looks like this:
   C1 C2 C3
R1 5  6  7
R2 1  7  9
R3 2  3  0
my goal is to make it look like this,
C3 R3 0
C1 R2 1
C1 R3 2
C2 R3 3
C1 R1 5
C2 R1 6
C2 R2 7
C3 R1 7
C3 R2 9
That is, a new table ranked by the values in the first, together with the corresponding column and row names. The real table has duplicates, negatives and decimals.
I'm doing this because I'd like to find the 3 closest candidates (and hence the C's and R's) of a given value, and VLOOKUP() requires a sorted table.
Another problem (a step forward) is that VLOOKUP() returns the closest smaller value rather than the value that is actually closest. Is there a better way to do it, or a workaround, so that the result is a neat table like this:
Value to look up = 2.8
>> C2 R3 3
>> C1 R3 2
>> C1 R1 5
For some reason I cannot use VBA for this project. Are there any solutions that use only built-in functions in MS Excel?

If you need to use only native worksheet functions, this can be accomplished, even without array formulas.
With your original data in A1:D4, the formulas in F3:H3 are,
=INDEX(B$1:D$1, AGGREGATE(15, 6, COLUMN($A:$C)/(B$2:D$4=H3), COUNTIF(H$3:H3, H3)))
=INDEX(A$2:A$4, AGGREGATE(15, 6, ROW($1:$3)/(B$2:D$4=H3), COUNTIF(H$3:H3, H3)))
=SMALL(B$2:D$4,ROW(1:1))
Fill down as necessary.
The formulas in K5:N5 are,
=INDEX(B$1:D$1, AGGREGATE(15, 6, COLUMN($A:$C)/(B$2:D$4=M5), COUNTIF(M$5:M5, M5)))
=INDEX(A$2:A$4, AGGREGATE(15, 6, ROW($1:$3)/(B$2:D$4=M5), COUNTIF(M$5:M5, M5)))
=IF(COUNTIF($B$2:$D$4, N5+$K$2)>=COUNTIF(N$5:N5, N5), N5+$K$2, $K$2-N5)
=AGGREGATE(15,6,ABS($B$2:$D$4-$K$2),ROW(1:1))
Fill down as necessary.
I've included enough rows in the K5:N13 matrix that you can see how the two 7 values are handled.
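For readers who want to check the worksheet results outside Excel, here is a minimal Python sketch of the same two steps: flatten the table into (column, row, value) triples ranked by value, then pick the three values nearest to a lookup value. The names cols, rows, table, ranked and closest are made up for this illustration, and ties are broken column by column, which is an assumption chosen to match the order of the two 7s in the question. Note that by absolute distance the third match for 2.8 is 1 (distance 1.8), not 5 (distance 2.2).

```python
# Sample table from the question: column labels, row labels, and values.
cols = ["C1", "C2", "C3"]
rows = ["R1", "R2", "R3"]
table = [
    [5, 6, 7],
    [1, 7, 9],
    [2, 3, 0],
]

# Flatten to (column, row, value) triples, scanning column by column
# (an assumption, so that tied values come out lowest column first).
cells = [
    (cols[j], rows[i], table[i][j])
    for j in range(len(cols))
    for i in range(len(rows))
]

# Rank by value; sorted() is stable, so ties keep the scan order.
ranked = sorted(cells, key=lambda cell: cell[2])


def closest(cells, target, k=3):
    """Return the k cells whose values are nearest to target."""
    return sorted(cells, key=lambda cell: abs(cell[2] - target))[:k]


nearest = closest(cells, 2.8)
for col, row, value in nearest:
    print(col, row, value)
```

Sorting by absolute difference plays the same role as the AGGREGATE(15,6,ABS(...),ROW(1:1)) column in the worksheet version.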

Related

Create a list of combinations of all values from one column with all values from another in Google Sheets

Is there any formulaic way of recursively working through one column and combining it with values from another? For illustration, given...
Column A:
A
B
C
and
Column B:
1
2
3
can I generate...
Column C:
A1
A2
A3
B1
B2
B3
C1
C2
C3
try:
=ARRAYFORMULA(SORT(
TRANSPOSE(SPLIT(REPT(CONCATENATE(A1:A&CHAR(9)), COUNTA(B1:B)), CHAR(9)))&
TRANSPOSE(SPLIT(CONCATENATE(REPT(B1:B&CHAR(9), COUNTA(A1:A))), CHAR(9)))))
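What the formula builds is a Cartesian product of the two columns. As a rough cross-check outside Sheets, the same list can be sketched in Python with itertools.product; the lists letters and numbers are just the sample columns from the question.

```python
from itertools import product

letters = ["A", "B", "C"]  # column A from the question
numbers = [1, 2, 3]        # column B from the question

# product() pairs every letter with every number, first list varying slowest.
combos = [f"{letter}{number}" for letter, number in product(letters, numbers)]
print(combos)
```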

How to select the values greater than the mean in an array?

I want to apply feature selection on a dataset (lung.mat)
After loading the data, I computed the mean Jaccard distance between each feature and the others. Then I sorted the distances in descending order into B1, selected, for example, the top 25 features, and saved the resulting matrix in databs1.
I want to select the features that have distance values greater than the mean of the array (B1).
close all;
clc
load lung.mat
data=lung;
[n,m]=size(data);
for i=1:m-1
for j=i+1:m
t1(i,j)=fjaccard(data(:,i),data(:,j));
b1=sum(t1)/(m-1);
end
end
[B1,indB1]=sort(b1,'descend');
databs1=data(:,indB1(1:25));
databs1=[databs1,data(:,m)]; %jaccard
save('databs1.mat');
I would appreciate advice on how to keep only the values of B1 that are greater than the mean of B1, discarding the smaller ones.
I used this line,
B1(B1>mean(B1(:)))
but after running it, B1 still has as many columns as the full dataset: lung.mat has 57 features, and B1 still has 57 columns.
I expected this line to cut B1 down to the features whose values are greater than the mean of B1.
The general answer to your question is here (this seems clear to you based on your code):
a=randi(10,1,10) %example data
a>mean(a) %get binary matrix of which elements are larger than mean
a(a>mean(a)) %select elements from a that are larger than mean
a =
1 9 10 7 8 8 4 7 2 8
ans =
1×10 logical array
0 1 1 1 1 1 0 1 0 1
ans =
9 10 7 8 8 7 8
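One point worth spelling out: an indexing expression like a(a>mean(a)) only returns the selected elements, it does not shrink the original variable. To actually cut B1 down, the result has to be assigned back, e.g. B1 = B1(B1>mean(B1(:))). The same idea as a minimal Python sketch, using the example data from the answer (the names b1, threshold and above are chosen for this illustration):

```python
from statistics import mean

b1 = [1, 9, 10, 7, 8, 8, 4, 7, 2, 8]  # example data from the answer
threshold = mean(b1)

# Building the filtered list leaves b1 itself untouched ...
above = [x for x in b1 if x > threshold]
print(len(b1), len(above))

# ... so rebind the name if the shortened list should replace the original.
b1 = above
print(b1)
```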

Summing up multiple variable scores depending on their score

tl;dr: I need to first dichotomize a set of variables to 0/1, then sum up these values. I need to do this for 14x8 variables, so I am looking for a way to do this in a loop.
Hi guys,
I have a very specific problem I need your help with:
Description of problem:
In my dataset I have 14 sets of 8 variables each (e.g. a1 to a8, b1 to b8, c1 to c8, etc.) with scores ranging from 1 to 6. Note that the variables are non-contiguous, with string variables in between them (which I need for a different purpose).
I now want to compute scores for each set of these variables (e.g. scoreA, scoreB, scoreC). The score should be computed according to the following rule:
scoreA = 0.
If a1 > 1 then increment scoreA by 1.
If a2 > 1 then increment scoreA by 1.
... etc.
Example:
Dataset:
1 5 6 3 2 1 1 5
1 1 1 3 4 6 2 3
scores:
5
5
My previous attempts:
I know I could do this task by first recoding the variables to dichotomize them, and then sum up these values. This has two large drawbacks for me: Firstly it creates a lot of new variables which I don't need. Secondly it is a very tedious and repetitive task since I have multiple sets of variables (which have different variable names) with which I need to do the same task.
I took a look at the DO REPEAT and LOOP with VECTOR commands, but I seem to not fully understand how they work. I was not able to transfer solutions from other examples I read online to my problem.
I would be happy with a solution that only loops through one set of variables and does the task, then I would adjust the syntax appropriately for my other 13 sets of variables. Hope you can help me out.
See two solutions: one loops over each of the sets, the second is a macro which loops over a list of sets:
* creating some sample data.
DATA LIST list/a1 to a8 b1 to b8 c1 to c8 hello1 to hello8.
BEGIN DATA
1 1 1 1 1 1 1 2 1 1 1 1 1 1 2 2 1 1 1 1 1 3 3 3 1 1 1 1 4 4 4 4
1 1 1 1 2 3 4 5 1 1 1 2 3 4 1 0 0 0 0 0 1 2 1 2 3 2 1 2 3 2 1 6
END DATA.
* solution 1: a loop for each set (example for sets a, b and c).
compute scoreA=0.
compute scoreB=0.
compute scoreC=0.
do repeat
a=a1 a2 a3 a4 a5 a6 a7 a8
/b=b1 b2 b3 b4 b5 b6 b7 b8
/c=c1 c2 c3 c4 c5 c6 c7 c8./* if variable names are consecutive replace with "a1 to a8" etc'.
compute scoreA=scoreA+(a>1).
compute scoreB=scoreB+(b>1).
compute scoreC=scoreC+(c>1).
end repeat.
execute.
Doing this for 14 different sets is no fun, so assuming your sets are always named $1 to $8, you can use the following macro:
define DoSets (SetList=!cmdend)
!do !set !in (!SetList)
compute !concat("Score_",!set)=0.
do repeat !set=!concat(!set,"1") !concat(!set,"2") !concat(!set,"3") !concat(!set,"4") !concat(!set,"5") !concat(!set,"6") !concat(!set,"7") !concat(!set,"8").
compute !concat("Score_",!set)=!concat("Score_",!set)+(!set>1).
end repeat.
!doend
execute.
!enddefine.
* now call the macro and list all set names.
DoSets SetList= a b c hello.
The do repeat loop above works perfectly, but with a lot of sets of variables, it would be tedious to create. Using Python programmability, this can be generated automatically without regard to the variable order. The code below assumes an unlimited number of variables with names of the form lowercase letter digit that occur in sets of 8 and generates and runs the do repeat. For simplicity it generates one loop for each output variable, but these will all be executed on a single data pass. If the name pattern is different, this code could be adjusted if you say what it is.
begin program.
import spss, spssaux
vars = sorted(spssaux.VariableDict(pattern="[a-z]\d").variables)
cmd = """compute %(score)s = 0.
do repeat index = %(vlist)s.
compute %(score)s = %(score)s + (index > 1).
end repeat."""
if len(vars) % 8 != 0:
    raise ValueError("Number of input variables not a multiple of 8")
for v in range(0, len(vars), 8):
    score = "score" + vars[v][0]
    vlist = " ".join(vars[v:v+8])
    spss.Submit(cmd % locals())
end program.
execute.
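To sanity-check the scoring rule itself outside SPSS, here is a small standalone Python sketch. It relies on the same trick as compute scoreA=scoreA+(a>1): a comparison evaluates to 1 or 0, so summing the comparisons counts the values above the cutoff. The function name score and the threshold parameter are made up for this illustration; the two cases are the example rows from the question.

```python
def score(values, threshold=1):
    """Count how many values exceed threshold (True counts as 1)."""
    return sum(value > threshold for value in values)


# The two example cases from the question, one set of 8 variables each.
cases = [
    [1, 5, 6, 3, 2, 1, 1, 5],
    [1, 1, 1, 3, 4, 6, 2, 3],
]
scores = [score(case) for case in cases]
print(scores)
```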

Switching row-major to column-major dimensions

I am reading row-major data into R as a vector. R interprets this as column-major data, and as far as I can see there is no way to tell array to behave in a row-major way.
Let's say I have:
array(1:12, c(3,2,2),
dimnames=list(c("r1", "r2", "r3"), c("c1", "c2"),c("t1", "t2"))
)
Which gives:
, , t1
c1 c2
r1 1 4
r2 2 5
r3 3 6
, , t2
c1 c2
r1 7 10
r2 8 11
r3 9 12
I want to transform this data to a row-major array:
, , t1
c1 c2
r1 1 2
r2 3 4
r3 5 6
, , t2
c1 c2
r1 7 8
r2 9 10
r3 11 12
Assuming that your array is in a, i.e. that you already have this array and can't change it at read time, then the following will work:
a <- array(1:12, c(3,2,2),
dimnames=list(c("r1", "r2", "r3"), c("c1", "c2"),c("t1", "t2")))
b <- aperm(array(a, dim = c(2,3,2),
dimnames = dimnames(a)[2:1]),
perm = c(2,1,3))
b
> b
, , 1
c1 c2
r1 1 2
r2 3 4
r3 5 6
, , 2
c1 c2
r1 7 8
r2 9 10
r3 11 12
The solution:
aperm(array(1:12, c(2,3,2),
dimnames=list(c("c1","c2"),c("r1","r2","r3"),c("t1","t2"))),
perm=c(2,1,3)
)
Note that aperm switches the dimensions. So essentially columns are switched with rows. In addition I needed to change the order of columns and rows in dimnames.
It produces exactly what is needed:
, , t1
c1 c2
r1 1 2
r2 3 4
r3 5 6
, , t2
c1 c2
r1 7 8
r2 9 10
r3 11 12
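To make the difference in fill order concrete outside R, here is a small Python sketch that splits a flat row-major sequence into slices, filling each row before moving to the next. The helper fill_row_major is hypothetical, written only for this illustration; it mirrors the effect of building the array with the first two dimensions swapped and then permuting them back with aperm.

```python
def fill_row_major(values, n_rows, n_cols, n_slices):
    """Split a flat row-major sequence into n_slices matrices of n_rows x n_cols."""
    size = n_rows * n_cols
    slices = []
    for s in range(n_slices):
        block = values[s * size:(s + 1) * size]
        # Each row takes the next n_cols consecutive values.
        slices.append([block[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)])
    return slices


data = list(range(1, 13))  # 1..12, as in the question
t1, t2 = fill_row_major(data, n_rows=3, n_cols=2, n_slices=2)
print(t1)
print(t2)
```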

Need a formula for cumulative moving averages in Open Office

I have data in column A, and would like to put the averages in column B like this:
   a   b
1  10  10
2  7   8.5
3  8   8.333
4  19  11
5  13  11.4
where b1 =average(a1), b2 =average(a1:a2), b3 =average(a1:a3)....
Using average() is alright for small amounts of data, but I have over 1500 data entries. I would like to find a more efficient way of doing this.
Make your initial range reference absolute, while the other is relative, i.e.:
b4 = average($a$1:a4)
You can paste that 1500 times and it will always increment the end of the range while keeping the beginning pinned to A1 due to the dollar signs in that reference.
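As a cross-check on the expected values, the same cumulative averages can be sketched in Python with a running total, so each step divides the sum so far by the number of entries seen. The list data holds the five sample values from the question; for all five the total is 57, giving 57/5 = 11.4.

```python
from itertools import accumulate

data = [10, 7, 8, 19, 13]  # sample values from column A

# accumulate() yields running totals; dividing by the count gives the
# cumulative average, the equivalent of AVERAGE($A$1:A1), AVERAGE($A$1:A2), ...
running = [total / count for count, total in enumerate(accumulate(data), start=1)]
print([round(value, 3) for value in running])
```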
