SAS Function that can create every possible combination - arrays

I have a dataset that looks like this.
data test;
input cat1 $ cat2 $ score;
datalines;
A D 1
A D 2
A E 3
A E 4
A F 4
B D 3
B D 2
B E 6
B E 5
B F 6
C D 8
C D 5
C E 4
C E 12
C E 2
C F 7
;
run;
I want to create tables based off of this table that are summarized forms of this data. For example, I want one table that sums every score for every cat1 and cat2 together, like so
proc sql;
create table all as select
'all' as cat1
,'all' as cat2
,sum(score) as score
from test
group by 1,2
;quit;
I want a table that sums all the scores for cat1='A', regardless of what cat2 is, like so
proc sql;
create table a_all as select
cat1
,'all' as cat2
,sum(score) as score
from test
where
cat1='A'
group by 1,2
;quit;
I want a table that sums the score for cat1='A' and cat2='E', like so
proc sql;
create table a_e as select
cat1
,cat2
,sum(score) as score
from test
where
cat1='A'
and
cat2='E'
group by 1,2
;quit;
And so on and so forth. I want a comprehensive set of tables that consists of every possible combination. I can use loops if they are efficient. The problem is that the real data set I'm using has 8 categories (as opposed to the 2 here), and within those categories there are as many as 98 levels. So the loops I've been writing are nested 8 levels deep and take a ton of time. They're a pain to debug too.
Is there some kind of function or a special array I can apply that will create this series of tables I'm talking about? Thanks!

I think you want what PROC SUMMARY does by default.
data test;
input cat1 $ cat2 $ score;
datalines;
A D 1
A D 2
A E 3
A E 4
A F 4
B D 3
B D 2
B E 6
B E 5
B F 6
C D 8
C D 5
C E 4
C E 12
C E 2
C F 7
;
run;
proc print;
run;
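* With CLASS and no NWAY option, PROC SUMMARY outputs every combination of the class variables;
* CHARTYPE stores _TYPE_ as a character string of 0s and 1s that flags which class variables are used;
proc summary data=test chartype;
class cat:;
output out=summary sum(score)=;
run;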
proc print;
run;
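To make clear what "every combination" means here, the following is a minimal Python sketch of the same idea, not a reproduction of PROC SUMMARY itself. The function name `all_way_sums` and the use of the label 'all' for a collapsed category are assumptions made for illustration; the data are the rows from the question.

```python
from itertools import combinations
from collections import defaultdict

# Same rows as the TEST data set in the question: (cat1, cat2, score).
rows = [
    ("A", "D", 1), ("A", "D", 2), ("A", "E", 3), ("A", "E", 4), ("A", "F", 4),
    ("B", "D", 3), ("B", "D", 2), ("B", "E", 6), ("B", "E", 5), ("B", "F", 6),
    ("C", "D", 8), ("C", "D", 5), ("C", "E", 4), ("C", "E", 12), ("C", "E", 2),
    ("C", "F", 7),
]
n_class = 2  # number of class variables (cat1, cat2)


def all_way_sums(rows, n_class):
    """Sum the last field for every subset of the class variables.

    A class variable that is not in the current subset is reported as 'all'
    (a hypothetical label, chosen to match the question's example tables).
    """
    totals = defaultdict(int)
    for size in range(n_class + 1):
        for keep in combinations(range(n_class), size):
            for row in rows:
                key = tuple(row[i] if i in keep else "all" for i in range(n_class))
                totals[key] += row[-1]
    return dict(totals)


sums = all_way_sums(rows, n_class)
print(sums[("all", "all")])  # grand total
print(sums[("A", "all")])    # cat1='A', any cat2
print(sums[("A", "E")])      # cat1='A' and cat2='E'
```

A single pass over the subsets of class variables replaces the nested loops: with 8 categories there are 2**8 = 256 subsets, and no loop nesting is needed.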

Related

Google sheets - multiply two columns, copy third column

This is probably a very simple question, but I don't know how to do it.
I have 3 columns:
A B C
-------
5 1 A
5 2 B
5 3 C
I want to multiply columns A and B, and just copy column C, all with one expression.
The result of multiplying A and B should go in D, and the copy of C should go in F.
The final result should be:
D F
---
5 A
10 B
15 C
Is there any simple method to do something like this?
This should do:
=INDEX({A1:A3*B1:B3, C1:C3})
Or, if you are on a non-English sheet:
=INDEX({A1:A3*B1:B3\ C1:C3})

How to count classes in columns

I'm trying to write a query and I'm having a hard time with one thing. Suppose I have a table that looks like this:
id Sample Species Quantity Group
1  1      AA      5        A
2  1      AB      6        A
3  1      AC      10       A
4  1      CD      15       C
5  1      CE      20       C
6  1      DA      13       D
7  1      DB      7        D
8  1      EA      6        E
9  1      EF      4        E
10 1      EB      2        E
I filtered the table down to a single sample (I have many). Each row has the species, the quantity of that species, and a functional group (there are only five groups, A to E). I would like a query that groups by sample and adds one column per group holding the count of species in that group, something like this:
Sample N_especies Group A Group B Group C Group D Group E
1      10         3       0       2       2       3
So I have to count the species (that's easy), but I don't know how to make the columns for each group. Can anyone help me?
You can use PIVOT:
Select a.Sample,[A],[B],[C],[D],[E], [B]+[A]+[C]+[D]+[E] N_especies from
(select t.Sample,t.Grp from [WS_Database].[dbo].[test1] t) t
PIVOT (
COUNT(t.Grp)
for t.Grp in ([A],[B],[C],[D],[E])
) a
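As a cross-check of the result the question expects, here is a small Python sketch of the same count-per-group pivot. It is an illustration only, not the query above; the function name `pivot_counts` is hypothetical, and it assumes each species appears at most once per sample, so counting rows equals counting species.

```python
from collections import Counter

# (sample, species, group) rows taken from the question's table.
rows = [
    (1, "AA", "A"), (1, "AB", "A"), (1, "AC", "A"),
    (1, "CD", "C"), (1, "CE", "C"),
    (1, "DA", "D"), (1, "DB", "D"),
    (1, "EA", "E"), (1, "EF", "E"), (1, "EB", "E"),
]
groups = ["A", "B", "C", "D", "E"]


def pivot_counts(rows, groups):
    """Return {sample: {'N_especies': n, 'Group A': ..., ...}}."""
    per_sample = {}
    for sample, _species, group in rows:
        per_sample.setdefault(sample, Counter())[group] += 1
    result = {}
    for sample, counts in per_sample.items():
        row = {"N_especies": sum(counts.values())}
        for g in groups:
            row["Group " + g] = counts[g]  # a Counter gives 0 for absent groups
        result[sample] = row
    return result


table = pivot_counts(rows, groups)
print(table[1])
```

Listing the groups explicitly is what produces the zero for Group B, which has no rows in sample 1.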

Rowmax as new column in data table

I have rank scores of countries for different variables.
I would like to create a column with the maximum rank that occurs per row.
Say the data look something like:
A B C D E F G H I ....
V1 1 4 5 3 12 . 6 9 83
V2 . . 4 6 1 4 7 6 32
So A - X are countries. In rows V1 up you have various variables and in the cells you have the rank score relating to the variable.
The problem is that some countries, for whatever reason, don't score on certain variables, perhaps because V1 is not relevant to country C or whatever.
So in the end I'd like something like
A B C D E F G H I .... newv
V1 1 4 5 3 12 . 6 9 83 83
V2 . . 4 6 1 4 7 6 5 6
I think egen newvar=rowmax(A B C D E F G H I…) does what you need. Have a look at the egen help file for more information. (I presume you need value 7 in the second row, not 6?)
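For readers who want to see the row-wise logic spelled out, here is a minimal Python sketch of a maximum that skips missing values, which is how egen's rowmax() treats them. The helper name `row_max` and the use of `None` for a missing cell are assumptions for illustration; the values are columns A to I of the question's second listing.

```python
def row_max(values):
    """Largest non-missing value in a row, or None if every cell is missing."""
    present = [v for v in values if v is not None]
    return max(present) if present else None


# None stands in for the "." missing marker in the question's listing.
v1 = [1, 4, 5, 3, 12, None, 6, 9, 83]
v2 = [None, None, 4, 6, 1, 4, 7, 6, 5]

print(row_max(v1))
print(row_max(v2))
```

With these values the second row's maximum is 7, in line with the remark above.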

Hash Table: Which is the right linear-probing array?

I am studying data structures right now, specifically hash tables. I came across the following question:
Imagine that we have placed the following keys
in an initial empty hash table with a length of 7
with linear probing, using the following table of hash-values:
key: A B C D E F G
hash: 3 1 4 1 5 2 5
Which of the following arrays could be the linear-probing array?
1.
0 1 2 3 4 5 6
G B D F A C E
2.
0 1 2 3 4 5 6
B G D F A C E
3.
0 1 2 3 4 5 6
E G F A B C D
When I create the linear-probing array I get this:
0 1 2 3 4 5 6
G B D A C E F
Could somebody please tell me why I am wrong and what the right answer is?
Notice that the question doesn't specify the order in which the keys are inserted. Your answer is only correct if the keys are inserted in the order A-B-C-D-E-F-G, and since the question doesn't state the order, you need to dig deeper.
What you do know, however, is that one of those keys is inserted first, and because the hash table is initially empty, it goes to its designated slot as shown in the key-to-hash table. This immediately rules out option 2, because none of its keys are in their designated slots, leaving you with options 1 and 3.
For table 1, B is in slot 1, which corresponds to its hash value and for table 3, keys F and A are in their initial hash-value spots.
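It's simple to prove that no sequence of key inserts, even one starting with F or A, will yield table 3: B sits in slot 4, so slots 1 to 3 must have been filled before it; slot 1 holds G (hash 5), which needs slots 5, 6 and 0 filled first; and slot 5 holds C (hash 4), which needs slot 4, that is B, filled first. That is a circular dependency. It's likewise easy to prove that the sequence of key inserts B-D-F-A-C-E-G results in table 1.
Although this is a question based on hash tables, I honestly don't consider it a good way to assess your knowledge of linear probing; it is more of a puzzle, as @gnasher729 mentioned.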
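The reasoning above can also be checked by brute force. The following Python sketch simulates linear probing with the hash values from the question and tries all 7! = 5040 insertion orders against each candidate array. The function names `build` and `reachable` are hypothetical, and arrays are written as 7-letter strings in slot order 0 to 6.

```python
from itertools import permutations

# Hash values given in the question; the table has 7 slots.
HASH = {"A": 3, "B": 1, "C": 4, "D": 1, "E": 5, "F": 2, "G": 5}
SIZE = 7


def build(order):
    """Insert keys in the given order using linear probing; return slots 0..6."""
    table = [None] * SIZE
    for key in order:
        slot = HASH[key]
        while table[slot] is not None:  # probe the next slot, wrapping around
            slot = (slot + 1) % SIZE
        table[slot] = key
    return "".join(table)


def reachable(target):
    """True if some insertion order of the 7 keys produces the target array."""
    return any(build(p) == target for p in permutations(HASH))


print(build("ABCDEFG"))  # the asker's insertion order
print(build("BDFACEG"))  # the order suggested in the answer
for candidate in ("GBDFACE", "BGDFACE", "EGFABCD"):  # options 1, 2, 3
    print(candidate, reachable(candidate))
```

Only option 1 is reachable; the asker's array is what the alphabetical insertion order produces, which is a valid linear-probing array but not one of the listed options.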

create index in SAS using do loop

Say I have a set of data in this format:
ID Product account open date
1 A 20100101
1 B 20100103
2 C 20100104
2 A 20100205
2 D 20100605
3 A 20100101
And I want to create a column to capture the sequence of the products opened so the table will look like this:
ID First Second third
1 A B
2 C A D
3 A
I know I need to create an index for each ID so I can transpose the data afterwards:
ID Product account open date sequence
1 A 20100101 1
1 B 20100103 2
2 C 20100104 1
2 A 20100205 2
2 D 20100605 3
3 A 20100101 1
From my limited knowledge in do loop, I think I need to write something like this:
if first.ID and not last.ID then n=1 do while ID not last n+1
Something like that. Can anyone help me with the exact syntax? I have tried googling for similar codes and haven't had much luck.
Thanks!
I'd sort by ID and then date and use proc transpose for simplicity. Here's an example:
data prod;
input ID Product $ Open_DT :yymmdd8.;
format open_dt date9.;
datalines;
1 A 20100101
1 B 20100103
2 C 20100104
2 A 20100205
2 D 20100605
3 A 20100101
;
run;
proc sort data=prod;
by ID Open_DT;
run;
proc transpose data=prod
out=prod_trans(drop=_name_)
prefix=ITEM;
by id;
var Product;
run;
proc print data=prod_trans noobs;
run;
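The question also asked how to build the per-ID sequence number. As an illustration of that logic (a counter that restarts at 1 for each new ID, after sorting by ID and date), here is a minimal Python sketch; it is not SAS, and the variable names `sequenced` and `wide` are made up for the example.

```python
from itertools import groupby

# (ID, product, open date as yyyymmdd text) rows from the question.
rows = [
    (1, "A", "20100101"), (1, "B", "20100103"),
    (2, "C", "20100104"), (2, "A", "20100205"), (2, "D", "20100605"),
    (3, "A", "20100101"),
]

# Sort by ID, then by date; yyyymmdd strings sort chronologically.
rows.sort(key=lambda r: (r[0], r[2]))

sequenced = []  # long form: (ID, product, open date, sequence)
wide = {}       # wide form: ID -> products in opening order
for id_, group in groupby(rows, key=lambda r: r[0]):
    # enumerate restarts at 1 for every ID, like a counter reset on first.ID
    for seq, (_, product, open_date) in enumerate(group, start=1):
        sequenced.append((id_, product, open_date, seq))
        wide.setdefault(id_, []).append(product)

for row in sequenced:
    print(row)
print(wide)
```

The wide form corresponds to the First/Second/third layout in the question, with shorter lists where an ID has fewer products.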

Resources