Help needed building a crosstab query in PostgreSQL

I have two tables.
Table1:
Label Date CT
A 2014-01-01 19
A 2014-02-01 10
A 2014-03-01 19
A 2014-04-01 18
B 2014-01-01 20
B 2014-02-01 16
B 2014-03-01 14
B 2014-04-01 16
C 2014-01-01 13
C 2014-02-01 12
C 2014-03-01 19
C 2014-04-01 14
Table2:
Label Date CT
D 2014-01-01 19
D 2014-02-01 10
D 2014-03-01 19
D 2014-04-01 18
E 2014-01-01 20
E 2014-02-01 16
E 2014-03-01 14
E 2014-04-01 16
F 2014-01-01 13
F 2014-02-01 12
F 2014-03-01 19
F 2014-04-01 14
Desired output:
Label Jan'14 Feb'14 Mar'14 Apr'14 Total
A 19 10 19 18 66
B 20 16 14 16 66
C 13 12 19 14 58
D 19 10 19 18 66
E 20 16 14 16 66
F 13 12 19 14 58
I'm new to PostgreSQL.
I want to take the unique values of the Label column from both tables,
and show each label's monthly CT values together with their total.
I can combine both tables in a straightforward way using UNION ALL,
but that doesn't give me the desired output, i.e. a pivot-like view.
I searched on Google, but nothing helped.
I came across this on SO and I'm still trying it out,
but I honestly don't know whether it can be done or not.
Can someone help me get the desired output?
Thanks in advance!

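Try something like this. It needs the tablefunc extension (which provides crosstab()) and assumes the tables are named table1 and table2, with columns label, date and ct:
create extension if not exists tablefunc;
select *, ("Jan'14" + "Feb'14" + "Mar'14" + "Apr'14") as total
from crosstab($$
select label, to_char(date, 'Mon''YY') as mon, ct from table1
union all
select label, to_char(date, 'Mon''YY') as mon, ct from table2
order by 1
$$, $$values ('Jan''14'), ('Feb''14'), ('Mar''14'), ('Apr''14')$$) as pvt
(label text, "Jan'14" integer, "Feb'14" integer, "Mar'14" integer,
"Apr'14" integer)
order by label;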

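To make the crosstab logic concrete outside the database, here is a small Python sketch (not PostgreSQL) of what the query does. It stacks the rows of both tables (like UNION ALL), spreads the CT values across a fixed list of month categories (like crosstab's second argument), and appends a total per label. The names `rows`, `pivot`, `TABLE1`, `TABLE2` and `MONTHS` are illustrative only. One difference to be aware of: this sketch skips missing months when totalling, whereas adding the columns with `+` in SQL yields NULL if any month is NULL.

```python
from collections import defaultdict
from datetime import date


def rows(label, counts):
    """Expand one label's Jan-Apr 2014 counts into (label, date, ct) rows."""
    return [(label, date(2014, month, 1), ct) for month, ct in enumerate(counts, start=1)]


# Sample data copied from the question
TABLE1 = rows("A", [19, 10, 19, 18]) + rows("B", [20, 16, 14, 16]) + rows("C", [13, 12, 19, 14])
TABLE2 = rows("D", [19, 10, 19, 18]) + rows("E", [20, 16, 14, 16]) + rows("F", [13, 12, 19, 14])

# Fixed list of categories, playing the role of crosstab's second query
MONTHS = [date(2014, month, 1) for month in range(1, 5)]


def pivot(source_rows, categories):
    """Return one (label, value per category..., total) tuple per label."""
    cells = defaultdict(dict)
    for label, day, ct in source_rows:  # rows from both tables, like UNION ALL
        cells[label][day] = ct          # a duplicate (label, month) overwrites here
    result = []
    for label in sorted(cells):
        values = [cells[label].get(cat) for cat in categories]  # None if missing
        total = sum(v for v in values if v is not None)
        result.append((label, *values, total))
    return result


header = ["Label"] + [m.strftime("%b'%y") for m in MONTHS] + ["Total"]
for record in [tuple(header)] + pivot(TABLE1 + TABLE2, MONTHS):
    print(*record)
```

Running it prints one line per label, A to F, with the four monthly values and a total, in the same layout as the desired output in the question.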

How to Return Bin Ranges of multiple rows to a New Column with Calculated Bin Values in the Same DataFrame

I have this calculated dataset
calcDF
properties bincount
a 20
b 6
c 15
d 2
e 9
Then I have this main dataframe with the values
rawDF
a b c d e
10 26 49 18 22
21 11 29 56 14
35 41 23 33 3
51 8 12 20 11
27 10 4 29 5
I desperately need a suggestion on how I can output the ranges as a column in calcDF.
My expected result is below
properties bincount properties
a 20 (2,5), (5,8),(8,11)
b 6 (1,2), (2,3)
c 15 (50,70), (70,90), (90, 110), (110, 130)
d 2 (5,10), (10, 15)
e 9 (90, 110), (110, 130)

Issues Regarding SAS

I was working on a homework problem about using arrays and loops to create a new variable holding the date on which the maximum blood lead value was obtained, but I got stuck. For context, here is the homework problem:
In 1990 a study was done on the blood lead levels of children in Boston. The following variables for twenty-five children from the study have been entered on multiple lines per subject in the file lead_sum2018.txt in a list format:
Line 1
ID Number (numeric, values 1-25)
Date of Birth (mmddyy8. format)
Day of Blood Sample 1 (numeric, initial possible range: -9 to 31)
Month of Blood Sample 1 (numeric, initial possible range: -9 to 12)
Line 2
ID Number (numeric, values 1-25)
Day of Blood Sample 2 (numeric, initial possible range: -9 to 31)
Month of Blood Sample 2 (numeric, initial possible range: -9 to 12)
Line 3
ID Number (numeric, values 1-25)
Day of Blood Sample 3 (numeric, initial possible range: -9 to 31)
Month of Blood Sample 3 (numeric, initial possible range: -9 to 12)
Line 4
ID Number (numeric, values 1-25)
Blood Lead Level Sample 1 (numeric, possible range: 0.01 – 20.00)
Blood Lead Level Sample 2 (numeric, possible range: 0.01 – 20.00)
Blood Lead Level Sample 3 (numeric, possible range: 0.01 – 20.00)
Sex (character, ‘M’ or ‘F’)
All blood samples were drawn in 1990. However, during data entry the order of blood samples was scrambled, so the first blood sample in the data file (blood sample 1) may not correspond to the first blood sample taken on a subject; it could be the first, second or third. In addition, some of the months and days of blood sampling were not written on the forms. At data entry, missing month and missing day values were each coded as -9.
The team of investigators for this project has made the following decisions regarding the missing values. Any missing days are to be set equal to 15, and any missing months are to be set equal to 6. Any analyses that are done on this data set need to follow those decisions. Be sure to implement the SAS syntax as indicated for each question. For example, use SAS arrays and loops if the item states that these must be used.
Here is the data that the HW references (it is in list format and was contained in a separate file called lead_sum2018.txt):
1 04/30/78 6 10
1 -9 7
1 14 1
1 1.62 1.35 1.47 F
2 05/19/79 27 11
2 20 -9
2 5 6
2 1.71 1.31 1.76 F
3 01/03/80 11 7
3 6 6
3 27 2
3 3.24 3.4 3.83 M
4 08/01/80 5 12
4 28 -9
4 3 4
4 3.1 3.69 3.27 M
5 12/26/80 21 5
5 3 7
5 -9 12
5 4.35 4.79 5.14 M
6 06/20/81 7 10
6 11 3
6 22 1
6 1.24 1.16 0.71 F
7 06/22/81 19 6
7 3 12
7 29 8
7 3.1 3.21 3.58 F
8 05/24/82 26 7
8 31 1
8 9 10
8 2.99 2.37 2.4 M
9 10/11/82 2 7
9 25 5
9 28 3
9 2.4 1.96 2.71 F
10 . 10 8
10 30 12
10 28 2
10 2.72 2.87 1.97 F
11 11/16/83 19 4
11 15 11
11 7 -9
11 4.8 4.5 4.96 M
12 03/02/84 17 6
12 11 2
12 17 11
12 2.38 2.6 2.88 F
13 04/19/84 2 12
13 -9 6
13 1 7
13 1.99 1.20 1.21 M
14 02/07/85 4 5
14 17 5
14 21 11
14 1.61 1.93 2.32 F
15 07/06/85 5 2
15 16 1
15 14 6
15 3.93 4 4.08 M
16 09/10/85 12 10
16 11 -9
16 23 6
16 3.29 2.88 2.97 M
17 11/05/85 12 7
17 18 1
17 11 11
17 1.31 0.98 1.04 F
18 12/07/85 16 2
18 18 4
18 -9 6
18 2.56 2.78 2.88 M
19 03/02/86 19 4
19 11 3
19 19 2
19 0.79 0.68 0.72 M
20 08/19/86 21 5
20 15 12
20 -9 4
20 0.66 1.15 1.42 F
21 02/22/87 16 12
21 17 9
21 13 4
21 2.92 3.27 3.23 M
22 10/11/87 7 6
22 1 12
22 -9 3
22 1.43 1.42 1.78 F
23 05/12/88 12 2
23 21 4
23 17 12
23 0.55 0.89 1.38 M
24 08/07/88 17 6
24 27 11
24 6 2
24 0.31 0.42 0.15 F
25 01/12/89 4 7
25 15 -9
25 23 1
25 1.69 1.58 1.53 M
A) Input the data and in the data step:
1) make sure that Date of Birth variable is recorded as a SAS date;
2) use SAS arrays and looping to create a SAS date variable for each of the three blood samples and to address the missing data in accordance with the decisions of the investigators. Hint: use a single array and do loop to recode the missing values for day and month, separately, and an array/do loop for creating the SAS date variable;
3) use a SAS function to create a variable for the highest, i.e., maximum, blood lead value for each child;
4) use SAS arrays and looping to identify the date on which this largest value was obtained and create a new variable for the date of the largest blood lead value;
5) determine the age of the child in years when the largest blood lead value was obtained (rounded to two decimal places);
6) create a new variable based on the age of the child in years when the largest lead value was obtained (call it, “agecat”) that takes on three levels: for children less than 4 years old, agecat should equal 1; for children at least 4 years old, but less than 8, agecat should equal 2; and for children at least 8 years of age, agecat should be 3;
7) print out the variables for the date of birth, date of the largest lead level, age at blood sample for the largest blood lead level, agecat, sex, and the largest blood lead level (Only print out these requested variables). All dates should be formatted to use the mmddyy10. format on the output.
The code I used in response to this was:
libname HW3 'C:\Users\johns\Desktop\SAS';
filename HW3new 'C:\Users\johns\Desktop\SAS\lead_sum2018.txt';
data one;
infile HW3new;
informat dob mmddyy8.;
input #1 id dob dbs1 mbs1
#2 dbs2 mbs2
#3 dbs3 mbs3
#4 bls1 bls2 bls3 sex;
array dbs{3} dbs1 dbs2 dbs3;
array mbs{3} mbs1 mbs2 mbs3;
do i=1 to 3;
if dbs{i}=-9 then dbs{i}=15;
end;
do i=4 to 6;
if mbs{i}=-9 then mbs{i}=6;
end;
array date{3} mdy1 mdy2 mdy3;
do i=1 to 3;
date{i}=mdy(mbs{i}, dbs{i}, 1990);
end;
maxbls=max(of bls1-bls3);
array bls{3} bls1 bls2 bls3;
array maxdte{3} maxdte1 maxdte2 maxdte3;
do i=1 to i=3;
if bls{i}=maxbls then maxdte=i;
end;
agemax=maxdte-dob;
ageest=round(agemax/365.25,2);
if agemax=. then agecat=.;
else if agemax < 4 then agecat=1;
else if 4 <= agemax < 8 then agecat=2;
else if agemax ge 8 then agecat=3;
run;
I received this error:
22 maxbls=max(of bls1-bls3);
23 array bls{3} bls1 bls2 bls3;
24 array maxdte{3} maxdte1 maxdte2 maxdte3;
25 do i=1 to i=3;
26 if bls{i}=maxbls then maxdte=i;
ERROR: Illegal reference to the array maxdte.
27 end;
Does anyone have any tips regarding this issue? What did I do wrong? Was I supposed to create an additional array for the date on which the maximum blood lead value was collected? Thanks!
I'm stuck on #4 of Part A, but I included the other parts for context. Thanks!
Edit: I included the data that I had to read into SAS and the name of the file it came from.
Just from looking at the code immediately prior to the error, you have a problem on this line:
if bls{i}=maxbls then maxdte=i;
You are getting the error because you are attempting to assign a value to the array maxdte. Arrays cannot be assigned values like that (unless you are using the deprecated do over syntax...) Instead, choose an element of the array and assign the value to the element. E.g. you could do:
if bls{i}=maxbls then maxdte{1}=i;
Or instead of a literal 1, you could use a variable containing the relevant array index.
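Part 4 asks for a date rather than an index. Along the lines of the answer above, one option is to copy the matching element of the date array into an ordinary variable inside the loop, e.g. `if bls{i}=maxbls then maxdate=date{i};`, where `maxdate` is a new, hypothetical variable name. The Python sketch here (not SAS) walks through steps 2 to 6 for one child so the logic can be checked by hand. The function name `max_lead_summary` and its argument layout are assumptions. As in the question's code, age in years is approximated as days / 365.25.

```python
from datetime import date

MISSING = -9  # code used at data entry for a missing day or month


def max_lead_summary(dob, days, months, levels, year=1990):
    """Return (date of max level, max level, age in years, agecat) for one child.

    Hypothetical helper mirroring steps 2-6 of part A; dob is a datetime.date,
    days/months/levels hold the three samples in file order.
    """
    # Step 2: recode missing values (day -> 15, month -> 6), then build dates
    days = [15 if d == MISSING else d for d in days]
    months = [6 if m == MISSING else m for m in months]
    sample_dates = [date(year, m, d) for d, m in zip(days, months)]

    # Step 3: the highest blood lead value (SAS: max(of bls1-bls3))
    max_level = max(levels)

    # Step 4: loop over the samples and keep the date whose level is the max
    # (the first matching sample wins if two samples tie)
    max_date = None
    for level, sample_date in zip(levels, sample_dates):
        if level == max_level:
            max_date = sample_date
            break

    # Step 5: age in years to two decimals (in SAS, round(x, 0.01);
    # round(x, 2) rounds to the nearest multiple of 2 instead)
    age = round((max_date - dob).days / 365.25, 2)

    # Step 6: age category
    if age < 4:
        agecat = 1
    elif age < 8:
        agecat = 2
    else:
        agecat = 3
    return max_date, max_level, age, agecat
```

For child 1, `max_lead_summary(date(1978, 4, 30), [6, -9, 14], [10, 7, 1], [1.62, 1.35, 1.47])` returns `(date(1990, 10, 6), 1.62, 12.44, 3)`.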
You are not properly handling the ID field on lines #2-4:
input #1 id dob dbs1 mbs1
#2 dbs2 mbs2
#3 dbs3 mbs3
#4 bls1 bls2 bls3 sex;
For example, you need to skip field 1 on lines 2-4, or read the IDs into their own variables, perhaps to check that they are all the same:
input #1 id dob dbs1 mbs1
#2 id2 dbs2 mbs2
#3 id3 dbs3 mbs3
#4 id4 bls1 bls2 bls3 sex;
This example shows how to check that you have 4 lines with the same ID and, if you do, read the rest of the variables; otherwise it executes LOSTCARD. ID 3 has a missing record:
353 data ex;
354 infile cards n=4 stopover;
355 input #1 id #2 id2 #3 id3 #4 id4 #;
356 if id eq id2 eq id3 eq id4
357 then input #1 id dob:mmddyy. dbs1 mbs1
358 #2 id2 dbs2 mbs2
359 #3 id3 dbs3 mbs3
360 #4 id4 bls1 bls2 bls3 sex :$1.;
361 else lostcard;
362 format dob mmddyy.;
363 cards;
NOTE: LOST CARD.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
372 3 01/03/80 11 7
373 3 27 2
374 3 3.24 3.4 3.83 M
375 4 08/01/80 5 12
NOTE: LOST CARD.
376 4 28 -9
NOTE: LOST CARD.
377 4 3 4
NOTE: The data set WORK.EX has 3 observations and 15 variables.
data ex;
infile cards n=4 stopover;
input #1 id #2 id2 #3 id3 #4 id4 #;
if id eq id2 eq id3 eq id4
then input #1 id dob:mmddyy. dbs1 mbs1
#2 id2 dbs2 mbs2
#3 id3 dbs3 mbs3
#4 id4 bls1 bls2 bls3 sex :$1.;
else lostcard;
format dob mmddyy.;
cards;
1 04/30/78 6 10
1 -9 7
1 14 1
1 1.62 1.35 1.47 F
2 05/19/79 27 11
2 20 -9
2 5 6
2 1.71 1.31 1.76 F
3 01/03/80 11 7
3 27 2
3 3.24 3.4 3.83 M
4 08/01/80 5 12
4 28 -9
4 3 4
4 3.1 3.69 3.27 M
;;;;
run;
proc print;
run;
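The log above shows three LOST CARD notes for the three-line ID 3 group. This appears to follow a resynchronizing pattern: when the four IDs in a window disagree, the first line is dropped and the next window starts one line later. A Python sketch of that behaviour can be used to sanity-check a raw file outside SAS. `read_records` and `DATA` are hypothetical names, and the sketch only compares the leading ID on each line; it does not parse the other fields.

```python
DATA = """\
1 04/30/78 6 10
1 -9 7
1 14 1
1 1.62 1.35 1.47 F
2 05/19/79 27 11
2 20 -9
2 5 6
2 1.71 1.31 1.76 F
3 01/03/80 11 7
3 27 2
3 3.24 3.4 3.83 M
4 08/01/80 5 12
4 28 -9
4 3 4
4 3.1 3.69 3.27 M
"""


def read_records(lines, per_record=4):
    """Group lines into records of `per_record` lines sharing the same leading ID.

    When the IDs in a window disagree, the window's first line is dropped and
    matching restarts at the next line, similar to the LOST CARD notes above.
    """
    records, lost = [], []
    i = 0
    while i + per_record <= len(lines):
        window = [line.split() for line in lines[i:i + per_record]]
        if len({fields[0] for fields in window}) == 1:
            records.append(window)
            i += per_record
        else:
            lost.append(lines[i])
            i += 1
    return records, lost


records, lost = read_records(DATA.splitlines())
print(len(records), "records,", len(lost), "lost lines")
```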

Sum of multiple variables by group

I have a dataset with over 900 observations. Each observation represents the population of a sub-geographical area for a given year by gender (male, female, all) and 20 different age groups.
I have dropped the variable for the sub-geographical area, and I want to collapse the data into the greater geographical area (called Geo).
I am having a difficult time doing a SUM or PROC MEANS because I have so many age groups to sum up, and I am trying to avoid writing them all out. I want to collapse by year, geo and sex, so that I only have 3 observations per Geo (my raw data could have as many as 54 observations).
This is an example of what a tiny section of the raw data looks like:
Year Geo Sex Age0005 Age0610 Age1115 (etc)
2010 1 1 92 73 75
2010 1 2 57 81 69
2010 1 3 159 154 144
2010 1 1 41 38 43
2010 1 2 52 41 39
2010 1 3 93 79 82
2010 2 1 71 66 68
2010 2 2 63 64 70
2010 2 3 134 130 138
2010 2 1 32 35 34
2010 2 2 29 31 36
2010 2 3 61 66 70
This is how I want it to look:
Year Group Sex Age0005 Age0610 Age1115 (etc)
2010 1 1 133 111 118
2010 1 2 109 122 108
2010 1 3 252 233 226
2010 2 1 103 101 102
2010 2 2 92 95 106
2010 2 3 195 196 208
Any ideas? Please help!
You don't have to write out each variable name individually - there are ways of getting around that. E.g. if all of the age group variables that need to be summed up start with age then you can use a : wildcard to match them:
proc summary nway data = have;
var age:;
class year geo sex;
output out = want sum=;
run;
If your variables don't have a common prefix, but are all next to each other in one big horizontal group in your dataset, you can use a double dash list instead:
proc summary nway data = have;
var age0005--age1115; /*Includes all variables between these two*/
class year geo sex;
output out = want sum=;
run;
Note also the use of sum= - this means that each summarised variable is reproduced with its original name in the output dataset.
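For comparison, the same collapse can be sketched in Python. Every column whose name starts with a prefix is summed within each (year, geo, sex) group, which is roughly what `var age:;` with `class year geo sex;` and `sum=` asks PROC SUMMARY to do. The function name `sum_by_group` and the whitespace-separated input format are assumptions for illustration.

```python
from collections import defaultdict

DATA = """\
Year Geo Sex Age0005 Age0610 Age1115
2010 1 1 92 73 75
2010 1 2 57 81 69
2010 1 3 159 154 144
2010 1 1 41 38 43
2010 1 2 52 41 39
2010 1 3 93 79 82
2010 2 1 71 66 68
2010 2 2 63 64 70
2010 2 3 134 130 138
2010 2 1 32 35 34
2010 2 2 29 31 36
2010 2 3 61 66 70
"""


def sum_by_group(text, keys=("year", "geo", "sex"), prefix="age"):
    """Sum every column whose name starts with `prefix`, grouped by `keys`.

    Names are compared in lower case, loosely like SAS's case-insensitive `age:` list.
    """
    lines = text.splitlines()
    header = [name.lower() for name in lines[0].split()]
    value_cols = [name for name in header if name.startswith(prefix)]
    totals = defaultdict(lambda: [0] * len(value_cols))
    for line in lines[1:]:
        row = dict(zip(header, map(int, line.split())))
        group = tuple(row[k] for k in keys)
        for j, col in enumerate(value_cols):
            totals[group][j] += row[col]
    return value_cols, dict(sorted(totals.items()))


cols, result = sum_by_group(DATA)
for group, sums in result.items():
    print(*group, *sums)
```

Adding more age-group columns to the header requires no code changes, which is the point of the `age:` wildcard as well.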
I personally like to use proc sql for this, since it makes it very clear what you're summing and grouping by.
data old ;
input Year Geo Sex Age0005 Age0610 Age1115 ;
datalines;
2010 1 1 92 73 75
2010 1 2 57 81 69
2010 1 3 159 154 144
2010 1 1 41 38 43
2010 1 2 52 41 39
2010 1 3 93 79 82
2010 2 1 71 66 68
2010 2 2 63 64 70
2010 2 3 134 130 138
2010 2 1 32 35 34
2010 2 2 29 31 36
2010 2 3 61 66 70
;
run;
proc sql ;
create table new as select
year
, geo label = 'Group'
, sex
, sum(age0005) as age0005
, sum(age0610) as age0610
, sum(age1115) as age1115
from old
group by geo, year, sex ;
quit;

Average of Counts

I have a table called totals, and the data looks like this:
ACC_ID Data_ID Mon Weeks Total_AR_Count Total_FR_Count Total_OP_Count
23 9 01/2011 4 172 251 194
42 9 01/2011 4 2 16 28
75 9 01/2011 4 33 316 346
75 9 07/2011 5 1 12 20
42 9 09/2011 5 25 758 25
I want the output to be as Average of all the counts grouped by ACC_ID and Data_ID:
ACC_ID Data_ID Avg_AR_Count Avg_FR_Count Avg_OP_Count
23 9 172 251 194
42 9 13.5 387 26.5
75 9 17 164 183
How can I do this?
Your description of what you want just about writes the SQL:
SELECT ACC_ID, Data_ID, AVG(Total_AR_Count) AS Avg_AR_Count, AVG(Total_FR_Count) AS Avg_FR_Count...
FROM totals
GROUP BY ACC_ID, Data_ID
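As a cross-check of what GROUP BY with AVG should return for the sample rows, here is a Python sketch that groups on (ACC_ID, Data_ID) and averages the three count columns. `TOTALS` and `average_counts` are illustrative names. One hedged caveat for the SQL side: some databases (SQL Server, for example) return an integer from AVG over an integer column, so getting 13.5 there may require casting the column to a decimal type first.

```python
from collections import defaultdict

# Rows from the question: (ACC_ID, Data_ID, Mon, Weeks, AR, FR, OP)
TOTALS = [
    (23, 9, "01/2011", 4, 172, 251, 194),
    (42, 9, "01/2011", 4, 2, 16, 28),
    (75, 9, "01/2011", 4, 33, 316, 346),
    (75, 9, "07/2011", 5, 1, 12, 20),
    (42, 9, "09/2011", 5, 25, 758, 25),
]


def average_counts(rows):
    """Map (ACC_ID, Data_ID) to the averages of the AR, FR and OP counts."""
    groups = defaultdict(list)
    for acc_id, data_id, _mon, _weeks, *counts in rows:
        groups[(acc_id, data_id)].append(counts)
    # zip(*values) turns the per-row lists into per-column tuples;
    # Python's / always yields a float, so 13.5 is not truncated
    return {
        key: tuple(sum(col) / len(col) for col in zip(*values))
        for key, values in sorted(groups.items())
    }


for (acc_id, data_id), avgs in average_counts(TOTALS).items():
    print(acc_id, data_id, *avgs)
```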

Need a hint with a custom Linux/UNIX command-line utility "cal" in C

OK, I need to make this program use "cal" to display three months side by side (the month before, the current month and the month after), rather than the single month that cal displays by default on Linux/UNIX. I got it to display three calendars by calling "system(customCommand)" three times, but then they are not side by side.
I got a hint to use the following system calls:
close(..) pipe(..) dup2(..) read(..) and write(..)
My question is: where should I start? Do I need to create a child process and then capture its output with pipe(..)?
How can I display three calendars side by side?
For example:
   February 2009           March 2009            April 2009
 S  M Tu  W Th  F  S   S  M Tu  W Th  F  S   S  M Tu  W Th  F  S
 1  2  3  4  5  6  7   1  2  3  4  5  6  7            1  2  3  4
 8  9 10 11 12 13 14   8  9 10 11 12 13 14   5  6  7  8  9 10 11
15 16 17 18 19 20 21  15 16 17 18 19 20 21  12 13 14 15 16 17 18
22 23 24 25 26 27 28  22 23 24 25 26 27 28  19 20 21 22 23 24 25
                      29 30 31              26 27 28 29 30
Assuming you want to write it yourself instead of using "cal -3", this is what I'd do (in pseudocode):
popen three calls to "cal" with the appropriate args
while (at least one of the three pipes hasn't hit EOF yet)
{
read a line from the first if it isn't at EOF
pad the results out to a width W, print it
read a line from the second if it isn't at EOF
pad the results out to a width W, print it
read a line from the third if it isn't at EOF
print it
print "\n"
}
pclose all three.
if "cal -3" doesn't work, just use paste :)
$ TERM=linux setterm -regtabs 24
$ paste <(cal 2 2009) <(cal 3 2009) <(cal 4 2009)
   February 2009             March 2009              April 2009
Su Mo Tu We Th Fr Sa    Su Mo Tu We Th Fr Sa    Su Mo Tu We Th Fr Sa
 1  2  3  4  5  6  7     1  2  3  4  5  6  7              1  2  3  4
 8  9 10 11 12 13 14     8  9 10 11 12 13 14     5  6  7  8  9 10 11
15 16 17 18 19 20 21    15 16 17 18 19 20 21    12 13 14 15 16 17 18
22 23 24 25 26 27 28    22 23 24 25 26 27 28    19 20 21 22 23 24 25
                        29 30 31                26 27 28 29 30
$
(setterm ignores -regtabs unless TERM=linux or TERM=con.)
Just do:
cal -3
Does this not work?
cal -3
Ok, how about cal -3?
Use cal -3 12 2120 to show a specific month and year (here December 2120), together with the month before and the month after.
The approach I would use for this would be to capture the output, split it into lines, and printf the lines out next to each other. I'd probably do it in Perl, though, rather than C.
Or just use cal -3, if your cal has it.
