ARRAY function in SAS after MEAN (by grouping) - arrays

There is some homework for SAS and I just can't seem to find the right way to do it. Hopefully, some of you will be able to help.
We start with a table where we have the following variables:
City State Temp January Temp Feb Temp Mar ... Temp Dec
First, we have to calculate the mean temperature (per month, so for 12 different variables) and per state (so there are always a few cities per state).
I used this code:
PROC SORT DATA=Homework;
BY state;
RUN;
PROC MEANS DATA=Homework;
VAR JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC;
BY State;
OUTPUT OUT=MTSM (DROP=_type_ _freq_) MEAN=;
RUN;
My result is a table with 53 rows (one per state) and one column per month (plus a first column for the state, of course). Something like this:
State JAN FEB ... DEC
State1 xjan xfeb ... xdec
State2
...
State53
Now I need to use an Array statement to make a new table in long format:
State Month Mean_temp
State1 JAN xjan
state1 FEB xfeb
. MAR ...
. APR ...
. ... ...
State1 DEC xdec
State 2 JAN ...
...
DEC
...
State53 JAN
FEB
...
Does someone have an idea of how to do this? I'm completely lost.
This is what I tried:
DATA MTSM2;
SET MTSM;
BY state;
ARRAY newvars {1} Mean_Temp;
ARRAY oldvars {1, 12} JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC;
DO Month = JAN to DEC;
DO k=1;
newvars{k} = oldvars{k, Month};
END;
OUTPUT;
END;
KEEP state Month Mean_Temp;
RUN;
I got following error: ERROR: Array subscript out of range at line 30 column 22 :'(
What am I doing wrong? I have been changing this in many ways, but always get the same error.
Thanks in advance!

You can get the table you want by using a more specific output statement in proc means/proc summary:
/*Generate some dummy data*/
data have;
call streaminit(1);
do j = 1 to 10;
do state = 'a', 'b', 'c';
array months[12] m1-m12;
do i = 1 to dim(months);
months[i] = rand('uniform');
end;
output;
end;
end;
drop i j;
run;
proc summary nway data = have;
var m1-m12;
class state;
output out = want(drop = _TYPE_ _FREQ_) mean=;
run;

You are very close.
There is no need to use ARRAY for the new variable since it is just one. There is no need to tell SAS how many variables there are in the array when you have listed the actual variable names. And arrays are indexed by integers, not strings. You can use the VNAME() function to find the name of the variable addressed by the index into the array. The BY statement is not needed.
DATA MTSM2;
SET MTSM;
ARRAY oldvars JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC;
length month $32 mean_temp 8;
DO month_number = 1 to 12 ;
month=vname( oldvars[month_number] );
mean_temp = oldvars[month_number] ;
OUTPUT;
END;
KEEP state Month Mean_Temp;
RUN;

If the homework is to pivot the data, using ARRAY, from the categorically organized layout (state/month/mean) to a wide layout (state/month-1...month-12) you can use BY processing and index determination to fill an array.
Essentially for each BY group there will be one row output.
One way is to use a DOW loop in which the SET statement is inside an explicit loop.
data want(keep=state jan--dec);
do until (last.state);
set have;
by state;
array months jan feb mar apr may jun jul aug sep oct nov dec;
index = (index('JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC', trim(month))+2)/3;
months(index) = mean;
end;
run;
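The index arithmetic in that lookup is easy to check outside SAS. Here is a minimal sketch in C, assuming upper-case three-letter abbreviations; month_index is a hypothetical name used only for illustration. It mirrors (INDEX(...)+2)/3 and additionally rejects matches that do not start on a month boundary, which the SAS one-liner does not guard against.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1..12 for "JAN".."DEC", or 0 when the
   abbreviation is not recognized. Case-sensitive by assumption. */
int month_index(const char *abbr)
{
    static const char months[] = "JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC";
    const char *hit;
    size_t pos;

    if (abbr == NULL || strlen(abbr) != 3)
        return 0;

    hit = strstr(months, abbr);
    if (hit == NULL)
        return 0;

    /* 1-based position, like the SAS INDEX() function */
    pos = (size_t)(hit - months) + 1;

    /* Only positions 1, 4, 7, ... are real month starts (rejects e.g. "ANF") */
    if ((pos - 1) % 3 != 0)
        return 0;

    return (int)((pos + 2) / 3);
}
```

For example, "JAN" is found at position 1, giving (1+2)/3 = 1, and "DEC" at position 34, giving (34+2)/3 = 12.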
If the data is known to have every month, the index 'lookup' is not needed and can be retrieved directly from the do loop index variable:
data want(keep=state jan--dec);
do _n_ = 1 by 1 until (last.state); /* repurpose _n_ */
set have;
by state;
array months jan feb mar apr may jun jul aug sep oct nov dec;
months(_n_) = mean;
end;
run;
Update
Using array to pivot data from an across layout to a down layout. Iterate a loop over the array elements and output name/value pairs within the loop.
data want (keep=state month percent);
set have;
array months jan feb mar apr may jun jul aug sep oct nov dec;
do _n_ = 1 to dim(months);
month = vname(months(_n_)); /* name */
percent = months(_n_); /* value */
OUTPUT;
end;
run;
Proc TRANSPOSE can perform the same data transformation.
Array based pivoting is very useful when you want to transpose two or more arrays at the same time. An example would be if you had variables
jan_percent to dec_percent and
jan_rating to dec_rating
that you wanted to pivot into a data form of month/percent/rating. Such a transformation with TRANSPOSE requires multiple proc steps (one per array).

Sounds like you simply want to use a Class Statement instead of a By Statement?

Related

I'm not able to change the following code into if-else condition in Neo4j

I'm new to Neo4j and am trying to compare ScoreFT[0] with ScoreFT[1] and return whichever value is greater. I tried once, but it did not work. I don't know how to use ORDER BY and LIMIT in the final RETURN CASE. Please help me.
Round,Date,Team1,FT,HT,Team2
1,(Fri) 11 Aug 2017 (32),Arsenal FC,4-3,2-2,Leicester City FC
1,(Sat) 12 Aug 2017 (32),Brighton & Hove Albion FC,0-2,0-0,Manchester City FC
1,(Sat) 12 Aug 2017 (32),Chelsea FC,2-3,0-3,Burnley FC
1,(Sat) 12 Aug 2017 (32),Crystal Palace FC,0-3,0-2,Huddersfield Town AFC
1,(Sat) 12 Aug 2017 (32),Everton FC,1-0,1-0,Stoke City FC
1,(Sat) 12 Aug 2017 (32),Southampton FC,0-0,0-0,Swansea City AFC
1,(Sat) 12 Aug 2017 (32),Watford FC,3-3,2-1,Liverpool FC
1,(Sat) 12 Aug 2017 (32),West Bromwich Albion FC,1-0,1-0,AFC Bournemouth
1,(Sun) 13 Aug 2017 (32),Manchester United FC,4-0,1-0,West Ham United FC
1,(Sun) 13 Aug 2017 (32),Newcastle United FC,0-2,0-0,Tottenham Hotspur FC
Import query
Attempt with if/else
CASE will only work with a single value, but you're trying to use multiple values here based upon the comparison, and worse you're trying to get a single CASE to output multiple variables, which won't work.
It would be better to first get the values you want in a composite structure (either in 2-element lists or a map), and then use two CASEs to get the right value for each variable. Something like this, replacing your RETURN:
...
WITH [t1.key, ScoreFT[0]] as t1Score, [t2.key, ScoreFT[1]] as t2Score, ScoreFT[0] > ScoreFT[1] as t1Won
RETURN CASE WHEN t1Won THEN t1Score ELSE t2Score END as s,
CASE WHEN t1Won THEN t2Score ELSE t1Score END as p
If you want a map instead, you can create it explicitly instead of using the lists:
WITH t1 {.key, score:ScoreFT[0]} as t1Score, t2 {.key, score:ScoreFT[1]} as t2Score, ...

loop in sql to update record as per user

I have two tables in SQL Server. The first, officialHolidays, stores official holidays; the second, appliedLeave, stores the leave applied for by users.
I have stored all the Saturdays and Sundays of 2018 in the officialHolidays table.
Here is one scenario.
I have two users:
User A applied for leave from 1 Dec 2018 to 10 Dec 2018, so LeaveApplied for this user is 7 days, because 2 and 9 Dec are Sundays and 8 Dec is a non-working Saturday.
User B applied for leave from 1 Dec 2018 to 5 Dec 2018, so LeaveApplied for this user is 4 days, because 2 Dec is a Sunday.
1 Dec is a Saturday, and in my database it is currently a working Saturday. I am now making this Saturday an official holiday. I have updated officialHolidays, and now I want to update the appliedLeave table too, so that LeaveApplied becomes 6 for user A and 3 for user B.
I want to use a loop to update the record of user A and then user B.
Here is my update query:
UPDATE officialHolidays SET
Active = @Active
WHERE OfficialID = @OfficialID
This is what I have tried so far:
DECLARE @HolidayDate AS DATE = (SELECT Date FROM officialHolidays WHERE OfficialID = @OfficialID)
IF(EXISTS(SELECT 1 FROM AppliedLeave WHERE AppliedFrom BETWEEN @HolidayDate AND @HolidayDate
OR AppliedTo BETWEEN @HolidayDate AND @HolidayDate))
BEGIN
DECLARE @LeaveTaken AS FLOAT
DECLARE @LeaveRemaining AS FLOAT
--I want to add Loop Here
END
How can I add a loop in this scenario?

mktime() giving different results for same input in different timezone

Here is the piece of code for converting Fri Jan 1 00:00:00 IST 1970 to EPOCH
memset(&Date_st,0,sizeof(struct tm));
Date_st.tm_year = 70;
Date_st.tm_mon = 0;
Date_st.tm_mday = 1;
Date_st.tm_hour = 24;
Date_st.tm_min = 0;
Date_st.tm_sec = 0;
Date_st.tm_isdst = 0 ;
date_in_seconds = mktime( &Date_st );
The code is running on two servers having different time zones
Server_1!:user_1> Tue Aug 25 11:03:51 IDT 2015
Server_2!:user_2> Tue Aug 25 05:05:03 CLT 2015
Now the code gives different output on different servers for same input which is Fri Jan 1 00:00:00 IST 1970
Server_1 -> 79200
Server_2 -> 100800
Can someone explain why the output is different, and how can I make it the same on both servers?
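That's what time zones are all about: the local time is different, and mktime() interprets the struct tm fields as local time.
If you want a common reference, work in UTC instead. gmtime() converts a time_t to UTC broken-down time; for the reverse direction, the non-standard timegm() (available in glibc and the BSDs) is the UTC counterpart of mktime().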
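To make the time zone dependence concrete, here is a rough sketch that computes the epoch value by hand from calendar fields plus an explicit UTC offset, without consulting the TZ setting at all. The names days_from_civil and epoch_from_fields are hypothetical, and the offsets used below are inferred purely from the two outputs in the question, not taken from any time zone database.

```c
#include <stdint.h>

/* Days from 1970-01-01 to the given Gregorian date (month is 1..12). */
static int64_t days_from_civil(int64_t year, int month, int day)
{
    int64_t era, yoe, doy, doe;

    year -= (month <= 2);                       /* treat Jan/Feb as end of previous year */
    era = (year >= 0 ? year : year - 399) / 400;
    yoe = year - era * 400;                     /* year within the 400-year era */
    doy = (153 * (month + (month > 2 ? -3 : 9)) + 2) / 5 + day - 1;
    doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + doe - 719468;
}

/* Hypothetical helper: seconds since the epoch for a wall-clock time that is
   utc_offset_seconds ahead of UTC. Out-of-range hours such as 24 simply carry over. */
int64_t epoch_from_fields(int year, int month, int day,
                          int hour, int min, int sec,
                          long utc_offset_seconds)
{
    return days_from_civil(year, month, day) * 86400
         + hour * 3600L + min * 60L + sec
         - utc_offset_seconds;
}
```

With the fields from the question (1970, month 1, day 1, hour 24), an offset of 0 gives 86400, an offset of +7200 gives 79200, and an offset of -14400 gives 100800. The last two are exactly the values reported for Server_1 and Server_2, so the same wall-clock fields were read as UTC+2 on one machine and UTC-4 on the other.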

Adjusting dates to avoid overlap of days

If I have three dates, e.g. Jan 1, Jan 25, and Feb 20, but I want the dates to be separated by 30 days, how can I do it?
For example, what I want to do is Jan 1, Jan 30, Feb 29.
I am very new at R but the code should be something like this - If 2nd date is before (1st date+30), then adjust 2nd date to (1st+31) and similarly for 3rd date..
Any help will be much appreciated!
Since you want a fixed distance between each adjacent pair of dates, you don't need to "adjust" any dates; rather, you can just compute the desired date vector from scratch, starting with the first date.
This can actually be done with a single call to the S3 generic seq(), which will dispatch to seq.Date():
seq(as.Date('2000-01-01'),by=30,length.out=3);
## [1] "2000-01-01" "2000-01-31" "2000-03-01"
Also note that you seem to have made an error in deriving your expected dates; 30 days from Jan 1 is Jan 31, not Jan 30.
d1 = as.Date("01-01",format="%m-%d")
d2 = as.Date("01-25",format="%m-%d")
if (abs(as.numeric(difftime(d2,d1)))<30) d2 = d1 + 30
>d2
[1] "2015-01-31"
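For comparison outside R, the same fixed-step idea can be sketched in C by letting mktime() normalize an out-of-range day of the month. This is only an illustration under assumptions: date_sequence is a hypothetical helper, and noon is used as the time of day so that a DST shift cannot move the result onto a neighbouring date.

```c
#include <string.h>
#include <time.h>

/* Hypothetical helper: fills out[0..n-1] with dates step_days apart,
   starting at year-month-day. Returns 0 on success, -1 on failure. */
int date_sequence(int year, int month, int day,
                  int step_days, int n, struct tm *out)
{
    int i;

    for (i = 0; i < n; i++) {
        struct tm t;

        memset(&t, 0, sizeof t);
        t.tm_year = year - 1900;
        t.tm_mon = month - 1;
        t.tm_mday = day + i * step_days;  /* may exceed the month; mktime normalizes it */
        t.tm_hour = 12;                   /* noon keeps the calendar date stable across DST */
        t.tm_isdst = -1;                  /* let the library determine DST */

        if (mktime(&t) == (time_t)-1)
            return -1;
        out[i] = t;
    }
    return 0;
}
```

Calling date_sequence(2000, 1, 1, 30, 3, dates) leaves 2000-01-01, 2000-01-31 and 2000-03-01 in dates, matching the seq() output above.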

DST-switch-aware getter for UNIX timestamp of current day's local time midnight

(Language/API: Standard C 89 library and / or POSIX)
Probably a trivial question, but I've got a feeling that I'm missing something.
I need to implement this function:
time_t get_local_midnight_timestamp(time_t ts);
That is, we get an arbitrary timestamp (from last year, for example) and return it rounded down to local midnight of the same day.
The problem is that the function must be aware of DST switches and DST rules changes (like DST cancellation and/or extension).
The function must also be future-proof, and cope with weird TZ changes (like shift of time zone 30 minutes ahead etc.).
(The reason I need all this is that I need to implement lookups into some older statistics data.)
As far as I understand, naïve approach with zeroing out struct tm time fields would not work — precisely because of DST stuff (looks like in DST-change day there are two local midnight time_t timestamps).
Please point me in the right direction...
I doubt that it can be done with standard C 89, so POSIX-specific solutions are acceptable. If not POSIX, then something Debian-specific would do...
Update: Also: Something tells me that I should also take leap seconds in account. Maybe I should look into trying to directly use Tz database... (Which is rather sad — so much /perceived/ overhead for so small task.) ...Or not — seems that libc should use it, so maybe I'm just doing it wrong...
Update 2: Here is why I think that naïve solution does not work:
#include <stdio.h>
#include <time.h>
int main()
{
struct tm date_tm;
time_t date_start = 1301173200; /* Sunday 27 March 2011 0:00:00 AM MSK */
time_t midnight = 0;
char buf1[256];
char buf2[256];
int i = 0;
for (i = 0; i < 4 * 60 * 60; i += 60 * 60)
{
time_t date = date_start + i;
localtime_r(&date, &date_tm);
strftime(buf1, 256, "%c %Z", &date_tm);
date_tm.tm_sec = 0;
date_tm.tm_min = 0;
date_tm.tm_hour = 0;
midnight = mktime(&date_tm);
strftime(buf2, 256, "%c %Z", &date_tm);
printf("%d : %s -> %d : %s\n", (int)date, buf1, (int)midnight, buf2);
}
}
Output (local time was MSD at the moment when I ran this):
$ gcc time.c && ./a.out
1301173200 : Sun Mar 27 00:00:00 2011 MSK -> 1301173200 : Sun Mar 27 00:00:00 2011 MSK
1301176800 : Sun Mar 27 01:00:00 2011 MSK -> 1301173200 : Sun Mar 27 00:00:00 2011 MSK
1301180400 : Sun Mar 27 03:00:00 2011 MSD -> 1301169600 : Sat Mar 26 23:00:00 2011 MSK
1301184000 : Sun Mar 27 04:00:00 2011 MSD -> 1301169600 : Sat Mar 26 23:00:00 2011 MSK
As you can see, two midnights.
I ran your code with the TZ environment variable set to "Europe/Moscow" and was able to reproduce your output. Here's what I think is going on:
On the first two lines, everything is fine. Then we "spring ahead" and 2 AM becomes 3 AM. Let's use gdb to break on entry to mktime and see what its argument is each time:
hour mday mon year wday yday isdst gmtoff tm_zone
0 27 2 111 0 85 0 10800 MSK
0 27 2 111 0 85 0 10800 MSK
0 27 2 111 0 85 1 14400 MSD
0 27 2 111 0 85 1 14400 MSD
So what has happened? Your code sets the hour to 0 each time, but this is a problem after the DST switch, because the impossible has happened: it is now "before" the DST switch in terms of the time of day, yet isdst is now set and gmtoff has been increased by one hour. By hacking up the time, you have "created" a time of midnight but with DST enabled, which is basically invalid.
You may now wonder, how can we get out of this mess? Do not despair! When you are adjusting the tm_hour field by hand, simply admit that you no longer know what the DST status is by setting tm_isdst to -1. This special value, which is documented in man localtime, means the DST status is "not available." So the computer will figure it out, and everything should work fine.
Here's my patch for your code:
date_tm.tm_hour = 0;
+ date_tm.tm_isdst = -1; /* we no longer know if it's DST or not */
Now I get this output, which I hope is what you want:
$ TZ='Europe/Moscow' ./a.out
1301173200 : Sun Mar 27 00:00:00 2011 MSK -> 1301173200 : Sun Mar 27 00:00:00 2011 MSK
1301176800 : Sun Mar 27 01:00:00 2011 MSK -> 1301173200 : Sun Mar 27 00:00:00 2011 MSK
1301180400 : Sun Mar 27 03:00:00 2011 MSD -> 1301173200 : Sun Mar 27 00:00:00 2011 MSK
1301184000 : Sun Mar 27 04:00:00 2011 MSD -> 1301173200 : Sun Mar 27 00:00:00 2011 MSK
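Putting the tm_isdst = -1 fix into the function from the question could look roughly like this. It is a sketch rather than a definitive implementation: it assumes POSIX localtime_r(), and it assumes local midnight actually exists on the day in question. In a zone where the DST switch happens at midnight itself, what mktime() returns for the missing time is left to the implementation.

```c
#include <time.h>

/* Returns the timestamp of local midnight at the start of the day that
   contains ts, or (time_t)-1 if the conversion fails. */
time_t get_local_midnight_timestamp(time_t ts)
{
    struct tm tm_local;

    if (localtime_r(&ts, &tm_local) == NULL)
        return (time_t)-1;

    tm_local.tm_hour = 0;
    tm_local.tm_min = 0;
    tm_local.tm_sec = 0;
    tm_local.tm_isdst = -1;  /* DST status at midnight is unknown; let mktime work it out */

    return mktime(&tm_local);
}
```

With TZ set to Europe/Moscow, all four timestamps from the test program above should map to 1301173200.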
