Creating loop syntax with a count variable (SPSS) - loops

I have a longitudinal (person-level) dataset, and I am looking for syntax that counts the number of times an event happened up to a certain point. More specifically, I have 200 weeks of data (each week is coded 1-7, and I'm only interested in weeks where the value is 5 or greater), but I am only interested in weeks that happened before a certain time point (the time point is different for each person and is captured in a single variable, "eventweek"). So for person Y, whose eventweek = 154, I want to know in what percentage of the weeks before week 154 (weeks 1-153) that person was coded 5 or above. For person Z, whose eventweek = 52, I want to know in what percentage of the weeks before week 52 (weeks 1-51) that person was coded 5 or above, and so on.
Any ideas on how to code this?

Try the following:
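* Address week1 to week200 by index through the vector v.
vector v = week1 to week200.
* #z is a scratch variable counting the weeks coded 5 or above.
compute #z = 0.
loop #i = 1 to (eventweek - 1).
compute #z = #z + (v(#i) ge 5).
end loop.
* Percentage of the weeks before eventweek that were coded 5 or above.
compute ew_perc = #z/(eventweek - 1)*100.
exe.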
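To check the logic outside SPSS, the loop above amounts to the following Python sketch. It assumes one person's weekly codes are held in a list (week1 first); the function name percent_high_weeks is hypothetical, and unlike SPSS it does not deal with missing weekly values.

```python
def percent_high_weeks(weeks, eventweek, threshold=5):
    """Percentage of weeks 1..eventweek-1 coded at or above `threshold`.

    weeks[0] holds week1, weeks[1] holds week2, and so on.
    Returns None when there are no earlier weeks (eventweek <= 1), much as
    SPSS returns system-missing for the division by zero.
    """
    n_prior = eventweek - 1
    if n_prior <= 0:
        return None
    # Same test as (v(#i) ge 5) inside the SPSS LOOP, summed over the prior weeks.
    count = sum(1 for code in weeks[:n_prior] if code >= threshold)
    return count / n_prior * 100


# A person whose eventweek is 5: weeks 1-4 are coded 5, 1, 7, 2.
print(percent_high_weeks([5, 1, 7, 2, 6, 6], eventweek=5))  # 50.0
```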


Data Entry in SAS using Loops

I just learned about the "do" loop today and would like to try using it for data entry in SAS. I have tried most examples online, but I still cannot figure it out.
My dataset is from an experiment with 6 treatments (1 to 6) using 2 sets of cues, Visual and Audio, with 3 treatments each. Lag is measured in seconds and takes the values 5, 10 and 15, and there are 2 sets of these (one per cue).
Basically it looks like this: [table image not included]
The entries I want are:
1. Obs_no, ranging from 1 to 18 (18 observations in total, which lets me easily delete outliers with an IF THEN).
2. Treatment type: Auditory or Visual.
3. Treatment number, 1 to 6, in 3 sets.
4. Lag: 5, 10 or 15.
5. The data itself.
So far, my code handles 2 and 5. The rest is also possible with IF THEN and INPUT statements, although I assume there's a much easier method:
data AVCue;
do cue = 'Auditory','Visual';
do i = 1 to 3;
input AVCue @@;
output;
end;
end;
datalines;
.204 .167 .202 .257 .283 .256
.170 .182 .198 .279 .235 .281
.181 .187 .236 .269 .260 .258
;
Lag and the rest were added using IF THEN statements and a crude INPUT method:
data AVCue;
set AVCue;
IF i=1 THEN Lag=5;
IF i=2 THEN Lag=10;
IF i=3 THEN Lag=15;
input obs_no treatment;
cards;
1 1
2 2
3 3
4 4
5 5
6 6
7 1
8 2
9 3
10 4
11 5
12 6
13 1
14 2
15 3
16 4
17 5
18 6
;
proc print data=AVCue;
run;
The IF THEN should be fine, but the INPUT statement here is, in my opinion, counterproductive and defeats the purpose of using loops, which to me is to save time. If done this way, I might as well put the data into Excel and import it, or type everything out with ample copy and paste in the "input obs_no treatment; cards;" section.
My coding knowledge is basic, so sorry if this question sounds silly. I want to know:
1. How would I make a list of numbers using "do" loops in SAS? I've made several attempts, and all I get is a list containing only the last number. I know why this happens: the loop counts to x, so the value assigned is just x. I just don't know how to get around that. Somehow this didn't happen in the datalines section; I guess SAS knows there are 18 numbers and stores each entry of i accordingly... or something?
2. How would I go about assigning, in this case, the numbers 1 to 6 to each entry?
Thanks!
It is certainly much easier to read in the actual dataset instead of having to impute some of the variables based on the order the values have in the source data. You might be able to combine a SET statement and an INPUT statement in the same data step and get it to work, but it is probably NOT worth the effort. Just make two datasets and merge them.
Looking at the photograph you posted, it looks like TREATMENT is not an independent variable. Instead, it is just a label for the combination of CUE and LAG. To make it cycle from 1 to 6, just reset it back to 1 when it gets too large.
data AVCue;
do cue = 'Auditory','Visual';
do lag= 5, 10, 15 ;
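treatment+1;                     /* sum statement: value is retained across iterations */
if treatment=7 then treatment=1; /* cycle back to 1 after 6 */
obsno+1;                         /* running observation number */
input AVCue @@;                  /* @@ holds the line so each value becomes one observation */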
output;
end;
end;
datalines;
.204 .167 .202 .257 .283 .256
.170 .182 .198 .279 .235 .281
.181 .187 .236 .269 .260 .258
;
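To make the mechanics of that DATA step easier to see, here is a rough Python analogue, offered only as an illustration. The outer while loop stands in for SAS's implicit DATA step loop, the iterator stands in for INPUT with @@ (each read consumes the next value on the line), and the counters mimic the retained sum statements. The names build_rows and DATALINES are hypothetical.

```python
from itertools import product

DATALINES = """\
.204 .167 .202 .257 .283 .256
.170 .182 .198 .279 .235 .281
.181 .187 .236 .269 .260 .258
"""


def build_rows(text):
    """Python analogue of the DATA step: returns (obsno, treatment, cue, lag, AVCue)."""
    values = iter(text.split())  # like INPUT ... @@: every read takes the next value
    rows = []
    treatment = 0  # SAS sum statements (treatment+1, obsno+1) retain their values
    obsno = 0
    while True:  # the implicit DATA step loop, one pass per six values
        for cue, lag in product(("Auditory", "Visual"), (5, 10, 15)):
            treatment += 1
            if treatment == 7:
                treatment = 1
            obsno += 1
            try:
                avcue = float(next(values))
            except StopIteration:
                # SAS also stops the step when INPUT runs out of data
                return rows
            rows.append((obsno, treatment, cue, lag, avcue))


rows = build_rows(DATALINES)
print(len(rows))  # 18
print(rows[6])    # (7, 1, 'Auditory', 5, 0.17)
```

The nested DO lists (cue, then lag) become a product over the two value lists, which is why each line of six values maps onto Auditory 5/10/15 followed by Visual 5/10/15.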
You can get in trouble if you just let SAS guess at how you want to define your variables. For example, if you change the order of the CUE values to do cue = 'Visual','Auditory'; then SAS will make CUE with length $6 instead of $8, and 'Auditory' will be truncated. Add a LENGTH statement to define your variables before you use them.
length obsno 8 treatment 8 cue $8 lag 8 AVCue 8 ;
This will also let you control the order they are created in the dataset.
If you really did already have a SAS dataset and wanted to add a variable like TREATMENT that cycles from 1 to 6 (or really any DO loop construct), then you could nest the SET statement inside the DO loop. Just remember to add the explicit OUTPUT statement.
data new ;
do treatment=1 to 6 ;
set old;
output;
end;
run;
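As a loose Python analogue of nesting SET inside the DO loop: each pass of the loop reads the next observation and tags it with the current loop value, and processing ends when the input runs out. This sketch assumes the existing rows are plain dictionaries; add_cycling_treatment is a hypothetical name.

```python
from itertools import cycle


def add_cycling_treatment(rows, n=6):
    """Attach treatment = 1, 2, ..., n, 1, 2, ... to existing rows, in order.

    Rough analogue of DO treatment=1 TO 6; SET old; OUTPUT; END;
    zip() stops when the rows run out, as the DATA step stops when SET hits
    the end of OLD.
    """
    return [dict(row, treatment=t) for row, t in zip(rows, cycle(range(1, n + 1)))]


old = [{"obsno": i} for i in range(1, 9)]
print([r["treatment"] for r in add_cycling_treatment(old)])  # [1, 2, 3, 4, 5, 6, 1, 2]
```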

lag over columns/variables SPSS

I want to do something I thought was really simple.
My (mock) data looks like this:
data list free/totalscore.1 to totalscore.5.
begin data.
1 2 6 7 10 1 4 9 11 12 0 2 4 6 9
end data.
These are total scores accumulating over a number of trials (in this mock data, trials 1 to 5). Now I want to know the score earned in each trial. In other words, I want to subtract the value in trial n from the value in trial n+1.
The most simple syntax would look like this:
COMPUTE trialscore.1 = totalscore.2 - totalscore.1.
EXECUTE.
COMPUTE trialscore.2 = totalscore.3 - totalscore.2.
EXECUTE.
COMPUTE trialscore.3 = totalscore.4 - totalscore.3.
EXECUTE.
And so on...
So that the result would look like this:
But of course it is not practical (or fun) to do this for 200+ variables.
I attempted to write a syntax using VECTOR and DO REPEAT as follows:
COMPUTE #y = 1.
VECTOR totalscore = totalscore.1 to totalscore.5.
DO REPEAT trialscore = trialscore.1 to trialscore.5.
COMPUTE #y = #x + 1.
END REPEAT.
COMPUTE trialscore(#i) = totalscore(#y) - totalscore(#i).
EXECUTE.
But it doesn't work.
Any help is appreciated.
P.S. I've looked into using LAG, but it works across rows (cases), while I need it to work across columns, one variable at a time.
I am assuming respid is your original (unique) record identifier.
EDIT:
If you do not have a record identifier, you can very easily create a dummy one:
compute respid=$casenum.
exe.
end of EDIT
You could try re-structuring the data, so that each score is a distinct record:
varstocases
/make totalscore from totalscore.1 to totalscore.5
/index=scorenumber
/NULL=keep.
exe.
Then sort your cases so that, within each respondent, the score numbers are in descending order (in order to be able to use the LAG function):
sort cases by respid (a) scorenumber (d).
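Then do the lag-based computations. Because of the descending sort, the previous case holds the next trial's total, so subtract the current total from it:
do if respid=lag(respid).
compute trialscore=lag(totalscore)-totalscore.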
end if.
exe.
Finally, undo the restructuring:
casestovars
/id=respid
/index=scorenumber.
exe.
You should end up with a set of trialscore variables (the last one will be empty) next to the restored totalscore variables, which hold what you need.
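The restructure, sort, lag and restructure-back sequence can be sketched in Python to check the arithmetic on the mock data. This is an illustration only, assuming respid values 1, 2, 3 (as $casenum would give); the helper names to_long, add_trial_scores and to_wide are hypothetical stand-ins for VARSTOCASES, the sorted LAG computation and CASESTOVARS.

```python
def to_long(wide):
    """VARSTOCASES analogue: {respid: [total1, ..., total5]} -> (respid, scorenumber, total)."""
    return [(rid, k, s) for rid, scores in wide.items() for k, s in enumerate(scores, start=1)]


def add_trial_scores(long_rows):
    """SORT CASES BY respid (A) scorenumber (D), then the LAG-based COMPUTE."""
    ordered = sorted(long_rows, key=lambda r: (r[0], -r[1]))
    out, prev = [], None
    for rid, k, total in ordered:
        if prev is not None and prev[0] == rid:
            # previous case is trial k+1, so this is total(k+1) - total(k)
            trial = prev[2] - total
        else:
            trial = None  # highest scorenumber: no later trial, stays empty
        out.append((rid, k, total, trial))
        prev = (rid, k, total)
    return out


def to_wide(rows):
    """CASESTOVARS analogue, keeping only the trial scores in scorenumber order."""
    wide = {}
    for rid, k, _total, trial in sorted(rows, key=lambda r: (r[0], r[1])):
        wide.setdefault(rid, []).append(trial)
    return wide


mock = {1: [1, 2, 6, 7, 10], 2: [1, 4, 9, 11, 12], 3: [0, 2, 4, 6, 9]}
print(to_wide(add_trial_scores(to_long(mock))))
# {1: [1, 4, 1, 3, None], 2: [3, 5, 2, 1, None], 3: [2, 2, 2, 3, None]}
```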
You can use DO REPEAT this way:
do repeat
before=totalscore.1 to totalscore.4
/after=totalscore.2 to totalscore.5
/diff=trialscore.1 to trialscore.4 .
compute diff=after-before.
end repeat.
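The DO REPEAT version pairs each "before" variable with the "after" variable one position later. The same pairing in Python, as a small sketch with a hypothetical function name, shows why the before list stops at 4 and the after list starts at 2:

```python
def trial_scores(totals):
    """trialscore.n = totalscore.(n+1) - totalscore.n for consecutive pairs."""
    # zip pairs each "before" value with the "after" value one position later,
    # like the before/after lists in DO REPEAT.
    return [after - before for before, after in zip(totals, totals[1:])]


print(trial_scores([1, 2, 6, 7, 10]))  # [1, 4, 1, 3]
```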

Counting Columns with conditions, assigning values based on count

I have a table with call logs. I need to assign time slots for next call based on which time slot the phone number was reachable in.
The relevant columns of the table are:
Phone Number | CallTimeStamp
CallTimeStamp is a datetime object.
I need to calculate the following:
Time Slot: From the timestamp, I need to calculate the count for each time slot (e.g. 0800-1000, 1001-1200, etc.) for each phone number. If the count is greater than 'n' for a particular time slot, I need to assign that time slot to that number. Otherwise, I select a default time slot.
Weekday Slot: Same as above, but with weekdays.
Priority: Basically a count of how many times a number was reached.
Here's how I have gone about solving these issues:
Priority
Calculating the number of times a phone number was called is straightforward. If a number exists in the call log, I know that it was called. In that case, the following query gives me the call count for each number.
SELECT DISTINCT(PhoneNumber), COUNT(PhoneNumber) FROM tblCallLog
GROUP BY PhoneNumber
However, my problem is that I need to change the values in the Count(PhoneNumber) field based on the value in that column itself. How do I go about achieving this? (e.g. if Count(PhoneNumber) gives me a value > 20, I need to change it to 5.)
Time Slot / Weekday
This is where I'm completely stumped and am looking for the "database" way of doing things.
Unfortunately, I can't get out of my iterative process of thinking. For example, if I was aggregating for a certain phone number (say '123456') and in a certain time slot (say between 0800-1000 hrs), I can write a query like this:
DECLARE @T1Start time = '08:00:00.0000'
DECLARE @T2End time = '10:00:00.0000'
SELECT COUNT(CallTimeStamp) FROM tblCallLog
WHERE PhoneNumber = '123456' AND FORMAT(CallTimeStamp, 'HH:mm:ss') >= @T1Start AND FORMAT(CallTimeStamp, 'HH:mm:ss') < @T2End
Now, I could go through each and every Distinct Phone Number in the table, count the values for each time slot and then assign a slot value for the phone number. However, there has to be a way that does not involve me iterating through a database.
So, I am looking for suggestions on how to solve this.
Thanks
You can use the DATEPART function to get the weekday slot.
To calculate the time slot, take the number of minutes since the beginning of the day and divide it by the size of the time slot (integer division). This returns the slot number. You can then use either a CASE expression to translate it into a readable string or a lookup table where you store the slot descriptions.
SELECT
PhoneNumber
, DATEPART(WEEKDAY, l.CallTimeStamp) AS DayOfWeekSlot
, DATEDIFF(MINUTE, CONVERT(DATE, l.CallTimeStamp), l.CallTimeStamp) / 120 AS TwoHourSlot /*You can change number of minutes to get different slot size*/
, COUNT(*) AS Count
FROM tblCallLog l
GROUP BY PhoneNumber
, DATEPART(WEEKDAY, l.CallTimeStamp)
, DATEDIFF(MINUTE, CONVERT(DATE, l.CallTimeStamp), l.CallTimeStamp) / 120
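The same set-based idea (group once by phone, weekday and slot, then decide per phone) can be sketched in Python to check the slot arithmetic. This is an illustration, not a replacement for the query. The rule of picking the busiest slot when several exceed n is an assumption, since the question does not say what should happen then. The names slot_counts and assign_time_slot are hypothetical. isoweekday() numbers Monday as 1, which need not match DATEPART(WEEKDAY), because that depends on the server's DATEFIRST setting.

```python
from collections import Counter
from datetime import datetime

SLOT_MINUTES = 120  # same slot size as the "/ 120" in the query


def slot_counts(calls):
    """calls: iterable of (phone_number, datetime).

    Counts calls per (phone, weekday, slot), like the GROUP BY above.
    isoweekday() gives Monday=1..Sunday=7, which need not match DATEPART(WEEKDAY).
    """
    counts = Counter()
    for phone, ts in calls:
        minutes = ts.hour * 60 + ts.minute  # minutes since midnight
        counts[(phone, ts.isoweekday(), minutes // SLOT_MINUTES)] += 1
    return counts


def assign_time_slot(counts, phone, n, default_slot):
    """Busiest time slot for `phone` (summed over weekdays) if its count > n, else default."""
    per_slot = Counter()
    for (p, _weekday, slot), c in counts.items():
        if p == phone:
            per_slot[slot] += c
    if per_slot:
        slot, c = per_slot.most_common(1)[0]
        if c > n:
            return slot
    return default_slot


calls = [
    ("123456", datetime(2024, 1, 1, 8, 30)),
    ("123456", datetime(2024, 1, 8, 9, 15)),
    ("123456", datetime(2024, 1, 2, 14, 0)),
]
counts = slot_counts(calls)
print(assign_time_slot(counts, "123456", n=1, default_slot=0))  # 4 (08:00-09:59)
```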
You could try this to return the phone number, the day of the week and a 2-hour slot. If the number of calls is greater than 20, the value is set to 5 (not sure why 5?). The code for the 2-hour section is adapted from the question "How to Round a Time in T-SQL", where the 2 in (24/2) is the number of hours in your time period.
SELECT
PhoneNumber
, DATENAME(weekday,CallTimeStamp) as [day]
, CONVERT(smalldatetime,ROUND(CAST(CallTimeStamp as float) * (24/2),0)/(24/2)) AS RoundedTime
, CASE WHEN COUNT(*) > 20 THEN 5 ELSE COUNT(*) END AS CallCount
FROM
tblCallLog
GROUP BY
PhoneNumber
, DATENAME(weekday,CallTimeStamp)
, CONVERT(smalldatetime,ROUND(CAST(CallTimeStamp as float) * (24/2),0)/(24/2))
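Note that this rounding works on the whole datetime, so it rounds to the nearest 2-hour mark and keeps the date. The grouping is therefore per calendar day and slot, not per time of day. A small Python sketch of that behaviour and of the count cap follows. It is illustrative only: the function names are hypothetical, and rounding halves up is an assumption about how exact ties behave.

```python
import math
from datetime import datetime, timedelta


def round_to_slot(ts, hours=2):
    """Round ts to the nearest multiple of `hours` after midnight, keeping the date.

    Mimics ROUND(CAST(ts AS float) * (24/hours), 0) / (24/hours); halves round up here.
    """
    midnight = datetime(ts.year, ts.month, ts.day)
    n = math.floor((ts - midnight) / timedelta(hours=hours) + 0.5)
    return midnight + n * timedelta(hours=hours)


def priority(call_count):
    """CASE WHEN COUNT(*) > 20 THEN 5 ELSE COUNT(*) END."""
    return 5 if call_count > 20 else call_count


print(round_to_slot(datetime(2024, 1, 1, 8, 59)))   # 2024-01-01 08:00:00
print(round_to_slot(datetime(2024, 1, 1, 23, 30)))  # 2024-01-02 00:00:00
print(priority(25), priority(12))                   # 5 12
```

The second example shows a late-evening call rolling over into the next day's midnight slot, which is worth keeping in mind if the goal is a time-of-day slot.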

algorithm for finding date in sorted array of dates

Here is my problem.
I have a sorted array of dates stored in a circular buffer, and a pointer to the last date in the buffer. Some dates may be missing. The client requests a range of dates. If the lower-limit date is missing, the program should return the closest date that is higher than the requested one, and vice versa for the upper-limit date.
Here is an example:
Dates in circular buffer (int[18]):
1,2,3,4,5,11,12,13,14,15,21,22,23,24,25,26,27,28
If the client wants the range 8 to 23,
the program should return 11,12,13,14,15,21,22,23.
I tried this:
Notes:
- The number between two stars is the current date, and diff is the number of steps to go to find 8.
- The pointer cannot be less than 0 or greater than 17.
{1,2,3,4,5,11,12,13,14,15,21,22,23,24,25,26,27,*28*}, diff = -20
{*1*,2,3,4,5,11,12,13,14,15,21,22,23,24,25,26,27,28}, diff = +7
{1,2,3,4,5,11,12,*13*,14,15,21,22,23,24,25,26,27,28}, diff = -5
{1,2,*3*,4,5,11,12,13,14,15,21,22,23,24,25,26,27,28}, diff = +5 -> (5/2)+1=+3
(if I detect that I would just go x steps forward and then x steps back, I split x in half)
{1,2,3,4,5,*11*,12,13,14,15,21,22,23,24,25,26,27,28}, diff = -3 -> (-3/2)-1 = -2
{1,2,3,*4*,5,11,12,13,14,15,21,22,23,24,25,26,27,28}, diff = 4
{1,2,3,4,5,11,12,*13*,14,15,21,22,23,24,25,26,27,28}, diff = -5
{1,2,*3*,4,5,11,12,13,14,15,21,22,23,24,25,26,27,28}, diff = +5 -> (5/2)+1=+3
If we continue like this, we get 13, 3, 11, 4 over and over again.
Notes:
- It is only a coincidence that we get 11 here. When I use real examples with more dates, this algorithm jumps over some other 4 (or 3) numbers.
- Dates are stored in the EEPROM of a uC (microcontroller), so reading dates takes a while, and I need to find a date as quickly as possible (with a minimum number of reads).
Please help.
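Set p1 to be the start of the buffer (the oldest date) and p2 to be the end (the newest date). X is what you're looking for.
If p1Date is on or after X, return p1. If p2Date is before X, there is no date on or after X.
Otherwise, look at the midpoint m between p1 and p2. If mDate is on or after X then p2=m, else p1=m.
Repeat until p2=p1+1; p2 is then the first date on or after X. The upper limit works the same way with the comparisons mirrored.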

Finding values of same time every week. code provided, data provided. 4D-matrix

I have coded this partly, but I'm not sure about it, since what I get is only partial data.
I have a 4-D matrix with dimensions xV(6,24,63,15), meaning xV(min,hour,day,customer). The data is collected every 10 minutes for 63 days for 15 customers,
so that is why the first dimension has 6 entries: the 10-minute intervals within each hour.
What I want is to collect the data for, let's say, every Monday and use it for a plot.
That means there are 63/7 = 9 Mondays, each with 24 hours, where each hour has 6 data points (one every 10 minutes). For each hour and each 10-minute interval I want a new matrix across the Mondays, so I can take the mean of it and plot it.
Is this possible?
This is what I have so far, but no luck:
n = 0;
m = 0;
while(n<24)
n = n + 1;
while(m<6)
m = m + 1;
Va(:,m) = x(m,n,1:63,1); %(min,hour,day,line)
Vb(:,m) = x(m,n,1:63,1);
Vc(:,m) = x(m,n,1:63,1);
end
end
The data file: xV.mat
Thanks again for the help.
firstMonday = 1; %// index of first Monday. 1 if first day is a Monday
result = xV(:,:,firstMonday:7:end,:);
This gives a 6x24x9x15 matrix containing only Mondays. To average over all Mondays, use
squeeze(mean(result,3)) %// mean along 3rd dim. Size is 6x24x15
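For comparison, the same slicing in NumPy is sketched below. The only real differences are 0-based indexing (MATLAB's firstMonday = 1 becomes 0) and that NumPy's mean drops the averaged axis, so no squeeze is needed. The synthetic array, whose values are just the day index, is made up purely so the result is easy to check. monday_mean is a hypothetical name.

```python
import numpy as np


def monday_mean(xV, first_monday=0):
    """Mean over all Mondays for every (10-min slot, hour, customer).

    xV has shape (6, 24, days, customers). first_monday is 0-based here,
    so MATLAB's firstMonday = 1 becomes 0.
    """
    mondays = xV[:, :, first_monday::7, :]  # every 7th day, e.g. shape (6, 24, 9, 15)
    return mondays.mean(axis=2)              # NumPy drops the axis, so no squeeze needed


# Synthetic data whose value is simply the day index, to make the result easy to check.
xV = np.zeros((6, 24, 63, 15)) + np.arange(63).reshape(1, 1, 63, 1)
result = monday_mean(xV)
print(result.shape)     # (6, 24, 15)
print(result[0, 0, 0])  # 28.0: mean of days 0, 7, ..., 56
```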
