Stata loops for two lists of variables - loops

I am looking for a way to loop over one list of variables, restricted to observations where the corresponding variable in a second list takes a specific value.
The enrollment variables are numeric counts of people enrolled. The second list holds the program in each year: a categorical variable recording whether the enrollment is in program 1 or program 2. Is it possible to run summary statistics on the enrollment figures only where program = program 1?
Each variable has the prefix enrollment_ or program_ and a suffix for the program year: year 1 (yr1), year 2 (yr2), or year 3 (yr3).
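For example, is it possible to use a loop in which the condition is that the matching program variable takes a specific value in that year? Something along these lines, pairing the two lists through their shared year suffix (assuming program_yrN is a string variable holding "program 1"):
local years yr1 yr2 yr3
foreach yr of local years {
    * summarize enrollment_<yr> only where program_<yr> equals "program 1"
    * (in Stata the if qualifier must come before the comma and its options)
    summarize enrollment_`yr' if program_`yr' == "program 1", detail
}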
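The pairing idea (walk the year suffixes and use the same suffix for both the variable being summarized and the variable being tested) can be sketched outside Stata as well. This is a minimal Python sketch, assuming wide-format records held as dicts; the sample values and the helper name summarize_by_program are hypothetical, and only a few of the statistics that summarize, detail reports are computed.

```python
from statistics import mean, median

# Hypothetical wide-format records: each has a paired enrollment_yrN (count)
# and program_yrN (category) for every year suffix.
rows = [
    {"enrollment_yr1": 10, "program_yr1": "program 1",
     "enrollment_yr2": 12, "program_yr2": "program 2",
     "enrollment_yr3": 15, "program_yr3": "program 1"},
    {"enrollment_yr1": 20, "program_yr1": "program 2",
     "enrollment_yr2": 8, "program_yr2": "program 1",
     "enrollment_yr3": 30, "program_yr3": "program 1"},
]


def summarize_by_program(rows, years, target="program 1"):
    """For each suffix, summarize enrollment_<yr> over rows where program_<yr> == target."""
    results = {}
    for yr in years:
        # The shared suffix links each enrollment variable to its program variable
        values = [r[f"enrollment_{yr}"] for r in rows if r[f"program_{yr}"] == target]
        if values:
            results[yr] = {"n": len(values), "mean": mean(values),
                           "median": median(values),
                           "min": min(values), "max": max(values)}
        else:
            results[yr] = {"n": 0}
    return results


stats = summarize_by_program(rows, ["yr1", "yr2", "yr3"])
for yr, s in stats.items():
    print(yr, s)
```

Looping over the suffix rather than over a list of full variable names is what keeps the two lists aligned.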

Related

(SPSS) Assign values to remaining time point based on value on another variable, and looping for each case

I am currently working on analyzing a within-subject dataset with 8 time-ordered assessment points for each subject.
The variables of interest in this example are ID, time point, and accident.
I want to create two variables: accident_intercept and accident_slope, based on the value on accident at a particular time point.
For the accident_intercept variable, once a participant indicated the occurrence of an accident (e.g., accident = 1) at a specific time point, I want the values for that time point and the remaining time points to be 1.
For the accident_slope variable, once a participant indicated the occurrence of an accident (e.g., accident = 1) at a specific time point, I want the value of that time point to be 0, but count up by 1 for the remaining time points until the end time point, for each subject.
The main challenge is that this process needs to be repeated for each participant, each of whom occupies 8 rows of data.
Here is how the newly created variables should look:
I have looked into the documentation for various SPSS commands, such as LOOP and the LAG/LEAD functions. I also tried to break my task into smaller components and google each one. However, I have not made any progress :)
I would be really grateful for any help and directions that you can provide.
Here is one way to do what you need, using aggregate to calculate each participant's "accident time" (the first time point with an accident):
if accident=1 accidentTime=TimePoint.
* min keeps the first accident if a participant reports more than one.
aggregate out=* mode=addvariables overwrite=yes /break=ID/accidentTime=min(accidentTime).
if TimePoint>=accidentTime Accident_Intercept=1.
if TimePoint>=accidentTime Accident_Slope=TimePoint-accidentTime.
recode Accident_Intercept Accident_Slope accidentTime (miss=0).
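The aggregate approach works in two passes: first find each participant's accident time, then compare every row's time point against it. A minimal Python sketch of the same two passes, assuming the data are (ID, time point, accident) tuples; the sample data and the function name are hypothetical.

```python
def accident_vars_aggregate(records):
    """records: (id, time_point, accident) tuples.
    Returns {(id, time_point): (intercept, slope)}."""
    # Pass 1 (like AGGREGATE with /break=ID): earliest accident time per ID
    first_accident = {}
    for pid, t, acc in records:
        if acc == 1:
            first_accident[pid] = min(t, first_accident.get(pid, t))
    # Pass 2: rows at or after the accident get intercept 1 and a counting slope
    out = {}
    for pid, t, _acc in records:
        at = first_accident.get(pid)
        if at is not None and t >= at:
            out[(pid, t)] = (1, t - at)
        else:
            out[(pid, t)] = (0, 0)  # like recode (miss=0)
    return out


# Hypothetical data: participant 1 reports accidents at time points 3 and 6,
# participant 2 never reports one.
data = [(1, t, 1 if t in (3, 6) else 0) for t in range(1, 9)]
data += [(2, t, 0) for t in range(1, 9)]
res = accident_vars_aggregate(data)
print([res[(1, t)] for t in range(1, 9)])
```

Because the accident time is computed per ID before any comparison, row order does not matter here.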
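Here is another approach using the lag function (the file must be sorted by ID and TimePoint, because each case looks at the one before it):
sort cases by ID TimePoint.
compute Accident_Intercept=0.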
if accident=1 Accident_Intercept=1.
if $casenum>1 and id=lag(id) and lag(Accident_Intercept)=1 Accident_Intercept=1.
compute Accident_Slope=0.
if $casenum>1 and id=lag(id) and lag(Accident_Intercept)=1 Accident_Slope=lag(Accident_Slope) +1.
exe.
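The lag approach instead carries values forward one row at a time: a row inherits intercept 1 from the previous row of the same participant, and its slope is the previous slope plus 1. A minimal Python sketch of that carry-forward, assuming (ID, time point, accident) tuples already sorted by ID and time point; the sample data and the function name are hypothetical.

```python
def accident_vars_lag(records):
    """records: (id, time_point, accident) tuples sorted by id, then time point.
    Returns (id, time_point, intercept, slope) tuples."""
    result = []
    prev_id = prev_int = prev_slope = None  # plays the role of LAG()
    for pid, t, acc in records:
        intercept = 1 if acc == 1 else 0
        slope = 0
        # Same participant and the accident already happened on an earlier row
        if pid == prev_id and prev_int == 1:
            intercept = 1
            slope = prev_slope + 1
        result.append((pid, t, intercept, slope))
        prev_id, prev_int, prev_slope = pid, intercept, slope
    return result


data = [(1, t, 1 if t in (3, 6) else 0) for t in range(1, 9)]
data += [(2, t, 0) for t in range(1, 9)]
out = accident_vars_lag(data)
print([(r[2], r[3]) for r in out if r[0] == 1])
```

Checking that the ID matches the previous row is what stops one participant's values from leaking into the next.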

Count across columns starting from unique column for each row in SAS

I have a dataset containing a number of persons who have been involved in an accident. Each person has been in an accident at a different time, and I have coded a variable start_week which indicates the week number (counting from January 1st 2011) in which the accident occurred.
For each individual I also have a variable for each week after January 1st 2011 that shows whether or not this individual was hospitalized. I now need to count how many weeks a person has been hospitalized XX weeks after the accident.
The desired result is a column like sum_week that sums the hospitalized weeks from the accident week onward, depending on the value of start_week.
Id start_week week_1 week_2 week_3 week_4 sum_week
1  2          1      0      1      1      2
2  3          1      0      0      1      1
I think this can be done using an array, but I have no idea how. If it isn't possible to count across columns based on the variable start_week, I am planning on transposing my data. I would however prefer if this could be done without having to transpose my data.
Any help is much appreciated!
Just use the START_WEEK as the initial value in the DO loop you use to check the array.
data want;
set have ;
array week_[4];
sum_week=0;
do index=start_week to dim(week_);
sum_week+week_[index];
end;
drop index;
run;
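The trick in the SAS answer is only the loop's starting index. The same idea in a minimal Python sketch, using the two rows from the table above; note that Python lists are 0-based, so week_k sits at index k - 1. The function name is hypothetical.

```python
def sum_weeks_from_start(start_week, weeks):
    """Sum hospitalization flags from week start_week (1-based) through the last week."""
    # Slicing from start_week - 1 is the 0-based equivalent of
    # "do index=start_week to dim(week_)" in the SAS data step.
    return sum(weeks[start_week - 1:])


have = [
    {"id": 1, "start_week": 2, "weeks": [1, 0, 1, 1]},
    {"id": 2, "start_week": 3, "weeks": [1, 0, 0, 1]},
]
for row in have:
    row["sum_week"] = sum_weeks_from_start(row["start_week"], row["weeks"])
    print(row["id"], row["sum_week"])
```

Both rows reproduce the sum_week values in the table, without transposing the data.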

how to compute multiple variables using loop

In the dataset, there are two columns, "start_year" and "end_year", giving the years in which a patient started and ended registration at the GP clinic. I want to know whether each patient was registered at the clinic in each year from 1990 to 2019, probably by computing 30 new variables (1 = yes, 0 = no), one per year.
I used ifelse (R) to compute the variables one by one:
test$pt_1990<-ifelse(test$start_year<=1990 & 1990<=test$end_year,1,0)
I hope a loop can do this better than writing 30 lines of the same code. Thank you very much!
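The repeated ifelse line differs only in the year, so a loop over the years can build both the column name and the condition. A minimal Python sketch of that idea, assuming one dict per patient; the sample patients and the function name are hypothetical, while the pt_YEAR naming follows the question.

```python
def registration_flags(start_year, end_year, first=1990, last=2019):
    """Return {'pt_1990': 0/1, ..., 'pt_2019': 0/1} for one patient."""
    # Same test as the ifelse line, with the year taken from the loop
    return {f"pt_{y}": int(start_year <= y <= end_year)
            for y in range(first, last + 1)}


patients = [
    {"id": "A", "start_year": 1985, "end_year": 1995},
    {"id": "B", "start_year": 2000, "end_year": 2030},
]
for p in patients:
    p.update(registration_flags(p["start_year"], p["end_year"]))
    print(p["id"], p["pt_1995"], p["pt_2005"])
```

The same pattern (loop over 1990 to 2019 and build the column name from the year) carries over to R.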

Return the smallest value from a list in which only certain values are eligible - excel

I am having some trouble formulating my problem, but I hope you understand!
I have a table of firms building production plants in foreign countries in certain years (columns A to C).
In a separate table I have so-called cross-national distance measures, based on the difference in GDP between the countries (columns G to M). Note that the distances change per year.
A simplified version of the excel would look like this:
https://new.wu.ac.at/fileadmin/wu/d/i/iib/photo/stack.JPG
What I want is a formula that produces the manually entered results in column D. It should:
look up the countries in which the same company has previously (in earlier years) built plants,
find the smallest cross-national distance from the current country to any of those previously entered countries,
use the distance values for the year of the current plant construction.
Let me illustrate my request with the result I would want in cell D8:
The formula would have to find the list of countries that were previously entered, in this case Turkey and Bulgaria.
It would then have to go into the second table and return the minimum of the distances from Kosovo, but only to Turkey and Bulgaria.
This would have to be done in the rows for 2008 (the current year).
I really hope you can help me. I figured out how to find a minimum in a list, and I can do it for certain years as well. The issue is that Excel first needs to find the previously entered countries, keep them in some kind of array, and then consider only those countries for the minimum distance.
Thank you very much!
Try this "array formula" for D2 copied down
=IFERROR(SMALL(IF(COUNTIFS(A$2:A$11,A2,B$2:B$11,"<"&B2,C$2:C$11,"<>"&C2,C$2:C$11,I$1:M$1)*(G$2:G$31=B2)*(H$2:H$31=C2),I$2:M$31),1),"N/A")
confirmed with CTRL+SHIFT+ENTER
That checks three conditions for your larger table - that the header row matches a qualifying country (using COUNTIFS function based on criteria in the small table), that column G matches the current year and column H matches the current country.
If all those criteria are satisfied, the relevant values in the table are returned and SMALL finds the smallest. If there's an error (because there are no qualifying values), N/A is returned.
In Excel 2010 or later versions you can use the AGGREGATE function instead of SMALL. This is useful because it doesn't require "array entry":
=IFERROR(AGGREGATE(15,6,I$2:M$31/(COUNTIFS(A$2:A$11,A2,B$2:B$11,"<"&B2,C$2:C$11,"<>"&C2,C$2:C$11,I$1:M$1)>0)/(G$2:G$31=B2)/(H$2:H$31=C2),1),"N/A")
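The formula combines three filters: same firm, earlier year, different country for the qualifying destinations, then the row of the distance table matching the current year and country, then the minimum. A minimal Python sketch of that logic, assuming the plants table as (firm, year, country) tuples and the distance table as a dict keyed by (year, from-country) whose inner keys play the role of the header row I1:M1. All firm names and distance values here are hypothetical.

```python
def min_distance_to_prior(plants, distances, row):
    """Smallest distance from the current country to countries the same firm
    entered in earlier years, using the current year's distances; "N/A" if none."""
    firm, year, country = plants[row]
    # Like the COUNTIFS part: same firm, earlier year, different country
    prior = {c for f, y, c in plants if f == firm and y < year and c != country}
    # Like (G=B2)*(H=C2): pick the distance row for this year and country
    table_row = distances.get((year, country), {})
    candidates = [d for dest, d in table_row.items() if dest in prior]
    return min(candidates) if candidates else "N/A"  # like IFERROR(..., "N/A")


plants = [("FirmX", 2005, "Turkey"), ("FirmX", 2006, "Bulgaria"),
          ("FirmX", 2008, "Kosovo")]
distances = {
    (2008, "Kosovo"): {"Turkey": 0.4, "Bulgaria": 0.2, "Serbia": 0.1, "Kosovo": 0.0},
}
for i in range(len(plants)):
    print(plants[i], min_distance_to_prior(plants, distances, i))
```

Serbia is closer in this made-up table but is ignored, because the firm never entered it before 2008.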

SPSS Identifying Different Lagged Values Through Loops

I have this dataset with 2 variables, week and brand_chosen, where brand_chosen designates which product (e.g. from a supermarket) was chosen, and it looks like this:
Week brand_chosen
2 19
2 15
2 50
2 12
3 19
3 16
3 50
4 77
4 19
What I am trying to do is, for each line, note the week in which the brand purchase was made and check whether the same brand was purchased in the week before. If it was, a variable dummy should take the value 1, otherwise 0.
Because week appears multiple times I cannot just use lag(week,1), so I probably need to loop back through the earlier cases until I find the first different week value.
This is what I tried:
loop i=1 to 70.
do if (week<>lag(week,i) and brand_chosen=lag(brand_chosen,i)).
compute dummy=1.
end loop.
else.
compute dummy=0.
end if.
end loop.
execute.
Here 70 is just an arbitrary number, chosen to make sure that all the previous cases are checked.
I have two problems with that. First, as I understand it, the lag function needs a constant number, but "i" is not accepted as one here.
The second problem is that I would like to exit the loop once the condition is satisfied and move to the next case, but I get an error.
I am new to SPSS syntax and I am struggling with this one, so any help is greatly appreciated.
I assume that every combination of week and brand_chosen is unique. In that case the solution is quite simple: reorder your dataset by brand_chosen and then week, and compare each case with the previous one using LAG.
This should do the trick (dummy is 1 only when the same brand was also chosen in the immediately preceding week):
SORT CASES BY brand_chosen week.
COMPUTE dummy=0.
IF (brand_chosen=LAG(brand_chosen) AND week=LAG(week)+1) dummy = 1.
EXECUTE.
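An alternative that needs neither sorting nor the uniqueness assumption is a set lookup: collect every (week, brand) pair that occurs, then flag a purchase if the same brand appears with week - 1. This is a minimal Python sketch of that swapped-in technique (not the SPSS answer's method), run on the data from the question; the function name is hypothetical.

```python
def repeat_purchase_dummy(purchases):
    """purchases: (week, brand) pairs. Returns 0/1 dummies in the same order:
    1 if the same brand was also chosen in the previous week."""
    chosen = set(purchases)  # every (week, brand) combination that occurs
    return [1 if (week - 1, brand) in chosen else 0 for week, brand in purchases]


data = [(2, 19), (2, 15), (2, 50), (2, 12),
        (3, 19), (3, 16), (3, 50),
        (4, 77), (4, 19)]
print(repeat_purchase_dummy(data))
```

Duplicate (week, brand) rows simply collapse in the set, so repeated purchases within a week do not affect the result, and the original row order is preserved.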
