COBOL85: How to find number of rows in an array dynamically - arrays

In my program I keep filling the following array with data obtained from a database table then inspect it to find certain words:
01 PRODUCTS-TABLE.
03 PRODUCT-LINE PIC X(40) OCCURS 50 TIMES.
sometimes it occurs 6 times, sometimes more than 6 times.
I'd like to find the number of lines in the array every time I write data to it , how can I do that ?
I tried this but it based on a fixed length:
INSPECT-PROCESS.
MOVE 0 TO TALLY-1.
INSPECT PRODUCTS-TABLE TALLYING TALLY-1 FOR ALL "PRODUCT"
IF TALLY-1 > 0
MOVE SER-NUMBER TO HITS-SN-OUTPUT
MOVE FILLER-SYM TO FILLER-O
MOVE PRODUCT-LINE(1) TO HITS-PR-OUTPUT
WRITE HITS-REC
PERFORM WRITE-REPORT VARYING CNT1 FROM 2 BY 1 UNTIL CNT1 = 11.
WRITE-REPORT.
MOVE " " TO HITS-SN-OUTPUT
MOVE PRODUCT-LINE(CNT1) TO HITS-TX-OUTPUT
WRITE HITS-REC.
In the first output line it writes the SN and the first product-line then in the following lines it writes all remaining product-line and blank out SN.
Something like:
12345678 first product-line
Second product-line
etc
It’s working, however, it only stops when CNT1 is 11, how can I feed the procedure with a variable CNT1 based on how many lines are actually in PRODUCTS-TABLE each time?

I solved the problem by adding an array line counter (LINE-COUNTER-1) to count (ADD 1 TO LINE-COUNTER-1) how many times I add data to the array and stop writing the report when "WRITE-COUNTER = LINE-COUNTER-1"
INSPECT-PROCESS.
MOVE 0 TO TALLY-1
INSPECT PRODUCTS-TABLE TALLYING TALLY-1 FOR ALL "PRODUCT"
IF TALLY-1 > 0
MOVE HOLD-SER-NUM TO HITS-SN-OUTPUT
MOVE FILLER-SYM TO FILLER-O
MOVE PRODUCT-LINE(1) TO HITS-PR-OUTPUT
WRITE HITS-REC
PERFORM WRITE-REPORT VARYING WRITE-COUNTER FROM 2 BY 1
UNTIL WRITE-COUNTER = LINE-COUNTER-1.

Related

SAS: set statement point = _N_

I'm trying to understand a friend's code to see if I can find some inspiration for my dissertation. He runs a section where he creates a dataset and inputs 3 datasets. However, what I don't understand is that he uses 3 set statements and the latter datasets use point = "_ N _"
What is the use of the following code?
data Other;
set One;
set Two point = _N_;
set Three point = _N_;
array Rating[*] Unrated;
array Amortising[*] '1'n;
array Rating_old[*] old_Unrated;
AM = 0;
do i = 1 to dim(Rating);
Rating[i] = Rating[i] + Rating_old[i] * Amortising[i];
end;
run;
The input datasets look like this
data one;
input Segment count weight ;
datalines;
1 0 0.1
99 1 0.2
;
run;
data two;
input block $ type '0'n '1'n '99'n;
datalines;
50 A 100% 10% 0%
50 S 100% 10% 0%
51 S 100% 10% 0%
52 S 100% 10% 0%
132 S 100% 12% 0%
;
run;
data three;
input DPD $ block type $ segment count weight;
datalines;
AM 50 S 1 0 0.1
Unrated 51 S 99 0.2
NPE 132 S 1 0.5
;
run;
Just looking to see what the point = _ N _ would be used for!
In this program it does nothing. The program would run exactly the same without the point= option on the last two set statements.
The POINT= let's you access observations directly. The _N_ automatic variable is incremented once for each iteration of the data step. So on the first iteration the step will read the first observation from each of the three inputs. Which is exactly what would happen without the point= option.
Note that this program will stop when the first SET statement reads past the end of the file. Without the POINT= then it would stop when ANY of the three set statements attempted to read past the end of the input file. You could do the same and avoid the ERRORs in the SAS log by using and testing the NOBS= options.
set One;
if _n_ <= nobs2 then set Two nobs=nobs2;
if _n_ <= nobs3 then set Three nobs=nobs3;
Given the datasets shown, it doesn't do anything.
However, if the ONE dataset had more rows than one or both of the other two datasets, it would avoid the data step stopping when it ran out of rows from the shortest dataset. For example, run this:
data Other;
set Two;
set One point = _N_;
set Three point = _N_;
array Rating[*] Unrated;
array Amortising[*] '1'n;
array Rating_old[*] old_Unrated;
AM = 0;
do i = 1 to dim(Rating);
Rating[i] = Rating[i] + Rating_old[i] * Amortising[i];
end;
run;
Just swapping TWO and ONE. Now you get 5 rows, while if you took off the point=_n_, you'd only get two still. So the program is likely being written to ensure all of ONE's rows are represented (similar to a left join in SQL except you're not joining to anything here). This would probably be more clearly written as a merge, even without a by statement if it's just a one-to-one merge. Usually, though, there's a valid merge key to merge on.

Data Entry in SAS using Loops

I just learned about the "do" loop today and would like to try using it for data entry in SAS. I have tried most examples online, but I still cannot figure it out.
My dataset in an experiment with 6 treatments (1 to 6) using 2 sets of cues, 3 each, Visual and Audio. There's lag measured in seconds, which are 5, 10, and 15, which there are 2 sets.
Basically it looks like this:
Table
The entries I want are:
1. Obs_no, ranging from 1 to 18 (total of 18 observations, this allows me to easily delete outliers with an IF THEN)
2. Treatment type, which are Auditory and Visual.
3.Treatment number, 1 to 6, 3 sets.
4. Lag, 5, 10 or 15.
5. And the data itself
So far, my code makes 2 and 5 possible, it also makes the rest possible with an IF THEN statement and input statement, although I assume there's a way easier method:
data AVCue;
do cue = 'Auditory','Visual';
do i = 1 to 3;
input AVCue ##;
output;
end;
end;
datalines;
.204 .167 .202 .257 .283 .256
.170 .182 .198 .279 .235 .281
.181 .187 .236 .269 .260 .258
;
Lag and the rest was made possible using an IF THEN statement and the crude method of input:
data AVCue;
set AVCue;
IF i=1 THEN Lag=5;
IF i=2 THEN Lag=10;
IF i=3 THEN Lag=15;
input obs_no treatment;
cards;
1 1
2 2
3 3
4 4
5 5
6 6
7 1
8 2
9 3
10 4
11 5
12 6
13 1
14 2
15 3
16 4
17 5
18 6
;
proc print data=AVCue;
run;
The IF THEN should be fine, but the input statement here is just in my opinion counterproductive, and defeats the purpose of using loops, which is to me, to save time. If done this way, I might as well just put the data into excel and import it, or type everything out with ample copy and paste of the text in the
input obs_no treatment;
cards;
section.
My coding knowledge is basic, so sorry if this question sounds silly, I want to know:
1. How would I make a list of numbers using the "do" loops in SAS? I've made several attempts and all I get is a list containing the next number. I know why this happens, the loop counts to x and the value assigned would just be x. I just don't know how to get around that. Somehow this didn't happen in the datalines section, I guess SAS knows there's 18 numbers and the entry i is stored accordingly... or something?
2. How would I go about assigning in this case, the numbers 1 to 6 to each entry?
Thanks!
It is certainly much easier to read in the actual dataset instead of having to impute some of the variables based on the order the values have in the source data. You might be able to combine a SET statement and an INPUT statement in the same data step and get it to work, but it is probably NOT worth the effort. Just make two datasets and merge them.
Looking at the photograph you posted it looks like TREATMENT is not an independent variable. Instead it is just a label for the combination of CUE and LAG. To make it cycle from 1 to 6 just reset it back to 1 when it gets too large.
data AVCue;
do cue = 'Auditory','Visual';
do lag= 5, 10, 15 ;
treatment+1;
if treatment=7 then treatment=1;
obsno+1;
input AVCue ##;
output;
end;
end;
datalines;
.204 .167 .202 .257 .283 .256
.170 .182 .198 .279 .235 .281
.181 .187 .236 .269 .260 .258
;
You can get in trouble if you just let SAS guess at how you want to define your variables. For example if you change the order of the CUE values do cue = 'Visual','Auditory'; then SAS will make CUE with length $5 instead of $8. Add a LENGTH statement to define your variables before you use them.
length obsno 8 treatment 8 cue $8 lag 8 AVCue 8 ;
This will also let you control the order they are created in the dataset.
If you really did already have a SAS dataset and you wanted to add a variable like TREATMENT that cycled from 1 to 6 (or really any DO loop construct) then could nest the SET statement inside the DO loop. Just remember to add the explicit OUTPUT statement.
data new ;
do treatment=1 to 6 ;
set old;
output;
end;
run;

lag over columns/ variables SPSS

I want to do something I thought was really simple.
My (mock) data looks like this:
data list free/totalscore.1 to totalscore.5.
begin data.
1 2 6 7 10 1 4 9 11 12 0 2 4 6 9
end data.
These are total scores accumulating over a number of trials (in this mock data, from 1 to 5). Now I want to know the number of scores earned in each trial. In other words, I want to subtract the value in the n trial from the n+1 trial.
The most simple syntax would look like this:
COMPUTE trialscore.1 = totalscore.2 - totalscore.1.
EXECUTE.
COMPUTE trialscore.2 = totalscore.3 - totalscore.2.
EXECUTE.
COMPUTE trialscore.3 = totalscore.4 - totalscore.3.
EXECUTE.
And so on...
So that the result would look like this:
But of course it is not possible and not fun to do this for 200+ variables.
I attempted to write a syntax using VECTOR and DO REPEAT as follows:
COMPUTE #y = 1.
VECTOR totalscore = totalscore.1 to totalscore.5.
DO REPEAT trialscore = trialscore.1 to trialscore.5.
COMPUTE #y = #x + 1.
END REPEAT.
COMPUTE trialscore(#i) = totalscore(#y) - totalscore(#i).
EXECUTE.
But it doesn't work.
Any help is appreciated.
Ps. I've looked into using LAG but that loops over rows while I need it to go over 1 column at a time.
I am assuming respid is your original (unique) record identifier.
EDIT:
If you do not have a record indentifier, you can very easily create a dummy one:
compute respid=$casenum.
exe.
end of EDIT
You could try re-structuring the data, so that each score is a distinct record:
varstocases
/make totalscore from totalscore.1 to totalscore.5
/index=scorenumber
/NULL=keep.
exe.
then sort your cases so that scores are in descending order (in order to be bale to use lag function):
sort cases by respid (a) scorenumber (d).
Then actually do the lag-based computations
do if respid=lag(respid).
compute trialscore=totalscore-lag(totalscore).
end if.
exe.
In the end, un-do the restructuring:
casestovars
/id=respid
/index=scorenumber.
exe.
You should end up with a set of totalscore variables (the last one will be empty), which will hold what you need.
you can use do repeat this way:
do repeat
before=totalscore.1 to totalscore.4
/after=totalscore.2 to totalscore.5
/diff=trialscore.1 to trialscore.4 .
compute diff=after-before.
end repeat.

Find a list of max values in a text file using awk

I am new to awk and I cannot figure out the correct syntax for the task I am working on.
I have a text file which looks something like this (the content is always sorted but is not always the same, so I cannot hard code the index of the array):
27 abc123
27 abd333
27 dce123
23 adb234
21 abc789
18 bcd213
So apparently the max is 27. However, I want my output to be:
27 abc123
27 abd333
27 dce123
and not the first row only.
The second column is just there, my code always sorts the text file based on the first column.
My code right now set the max as the first value (27 for example), and as it reads through the lines, it stores only the rows with the max values in an array and eventually print out the output.
awk 'BEGIN {max=$1} {if(($1)==max) a[NR]=($0)} END {for (i in a) print a[i]}' file
You can't read fields in a BEGIN block, since it's executed before the file is read.
To find the first record, use the pattern NR == 1. NR is the number of the current record. To find the other records, just check whether $1 equals the max value.
NR == 1 { max = $1 }
$1 == max { print }
Since your input is always sorted, you can optimise this program by exiting after reading all the records with the max value:
$1 != max { exit }

Print words from the corresponding line numbers

Hello Everyone,
I have two files File1 and File2 which has the following data.
File1:
TOPIC:topic_0 30063951.0
2 19195200.0
1 7586580.0
3 2622580.0
TOPIC:topic_1 17201790.0
1 15428200.0
2 917930.0
10 670854.0
and so on..There are 15 topics and each topic have their respective weights. And the first column like 2,1,3 are the numbers which have corresponding words in file2. For example,
File 2 has:
1 i
2 new
3 percent
4 people
5 year
6 two
7 million
8 president
9 last
10 government
and so on.. There are about 10,470 lines of words. So, in short I should have the corresponding words in the first column of file1 instead of the line numbers. My output should be like:
TOPIC:topic_0 30063951.0
new 19195200.0
i 7586580.0
percent 2622580.0
TOPIC:topic_1 17201790.0
i 15428200.0
new 917930.0
government 670854.0
My Code:
import sys
d1 = {}
n = 1
with open("ap_vocab.txt") as in_file2:
for line2 in in_file2:
#print n, line2
d1[n] = line2[:-1]
n = n + 1
with open("ap_top_t15.txt") as in_file:
for line1 in in_file:
columns = line1.split(' ')
firstwords = columns[0]
#print firstwords[:-8]
if firstwords[:-8] == 'TOPIC':
print columns[0], columns[1]
elif firstwords[:-8] != '\n':
num = columns[0]
print d1[n], columns[1]
This code is running when I type print d1[2], columns[1] giving the second word in file2 for all the lines. But when the above code is printed, it is giving an error
KeyError: 10472
there are 10472 lines of words in the file2. Please help me with what I should do to rectify this. Thanks in advance!
In your first for loop, n is incremented with each line until reaching a final value of 10472. You are only setting values for d1[n] up to 10471 however, as you have placed the increment after you set d1 for your given n, with these two lines:
d1[n] = line2[:-1]
n = n + 1
Then on the line
print d1[n], columns[1]
in your second for loop (for in_file), you are attempting to access d1[10472], which evidently doesn't exist. Furthermore, you are defining d1 as an empty Dictionary, and then attempting to access it as if it were a list, such that even if you fix your increment you will not be able to access it like that. You must either use a list with d1 = [], or will have to implement an OrderedDict so that you can access the "last" key as dictionaries are typically unordered in Python.
You can either:
Alter your increment so that you do set a value for d1 in the d1[10472] position, or simply set the value for the last position after your for loop.
Depending on what you are attempting to print out, you could replace your last line with
print d1[-1], columns[1]
to print out the value for the final index position you currently have set.

Resources