Read values using sequential line pointer control in SAS

I have the following .txt file, with records laid out like this:
LEE ATHNOS
1215 RAINTREE CIRCLE
PHOENIX AZ 85044
JOYCE BENEFIT
85 MAPLE AVENUE
MENLO PARK CA 94025
These are actually two records, each spread over three lines. This is the code I am using to read them:
data lineinput;
infile linein;
input Lname $ Fname $ /
Address $1-20 /
City & $10. State $ zip $ ;
run;
I am not able to read the second record.
Below is the log:
NOTE: The infile LINEIN is:
Filename=\\VBOXSVR\win_7\SAS\DATA\INPUT\linepointer.txt,
RECFM=V,LRECL=256,File Size (bytes)=105,
Last Modified=30Jun2015:00:45:47,
Create Time=30Jun2015:00:32:31
NOTE: LOST CARD.
Lname=JOYCE Fname=BENEFIT Address=MENLO PARK CA 94025 City= State= zip= _ERROR_=1 _N_=2
NOTE: 6 records were read from the infile LINEIN.
The minimum record length was 10.
The maximum record length was 20.
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
NOTE: The data set WORK.LINEINPUT has 1 observations and 6 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds
PROC PRINT shows only one observation.
Any idea why I am not getting the second record right? (There is a space in the city name, so I have used &.)

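You need to add the TRUNCOVER option to the INFILE statement. Without it, the default FLOWOVER behavior applies: the column input Address $1-20 reaches past the end of the 15-character line 85 MAPLE AVENUE, so SAS continues onto the next line and reads MENLO PARK CA 94025 as the address (exactly what your log shows). The following / then runs out of lines, hence the LOST CARD note. Here is a self-contained version that supplies the data with PARMCARDS4: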
filename FT15F001 temp lrecl=512;
data hipa;
infile FT15F001 truncover;
input Lname $ Fname $ /
Address $1-20 /
City & $10. State $ zip $ ;
parmcards4;
LEE ATHNOS
1215 RAINTREE CIRCLE
PHOENIX AZ 85044
JOYCE BENEFIT
85 MAPLE AVENUE
MENLO PARK CA 94025
;;;;
run;
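TRUNCOVER fixes the record alignment, but it does not change how & works: with the & modifier a value ends only at two consecutive blanks (or the end of the line), so on a line such as PHOENIX AZ 85044, which has single blanks only, City will not stop at PHOENIX. Below is a minimal sketch of one alternative, assuming that State and Zip are always the last two words of the third line: read that line whole and split it from the right with SCAN. The helper variable _line and the variable lengths are assumptions chosen for illustration.

```sas
data lineinput;
  /* TRUNCOVER: a short line is not continued onto the next line */
  infile datalines truncover;
  length Lname Fname $20 Address $20 City $20 State $2 Zip $5 _line $40;
  /* Line 1: names; line 2: address; line 3: "city state zip" read whole */
  input Lname $ Fname $ / Address $1-20 / _line $40.;
  /* Assumption: the last two words are always state and zip */
  Zip   = scan(_line, -1, ' ');
  State = scan(_line, -2, ' ');
  /* Everything before " state zip" is the city, embedded blanks included */
  City  = substr(_line, 1, length(_line) - length(State) - length(Zip) - 2);
  drop _line;
  datalines;
LEE ATHNOS
1215 RAINTREE CIRCLE
PHOENIX AZ 85044
JOYCE BENEFIT
85 MAPLE AVENUE
MENLO PARK CA 94025
;
run;
```

If a third line could ever have fewer than three words, the SUBSTR length would drop to zero or below, so you would want to guard that case before relying on this.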


SAS programming problem using the delimiter *

Using the data in the products.txt file, I'm trying to create a data set with the delimiter *.
products.txt data:
hartie 2 birotica
creione 10 birotica
apa 6 alimente
ceai 8 alimente
tricou 100 haine
I tried to use the delimiter *:
data produse;
infile '/home/u47505185/produse.txt' dlm='*';
input Nume $ Pret Categorie $;
run;
The DSD option is changing the spaces into commas. I want the command for changing the spaces into *.
The DSD option, in addition to the other things it does, changes the DEFAULT delimiter from space to comma. But you can override the default delimiter to any list of characters you want by using the DLM= (also known as DELIMITER=) option, whether or not you are using the DSD option.
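For example, if the file really did use * between fields, a sketch like the following should read it, combining DSD with DLM='*'. The star-delimited sample lines are hypothetical (the question's data with the spaces replaced by *), and the informat widths are assumptions.

```sas
data produse;
  /* DSD alone would switch the default delimiter to a comma; DLM='*' overrides it */
  infile datalines dsd dlm='*' truncover;
  input Nume :$20. Pret Categorie :$20.;
  datalines;
hartie*2*birotica
creione*10*birotica
apa*6*alimente
ceai*8*alimente
tricou*100*haine
;
run;
```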
From the comments it sounds like you just want to do text manipulation. Just change the spaces to stars. Make sure to remove any trailing spaces (unless you want those to also become stars).
data _null_;
infile '/home/u47505185/produse.txt';
input;
file '/home/u47505185/produse_star.txt';
_infile_=translate(trimn(_infile_),'*',' ');
put _infile_;
run;
To display missing numeric values as an asterisk (*) in output or data viewers, use this setting:
OPTIONS MISSING='*';
The INFILE DLM= option is for specifying what character(s) in the data file are to be used to separate the variables being INPUT.
DLM does NOT specify a replacement value for missing values.
You told SAS to use * as a field separator.
So what is happening? The log will tell you. Because the * delimiter never appears in the data, SAS treats each whole line as a single field. Nume takes the first line, truncated to 8 characters (the default length); Pret, a numeric variable, then has to read the next line, which is not a valid number, so it is set to missing; and Categorie takes the line after that. In output or a data viewer, the missing Pret value appears as a period (.).
data want;
infile datalines dlm='*'; * '/home/u47505185/produse.txt' dlm='*';
input Nume $ Pret Categorie $;
datalines;
hartie 2 birotica
creione 10 birotica
apa 6 alimente
ceai 8 alimente
tricou 100 haine
;
Log
25 data want;
26 infile datalines dlm='*'; * '/home/u47505185/produse.txt' dlm='*';
27 input Nume $ Pret Categorie $;
28 datalines;
NOTE: Invalid data for Pret in line 30 1-80.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+--
31 apa 6 alimente
NOTE: Invalid data errors for file CARDS occurred outside the printed range.
NOTE: Increase available buffer lines with the INFILE n= option.
Nume=hartie 2 Pret=. Categorie=apa 6 al _ERROR_=1 _N_=1
NOTE: Invalid data for Pret in line 33 1-80.
NOTE: LOST CARD.
34 ;
NOTE: Invalid data errors for file CARDS occurred outside the printed range.
NOTE: Increase available buffer lines with the INFILE n= option.
Nume=ceai 8 a Pret=. Categorie= _ERROR_=1 _N_=2
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
NOTE: The data set WORK.WANT has 1 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
By default, what is shown to you when a value is missing?
Numeric variables: a period (.), or the character set with the session option MISSING='<one character>'.
Character variables: a blank. The missing value for a character variable is a single space.

Merging time series with different number of observations where variables have the same name (SAS)

I have a bunch of time series data sets (SAS files) which I would like to merge/combine into a larger table (I am fairly new to SAS).
Filenames:
cq_ts_SYMBOL, where SYMBOL is the respective symbol for each file
with the following structure:
cq_ts_AAA.sas7bdat: file1
SYMBOL DATE TIME BID ASK MID
AAA 20100101 9:30:00 10.375 10.4 .
AAA 20100101 9:31:00 10.38 10.4 .
.
.
AAA 20150101 15:59:00 15 15.1 .
cq_ts_BBB.sas7bdat: file2
SYMBOL DATE TIME BID ASK MID
BBB 20120101 9:30:00 12.375 12.4 .
BBB 20120102 9:31:00 12.38 12.4 .
.
.
BBB 20170101 15:59:00 20 20.1 .
Key characteristics:
- They have the same variable name
- They have different number of observations
- They are all saved in the same folder
So what I want to do is:
- Create 3 tables (a BID table, an ASK table and a MID table) with the following structure, e.g. for the BID table, cq_ts_bid.sas7bdat:
DATE TIME AAA BBB ...
20100101 9:30:00 10.375 .
20100102 9:31:00 10.38 .
.
.
20120101 9:30:00 9.375 12.375
20120102 9:31:00 9.38 12.38
.
.
20150101 15:59:00 15 17
.
.
20170101 15:59:00 . 20
It is not all that difficult to do this for two stock time series; however, I was wondering whether it is possible to do the following:
From data set cq_ts_AAA, take DATE, TIME and BID, and rename BID to AAA (taking the name either from the value of SYMBOL, if that makes sense, or from the filename).
Do the same for cq_ts_BBB.
In fact, loop through the folder to get the number of files and the filenames (this part I more or less have, see below).
Merge cq_ts_AAA and cq_ts_BBB into a table with DATE, TIME, AAA (the former bid price of AAA) and BBB (the former bid price of BBB), for all the files in the folder.
Do this for BID, then for ASK, and finally for MID. (Actually, I couldn't get the midpoint variable from bid and ask: mid = (bid + ask) / 2; just gives me "." in the earlier data steps that create the files.)
I think I need a macro that first gets each single file, renames it (when should this step take place?) and then merges them together, like a double loop.
Here is the renaming and merging part:
data ALDW_short (rename=(iprice = ALDW));
set output.cq_ts_aldw;
retain date time ALJ;
run;
data ALJ_short (rename= (iprice = ALJ));
set output.cq_ts_alj;
retain date time datetime ALJ;
run;
data ALDW_ALJ_merged (keep= date itime ALDW ALJ);
merge ALDW_short ALJ_short;
by datetime;
run;
This is the part to loop through the folder and get a list of names:
proc contents data = output._all_ out = outputcont(keep = memname) noprint;
run;
proc sort data = outputcont nodupkey;
by memname;
run;
data _null_;
set outputcont end = last;
by memname;
i+1;
call symputx('name'||trim(left(put(i,8.))),memname);
if last then call symputx('count',i);
run;
Would it make sense to extract the symbol from the filename (and how, given that the symbols have different lengths), or just to take it from the variable SYMBOL (and then how can I get that one value to rename my column)?
I also have difficulty changing the order of the columns; I tried RETAIN and FORMAT.
Looks like you could do this easily with PROC TRANSPOSE. Combine your datasets into a single dataset.
data all ;
set output.cq_ts_: ; /* interleave; each data set must be sorted by date time */
by date time;
run;
Then use PROC TRANSPOSE for each of your source variables/target tables.
proc transpose data=all out=bid ;
by date time ;
id symbol;
var bid;
run;
Given your example data, a formula for MID of
mid = (bid + ask)/2 ;
should work. Most likely, if you got all missing values, you put the assignment statement before the SET or INPUT statement. In other words, you were trying to calculate using values that had not been read in yet.
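Putting both pieces together, here is a minimal sketch that computes MID after the SET statement and then loops PROC TRANSPOSE over BID, ASK and MID with a small macro. The two sample data sets in WORK, the macro name %split and the output names bid_table, ask_table and mid_table are hypothetical; with your files you would SET output.cq_ts_: instead. The sample stores DATE as a SAS date, which is also an assumption about your data.

```sas
/* Hypothetical sample data standing in for output.cq_ts_AAA and output.cq_ts_BBB */
data cq_ts_aaa;
  input symbol $ date :yymmdd8. time :time8. bid ask;
  format date yymmddn8. time time8.;
  datalines;
AAA 20100101 9:30:00 10.375 10.4
AAA 20100101 9:31:00 10.38 10.4
;
run;

data cq_ts_bbb;
  input symbol $ date :yymmdd8. time :time8. bid ask;
  format date yymmddn8. time time8.;
  datalines;
BBB 20100101 9:31:00 12.375 12.4
BBB 20100102 9:30:00 12.38 12.4
;
run;

data all;
  set cq_ts_: ;            /* every WORK data set whose name starts with cq_ts_ */
  by date time;            /* interleave: each input must be sorted by date time */
  mid = (bid + ask) / 2;   /* after SET, so BID and ASK already hold values */
run;

%macro split(vars);
  %local i var;
  %do i = 1 %to %sysfunc(countw(&vars));
    %let var = %scan(&vars, &i);
    /* One row per date/time, one column per SYMBOL value */
    proc transpose data=all out=&var._table(drop=_name_);
      by date time;
      id symbol;
      var &var;
    run;
  %end;
%mend split;

%split(bid ask mid)
```

Because ID SYMBOL turns each symbol value into a column name, there is no need to rename anything per file or to build macro variables from the filenames.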

SAS INPUT DATA WITH SPECIAL CHARACTERS

I'm trying to import a comma-delimited .dat file into SAS University Edition. However, one variable contains special characters (e.g. French accents). Most are replaced with �, and some observations have further problems.
Example of a problem:
An original observation in the data looks like this:
Crème Brûlée,105,280
Running the following command:
DATA BenAndJerrys;
INFILE '/folders/myfolders/HW3/BenAndJerrys.dat' DLM = ',' DSD MISSOVER;
INPUT flavor_name :$48. portion_size calories;
RUN;
It has this problem:
flavor_name=Cr�me Br�l�e,105 portion_size=280 calories=
As you can see, the value 105, which belongs to portion_size, is merged into the value of flavor_name, and the value 280 of calories is assigned to portion_size.
How can I solve this problem and get SAS to import the data with the special characters?
Try telling SAS what encoding to use when reading the file.
I copied and saved your sample line into a text file using Windows NOTEPAD editor.
%let path=C:\Downloads ;
data _null_;
infile "&path\test.txt" dsd encoding=wlatin1;
length x1-x3 $50 ;
input x1-x3;
put (_all_) (=);
run;
Result in the log.
x1=Crème Brûlée x2=105 x3=280
NOTE: 1 record was read from the infile "C:\Downloads\test.txt".
The minimum record length was 20.
The maximum record length was 20.
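Applied to the code in the question, a sketch might look like the following. It first writes the sample line to a temporary file in Windows Latin-1 (wlatin1), which is only an assumption about how the original .dat file was saved, and then reads it back with ENCODING=wlatin1 and the question's INPUT statement. The fileref bj is hypothetical; for the real file you would keep the INFILE path from the question and just add ENCODING=wlatin1.

```sas
/* Hypothetical temporary file standing in for BenAndJerrys.dat */
filename bj temp;

/* Write the sample line in Windows Latin-1 (assumed encoding of the real file) */
data _null_;
  file bj encoding=wlatin1;
  put 'Crème Brûlée,105,280';
run;

/* Same INPUT as the question, plus ENCODING= on the INFILE statement */
data BenAndJerrys;
  infile bj dlm=',' dsd missover encoding=wlatin1;
  input flavor_name :$48. portion_size calories;
run;
```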

SAS Programming - DATA Step - Text to SAS Data Set Parsing Issue

I am trying to create a SAS Data Set from a text file. The text file shows data in a format exactly like this:
-HEADER HEADER HEADER
-HEADER HEADER HEADER
April SpringRace Male
$$$$$$$$$$$$$$$$$$$$
Name Age State /these are titles in the text file/
$$$$$$$$$$$$$$$$$$$$
John Smith 30 CA
Mark Doe 49 TX
May SpringRace2 Female
$$$$$$$$$$$$$$
Name Age State
$$$$$$$$$$$$$$
Betty White 50 ME
Jane Smith 37 NY
The issues I am having in the DATA step are: bypassing the varying header rows, reading the "event" data that comes before the titles into variables, then skipping over the titles and assigning variables for the actual people. The same layout repeats throughout the huge text file. Can anyone point me in the right direction?
I have been experimenting:
Data work.test;
infile c:\tester dlm=' , $' missover;
input / / / Month $15. EventName $15. Gender $6.
(This is where I get stuck as I do not know how to skip the "Name Age State" in the text file and just assign variables to "John Smith 30 CA" etc.)
run;
I also think there must be a better way to get past the headers, as there is no guarantee that they will always be only 2 rows long.
Thanks
I think that using the @'my_char_string' column pointer in an INPUT statement would help you, if the titles that separate the data values always repeat and you know what they are. For example:
INFILE mydatafile FLOWOVER FIRSTOBS=2;
INPUT month $ race $ sex $ @'State' first_name $ last_name $ address $;
The FIRSTOBS=2 option in the INFILE statement skips the HEADER HEADER... row, and the FLOWOVER option tells SAS to keep looking for data on the next line, in particular for @'State'. You may need to specify additional options and formatting, depending on your input file format, delimiters, etc.
Per your edits, you could use the month value to determine that you are reading the start of an event, and then, using a trailing @, RETAIN and some conditional logic, read in your participants on separate lines and retain the event info across the participants, like this (just add all the remaining month names in the first IF clause):
data test1;
length test $20 month $20 event $20 gender $20 firstname $20 lastname $20 state $2;
infile "test1.txt" DLM=' $' FIRSTOBS=5;
retain month event gender; * Keep these values from last readin;
input test $ @; /* Read in the first word in the data line being
read into test var, and stay on this line for
now (with @)*/
if strip(test) in('April', 'May') then do; /* If test var contains month,
then read in all of the variables,
and skip the name/age/state titles row*/
input @1 month $ event $ gender $ @'State' firstname $ lastname $ age state $ ;
end;
else do; /* Otherwise, the data line being read in should contain
only names, age and state, so read in those values only.
The month, event and gender values will be kept the same
by the retain statement above.*/
input @1 firstname $ lastname $ age state $ ;
end;
drop test; /* Comment out this drop statement to see whats in test var*/
run;
This code works with varying numbers of participants per event, but the month must not be missing for it to work.
Helpful tip: to see what is in the current data line being read by SAS, try adding
put _INFILE_;
after an INPUT statement (for example, right after input test $ @;). It will print the data lines to your log the way SAS sees them.
Hopefully you solved your problem a long time ago, but here is another suggestion.
Using the trailing @ on the INPUT statement lets you apply a second INPUT statement to the same line, and would be the preferred solution. This solution does not really use the trailing @, but I left it in for you to consider in the future.
DATA test;
INFILE 'stacktest.txt' lrecl=200 missover;
length n1 n2 n3 n4 $20 ;
input @1 c1 $1. @1 c2 $2. @1 c5 $5. @1 lne & $75. @ ;
keep month event gender fname lname age state;
if c1 = ' ' then return;
if c1 = '-' then return;
if c1 = '$' then return;
if c5 = 'Name' then return;
n1 = scan(lne, 1);
n2 = scan(lne, 2);
n3 = scan(lne, 3);
n4 = scan(lne, -1);
if ( n3 eq 'Male' or n3 eq 'Female') then do;
month = n1 ;
event = n2;
gender = n3 ;
return;
end;
else do ;
* input fname $ lname $ age state $ ;
fname = n1 ;
lname = n2 ;
age = n3 ;
state = n4 ;
output;
end;
retain month event gender;
run;

SAS: sum all values except one

I'm working in SAS and I'm trying to sum a variable over all observations, leaving out one observation each time.
For example, if I have:
Count Name Grade
1 Sam 90
2 Adam 100
3 John 80
4 Max 60
5 Andrea 70
I want to output a value for Sam that is the sum of all grades but his own, and a value for Adam that is a sum of all grades but his own - etc.
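Put as a formula, the value wanted for each student i is the total minus that student's own grade x_i; with the five grades above the arithmetic works out as:

```latex
S_{-i} = \sum_{j=1}^{n} x_j - x_i, \qquad \sum_{j=1}^{5} x_j = 90 + 100 + 80 + 60 + 70 = 400

S_{-\text{Sam}} = 400 - 90 = 310, \quad S_{-\text{Adam}} = 300, \quad S_{-\text{John}} = 320, \quad S_{-\text{Max}} = 340, \quad S_{-\text{Andrea}} = 330
```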
Any ideas? Thanks!
You can do it in a single PROC SQL step instead, using the keyword CALCULATED:
data have;
input Count Name $ Grade;
datalines;
1 Sam 90
2 Adam 100
3 John 80
4 Max 60
5 Andrea 70
;;;;
run;
proc sql;
create table want as
select *, sum(grade) as all_grades, calculated all_grades-grade as minus_grade
from have;
quit;
Here's a nearly one-pass solution (it will be about the same speed as a one-pass solution if the data set fits in the read buffer). I actually calculate the mean here instead of just the sum, as I feel that's a more interesting result (and the sum is of course the mean without the division).
data have;
input Count Name $ Grade;
datalines;
1 Sam 90
2 Adam 100
3 John 80
4 Max 60
5 Andrea 70
;;;;
run;
data want;
retain grademean;
if _n_=1 then do;
do _n_ = 1 to nobs_have;
set have(keep=grade) point=_n_ nobs=nobs_have;
gradesum+grade;
end;
grademean=gradesum/nobs_have;
end;
set have;
grade_noti = ((grademean*nobs_have)-grade)/(nobs_have-1);
run;
Calculate the mean, then for each record subtract the portion that record contributed to the mean. This is a super useful technique for stat testing when you want to compare a record to the rest of the population, and you have a complicated class combination where you'd rather do the mean first. In those cases you use PROC MEANS first and then merge it on, then do this subtraction.
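A minimal sketch of that PROC MEANS route, assuming no CLASS variables, so the summary has a single row that can be read once with IF _N_ = 1 THEN SET; with class variables you would instead MERGE the PROC MEANS output back BY those variables. The names stats, grade_sum, grade_n, sum_others and mean_others are hypothetical.

```sas
data have;
  input Count Name $ Grade;
  datalines;
1 Sam 90
2 Adam 100
3 John 80
4 Max 60
5 Andrea 70
;
run;

/* One-row summary: total and count of GRADE */
proc means data=have noprint;
  var grade;
  output out=stats(drop=_type_ _freq_) sum=grade_sum n=grade_n;
run;

data want_means;
  /* Read the summary row once; values read by SET are retained for every record */
  if _n_ = 1 then set stats;
  set have;
  sum_others  = grade_sum - grade;           /* leave-one-out sum  */
  mean_others = sum_others / (grade_n - 1);  /* leave-one-out mean */
run;
```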
proc sql;
create table temp as select
sum(grade) as all_grades
from orig_data;
quit;
proc sql;
create table temp2 as select
a.count,
a.name,
a.grade,
(b.all_grades-a.grade) as sum_other_grades
from orig_data a
left join temp b on 1=1; /* temp has a single row: attach it to every record */
quit;
Haven't tested it, but the above should work. It creates a new data set temp that holds the sum of all grades and joins it back to create a new table with the sum of all grades less the current student's grade as sum_other_grades.
This solution takes each observation of your starting data set and then loops through the same data set, summing the grade values of any records with a different name. So, beginning with 'Sam', we only add to oth_g when we find names that are NOT 'Sam':
data want;
set have;
oth_g=0;
do i=1 to n;
set have
(keep=name grade rename=(name=name_loop grade=grade_loop))
nobs=n point=i;
if name^=name_loop then oth_g+grade_loop;
end;
drop grade_loop name_loop i n;
run;
This is a slight modification of the answer @Reese provided above.
proc sql;
create table want as
select *,
(select sum(grade) from have) as all_grades,
calculated all_grades - grade as minus_grade
from have;
quit;
I've rearranged it this way to avoid the below message being printed to the log:
NOTE: The query requires remerging summary statistics back with the original data.
If you see the above message, it almost always means that you have made a mistake. If you actually did mean to remerge summary stats back with the original data, you should do so explicitly (as I have done above by refactoring @Reese's query).
Personally I think the refactored version is also easier to understand.
