Importing data from excel to SAS - database

This was an interview question that I couldn't answer; please help me solve it.
In an Excel file, a variable name contains a space (e.g. Shop Name). When we bring the Excel data into SAS, how can we keep the name as it is, given that a space is normally not allowed in a SAS variable name?
Code:
proc import datafile='/home/roshnigupta16020/test (2).xlsx' out=testexcel dbms=xlsx replace; getnames=yes; run;

Your PROC IMPORT syntax is good.
proc import datafile='/home/roshnigupta16020/test (2).xlsx'
out=testexcel dbms=xlsx replace
;
getnames=yes;
run;
Depending on the setting of the VALIDVARNAME option, PROC IMPORT will create different names for the variables.
With VALIDVARNAME=ANY the names will include the spaces, which means that to use such a name in your SAS code you will need a name literal, like 'Column 1'n.
With other settings, like VALIDVARNAME=V7, PROC IMPORT will replace invalid characters, like spaces, with underscores. The name will then not exactly match the column header in the spreadsheet, but it will be something like Column_1, which is easier to include in your SAS code.
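To make the renaming concrete, here is a rough Python sketch of the kind of substitution described above. It is only an approximation for illustration: the function name v7_like_name is made up, and SAS's real rules (for example how it resolves duplicate names) are not reproduced here.

```python
import re

def v7_like_name(header: str) -> str:
    """Approximate the V7-style renaming described above (not SAS's exact rules)."""
    # Replace every character that is not a letter, digit, or underscore.
    name = re.sub(r"[^A-Za-z0-9_]", "_", header)
    # A SAS name cannot start with a digit, so prefix an underscore if needed.
    if name and name[0].isdigit():
        name = "_" + name
    # V7 names are limited to 32 characters.
    return name[:32]

print(v7_like_name("Shop Name"))  # Shop_Name
```

So a header such as Shop Name would be referenced as Shop_Name under V7, or as 'Shop Name'n under VALIDVARNAME=ANY.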

Related

Macro that loads multiple datasets that update and change names?

I'm working with a database in SAS that is updated every so often. I want the macro to automatically load the most recent dataset for a given year. The datasets cover the years 2015-2018, and each year has its own version number, which is stated in the name of the dataset, e.g. 2015_version9. With my current code you need to update the macro manually every time a dataset changes its version and name.
You can scan through each library and find the max version number, then save those to a single macro variable string that you can supply to a set statement. Here are the assumptions of this solution:
Your libraries are named lib_2015, lib_2016, etc. and follow 8-char libname requirements
Your libraries are static for years 2015-2018
Your datasets are named _version1, _version2, etc.
Here's how we'll do it.
%let libraries = "LIB_2015", "LIB_2016", "LIB_2017", "LIB_2018";
proc sql noprint;
select cats(libname, '.', memname)
, input(compress(memname,,'KD'), 8.) as version
into :data separated by ' '
from dictionary.members
where upcase(libname) IN(&libraries.)
AND upcase(memname) LIKE "^_VERSION%" escape '^'
group by libname
having version = max(version)
;
quit;
data want;
set &data. indsname=name;
dsn = name;
run;
This code does the following:
Gets all dataset names from each library that start with _VERSION. The ^ in the like clause is an escape character that we defined so that we can match _ literally.
Removes all non-digits from the dataset name and converts the result to a number, version. The KD modifiers in the compress() function say to keep only digits from the string.
Keeps only the name in each library where version is the highest value.
Saves all the dataset names to a single macro variable, &data.
&data will store a string of all the relevant datasets you want with the highest version number for each library. For example:
%put &data.;
LIB_2015._VERSION9 LIB_2016._VERSION19 LIB_2017._VERSION12 LIB_2018._VERSION8
The indsname option on the set statement stores the full name of the dataset that each observation was read from. That variable is temporary and is not written to the output, so we copy it to a variable named dsn. This shows where each observation comes from, so you can split them out to individual datasets as needed.
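The selection logic of the query (strip the non-digits, compare versions as numbers, keep the maximum per library) can also be sketched outside SAS. The Python below is only an illustration under the same naming assumptions; latest_per_library and the sample member list are hypothetical.

```python
import re

def latest_per_library(members):
    """members: iterable of (libname, memname) pairs, as in dictionary.members."""
    best = {}  # libname -> (version, "LIB.MEMBER")
    for libname, memname in members:
        if not memname.upper().startswith("_VERSION"):
            continue  # same filter as the LIKE clause
        digits = re.sub(r"\D", "", memname)  # keep only digits, like compress(..., 'KD')
        if not digits:
            continue
        version = int(digits)  # compare as a number, so 19 beats 2
        if libname not in best or version > best[libname][0]:
            best[libname] = (version, f"{libname}.{memname}")
    # Space-separated list, ready to hand to a SET statement
    return " ".join(best[lib][1] for lib in sorted(best))

members = [
    ("LIB_2015", "_VERSION8"), ("LIB_2015", "_VERSION9"),
    ("LIB_2016", "_VERSION2"), ("LIB_2016", "_VERSION19"),
    ("LIB_2016", "OTHER"),
]
print(latest_per_library(members))  # LIB_2015._VERSION9 LIB_2016._VERSION19
```

The conversion to a number matters: compared as text, _VERSION2 would sort after _VERSION19.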

Retaining Header name while importing flat file into sql

I am trying to import a flat file into SQL. My headers look like this in Notepad:
SCSItem.[Item],SCSItem.[PhaseOutItemType]
But when I import this into SQL using "Import Data", it removes the period and the brackets. This is what it looks like after the import
Is there a way to retain the header info?
Both periods . and square brackets [] are reserved syntax in SQL Server. If you want the field name to be:
SCSItem.[Item]
Then you need to use the other, ANSI Standard, identifier quote, which is ".
For example:
CREATE TABLE has_brackets
("SCSItem.[Item]" nvarchar(100)
,"SCSItem.[PhaseOutItemType]" nvarchar(100)
);
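As a quick way to see ANSI double-quote identifier quoting in action without a SQL Server instance, the sketch below uses Python's built-in sqlite3 module. SQLite is only a stand-in here, and the inserted values are made up; the point is that the period and brackets survive in the column names when the identifier is double-quoted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Double quotes make the period and brackets part of the column name.
conn.execute(
    'CREATE TABLE has_brackets ('
    '"SCSItem.[Item]" TEXT, '
    '"SCSItem.[PhaseOutItemType]" TEXT)'
)
conn.execute(
    'INSERT INTO has_brackets ("SCSItem.[Item]", "SCSItem.[PhaseOutItemType]") '
    "VALUES ('A100', 'None')"  # sample values, invented for the demo
)

cursor = conn.execute("SELECT * FROM has_brackets")
column_names = [d[0] for d in cursor.description]
rows = cursor.fetchall()
print(column_names)
print(rows)
```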

Importing Excel XLS file into SSIS - converting header row and adding that to underlying records

I have an Excel XLS file which we would like to import to a database using SSIS.
The format of the file is as below
This is the Input File
I would like to convert it to insert into a SQL table in the following format
SQL table layout
Any ideas on the best way to achieve this?
I have a second question on the way to split the Investor Name/Address cell into multiple columns, but will put that in a separate question.
Thanks in Advance
Steve
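The only way to do a transformation as complex as this is with a script.
I'm on my phone right now, so here is what I suggest in outline:
Import the Excel rows from the range A5:G.
Use a conditional split to drop rows with null accounts.
It seems that all investor account numbers are numeric, which is what the script below relies on.
Add a script component as a transformation.
On the output, add the 3 columns you want.
Outside the row method, add 3 variables to meet your requirement (i.e. string adv; et al.).
Inside the row method:
int parsed;
if (!int.TryParse(Row.investorid, out parsed))
{
    // Not a number, so this is a header row: remember its value
    adv = Row.investmentid;
    ...
}
else
{
    // Detail row: write the remembered value to the new output column
    Row.advisor = adv;
    ...
}
That's it for the script.
Add a conditional split afterwards to ignore rows with null advisors.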
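The carry-forward idea in that script can be tried outside SSIS. The Python sketch below is an illustration only: the column names investorid, investmentid and advisor follow the pseudocode above, and the sample rows are invented.

```python
def is_int(text):
    """Rough equivalent of int.TryParse: True if text parses as an integer."""
    try:
        int(text)
        return True
    except (TypeError, ValueError):
        return False

def carry_forward(rows):
    adv = None  # remembered across rows, like the variable outside the row method
    out = []
    for row in rows:
        if not is_int(row["investorid"]):
            adv = row["investmentid"]  # header row: remember its value
        else:
            out.append({**row, "advisor": adv})  # detail row: stamp the value on
    return out

rows = [
    {"investorid": "Advisor", "investmentid": "Smith & Co"},
    {"investorid": "101", "investmentid": "A1"},
    {"investorid": "102", "investmentid": "A2"},
    {"investorid": "Advisor", "investmentid": "Jones"},
    {"investorid": "201", "investmentid": "B1"},
]
for r in carry_forward(rows):
    print(r)
```

Header rows are dropped from the output in this sketch, which plays the role of the final conditional split.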

Exporting SPSS variable labels

I am using Stata for data analysis but had to convert the dataset I am using from SPSS, which includes variable labels, by saving it as a .csv file. However, the variable labels were not exported to Stata in the process.
I have followed the advice in this question (In SPSS, is it possible to export a dataset file to .CSV with the value names instead of the value numbers?) but this one only refers to the value labels, not the variable labels.
How do I export the SPSS variable labels?
You can export the variable labels using the SPSS syntax command DISPLAY DICTIONARY. You can also find this in the menu: File -> Display Data File Information -> Working File. A table with the category labels of all variables appears in the output window.
You can export the contents of the output window into formats understood by other software, including html, txt, xls.
Then you can extract the labels from the exported file and re-format them for use in Stata. I would use txt export and a Python script to produce a Stata program.
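A minimal sketch of such a Python script follows. It assumes the exported text has already been reduced to one "name<TAB>label" pair per line, which is an assumption about the layout and not something the SPSS export guarantees; stata_label_commands is a hypothetical helper name. The output uses Stata's label variable command.

```python
def stata_label_commands(lines):
    """Turn 'name<TAB>label' lines into Stata 'label variable' commands."""
    commands = []
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # skip lines that are not name/label pairs
        name, label = line.split("\t", 1)
        name, label = name.strip(), label.strip()
        if not name or not label:
            continue  # nothing to label
        # The label is wrapped in double quotes, so swap any inner ones.
        label = label.replace('"', "'")
        commands.append(f'label variable {name} "{label}"')
    return commands

sample = ["age\tAge of respondent\n", "some header text\n", "q1\tSaid \"yes\"\n"]
for command in stata_label_commands(sample):
    print(command)
```

The printed lines can be saved to a .do file and run in Stata after the CSV has been imported.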
You cannot do it with csv. In SPSS, with Save As (instead of Export Data), you can save your dataset as dta, in Stata format. Just choose the most recent Stata format your SPSS knows. I think you then get both the value labels and the variable labels.
I realize this is an old question, but just in case someone else is looking for how to do this.
If you export your SPSS file to Excel, there is an option to save variable labels instead of variable names as the column headers. It's clunky, but you can:
1) Export to Excel once with variable names,
2) Export to Excel once with variable labels
3) Paste special -> transpose the two next to each other
You'll then have a crosswalk from variable name to variable label.

Showing specific columns loaded from a CSV file in SAS

I have a question to ask.
I'm dealing with a small CSV database on which I need to perform some calculations with SAS. I have exported an Excel file to CSV format and I want to load some columns into SAS to work with. The problem I have encountered is a column order mismatch after loading. Here is the data and the code:
cars6.txt
AMC,Concord,22,2930,4099
AMC,Concord,22,2930,4099
AMC,Pacer,17,3350,4749
AMC,Spirit,22,2640,3799
Buick,Century,20,3250,4816
Buick,Electra,15,4080,7827
code to output data:
DATA cars6;
INFILE "/folders/myfolders/hbv1/cars6.txt" delimiter=',';
INPUT make $ model $ mpg $ weight price;
RUN;
TITLE "cars6 data";
PROC PRINT DATA=cars6(OBS=5);
RUN;
But I want to show only the columns make, weight and price.
So how do I print selected columns?
And how do I do that if the file has named columns (that example differs from this one only by the column names in the first line)? I tried to refer to the columns by name, and it printed them but with bad data (SAS takes the column data in order and ignores the column names):
input make $ model $ price $;
Thank you.
If you are writing your own program to read a CSV file then you probably want to use the DSD and FIRSTOBS=2 options on the INFILE statement. This will treat missing values properly and skip the line with the variable names. You also probably want to add the TRUNCOVER option to properly handle lines that have only some of the columns. It is worth the extra work to properly define your variables by including LENGTH or ATTRIB statements. Otherwise SAS will have to guess whether you want numeric or character variables, and how long to make the character variables, from the way that you first reference them.
DATA cars6;
INFILE "/folders/myfolders/hbv1/cars6.txt" DSD DLM=',' FIRSTOBS=2 TRUNCOVER;
LENGTH make model $20 mpg weight price 8 ;
INPUT make model mpg weight price;
RUN;
But your program will need to know the order of the variables in the file. If your data files are inconsistent then you can try using PROC IMPORT to read the CSV file. It can take the names from the first row and make an educated guess at what the variable types are.
proc import datafile='/folders/myfolders/hbv1/cars6.txt' out=car6 replace dbms=dlm ;
delimiter=',';
getnames=yes;
run;
When using the data from the SAS dataset you have created you can use the SAS language to select the columns of interest. Syntax will depend on the procedure you are using. So for PROC PRINT use the VAR statement.
proc print data=car6 ;
var price make model;
run;
And for PROC FREQ use the TABLES statement.
proc freq data=car6;
tables make model;
run;
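The difference between reading by position and selecting by header name can also be seen in a small Python sketch using the standard csv module. This is an illustration outside SAS; select_columns is a hypothetical helper, and the sample assumes the file has a header line with these names.

```python
import csv
import io

def select_columns(csv_text, wanted):
    """Read CSV text with a header row and keep only the named columns."""
    reader = csv.DictReader(io.StringIO(csv_text))  # first line supplies the names
    return [{name: row[name] for name in wanted} for row in reader]

sample = (
    "make,model,mpg,weight,price\n"
    "AMC,Concord,22,2930,4099\n"
    "AMC,Pacer,17,3350,4749\n"
)
for row in select_columns(sample, ["make", "weight", "price"]):
    print(row)
```

Here the values are matched to names through the header line, which is what PROC IMPORT with getnames=yes does; a plain INPUT statement assigns values purely by position.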
Consider using proc import and selecting columns as needed in proc print. Proc import can handle comma-separated files saved as either .txt or .csv. Below is a demonstration for both file types:
%Let fpath = /folders/myfolders/hbv1;
** READING IN TXT;
proc import
datafile = "&fpath/cars6.txt"
out = Cars6
dbms = csv replace;
run;
** READING IN CSV;
proc import
datafile = "&fpath/cars6.csv"
out = Cars6
dbms = csv replace;
run;
title "cars6 data";
proc print data=cars6(obs=5);
var make model price;
run;
Alternatively, you can drop variables and re-order needed columns for report with retain:
data CarsReport;
retain make model price;
set Cars6;
keep make model price;
run;
title "cars6 data";
proc print data=CarsReport(obs=5);
run;
Try the VAR statement in PROC PRINT.
DATA cars6;
INFILE "/folders/myfolders/hbv1/cars6.txt" delimiter=',' firstobs=2;
INPUT make $ model $ mpg $ weight price;
RUN;
proc print data=cars6 noobs;
var make weight price;
run;
