Upsert into SQL Server from SAS

I've got several datasets which need to be upserted into a SQL server database from SAS (my environment uses SAS DI 4.9).
The default table loader transformation that comes packaged with SAS DI offers an Update/Insert load style, with options to match by SQL set, column, or index. None of these works for me, instead throwing the error
ERROR: CLI execute error: [SAS][ODBC SQL Server Wire Protocol driver][Microsoft SQL Server]A cursor with the name
'SQL_CUR608F0C44282B0000' does not exist.
This SAS note indicates that this issue may be related to the version of the DataDirect driver and that there are workarounds, but the workaround for the version of SAS running in my environment causes poor read performance (which isn't acceptable for my needs). The environment is administered by IT.
What I'd like to do is leverage SAS DI's custom transformation abilities to build something that works the way the Table Loader transformation should have for users with my setup. This would entail some SQL pass-through which uses an update + insert approach, but where the column and table names are programmatically determined from the inputs and outputs to the transformation, and the match columns are specified by the user as with the default transformation.
This requires some serious macro magic.
Here's what I've tried for just the update portion (with anonymized info in [ square brackets ]):
%let conn = %str([my libname]);
%let where_clause = &match_col0 = &match_col0;
%macro custom_upsert;
data _null_;
put 'proc sql;';
put 'connect to ODBC(&conn);';
put "execute(update &_OUTPUT";
%do i=1 %to &_OUTPUT_col_count;
put '&&_OUTPUT_col_&i_name = &&_OUTPUT_col_&i_name';
%end;
put 'from &_OUTPUT join &_INPUT on';
put 'where &where_clause';
put ') by ODBC;';
put 'quit;';
run;
%mend;
%custom_upsert;
But this is failing with errors about unbalanced quotation marks and the quoted string exceeding 262 characters.
How can I get this working as intended?
EDIT
Here is the SQL server code that I am ultimately trying to get at with my SAS code, with the major difference here being that the SQL code references two SQL server tables but in reality I'm trying to update from a SAS table:
begin
update trgt_tbl
set col1 = col1
, ...
,coln = coln
from trgt_tbl
join upd_tbl
on trgt_tbl.match_col = upd_tbl.match_col;
insert into trgt_tbl
select * from
(select
col1
, ...
,coln
from upd_tbl) as temp
where not exists
(select 1 from trgt_tbl
where match_col = temp.match_col);
end

The macro could generate the SQL code directly instead of writing the desired code to the log (which is what put does when no file statement is active). Alternatively, you could put the statements to a file and submit that file via %include. Because the put strings are single-quoted, the code written to the file still contains unresolved macro references (& and &&). Those macro variables must therefore exist in scope at the time the %include runs.
%macro myupsert;
filename myupsert 'c:\temp\passthrough-upsert.sas';
data _null_;
file myupsert;
…
/* same puts */
…
run;
%include myupsert;
filename myupsert;
%mend;
%myupsert;
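For the first option (generating the SQL directly), a minimal sketch of the update step might look like the following. It is only an illustration under stated assumptions: the macro name gen_update and its parameters (target, stage, cols, match) are hypothetical and are not SAS DI-generated names, conn is assumed to hold valid ODBC connection options, and the staging table is assumed to already exist in SQL Server, because explicit pass-through cannot see SAS datasets.

```sas
/* Hypothetical connection options; replace with your own. */
%let conn = datasrc=MY_DSN;

/* Sketch: build the UPDATE ... FROM statement inside the macro.
   cols is a space-delimited list of columns to update. */
%macro gen_update(target=, stage=, cols=, match=);
  %local i n col;
  %let n = %sysfunc(countw(&cols, %str( )));

  proc sql;
    connect to odbc (&conn);
    execute (
      update t
      set
      %do i = 1 %to &n;
        %let col = %scan(&cols, &i, %str( ));
        %if &i > 1 %then ,;
        t.&col = s.&col
      %end;
      from &target as t
      join &stage as s
        on t.&match = s.&match
    ) by odbc;
    disconnect from odbc;
  quit;
%mend gen_update;

%gen_update(target=trgt_tbl, stage=upd_tbl,
            cols=col1 col2 col3, match=match_col);
```

Because the %do loop runs while the macro executes, no put statements or quoted strings are involved, which avoids the unbalanced-quote problem altogether. Running with options mprint shows the generated statement in the log.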

Related

Converting space delimited string for use in SAS PROC SQL IN statement with SQL Server

I have a macro function which is defined as follows:
%MACRO Data_Load( Years );
LIBNAME CCDW_LIB
ODBC
CONNECTION=SHAREDREAD
COMPLETE="DRIVER=SQL Server Native Client 11.0;SERVER=&CCDW_Server_Name;Trusted_Connection=Yes;DATABASE=&CCDW_Data_DB;"
SCHEMA="&CCDW_Data_Schema"
PRESERVE_TAB_NAMES=YES
PRESERVE_COL_NAMES=YES
;
/* Server and database details obscured for obvious reasons */
PROC SQL NOPRINT;
CREATE TABLE WORK.TABLE1 AS
SELECT ID
, VAL1
FROM CCDW_LIB.TABLE1
WHERE YR IN ( &Years )
;
QUIT; RUN;
%MEND;
When I invoke this as %Data_Load( 2018 ) I get an error because YR is actually defined as a VARCHAR and not a NUMERIC. So I tried adding a call to SepList in the WHERE clause (WHERE YR IN ( %SepList( &Years, nest=Q ) )), but this gets a syntax error, even though the statement shown by MPRINT is a correctly formed SQL statement. If I put '2018' in a macro variable prior to the PROC SQL call and then use that variable, the SQL statement runs fine. In fact, I added the following just to see whether the two values were the same, and they were.
%LET Years_IN='2018';
%LET Years_IN1=%SepList( &Years, nest=Q );
%Log( "Years_IN = [&Years_IN]");
%IF &Years_IN1=&Years_IN %THEN %DO;
%Log("They Match");
%END;
%ELSE %DO;
%Log("They DONT Match");
%END;
I want to use SepList as the calling program may need more than one year. Any ideas what I am doing wrong? I am running on SAS 9.4 TS Level 1M5 on X64_10PRO if that matters.
Try adding the below custom function, cquote(). It converts a space-delimited list into an individually quoted, comma delimited list. For example, 2012 2013 2014 will be converted into '2012','2013','2014'.
It's a great function to keep in your custom function toolbox. You don't have to use proc fcmp, but it will prevent you from having a huge macro variable full of %sysfunc().
If you get an error that says something about the string being too long, this is a bug in 9.4M5 and a hotfix exists for it. You can safely ignore the error.
proc fcmp outlib=work.funcs.funcs;
function cquote(str $) $;
length result $32767;
result = cats("'",tranwrd(cats(compbl(str))," ", "','"),"'");
return (result);
endsub;
run;
options cmplib=work.funcs;
%let years = 2012 2013 2014;
%let yearcq = %sysfunc(cquote(&years.));
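As a usage sketch, assuming the cquote() definition and the %let statements above have already been run, and reusing the libref and table names from the question, the resulting macro variable can be dropped straight into the IN list:

```sas
/* Write the generated list to the log to check it before using it. */
%put NOTE: yearcq=&yearcq;

proc sql noprint;
  create table work.table1 as
  select ID
       , VAL1
  from CCDW_LIB.TABLE1
  where YR in ( &yearcq )
  ;
quit;
```

Inside the macro from the question, the same pattern would apply with &Years in place of &years.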
I'm assuming you're using Richard DeVenezia's excellent function-style utility macro %seplist:
https://www.devenezia.com/downloads/sas/macros/index.php?m=seplist.
Note that when you specify nest=Q it introduces some macro quoting.
Whenever the MPRINT log looks good but you still get an error, and there is macro quoting going on, try explicitly unquoting. (SAS should unquote automatically, but it doesn't always.)
So try:
WHERE YR IN ( %unquote(%SepList( &Years, nest=Q )) )
You could also change the last line of the macro definition to be:
%unquote(&emit)
so that it will unquote the value before it is returned.

Extracting SQL Server data into SAS with datetime filter

I'm pulling data into SAS from a SQL server database with a (SAS DI) extract transformation. The data in the table goes back several years and has a little over 16 million rows; I only need data for the last few years which should amount to roughly 2.6 million rows.
So my extract uses a datetime filter. The autogenerated proc sql looks as follows (after obfuscating the libref, table, and column names):
%let SYSLAST = %nrquote(LIBREF.SRC_TBL);
proc sql;
create table work.WFJVX4PU as
select
col1,
col2,
dt_col,
col3
from &SYSLAST
where dt_col >= &start_dt
;
quit;
The code works and returns the desired rows, but it takes far too long to run when compared with executing a similar query directly on the SQL server. When inspecting the statistics of the execution I discovered that the entire table was brought into SAS before the where clause was applied.
I remember reading that this will occur when attempting to use filters that apply functions to source data (a la datepart(dt_col) >= input("&start_date",date9.)) which is why I tried to pass the datetime used by the filter directly to the SQL engine. I also tried the "datepart" approach and got the same result.
Is there something else I should be doing to apply the filter server side before bringing the data into SAS? This isn't the experience I've had in the past when working with other database tables (e.g. Teradata, MySQL, Oracle, etc).
Additional Details:
The macro variable start_date is defined using the SAS DI prompt creation tool; the auto-generated code for this is
%let start_date_label = 730 days ago (July 02, 2017);
%let start_date_rel = D-730D;
%let start_date = 02Jul2017;
I then create the datetime macro variable start_dt by executing the following in the precode of the job:
%let start_dt = %sysfunc(dhms("&start_date"d,0,0,0));
Below is the libname statement (after obfuscation):
LIBNAME MYLIB SQLSVR CONNECTION=SHARED PRESERVE_TAB_NAMES=YES dbconinit="set ansi_warnings off;use MY_DB_NAME;set nocount on;" Datasrc=MY_DATA_SRC SCHEMA=dbo AUTHDOMAIN="my_sql_authentication" ;
If I modify the where clause to use a literal of the form '2019-06-25' then SAS throws the error
ERROR: Expression using greater than or equal (>=) has components that
are of different data types.
because the dt_col field is of type numeric (format datetime22.3) on the SAS instance of the SQL server table that is registered from the library. If I use a literal as follows:
dt_col >= '25JUN2019'D
then I do get the desired set of rows (despite comparison between a datetime field and a date literal), but the query still takes a long time to execute and the job statistics indicate SAS is still grabbing all 16 million rows to perform this task.
UPDATE
I'm having issues following Tom's advice below. If I execute the following code:
LIBNAME MYLIB SQLSVR CONNECTION=SHARED PRESERVE_TAB_NAMES=YES dbconinit="set ansi_warnings off;use MYDB;set nocount on;" Datasrc=MYSRC SCHEMA=dbo AUTHDOMAIN="my_sql_authentication" ;
/*---- Map the columns ----*/
proc datasets lib = work nolist nowarn memtype = (data view);
delete sql_psthru2;
quit;
proc sql;
create table work.sql_psthru2 as
select
col1,
col2,
dt_col,
col3
from MYLIB.MYTBL
where dt_col>= '25JUN2019'd
;
quit;
then I get data from the database, but if I execute
LIBNAME MYLIB SQLSVR CONNECTION=SHARED PRESERVE_TAB_NAMES=YES dbconinit="set ansi_warnings off;use MYDB;set nocount on;" Datasrc=MYSRC SCHEMA=dbo AUTHDOMAIN="my_sql_authentication" ;
proc sql;
connect using MYLIB;
create table work.sql_psthru as
select * from connection to MYLIB
(select
col1
,col2
,dt_col
from MYTBL
where dt_col >= '2019-06-25'
)
;
quit;
then I receive the error
ERROR: CLI error trying to establish connection: [DataDirect][ODBC
lib] Data source name not found and no default driver specified
immediately after the connect using MYLIB; line.
I've tried many variants of the explicit passthru which I've found all over the internet which I won't post here, but none worked.
An interesting side note is that I believe the reason the SAS DI statistics are indicating that all 16 million rows are returned is that the extract transformation auto-generates the following macro:
%macro etls_recordCheck;
%let etls_recCheckExist = %eval(%sysfunc(exist(MYLIB.MYTBL, DATA)) or
%sysfunc(exist(MYLIB.MYTBL, VIEW)));
%if (&etls_recCheckExist) %then
%do;
proc sql noprint;
select count(*) into :etls_recnt from MYLIB.MYTBL;
quit;
%end;
%mend etls_recordCheck;
%etls_recordCheck;
So the culprit in the long execution time is not that the full dataset is being returned to SAS (I removed the macro and the code still takes far too long to run).
You could try explicitly writing the code in the remote database. So if you already have a libref named LIBREF defined you can use that in the CONNECT statement in PROC SQL.
proc sql;
connect using libref ;
create table WFJVX4PU as
select * from connection to libref
(select
col1
,col2
,dt_col
,col3
from SRC_TBL
where dt_col >= &start_dt
)
;
quit;
Just make sure everything inside the () is valid syntax for that database system, including the value of the macro variable START_DT.
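One caveat with the query above: as defined in the question, &start_dt resolves to a SAS datetime value (a count of seconds since 1 January 1960), which SQL Server will not read as the intended date. A minimal sketch of one way to pass a quoted date literal instead is shown below. It assumes the start_date macro variable from the question (in date9 form); the name start_dt_sql is hypothetical.

```sas
%let start_date = 02Jul2017;

/* Build a single-quoted literal such as '20170702' for SQL Server.
   The yymmddn8. format writes the date without separators. */
data _null_;
  call symputx('start_dt_sql',
               cats("'", put("&start_date"d, yymmddn8.), "'"));
run;

proc sql;
  connect using libref;
  create table WFJVX4PU as
  select * from connection to libref
    (select
       col1
      ,col2
      ,dt_col
      ,col3
     from SRC_TBL
     where dt_col >= &start_dt_sql
    )
  ;
quit;
```

The unseparated YYYYMMDD form is used here because SQL Server interprets it the same way regardless of the session's date-format settings. The quotes are part of the macro variable's value, so the reference resolves normally inside the pass-through query.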

SAS query issue on external DBMS TABLE where Column Name has space

Through SAS/ACCESS, I can successfully run data steps querying external DBMS tables. E.g.,
Data OutTable;
Set ExternalDBMS.Table1;
Where Var1 ='abc';
Run;
However, when a column name contains a space, the query fails even when I use a name literal (''n).
One example as shown below:
Data OutTable;
Set ExternalDBMS.Table1;
Where 'Var 2'n ='abc';
Run;
ERROR: CLI open cursor error: [SAS][ODBC SQL Server Wire Protocol driver][Microsoft SQL Server]Incorrect syntax near the keyword 'Function'.
I also tried the SAS option validvarname=v7 to standardize the variable names that contain spaces, but it still caused the same error.
After setting the SAS option sastrace=',,,d' I found that SAS/ACCESS submitted a statement to SQL Server like this:
SELECT Var 1, .....
FROM schema1.Table1
WHERE (Var 1 ='abc' );
The code above causes an error on the SQL Server side because Var 1 is neither quoted nor bracketed.
One way to fix it is to use an explicit pass-through query. I'm just wondering if there are any other ways to solve this problem too.
Thanks in advance!
When using an explicit pass-through query, put a set of square brackets around the variable name. This is similar to how you'd write your code in SSMS.
SELECT [Var 1], ...
FROM schema1.Table1
WHERE ([Var 1] ='abc' );
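Wrapped in a complete PROC SQL step, the bracketed query might look like the sketch below. The data source name MY_DSN is a placeholder, not something taken from the question; OutTable, schema1.Table1 and [Var 1] follow the question's own example.

```sas
proc sql;
  /* MY_DSN is a hypothetical ODBC data source name. */
  connect to odbc (datasrc=MY_DSN);
  create table OutTable as
  select * from connection to odbc
    (
     SELECT [Var 1]
     FROM schema1.Table1
     WHERE ([Var 1] = 'abc')
    );
  disconnect from odbc;
quit;
```

Everything inside the inner parentheses is sent to SQL Server as written, so the bracketed name reaches the database unchanged.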

Run SQL statement from file to create data in SAS

I have very little experience in SAS. I do have experience in SQL.
I want to do the following:
- Use a SQL statement that is stored in a text file to import data into SAS.
What works is to copy and paste the SQL server query and run it as a pass-through query in SAS. I get the data (after a few minutes).
But I would like to be able to manage and develop the SQL script in SSMS, and store the script in a sql file. So I tried the following:
proc sql;
connect to ODBC("dsn=DatabaseOfInterest");
create table NewDataSet as select * from connection to odbc(
%include 'C:\sqlscript.sql';
);
quit ;
This does not work and creates the following error:
ERROR: CLI prepare error:
[Microsoft][ODBC SQL Server Driver][SQL Server]Incorrect syntax near '%'.
Is there a way to achieve this?
I don't know if there's a truly clean way to work around this. The issue is that the connect to SQL is passing %include to the SQL parser, which is of course incorrect compared to what you intend.
It will, however, correctly resolve macros and macro variables, so you can read your SQL command into a macro variable and use it that way. One way to do that is below.
filename tempfile temp; *imaginary file - this would be your SQL script;
data _null_; *creating a mocked up SQL script file;
file tempfile;
put "select * from country";
run;
data _null_; *reading the SQL script into a variable, hopefully under 32767?;
infile tempfile recfm=f lrecl=32767 pad;
input #1 sqlcode $32767.;
call symputx('sqlcode',sqlcode); *putting it into a macro variable;
run;
proc sql; *using it;
connect to oledb(init_string=&dev_string);
select * from connection to oledb(
&sqlcode.
);
quit;
The file containing your SQL code is C:\sqlscript.sql. I'll assume it looks something like this:
select * from mytable;
Edit the file so that it now looks like this...
%macro sqlscript;
select * from mytable;
%mend;
... and then rename the file extension to C:\sqlscript.sas.
Finally, change your proc sql code to look like this:
options sasautos = ("c:\", sasautos);
proc sql;
connect to ODBC("dsn=DatabaseOfInterest");
create table NewDataSet as select * from connection to odbc
(
%sqlscript;
);
quit;
Explanation: Although the %include statement you tried to use starts with a % sign and looks like macro code, it is a SAS statement and can't be substituted at an arbitrary point in the code. It's really meant to be issued outside of PROC statements and data steps (it probably shouldn't even have a % sign in front of it, but unfortunately that's how SAS designed it...). So that's why it won't work.
SAS provides the ability to search for macro functions outside of the current program being run. If you call a macro function that isn't defined in your current SAS program (in this case %sqlscript), it's going to go look for it in the list of pathnames specified in the SASAUTOS option. If it finds a file in one of the SASAUTOS pathnames that exactly matches the macro it's searching for, and if the contents of that file contain a definition for the macro, SAS will compile and run that macro. In the above example, the macro simply substitutes in the SQL code contained within it.
In the options sasautos= statement, we are simply prepending the c:\ path to the existing list of pathnames currently in SASAUTOS. SAS will search the pathnames in order, and I'm assuming we want our custom macros to override any existing macros if there happens to be a conflict. You only need to specify options sasautos= once per SAS session, so don't copy/paste it before every proc sql statement.
Documentation for SASAUTOS . These are also known as autocall macros so that should turn up some useful hits in google too.
Also, obviously I don't recommend storing code in c:\, so adjust as necessary. A note to non-Windows users: file names are case sensitive there, and the autocall facility expects the file name in lowercase (sqlscript.sas), so be consistent!
Based on feedback from my previous answer I've provided an alternate approach below that should better address your exact needs.
The code below shows how the final program will 'work' once it is all combined together. We are going to take this code and split it into different files as indicated by the comments:
%macro myQuery; /* FILE 1 - header.sas */
select * from myTable; /* FILE 2 - query.sql */
%mend; /* FILE 3 - footer.sas */
/* BEGIN FILE 4 - main.sas */
proc sql;
connect to ODBC("dsn=DatabaseOfInterest");
create table NewDataSet as
select *
from connection to odbc
(
%myQuery;
);
quit ;
/* END FILE 4 */
FILE1 - "header.sas" will look like:
%macro myQuery;
FILE2 - "query.sql" will look like:
select * from myTable;
FILE3 - "footer.sas" will look like:
%mend;
FILE4 will become:
%include "c:\header.sas"
"c:\query.sql"
"c:\footer.sas"
;
proc sql;
connect to ODBC("dsn=DatabaseOfInterest");
create table NewDataSet as
select *
from connection to odbc
(
%myQuery;
);
quit ;
You can see that we are defining the macro using the include statements. The query, which is the body of the macro, will be kept separate in its own .sql file. This should allow you to continue to edit/submit your queries via both SAS and your favorite SQL editor. The header and footer files can be re-used if you have multiple query files.

How to pass macro variable to PROC SQL on IN statement in WHERE clause on MS SQL Server

I have a table in MS SQL Server that looks like:
ID, Code
01, A
02, A
03, B
04, C
...
and is defined in SAS as
LIBNAME MSSQLDB ODBC
CONNECTION=SHAREDREAD
COMPLETE='Description=OIPE DW (Dev);DRIVER=SQL Server Native Client 11.0;SERVER=Amazon;Trusted_Connection=Yes;DATABASE=OIPEDW_Dev;'
SCHEMA='dbo'
PRESERVE_TAB_NAMES=YES
PRESERVE_COL_NAMES=YES;
I have a SAS dataset that has records of the same format as MSSQLDB (ID and Code variables) but is just a subset of the full database.
I would like to do the following:
PROC SQL NOPRINT;
/* If SASDS contains just codes A and B, CodeVar=A B
SELECT DISCTINCT CODE INTO :CodeVar SEPARATED BY ' ' FROM SASDS;
QUIT;
/* seplist is a macro that wraps each code in a quote */
%LET CodeInVar=%seplist( &CodeVar, nest=%STR(") );
PROC SQL;
DELETE * FROM MSSQLDB WHERE CODE IN (&CodeInVar);
/* Should execute DELETE * FROM MSSQL WHERE CODE IN ('A','B');
QUIT;
The problem is this generates a syntax error on the values in the &CodeInVar macro variable.
Any idea how to pass the macro variable value to SQL Server in the IN statement?
I think you have a few problems here; hopefully some of them are just transcription errors.
First off, this will not do anything:
PROC SQL;
DELETE * FROM MSSQLDB WHERE CODE IN (&CodeInVar);
/* Should execute DELETE * FROM MSSQL WHERE CODE IN ('A','B');
QUIT;
MSSQLDB is your libname, not the table; you need to reference the table as MSSQLDB.tablename here. Perhaps that's just a copying error.
Fundamentally there's nothing explicitly wrong with what you've typed. I would suggest first identifying if there are any problems with your macro variable. Put a %put statement in there:
%put &codeinvar.;
See what that outputs. Is it what you wanted? If not, then fix that part (the macro, presumably).
I would say that there are a lot of better ways to do this. First off, you don't need to add a macro to add commas or quotes or anything.
PROC SQL NOPRINT;
/* If SASDS contains just codes A and B, CodeVar=A B */
SELECT DISTINCT cats("'",CODE,"'") INTO :CodeVar SEPARATED BY ',' FROM SASDS;
QUIT;
That should get you &codevar precisely as you want [ie, 'A','B' ].
Secondly, since you're using LIBNAME and not passthrough SQL, consider skipping the macro variable entirely and using plain SQL syntax instead.
proc sql;
delete from MSSQLDB.yourtable Y where exists
(select 1 from SASDS S where S.code=Y.code);
quit;
That is sometimes faster, depending on the circumstances (it also could be slower). If code is something that has a high frequency, summarize it using PROC FREQ or a SQL query first.
