SAS procedures DELETE and DATASETS

I'm just learning SAS and have come across two interesting procedures:
proc Delete data = table; run;
and
proc datasets lib=Libr nolist;
modify table;
rename __first = second;
quit;
run;
I have several questions about them:
Why do some procedures end with QUIT rather than RUN?
Why does PROC DATASETS use QUIT and RUN together? (Is this a statement that quits the table?)
Is PROC DATASETS the recommended tool for small tasks? (Presumably not, but then what is it for? Or should it not be used at all?)
Also, which method is faster: PROC DELETE or an SQL DROP TABLE? (Which is quicker, and at what volume of data does the difference matter?)

Some SAS procedures end in QUIT, rather than RUN, because they run in interactive mode. For instance, in PROC GLM you can submit additional statements (a MEANS or CONTRAST statement, say) as long as the top of the SAS window says that PROC GLM is running (if you're using the Windows version).
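A minimal sketch of that interactive behavior, using a MEANS statement as the extra request (SASHELP.CLASS is just a convenient built-in dataset for illustration):
proc glm data=sashelp.class;
   class sex;
   model height = sex;
run;            /* the model is fitted, but PROC GLM is still running */
   means sex;   /* an additional statement, processed interactively */
run;
quit;           /* ends the procedure */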
Some programmers have gotten into the habit of typing QUIT and RUN together. I don't think it actually matters, since procedures that use the QUIT statement begin running as soon as you submit their statements; I only ever use one or the other.
PROC DELETE is an unsupported legacy feature; it has been officially superseded by PROC DATASETS, which is the designated tool for managing datasets in SAS (see this mailing list post).
I generally don't find myself needing to delete datasets while in SAS. Because SAS manages its workspace effectively, and because storage is so plentiful now, I usually do 90% of my work with temporary datasets that I create on demand at the start of the session.
As noted above, PROC DELETE is now deprecated in favor of PROC DATASETS. As for which is faster: excluding extremely large data, I would wager there is little difference between them. When handling permanent SAS datasets, however, I prefer PROC DATASETS over PROC SQL, simply because I feel better manipulating permanent datasets with the SAS-designed method rather than the SQL implementation (which, in my opinion, isn't 100% complete).
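For reference, the two equivalent deletions look like this (the library and dataset names are made up for illustration):
proc datasets lib=mylib nolist;
   delete old_table;              /* the SAS-designed method */
quit;

proc sql;
   drop table mylib.old_table;    /* the SQL implementation */
quit;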

WRT "run" versus "quit":
Some SAS procedures support something called "run-group processing", which means that the procedure performs whatever work it has been asked to do when it sees a "run;" statement. The procedure continues to execute until it sees a "quit;" statement. In fact, a "quit;" statement will automatically insert a "run;" statement if there is still work to be done.
PROC DATASETS is one of those procedures. The "quit;" statement says that there is no more work for the procedure to do. Consider this trivial example:
proc datasets;
   change a=new_a;
run;
   delete new_a;
run;
quit;
The first statement (CHANGE) renames the existing dataset "a" to "new_a". The second statement then deletes that dataset. If you fail to include the "run;" statement after CHANGE in this example, the procedure will fail: it will notice that the "new_a" dataset does not exist and so will execute neither statement.
That said, I rarely use PROC DATASETS myself; I prefer to use PROC SQL.
WRT: PROC DELETE versus DROP TABLE with PROC SQL:
Although PROC DELETE is officially "deprecated", all that means is that it will no longer be enhanced. It is a simple procedure for deleting a data object from a SAS library, and I use it all the time. It has one particular advantage over PROC SQL: if you use PROC DELETE to delete a dataset that does not exist, you just get a warning message in the log, whereas a DROP TABLE statement in SQL raises an error and halts the SQL step. I use PROC DELETE all the time when creating new tables in ETL scripts that load into external databases like Teradata and Oracle.
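To make that difference concrete, here is a minimal sketch (the dataset name WORK.STAGING is invented, and is assumed not to exist):
/* PROC DELETE: only a WARNING in the log; the program keeps going */
proc delete data=work.staging;
run;

/* PROC SQL DROP TABLE: an ERROR that halts the SQL step */
proc sql;
   drop table work.staging;
quit;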
Long-winded, I know, but I hope this helps.
Bob

Regarding whether PROC DATASETS or PROC SQL is faster at deleting tables, I investigated that issue a bit here. PROC SQL was generally faster, which surprised me.

I think the following code can delete all the SAS datasets in the WORK library:
proc datasets lib=work memtype=data kill;  /* KILL deletes every member of the library without prompting */
run;
quit;

I believe you will find that PROC DELETE has not gone away and will not any time soon. Moreover, for some types of libraries it is often faster than the PROC DATASETS ... DELETE form of deletion. In my experience, data libraries managed by SPDS that contain many datasets can make any form of PROC DATASETS perform very poorly, so there I always use PROC DELETE.

Related

I am struggling with migrating the temp tables (SQL server) to oracle

I am struggling with migrating temp tables from SQL Server to Oracle. Oracle generally discourages using temporary tables inside stored procedures, but in SQL Server they are used routinely to fetch small sets of records and manipulate them.
How can I overcome this issue? I have searched for online articles about migrating temp tables to Oracle, but none of them explain it clearly enough for my needs.
I have found suggestions to use an inline view, a WITH clause, or a ref cursor instead of a temp table, but I am totally confused.
Please suggest in which cases I should use an inline view, a WITH clause, or a ref cursor.
This will help me improve my knowledge and also do my job well.
As always, thank you for your valuable time in helping out the newbies.
Thanks
Alsatham hussain
Like many questions, the answer is "it depends". A few things:
Oracle's "temp" table is called a GLOBAL TEMPORARY TABLE (GTT). Unlike most other vendors' temp tables, its definition is global. Scripts or programs in SQL Server (and others) will create a temp table, and that temp table disappears at the end of the session. This means the script or program can be rerun, or run concurrently by more than one user. However, this will not work with a GTT, since the GTT remains in existence at the end of the session, so the next run that attempts to create it will fail because it already exists.
So one approach is to pre-create the GTT, just like the rest of the application tables, and then change the program to INSERT into the GTT rather than creating it.
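A hedged sketch of that approach (all table and column names here are invented):
-- Created once, as part of the schema, not inside the procedure:
CREATE GLOBAL TEMPORARY TABLE stage_orders (
    order_id  NUMBER,
    order_amt NUMBER
) ON COMMIT DELETE ROWS;  -- rows vanish at commit; the definition persists

-- Inside the migrated procedure, the SQL Server "CREATE TABLE #stage_orders"
-- step becomes a plain INSERT:
INSERT INTO stage_orders (order_id, order_amt)
SELECT order_id, order_amt
FROM   orders
WHERE  status = 'OPEN';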
As others have said, using a CTE (Common Table Expression) could potentially work, but it depends on the motivation for using the temp table in the first place. One advantage of the temp table is that it provides a "checkpoint" in a series of steps, and allows stats to be gathered on intermediate temporary data sets in what may be a complex chain of processing. The CTE does not provide that benefit.
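For the simpler cases, the CTE form looks like this (again, every name is invented):
WITH open_orders AS (
    SELECT customer_id, order_amt
    FROM   orders
    WHERE  status = 'OPEN'
)
SELECT   customer_id, SUM(order_amt) AS total_amt
FROM     open_orders
GROUP BY customer_id;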
Others "intermediate" objects such as collections could also be used, but they have to be "managed" and do not really provide any of the advantages of being able to collect stats on them.
So, as I said at the beginning, your choice of solution will depend on the motivation for the original temp table.

Is it a best practice to drop the temp table after using it, in addition to before creating the temp table?

I have a stored proc that creates a temp table. It is only needed within the scope of this stored proc, and nowhere else.
When I use temp tables like this, I always check whether the temp table exists, and drop it if it does, before creating it in the stored proc. I.e.:
IF OBJECT_ID('tempdb..#task_role_order') IS NOT NULL
    DROP TABLE #task_role_order;

CREATE TABLE #task_role_order(...);
Most of the time, is it a best practice to drop the temp table when done with it, in addition to before creating the temp table?
If more context is needed: I have a .NET Web API back end that calls stored procs in the database. I believe SQL Server drops the temp table when the session ends, but I don't know whether .NET opens a new SQL Server session each time it queries the database, or only once per application lifecycle, etc.
I've read this similar question, but thought that it was slightly different.
Usually it is considered good practice to free up a resource as soon as you no longer need it, so I'd add DROP TABLE at the end of the stored procedure.
A temporary table lives as long as the connection lives. Applications usually use connection pooling (it is configurable), and the connection doesn't actually close when you call Connection.Close. Before a connection is reused, the client executes a special stored procedure (sp_reset_connection) which does all the clean-up tasks. So temp tables will be dropped in any case, but sometimes after some delay.
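A minimal sketch of that advice, reusing the table name from the question (the procedure name and columns are invented):
CREATE PROCEDURE dbo.load_task_roles
AS
BEGIN
    CREATE TABLE #task_role_order (task_id INT, role_id INT, sort_order INT);

    -- ... populate and use #task_role_order here ...

    DROP TABLE #task_role_order;  -- free tempdb resources immediately
END;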
It's very unlikely to be of much impact, but if I had to choose I would do neither. Temporary tables are accessible from nested stored procedures, so unless you have a specific need to pass data between procedures, doing neither helps avoid contention when you happen to reuse the same name: if you call a procedure recursively or in a circular manner (which is valid), or if another procedure happens to use the same name and columns. Dropping it out of habit could hide some weird logic errors.
For example, proc A creates a temp table, then calls proc B. B drops and re-creates the table. Now either proc A ends up referencing the temp table that B created, or, since B's table is no longer visible once B returns, proc A mysteriously fails. It would be better for proc B to fail when it tries to create the temp table.
At the end of the day SQL Server will clean these up, but it won't stop you from leaking between nested procedures.
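A hedged sketch of that failure mode (the names are invented):
CREATE PROCEDURE dbo.proc_b AS
BEGIN
    IF OBJECT_ID('tempdb..#work') IS NOT NULL
        DROP TABLE #work;          -- silently drops the CALLER'S table
    CREATE TABLE #work (id INT);
    -- ... B's own logic ...
END;                               -- B's #work is dropped automatically here
GO

CREATE PROCEDURE dbo.proc_a AS
BEGIN
    CREATE TABLE #work (id INT);
    INSERT INTO #work VALUES (1);
    EXEC dbo.proc_b;               -- B sees, and drops, A's #work
    SELECT * FROM #work;           -- fails: the table no longer exists
END;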

Export the "functionality" of many stored procedures to script

I have a large number of stored procedures (200+) that all collect clinical data and insert the result into a common table. Each stored procedure accepts the same single parameter, ClientID, and then compiles a list of diagnostic results and inserts them into a master table.
I have each clinical test separated into an individual stored procedure; however, as I described in a previous SO question, executing the batch of these stored procedures pegs the CPU at 100% and runs for hours before eventually failing. This leads me to want to create a single script that contains all the functionality of the stored procedures. Why, you ask? Well, because it works. I would prefer to keep the logic in the stored procedures, but until I can figure out why they are so slow, and failing, I need to proceed with the "script" method.
So what I am looking to do is take all the stored procedures and find a way to "script" their functionality out to a single SQL script. I can use the "Tasks => Generate Scripts" wizard, but the result contains all the CREATE PROCEDURE and BEGIN/END scaffolding that I don't need.
In the versions of Management Studio etc. that I use, there are options to control whether the "if exists" statements are scripted out.
If you just want to capture the procs without the CREATE statements, you should be able to roll your own pretty easily using the sp_helptext proc.
For example, I created this proc:
create proc dummy (
    @var1 int
    , @var2 varchar(10)
) as
begin
    return 0
end
When I ran sp_helptext dummy, I got back pretty much exactly what I had typed in. Comments would also be included.
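That is, something like this (sp_helptext is a standard system procedure; dummy is just the proc created above):
EXEC sp_helptext 'dummy';  -- returns the stored source of the proc, line by line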
I don't know of any tool that will return the "contents" without the CREATE, since the formal parameters are part of the CREATE or ALTER statement. Which probably leaves you using Perl, Python, or whatever to cut out the CREATE statement; you lose the parameters, though I suppose you could change those into comments.

Which method is best for speed in a SQL Server stored procedure?

I have SELECT, INSERT, UPDATE and DELETE queries.
Is it better for performance to write all the queries in the same stored procedure, or should I write each query in a separate stored procedure?
For ease of maintenance, I would go for separate procedures.
Speed is not going to be an issue either way: as long as the code is the same, it will behave the same whether it lives in one proc or many.
The best method to get good speed is to write an efficient query. Run it and review the execution plan; then tune the query where required.
You will find a lot of good information on query tuning and index tuning on this site (just search for it).
If it is something like the following, and all the parameters are manageable:
BEGIN TRANSACTION
INSERT
....
UPDATE
...
DELETE
COMMIT
then yes, all in one will eliminate the small overhead of multiple calls and keep the logic together as a unit.
However, if it is:
@ParamType char(1) --given parameter: "I"nsert, "U"pdate, "D"elete
IF @ParamType = 'I'
    INSERT
ELSE IF @ParamType = 'U'
    UPDATE
ELSE
    DELETE
then split them up into separate procedures; they make no sense combined.
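For concreteness, here is a hedged sketch of the first ("one unit of work") case; every object name is invented:
CREATE PROCEDURE dbo.archive_order (@order_id INT)
AS
BEGIN
    BEGIN TRANSACTION;
        INSERT INTO order_archive
            SELECT * FROM orders WHERE order_id = @order_id;
        UPDATE order_stats SET archived_cnt = archived_cnt + 1;
        DELETE FROM orders WHERE order_id = @order_id;
    COMMIT;
END;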

SQL Cursor w/Stored Procedure versus Query with UDF

I'm trying to optimize a stored procedure I'm maintaining, and am wondering if anyone can clue me in to the performance benefits/penalties of the options below. For my solution, I basically need to run a conversion program on an image stored in an IMAGE column in a table. The conversion process lives in an external .EXE file. Here are my options:
Pull the results of the target table into a temporary table, and then use a cursor to go over each row in the table and run a stored procedure on the IMAGE column. The stored proc calls out to the .EXE.
Create a UDF that calls the .EXE file, and run a SQL query similar to "select UDFNAME(Image_Col) from TargetTable".
I guess what I'm looking for is an idea of how much overhead would be added by the creation of the cursor, instead of doing it as a set?
Some additional info:
The size of the set in this case is max. 1000
As an answer mentions below, if done as a set with a UDF, will that mean that the external program is opened 1000 times all at once? Or are there optimizations in place for that? Obviously, on a multi-processor system, it may not be a bad thing to have multiple instances of the process running, but 1000 might be a bit much.
Define "set based" in this context?
If you have 100 rows, will this open up the app 100 times in one shot? I would say test it. And even though you can call an extended proc from a UDF, I would still use a cursor for this, because set-based doesn't matter in this case: you are not manipulating data in the tables directly.
I did a little testing and experimenting, and when done in a UDF, it does indeed process one row at a time; SQL Server doesn't launch 100 processes for the 100 rows (I didn't think it would).
However, I still believe that doing this as a UDF rather than with a cursor would be better, because my research tends to show that the extra overhead of pulling the data out with a cursor would slow things down. It may not make a huge difference, but it might save time versus pulling all of the data into a temporary table first.
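For reference, a hedged sketch of the two shapes being compared; the UDF, proc, and temp-table names are all invented, and the plumbing that actually shells out to the .EXE is omitted:
-- Option 2, "set based": SQL Server still evaluates a scalar UDF row by row.
SELECT dbo.ConvertImage(Image_Col) FROM TargetTable;

-- Option 1, an explicit cursor over a temp table:
DECLARE @img VARBINARY(MAX);
DECLARE img_cur CURSOR FOR SELECT Image_Col FROM #target_rows;
OPEN img_cur;
FETCH NEXT FROM img_cur INTO @img;
WHILE @@FETCH_STATUS = 0
BEGIN
    EXEC dbo.ConvertImageProc @img;  -- the proc calls out to the external .EXE
    FETCH NEXT FROM img_cur INTO @img;
END;
CLOSE img_cur;
DEALLOCATE img_cur;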
