I have a Node.js app that is handling database build / versioning by reading multiple .sql files and executing them as transactions.
The problem I am running into is that these database build scripts require a lot of GO statements, as you cannot execute multiple CREATEs, etc. in the same context.
However, GO is not T-SQL; it is a batch separator recognized only by Microsoft client tools, so it errors when sent to the server from anything else.
Take the following shorthand:
CREATE database foo /* ... */
use [foo]
CREATE TABLE bar /* ... */
This would error if GO statements were not injected between each line.
I would rather not break this into multiple .sql files for every separate transaction - I am building a database and that would turn into hundreds of files!
I could run String.split() on all the GO statements and have Node execute each piece as a separate transaction, but that seems very hacky.
Is there any standard or best-practice solution to this type of problem?
Update
Looks like a semicolon will do the trick for everything except CREATE statements for functions, stored procedures, etc. That restriction doesn't apply to tables or databases, though, which is good.
I ended up:
Parsing the SQL file into an array of statements split on lines whose only content is GO statements & comments,
Creating a SQL transaction,
Executing all statements in the array in sequence,
Conditionally rolling back the entire transaction on any single statement fail,
Ending the transaction once all queries have completed synchronously.
A bit of work, but I think that was the best way to do it.
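Roughly, the shape of that flow (sketched here with the mssql driver; the file and config names are placeholders, not the exact code):

const fs = require('fs');
const sql = require('mssql');

// Split on lines whose only content is GO (optionally followed by a comment),
// then drop empty fragments.
function splitBatches(script) {
  return script
    .split(/^\s*GO\s*(?:--.*)?$/gim)
    .map(batch => batch.trim())
    .filter(batch => batch.length > 0);
}

async function runBuildScript(file, config) {
  const batches = splitBatches(fs.readFileSync(file, 'utf8'));

  const pool = await sql.connect(config);
  const transaction = new sql.Transaction(pool);
  await transaction.begin();
  try {
    // Execute the statements sequentially inside one transaction
    for (const batch of batches) {
      await new sql.Request(transaction).query(batch);
    }
    await transaction.commit();
  } catch (err) {
    // Any failing statement rolls the whole build back
    await transaction.rollback();
    throw err;
  } finally {
    await pool.close();
  }
}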
Related
I have SQL scripts for creating the Partition Function & Partition Scheme. I want the migrations to be taken care of via flyway scripts. I would like to know whether these SQL scripts can be considered as Repeatable scripts or Versioned Scripts?
Similarly, I have scripts for SEQUENCE creation, should this be considered as Versioned Scripts or Repeatable Scripts?
You can make this a repeatable script only if it's got a check for the existence of the function before you attempt to create it. Otherwise, it'll try to create it on every deployment, causing errors. Something along the lines of:
IF NOT EXISTS( SELECT * FROM sys.partition_functions WHERE name = 'MyFunction' )
BEGIN
CREATE PARTITION FUNCTION...
END
You do the same thing for SEQUENCE, but you want to look for it here: sys.sequences. That's it.
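For a SEQUENCE, that check might look something like this (the sequence name and options here are just placeholders):
IF NOT EXISTS( SELECT * FROM sys.sequences WHERE name = 'MySequence' )
BEGIN
CREATE SEQUENCE dbo.MySequence
START WITH 1
INCREMENT BY 1;
END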
Although, probably, I wouldn't make SEQUENCE or PARTITION repeatable. They're usually created once and then you're done. However, I can absolutely imagine circumstances where you're doing all sorts of different kinds of deployments to more than one system, where having this stuff be repeatable, to ensure it's always there regardless of the version, would be a good idea.
I have a huge package in PL/SQL which executes a DELETE command on a table A.
I want to determine the procedure responsible for this operation.
My idea is to create a trigger on this table to track the "file" and "line" of the deletion.
Is that possible in PL/SQL, as it is in C++ or PHP through the file and line macros?
Otherwise, is it possible for a trigger to find the name of the stored procedure that deleted the data?
Thanks.
Is this ongoing tracking of rows deleted, or just the occasional check of "Hmmm....a SQL deleted some rows, I wonder where it came from?". If it is the latter, you can get that from V$SQL which tracks the originating PLSQL module that issued the SQL. The columns are PROGRAM_ID and PROGRAM_LINE#.
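A rough sketch of that lookup (the table name in the filter is a placeholder); it joins PROGRAM_ID back to DBA_OBJECTS to identify the owning PL/SQL unit:
SELECT s.sql_id, s.sql_text, o.owner, o.object_name AS plsql_unit, s.program_line# AS line_in_unit
FROM v$sql s
JOIN dba_objects o ON o.object_id = s.program_id
WHERE UPPER(s.sql_text) LIKE 'DELETE%TABLE_A%';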
If you need a full example, I've got one on my site
https://connor-mcdonald.com/2016/01/20/problematic-sql-plsql-is-your-friend/
I'm working on writing a migration script for a database, and am hoping to make it idempotent, so we can safely run it any number of times without fear of it altering the database (/ migrating data) beyond the first attempt.
Part of this migration involves removing columns from a table, but inserting that data into another table first. To do so, I have something along these lines.
IF EXISTS
(SELECT * FROM sys.columns
WHERE object_id = OBJECT_ID('TableToBeModified')
AND name = 'ColumnToBeDropped')
BEGIN
CREATE TABLE MigrationTable (
Id int,
ColumnToBeDropped varchar
);
INSERT INTO MigrationTable
(Id, ColumnToBeDropped)
SELECT Id, ColumnToBeDropped
FROM TableToBeModified;
END
The first time through, this works fine, since the column still exists. However, on subsequent attempts, it fails because the column no longer exists. I understand that the entire script is evaluated, and I could instead put the inner contents into an EXEC statement, but is that really the best solution to this problem, or is there another, still potentially "validity enforced" option?
I understand that the entire script is evaluated, and I could instead put the inner contents into an EXEC statement, but is that really the best solution to this problem
Yes. There are several scenarios in which you would want to push off the parsing validation due to dependencies elsewhere in the script. I will even sometimes put things into an EXEC when there are no current problems, to ensure that there won't be any as either the rest of the script or the environment changes due to additional changes made after the current rollout script was developed. As a minor benefit, it also helps break things up visually.
While there can be permissions issues related to breaking ownership chaining when using Dynamic SQL, that is rarely a concern for a rollout script, and not a problem I have ever run into.
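For the script in the question, pushing the part that references the soon-to-be-dropped column into Dynamic SQL might look roughly like this (same hypothetical names as above):
IF EXISTS
(SELECT * FROM sys.columns
WHERE object_id = OBJECT_ID('TableToBeModified')
AND name = 'ColumnToBeDropped')
BEGIN
CREATE TABLE MigrationTable (
Id int,
ColumnToBeDropped varchar(100)
);
-- Compiled only when this branch actually runs, so the missing column no longer breaks the script
EXEC(N'INSERT INTO MigrationTable (Id, ColumnToBeDropped)
SELECT Id, ColumnToBeDropped
FROM TableToBeModified;');
END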
If we are not sure whether the script will work, especially when migrating a database: for queries that update data, I execute the script inside BEGIN TRAN and check whether the result is as expected; if so, I perform COMMIT TRAN, otherwise ROLLBACK, which discards the transaction.
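A rough shape of that pattern (table, column, and predicate are placeholders):
BEGIN TRAN;

UPDATE dbo.SomeTable
SET SomeColumn = 'new value'
WHERE SomeKey = 42;

-- Inspect the affected rows before deciding
SELECT SomeKey, SomeColumn
FROM dbo.SomeTable
WHERE SomeKey = 42;

-- If the result looks right:
-- COMMIT TRAN;
-- Otherwise discard the changes:
-- ROLLBACK TRAN;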
Do I need GO before and after SET IDENTITY_INSERT to ON/OFF?
I see a lot of samples, articles use this, but when I skip GO, the script works fine.
GO
SET IDENTITY_INSERT [PropertyValue] ON
GO
-- Some script
GO
SET IDENTITY_INSERT [PropertyValue] OFF
GO
I wonder if this is a good code convention.
You don't need to. GO instructs e.g. SQL Server Management Studio to execute everything up to that point as a separate batch. Some statements need this, and some don't.
For example, it's usually necessary to use GO to separate a CREATE TABLE statement from INSERTs into the same table.
EDIT: Seems this is no longer the case, as pointed out by Damien. CREATE TABLE can be followed by INSERT just fine, and the same applies to DROP TABLE and others. Always improving, eh? :) I've tried some other cases I remember, and it seems you no longer need to have a CREATE PROCEDURE as the last statement in a command either. Times change :)
set identity_insert doesn't need to be separated by GO. It's usually used as a safe practice when you're autogenerating SQL code, because you're not necessarily aware of what's going on elsewhere (e.g. you might have been creating the table in the previous statement, so to be sure, you put a GO before the identity insert). But it's not necessary, and it will not yield you any specific benefit if you use it in your hand-written SQL that you execute as one batch.
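For instance, something like this runs fine as a single batch without any GO (the column list here is made up):
SET IDENTITY_INSERT [PropertyValue] ON;

-- An explicit column list is required while IDENTITY_INSERT is ON
INSERT INTO [PropertyValue] (Id, Name)
VALUES (100, 'example');

SET IDENTITY_INSERT [PropertyValue] OFF;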
GO is not a Transact-SQL statement; it is a command recognized by the sqlcmd and osql utilities and SQL Server Management Studio Code editor.
SQL Server utilities interpret GO as a signal that they should send the current batch of Transact-SQL statements to an instance of SQL Server. The current batch of statements is composed of all statements entered since the last GO, or since the start of the ad hoc session or script if this is the first GO.
A Transact-SQL statement cannot occupy the same line as a GO command. However, the line can contain comments.
Users must follow the rules for batches. For example, any execution of a stored procedure after the first statement in a batch must include the EXECUTE keyword. The scope of local (user-defined) variables is limited to a batch, and cannot be referenced after a GO command.
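The variable-scope rule is the one that trips people up most often; a quick illustration:
DECLARE @x int = 1;
SELECT @x; -- works: same batch
GO
SELECT @x; -- fails: @x went out of scope at the GO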
I'm just learning SAS and see two interesting procedures.
proc Delete data = table; run;
and
proc datasets lib=Libr nolist;
modify table;
rename __first = second;
quit;
run;
and several questions about them:
why do some procedures end with quit rather than run?
why does proc datasets use quit and run together? (is this a statement to quit the table?)
is it recommended to use the datasets procedure for small tasks? (probably not, but then for what? or should it not be used at all?)
and also, which method is faster: proc delete or an SQL drop? (which is quicker, and at what amount of data does it matter?)
Some SAS procedures end in QUIT, rather than RUN, because they are running in Interactive Mode. For instance, in PROC GLM, you can specify an additional model statement as long as the top of the SAS window says that PROC GLM is running (if you're using the Windows version).
Some programmers have gotten into the habit of typing QUIT and RUN together. I don't think it actually matters, as procedures that use the QUIT statement begin running as soon as you enter them. I only use one or the other.
PROC DELETE is an unsupported legacy feature; it has been officially superseded by PROC DATASETS, which is the designated tool for handling datasets in SAS. Mailing List Post.
I generally don't find myself with much need to delete datasets while in SAS. Because SAS effectively manages its memory use, and because RAM is so plentiful now, I usually do 90% of my work from temporary datasets which I create on-demand at the start of the session.
As before, PROC DELETE is now deprecated in favor of PROC DATASETS. In terms of which is faster, excluding extremely large data, I would wager that there is little difference between them. When handling permanent SAS datasets, however, I like to use PROC DATASETS rather than PROC SQL, simply because I feel better manipulating permanent datasets using the SAS-designed method, and not the SQL implementation (which isn't 100%, in my opinion).
WRT "run" versus "quit":
Some SAS procedures support something called "run group processing", which means that the procedure performs whatever work it is asked to do when it sees the "run;" statement. The procedure continues to execute until it sees a "quit;" statement. In fact, a "quit;" statement will automatically insert a "run;" statement if there is still work to be done.
PROC DATASETS is one of those procedures. The "quit;" statement says that there is no more work for the procedure to do. Consider this trivial example:
proc datasets;
change a=new_a;
run;
delete new_a;
run;
quit;
The first statement (change) renames an existing dataset "a" to "new_a". The second statement will delete that dataset. If you fail to include a "run;" statement (after "change") in this example, the procedure will fail because it will notice that the "new_a" dataset does not exist and so will not execute either statement.
That said, I rarely use PROC DATASETS myself; I prefer to use PROC SQL.
WRT: PROC DELETE versus DROP TABLE with PROC SQL:
Although PROC DELETE is officially "deprecated", all that means is that it will no longer be changed. It is a simple procedure to delete a data object from a SAS library; I use it all the time. It has one special advantage compared to PROC SQL. If you use PROC DELETE to try and delete a dataset that does not exist, you will just get a warning message in the log. However, if you try a DROP TABLE statement from SQL, you will get an error and your SQL step will halt. I use PROC DELETE all the time when creating new tables in my ETL scripts that load into external databases like Teradata and Oracle.
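To illustrate the difference (the dataset name is made up):
/* Dataset doesn't exist: PROC DELETE just writes a WARNING to the log */
proc delete data=work.stage_load;
run;

/* Dataset doesn't exist: DROP TABLE raises an ERROR and the SQL step halts */
proc sql;
drop table work.stage_load;
quit;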
Long-winded, I know, but I hope this helps.
Bob
Regarding whether datasets or sql is faster at deleting tables, I investigated that issue a bit here. Proc SQL was generally faster, which was surprising to me.
I think the following code can delete the SAS datasets in the work library:
proc datasets lib=work memtype=data kill;
run;
quit;
I believe you will find that PROC DELETE has not gone away and will not anytime soon. Moreover, it is often faster than the PROC DATASETS ... DELETE form of deletion for some types of libraries. In my experience, data libraries managed by SPDS with many datasets can cause PROC DATASETS of any kind to have very poor performance, and so I will always use PROC DELETE.