Versioned or Repeatable Scripts for Sequence & Partitions - Flyway?

I have SQL scripts for creating the partition function and partition scheme, and I want the migrations to be handled via Flyway. I would like to know whether these SQL scripts should be treated as repeatable scripts or versioned scripts.
Similarly, I have scripts for SEQUENCE creation; should these be versioned or repeatable scripts?

You can make this a repeatable script only if it checks for the existence of the function before you attempt to create it. Otherwise, it'll try to create it on every deployment, causing errors. Something along the lines of:
IF NOT EXISTS (SELECT * FROM sys.partition_functions WHERE name = 'MyFunction')
BEGIN
    CREATE PARTITION FUNCTION ...
END
You do the same thing for a SEQUENCE, except that you check sys.sequences instead. That's it.
Although I probably wouldn't make SEQUENCE or PARTITION scripts repeatable. They're usually created once and then you're done. However, I can absolutely imagine circumstances where you're doing all sorts of different kinds of deployments to more than one system, and having this stuff be repeatable, to ensure it's always there regardless of the version, would be a good idea.

Related

Parse SQL scripts to find table dependencies and outputs

I currently have a large set of SQL scripts transforming data from one table to another, often in steps, for example:
select input3.id as cid
, input4.name as cname
into #temp2
from input3
inner join input4
on input3.match = input4.match
where input3.regdate > '2019-01-01';
truncate table output1;
insert into output1 (customerid, customername)
select cid, cname from #temp2;
I would like to "parse" these scripts into their basic inputs and outputs
in: input3, input4
out: output1
(not necessarily this format, just this info)
It would not be a problem if the temporary tables were falsely flagged:
in: input3, input4, #temp2
out: #temp2, output1
It is OK to take a little bit of time, but the more automatic, the better.
How would one do this?
Things I have tried include:
Regexes (straightforward, but they will miss edge cases, mainly by falsely flagging tables mentioned in comments)
Using an online parser to list the DB objects, then postprocessing by hand
Solving it programmatically, but writing, for example, a C# program for this would cost too much time
I usually wrap the scripts' content into stored procedures and deploy them into the same database where the tables are located. If you are sufficiently acquainted with (power)shell scripting and regexps, you can even write the code which will do it for you.
From this point on, you have some alternatives:
If you need a complete usage / reference report, or it's a one-off task, you can utilise sys.sql_expression_dependencies or other similar system views;
Create an SSDT database project from that database. Among many other things that make database development easier and more consistent, SSDT has "Find All References" functionality (Shift+F12) which displays all references to a particular object (or column) across the code.
AFAIK neither of them sees through dynamic SQL, so if you have lots of it, you'll have to look elsewhere.
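If a rough answer is acceptable, the regex approach from the question can get surprisingly far once comments are stripped first, which removes the main source of false positives the asker mentions. Here is a sketch in Python (the patterns and the helper name are my own; it still misses dynamic SQL, CTE names, and other edge cases):

```python
import re

NAME = r"([#\w.\[\]]+)"  # table names, incl. #temp tables and [bracketed] parts

def classify_tables(sql):
    """Rough input/output extraction for a T-SQL script.
    Note: DELETE FROM targets would be classed as inputs by this sketch."""
    # strip block and line comments so commented-out tables are not flagged
    sql = re.sub(r"/\*.*?\*/", " ", sql, flags=re.S)
    sql = re.sub(r"--[^\n]*", " ", sql)
    outputs, inputs = set(), set()
    # writes: SELECT ... INTO / INSERT INTO / TRUNCATE TABLE / UPDATE
    for pat in (r"\binto\s+" + NAME, r"\btruncate\s+table\s+" + NAME,
                r"\bupdate\s+" + NAME):
        outputs.update(m.group(1).lower() for m in re.finditer(pat, sql, re.I))
    # reads: FROM / JOIN
    for pat in (r"\bfrom\s+" + NAME, r"\bjoin\s+" + NAME):
        inputs.update(m.group(1).lower() for m in re.finditer(pat, sql, re.I))
    return inputs, outputs

script = """
select input3.id as cid, input4.name as cname
into #temp2
from input3
inner join input4 on input3.match = input4.match
where input3.regdate > '2019-01-01';
truncate table output1;
insert into output1 (customerid, customername)
select cid, cname from #temp2;
"""
ins, outs = classify_tables(script)
print(sorted(ins))   # ['#temp2', 'input3', 'input4']
print(sorted(outs))  # ['#temp2', 'output1']
```

On the question's example this yields exactly the "temp tables flagged on both sides" result the asker said was acceptable.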

Conditional SQL block evaluated even when it won't be executed

I'm working on writing a migration script for a database, and am hoping to make it idempotent, so we can safely run it any number of times without fear of it altering the database (/ migrating data) beyond the first attempt.
Part of this migration involves removing columns from a table, but inserting that data into another table first. To do so, I have something along these lines.
IF EXISTS
    (SELECT * FROM sys.columns
     WHERE object_id = OBJECT_ID('TableToBeModified')
       AND name = 'ColumnToBeDropped')
BEGIN
    CREATE TABLE MigrationTable (
        Id int,
        ColumnToBeDropped varchar
    );
    INSERT INTO MigrationTable
        (Id, ColumnToBeDropped)
    SELECT Id, ColumnToBeDropped
    FROM TableToBeModified;
END
The first time through, this works fine, since the column still exists. However, on subsequent attempts, it fails because the column no longer exists. I understand that the entire script is parsed and validated up front, and I could instead put the inner contents into an EXEC statement, but is that really the best solution to this problem, or is there another, still potentially "validity enforced", option?
I understand that the entire script is evaluated, and I could instead put the inner contents into an EXEC statement, but is that really the best solution to this problem
Yes. There are several scenarios in which you would want to push off the parsing validation due to dependencies elsewhere in the script. I will even sometimes put things into an EXEC when there are no current problems, to ensure that there won't be any as either the rest of the script or the environment changes, due to additional changes made after the current rollout script was developed. As a minor benefit, it helps break things up visually.
While there can be permission issues related to broken ownership chaining when using dynamic SQL, that is rarely a concern for a rollout script, and not a problem I have ever run into.
If you are not sure whether the script will work, especially when migrating a database: for queries that change data, I execute the script with BEGIN TRAN, check that the result is as expected, and if so perform COMMIT TRAN; otherwise ROLLBACK, which discards the transaction's changes.

Handling batch T-SQL statements with GO commands

I have a Node.js app that is handling database build / versioning by reading multiple .sql files and executing them as transactions.
The problem I am running into is that these database build scripts require a lot of GO separators, as you cannot execute multiple CREATE statements, etc., in the same batch.
However, GO is not T-SQL, and it causes errors when used outside of a Microsoft application context.
Take the following shorthand:
CREATE database foo /* ... */
use [foo]
CREATE TABLE bar /* ... */
This would error if GO statements were not injected between each line.
I would rather not break this into multiple .sql files for every separate transaction - I am building a database and that would turn into hundreds of files!
I could run String.split() functions on all go statements and have node execute each as a separate transaction, but that seems very hacky.
Is there any standard or best-practice solution to this type of problem?
Update
Looks like a semicolon will do the trick for everything except CREATE statements for functions, stored procedures, etc. It doesn't apply to tables or databases, though, which is good.
I ended up:
Parsing the SQL file into an array of statements, split on lines whose only content is GO statements and comments,
Creating a SQL transaction,
Executing all statements in the array in sequence,
Rolling back the entire transaction if any single statement fails,
Committing the transaction once all statements have completed.
A bit of work, but I think that was the best way to do it.
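The parsing step above can be sketched like this (Python here just to illustrate the splitting rule, since the app itself is Node.js; note this is a simplification — real GO also accepts an optional repeat count, e.g. GO 5, which this ignores):

```python
import re

# a batch separator: a line whose only content is GO (any case),
# optionally followed by a trailing comment
GO_LINE = re.compile(r"^\s*GO\s*(--.*|/\*.*\*/\s*)?$", re.IGNORECASE)

def split_batches(script):
    """Split a T-SQL script into batches on standalone GO lines."""
    batches, current = [], []
    for line in script.splitlines():
        if GO_LINE.match(line):
            if any(l.strip() for l in current):
                batches.append("\n".join(current).strip())
            current = []
        else:
            current.append(line)
    if any(l.strip() for l in current):
        batches.append("\n".join(current).strip())
    return batches

script = "CREATE DATABASE foo\nGO\nuse [foo]\nGO\nCREATE TABLE bar (id int)"
print(split_batches(script))
# ['CREATE DATABASE foo', 'use [foo]', 'CREATE TABLE bar (id int)']
```

Anchoring on whole lines (rather than a naive String.split on the word "go") avoids falsely splitting on identifiers like GOTO or a column named category.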

'tail -f' a database table

Is it possible to effectively tail a database table such that when a new row is added an application is immediately notified with the new row? Any database can be used.
Use an ON INSERT trigger.
You will need to check the specifics of your database for how to call external applications with the values contained in the inserted record, or you can write your 'application' as a stored procedure and have it run inside the database.
It sounds like you will want to brush up on databases in general before you paint yourself into a corner with your command-line approaches.
Yes, if the database is a flat text file and appends are done at the end.
Yes, if the database supports this feature in some other way; check the relevant manual.
Otherwise, no. Databases tend to be binary files.
I am not sure, but this might work for primitive / flat-file databases. As far as I understand (and I could be wrong), modern database files are binary (and sometimes encrypted), so reading a newly added row would not work with that command.
I would imagine most databases allow for write triggers, and you could have a script that triggers on write that tells you some of what happened. I don't know what information would be available, as it would depend on the individual database.
There are a few options here, some of which others have noted:
Periodically poll for new rows. With the way MVCC works though, it's possible to miss a row if there were two INSERTS in mid-transaction when you last queried.
Define a trigger function that will do some work for you on each insert. (In Postgres you can call a NOTIFY command that other processes can LISTEN to.) You could combine a trigger with writes to an unpublished_row_ids table to ensure that your tailing process doesn't miss anything. (The tailing process would then delete IDs from the unpublished_row_ids table as it processed them.)
Hook into the database's replication functionality, if it provides any. This should have a means of guaranteeing that rows aren't missed.
I've blogged in more detail about how to do all these options with Postgres at http://btubbs.com/streaming-updates-from-postgres.html.
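The trigger-plus-queue-table pattern from the second option can be demonstrated end to end with SQLite, which supports triggers and ships with Python; the table and column names here are invented for the sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT);
    CREATE TABLE unpublished_row_ids (id INTEGER PRIMARY KEY);
    -- queue every new row's id for the tailing process
    CREATE TRIGGER events_after_insert AFTER INSERT ON events
    BEGIN
        INSERT INTO unpublished_row_ids (id) VALUES (NEW.id);
    END;
""")

def drain_new_rows(conn):
    """Fetch rows queued by the trigger, then mark them as published
    by deleting their ids from the queue table."""
    ids = [r[0] for r in conn.execute(
        "SELECT id FROM unpublished_row_ids ORDER BY id")]
    rows = [conn.execute(
        "SELECT id, payload FROM events WHERE id = ?", (i,)).fetchone()
        for i in ids]
    conn.executemany("DELETE FROM unpublished_row_ids WHERE id = ?",
                     [(i,) for i in ids])
    conn.commit()
    return rows

conn.execute("INSERT INTO events (payload) VALUES ('hello')")
print(drain_new_rows(conn))   # [(1, 'hello')]
print(drain_new_rows(conn))   # [] -- nothing new since the last drain
```

Because the trigger fires in the same transaction as the INSERT, the queue table cannot miss a committed row, unlike plain periodic polling under MVCC.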
tail on Linux appears to be using inotify to tell when a file changes - it probably uses similar filesystem notifications frameworks on other operating systems. Therefore it does detect file modifications.
That said, tail performs an fstat() call after each detected change and will not output anything unless the size of the file increases. Modern DB systems use random file access and reuse DB pages, so it's very possible that an inserted row will not cause the backing file size to change.
You're better off using inotify (or similar) directly, and even better off if you use DB triggers or whatever mechanism your DBMS offers to watch for DB updates, since not all file updates are necessarily row insertions.
I was just in the middle of posting the same exact response as glowcoder, plus another idea:
The low-tech way to do it is to have a timestamp field, and have a program run a query every n minutes looking for records where the timestamp is greater than that of the last run. The same concept can be done by storing the last key seen if you use a sequence, or even adding a boolean field "processed".
With Oracle you can select a pseudo-column called ROWID that gives a unique identifier for the row in the table, and rowids are ordinal: new rows get assigned rowids that are greater than any existing rowid.
So, first: select max(rowid) from table_name
I assume that one cause for the question is that there are many, many rows in the table, so this first step will tax the DB a little and take some time.
Then: select * from table_name where rowid > 'whatever_that_rowid_string_was'
You still have to run the query periodically, but it is now a quick and inexpensive query.

Confirm before delete/update in SQL Management Studio?

So for the second day in a row, someone has wiped out an entire table of data as opposed to the one row they were trying to delete because they didn't have the qualified where clause.
I've been all up and down the mgmt studio options, but can't find a confirm option. I know other tools for other databases have it.
I'd suggest that you should always write SELECT statement with WHERE clause first and execute it to actually see what rows will your DELETE command delete. Then just execute DELETE with the same WHERE clause. The same applies for UPDATEs.
Under Tools>Options>Query Execution>SQL Server>ANSI, you can enable the Implicit Transactions option which means that you don't need to explicitly include the Begin Transaction command.
The obvious downside of this is that you might forget to add a Commit (or Rollback) at the end, or worse still, your colleagues will add Commit at the end of every script by default.
You can lead the horse to water...
You might suggest that they always take an ad-hoc backup before they do anything (depending on the size of your DB) just in case.
Try using a BEGIN TRANSACTION before you run your DELETE statement.
Then you can choose to COMMIT or ROLLBACK same.
In SSMS 2005, you can enable this option under Tools|Options|Query Execution|SQL Server|ANSI ... check SET IMPLICIT_TRANSACTIONS. That will require a commit to affect update/delete queries for future connections.
For the current query, go to Query|Query Options|Execution|ANSI and check the same box.
This page also has instructions for SSMS 2000, if that is what you're using.
As others have pointed out, this won't address the root cause: it's almost as easy to paste a COMMIT at the end of every new query you create as it is to fire off a query in the first place.
First, this is what audit tables are for. If you know who deleted all the records, you can either restrict their database privileges or deal with them from a performance perspective. The last person who did this at my office is currently on probation. If she does it again, she will be let go. You have responsibilities if you have access to production data, and ensuring that you cause no harm is one of them. This is a personnel problem as much as a technical problem.

You will never find a way to prevent people from making dumb mistakes (the database has no way to know if you meant delete table a or delete table a where id = 100, and a confirm will get hit automatically by most people). You can only try to reduce them by making sure the people who run this code are responsible, and by putting policies in place to help them remember what to do. Employees who have a pattern of behaving irresponsibly with your business data (particularly after they have been given a warning) should be fired.
Others have suggested the kinds of things we do to prevent this from happening. I always embed a select in a delete that I'm running from a query window to make sure it will delete only the records I intend. All our code on production that changes, inserts or deletes data must be enclosed in a transaction. If it is being run manually, you don't run the rollback or commit until you see the number of records affected.
Example of delete with embedded select
delete a
--select a.* from
from table1 a
join table2 b on a.id = b.id
where b.somefield = 'test'
But even these techniques can't prevent all human error. A developer who doesn't understand the data may run the select and still not understand that it is deleting too many records. Running in a transaction may mean you have other problems when people forget to commit or rollback and lock up the system. Or people may put it in a transaction and still hit commit without thinking just as they would hit confirm on a message box if there was one. The best prevention is to have a way to quickly recover from errors like these. Recovery from an audit log table tends to be faster than from backups. Plus you have the advantage of being able to tell who made the error and exactly which records were affected (maybe you didn't delete the whole table but your where clause was wrong and you deleted a few wrong records.)
For the most part, production data should not be changed on the fly. You should script the change and check it on dev first. Then on prod, all you have to do is run the script with no changes, rather than highlighting and running little pieces one at a time. Now, in the real world this isn't always possible, as sometimes you are fixing something broken only on prod that needs to be fixed now (for instance, when none of your customers can log in because critical data got deleted). In a case like this, you may not have the luxury of reproducing the problem first on dev and then writing the fix. When you have these types of problems, you may need to fix directly on prod, and you should have only DBAs, database analysts, configuration managers, or others who are normally responsible for data on prod do the fix, not a developer. Developers in general should not have access to prod.
That is why I believe you should always:
1. Use stored procedures that are tested on a dev database before deploying to production
2. Select the data before deletion
3. Screen developers using an interview and performance evaluation process :)
4. Base performance evaluation on how many database tables they do/do not delete
5. Treat production data as if it were poisonous and be very afraid
So for the second day in a row, someone has wiped out an entire table of data as opposed to the one row they were trying to delete because they didn't have the qualified where clause
Probably the only solution will be to replace that someone with someone else ;). Otherwise they will always find a workaround.
Alternatively, restrict the database access for that person, provide them with a stored procedure that takes the parameter used in the WHERE clause, and grant them access to execute that stored procedure.
Put on your best Trogdor and Burninate until they learn to put in the WHERE clause.
The best advice is to get the muckety-mucks that are mucking around in the database to use transactions when testing. It goes a long way towards preventing "whoops" moments. The caveat is that now you have to tell them to COMMIT or ROLLBACK because for sure they're going to lock up your DB at least once.
Lock it down:
REVOKE delete rights on all your tables.
Put in an audit trigger and audit table.
Create parametrized delete SPs and only give rights to execute on an as needed basis.
Isn't there a way to give users the results they need without providing raw access to SQL? If you at least had a separate entry box for "WHERE", you could default it to "WHERE 1 = 0" or something.
I think there must be a way to back these changes out of the transaction journal, too. But probably not without rolling everything back, and then selectively reapplying whatever came after the fatal mistake.
Another ugly option is to create a trigger to write all DELETEs (maybe over some minimum number of records) to a log table.
