Database version control plan: hot or not?

Based on reading around the web, Stack Overflow, and mostly these articles about db versioning that were linked from Coding Horror, I've made a stab at writing a plan for versioning the database of an 8-year-old PHP/MySQL website.
Database Version Control plan
- Create a db as the "Master Database"
- Create a table db_version (id, script_name, version_number, author, comment, date_ran)
- Create baseline script for schema+core data that creates this db from scratch, run this on Master Db
- Create a "test data" script to load any db with working data
- Modifications to the master db are ONLY to be made through the db versioning process
- Ensure everyone developing against the Master Db has a local db created by the baseline script
- Procedures for committing to and updating from the Master Db
- Master Db Commit
- Perform a schema diff between your local db and the master db
- Perform a data diff on core data between your local db and master db
- If there are changes in either or both cases, combine these changes into an update script
- Collect the data to be added as a new row in the db_version table, and add an INSERT for it into the script (see the sketch after this list):
- new version number = latest master db version number +1
- author
- comment
- The script must be named as changeScript_V.sql where V is the latest master db version +1
- Run the script against the master db
- If the script executed successfully, add it to the SVN repository
- Add the new db_version record to your local db_version table
- Update from Master Db
- Update your local svn checkout to have all the latest change scripts available
- Compare your local db_version table to the master db_version table to determine which change scripts need to be run
- Run the required change scripts in order against your local db, which will also update your local db_version table
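For concreteness, a rough MySQL sketch of the pieces above (the db_version columns follow the plan; changeScript_43.sql, the jsmith/users names, @local_version and the example ALTER are purely illustrative):

    -- db_version table, created by the baseline script
    CREATE TABLE db_version (
        id             INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        script_name    VARCHAR(255) NOT NULL,
        version_number INT UNSIGNED NOT NULL,
        author         VARCHAR(100) NOT NULL,
        comment        VARCHAR(255) NOT NULL,
        date_ran       DATETIME     NOT NULL
    );

    -- changeScript_43.sql: an example change script
    ALTER TABLE users ADD COLUMN last_login DATETIME NULL;

    INSERT INTO db_version (script_name, version_number, author, comment, date_ran)
    VALUES ('changeScript_43.sql', 43, 'jsmith', 'Add users.last_login', NOW());

    -- Update step: list the change scripts your local db is missing
    -- (run against the master; set @local_version to your local MAX(version_number))
    SET @local_version = 42;
    SELECT script_name, version_number
    FROM db_version
    WHERE version_number > @local_version
    ORDER BY version_number;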
My first question is, does this sound correct?
My second question is, the commit process seems a bit complicated to do more than once a day. Is there a way to reliably automate it? Or should I not be committing database changes often enough for it to matter?

Looking at your proposal, it doesn't seem like something that's feasible or practical.
I worked at a company where we had more than 1,000 tables per database (a very complex system), and it all worked fine like this:
Have one person in charge of the DB (let's call him the DBPerson) - every script/db change has to pass through him. This avoids unnecessary changes and catches oversights (for example, if someone moves an index to perform better for his own query, he might break another person's work, or someone might create a table that is completely redundant and unnecessary, etc.). This keeps the db clean and efficient. Even if it seems like too much work for one person (or his deputy), in practice it isn't - the db usually changes rarely.
Each script has to pass validation by the DBPerson.
When a script is approved, the DBPerson assigns it a number and puts it in an 'update' folder in SVN (or similar), with appropriate numbering (as you suggested, incremental numbers for example).
Next, if you have some continuous integration in place, the script gets picked up and updates the db (if you don't have continuous integration, do it manually).
Do not store the entire database as a script with all the data in it. Store the actual database instead. If you have branches of the solution, have a separate database for each branch, or at least keep the update scripts divided per branch so you can roll back/forward to another branch. But I really recommend having a separate db for each branch.
Have one database that always holds the default data (intact) - for unit tests, regression tests, etc. Whenever you run tests, run them on a copy of this database. You could even schedule a nightly refresh of the test databases from the main one (if appropriate, of course).
In an environment like this you'll have multiple versions of database:
Developers database (local) - the one that the dev guy is using to test his work. He can always copy from Master or Test Master.
Master database - the one with all the default values, maybe semi-empty if you're doing redeploys to new clients.
Test Master database - the Master database filled with test data. Any script you have run on Master, you run here as well.
Test in progress database - copied from Test Master and used for testing - gets overwritten prior to any new test.
If you have branches (a similar database with slight differences for each client), then you'll have the same as above for each branch...
You will most certainly have to adapt this to your situation, but in any case I think that keeping a textual create script for the entire database is wrong in terms of maintainability, merging, updating, etc.

Related

Implementation of initial database replication

How do various databases implement copying data (replication) to a new instance when it is added to the replication setup?
I.e., when we add a new instance, how is the data loaded into it?
There is a lot of information about replication methods, but it covers the case where the target database instance already holds the same data as its source, not the case where there is a new, initially empty database instance.
There are basically 3 approaches here.
First you start capturing the changes from the source database using a CDC (change data capture) tool. Since the target database is not yet created, you store all the changes so you can apply them later.
Depending on the architecture you can:
If you have a 1:1 copy
Take a backup of the source database and restore it to the target database. Knowing the point in time of the backup, you start applying the captured changes from the timestamp at which the backup was created.
Assuming the backup is consistent, you end up with the same data on the target, just delayed compared to the source.
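As a rough sketch of that flow in SQL Server terms (the paths and the SourceDb/SourceDb_log logical file names are illustrative; other engines have equivalent commands):

    -- On the source: take a full backup and note when it completed.
    BACKUP DATABASE SourceDb TO DISK = 'D:\backups\SourceDb.bak' WITH COPY_ONLY;

    -- On the target server: restore it as the starting point for the replica.
    RESTORE DATABASE TargetDb FROM DISK = 'D:\backups\SourceDb.bak'
    WITH MOVE 'SourceDb'     TO 'D:\data\TargetDb.mdf',
         MOVE 'SourceDb_log' TO 'D:\data\TargetDb_log.ldf';

    -- The CDC tool is then told to replay the captured changes starting from the
    -- backup's timestamp/LSN, so nothing is lost or applied twice.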
If you have a subset of the tables or a different vendor
The same approach as in 1, but instead of backing up and restoring the full database you handle just a list of tables. You can also restore the database backup in a temporary location, export some of the tables (or just a subset of their columns), and then load them into the target.
Once the target is initially prepared, you start applying the changes from the source to the target.
No source database snapshot available
If you can't get a snapshot of the source, the replication tool often has a method to work with that. Depending on the tool, the feature is called AUTOCORRECTION (SAP/Sybase Replication Server) or HANDLECOLLISIONS (Oracle GoldenGate). It basically means that the replication tool has a full row image for each UPDATE operation, and when the record does not exist in the target, it is created. When the row for a DELETE does not exist, the operation is ignored. When the row already exists for an INSERT, the operation is ignored.
To get a consistent state on the target, you work in this mode for some time, until the data is in sync, and then switch to regular replication.
One thing to mention about this mode: you need to make sure that during the reconciliation period the CDC provides the full row content for UPDATEs. If an UPDATE contains only the modified columns, you would not be able to construct an INSERT command (with all column values) when the row is missing.
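As a rough illustration of that collision-handling behavior in plain SQL (a SQL Server-style MERGE; the dbo.customers target and the cdc_staging.customers staging table holding full row images are assumptions for the example):

    MERGE INTO dbo.customers AS t
    USING cdc_staging.customers AS s
        ON t.customer_id = s.customer_id
    WHEN MATCHED THEN
        -- an INSERT or UPDATE arriving for an existing row is applied as an UPDATE
        UPDATE SET t.name = s.name, t.email = s.email
    WHEN NOT MATCHED THEN
        -- an UPDATE arriving for a missing row is turned into an INSERT
        INSERT (customer_id, name, email)
        VALUES (s.customer_id, s.name, s.email);
    -- A DELETE arriving for a row that no longer exists matches nothing and is skipped.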
Of course, the replication tool you use may incorporate the solution described above and do the task for you automatically.

Detect Table Changes In A Database Without Modifications

I have a database ("DatabaseA") that I cannot modify in any way, but I need to detect the addition of rows to a table in it and then add a log record to a table in a separate database ("DatabaseB") along with some info about the user who added the row to DatabaseA. (So it needs to be event-driven, not merely a periodic scan of the DatabaseA table.)
I know that normally, I could add a trigger to DatabaseA and run, say, a stored procedure to add log records to the DatabaseB table. But how can I do this without modifying DatabaseA?
I have free rein to do whatever I like in DatabaseB.
EDIT in response to questions/comments ...
Databases A and B are MS SQL 2008/R2 databases (as tagged), users are interacting with the DB via a proprietary Windows desktop application (not my own) and each user has a SQL login associated with their application session.
Any ideas?
Ok, so I have not put together a proof of concept, but this might work.
You can configure an Extended Events session, with its output collected on databaseB's side, that watches for all the procedures on databaseA that can insert into the table, or for any SQL statements that run against the table on databaseA (using a LIKE '%your table name here%' predicate).
This is a custom solution that writes the XE session to a table:
https://github.com/spaghettidba/XESmartTarget
You could probably mimic its functionality by writing the XE session output to a custom user table every minute or so using SQL Server Agent.
Your session would monitor databaseA and write the XE output to databaseB; you would then write a trigger that, on each XE output write, compares the two tables and, if there are differences, writes them to your log table. This would be a nonstop running process, but it is still a periodic scan of sorts: the XE session only writes when the event happens, but the comparison still runs every couple of seconds.
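A minimal sketch of such a session (the Orders table name and the file path are assumptions; on 2008/R2 the file target is called package0.asynchronous_file_target rather than event_file):

    -- Server-scoped session: capture completed statements whose text mentions the watched table.
    CREATE EVENT SESSION WatchTableA ON SERVER
    ADD EVENT sqlserver.sql_statement_completed
    (
        ACTION (sqlserver.sql_text, sqlserver.username, sqlserver.client_app_name)
        WHERE sqlserver.like_i_sql_unicode_string(sqlserver.sql_text, N'%Orders%')
    )
    ADD TARGET package0.event_file (SET filename = N'C:\XE\WatchTableA.xel');

    ALTER EVENT SESSION WatchTableA ON SERVER STATE = START;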
I recommend you look at a data integration tool that can mine the transaction log for Change Data Capture events. We have recently been using StreamSets Data Collector for Oracle CDC, but it also supports SQL Server CDC. There are many other competing technologies, including Oracle GoldenGate and Informatica PowerExchange (not PowerCenter). We like StreamSets because it is open source and is designed to build real-time data pipelines between databases at the schema level. Until now we have used batch ETL tools like Informatica PowerCenter and Pentaho Data Integration. I can copy all the tables in a schema in near real time in one StreamSets pipeline, provided I have already deployed the DDL in the target. I use this approach between Oracle and Vertica. You can add additional columns to the target and populate them as part of the pipeline.
The only catch might be identifying which user made the change. I don't know whether that is in the SQL Server transaction log. Seems probable but I am not a SQL Server DBA.
I looked at both solutions provided at the time of writing this answer (see Dan Flippo's and dfundaka's answers) but found that the first - using Change Data Capture - required modification to the database, and the second - using Extended Events - wasn't really a complete answer, though it got me thinking of other options.
And the option that seems cleanest, and doesn't require any database modification, is to use SQL Server Dynamic Management Views. These system views and functions expose server process history - in this case INSERTs and UPDATEs - for example sys.dm_exec_sql_text and sys.dm_exec_query_stats, which contain records of executed statements (and are, in fact, what Extended Events seems to be based on).
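As a rough sketch, the kind of query involved looks something like this (the TableA name and the filter are illustrative, and results are limited to whatever is still in the plan cache):

    -- Recent cached statements that appear to have inserted into the watched table.
    SELECT TOP (50)
           qs.last_execution_time,
           qs.execution_count,
           st.text AS statement_text
    FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    WHERE st.text LIKE '%INSERT%TableA%'
    ORDER BY qs.last_execution_time DESC;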
Though it's quite an involved process initially to extract the required information, the queries can be tuned and generalized to a degree.
There are restrictions on transaction history retention, etc., but for the purposes of this particular exercise that wasn't an issue.
I'm not going to select this answer as the correct one yet partly because it's a matter of preference as to how you approach the problem and also because I'm yet to provide a complete solution. Hopefully, I'll post back with that later. But if anyone cares to comment on this approach - good or bad - I'd be interested in your views.

SSDT creating a rollback deployment script?

We can create a SQL Server deployment script with TFS & SSDT, but is there a way to create a rollback script, so that we can roll back the deployment?
Thanks
As SSDT (and similar products) all work by comparing the schema in the project against a live database to bring the database in sync with the model, there's not a direct way to create a rollback script. There are also considerations regarding data changed/added/removed through pre or post-deploy scripts.
That being said, there are a handful of options.
Take a snapshot each time you do a release. You can use that snapshot from the prior release to do another compare for rollback purposes.
Maintain a prior version elsewhere - perhaps do a schema compare from your production system against your local machine. You can use that to compare against production and do a rollback.
Generate a dacpac of the existing system prior to release (use SQLPackage or SSDT to do that). You can use that to deploy that version of the schema back to the database if something goes wrong.
Take a database snapshot prior to release (see the sketch after this list). Best case scenario, you don't need it and can drop the snapshot. Worst case, you can use it to roll back. Of course, you need to watch out for space and IO, as you'll be maintaining that original state elsewhere.
Run your changes through several environments to minimize the need for a rollback. Ideally if you've run this through Development, QA, and Staging/User Acceptance environments, your code and releases should be solid enough to be able to release without any issues.
You'll need to code accordingly for rolling back data changes. That could be a bit trickier as each scenario is different. You'll need to make sure that you write a script that can undo whatever changes were part of your release. If you inserted a row, you'll need a rollback script to delete it. If you updated a bunch of data, you'll either need a backup of that data or some other way to get it back.
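For the database snapshot option, a minimal sketch (the MyDb database, its MyDb_Data logical file name and the path are assumptions about your environment):

    -- Before the release: capture the current state.
    CREATE DATABASE MyDb_PreRelease
    ON (NAME = MyDb_Data, FILENAME = 'D:\Snapshots\MyDb_PreRelease.ss')
    AS SNAPSHOT OF MyDb;

    -- Rollback path: revert to the snapshot, then drop it.
    -- RESTORE DATABASE MyDb FROM DATABASE_SNAPSHOT = 'MyDb_PreRelease';
    -- DROP DATABASE MyDb_PreRelease;

    -- Happy path: just drop the snapshot once the release is verified.
    -- DROP DATABASE MyDb_PreRelease;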
Before making any changes to a database project I take a snapshot (a dacpac) which I can compare the modified database project against to generate a release script. Although it's easy enough to swap the source and target to do a reverse schema compare, I've discovered it won't let me generate an update script (which would be the rollback script) from the reverse comparison, presumably because the target is a database project.
To get around that problem and generate a rollback script I do the following:
Deploy the modified database project to my (localdb) development database;
Check out a previous version of the database project from source control, from before the changes were made;
Run a schema compare from the previous version of the database project to the (localdb) development database;
Use the schema compare to generate an update script. This update script will be a rollback script.
Although it would be nice to be able to generate a rollback script more directly, the four-step process above takes less than five minutes.

Over Web Service, Update A Table From Another Same Table Which Is In Different Location

I have two different databases.
One of them is the original database and the other is a cache database.
These databases are in different locations.
Once a day, I must update the cache database from the original database.
And I must do this update process with a web service running on the original database's machine.
I could do it by clearing all the cache DB tables and inserting the original data each time.
But I think that is a bad approach.
So how can I do this update process efficiently?
Do you have any suggestions?
I'm pretty sure that there are DB syncing technologies out there, but since you already have the requirement, I'd recommend using a change log.
So, you'll have a "CHANGE_LOG" table, into which you insert rows whenever you write to your tables (INSERT, UPDATE, DELETE). Once a day, you can apply these changes one by one to the cache DB.
Deleting the change log once it's applied is okay, but you can also assign a "version" to the DBs, so each change to the DB increments the version number. That can be used to manage more than one cache DB.
For additional assurance, you can have a trigger in each cache DB that increments its own version number. That way, your process can query a cache DB and know which changes still need to be applied, without tracking that in the master DB (which also makes it easy to hook up a new cache DB or bring a crashed cache DB back up to date).
Note that you probably need to purge the change log from time to time.
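A minimal sketch of the change-log approach (MySQL flavour; the products table and its id column are just examples, and in practice you'd need a trigger per table and per operation):

    CREATE TABLE change_log (
        id         INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        table_name VARCHAR(64) NOT NULL,
        row_id     INT NOT NULL,
        operation  VARCHAR(10) NOT NULL,       -- 'INSERT', 'UPDATE' or 'DELETE'
        changed_at DATETIME NOT NULL,
        applied    TINYINT(1) NOT NULL DEFAULT 0  -- set once the cache DB has consumed it
    );

    -- Records inserts into the products table.
    CREATE TRIGGER products_after_insert
    AFTER INSERT ON products
    FOR EACH ROW
        INSERT INTO change_log (table_name, row_id, operation, changed_at)
        VALUES ('products', NEW.id, 'INSERT', NOW());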
The way I see it, you're going to have to grab all the data from the source database, as you don't seem to have any way of interrogating it to see what data has changed. A simple way to do it would be to copy all the data from the source database into temporary or staging tables in the cache database. Then you can do a diff between the two sets of tables and update the records that have changed. Or, once you have all the data in the staging tables, drop/rename the existing tables and rename the staging tables to the existing table names.
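If you go the staging-table route, the diff/update step might look roughly like this (MySQL flavour; the products/staging_products tables and their columns are illustrative):

    -- Upsert rows that are new or have changed since the last refresh.
    INSERT INTO products (id, name, price)
    SELECT s.id, s.name, s.price
    FROM staging_products AS s
    LEFT JOIN products AS p ON p.id = s.id
    WHERE p.id IS NULL OR p.name <> s.name OR p.price <> s.price
    ON DUPLICATE KEY UPDATE name = VALUES(name), price = VALUES(price);

    -- Remove rows that have disappeared from the source.
    DELETE p FROM products AS p
    LEFT JOIN staging_products AS s ON s.id = p.id
    WHERE s.id IS NULL;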

SQL Server 2008 Auto Backup

We want to have our test servers databases updated from our production server databases on a nightly basis to ensure we're developing on the most recent data. We, however, want to ensure that any fn, sp, etc that we're currently working on in the development environment doesn't get overwritten by the backup process. What we were thinking of doing was having a prebackup program that saves objects selected by our developers and a postbackup program to add them back in after the backup process is complete.
I was wondering what other developers have been doing in a situation like this. Is there an existing tool that can do this for us, run automatically on a daily basis, and let us mark objects not to overwrite (without requiring the attention of a sysadmin to run it daily)?
All the objects in our databases are maintained in code - tables, views, triggers, stored procedures, everything - if we expect to find it in the database then it should be in DDL in code that we can run. Actual schema changes are versioned - so there's a table in the database that says this is schema version "n", and if this is not the current version (according to the update code) then we make the necessary changes.
We endeavour to separate out triggers and views - we don't, although we probably should, do much with stored procedures and functions - with drop-and-recreate code that is valid for the current schema version. Accordingly, it should be "safe" to drop and recreate anything that isn't a table, although there will be sequencing issues with both the drop and the create if there are dependencies between objects. The nice thing about this generally is that we can confidently bring a schema from new to current and be confident that any instance of the schema is consistent.
Expanding to your case: if you have the ability to run the schema update code, including the code to recreate all the database objects according to their current definitions, then your problem should substantially go away... backup, restore, run the schema maintenance logic. This has the further benefit that you can introduce schema (table) changes on the dev servers and still keep the same update logic.
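A hedged sketch of that version-check pattern in T-SQL (the schema_version table, the version number 42 and the example ALTER are all illustrative):

    -- Run after each restore from production: apply any changes the schema is missing.
    IF (SELECT MAX(version) FROM dbo.schema_version) < 42
    BEGIN
        -- Change for schema version 42 (example only).
        ALTER TABLE dbo.orders ADD tracking_code VARCHAR(32) NULL;
        INSERT INTO dbo.schema_version (version, applied_at) VALUES (42, GETDATE());
    END;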
I know this isn't a completely generic solution. And it's worth noting that it probably works better with a database per developer (I'm an old-fashioned programmer, so I see all problems as having code-based solutions (-:), but as a general approach I think it has considerable merit because it gives you a consistent mechanism to address a number of problems.
