Re-Running Database Development Scripts - sql-server

In our current database development evironment we have automated build procceses check all the sql code out of svn create database scripts and apply them to the various development/qa databases.
This is all well and good, and is a tremdous improvement over what we did in the past, but we have a problem with rerunning scripts. Obviously this isn't a problem with some scripts like altering procedures, because you can run them over and over without adversly affecting the system. Right now to add metadata and run statements like create/alter table statements we add code to check and see if the objects exists, and if they do, don't run them.
Our problem is that we really only get one shot to run the script, because once the script has been run, the objects are in the environment and system won't run the script again. If something needs to change once it's been deployed, we have a difficult process of running update scripts agaist the update scripts and hoping that everything falls in the correct order and all of the PKs line up between the environments (the databases are, shall we say, "special").
Short of dropping the database and starting the process from scratch (the last most current release), does anyone have a more elegant solution to this?

I'm not sure how best to approach the problem in your specific environment, but I'd suggest reading up on Rail's migrations feature for some inspiration on how to get started.
http://wiki.rubyonrails.org/rails/pages/UnderstandingMigrations

We address this - or at least a similar problem to this - as follows:
The schema has a version number - this is represented by a table which has one row per version which, as well as the version number, carries boring things like a date/time stamp for when that version came into existence.
By having the schema create/modify DDL wrapped in code that performs the changes for us.
In the context above one would build the schema change code as part of the build process then run it and it would only apply schema changes that haven't already been applied.
In our experience (which is bound not to be representative) in most cases the schema changes are sufficiently small/fast that they can safely be run in a transaction which means that if it fails we get a rollback and the db is "safe" - although one would always recommend taking backups before applying schema updates if practicable.
I evolved this out of nasty painful experience. Its not a perfect system (or an original idea) but as a result of working this way we have a high degree of confidence that if there are two instances of one of our databases with the same version that then the schema for those two databases will be the same in almost all respects and that we can safely bring any db up to the current schema for that application without ill effects. (That last isn't 100% true unfortunately - there's always an exception - but its not too far from the truth!)

Do you keep your existing data in the database? If not, you may want to look at something similar to what Matt mentioned for .NET called RikMigrations
http://www.rikware.com/RikMigrations.html
I use that on my projects to update my database on the fly, while keeping track of revisions. Also, it makes it very simple to move database schema to different servers, etc.

if you want to have re-runnability in your scripts, then you can't have them as definitions... what I mean by this is that you need to focus on change scripts rather than here is my Table script.
let's say you have a table Customers:
create table Customers (
id int identity(1,1) primary key,
first_name varchar(255) not null,
last_name varchar(255) not null
)
and later you want to add a status column. Don't modify your original table script, that one has already run (and can have the if(! exists) syntax to prevent it from causing errors while running again).
Instead, have a new script, called add_customer_status.sql
in this script you'll have something like:
alter table Customers
add column status varchar(50) null
update Customers set status = 'Silver' where status is null
alter table Customers
alter column status varchar(50) not null
Again you can wrap this with an if(! exists) block to allow re-running, but here we've leveraged the notion that this is a change script, and we adapt the database accordingly. If there is data already in the customers table then we're still okay, since we add the column, seed it with data, then add the not null constraint.
Both of the migration frameworks mentioned above are good, I've also had excellent experience with MigratorDotNet.

Scott named a couple of other SQL tools that address the problem of change management. But I'm still rolling my own.
I would like to second this question, and add my puzzlement that there is still no free, community-based tool for this problem. Obviously, scripts are not a satisfactory way to maintain a database schema; neither are instances. So, why don't we keep metadata in a separate (and while we're at it, platform-neutral) format?
That's what I'm doing now. My master database schema is a version-controlled XML file, created initially from a simple web service. A simple javascript program compares instances against it, and a simple XSL transform yields the CREATE or ALTER statements. It has limits, like RikMigrations; for instance it doesn't always sequence inter-depdendent objects correctly. (But guess what — neither does Microsoft's SQL Server Database Publication tool.) Really, it's too simple. I simply didn't include objects (roles, users, etc.) that I wasn't using.
So, my view is that this problem is indeed inadequately addressed, and that sooner or later we'll have to get together and tackle the devilish details.

We went the 'drop and recreate the schema' route. We had some classes in our JUnit test package which parameterized the scripts to create all the objects in the schema for the developer executing the code. This allowed all the developers to share one test database and everyone could simultaneously create/test/drop their test tables without conflicts.
Did it take a long time to run? Yes. At first we used the setup method for this which meant the tables were dropped/created for every test and that took way too long. Then we created a TestSuite which could be run once before all the tests for a class and then cleaned up when all the class tests were complete. This still meant that the db setup ran many times when we ran our 'AllTests' class which included all the tests in all our packages. How I solved it was adding a semaphore to the OracleTestSuite code so when the first test requested the database to be setup it would do that but any subsequent call would just increment a counter. As each tearDown() method was called, the counter would decrement the counter until it reached 0 and the OracleTestSuite code would drop everything. One issue this leaves is whether the tests assume that the database is empty. It can be convenient to let database tests know the order in which they run so they can take advantage of the state of the database because it can reduce the duplication of DB setup.
We used the concept of ObjectMothers to solve a similar problem with creating complex domain objects for testing purposes. Mock objects might be a better answer but we hadn't heard about them at the time. After all this time, I'd recommend creating test helper methods that could create standardized datasets for the typical scenarios. Plus that would help document the important edge cases from a data perspective.

Related

How to properly handle database data changes with laravel migrations and seeds

I have my migration files, which creates the initial schema for the database, and, I have my seed files, which populates the initial schema, with random data, and set types.
My understanding of the migrations and seeds is that, whenever a new team member joins, he can just run them and be up to speed with all db changes and the require data for the product to work, plus, you can apply changes to stg and prod by running migration files.
But, as my project has advanced, new types of data has been emerging and the only way to apply them was to create a migration that will run inserts into the database.
The problem that I have is that the way artisan works with migrations and seeds is that it first runs all migrations, and then, it runs all the seeds, and there does not seems to be a way for me to specify the order in which they should be run. So, if I run a migrate:refresh --seed I get into errors because it applies the latest migrations, which inserts new data (that might or might not, depend on types inserted in the seeds) before the seed has inserted his own data.
One solution we tried is to update the seeds as well, and check before applying changes in migrations that inserts data, but this has become very troublesome to maintain.
What is the expected use of the migrations and seeds for this scenario?
Update
To try to make it more clear:
Lets say I have a migration to create users:
Users:{id, name, type}
And I have my seed to create users.
I run both, and I have a users table with a bunch of users.
Time goes by, and we decide that we need a table for user_types.
A migration is created, that will create the new table, and populate the data of the new user types and updates to match the current user.type to user.type_id.
Devs runs the migrations and they have theyre db all up to date.
A new dev joins the team. He runs the migrations. And then the seeds.
It breaks.
Now, if we update the seeds to match the latest, we will run into duplicated data for the user_types table. To avoid this we will need to have some sort of defensive code in the migration to not run things if there is no data, and update if there it is.
The question is, is this the right way of using the migrations? How do you push a data change to all devs, withouth having to re run the seeds?
I think migrations should not depend on any data in seeds. And seeds should expect the latest migration schema. So, whenever your schema changes, you should update seeds accordingly and do a migrate:fresh --seed. At least, that's what I'm doing.
As you are already doing that, I wonder what makes it a "troublesome process" for you. Can you elaborate?
From your explanation, I think I understand.
The first thing your seeders should do is delete everything so you always start with an empty database, at least where the seeders should be concerned.
I'd also take a closer look at the docs https://laravel.com/docs/5.2/seeding specifically the section "Using Model Factories".
public function run()
{
factory(App\User::class, 50)->create()->each(function($u) {
$u->posts()->save(factory(App\Post::class)->make());
});
}
In this case, you'd do something similar except you'd start at the top which for you is user_types, then for each type it creates, it would generate however many users you need. This seems a bit different from your current process where it sounds like you have a seeder file for each table.
This way, you have the parent/child relationships handled and made very clear in the code and it's very easy to add additional items in the future. Additionally, because you are essentially starting from scratch each time the seeders run, you can be sure this will work for new and existing devs.

How to handle multiple db alter scripts coming from different Git feature branches?

A bit complex to describe, but I'll do my best. Basically we're using the Git workflow, meaning we have the following branches:
production, which is the live branch. Everything is production is running in the live web environment.
integration, in which all new functionality is integrated. This branch is merged to production every week.
one or more feature branches, in which developers or development teams develop new functionality. After this is done, developers merge their feature branch to integration.
So, nothing really complex here. But, since our application is a web application running against a MySQL database, new functionality often requires changes to the database scheme. To automate this, we're using dbdeploy, which allows us to create alter scripts, given a number. E.g. 00001.sql, 00002.sql, etc. Upon merging to the integration branch, dbdeploy will check which alter scripts have a higher number than the latest executed one on that specific database, and will execute those.
Now assume the following.
- integration has alter scripts up until 00200.sql. All of these are executed on the integration database.
- developer John has a feature branch featureX, which was created when integration still had 00199.sql as the highest alter script.
John creates 00200.sql because of some required db schema changes.
Now, at some point John will merge his modifications back to the integration branch. John will get a merge conflict and will see that his 00200.sql already exists in integration. This means he needs to open the conflicting file, extract his contents, reset that file back to 'mine' (the original state as in integration) and put his own contents in a new file.
Now, since we're working with ten developers, we get this situation daily. And while we do understand the reasons behind this, it's sometimes very cumbersome. John renames his script, does a merge commit to integration, pushes the changes to the upstream only to see that somebody else already created a 00201.sql, requiring John to do the proces again.
Surely there must be more teams using the Git workflow and using a database change management tool for automating database schema changes?
So, in short, my questions are:
How to automate database schema changes, when working on different feature branches, that operate on different instances of the same db?
How to prevent merge conflicts all the time, while still having the option to have a fixed order in the executed alter scripts? E.g. 00199.sql must be executed before 00200.sql, because 00200.sql might be depending on something done in 00199.sql.
Any other tips are most welcome ofcourse.
Rails used to do this, with exactly the problems you describe. They changed to the following scheme: the files (rails calls them migrations) are labelled with a utc timestamp of when the file was created, eg
20140723069701_add_foo_to_bar
(The second part of the name doesn't contribute to the ordering).
Rails records the timestamps of all the migrations that have been run. When you ask it to run pending migrations it selects all the migration files whose timestamp isn't in the list of already run migrations and runs them in numerical order.
You'll no longer get merge conflicts unless two people create one at exactly the same point in time.
Files still get executed in the order you wrote them, but possibly interleaved with someone else's work. In theory you can still have problems - eg developer a decides to rename a table that I had decided to add a column too. That is much less common than 2 developers both making any changes to the db and you would have problems even not considering the schema changes presumably I have just written code that queries a no longer existant table - at some point developers working on related stuff will have to talk to each other!
A few suggestions:
1 - have a look at Liquibase, each version gets a file that references the changes that need to happen, then the change files can be named using a meaningful string rather than by number.
2 - have a central location for getting the next available number, then people use the latest number.
I've used Liquibase in the past, pretty successfully, and we didn't have the problem you describe.
As Frederick Cheung suggested, use timestamps rather than a serial number. Applying schema changes by order of datestamp should work, because schema changes can only depend on changes of a prior date.
In addition, include the name of the developer in the name of the alter script. This will prevent merge conflicts 100%.
Your merge hook should just look for newly added alter scripts (present in the merged branch but not in the upstream branch) and execute them by order of timestamp.
I've used two different approaches to overcome your problem in the past.
The first is to use a n ORM which can handle the schema updates.
The other approach is to create a script, which incrementally builds the database schema. This way if a developer needs to an additional row in a table, he should add the appropriate sql statement after the table is create. Likewise if he needs a new table, he should add the sql statement for that. Then merging becomes a question of making sure things happen in the correct order. This is basically what the database update process in an ORM does. Such a script needs to be coded very defensively, and each statement should check if its perquisites exists.
For the dbvc commandline tool, I use git log to determine the order of the update scripts.
git log -c --no-merges --pretty="format:" --name-status -p dev/db/updates/ | \
grep '^A' | awk '{print $2}' | tac
In this case the way the order of your commits will determine the sequence in which the updates are run. Which is most likely what you want.
If you run git merge b, the updates from master will be run first and than from B.
If you run git rebase b, the update from B will run first and than from master.

Does hbm2ddl.auto=update not honor different DB users, maybe?

We are encountering strange hibernate behavior with hbm2ddl.auto set to update.
In our test setup, we have two database users, one containing the tables for our beta application, the other one is mainly used for development. I.e. same table names with different users.
When new tables are to be created, we do so by using hbm2ddl.auto=update.
Now suddenly the strange behavior is: the update process looks for existing tables with the wrong user and creates those not found with the right user.
E.g. if the following tables exists
USER_A.TABLE_1
USER_B.TABLE_2
and we update with three tables configured: TABLE_1, TABLE_2, TABLE_3 using USER_B, we end up with
USER_A.TABLE_1
USER_B.TABLE_2
USER_B.TABLE_3
TABLE_1 is not created for USER_B. After renaming USER_A.TABLE_1 to USER_A.TABLE_0 and updating again we end up with the expected result:
USER_A.TABLE_0
USER_B.TABLE_1
USER_B.TABLE_2
USER_B.TABLE_3
Does this make any sense to anyone? Is there something like an internal hibernate cache remembering like "Hey I have already created this table on this server (and I do not care about the user)".
We have spent quite some testing to reassure this is not a configuration problem, reproduced this on different machines, different configurations, from ant or using the IDE, making sure USER_A's password cannot be found anywhere in the build directory etc. So we are 100% sure, the behavior is as described - but we are completely out of ideas what happens.
I'd be very happy to hear your ideas about this, since this problem is nagging for some time now.
Thanks a lot,
Peter
Is there something like an internal hibernate cache remembering like "Hey I have already created this table on this server (and I do not care about the user)".
No. What is probably happening is that USER_A can see the tables created under USER_B account, and vice-versa. It's not clear which database you are using, but I would try to configure Hibernate to use two different schemas, in addition to use just different users. You may also want to try to set the property "hibernate.default_schema", but I'm not sure that this only will solve your problem.

How should I rename many Stored Procedures without breaking stuff?

My database has had several successive maintainers over the years and any naming guidelines that may have once been in place have been ignored.
I'd like to rename the stored procedures to a consistent format. Obviously I can rename them from within SQL Server Management Studio, but this will not then update the calls made in the website code behind (C#/ASP.NET).
Is there anything I can do to ensure all calls get updated to the new names, short of searching for every single old procedure name in the code? Does Visual Studio have the ability to refactor such stored procedure names?
NB I do not believe my question to be a duplicate of this question as the latter is solely about renaming within the database.
You could make the change in stages:
Copy of the stored procedures to the new stored procedures under their new name.
Alter the old stored procedures to call the new ones.
Add logging to the old stored procedures when you've changed all the code in the website.
After a while when you're not seeing any calls to the old stored procedures and you're happy you've found all the calls in the web site, you can remove the old stored procedures and logging.
You can move the 'guts' of the SPROC to a new SPROC meeting your new naming conventions, and then leave the original sproc as a shell / wrapper which delegates to the new SPROC.
You can also add an 'audit' table to track when the old wrapper SPROC is called - this way you will know that there are no dependencies on the old SPROC, and the old SPROC can be safely dropped (also, make sure that it isn't just 'your app' using the DB - e.g. cross database joins or other apps)
This has a small performance penalty, and won't really buy you that much (other than being able to 'find' your new SPROCs easier)
You will need to handle this in at least two areas, the application and the database. There could be other areas as well, and you have to be careful not to overlook them.
The Application
A Nice Practice for Future Projects
It helps to abstract your sprocs out. In our apps, we wrap all of our sprocs in a giant class, I can make calls like this:
Dim SomeData as DataTable = Sprocs.sproc_GetSomeData(5)
That way, the code end is nice and encapsulated. I can go into Sprocs.sproc_GetSomeData and tweak the sproc name in just one place, and of course I can right click on the method and do a symbolic rename to fix the method call solution-wide.
Without the Abstraction
Without that abstraction, you can just do Find In Files (Cntl+Shift+F) for the sproc name and then if the results looks right, open the files up and Find/Replace all the occurances.
The Sql Server
Don't Trust View Dependencies
On the SQL server end, theoretically in MSSMS 2008 you can right click on a sproc and select View Dependencies.
That should show you a list of all the places where the sproc is used in the database, however my confidence in this feature is very low. It might be better in SQL 2008, but in previous versions it definitely had problems.
View Dependencies hurt me, and it will take time for that to heal. :)
Wrap It!
You end up having to keep the old sproc around for awhile. This is the major reason why renaming sprocs is a such a project - it can take a month to finally be done with it.
First replace its contents with some simple TSQL that calls the the new sproc with the same parameters, and write some logging so that once some time goes by, you can tell if the old sproc is actually unused.
Finally, when you're sure the old sproc is unused, delete it.
Other Areas?
There could be a lot of other areas as well. Reporting Services springs to mind. SSIS packages. Using the technique of keeping the old sproc around and re-routing to the new one (mentioned above) will help you know if you missed anything, however it won't tell you what you missed. This can lead to much pain!
Good luck!
Short of testing every path in your application to ensure that any calls to the database and the relevant stored procedures have been updated... no.
Use global search and replace (but review each suggested replacement) to try to avoid missing any instances. If you app is well structured then there really should only be 1 place each stored proc is called.
As far as changing your application, I have all my stored procs as settings in the web.config file, so all the names are in one place and can be changed at any time to match changes to the database.
When the application needs to call a stored proc, the name is determined from web.config.
This makes it easier to manage all the potential calls which the application could make to the database services layer.
It will be a bit of a tedious search through your source code and other database objects I'm afraid.
Don't forget SSIS Packages, SQL Agent Jobs, Reporting Services rdl as well as your main application code.
You could use a regular expression like spProc1|spProc2 to search in the source code for all object names at the same time if you have a tool that supports searching through files using regular expressions (I have used RegexBuddy for this in the past)
If you want to just cover the possibility you might have missed the odd one you could leave all the previous stored procedures behind for a month and just have them log a custom SQL trace event with APP_NAME(), SUSER_NAME() and any other info you find helpful then have it call the renamed version. Then set up a trace monitoring this event.
If you use a connection to DB, stored procedures etc, you should create a service class to delegate these methods.
This way when something in your database, SP etc changes, you only have to update your service class, and everything is protected from breaking.
There are tools for VS that can manage changing a name, like refactor, and resharper
I did this and I relied heavily on global search in my source code for stored procedure names and SQL digger to find sql procs that called sql proces.
http://www.sqldigger.com/
SQL Server (as of SQL 2000) poorly understands it own dependencies, so one is left searching the text of the scripts to find dependencies, which could be other stored procs or substrings of dynamic sql.
I would obtain a list of references to a procedure by using the following, because SSMS dependencies doesn't pickup dynamic SQL references or references outside the database.
SELECT OBJECT_NAME(m.object_id), m.*
FROM SYS.SQL_MODULES m
WHERE m.definition LIKE N'%my_sproc_name%'
The SQL needs to be run in every database where there could be references.
syscomments and INFORMATION_SCHEMA.routines have nvarchar(4000) columns. So if "mySprocName" is used at position 3998, it won't be found. syscomments does have multiple lines but ROUTINES truncates. Should you disagree, take it up with gbn.
Based on that list of dependencies, I'd create new stored procedures starting the foundation stored procedures - those with the least dependencies. But I'd mind not to create stored procedures, prefixing the name with "sp_"
Verify the foundation procedures work identically to existing ones
Move to the next level of stored procedures - repeat steps 1-3 as needed till the highest level procedure has been processed.
Test the switch over the application uses to the new procedure - don't wait until the all the procedures are updated to test interaction with the application code. This doesn't need to be done for every stored procedure, but waiting to do this wholesale isn't a great approach either.
Developing in parallel has it's risks too:
Any changes to existing code needs to also be applied to the new code. If possible, work in areas where development is frozen or use a bug fix as an opportunity to migrate to new code rather than apply the patch in two places (while also minimizing downtime for transition).
Use a utility like FileSeek to search the contents inside each and every file in your project folder. Don't trust the windows search - it's slow and user-unfriendly.
So if you had a Stored Procedure named OldSprocOne and want to rename it to SP_NewONe, search all occurrences Of OldSprocOne then search all occurrences of OldSprocOne to see if that name isn't already being used somewhere else and won't cause problems. Then rename each and every occurrence in the code.
This can be very time consuming and repetitive for larger systems.
I would be more concerned about ignoring the names of the procedures and replacing your legacy DAL with Enterprise Library Data Access Block 5
Database Accessors in Enterprise Library 5 DAAB - Database.ExecuteSprocAccessor
Having code that is like
public Contact FetchById(int id)
{
return _database.ExecuteSprocAccessor<Contact>
("FetchContactById", id).SingleOrDefault();
}
Will have atleast a billion times more value than having stored procs with consistent names, especially if the current code passes around DataTables or DataSets ::shudders::
I'me all in favor of refactoring any sort of code.
What you really need here is a method slowly and incrementally renaming your stored procs.
I certainly would not do a global find and replace.
Rather, as you identify small pieces of functionality and understand the relationships between the procs, you can re-factor in small pieces.
Fundamental to this process, though, is source-code control of your database.
If you do not manage changes to your database the same as normal code, you will be in serious trouble.
Have a look at DBSourceTools. http://dbsourcetools.codeplex.com
It's specifically designed to help developers get their databases under source code control.
You need a repeatable method of restoring your database to a specific state - prior to refactoring.
Then re-apply your refactored changes in a controlled way.
Once you have embraced this mindset, this mammoth and error-prone task will become simple.
This is assuming that you use SQL Server 2005 or above. An option that I have used before is to rename the old database object and create a SQL Server Synonym with the old name. This will allow for you to update your objects to whatever convention you choose and replace the refrences in code, SSIS packages, etc... as you come along them. Then you can concentrate updating the references in your code gradually over however maintenance releases you choose (as opposed to breaking them all at once). As you feel that you've found all references you can remove the synonym as the code goes to QA.

What are the best practices for database scripts under code control

We are currently reviewing how we store our database scripts (tables, procs, functions, views, data fixes) in subversion and I was wondering if there is any consensus as to what is the best approach?
Some of the factors we'd need to consider include:
Should we checkin 'Create' scripts or checkin incremental changes with 'Alter' scripts
How do we keep track of the state of the database for a given release
It should be easy to build a database from scratch for any given release version
Should a table exist in the database listing the scripts that have run against it, or the version of the database etc.
Obviously it's a pretty open ended question, so I'm keen to hear what people's experience has taught them.
After a few iterations, the approach we took was roughly like this:
One file per table and per stored procedure. Also separate files for other things like setting up database users, populating look-up tables with their data.
The file for a table starts with the CREATE command and a succession of ALTER commands added as the schema evolves. Each of these commands is bracketed in tests for whether the table or column already exists. This means each script can be run in an up-to-date database and won't change anything. It also means that for any old database, the script updates it to the latest schema. And for an empty database the CREATE script creates the table and the ALTER scripts are all skipped.
We also have a program (written in Python) that scans the directory full of scripts and assembles them in to one big script. It parses the SQL just enough to deduce dependencies between tables (based on foreign-key references) and order them appropriately. The result is a monster SQL script that gets the database up to spec in one go. The script-assembling program also calculates the MD5 hash of the input files, and uses that to update a version number that is written in to a special table in the last script in the list.
Barring accidents, the result is that the database script for a give version of the source code creates the schema this code was designed to interoperate with. It also means that there is a single (somewhat large) SQL script to give to the customer to build new databases or update existing ones. (This was important in this case because there would be many instances of the database, one for each of their customers.)
There is an interesting article at this link:
https://blog.codinghorror.com/get-your-database-under-version-control/
It advocates a baseline 'create' script followed by checking in 'alter' scripts and keeping a version table in the database.
The upgrade script option
Store each change in the database as a separate sql script. Store each group of changes in a numbered folder. Use a script to apply changes a folder at a time and record in the database which folders have been applied.
Pros:
Fully automated, testable upgrade path
Cons:
Hard to see full history of each individual element
Have to build a new database from scratch, going through all the versions
I tend to check in the initial create script. I then have a DbVersion table in my database and my code uses that to upgrade the database on initial connection if necessary. For example, if my database is at version 1 and my code is at version 3, my code will apply the ALTER statements to bring it to version 2, then to version 3. I use a simple fallthrough switch statement for this.
This has the advantage that when you deploy a new version of your application, it will automatically upgrade old databases and you never have to worry about the database being out of sync with the software. It also maintains a very visible change history.
This isn't a good idea for all software, but variations can be applied.
You could get some hints by reading how this is done with Ruby On Rails' migrations.
The best way to understand this is probably to just try it out yourself, and then inspecting the database manually.
Answers to each of your factors:
Store CREATE scripts. If you want to checkout version x.y.z then it'd be nice to simply run your create script to setup the database immediately. You could add ALTER scripts as well to go from the previous version to the next (e.g., you commit version 3 which contains a version 3 CREATE script and a version 2 → 3 alter script).
See the Rails migration solution. Basically they keep the table version number in the database, so you always know.
Use CREATE scripts.
Using version numbers would probably be the most generic solution — script names and paths can change over time.
My two cents!
We create a branch in Subversion and all of the database changes for the next release are scripted out and checked in. All scripts are repeatable so you can run them multiple times without error.
We also link the change scripts to issue items or bug ids so we can hold back a change set if needed. We then have an automated build process that looks at the issue items we are releasing and pulls the change scripts from Subversion and creates a single SQL script file with all of the changes sorted appropriately.
This single file is then used to promote the changes to the Test, QA and Production environments. The automated build process also creates database entries documenting the version (branch plus build id.) We think this is the best approach with enterprise developers. More details on how we do this can be found HERE
The create script option:
Use create scripts that will build you the latest version of the database from scratch, which is empty except the default lookup data.
Use standard version control techniques to store,branch,tag versions and view histories of your objects.
When upgrading a live database (where you don't want to loose data), create a blank second copy of the database at the new version and use a tool like red-gate's link text
Pros:
Changes to files are tracked in a standard source-code like manner
Cons:
Reliance on manual use of a 3rd party tool to do actual upgrades (no/little automation)
Our company checks them in simply because someone decided to put it in some SOX document that we do. It makes no sense to me at all, except possible as a reference document. I can't see a time we'd pull them out and try and use them again, and if we did we'd have to know which one ran first and which one to run after which. Backing up the database is much more important then keeping the Alter scripts.
for every release we need to give one update.sql file which contains all the new table scripts, alter statements, new/modified packages,roles,etc. This file is used to upgrade the database from 1 version to 2.
What ever we include in update.sql file above one all this statements need to go to individual respective files. like alter statement has to go to table as a new column (table script has to be modifed not Alter statement is added after create table script in the file) in the same way new tables, roles etc.
So whenever if user wants to upgrade he will use the first update.sql file to upgrade.
If he want to build from scrach then he will use the build.sql which already having all the above statements, it makes the database in sync.
sriRamulu
Sriramis4u#yahoo.com
In my case, I build a SH script for this work: https://github.com/reduardo7/db-version-updater
How is an open question
In my case I am trying to create something simple that is easy to use for developers and I do it under the following scheme
Things I tested:
File-based script handling in git using GitlabCI
It does not work, collisions are created and the Administration part has to be done by hand in case of disaster and the development part is too complicated
Use of permissions and access via mysql clients
There is no traceability on changes to the database and the transition to production is manual
Use of programs mentioned here
They require uploading the structures and many adaptations and usually you end up with change control just like the word
Repository usage
Could not control the DRP part
I could not properly control the backups
I don't think it is a good idea to have the backups on the same server and you generate high lasgs for the process
This was what worked best
Manage permissions per user and generate traceability of everything that is sent to the database
Multi platform
Use of development-Production-QA database
Always support before each modification
Manage an open repository for change control
Multi-server
Deactivate / Activate access to the web page or App through Endpoints
the initial project is in:
In case the comment manager reads this part, I understand the self-promotion but please just remove this part and leave the rest since I think it complies with the answer to the question reacted in the post ...
https://hub.docker.com/r/arelis/gitdb
I hope this reaches you since I see that several
There is an interesting article with new URL at: https://blog.codinghorror.com/get-your-database-under-version-control/
It a bit old but the concepts are still there. Good Read!

Resources