Database - Version Control - Managing dropped/deleted objects - sql-server

We want to clean up our database schema and drop/delete objects which are no longer being used.
We suspect that sometime in the future we'll want to resurrect the removed functionality.
We've discussed the following options for dealing with dropped objects in version control:
Deleting the .sql files from source control once they are gone from the database and relying on the version history to store the definitions. Our concern with this approach is that sometime over the years source control will be moved and we will lose the history. It also seems difficult to know what to look for to recover if we can't see all the dropped objects.
Leaving the .sql files in source control but updating the definitions to "drop proc {someproc}". With this approach we our concerned about leaving the objects in version control which no longer exists and also the risk to losing the history if the vcs was moved
Creating a new repo for dropped objects and migrating .sql files to this repo once they have been dropped from SQL Server.
We're working in a windows environment and are fairly new to working with VCS for databases. Currently GIT + SSDT.
Currently option 3 is our preferred approach.

I see this a lot with database code, what happens is over time people end up with stuff in the database that is either not used or just does not work (think a proc that references a table and the table is modified but not the proc).
The thing to do is to get everything in source control (which it looks like you have) and then create a tag or branch of all the code before and after deleting it so you can get it back.
Two things normally transpire, either the code was genuinely never used or it was used at year end and when you find out, the world is about to fall on your head so better have a quick way to get it back.
Of course if you had a full suite of tests then even the year end process would be safe :)
I personally wouldn't use option 3, I would just keep the history in the main branch so you keep the history with it.
ed

There are a lot of good tools for versioning database changes: you have a big chance to get this question closed with "Too broad" reason, but I'll try to suggest to
Read about, understand and try to add Liquibase to your Development-Toolbox
Adopt your workflow for using this additional layer - technically it will be one more file (changelog in terms of Liquibase) in changesets, where you changing DD and|or data.
These changelogs provide good and smooth way of moving back and forth in linear history of changes in databases, not so good (or I don't know The Right Way) for direct jumping between nodes of diverged history, but it seems not your case
From your options-list it will be more p.1, than others (but it's storing changes in database in version-contol, not states)

Just to note another option, in SSDT you can mark the file property as Build Action = None. The file won't be included in the dacpac when this build option is selected. But I tend to agree with the idea that you should rely on your VCS to handle history.

Related

Is it possible to use Git as source control for code stored in a database?

I work on Labware LIMS, which has both configuration, and customization via its own programming language and internal code editor, and stores this customization code in database records. (Note, not the source code of the actual application itself, just the customization code a.k.a. LIMS Basic.) Almost everything in LIMS is stored in the database.
We want to investigate the possibility of using source control to protect this code but we don't know much more than the theory of using something like Git. (I have worked as a junior QA and used git but not as a dev and my knowledge is limited!)
Of particular use would be the merging tools, as currently we have to manually merge code in a text editor, if we even notice there is a conflict (checking content between dev and live is time consuming and involves using multiple tools, some of which are 3rd party tools we have developed ourselves, which are hit and miss. I personally find it easiest to cut and paste into a text file and then use Beyond Compare.
There is no notification that the code is different when moving it from dev to live (no deployment as such, you just import an xml file) so we often have things going live that someone was working on unbeknownst to each other. I.e. dev 1 is working on the code in object 1, dev 2 gets a ticket to make a change to object 1, does so and puts their change Live, whatever dev 1 was doing is now also Live in whatever state it was in. (Because we don't always have time to thoroughly check what state each object is in between up to 3 different databases.)
Is it possible to use source control just on the code within the database, but not necessarily the database itself? (We have backups and such for that but its easy for some aspects of the system to get overwritten by multiple devs working on overlapping areas at the same time.)
If anyone reading this has any specific knowledge of LW LIMS, we are referring to the Subroutines mostly, we have versioned Analyses which stands in for source control for the moment and is somewhat effective but no way to control who is doing what on the subroutines other than a comment log at the top. I have tried to find any information on how other teams source control their code in LIMS but to no avail.
The structure of one of these tables can range from as simple as the code just existing in one field as a straight text dump with a few other fields such as changed_on, changed_by and name (Subroutines), or more complex with code relating to one record being sprinkled around in multiple rows on another table entirely (Analyses) but even if it could just deal with the simple scenario to start with that would be great!
TL;DR: Could the contents of the Code field in a database record be treated like a regular code object in other dev environments somehow and source controlled using Git? (And is anyone willing to explain it simply for me to follow?)
As you need to version control table fields of subroutine, but LW LIMS doesn’t have the IDE for version control (such as git, svn etc). So the direct answer is no.
If you really want to do version control for the codes in database, you can create a git repository and only put the codes in git repository. when a file has updated, you can commit & push the changes. And it’s easy to compare the difference between versions.
More detail about git, you can refer git book.
LabWare LIMS has a number of options for version control. You COULD version the Subroutine table by adding a SUBROUTINE.VERSION field to the table, this works the same way as other versioned tables in LabWare where it asks you if you would like to create a new version of the object before saving. There are a few customers I work with that have done this.
Alternatively, (and possibly our more recommended method prior to LEM) there is the Snapshot capability where the system automatically takes a "snapshot" of objects as they are saved - when viewing these you have the ability to view them side by side in a comparison dialogue - it will show < or > for lines which are different.
Another approach is, if you have auditing turned on you are able to view the audit history for changes to specific objects - this includes subroutines.
One other approach is to use configuration packages - this has the ability to record version AND build numbers. Though individual subroutines is probably a bit too granular for it's intended design.
Lastly, since this question was originally posted we have developed a product called LabWare Environment Manager (LEM) which has some good change control functionality built-in.
For more information on the suggestions above, please have a look at the LabWare Technical manual for the version you are on. We also have a mailing list for questions like this to be posted. You might find an answer there. If you have access to our Support webpage you're able to search previous questions that have been asked. I'd also suggest that you get in touch with your Account Manager at LabWare who can help you answer some of your questions.
HTH

Creating the Front End MDE

I created a database for tracking metrics, with some automation tricks (email, .doc,.ppt presentations, etc) with a very large Main-table, and lots of forms/GUI. This is the first time I have ever I worried about an MDE/front-end for the thing. So if you would be so kind to answer a few questions, or offer any advice, it would be greatly appreciated (I would hate for all this work to not be utilized).
What is the first thing I need to do? It the 2000 version that must be converted to 03 to create the MDE, but does that get done before I use the database splitter?
Will the amount of objects in the database effect the ability to do this? I have something like 80 forms, 70 queries, 20+ macros, 12 tables, etc...but does the amount of objects prevent some of this from working well once the front end is there?
when i split the database, can I continue to work/make changes and such on the "back end", and have those changes directly effect the front end?
These may be some basic questions, but I don't know the answer so.....Thanks!
Here is my 2 ¢.
Question 1 - I have never used the database splitter as I feel I have more control doing it manually. If you do it manually you can do it to a version that does not have a database splitter. But if you do use the splitter then--yes--you will have to upgrade to a version that has a splitter before doing it.
To do it manually here are the steps.
Backup everything.
Create a copy of your file into the same directory. So if you have an MyApp.MDB create a copy into the same directory with a new name, such as MyAppDATA.mdb.
Open the new DATA file (MyAppDATA.mdb) and delete all of the objects EXCEPT the TABLES.
Open the App file (MyApp.mdb) and delete all of the tables.
Also in MyApp.mdb...go to the File/Get External Data/Link Tables menu to link the tables in MyAppDATA.mdb to MyApp.mdb. Select All and create the links.
That should do it. And if you screw up you made a backup...right?
A couple of tips and gotchas...be sure that you go to Tools/Options and that you are NOT showing System and Hidden tables. You just don't want to delete system tables from MyApp. Another way to do it is do NOT delete tables that start with MSys or USys.
Question 2 - Does not matter how many object you have. In fact you don't have that many objects anyway.
Question 3 - Yes...you will make backend changes in MyAppData.mdb and when you open MyApp.mdb those changes will auto-magically be there to see and query against etc. (In the query designer you may need to save/close/reopen to see new fields if you made the mod while in the query). The EXCEPTION to that is New Tables You will have to use the File/Get External Data/Link Tables option to create links to new tables.
One thing to remember (and that I hope you already realize) is that the one downside of splitting the database is that when you deploy the front end file that usually the relative path to the data will vary from machine to machine and there is no automatic re-linking of tables in access. If your target clients have full access you can always use Tools/Database Utilities/Linked Table Manager to refresh the links to the right location. If you can't do that then you will have to do one of the following:
1. Write code that does the automatic re-linking for you. Basically it will check the links...if invalid it will prompt the user for the data location (or look it up in an INI file) and re-link the tables.
2. Always deploy your app to the same location on all machines. If you have commercial visions for your application this won't work...I mention it for academic reasons. It might be doable for a limited deployment where you have a lot of control over file placement on each machine.
3. Put the Data file (MyAppDATA.mdb) onto a network share and link the table across the network using a drive mapping or UNC (\myserver\mydata\ApplicationData\MyAppData.mdb). The latter is preferred but both of them run the same risks as number two.
Seth
PS This answer assumes Access 2003.
PPS If you have commercial visions for your application then the table linking has got to be REALLY robust.
PPPS I agree with the commenter that you may want to take the plunge and do SQL if it is in your skill set.
One thing that hasn't been discussed, and that's the issue of whether the compile to MDE could fail. Basically, if your code compiles in your front-end MDB, it will convert to an MDE. But I've noticed that lots of people never compile.
Some hints for keeping your VBA code in good shape:
in VBE options, turn off COMPILE ON DEMAND.
add the COMPILE button to your standard VBE toolbar and USE IT OFTEN.
periodically, backup your MDB and decompile/recompile it.
Also, remember that you must keep the MDB source, as the VBA code is not editable in an MDE and not recoverable by any good method.
EDIT:
Steps for a decompile:
backup your MDB.
start an instance of Access with the /decompile commandline argument. For, instance, I have a shortcut on my deskstop that has this as the target:
"C:\Program Files\Microsoft Office\OFFICE11\MSACCESS.EXE" /decompile
having opened that instance of Access, open the MDB you want to decompile. You will see nothing happen. DO NOTHING FURTHER IN THIS INSTANCE OF ACCESS -- close this instance of Access (the reason for this is that Michael Kaplan, who knows a thing or two about this, recommended that you never do any work in an Access instance opened with the decompile switch because he said there was no guarantee that the Access application code executed under those circumstances in a way that was fully safe for all kinds of Access work).
open the just-decompiled MDB holding down the shift key (you want to be sure that startup routines don't run because that would likely recompile the product before you've finished your cleanup) and compact the MDB (holding down the shift key again).
open the code editor and compile the project (DEBUG -> COMPILE [db name] for those who haven't step #2 in my original compiling instructions at the top of the post before the edit).
compact the MDB (doesn't matter if you bypass startup, since it's already fully compiled).
Why so many steps?
Because the purpose of the decompile is to get rid of the compiled p-code in order to start afresh from the canonical VBA code. Following the steps above insures that you have completely cleared the data pages storing the compiled code before you recompile. The reason for this is that without the compact step after the decompile, under some very rare circumstances, the code can behave strangely. I can't imagine that the old discarded p-code is being used again, but there's something about the pointers between the canonical code and the compiled code that apparently doesn't get completely flushed by a decompile without a compact.
This would be a comment to Seth's answer, but my rep isn't high enough to comment yet.
Seth did a great job answering your questions, I just wanted to add a bit more to part #1 about using the Database Splitter. The Database Splitter in the Tools menu works fine. Doing it manually is alright too, but it's a whole lot faster and easier to use the Database Splitter. I've used it a dozen times and never encountered any issues after using it.
http://www.databasedev.co.uk/split_a_database.html has a decent page about some of the pros, cons of splitting your database.
http://www.accessmvp.com/TWickerath/articles/multiuser.htm also has some good info when dealing with a split database in a multi-user environment.
Seth gave you a very good answer. But I'll add a few comments.
The number of objects only becomes relevant when you get close to about 1000 forms, reports and modules which have code. There's a limit about there. If you do get that message when trying to make an MDE then you almost certainly have a code error and need to compile to find the error
Another resource is "Splitting your app into a front end and back end Tips"
See the Auto FE Updater downloads page to make the process of distributing new FEs relatively painless.. The utility also supports Terminal Server/Citrix quite nicely.

How can I put a database under git (version control)?

I'm doing a web app, and I need to make a branch for some major changes, the thing is, these changes require changes to the database schema, so I'd like to put the entire database under git as well.
How do I do that? is there a specific folder that I can keep under a git repository? How do I know which one? How can I be sure that I'm putting the right folder?
I need to be sure, because these changes are not backward compatible; I can't afford to screw up.
The database in my case is PostgreSQL
Edit:
Someone suggested taking backups and putting the backup file under version control instead of the database. To be honest, I find that really hard to swallow.
There has to be a better way.
Update:
OK, so there' no better way, but I'm still not quite convinced, so I will change the question a bit:
I'd like to put the entire database under version control, what database engine can I use so that I can put the actual database under version control instead of its dump?
Would sqlite be git-friendly?
Since this is only the development environment, I can choose whatever database I want.
Edit2:
What I really want is not to track my development history, but to be able to switch from my "new radical changes" branch to the "current stable branch" and be able for instance to fix some bugs/issues, etc, with the current stable branch. Such that when I switch branches, the database auto-magically becomes compatible with the branch I'm currently on.
I don't really care much about the actual data.
Take a database dump, and version control that instead. This way it is a flat text file.
Personally I suggest that you keep both a data dump, and a schema dump. This way using diff it becomes fairly easy to see what changed in the schema from revision to revision.
If you are making big changes, you should have a secondary database that you make the new schema changes to and not touch the old one since as you said you are making a branch.
I'm starting to think of a really simple solution, don't know why I didn't think of it before!!
Duplicate the database, (both the schema and the data).
In the branch for the new-major-changes, simply change the project configuration to use the new duplicate database.
This way I can switch branches without worrying about database schema changes.
EDIT:
By duplicate, I mean create another database with a different name (like my_db_2); not doing a dump or anything like that.
Use something like LiquiBase this lets you keep revision control of your Liquibase files. you can tag changes for production only, and have lb keep your DB up to date for either production or development, (or whatever scheme you want).
Irmin (branching + time travel)
Flur.ee (immutable + time travel + graph query)
XTDB (formerly called 'CruxDB') (time travel + query)
TerminusDB (immutable + branching + time travel + Graph Query!)
DoltDB (branching + time-travel + SQL query)
Quadrable (branching + remote state verification)
EdgeDB (no real time travel, but migrations derived by the compiler after schema changes)
Migra (diffing for Postgres schemas/data. Auto-generate migration scripts, auto-sync db state)
ImmuDB (immutable + time-travel)
I've come across this question, as I've got a similar problem, where something approximating a DB based Directory structure, stores 'files', and I need git to manage it. It's distributed, across a cloud, using replication, hence it's access point will be via MySQL.
The gist of the above answers, seem to similarly suggest an alternative solution to the problem asked, which kind of misses the point, of using Git to manage something in a Database, so I'll attempt to answer that question.
Git is a system, which in essence stores a database of deltas (differences), which can be reassembled, in order, to reproduce a context. The normal usage of git assumes that context is a filesystem, and those deltas are diff's in that file system, but really all git is, is a hierarchical database of deltas (hierarchical, because in most cases each delta is a commit with at least 1 parents, arranged in a tree).
As long as you can generate a delta, in theory, git can store it. The problem is normally git expects the context, on which it's generating delta's to be a file system, and similarly, when you checkout a point in the git hierarchy, it expects to generate a filesystem.
If you want to manage change, in a database, you have 2 discrete problems, and I would address them separately (if I were you). The first is schema, the second is data (although in your question, you state data isn't something you're concerned about). A problem I had in the past, was a Dev and Prod database, where Dev could take incremental changes to the schema, and those changes had to be documented in CVS, and propogated to live, along with additions to one of several 'static' tables. We did that by having a 3rd database, called Cruise, which contained only the static data. At any point the schema from Dev and Cruise could be compared, and we had a script to take the diff of those 2 files and produce an SQL file containing ALTER statements, to apply it. Similarly any new data, could be distilled to an SQL file containing INSERT commands. As long as fields and tables are only added, and never deleted, the process could automate generating the SQL statements to apply the delta.
The mechanism by which git generates deltas is diff and the mechanism by which it combines 1 or more deltas with a file, is called merge. If you can come up with a method for diffing and merging from a different context, git should work, but as has been discussed you may prefer a tool that does that for you. My first thought towards solving that is this https://git-scm.com/book/en/v2/Customizing-Git-Git-Configuration#External-Merge-and-Diff-Tools which details how to replace git's internal diff and merge tool. I'll update this answer, as I come up with a better solution to the problem, but in my case I expect to only have to manage data changes, in-so-far-as a DB based filestore may change, so my solution may not be exactly what you need.
There is a great project called Migrations under Doctrine that built just for this purpose.
Its still in alpha state and built for php.
http://docs.doctrine-project.org/projects/doctrine-migrations/en/latest/index.html
Take a look at RedGate SQL Source Control.
http://www.red-gate.com/products/sql-development/sql-source-control/
This tool is a SQL Server Management Studio snap-in which will allow you to place your database under Source Control with Git.
It's a bit pricey at $495 per user, but there is a 28 day free trial available.
NOTE
I am not affiliated with RedGate in any way whatsoever.
I've released a tool for sqlite that does what you're asking for. It uses a custom diff driver leveraging the sqlite projects tool 'sqldiff', UUIDs as primary keys, and leaves off the sqlite rowid. It is still in alpha so feedback is appreciated.
Postgres and mysql are trickier, as the binary data is kept in multiple files and may not even be valid if you were able to snapshot it.
https://github.com/cannadayr/git-sqlite
I want to make something similar, add my database changes to my version control system.
I am going to follow the ideas in this post from Vladimir Khorikov "Database versioning best practices". In summary i will
store both its schema and the reference data in a source control system.
for every modification we will create a separate SQL script with the changes
In case it helps!
You can't do it without atomicity, and you can't get atomicity without either using pg_dump or a snapshotting filesystem.
My postgres instance is on zfs, which I snapshot occasionally. It's approximately instant and consistent.
I think X-Istence is on the right track, but there are a few more improvements you can make to this strategy. First, use:
$pg_dump --schema ...
to dump the tables, sequences, etc and place this file under version control. You'll use this to separate the compatibility changes between your branches.
Next, perform a data dump for the set of tables that contain configuration required for your application to operate (should probably skip user data, etc), like form defaults and other data non-user modifiable data. You can do this selectively by using:
$pg_dump --table=.. <or> --exclude-table=..
This is a good idea because the repo can get really clunky when your database gets to 100Mb+ when doing a full data dump. A better idea is to back up a more minimal set of data that you require to test your app. If your default data is very large though, this may still cause problems though.
If you absolutely need to place full backups in the repo, consider doing it in a branch outside of your source tree. An external backup system with some reference to the matching svn rev is likely best for this though.
Also, I suggest using text format dumps over binary for revision purposes (for the schema at least) since these are easier to diff. You can always compress these to save space prior to checking in.
Finally, have a look at the postgres backup documentation if you haven't already. The way you're commenting on backing up 'the database' rather than a dump makes me wonder if you're thinking of file system based backups (see section 23.2 for caveats).
What you want, in spirit, is perhaps something like Post Facto, which stores versions of a database in a database. Check this presentation.
The project apparently never really went anywhere, so it probably won't help you immediately, but it's an interesting concept. I fear that doing this properly would be very difficult, because even version 1 would have to get all the details right in order to have people trust their work to it.
This question is pretty much answered but I would like to complement X-Istence's and Dana the Sane's answer with a small suggestion.
If you need revision control with some degree of granularity, say daily, you could couple the text dump of both the tables and the schema with a tool like rdiff-backup which does incremental backups. The advantage is that instead of storing snapshots of daily backups, you simply store the differences from the previous day.
With this you have both the advantage of revision control and you don't waste too much space.
In any case, using git directly on big flat files which change very frequently is not a good solution. If your database becomes too big, git will start to have some problems managing the files.
Here is what i am trying to do in my projects:
separate data and schema and default data.
The database configuration is stored in configuration file that is not under version control (.gitignore)
The database defaults (for setting up new Projects) is a simple SQL file under version control.
For the database schema create a database schema dump under the version control.
The most common way is to have update scripts that contains SQL Statements, (ALTER Table.. or UPDATE). You also need to have a place in your database where you save the current version of you schema)
Take a look at other big open source database projects (piwik,or your favorite cms system), they all use updatescripts (1.sql,2.sql,3.sh,4.php.5.sql)
But this a very time intensive job, you have to create, and test the updatescripts and you need to run a common updatescript that compares the version and run all necessary update scripts.
So theoretically (and thats what i am looking for) you could
dumped the the database schema after each change (manually, conjob, git hooks (maybe before commit))
(and only in some very special cases create updatescripts)
After that in your common updatescript (run the normal updatescripts, for the special cases) and then compare the schemas (the dump and current database) and then automatically generate the nessesary ALTER Statements. There some tools that can do this already, but haven't found yet a good one.
What I do in my personal projects is, I store my whole database to dropbox and then point MAMP, WAMP workflow to use it right from there.. That way database is always up-to-date where ever I need to do some developing. But that's just for dev! Live sites is using own server for that off course! :)
Storing each level of database changes under git versioning control is like pushing your entire database with each commit and restoring your entire database with each pull.
If your database is so prone to crucial changes and you cannot afford to loose them, you can just update your pre_commit and post_merge hooks.
I did the same with one of my projects and you can find the directions here.
That's how I do it:
Since your have free choise about DB type use a filebased DB like e.g. firebird.
Create a template DB which has the schema that fits your actual branch and store it in your repository.
When executing your application programmatically create a copy of your template DB, store it somewhere else and just work with that copy.
This way you can put your DB schema under version control without the data. And if you change your schema you just have to change the template DB
We used to run a social website, on a standard LAMP configuration. We had a Live server, Test server, and Development server, as well as the local developers machines. All were managed using GIT.
On each machine, we had the PHP files, but also the MySQL service, and a folder with Images that users would upload. The Live server grew to have some 100K (!) recurrent users, the dump was about 2GB (!), the Image folder was some 50GB (!). By the time that I left, our server was reaching the limit of its CPU, Ram, and most of all, the concurrent net connection limits (We even compiled our own version of network card driver to max out the server 'lol'). We could not (nor should you assume with your website) put 2GB of data and 50GB of images in GIT.
To manage all this under GIT easily, we would ignore the binary folders (the folders containing the Images) by inserting these folder paths into .gitignore. We also had a folder called SQL outside the Apache documentroot path. In that SQL folder, we would put our SQL files from the developers in incremental numberings (001.florianm.sql, 001.johns.sql, 002.florianm.sql, etc). These SQL files were managed by GIT as well. The first sql file would indeed contain a large set of DB schema. We don't add user-data in GIT (eg the records of the users table, or the comments table), but data like configs or topology or other site specific data, was maintained in the sql files (and hence by GIT). Mostly its the developers (who know the code best) that determine what and what is not maintained by GIT with regards to SQL schema and data.
When it got to a release, the administrator logs in onto the dev server, merges the live branch with all developers and needed branches on the dev machine to an update branch, and pushed it to the test server. On the test server, he checks if the updating process for the Live server is still valid, and in quick succession, points all traffic in Apache to a placeholder site, creates a DB dump, points the working directory from 'live' to 'update', executes all new sql files into mysql, and repoints the traffic back to the correct site. When all stakeholders agreed after reviewing the test server, the Administrator did the same thing from Test server to Live server. Afterwards, he merges the live branch on the production server, to the master branch accross all servers, and rebased all live branches. The developers were responsible themselves to rebase their branches, but they generally know what they are doing.
If there were problems on the test server, eg. the merges had too many conflicts, then the code was reverted (pointing the working branch back to 'live') and the sql files were never executed. The moment that the sql files were executed, this was considered as a non-reversible action at the time. If the SQL files were not working properly, then the DB was restored using the Dump (and the developers told off, for providing ill-tested SQL files).
Today, we maintain both a sql-up and sql-down folder, with equivalent filenames, where the developers have to test that both the upgrading sql files, can be equally downgraded. This could ultimately be executed with a bash script, but its a good idea if human eyes kept monitoring the upgrade process.
It's not great, but its manageable. Hope this gives an insight into a real-life, practical, relatively high-availability site. Be it a bit outdated, but still followed.
Update Aug 26, 2019:
Netlify CMS is doing it with GitHub, an example implementation can be found here with all information on how they implemented it netlify-cms-backend-github
I say don't. Data can change at any given time. Instead you should only commit data models in your code, schema and table definitions (create database and create table statements) and sample data for unit tests. This is kinda the way that Laravel does it, committing database migrations and seeds.
I would recommend neXtep (Link removed - Domain was taken over by a NSFW-Website) for version controlling the database it has got a good set of documentation and forums that explains how to install and the errors encountered. I have tested it for postgreSQL 9.1 and 9.3, i was able to get it working for 9.1 but for 9.3 it doesn't seems to work.
Use a tool like iBatis Migrations (manual, short tutorial video) which allows you to version control the changes you make to a database throughout the lifecycle of a project, rather than the database itself.
This allows you to selectively apply individual changes to different environments, keep a changelog of which changes are in which environments, create scripts to apply changes A through N, rollback changes, etc.
I'd like to put the entire database under version control, what
database engine can I use so that I can put the actual database under
version control instead of its dump?
This is not database engine dependent. By Microsoft SQL Server there are lots of version controlling programs. I don't think that problem can be solved with git, you have to use a pgsql specific schema version control system. I don't know whether such a thing exists or not...
Use a version-controlled database, of which there are now several.
https://www.dolthub.com/blog/2021-09-17-database-version-control/
These products don't apply version control on top of another type of database -- they are their own database engines that support version control operations. So you need to migrate to them or start building on them in the first place.
I write one of them, DoltDB, which combines the interfaces of MySQL and Git. Check it out here:
https://github.com/dolthub/dolt
I wish it were simpler. Checking in the schema as a text file is a good start to capture the structure of the DB. For the content, however, I have not found a cleaner, better method for git than CSV files. One per table. The DB can then be edited on multiple branches and merges extremely well.

What is a good way to implement an agile database process, which is in synch with the code base, especially in regards to continuous integration? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
The project I am working on were are trying to come up with a solution for having the database and code be agile and be able to be built and deployed together.
Since the application is a combination of code plus the database schema, and database code tables, you can not truly have a full build of the application unless you have a database that is versioned along with the code.
We have not yet been able to come up with a good agile method of doing the database development along with the code in an agile/scrum environment.
Here are some of my requirements:
I want to be able to have a svn revision # that corresponds to a complete build of the system.
I do not want to check in binary files into source control for the database.
Developers need to be able to commit code to the continuous integration server and build the entire system and database together.
Must be able to automate deployment to different environments without doing a rebuild other than the original build on the build server.
(Update)
I'll add some more info here to explain a bit further.
No OR/M tool, since its a legacy project with a huge amount of code.
I have read the agile database design information, and that process in isolation seems to work, but I am talking about combining it with active code development.
Here are two scenario's
Developer checks in a code change, that requires a database change. The developer should be able to check in a database change at the same time, so that the automated build doesn't fail.
Developer checks in a DB change, that should break code. The automated build needs to run and fail.
The biggest problem is, how do these things synch up. There is no such thing as "checking in a database change". Right now the application of the DB changes is a manual process someone has to do, while code change are constantly being made. They need to be made together and checked together, the build system needs to be able to build the entire system.
(Update 2)
One more add here:
You can't bring down production, you must patch it. Its not acceptable to rebuild the entire production database.
You need a build process that constructs the database schema and adds any necessary bootstrapping data. If you're using an O/R tool that supports schema generation, most of that work is done for you. Whatever is not tool-generated, keep in scripts.
For continuous integration, ideally a "build" should include a complete rebuild of the database and a reload of static testing data.
I just saw that you have no ORM tool... here's what we had at a company I used to work for
db/
db/Makefile (run `make` to rebuild db from scratch, `make clean` to close db)
db/01_type.sql
db/02_table.sql
db/03_function.sql
db/04_view.sql
db/05_index.sql
db/06_data.sql
Arrange however necessary... each of those *.sql scripts would be run in order to generate the structure. Developers each had local copies of the DB, and any DB change was just another code change, nothing special.
If you're working on a project that already has a build process (Java, C, C++), this is second nature. If you're using scripts in such a way that there is no build process at all, this'll be a bit of extra work.
"There is no such thing as "checking in a database change"."
Actually, I think you can check in database change. The trick is to stop using simple -- unversioned -- schema and table names.
If you have a version number attached to a schema as a whole (or a table), then you can easily have a version check-in.
Note that database versions doesn't have fancy major-minor-release. The "major" revision in application software usually reflects a basic level of compatibility. That basic level of compatibility should be defined as "uses the same data model".
So app version 2.23 and 2.24 use the version 2 of a the database schema.
The version check-in has two parts.
The new table. For example, MyTable_8 is version 8 of a given table.
The migration script. For example MyTable_8 includes a MyTable_7 to MyTable_8 script which moves the data, providing defaults or whatever is required.
There are several ways this is used.
Compatible upgrades. When merely altering a table to add a column that permits nulls, the version number stays the same.
Incompatible upgrades. When adding non-null columns (that need initial values) or changing the fundamental shape of tables or data types of columns, you're making a big change and you have a migration script.
Note that the old data stays in place until explicitly dropped at the end of the change procedure. You have to run tests to assure that everything worked.
You might have two-part drop -- first rename, then (a week later) finally drop.
Make sure that your O/R-Mapping tool is able to build the necessary tables out of the default configuration it has and also add missing columns. This should cover 90% of your cases.
The other 10% are
coping with missing values for columns that where added after the data was inserted
write data-migration scripts for the rare case where you need to do more fundamental changes between versions
See the DBDeploy open source project. http://dbdeploy.com/
It allows you to check in database change scripts. It will then produce a consolidated change script including all changes that have not been applied.
The site describes the process pretty well.
This project is based on the techniques in the Martin Fowler article that was mentioned before. I was on the project that Martin based the article on. DbDeploy is a pretty good implementation of the process we used.
The migrations facility of Ruby on Rails was developed to handle exactly this need. If you're not using Rails for your application, you might see if this same concept has been ported to the framework of your choice, or read up on it and determine whether you could write some quick scripts that implement the same sort of functionality.

How to keep Stored Procedures and other scripts in SVN/Other repository?

Can anyone provide some real examples as to how best to keep script files for views, stored procedures and functions in a SVN (or other) repository.
Obviously one solution is to have the script files for all the different components in a directory or more somewhere and simply using TortoiseSVN or the like to keep them in SVN, Then whenever a change is to be made I load the script up in Management Studio etc. I don't really want this.
What I'd really prefer is some kind of batch script that I can run periodically (nightly?) that would export all the stored procedures / views etc that had changed in a given timeframe and then commit them to SVN.
Ideas?
Sounds like you're not wanting to use Revision Control properly, to me.
Obviously one solution is to have the
script files for all the different
components in a directory or more
somewhere and simply using TortoiseSVN
or the like to keep them in SVN
This is what should be done. You would have your local copy you are working on (Developing new, Tweaking old, etc) and as single components/procedures/etc get finished, you would commit them individually until you have to start the process over.
Committing half-done code just because it's been 'X' time since it was last committed is sloppy and guaranteed to cause anyone else using the repository grief.
I find it best to treat Stored Procedures just like any other compilable code: Code lives in the repository, you check it out to make changes and load it in your development tool to compile or deploy the code.
You can create a batch file and schedule it:
delete the contents of your scripts directory
using something like ExportSQLScript to export all objects to script/scripts
svn commit
Please note: That although you'll have the objects under source control, you'll not have the data or it's progression (is that a renamed field, or 1 new field and 1 deleted?).
This approach is fine for maintaining change history. But, of course, you should never be automatically committing to the "production build" (unless you like broken builds).
Although you didn't ask for it: This approach also won't produce a set of scripts that will upgrade a current DB. You'll only have initial creation scripts. Recording data progression and creation upgrade scripts is beyond basic source control systems.
I'd recommend Redgate SQL Compare for this - it allows you to compare database versions and generate change scripts - it's also fairly easily scriptable.
Based on your expanded question, you really want to use DDL triggers. Check out this article that details how to create a changelog system for your database.
Not sure on your price range, however DB Ghost could be an option for you.
I don't work for this company (or own the product) but in my researching of the same issue, this product looked quite promising.
I should've been a little more descriptive. The database in question is for an internal ERP system and thus we don't have many versions of our database, just Production/Testing/Development. When we've done a change request, some new fancy feature or something, we simply execute a script or series of scripts to update the procedures in question on the Testing database, if that is all good, then we do the same to Production.
So I'm not really after a full schema script per se, just something that can keep track of the various edits to the stored procedures over time. For example, PROCESS_INVOICE does stuff. It gets updated in some minor way in March. Some time later in say May it is discovered that in a rare case customers get double invoiced (or some other crazy corner case). I'd like to be able to see what has happened over time to this procedure. Currently the way the development environment is setup here I don't have that, which I'm trying to change.
I can recommend DBPro which is part of Visual Studio Team Edition. Have been using it for a few months for storing all parts of the database in Team Foundation Server as well as for deployment and database compares, etc.
Of course, as someone else mentioned, it does depend on your environment and price range.
I wrote a utility for dumping all of the relevant parts of my db into a directory structure that I use SVN on. I never got around to trying to incorporate it into the Manager but, if you're interested, it's here: http://www.reluctantdba.com/dbas-and-programmers/sqltools/svnforsql2005.aspx
It's free and, since I regularly run it, you know any bugs get fixed quickly.
You can always try integrating SourceSafe with SQL Server. Here's a quick start : link . To work with it you've got to have Managment Studio Developers Edition.

Resources