PostgreSQL development workflow - database

I am starting to build a new database for my project using PostgreSQL. (I am new to PostgreSQL and databases, by the way.)
I think my development workflow is very bad, and here is a part of it:
create table/view/function with pgAdmin.
determine the name of the file before saving the code.
The goal is to be able to recreate the database automatically by running all the saved scripts,
I need to know the order to run these scripts in, for dependency reasons.
So I add a number to each file indicating the order.
for example: 001_create_role_user.ddl, 002_create_database_project.ddl, 013_user_table.ddl
save the code.
commit the file to the repository using Git.
Here are some of the bad points I can think of:
I can easily forget what changes I made, for example a new type I created or a comment I edited.
It is hard to determine a name (order) for the file.
Changing the code would be a pain in the ass, especially when the new code changes the order.
So my workflow is bad. I was wondering what other Postgres developers' workflows look like.
Are there any good tools (free or cheap) for editing and saving scripts? A good IDE maybe?
It would be great if I could create automated unit tests for the database.
Any tool for recreating the database? CI server tool?
Basically I am looking for any advice, good practice, or good tool for database development.
(Sorry, this question may not fit for the Q&A format, but I do not know where else to ask this question.)

Check out Liquibase. We use it in the company I work at to set up our PostgreSQL database. It's open source, easy to use, and the changelog file you end up with can be added to source control. Each changeset gets an id, so each changeset is only run once. You end up with two extra tables for tracking the changes to the database when it's run.
While it's DB agnostic, you can use PostgreSQL SQL directly in each changeset, and each changeset can have its own comments.
The only caveat from having used it is that you have to caution yourself and others not to reuse a changeset once it's been applied to a database. Any change to an already applied changeset (even whitespace) results in a different checksum, which can cause Liquibase to abort its updates. This can end up in failed DB updates in the field, so each update to any of the changelogs should be tested locally first. Instead, all changes, however minor, should go into a new changeset with a new id. There is a changeset sub-tag called "validCheckSum" to let you work around this, but I think it's better to enforce always making a new changeset.
Here are the doc links for creating a table and creating a view for example.
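For a sense of what a changelog looks like, here's a minimal sketch using Liquibase's SQL-formatted changelog (the XML format works just as well); the author name and table below are only placeholders:
--liquibase formatted sql

--changeset alice:1
CREATE TABLE app_user (
    id serial PRIMARY KEY,
    name text NOT NULL
);

--changeset alice:2
ALTER TABLE app_user ADD COLUMN email text;
Each --changeset author:id pair is recorded in the tracking table, so running liquibase update a second time skips both changesets.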

Well, your question is actually quite relevant to any database developer, and, if I understand it correctly, there is another way to get to your desired results.
One interesting thing to mention is that your idea of separating different changes into different files is the concept of migrations in Ruby on Rails. You might even be able to use the rake utility to keep track of a workflow like yours.
But now to what I think might be your solution. PostgreSQL (and other databases, to be honest) has specific utilities to handle schemas and data the way you probably need.
The pg_dumpall command-line executable will dump the whole database cluster into a file or to the console, in a way that the psql utility can simply "reload" into the same, or into another (virgin), cluster.
So, if you want to keep only the current schema (no data!) of a running database cluster, you can, as the postgres-process owner user:
$ pg_dumpall --schema-only > schema.sql
Now schema.sql will hold exactly the same users/databases/tables/triggers/etc., but no data. If you want a "full-backup" style dump (and that's one way to take a full backup of a database), just remove the --schema-only option from the command line.
You can reload the file into another cluster (it should be a virgin one; you might mess up a database that already holds other data by doing this):
$ psql -f schema.sql postgres
Now if you only want to dump one database, one table, etc. you should use the pg_dump utility.
$ pg_dump --schema-only <database> > database-schema.sql
And then, to reload the database into a running postgresql server:
$ psql <database> < database-schema.sql
As for version control, you can just keep the schema.sql file under it, and dump the database into the file again before every commit. At any particular version-control state you will then have the code and the working database schema that goes with it.
Oh, and all the tools I mentioned are free, and pg_dump and pg_dumpall come with the standard PostgreSQL installation.
Hope that helps,
Marco

You're not far off. I'm a Java developer, not a DBA, but building out the database as a project grows is an important task for the teams I've been on, and here's how I've seen it done best:
All DB changes are driven by plain-text DDL scripts (SQL CREATE, ALTER, or DROP statements). No changes through the DB client. Use a text editor that supports syntax highlighting like vim or Notepad++, as the highlighting can help you find errors before you run the script.
Use a number at the beginning of each DDL script to define the order that scripts are run in. Base scripts have lower numbers.
Use a script and the psql client to load the DDL scripts from lowest to highest. Here's the bash script we use. You can use it as a base for a .bat script on Windows.
#!/bin/bash
# Connection settings picked up automatically by psql
export PGDATABASE=your_db
export PGUSER=your_user
export PGPASSWORD=your_password

# Run every .sql script, lowest number first
for SQL_SCRIPT in $(find ./ -name "*.sql" -print | sort);
do
    echo "**** $SQL_SCRIPT ****"
    psql -q < "$SQL_SCRIPT"
done
As the project grows, use new alter scripts to change the table, don't redefine the table in the initial script.
All scripts are checked into source control. Each release is tagged so you can regenerate that version of the database in the future.
For unit testing and CI, most CI servers can run a script to drop and recreate a schema. One oft-cited framework for PostgreSQL unit testing is pgTAP.
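To give a flavor of what a pgTAP test looks like, here's a minimal sketch; the table and column names are made up, but plan(), has_table(), has_column(), and finish() are real pgTAP functions:
BEGIN;
SELECT plan(2);

SELECT has_table('user_account');
SELECT has_column('user_account', 'email');

SELECT * FROM finish();
ROLLBACK;
Wrapping the test in a transaction that rolls back keeps the test database clean, and the output is TAP, so it can be run with pg_prove or picked up by a CI server.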

I'm a DBA and my workflow is almost the same as the one suggested by Ireeder... but instead of a shell script to keep the DDL scripts applied, I use a tool called DbMaintain.
DbMaintain needs some configuration, but it is not a pain... It keeps track of which scripts have been executed and in which order.
The main benefit is that if an SQL script that has already been executed changes, by default it complains, or it executes just that script (if configured to do so)... The same behavior applies when you add a new script to the environment: it executes just that new script.
It's great for deploying and keeping development and production environments up to date... there's no need to execute all the scripts every time (like the shell script Ireeder suggested) or to execute each new script manually.

If the changes come in discrete batches, you can create scripts that apply the DDL changes and then dump the expected new state (version) of the database.
pg_dump -f database-dump-production-yesterday.sql   # all commands to create and populate the database as it was
Today you need to introduce a new table for a new feature:
psql -f change-production-for-today.sql   # DDL and DML commands to make the database reflect the new state
pg_dump --schema-only -f dump-production-today.sql   # all new commands to create the database for today's app
cat sql-append-table-needed-data-into-dump.sql >> dump-production-today.sql   # append any seed data the new table needs
All developers should use the new database create script for development from now on.

Related

Liquibase: UPDATE script (when some changes have already been implemented in the database)

Ok, so the problem is probably in my approach to Liquibase. I have implemented some changes on the database side and I want to create changesets, so I simply add a new SQL file to my changesets. When I try to run the liquibase update command I get an error telling me that some columns already exist in the database.
For me it is normal that, before I create the changeset script, I add the columns in the database (e.g. using phpMyAdmin). Then I want to share these changes with other developers, so I generate the SQL from my changes, add it to the SQL file, and run that file as a changeset.
Can somebody tell me what I am doing wrong?
The problem concerns the situation where I added some new columns to my MySQL table, then created an SQL file with an ALTER TABLE script, and then ran the liquibase update command.
Don't make manual updates to your database. All schema changes have to be done with Liquibase, or else - as in your case - your changesets will conflict with the existing schema.
While having all changes to your database done with Liquibase beforehand is ideal, there are certainly situations where that is not possible. One is the use case you've described. Another would be a hotfix applied to production that needs to be merged back into development.
If you are certain that your changeset has been applied to the environment, then consider running changelogSync. It will assert that all changesets have been applied and will update the Liquibase meta table with the appropriate information.
Although not ideal, we think changelogSync is required for real-world applications where sometimes life does not progress as we would like. That's why we made certain to expose it clearly in Datical DB. We think it strikes a balance between reality and idealism.
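Assuming the connection settings already live in a liquibase.properties file, marking the existing changesets as applied is a single command; the changelog filename here is just a placeholder:
liquibase --changeLogFile=db.changelog.xml changelogSync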

Restoring dev db from production: Running a set of SQL scripts based on a list stored in a table?

I need to restore a backup from a production database and then automatically reapply SQL scripts (e.g. ALTER TABLE, INSERT, etc) to bring that db schema back to what was under development.
There will be lots of scripts, from a handful of different developers. They won't all be in the same directory.
My current plan is to list the scripts, with their full filesystem paths, in a table in a pseudo-system database. Then create a stored procedure in this database which will first run RESTORE DATABASE and then run a cursor over the list of scripts, building a command string for SQLCMD for each script, and executing that SQLCMD string for each script using xp_cmdshell.
The sequence of cursor->sqlstring->xp_cmdshell->sqlcmd feels clumsy to me. Also, it requires turning on xp_cmdshell.
I can't be the only one who has done something like this. Is there a cleaner way to run a set of scripts that are scattered around the filesystem on the server? Especially, a way that doesn't require xp_cmdshell?
First off and A-number-one, collect all the database scripts in one central location. Some form of Source Control or Version Control is best, as you can then see who modified what when and (using diff tools if nothing else) why. Leaving the code used to create your databases hither and yon about your network could be a recipe for disaster.
Second off, you need to run scripts against your database. That means you need someone or something to run them, which means executing code. If you're performing this code execution from within SQL Server, you pretty much are going to end up using xp_cmdshell. The alternative? Use something else that can run scripts against databases.
My current solution to this kind of problem is to store the scripts in text (.sql) files, store the files in source control, and keep careful track of the order in which they are to be executed (for example, CREATE TABLEs get run before ALTER TABLEs that add subsequent columns). I then have a batch file--yeah, I've been around for a while, you could do this in most any language--to call SQLCMD (we're on SQL 2005, I used to use osql) and run these scripts against the necessary database(s).
If you don't want to try and "roll your own", there may be more formal tools out there to help manage this process.
Beyond the suggestions about centralization and source control made by Phillip Kelley, if you are familiar with .NET, you might consider writing a small WinForms or WebForms app that uses SQL Server SMO (SQL Server Management Objects). With it, you can pass an entire script to the database just as if you had dropped it into Management Studio. That avoids the need for xp_cmdshell and sqlcmd. Another option would be to create a DTS/SSIS package that would read the files and use the Execute T-SQL task in a loop.

Updating SQL Server database from SQL Scripts on installation

Our current scenario is like this:
we have an existing database that needs to be updated for each new release that we install
we have to do this from individual SQL scripts (we can't use DB compare/diff tools)
the installation should run the SQL scripts as automatically as possible
the installation should only run those SQL scripts, that haven't been run before
the installation should dump a report of which script ran and which did not
installation is done by the customer's IT staff which is not awfully SQL knowledgeable
Whether this is a good setup or not is beyond this question - right now, take this as an absolute given - we can't change that this or next year for sure.
Right now, we're using a homegrown "DB Update Manager" to do this - it works, most of the time, but the amount of work needed to really make it totally automatic seems like too much work.
Yes, I know about SQLCMD - but that seems a bit "too basic" - or not?
Does anyone out there do the same thing? If so - how? Are you using some tool and if so - which one?
Thanks for any ideas, input, thoughts, pointers to tools or methods you use!
Marc
I have a similar situation. We maintain the database object scripts in version control. For a release, the appropriate versions are tagged and pulled from version control. A custom script concatenates the individual object scripts into a set of Create_DB, Create_DB_Tables, Create_DB_Procs, ...
At a prior job I used manually crafted batch files and OSQL to run the database create/update scripts.
In my current position we have an InstallShield setup with a custom "Install Helper" written in C++ to invoke the database scripts using SqlCmd.
Also, like CK, we have a SchemaVersion table in each database. The table contains both app and database version information. The schema version is just an integer that gets incremented with each release.
Sounds complicated but it works pretty well.
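A minimal sketch of what such a SchemaVersion table might look like; the column names are purely illustrative:
CREATE TABLE SchemaVersion (
    SchemaVersionId int NOT NULL PRIMARY KEY,  -- incremented with each release
    AppVersion      varchar(20) NOT NULL,      -- application version shipped with this schema
    AppliedOn       datetime NOT NULL DEFAULT GETDATE()
);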
I have a similar setup to this and this is my solution:
Have a dbVersion table that stores a version number and a datetime stamp.
Have a folder where scripts are stored with a numbering system, e.g. x[000].
Have a console / GUI app that runs as part of the installation and compares the dbVersion number with the numbers of the files.
Run each new file in order, in a transaction (a sketch of this step follows below).
This has worked for us for quite a while.
Part of our GUI app allows the user to choose which database to update, then a certain string #dbname# in the script is replaced by the database name they choose.
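As a rough illustration of the "run each new file in a transaction, then record the new version" step; the table, column, and script names are only placeholders:
BEGIN TRANSACTION;
-- ... the contents of 004_add_invoice_table.sql would run here ...
INSERT INTO dbVersion (VersionNumber, AppliedOn) VALUES (4, GETDATE());
COMMIT TRANSACTION;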
You might try Wizardby: it allows you to specify database changes incrementally, and it will apply these changes in a very controlled manner. You'll have to write an MDL file and distribute it with your application along with the Wizardby binaries; during installation the setup will check whether the database version is up to date and, if not, apply all necessary changes in a transaction.
Internally it maintains a SchemaInfo table, which tracks which migrations (versions) were applied to a particular instance of the database, so it can reliably run only required ones.
If you maintain your changes in source control (e.g. SVN) you could use a batch file with SQLCMD to deploy only the latest changes from a particular SVN branch.
For example,
rem Code Changes
sqlcmd -U username -P password -S %1 -d DatabaseName -i "..\relativePath\dbo.ObjectName.sql"
Say you maintain an SVN branch specifically for deployment. You would commit your code to that branch, then execute a batch file which would deploy the desired objects.
Drawbacks include the inability to check on the fly for table-related changes (e.g. if one of your SQL scripts adds a column, rerunning that script wouldn't be smart enough to notice that the column was already added in a previous run). To mitigate that, you could build an app that generates the batch file for you and adds some logic to interact with the destination database, check which changes have or haven't already been applied, and act accordingly.
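A low-tech way to guard against that, sketched here with made-up table and column names, is to make each script idempotent by checking the catalog before altering:
IF NOT EXISTS (SELECT 1 FROM INFORMATION_SCHEMA.COLUMNS
               WHERE TABLE_NAME = 'Orders' AND COLUMN_NAME = 'ShippedDate')
BEGIN
    -- only add the column if a previous run hasn't already done so
    ALTER TABLE Orders ADD ShippedDate datetime NULL;
END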

How to update a database schema without losing your data with Hibernate?

Imagine you are developing a Java EE app using Hibernate and JBoss. You have a running server that has some important data on it. You release the next version of the app every week or two, and each release has a bunch of changes in the persistence layer:
New entities
Removed entities
Attribute type changes
Attribute name changes
Relationship changes
How do you effectively set up a system that updates the database schema and preserves the data? As far as I know (I may be mistaken), Hibernate doesn't perform ALTER COLUMN or DROP/ALTER CONSTRAINT.
Thank you,
Artem B.
LiquiBase is your best bet. It has a Hibernate integration mode that uses Hibernate's hbm2ddl to compare your database and your Hibernate mapping, but rather than updating the database automatically, it outputs a Liquibase changelog file which can be inspected before actually being run.
While more convenient, any tool that does a comparison of your database and your hibernate mappings is going to make mistakes. See http://www.liquibase.org/2007/06/the-problem-with-database-diffs.html for examples. With liquibase you build up a list of database changes as you develop in a format that can survive code with branches and merges.
I personally keep track of all changes in a migration SQL script.
You can use https://github.com/Devskiller/jpa2ddl tool which provides Maven and Gradle plugin and is capable of generating automated schema migrations for Flyway based on JPA entities. It also includes all properties, dialects, user-types, naming strategies, etc.
For one app I use SchemaUpdate, which is built in to Hibernate, straight from a bootstrap class so the schema is checked every time the app starts up. That takes care of adding new columns or tables which is mostly what happens to a mature app. To handle special cases, like dropping columns, the bootstrap just manually runs the ddl in a try/catch so if it's already been dropped once, it just silently throws an error. I'm not sure I'd do this with mission critical data in a production app, but in several years and hundreds of deployments, I've never had a problem with it.
As a further response to what Nathan Voxland said about LiquiBase, here's an example of executing a migration under Windows for a MySQL database:
Put the MySQL connector under the lib folder of the Liquibase distribution, for example.
Create a properties file, liquibase.properties, in the root of the Liquibase distribution and insert these recurring lines:
driver: com.mysql.jdbc.Driver
classpath: lib\\mysql-connector-java-5.1.30.jar
url: jdbc:mysql://localhost:3306/OLDdatabase
username: root
password: pwd
Generate or retrieve an updated database under another name, for example NEWdatabase.
Now extract the differences into a file, Migration.xml, with the following command line:
liquibase diffChangeLog --referenceUrl="jdbc:mysql://localhost:3306/NEWdatabase"
--referenceUsername=root --referencePassword=pwd > C:\Users\ME\Desktop\Migration.xml
Finally execute the update by using the just generated Migration.xml file :
java -jar liquibase.jar --changeLogFile="C:\Users\ME\Desktop\Migration.xml" update
NB: All these command lines should be executed from the Liquibase home directory, where liquibase.bat/.sh and liquibase.jar are present.
I use the hbm2ddl ant task to generate my ddl. There is an option that will perform alter tables/columns in your database.
Please see the "update" attribute of the hbm2ddl ant task:
http://www.hibernate.org/hib_docs/tools/reference/en/html/ant.html#d0e1137
update(default: false): Try and create
an update script representing the
"delta" between what is in the
database and what the mappings
specify. Ignores create/update
attributes. (Do not use against
production databases, no guarantees at
all that the proper delta can be
generated nor that the underlying
database can actually execute the
needed operations)
You can also use DBMigrate. It's similar to Liquibase:
Similar to 'rake migrate' for Ruby on
Rails this library lets you manage
database upgrades for your Java
applications.

Keeping development databases in multiple environments in sync

I'm early in development on a web application built in VS2008. I have both a desktop PC (where most of the work gets done) and a laptop (for occasional portability) on which I use AnkhSVN to keep the project code synced. What's the best way to keep my development database (SQL Server Express) synced up as well?
I have a VS database project in SVN containing create scripts which I re-generate when the schema changes. The original idea was to recreate the DB whenever something changed, but it's quickly becoming a pain. Also, I'd lose all the sample rows I entered to make sure data is being displayed properly.
I'm considering putting the .MDF and .LDF files under source control, but I doubt SQL Server Express will handle it gracefully if I do an SVN Update and the files get yanked out from under it, replaced with newer copies. Sticking a couple big binary files into source control doesn't seem like an elegant solution either, even if it is just a throwaway development database. Any suggestions?
There are obviously a number of ways to approach this, so I am going to list a number of links that should provide a better foundation to build on. These are the links that I've referenced in the past when trying to get others on the bandwagon.
Database Projects in Visual Studio .NET
Data Schema - How Changes are to be Implemented
Is Your Database Under Version Control?
Get Your Database Under Version Control
Also look for MSDN Webcast: Visual Studio 2005 Team Edition for Database Professionals (Part 4 of 4): Schema Source and Version Control
However, with all of that said, if you don't think that you are committed enough to implement some type of version control (either manual or semi-automated), then I HIGHLY recommend you check out the following:
Red Gate SQL Compare
Red Gate SQL Data Compare
Holy cow! Talk about making life easy! I had a project get away from me and had multiple people in making schema changes and had to keep multiple environments in sync. It was trivial to point the Red Gate products at two databases and see the differences and then sync them up.
In addition to your database CREATE script, why don't you maintain a default data or sample data script as well?
This is an approach that we've taken for incremental versions of an application we have been maintaining for more than 2 years now, and it works very well. Having a default data script also allows your QA testers to recreate bugs using the same data that you have.
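The sample data script can be as simple as a handful of INSERTs kept next to the CREATE scripts and re-run after every rebuild; the table names here are purely illustrative:
-- sample_data.sql (hypothetical tables)
INSERT INTO Customers (CustomerId, Name) VALUES (1, 'Test Customer');
INSERT INTO Orders (OrderId, CustomerId, OrderDate) VALUES (100, 1, '2009-01-15');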
You might also want to take a look at a question I posted some time ago:
Best tool for auto-generating SQL change scripts
You can store a backup (.bak file) of your database rather than the .MDF & .LDF files.
You can restore your db easily using following script:
use master
go
if exists (select * from master.dbo.sysdatabases where name = 'your_db')
begin
alter database your_db set SINGLE_USER with rollback IMMEDIATE
drop database your_db
end
restore database your_db
from disk = 'path\to\your\bak\file'
with move 'Name of data file' to 'path\to\mdf\file',
move 'Name of log file' to 'path\to\ldf\file'
go
You can put above mentioned script in text file restore.sql and call it from batch file using following command:
osql -E -i restore.sql
That way you can create a script file to automate the whole process:
Get the latest db backup from the SVN repository or any suitable storage
Restore the current db using the bak file
We use a combination of approaches: taking backups from higher environments down, as well as using ApexSQL to handle the initial setup of the schema.
Recently we've been using SubSonic migrations as a coded, source-controlled, run-through-CI way to get change scripts in; there is also the "Tarantino" project developed by Headspring out of Texas.
Most of these approaches, especially the latter, are safe to use on top of most test data. I particularly like the automated last two because I can make a change, and the next time someone gets latest, they just run the "updater" and they are ushered to the latest version.
