YAGNI and database creation scripts - database

Right now, I have code which creates the database (just a few CREATE queries on a SQLite database) in my main database access class. This seems unnecessary as I have no intention of ever using the code. I would just need it if something went wrong and I needed to recreate the database. Should I...
1. Leave things as they are, even though the database creation code is about a quarter of my file size.
2. Move the database-creation code to a separate script. It's likely I'll be running it manually if I ever need to run it again anyway, and that would put it out-of-sight-out-of-mind while working on the main code.
3. Delete the database-creation code and rely on revision control if I ever find myself needing it again.

I think it is best to keep the code. Even more importantly, you should maintain this code (or generate it) every time the database schema changes.
It is important for the following reasons.
You will probably be surprised how often you need it: when you migrate your server, when you set up another environment (e.g. TEST or DEMO), and so on.
I also find that I refer to the DDL SQL quite often when coding, particularly if I have not touched the system for a while.
You have a reference for the decisions you made, such as the indexes you created, unique keys, etc.
If you do not have a disciplined approach to this, I have found that the database schema can drift over time as ad-hoc changes are made, and this can cause obscure issues that are not found until you hit the database. Even worse, without a disciplined approach (i.e. a reference definition of the schema) you may find that different databases have subtly different schemas.
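One way to catch that kind of drift early is to compare a live database against the reference DDL. The following is only a minimal sketch for SQLite, assuming the reference DDL is available as a string; the `customer` table, the index and the function names are hypothetical.

```python
import sqlite3

# Hypothetical reference DDL; in practice this would live in a file under version control.
REFERENCE_DDL = """
CREATE TABLE customer (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL
);
CREATE UNIQUE INDEX idx_customer_email ON customer (email);
"""


def schema_of(conn):
    """Return {(type, name): normalized SQL} for every user-defined object."""
    rows = conn.execute(
        "SELECT type, name, sql FROM sqlite_master "
        "WHERE sql IS NOT NULL AND name NOT LIKE 'sqlite_%'"
    )
    # Collapse whitespace so formatting differences are not reported as drift.
    return {(kind, name): " ".join(sql.split()) for kind, name, sql in rows}


def schema_drift(live_conn, reference_ddl):
    """Return sorted (type, name) pairs that are missing, extra, or defined differently."""
    reference = sqlite3.connect(":memory:")
    try:
        # Build the expected schema in a throwaway in-memory database.
        reference.executescript(reference_ddl)
        expected = schema_of(reference)
    finally:
        reference.close()
    actual = schema_of(live_conn)
    return sorted(
        key for key in expected.keys() | actual.keys()
        if expected.get(key) != actual.get(key)
    )
```

An empty result means the live database still matches the reference definition; anything else names the objects worth looking at.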

"I would just need it if something went wrong and I needed to recreate the database."
Recreating the database is absolutely not an exceptional case. That code is part of your deployment process on a new / different system, and it represents the DB structure your code expects to work with. You should actually have integration tests that confirm this. Working indefinitely with a single DB server whose schema was created incrementally via manually dispatched SQL statements during development is not something you should rely on.
But yes, it should be separated from the access code; so option 2 is correct. The separate script can then be used by tests as well as for deployment.
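As a rough illustration of that separation, the creation code could live in its own small module that both a deployment step and the tests import. This is a sketch only; the `note` table and the function names are made up for the example.

```python
import sqlite3

# Hypothetical schema, kept apart from the data-access code.
SCHEMA = """
CREATE TABLE IF NOT EXISTS note (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    body  TEXT NOT NULL DEFAULT ''
);
CREATE INDEX IF NOT EXISTS idx_note_title ON note (title);
"""


def create_schema(conn):
    """Create all tables and indexes; safe to run against an existing database."""
    conn.executescript(SCHEMA)


def open_test_database():
    """Give an integration test a fresh in-memory database with the real schema."""
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    return conn
```

For deployment, the same `create_schema` would be called once on a connection to the real database file, so tests and production are built from one definition.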

Related

Altering database tables on updating website

This seems to be an issue that keeps coming back in every web application; you're improving the back-end code and need to alter a table in the database in order to do so. No problem doing manually on the development system, but when you deploy your updated code to production servers, they'll need to automatically alter the database tables too.
I've seen a variety of ways to handle these situations, each with its own benefits and problems. Roughly, I've arrived at the following two possibilities:
Dedicated update script. Requires manually initiating the update. Requires all table alterations to be done in a predefined order (rigid release planning, no easy quick fixes on the database). Typically requires maintaining a separate updating process and some way to record and manage version numbers. Benefit is that it doesn't impact running code.
Checking table properties at runtime and altering them if needed. No manual interaction required and table alters may happen in any order (so a quick fix on the database is easy to deploy). Another benefit is that the code is typically a lot easier to maintain. Obvious problem is that it requires checking table properties a lot more than it needs to.
Are there any other general possibilities or ways of dealing with altering database tables upon application updates?
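For reference, the second possibility (checking table properties at runtime) might look roughly like this. It is a sketch using SQLite as a stand-in for whatever database the application uses; `ensure_column` and the example table are hypothetical names.

```python
import sqlite3


def ensure_column(conn, table, column, definition):
    """Add `column` to `table` if it is missing. Returns True if the table was altered.

    The identifiers are interpolated into SQL, so they must come from
    trusted application code, never from user input.
    """
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {definition}")
    return True
```

Calling `ensure_column(conn, "account", "nickname", "TEXT NOT NULL DEFAULT ''")` at startup alters the table at most once, but the metadata query still runs on every start, which is the overhead described above.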
I'll share what I've seen work best. It's just expanding upon your first option.
The steps I've usually seen when updating schemas in production:
1. Take down the front end applications. This prevents any data from being written during a schema update. We don't want writes to fail because relationships are messed up or a table is suddenly out of sync with the application.
2. Potentially disconnect the database so no connections can be made. Sometimes there is code out there using your database you don't even know about!
3. Run the scripts as you described in your first option. It definitely takes careful planning. You're right that you need a pre-defined order to apply the changes. Also, I would note that you often need two sets of scripts, one for schema updates and one for data updates. As an example, if you want to add a field that is not nullable, you might add a nullable field first, and then run a script to put in a default value.
4. Have rollback scripts on hand. This is crucial because you might make all the changes you think you need (since it all worked great in development) and then discover the application doesn't work before you bring it back online. It's good to have an exit strategy so you aren't in that horrible place of "oh crap, we broke the application and we've been offline for hours and hours and what do we do?!"
5. Make sure you have backups ready to go in case (4) goes really bad.
6. Coordinate the application update with the database updates. Usually you do the database updates first and then roll out the new code.
7. (Optional) A lot of companies do partial roll outs to test. I've never done this, but if you have 5 application servers and 5 database servers, you can first roll out to 1 application/1 database server and see how it goes. Then if it's good you continue with the rest of the production machines.
It definitely takes time to find out what works best for you. From my experience doing lots of production database updates, there is no silver bullet. The most important thing is taking your time and being disciplined in tracking changes (versioning like you mentioned).
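To make the version tracking concrete, here is a small sketch of ordered update scripts with a recorded version number. It uses SQLite and its `PRAGMA user_version` as the version store purely as an assumption for illustration; the `customer` table and the migration list are hypothetical. The second step pairs a schema change with the data change that fills in a default.

```python
import sqlite3

# Hypothetical migrations, applied strictly in list order.
# Entry i upgrades the schema from version i to version i + 1.
MIGRATIONS = [
    # 0 -> 1: initial table.
    ["CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"],
    # 1 -> 2: schema change (nullable column), then the data change
    # that fills in a default for rows that already exist.
    [
        "ALTER TABLE customer ADD COLUMN status TEXT",
        "UPDATE customer SET status = 'active' WHERE status IS NULL",
    ],
]


def migrate(conn):
    """Apply every migration newer than the version stored in the database.

    `conn` is expected to be opened with isolation_level=None so that the
    explicit BEGIN/COMMIT/ROLLBACK statements below control the transactions.
    """
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for target, statements in enumerate(MIGRATIONS[current:], start=current + 1):
        conn.execute("BEGIN")
        try:
            for statement in statements:
                conn.execute(statement)
            # Record the new version in the same transaction as the changes.
            conn.execute(f"PRAGMA user_version = {target}")
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")
            raise
    return conn.execute("PRAGMA user_version").fetchone()[0]
```

Because each step runs in its own transaction, a failing script leaves the database at the last version that applied cleanly. That does not replace rollback scripts or backups, it only narrows what they have to undo.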

how to tap into PostgreSQL DDL parser?

The DDL / SQL scripts used to create my PostgreSQL database are under version control. In theory, any change to the database model is tracked in the source code repository.
In practice however, it happens that the structure of a live database is altered 'on the fly' and if any client scripts fail to insert / select / etc. data, I am put in charge of fixing the problem.
It would help me enormously if I could run a quick test to verify the database still corresponds to the creation scripts in the repo, i.e. is still the 'official' version.
I started using pgTAP for that purpose and so far, it works great. However, whenever a controlled, approved change is done to the DB, the test scripts need changing, too.
Therefore, I considered creating the test scripts automatically. One general approach could be to
run the scripts to create the DB
access DB metadata on the server
use that metadata to generate test code
I would prefer though not having to create the DB, but instead read the DB creation scripts directly. I tried to google a way to tap into the DDL parser and get some kind of metadata representation I could use, but so far, I have learned a lot about PostgreSQL internals, but couldn't really find a solution to the issue.
Can someone think of a way to have a PostgreSQL DDL script parsed ?
Here is my method for ensuring that the live database schema matches the schema definition under version control: As part of the "build" routine of your database schema, set up a temporary database instance, load in all the schema creation scripts the way it was intended, then run pg_dump -s against it and compare the result with a schema dump of your production database. Depending on your exact circumstances, you might need to run a little bit of sed over the final product to get an exact match, but it's usually possible.
You can automate this procedure completely. Run the database "build" on every SCM check-in (using a build bot, continuous integration server, or similar), and get the dumps from the live instance by a cron job. Of course, this way you'd get an alert every time someone checks in a database change, so you'll have to tweak the specifics a little.
There is no pgTAP involved there. I love pgTAP and use it for unit testing database functions and the like (also done on the CI server), but not for verifying schema properties, because the above procedure makes that unnecessary. (Also, generating tests automatically from what they are supposed to test seems a little bit like the wrong way around.)
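The comparison step of that procedure can be sketched in a few lines of Python, assuming the two schema dumps have already been produced by pg_dump -s and read into strings. The normalization here (dropping comment and blank lines) is only a stand-in for the "little bit of sed" and would need adjusting to your dumps; the function names are hypothetical.

```python
import difflib


def normalize_dump(text):
    """Drop SQL line comments and blank lines so only statements are compared."""
    lines = []
    for line in text.splitlines():
        stripped = line.rstrip()
        if not stripped or stripped.lstrip().startswith("--"):
            continue
        lines.append(stripped)
    return lines


def schema_diff(expected_dump, live_dump):
    """Return unified-diff lines; an empty list means the schemas match."""
    return list(difflib.unified_diff(
        normalize_dump(expected_dump),
        normalize_dump(live_dump),
        fromfile="repository",
        tofile="production",
        lineterm="",
    ))
```

A CI job or cron script could fail or send a mail whenever `schema_diff` returns a non-empty list.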
There is a lot of database metadata to be concerned about here. I've been poking around the relevant database internals for a few years, and I wouldn't consider the project you're considering feasible to build without dumping a few man months of programming time just to get a rough alpha quality tool that handles some specific subset of changes you're concerned about supporting. If this were easy, there wouldn't be a long standing (as in: people have wanted it for a decade) open item to build DDL Triggers into the database, which is exactly the thing you'd like to have here.
In practice, there are two popular techniques people use to make this class of problem easier to deal with:
Set log_statement to 'ddl' and try to parse the changes it records.
Use pg_dump --schema-only to make a regular snapshot of the database structure. Put that under version control, and use changes in its diff to find the information you're looking for.
Actually taking either of these and deriving the pgTAP scripts you want directly is its own challenge. If the change made is small enough, you might be able to automate that to some degree. At least you'd be starting with a reasonably sized problem to approach from that angle.

Adding primary keys to a production database

I just inherited a relatively small SQL Server database. We have a decentralized system operating on about ten sites, with each site being pounded all day by between sixty and one hundred clients. Upon inspecting the system, a couple of things jumped out at me: there are no maintenance plans or keys defined.
I have dozens of different applications that are already accessing the database. The majority of them are written in C with inline SQL. Part of what I was brought in to do was write stored procedures for everything and have our applications move to that. Before I do this, however, I really think I should be focusing on these seemingly glaring issues.
Also, we'll eventually be looking into replication to a central site, so I really think these things should be addressed before we even think of that.
Figuring out a redesign scheme and maintenance plan will be time-consuming but not problematic - I've done it before at single sites. But, how am I going to go about implementing these major changes to the database across ten (or more) production sites while ensuring data integrity and not breaking the applications?
I would suspect that, with no keys officially defined, this database probably has tons of data integrity problems. Lucky you.
For replication you will need GUIDs. I would do this: add the GUIDs and PK definitions in the dev environment and test, test, test. You'll probably find a lot of crap where people did select * and adding the columns will cause problems or cause things to show up on reports that you don't want. Find and fix all these things. Be sure to script all the changes to the data and put them in source control along with any code changes you need to make to the application. Then schedule downtime for maintenance of the database during the lowest usage hours. Let the users know ahead of time that the application will be down. During the downtime, have the application show a down message, change the database to single user mode so no one except the team making this change can affect the database, make a full backup, run the scripts to make the changes to the database, roll out the code changes to the application, test, take the database out of single user mode and turn the application back on.
Under no circumstances would I try to make a change this major without going to single user mode.
First ensure you have valid backups of every db, and test-restore them to make sure they restore OK.
Consider using Ola Hallengren's maintenance scripts instead of Maintenance Plans if you need to deploy identical, consistent, scripted solutions to all your sites (Ola Hallengren's site).
Then I'd say look at getting some basic indexing in place, starting with the heavy-hitter tables. You can identify them with various methods. I presume you know how, but just to throw out a few thoughts: code review, SQL Trace, query plan analysis, and then there are third-party tools, e.g. Idera SQLdm, Confio Ignite, Quest's Spotlight on SQL Server or Foglight Performance Analysis for SQL Server.
I think this will get you rolling.
Some additional ideas.
One of the first things I'd check is: are all the database instances alike, as far as database objects are concerned? Do they all have the exact same tables, columns (and their order in the tables), nullability, etc.? Be sure to check pretty much everything listed in sys.objects. Once you know that the database structures are all in sync, you know that any database modification scripts you generate will work on all the instances.
Once you modify your test environment with your planned changes, you have to ensure that they don't break existing functionality. Can you accurately emulate "...being pounded all day by between sixty and one hundred clients" on your test environment? If you can't, then you of course cannot know if your changes will break anything until they go live. (An assumption I'd avoid: just because a given instance has no duplicates in the columns you wish to build a primary key on does not mean that there are never any duplicates present...)
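Checking a site for duplicates in the intended key columns is at least cheap to automate. A hedged sketch follows, using SQLite from Python as a stand-in for SQL Server (the GROUP BY / HAVING query itself is plain SQL); the function name and the `orders` example are hypothetical.

```python
import sqlite3


def find_duplicate_keys(conn, table, key_columns):
    """Return (key values..., count) rows for key combinations that occur more than once.

    Identifiers are interpolated into the SQL, so they must come from trusted code.
    """
    cols = ", ".join(key_columns)
    query = (
        f"SELECT {cols}, COUNT(*) AS n FROM {table} "
        f"GROUP BY {cols} HAVING COUNT(*) > 1 ORDER BY {cols}"
    )
    return conn.execute(query).fetchall()
```

For example, `find_duplicate_keys(conn, "orders", ["site_id", "order_no"])` lists every (site_id, order_no) pair that would violate the new primary key. As noted above, an empty result on one instance says nothing about the other sites or about data that arrives later.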

Database source control with Oracle

I have been looking for hours for a way to check a database into source control. My first idea was a program for calculating database diffs, asking all the developers to implement their changes as new diff scripts. Now I find that if I can dump a database into a file, I could check it in and use it as just another type of file.
The main conditions are:
Works for Oracle 9R2
Human readable, so we can use diff to see the differences. (.dmp files don't seem readable.)
All tables in a batch. We have more than 200 tables.
It stores BOTH STRUCTURE AND DATA
It supports CLOB and RAW Types.
It stores procedures, packages and their bodies, functions, tables, views, indexes, constraints, sequences and synonyms.
It can be turned into an executable script to rebuild the database into a clean machine.
Not limited to really small databases (supports at least 200,000 rows).
It is not easy. I have downloaded a lot of demos that fail in one way or another.
EDIT: I wouldn't mind alternative approaches, provided that they allow us to check a working system against our release DATABASE STRUCTURE AND OBJECTS + DATA in a batch mode.
By the way, our project has been in development for years. Some approaches can be easily implemented when you make a fresh start but seem hard at this point.
EDIT: To understand the problem better, let's say that some users can sometimes make changes to the config data in the production environment. Or developers might create a new field or alter a view without notice in the release branch. I need to be aware of these changes or it will be complicated to merge the changes into production.
So many people try to do this sort of thing (diff schemas). My opinion is
Source code goes into a version control tool (Subversion, CVS, Git, Perforce ...). Treat it as if it was Java or C code; it's really no different. You should have an install process that checks it out and applies it to the database.
DDL IS SOURCE CODE. It goes into the version control tool too.
Data is a grey area - lookup tables maybe should be in a version control tool. Application generated data certainly should not.
The way I do things these days is to create migration scripts similar to Ruby on Rails migrations. Put your DDL into scripts and run them to move the database between versions. Group changes for a release into a single file or set of files. Then you have a script that moves your application from version x to version y.
One thing I never ever do anymore (and I used to do it until I learned better) is use any GUI tools to create database objects in my development environment. Write the DDL scripts from day 1 - you will need them anyway to promote the code to test, production etc. I have seen so many people use the GUIs to create all the objects, and come release time there is a scramble to produce scripts to create/migrate the schema correctly, scripts that are often untested and fail!
Everyone will have their own preference to how to do this, but I have seen a lot of it done badly over the years which formed my opinions above.
Oracle SQL Developer has a "Database Export" function. It can produce a single file which contains all DDL and data.
I use PL/SQL developer with a VCS Plug-in that integrates into Team Foundation Server, but it only has support for database objects, and not with the data itself, which usually is left out of source control anyways.
Here is the link: http://www.allroundautomations.com/bodyplsqldev.html
It may not be as slick as detecting the diffs, however we use a simple ant build file. In our current CVS branch, we'll have the "base" database code broken out into the ddl for tables and triggers and such. We'll also have the delta folder, broken out in the same manner. Starting from scratch, you can run "base" + "delta" and get the current state of the database. When you go to production, you'll simply run the "delta" build and be done. This model doesn't work uber-well if you have a huge schema and you are changing it rapidly. (Note: At least among database objects like tables, indexes and the like. For packages, procedures, functions and triggers, it works well.) Here is a sample ant task:
<target name="buildTables" description="Build Tables with primary keys and sequences">
    <sql driver="${conn.jdbc.driver}" password="${conn.user.password}"
         url="${conn.jdbc.url}" userid="${conn.user.name}"
         classpath="${app.base}/lib/${jdbc.jar.name}">
        <fileset dir="${db.dir}/ddl">
            <include name="*.sql"/>
        </fileset>
    </sql>
</target>
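The "base" + "delta" idea does not depend on Ant. Purely as an illustration, and assuming a directory layout with `base` and `delta` subfolders of numbered .sql files, the same flow could be sketched in Python with SQLite standing in for Oracle; the function names are hypothetical.

```python
import sqlite3
from pathlib import Path


def run_sql_dir(conn, directory):
    """Execute every *.sql file in `directory` in file-name order; return the names run."""
    applied = []
    for script in sorted(Path(directory).glob("*.sql")):
        conn.executescript(script.read_text(encoding="utf-8"))
        applied.append(script.name)
    return applied


def build_from_scratch(conn, root):
    """New database: run the base scripts first, then the deltas."""
    root = Path(root)
    return run_sql_dir(conn, root / "base") + run_sql_dir(conn, root / "delta")


def upgrade_existing(conn, root):
    """Existing (production) database: run only the deltas."""
    return run_sql_dir(conn, Path(root) / "delta")
```

Numeric file-name prefixes such as `010_tables.sql` keep the execution order explicit, since the scripts are simply sorted by name.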
I think this is a case of,
You're trying to solve a problem
You've come up with a solution
You don't know how to implement the solution
so now you're asking for help on how to implement the solution
The better way to get help,
Tell us what the problem is
ask for ideas for solving the problem
pick the best solution
I can't tell what the problem you're trying to solve is. Sometimes it's obvious from the question; this one certainly isn't. But I can tell you that this 'solution' will turn into its own maintenance nightmare. If you think developing the database and the app that uses it is hard, this idea of versioning the entire database in a human-readable form is nothing short of insane.
Have you tried Oracle's Workspace Manager? Not that I have any experience with it in a production database, but I found some toy experiments with it promising.
Don't try to diff the data. Just write a trigger to store whatever-you-want-to-get when the data is changed.
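A minimal sketch of such a trigger, written for SQLite rather than Oracle so it can be run from Python as-is; the `config` and `config_audit` tables are hypothetical, and Oracle trigger syntax differs in detail.

```python
import sqlite3

# Hypothetical config table plus an audit table that a trigger fills in.
AUDIT_DDL = """
CREATE TABLE config (name TEXT PRIMARY KEY, value TEXT);
CREATE TABLE config_audit (
    name       TEXT,
    old_value  TEXT,
    new_value  TEXT,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TRIGGER config_update_audit AFTER UPDATE OF value ON config
BEGIN
    INSERT INTO config_audit (name, old_value, new_value)
    VALUES (OLD.name, OLD.value, NEW.value);
END;
"""


def install_audit(conn):
    """Create the example tables and the trigger that records every value change."""
    conn.executescript(AUDIT_DDL)
```

After `install_audit(conn)`, every UPDATE of `config.value` leaves a row in `config_audit`, so changes made directly in production can be listed with a simple SELECT instead of a diff.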
Expensive though it may be, a tool like TOAD for Oracle can be ideal for solving this sort of problem.
That said, my preferred solution is to start with all of the DDL (including Stored Procedure definitions) as text, managed under version control, and write scripts that will create a functioning database from source. If someone wants to modify the schema, they must, must, must commit those changes to the repository, not just modify the database directly. No exceptions! That way, if you need to build scripts that reflect updates between versions, it's a matter of taking all of the committed changes, and then adding whatever DML you need to massage any existing data to meet the changes (adding default values for new columns for existing rows, etc.) With all of the DDL (and prepopulated data) as text, collecting differences is as simple as diffing two source trees.
At my last job, I had NAnt scripts that would restore test databases, run all of the upgrade scripts that were needed, based upon the version of the database, and then dump the end result to DDL and DML. I would do the same for an empty database (to create one from scratch) and then compare the results. If the two were significantly different (the dump program wasn't perfect) I could tell immediately what changes needed to be made to the update / creation DDL and DML. While I did use database comparison tools like TOAD, they weren't as useful as hand-written SQL when I needed to produce general scripts for massaging data. (Machine-generated code can be remarkably brittle.)
Try RedGate's Source Control for Oracle. I've never tried the Oracle version, but the MSSQL version is really great.

Is there a simple way to reset a MSSQL 2005 database to a predefined state during debugging?

I often find myself in the following situation:
1. The customer reports corrupt data in his database.
2. We investigate the problem, sometimes by trial and error.
3. We try to solve the problem by using our application or one or two repair tools we ship with our product. (Only in bad cases do we write a repair script that addresses one specific problem.)
4. We send a step-by-step procedure to the customer or to customer service to repair the problem.
All steps (except the first) lead to changes of the customer's database. But it is vital, especially in step 4, that everything works with the original database.
Currently we ensure that by working only with copies of it and before every test we switch back to a fresh copy.
That becomes really annoying when there are one-time processes involved. (Things that can only be done once, like creating a specific invoice, processing a specific delivery, ...)
Of course that is a very slow process, especially with large databases.
I've tried the backup feature, but that seems to be even slower than just copying the .mdf file.
Is there any way to quickly revert any changes made after a predefined checkpoint?
Take a look at "Snapshots" as they might be what you're after:
http://www.simple-talk.com/sql/database-administration/sql-server-2005-snapshots/
http://msdn.microsoft.com/en-us/library/ms175876.aspx
The only other option, really, is to take copies of the specific tables you're manipulating and then restore those (copy original table by SELECT INTO, manipulate, drop old, rename copy).
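The table-copy approach can be sketched as follows. This uses SQLite from Python as a stand-in (CREATE TABLE ... AS SELECT is SQLite's counterpart to SELECT INTO), and it deliberately swaps the "drop old, rename copy" step for refilling the original table, because in SQLite a copy made this way does not carry over keys or indexes. The function names and the `_checkpoint` suffix are hypothetical.

```python
import sqlite3


def save_table(conn, table):
    """Copy `table` into `<table>_checkpoint`, replacing any earlier checkpoint."""
    conn.execute(f"DROP TABLE IF EXISTS {table}_checkpoint")
    conn.execute(f"CREATE TABLE {table}_checkpoint AS SELECT * FROM {table}")
    conn.commit()


def restore_table(conn, table):
    """Put the checkpointed rows back, keeping the original table definition."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(f"DELETE FROM {table}")
        conn.execute(f"INSERT INTO {table} SELECT * FROM {table}_checkpoint")
```

Since the checkpoint table is left in place, `restore_table` can be called before every test run, which helps with one-time processes such as creating a specific invoice. Tables linked by foreign keys would need to be saved and restored together.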

Resources