How can I maintain production databases in a developer team? - database-management

So we have many clients using our software (with databases) on many different servers. We also have several developers all making changes to these databases daily.
When developers update their code daily, their database needs to be updated with other developers changes.
And when clients update their software, the databases need to be updated.
We are currently using our own solution to this problem, but it seems like an issue many teams would be facing.
Is there a commercial/OS solution to this?

Related

SQL Server development environment for a small team - best practice needed

We are a small team of 5 working on a same database. It's a reporting solution so there is about 5 more 3rd party databases which is source of data.
It is very important to have latest data for development, so sometimes those linked databases backed up and restored on each dev's local SQL Server(an they pretty big). Then there is always a problem with dev's databases being out of sync from each other. When it was 2 of us, there was no problem. But when more people got added to the team - it starts to be a pain to arrange.
So, I was thinking about building a dev SQL Server. It will be more powerful then laptops (faster queries) and latency should not be an issue. We always have internet when working and that is not a problem either. I only wonder if it's OK to work together out of the same database. Sounds like a good plan, but not sure if there will be any "gotchas" comparing having databases locally. We will be able to keep data fresh for everyone easy. And I think it should save time and making easier to build/rebuild dev machines if needed. I even think that database where we code can be 1 per person, but still on a same box. This way we can always compare them "right on a box" if needed.
Any advices on this setup? Pro's or Con's?
A new option is to use isolated containers on a shared server, with clones of the production database environment. Full disclosure, I work for Windocks, where this is the primary use of SQL Server containers. The most recent release includes built-in SQL Server db cloning. Benefits of this approach include speed (environments deliverd in <1 minute), and license and labor savings by going to a shared server rather than a score of VMs.

Updating the production IIS WebSite and SQL Server Database without stopping

There is a testing server that uses the testing database. We test the website on the testing server. If it is okay, we update the website and the database schema from the testing server to the production server. But this method is very painful and risky.
First, we have to redirect the users to a maintenance page, so the website is paused for a while.
Second, if something goes wrong when updating, we have to back to old website, because we can't put the website in a maintenance mode for a long time.
So I'm seeking a solid solution to update an IIS website and an Sql Server Database without data loss and using maintenance mode. Is there any way to do this? How the big websites do this without data loss and pausing.
We've thought using a release candidate website. We've planned to use this RC website for temporary. First, we update the RC site, then swap the bindings between RC and production website. But this time the database is problem. Because we can change the database schema, and the old one can't work with new database. So, if we use a temp site with temp db, there will be data loss. Also, the updated website won't work with old database if the temp site uses old production database. So I need a solid and practical solution for this problem.
This is orders of magnitude more complicated than you imagine. This specifically is not about HA nor about contiguous integration. Neither of those will provide what you need, they're only pieces of the much more complex puzzle.
There simply isn't possible to write code changes in a manner that is transparent/oblivious to schema changes as they occur. At best you can write the code in a manner that supports the schema at v. N and at v. N+1, which in itself is a big challenge. But is impossible to write the code in a manner that supports the schema as it transitions from v. N to v. N+1. The schema change induced by a deployment has to be atomic for the code operating on the schema. Since the schema change itself cannot be atomic, it follow that the upgrade has two possible avenues:
take the code offline during the schema change. This is what you're doing now and is the safest approach. Of course, it implies service availability down time and runs the risks you already experienced (rollback of failed upgrade, lengthy upgrade, etc). A variant of this approach is to redirect the service to a read-only copy of the data and offer a degraded service experience(no changes are possible during the downtime) which may or may not be acceptable, depending on the business specifics.
standby upgrade. This implies that you take a snapshot of the service data (various HA solutions may provide a standby snapshot out-of-the-box, eg. log shipping). Upgrade the snapshot, then apply all the transactions that occurred on the real service data to the upgraded snapshot. This is always tricky, because it requires a technology to detect, capture and apply the changes (eg. change tracking, replication, custom solution etc) and requires to transform each change to the new, upgraded, schema. Once the upgraded schema is up to date with changes from the main service, the service can be redirected to the upgraded schema. This redirection is also much more complex than it sounds. For one choosing the moment when to cut-off the old schema and stop accepting new changes, while making sure all changes were applied to the new upgraded schema DB is a challenge in itself. Another challenge is to resolve the conflict of the code understanding pre-upgrade and post-upgrade schema versions. Developing code that handles both is, as I said, problematic and error prone, so one solution is to, again, take the service offline for a short period and replace the code. Another solution is to have a standby service, running code that handles the post-upgrade DB schema and is connected to the post-upgrade DB, and redirect the live requests to your standby, upgraded, service.
And I did not even touch the thorny subject of service interaction, when a particular service of a much larger deployed solution has to be upgraded. This is when service API protocol back compatibility plays the major role to allow the post-upgrade service to play along with its peer services.
Ultimately there just isn't any silver bullet. I've witnessed single machine large DB deployments that took weeks to roll out version N+1, with transcriptional replication contiguously feeding the post-upgrade DB schema with changes from the pre-upgrade DB. And I witnessed deployments of thousands of machines deploying version N+1 in stages, as a complicated dance of enabling code and data changes over the course of several days to reach the full functionality of the post-upgrade. This problem is just plain hard.
This is what Azure is fantastic for. The Azure Cloud platform allows for staging servers and production servers. You can set it up so once you commit your changes to Git or TFS it is pushed to either a staging or production server automatically. You can also set up to manually push changes. Most of the ORM libraries like Entity Framework have migration support.
There is a lot more information out there on this topic like:
Azure seamless upgrade when database schema changes
Staging or Production Instance?
Youre describing a high availability solution (HA). HA solutions are expensive and can quickly be overkill. you would need redundancy for your app and db servers, which means setting up a db cluster. All this increases the amount of time you would spend deploying changes, but the tradeoff is that your app is always available.
The main thing with deployment is having a repeatable process. So my best recommendation is to script or automate as much as possible.

How do you manage databases during development?

My development team of four people has been facing this issue for some time now:
Sometimes we need to be working off the same set of data. So while we develop on our local computers, the dev database is connected to remotely.
However, sometimes we need to run operations on the db that will step on other developers' data, ie we break associations. For this a local db would be nice.
Is there a best practice for getting around this dilemma? Is there something like an "SCM for data" tool?
In a weird way, keeping a text file of SQL insert/delete/update queries in the git repo would be useful, but I think this could get very slow very quickly.
How do you guys deal with this?
You may find my question How Do You Build Your Database From Source Control useful.
Fundamentally, effective management of shared resources (like a database) is hard. It's hard because it requires balancing the needs of multiple people, including other developers, testers, project managers, etc.
Often, it's more effective to give individual developers their own sandboxed environment in which they can perform development and unit testing without affecting other developers or testers. This isn't a panacea though, because you now have to provide a mechanism to keep these multiple separate environments in sync with one another over time. You need to make sure that developers have a reasonable way of picking up each other changes (both data, schema, and code). This isn't necesarily easier. A good SCM practice can help, but it still requires a considerable level of cooperation and coordination to pull it off. Not only that, but providing each developer with their own copy of an entire environment can introduce costs for storage, and additional DBA resource to assist in the management and oversight of those environments.
Here are some ideas for you to consider:
Create a shared, public "environment whiteboard" (it could be electronic) where developers can easily see which environments are available and who is using them.
Identify an individual or group to own database resources. They are responsible for keeping track of environments, and helping resolve the conflicting needs of different groups (developers, testers, etc).
If time and budgets allow, consider creating sandbox environments for all of your developers.
If you don't already do so, consider separating developer "play areas", from your integration, testing, and acceptance testing environments.
Make sure you version control critical database objects - particularly those that change often like triggers, stored procedures, and views. You don't want to lose work if someone overwrites someone else's changes.
We use local developer databases and a single, master database for integration testing. We store creation scripts in SCM. One developer is responsible for updating the SQL scripts based on the "golden master" schema. A developer can make changes as necessary to their local database, populating as necessary from the data in the integration DB, using an import process, or generating data using a tool (Red Gate Data Generator, in our case). If necessary, developers wipe out their local copy and can refresh from the creation script and integration data as needed. Typically databases are only used for integration testing and we mock them out for unit tests so the amount of work keeping things synchronized is minimized.
I recommend that you take a look at Scott Allen´s views on this matter. He wrote a series of blogs which are, in my opinion, excellent.
Three Rules for Database Work,
The Baseline,
Change scripts,
Views, stored procs etc,
Branching and Merging.
I use these guidelines more or less, with personal changes and they work.
In the past, I've dealt with this several ways.
One is the SQL Script repository that creates and populates the database. It's not a bad option at all and can keep everything in sync (even if you're not using this method, you should still maintain these scripts so that your DB is in Source Control).
The other (which I prefer) was having a single instance of a "clean" dev database on the server that nobody connected to. When developers needed to refresh their dev databases, they ran a SSIS package that copied the "clean" database onto their dev copy. We could then modify our dev databases as needed without stepping on the feet of other developers.
We have a database maintenance tool that we use that creates/updates our tables and our procs. we have a server that has an up-to-date database populated with data.
we keep local databases that we can play with as we choose, but when we need to go back to "baseline" we get a backup of the "master" from the server and restore it locally.
if/when we add columns/tables/procs we update the dbMaintenance tool which is kept in source control.
sometimes, its a pain, but it works reasonably well.
If you use an ORM such as nHibernate, create a script that generate both the schema & the data in the LOCAL development database of your developers.
Improve that script during the development to include typical data.
Test on a staging database before deployment.
We do replicate production database to UAT database for the end users. That database is not accessible by developers.
It takes less than few seconds to drop all tables, create them again and inject test data.
If you are using an ORM that generates the schema, you don't have to maintain the creation script.
Previously, I worked on a product that was data warehouse-related, and designed to be installed at client sites if desired. Consequently, the software knew how to go about "installation" (mainly creation of the required database schema and population of static data such as currency/country codes, etc.).
Because we had this information in the code itself, and because we had pluggable SQL adapters, it was trivial to get this code to work with an in-memory database (we used HSQL). Consequently we did most of our actual development work and performance testing against "real" local servers (Oracle or SQL Server), but all of the unit testing and other automated tasks against process-specific in-memory DBs.
We were quite fortunate in this respect that if there was a change to the centralised static data, we needed to include it in the upgrade part of the installation instructions, so by default it was stored in the SCM repository, checked out by the developers and installed as part of their normal workflow. On reflection this is very similar to your proposed DB changelog idea, except a little more formalised and with a domain-specific abstraction layer around it.
This scheme worked very well, because anyone could build a fully working DB with up-to-date static data in a few minutes, without stepping on anyone else's toes. I couldn't say if it's worthwhile if you don't need the install/upgrade functionality, but I would consider it anyway because it made the database dependency completely painless.
What about this approach:
Maintain a separate repo for a "clean db". The repo will be a sql file with table creates/inserts, etc.
Using Rails (I'm sure could be adapted for any git repo), maintain the "clean db" as a submodule within the application. Write a script (rake task, perhaps) that queries a local dev db with the SQL statements.
To clean your local db (and replace with fresh data):
git submodule init
git submodule update
then
rake dev_db:update ......... (or something like that!)
I've done one of two things. In both cases, developers working on code that might conflict with others run their own database locally, or get a separate instance on the dev database server.
Similar to what #tvanfosson recommended, you keep a set of SQL scripts that can build the database from scratch, or
On a well defined, regular basis, all of the developer databases are overwritten with a copy of production data, or with a scaled down/deidentified copy of production, depending on what kind of data we're using.
I would agree with all the LBushkin has said in his answer. If you're using SQL Server, we've got a solution here at Red Gate that should allow you to easily share changes between multiple development environments.
http://www.red-gate.com/products/sql_source_control/index.htm
If there are storage concerns that make it hard for your DBA to allow multiple development environments, Red Gate has a solution for this. With Red Gate's HyperBac technology you can create virtual databases for each developer. These appear to be exactly the same as ordinary database, but in the background, the common data is being shared between the different databases. This allows developers to have their own databases without taking up an impractical amount of storage space on your SQL Server.

Getting the production database to work with locally

I'm nearing the stage of saying "lets go live" of a client's system I've been working on for the past few months, basically a few autonomous services including a public facing website, an intranet website, an hourly exporter from a legacy OLTP DB based on modified flags/triggers etc.. and few others services that compose the system, with each their own specialized database etc.. , communicating with each other via messages (NServiceBus).
When I started I tried to keep a local replication of everything but its proving more and more difficult and on reflection probably a major friction point of the past few weeks, I like to keep regularly up to date as the legacy database is growing and causing hundreds events daily. Having high latency & mediocre bandwidth (between myself and the client's site, I'm in SE Asia where bandwidth is generally crap anyway) is also an issue for RDP, SQL tools, remote connection strings etc.. Tracking down integration bugs and understanding scenarios they present during feedback/integration/QA is also difficult as my data doesn't reflect the current state of the client's DB (the clients' staff have been working and evolving the data their end) and means another break, coffee and lengthy sync again. It would be ideal to do it all locally and then deploy at the end but I have to deliver parts incrementally (to get the check), and some parts are even in use (although not critical) so bug fixing on in use pats needs turn around quick, and with it being a small company, incremental feedback, it helps flush out some of the more vague requirements along the way (curse me).
I was thinking it would be good to have twice daily sync between the environments (their DBs to mine), I somewhat have design control over everything apart from the legacy SQL server database.
What are the best options SO users?
I was thinking of setting up a Windows 2003 light VM on my dev box. And in this install the same setup of the client sites (but not spread across multiple servers obviously). And then for syncing the databases I was thinking about SQL Server replication? or batch scripts? Or is there any better tools - ones that are fast and good compressions? I don't want my changes to go back to production (I have a separate CI & deployment procedure), I just want (I think I want.. tell me if better idea) my databases to be refreshed every night or twice a day (maybe whilst I'm at lunch bandwidth permitting).
How does everyone approach this?
I would recommend two ways to do this:
Snapshot Replication
Backing up the transaction log and manually (or batch) applying it
Snapshot replication can be difficult to get working, but it is possible even in offline situations such as physcially carrying snapshots to another location.
The transaction log method can be used as part of your standard backup procedures. ie: A full backup twice a week with transaction log backups more regularly.
Remember that best practice is to cleanse the data before using it in a test environment. At the very least this should be changing all personal data, especially email addresses, passwords and any other method which could result in some automated process making contact with the user in your database.

How would you migrate hundreds of MS Access databases to a central service?

We have literally 100's of Access databases floating around the network. Some with light usage and some with quite heavy usage, and some no usage whatsoever. What we would like to do is centralise these databases onto a managed database and retain as much as possible of the reports and forms within them.
The benefits of doing this would be to have some sort of usage tracking, and also the ability to pay more attention to some of the important decentralised data that is stored in these apps.
There is no real constraints on RDBMS (Oracle, MS SQL server) or the stack it would run on (LAMP, ASP.net, Java) and there obviously won't be a silver bullet for this. We would like something that can remove the initial grunt work in an automated fashion.
We upsize (either using the upsize wizard or by hand) users to SQL server. It's usually pretty straight forward. Replace all the access tables with linked tables to the sql server and keep all the forms/reports/macros in access. The investment in access isn't lost and the users can keep going business as usual. You get reliability of sql server and centralized backups. Keep in mind - we’ve done this for a few large access databases, not hundreds. I'd do a pilot of a few dozen and see how it works out.
UPDATE:
I just found this, the sql server migration assitant, it might be worth a look:
http://www.microsoft.com/sql/solutions/migration/default.mspx
Update: Yes, some refactoring will be necessary for poorly designed databases. As for how to handle access sprawl? I've run into this at companies with lots of technical users (engineers esp., are the worst for this... and excel sprawl). We did an audit - (after backing up) deleted any databases that hadn't been touched in over a year. "Owners" were assigned based the location &/or data in the database. If the database was in "S:\quality\test_dept" then the quality manager and head test engineer had to take ownership of it or we delete it (again after backing it up).
Upsizing an Access application is no magic bullet. It may be that some things will be faster, but some types of operations will be real dogs. That means that an upsized app has to be tested thoroughly and performance bottlenecks addressed, usually by moving the data retrieval logic server-side (views, stored procedures, passthrough queries).
It's not really an answer to the question, though.
I don't think there is any automated answer to the problem. Indeed, I'd say this is a people problem and not a programming problem at all. Somebody has to survey the network and determine ownership of all the Access databases and then interview the users to find out what's in use and what's not. Then each app should be evaluated as to whether or not it should be folded into an Enterprise-wide data store/app, or whether its original implementation as a small app for a few users was the better approach.
That's not the answer you want to hear, but it's the right answer precisely because it's a people/management problem, not a programming task.
Oracle has a migration workbench to port MS Access systems to Oracle Application Express, which would be worth investigating.
http://apex.oracle.com
So? Dedicate a server to your Access databases.
Now you have the benefit of some sort of usage tracking, and also the ability to pay more attention to some of the important decentralised data that is stored in these apps.
This is what you were going to do anyway, only you wanted to use a different database engine instead of NTFS.
And now you have to force the users onto your server.
Well, you can encourage them by telling them that you aren't going to overwrite their data with old backups anymore, because now you will own the data, and you won't do that anymore.
Also, you can tell them that their applications will run faster now, because you are going to exclude the folder from on-access virus scanning (you don't do that to your other databases, which is why they are full of sql-injection malware, but these databases won't be exposed to the internet), and planning to turn packet signing off (you won't need that on a dedicated server: it's only for people who put their file-share on their domain-server).
Easy upgrade path, improved service to users, greater centralization and control for IT. Everyone's a winner.
Further to David Fenton's comments
Your administrative rule will be something like this:
If the data that is in the database is just being used by one user, for their own work (alone), then they can keep it in their own network share.
If the data that is in the database is for being used by more than one person (even if it is only two), then that database must go on a central server and go under IT's management (backups, schema changes, interfaces, etc.). This is because, someone experienced needs to coordinate the whole show or we will risk the time/resources of the next guy down the line.

Resources