Hello I am still a newbie for database deployment.
Generally how are changes to a production database deployed for a release?
My client wants an entirely new setup. We have three environments: DEV, INT, PROD. He wants to promote INT to PRODUCTION once QA has certified it. This is fine for the application servers, but because the state of the database matters, it is a problem for the database: we cannot make the INT database the production database unless we sync the production data to integration. Our database is more than 300 GB, so syncing the data would take a long time and cause huge downtime, which is not advisable.
Can you please advise me in this scenario?
The most suitable way I know of to do such a synchronization before deployment is an offline integration using copies of the original data. At first it might seem like a heavy process, but it has the advantages of keeping the original data available (problems can always happen during the sync) and of allowing you to do all the necessary testing with the data before full deployment.
Here are some tips for deploying to a production database:
Always have current backups of your production database. Just in case something goes wrong.
Make sure you use some kind of source control for your scripts. Just like code. Check in scripts, stored procedures, etc.
Always have rollback scripts for any data updates. For example, if you have a script that updates 100 records, write a companion script that copies the affected data somewhere temporarily and can restore any changes you make. It's easy to test this in DEV and INT, and it gives you peace of mind when changing production data.
Always have a recent backup for any schema changes. If you're adding a field to a table, see if you can copy the table to a temp table and then make your changes. Might not always be possible if the table is really large, but again, it lets you quickly rollback in case of an error.
Practice, practice, practice. Practice restoring backups of old production data. Practice running your scripts in DEV and INT. Be ready to re-deploy all stored procedures at any moment.
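The rollback-script tip above can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 standing in for the real server (in T-SQL the copy step would be `SELECT ... INTO`); the table, column, and backup-table names are all hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, status TEXT)")
con.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "trial"), (2, "trial"), (3, "active")])

# Before the data update: copy the rows we are about to touch.
con.execute("CREATE TABLE customers_backup_20240101 AS "
            "SELECT * FROM customers WHERE status = 'trial'")

# The actual production change.
con.execute("UPDATE customers SET status = 'expired' WHERE status = 'trial'")

# Rollback script: restore the saved values, then drop the backup table.
con.execute("""
    UPDATE customers SET status = (
        SELECT b.status FROM customers_backup_20240101 b
        WHERE b.id = customers.id)
    WHERE id IN (SELECT id FROM customers_backup_20240101)
""")
con.execute("DROP TABLE customers_backup_20240101")
```

The backup table is cheap insurance: it is only needed between the update and the point where you are confident the change is good, after which the rollback script drops it.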
Another subject that can be tough that you touched on is having production data in INT. I would regularly restore production database backups to INT and DEV. It's well worth it for QA since it provides them with both the quality of production data and the quantity.
I would advise against turning the INT database into production however. Developers and QA will always put in garbage data for testing and you don't want to make that live.
I'm studying AWS and trying to get a mental picture of the process for a SQL rollback. Let's say I've got a containerized front-end and middle tier deployment + a SQL script that deploys several new stored procedures, modifies a few stored procedures, drops a column in TableX, adds a constraint on TableY, inserts several records to TableZ, etc. A boatload of changes. On Day 1 the front-end and middle tier containers are deployed and the SQL script is applied. All looks good for a few days as clients start working with the updates. Then it's realized on Day 5 that we need to roll things back to pre-Day 1. No problem on the front-end and middle tier deployments as we can just deploy the prior containers. However, we can't just restore the database as it has 4 days of client changes to the data. In this case, does AWS offer a service that can "undo" Day 1's SQL deployment without wiping out all of the clients' subsequent data changes? Or are we still just making sure we have a custom rollback SQL script prepared to return the db objects to their pre-Day 1 state as in years past?
The best approach to something like this is to make sure that a major rollback never reaches your production platform (i.e., it gets caught in dev or staging first). QA and integration testing are your friends.
That said, life doesn't always work out the way you want. So for a major structure change like this (that would need you to retain data entered after the change) an approach I've used with success in the past is to have a mirrored script that undoes the changes in full (as far as possible). You may end up with synthetic data when it comes to restoring a dropped column, but it should get you back to where you started.
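The mirrored-script idea can be sketched like this. It's a hypothetical example using Python's sqlite3 for illustration; the point is that the undo script is written and reviewed at the same time as the deploy script and checked in alongside it, so rolling back on Day 5 is a single command rather than an emergency.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE order_types (name TEXT PRIMARY KEY)")

# Deploy script for the release (hypothetical objects).
DEPLOY = """
CREATE TABLE coupons (code TEXT PRIMARY KEY, pct REAL);
INSERT INTO order_types VALUES ('discounted');
"""

# Mirrored undo script, written and code-reviewed together with DEPLOY.
# (Restoring a dropped column would need synthetic data, as noted above.)
UNDO = """
DELETE FROM order_types WHERE name = 'discounted';
DROP TABLE coupons;
"""

con.executescript(DEPLOY)
# ... release runs for a few days ...
con.executescript(UNDO)
```

Keeping both scripts in the same changeset makes it hard to forget the undo half, and lets you test the round trip (deploy then undo) in DEV before anything touches production.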
Oh, and it goes without saying that you should version everything: table structure, stored procedures etc. It makes isolating and backing out changes a whole lot easier.
In all the years of my experience, I always connected to a database by creating a new connection using IP address, username and password.
I recently joined a company where they use a desktop application written in VB6 with a SQL Server backend. The practice here is to take a backup of the latest database, restore it under a different name, and use that copy for testing purposes.
We now have an issue where we have loads of these databases created by users, and they need cleanup.
My question: is it possible to have a centralised database, hosted remotely, to which everyone connects to get the data? What do we need to keep in mind to achieve this goal, so that everyone has one single database to access and make their changes in?
We've been using a single centralized dev/test environment for over a decade now, with up to 50 full-time developers using it, and I'd say it works quite well. Most of the changes are new columns in tables, and not that many developers work on the same tables/modules at the same time, so it doesn't cause that many issues.
All our stored procedures/functions are renamed for each release separately (by appending a release number), and installed automatically by the compilation process, even for developers. For developer compilations, the version numbers also include the developer's user ID. This way, changing stored procedures in development won't break the test environment or the procedures other developers are using.
The biggest advantage of this is that we can use similarly sized databases for testing and production.
Your ability to do that is really a functional and/or procedural issue. There's nothing technical that prevents you from having a single, shared database for dev/test. The challenge is, dev/test environments tend to be destructive and/or disruptive.
If you have a single DB used for all development and testing requirements, you'll probably get little to no work done. One dev modifying an object (SP, FN, table, view, etc.) can potentially break everyone else (or no one). A tester running stress tests will leave everyone else grumbling about slow responses, timeouts, etc. Someone deciding to test Always Encrypted, or even something simpler like TDE, can end up breaking everybody.
Dev environments almost always need their own sandbox before check-in. Checked in code/schema then get tested in a central environment that mimics prod before going to pre-prod that is (ideally) identical to prod. This is pretty basic though each team/company will have its variations.
One thing you could do right away is to automate taking a copy backup of the prod database so you drop a fresh .bak to a common location where everyone can grab from and restore to their own instance. This reduces the impact on your production system and reduces storage consumption. Another benefit is you remove all non-essential access to your production database - this is really, really important. Finally, once this is standard op, you can implement further controls or tasks in the future easily (e.g. restore to a secure instance, obfuscate/mask sensitive data, take new backup for dev/test use).
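A sketch of that automation, using Python's sqlite3 backup API for illustration (on SQL Server the equivalent would be a scheduled `BACKUP DATABASE ... WITH COPY_ONLY` job dropping the .bak onto a share); the paths and table are made up:

```python
import sqlite3, tempfile, os, datetime

# Stand-in for the production database.
prod = sqlite3.connect(":memory:")
prod.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT)")
prod.execute("INSERT INTO accounts VALUES (1, 'acme')")
prod.commit()

# Drop a dated copy in a common location developers can restore from.
share = tempfile.mkdtemp()
stamp = datetime.date.today().isoformat()
bak_path = os.path.join(share, f"prod_{stamp}.bak")
bak = sqlite3.connect(bak_path)
prod.backup(bak)   # online copy; production stays available throughout
bak.close()
```

Everyone then restores from `bak_path` to their own instance, and nobody needs a login on the production server itself.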
It is possible, but it's usually not a good idea. It would be OK (and no more than OK) if all database access were read-only, but imagine the confusion that could arise if developer A fires in some updates to a table that developer B is writing a report against, or if the DB is recovered in an uncontrolled manner. Development and test need a lot of managing, and how many databases you need, and where, will depend on an analysis of your dev and test requirements.
Thanks for all the answers. We had a discussion in our team and came up with a process that suits our team:
There is a master database backed up and restored from the most recent and stable source
Only the QA team has write access to this database
Developers make their own test database using the Master backup
If new data is required, write SQL scripts to add it
Run unit & E2E tests on their copy
Give the new tests and scripts to add new data (if any) to QA
QA runs the tests and data scripts on the Master
When the tests pass, if there is a SQL update script, QA restores the Master database from the backup (to remove data changes made by running the tests), runs the SQL scripts to update the data, and then backs it up as the new Master
Scripts are added to source control so we have a history
Note: As an extra safeguard we can keep a copy of the very first ever Master database somewhere else. So if anybody ever does something dumb and corrupts it, we can retrieve it and run all the SQL scripts to bring it up to date.
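Steps 3 to 5 of the process above can be sketched roughly like this, with Python's sqlite3 and plain file copies standing in for the real server; all file and table names here are made up for illustration:

```python
import sqlite3, shutil, tempfile, os

workdir = tempfile.mkdtemp()
master = os.path.join(workdir, "master.bak")

# The QA-owned master backup (stand-in).
con = sqlite3.connect(master)
con.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("INSERT INTO products VALUES (1, 'widget')")
con.commit()
con.close()

# A developer makes their own test database from the Master backup.
dev_db = os.path.join(workdir, "dev_alice.db")
shutil.copyfile(master, dev_db)

# New data is added via a SQL script -- the script is what later goes to
# QA and into source control, never an edit to the Master itself.
NEW_DATA = "INSERT INTO products VALUES (2, 'gadget');"
dev = sqlite3.connect(dev_db)
dev.executescript(NEW_DATA)
```

The key property is that the Master is never touched by developers; it only changes when QA replays an approved script and re-saves the backup.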
We have the following problem: our customer has the live database. Sometimes we face bugs that are caused by the live data; those bugs don't appear in our staging and development databases because they are usually related to the actual data.
So, for accurate debugging, we need the same copy of the live data in another database. This database should be synchronized with the live database (either automatically or on demand), so that we can replicate the erroneous scenarios without impacting the actual data. How can we do that? Is it better to create this "semi-mirror" in staging itself? As a final note, I don't want changes in the "semi-mirror" database to be reflected on the live one; only from live to the "semi-mirror".
By that definition you have no staging database. Staging should reflect the real world, so it should contain real-world data (and size) and run on a similar system.
Your customer should take a backup and you load it into staging. Do that regularly (weekly, monthly, after updates) to make sure you stay in sync. Standard procedure in every project I have been on that worked well.
My development team of four people has been facing this issue for some time now:
Sometimes we need to be working off the same set of data. So while we develop on our local computers, the dev database is connected to remotely.
However, sometimes we need to run operations on the db that will step on other developers' data, i.e., we break associations. For this, a local db would be nice.
Is there a best practice for getting around this dilemma? Is there something like an "SCM for data" tool?
In a weird way, keeping a text file of SQL insert/delete/update queries in the git repo would be useful, but I think this could get very slow very quickly.
How do you guys deal with this?
You may find my question How Do You Build Your Database From Source Control useful.
Fundamentally, effective management of shared resources (like a database) is hard. It's hard because it requires balancing the needs of multiple people, including other developers, testers, project managers, etc.
Often, it's more effective to give individual developers their own sandboxed environment in which they can develop and unit test without affecting other developers or testers. This isn't a panacea, though, because you now have to provide a mechanism to keep these multiple separate environments in sync with one another over time. You need to make sure that developers have a reasonable way of picking up each other's changes (data, schema, and code). This isn't necessarily easier. Good SCM practice can help, but it still requires a considerable level of cooperation and coordination to pull off. Not only that, but providing each developer with their own copy of an entire environment can introduce storage costs and require additional DBA resources to manage and oversee those environments.
Here are some ideas for you to consider:
Create a shared, public "environment whiteboard" (it could be electronic) where developers can easily see which environments are available and who is using them.
Identify an individual or group to own database resources. They are responsible for keeping track of environments, and helping resolve the conflicting needs of different groups (developers, testers, etc).
If time and budgets allow, consider creating sandbox environments for all of your developers.
If you don't already do so, consider separating developer "play areas", from your integration, testing, and acceptance testing environments.
Make sure you version control critical database objects - particularly those that change often like triggers, stored procedures, and views. You don't want to lose work if someone overwrites someone else's changes.
We use local developer databases and a single, master database for integration testing. We store creation scripts in SCM. One developer is responsible for updating the SQL scripts based on the "golden master" schema. A developer can make changes as necessary to their local database, populating as necessary from the data in the integration DB, using an import process, or generating data using a tool (Red Gate Data Generator, in our case). If necessary, developers wipe out their local copy and can refresh from the creation script and integration data as needed. Typically databases are only used for integration testing and we mock them out for unit tests so the amount of work keeping things synchronized is minimized.
I recommend that you take a look at Scott Allen's views on this matter. He wrote a series of blog posts which are, in my opinion, excellent.
Three Rules for Database Work,
The Baseline,
Change scripts,
Views, stored procs etc,
Branching and Merging.
I use these guidelines more or less, with personal changes and they work.
In the past, I've dealt with this several ways.
One is the SQL Script repository that creates and populates the database. It's not a bad option at all and can keep everything in sync (even if you're not using this method, you should still maintain these scripts so that your DB is in Source Control).
The other (which I prefer) was having a single instance of a "clean" dev database on the server that nobody connected to. When developers needed to refresh their dev databases, they ran a SSIS package that copied the "clean" database onto their dev copy. We could then modify our dev databases as needed without stepping on the feet of other developers.
We have a database maintenance tool that creates/updates our tables and our procs. We have a server with an up-to-date database populated with data.
We keep local databases that we can play with as we choose, but when we need to go back to "baseline" we get a backup of the "master" from the server and restore it locally.
If/when we add columns/tables/procs, we update the dbMaintenance tool, which is kept in source control.
Sometimes it's a pain, but it works reasonably well.
If you use an ORM such as NHibernate, create a script that generates both the schema and the data in the LOCAL development database of your developers.
Improve that script during the development to include typical data.
Test on a staging database before deployment.
We replicate the production database to a UAT database for the end users. That database is not accessible to developers.
It takes less than a few seconds to drop all tables, create them again, and inject test data.
If you are using an ORM that generates the schema, you don't have to maintain the creation script.
Previously, I worked on a product that was data warehouse-related, and designed to be installed at client sites if desired. Consequently, the software knew how to go about "installation" (mainly creation of the required database schema and population of static data such as currency/country codes, etc.).
Because we had this information in the code itself, and because we had pluggable SQL adapters, it was trivial to get this code to work with an in-memory database (we used HSQL). Consequently we did most of our actual development work and performance testing against "real" local servers (Oracle or SQL Server), but all of the unit testing and other automated tasks against process-specific in-memory DBs.
We were quite fortunate in this respect that if there was a change to the centralised static data, we needed to include it in the upgrade part of the installation instructions, so by default it was stored in the SCM repository, checked out by the developers and installed as part of their normal workflow. On reflection this is very similar to your proposed DB changelog idea, except a little more formalised and with a domain-specific abstraction layer around it.
This scheme worked very well, because anyone could build a fully working DB with up-to-date static data in a few minutes, without stepping on anyone else's toes. I couldn't say if it's worthwhile if you don't need the install/upgrade functionality, but I would consider it anyway because it made the database dependency completely painless.
What about this approach:
Maintain a separate repo for a "clean db". The repo will contain a SQL file with table creates, inserts, etc.
Using Rails (I'm sure this could be adapted for any git repo), maintain the "clean db" as a submodule within the application. Write a script (a rake task, perhaps) that runs the SQL statements against a local dev db.
To clean your local db (and replace with fresh data):
git submodule init
git submodule update
then
rake dev_db:update ......... (or something like that!)
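The rake task's job, wiping the local db and replaying the checked-in SQL, can be sketched in a few lines. This uses Python's sqlite3 for illustration, and the "clean db" SQL content is a made-up example of what the submodule's file might contain:

```python
import sqlite3

# Contents of the "clean db" SQL file kept in the submodule.
CLEAN_DB_SQL = """
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
INSERT INTO users VALUES (1, 'seed@example.com');
"""

def reset_local_db(con, clean_sql):
    # Drop every existing table, then replay the clean-db script.
    tables = [r[0] for r in con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for t in tables:
        con.execute(f'DROP TABLE "{t}"')
    con.executescript(clean_sql)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE junk (x)")   # leftover local experiments
reset_local_db(con, CLEAN_DB_SQL)
```

Because the SQL file lives in version control, `git submodule update` plus one reset command gives every developer the same known-good starting state.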
I've done one of two things. In both cases, developers working on code that might conflict with others run their own database locally, or get a separate instance on the dev database server.
Similar to what #tvanfosson recommended, you keep a set of SQL scripts that can build the database from scratch, or
On a well defined, regular basis, all of the developer databases are overwritten with a copy of production data, or with a scaled down/deidentified copy of production, depending on what kind of data we're using.
I would agree with all that LBushkin has said in his answer. If you're using SQL Server, we've got a solution here at Red Gate that should allow you to easily share changes between multiple development environments.
http://www.red-gate.com/products/sql_source_control/index.htm
If there are storage concerns that make it hard for your DBA to allow multiple development environments, Red Gate has a solution for this. With Red Gate's HyperBac technology you can create virtual databases for each developer. These appear to be exactly the same as ordinary database, but in the background, the common data is being shared between the different databases. This allows developers to have their own databases without taking up an impractical amount of storage space on your SQL Server.
We have been storing our staging database on the production database server with the mindset that it makes sense to be as identical to production as possible.
Lately, some comments have made me question that idea. Since there is some remote chance that I will do something to production by mistake it may make sense to not put both on the same server.
Should my Staging database really live on the same server as my development database and not the same server as production?
Ideally you would want to have a separate staging environment that mirrors your live environment, but doesn't actually exist on it. However, $$$ doesn't always permit this, so the ideal isn't always followed.
This includes (but may not be limited to) the following:
Web servers
Database servers
Application servers
And anything on those machines (physical or virtual) should be isolated in their respective environments, so you shouldn't see staging code on a production server, and similarly you shouldn't see a staging database on a production database server. They should be separate.
Also, if you use a high amount of bandwidth internally you may want to even isolate the networks, to prevent the staging environment bandwidth usage from saturating the production environment's bandwidth.
In my book a staging environment should be independent because it lets you rehearse the roll-out procedures for a new release. If you are on the same box or same virtual machine you aren't getting the "full" experience of library updates and the like.
Personally I like virtual machines because I can pull production back to stage and then update it. This means that my update is very realistic, because all of the edge case data, libraries and such are being reproduced. This is a good thing... I can't count the number of times over the 9 year history of our major product that a library module wasn't included or some update script for the database hit edge cases that weren't detected in development and testing environments.
As far as touching the production environment... I would say never do this if there is an alternative. Update a shared library in staging that also impacts production and you will feel the pain. Update the code and cause your web server to go into a tizzy, and you've brought (at least part of) your live environment down.
If you have to fake it, I would recommend sharing with the development environments, and just realize that updating production may cause unexpected downtime while you validate that everything works. We had to do that for the first few years for budgetary reasons, and it can work as long as you don't just update production and walk away.
In summary
Production is sacrosanct: don't share any non-production aspects if you can avoid it.
Virtual machines are your friend: they let you clone working environments and update them with nearly zero risk (just copy the VM file over any botched update attempts).
Staging should be isolated from development to avoid overconfidence with your update routine.
Whichever solution you might choose in the end, I would say : keep your production server for production, and production only !
If you put something non-production on it, there is the risk of mistakes, of course, as you said... But there is also the risk of bugs: what if your application goes mad and uses all the CPU of the server, for instance? Your production might suffer from it.
And that's just an example, of course ;-)
In my opinion, the best solution would be to have another server for staging, with a setup that is as close as possible (a real "clone" would be the best) to the production setup.
Considering this might cost quite a bit for a machine used by only a few testers, it's often not possible :-( An alternative I've seen is to use a Virtual Machine (hosted on your development server, not the production one): it acts like a "real" machine, on which you can do whatever you want, without impacting either prod or dev.
And, if necessary, you can use several Virtual Machines to get closer to your production settings.
Your staging DB should never be on the same server as production. I'd say its fine to have it on the same server as your dev server.
There are a number of things that could go wrong:
Manipulating data on the wrong DB
Doing something that could actually bring down the server. You may need to reboot your DB server during development and testing.
As a rule I don't think developers should have access to the live environment. Only operations should have access.
As others have said, keeping non-production entities in your production environment should be avoided like the plague. There are too many possibilities for developers to mistakenly add or modify something upon which your production environment depends. Our production server is modified only during deployment. We track every file that's changed and have a mechanism in place to roll back changes with minimal effort.
Keep staging in your dev environment if you can't get dedicated hardware.
Having a staging database on the production server is risky. However, with sufficiently rigorous debugging and testing stages, the actual risk to production is minimal. This is especially true if the load from staging is minimal.
If you don't have dedicated hardware for Development, Staging, and Production, then keeping your Staging database on the Development SQL Server is a common solution. It's much safer than having your Staging database on the Production server, trying to do something with the Staging database, and taking down the Production SQL Server.