I have a SQL database where everything is in the dbo schema. We now want to copy that schema so the same database contains two schemas with exactly the same content. Is there any easy way to do this in an Azure SQL Database?
(We want to separate our Development and UAT environments but still use only one database.)
While the other answer posted here, using SSIS to transfer the SQL objects, will work, I feel compelled to point out that your approach raises a lot of other concerns.
Using a single database for two environments is not a good practice. The first big issue is how you handle deployments. Let's say UAT is in the uat schema and development is in the dev schema. When you make a change to the Customers table, how do you deploy that change to both schemas? If you use SSIS, you will need an on-premises SSIS server that handles copying the changes to the various schemas in the target database. This creates a large maintenance headache and will likely lead to important changes being wiped out.
Another issue is how your application targets a specific schema. You can have a login default to a specific schema, but many ORM tools want to know the schema ahead of time. This forces you to write the code in a way that could require deploying different code to different environments, which opens up the possibility that parts of the code won't get tested until production.
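For reference, the default-schema mechanism mentioned above is set per database user; a minimal sketch (the user and schema names here are made up for illustration):

```sql
-- Make unqualified object names resolve to the uat schema for this user.
-- 'uat_app_user' is a hypothetical user name.
ALTER USER uat_app_user WITH DEFAULT_SCHEMA = uat;
```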
The last concern I have is that with this approach, versioning your database becomes difficult, and many of the tools out there won't support what you are doing. This means you will likely end up creating custom processes and tools to deploy the database instead of leveraging tools built by vendors like Microsoft or Red Gate. That puts you in a position where you need to support not only the application you make for your customers, but also an application to do your job (basically doubling your work).
My suggestion is to rethink the need to run two environments in a single database. I'm assuming this is driven by cost, in which case the assumption may not hold. Azure has many pricing tiers to support customers with various budgets. Depending on your application workload for both environments, you will likely find you need a large DTU database to support both. You might find that by having two databases you can use smaller DTU tiers, which may end up being cheaper.
Please use the CopySchema option of the Transfer SQL Server Objects Task in SSIS as explained here.
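If you only need a quick one-off copy and SSIS feels like overkill, a rough T-SQL sketch of the idea (SELECT INTO copies the columns and data but not keys, indexes, constraints, or defaults; the table name is illustrative):

```sql
-- Create the second schema, then clone each table's structure and data into it.
CREATE SCHEMA uat;
GO
SELECT * INTO uat.Customers FROM dbo.Customers;
-- Repeat per table; keys, indexes, and constraints must be scripted separately.
```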
There are many questions like this on Stack Overflow, but it seems to me that in those cases the migration scripts are already in place. For example, the insert and update statements are available, so Flyway can just use those scripts to create the tables in the target database along with their data.
However, my question is: what if we don't have those scripts? For example, the tables were created manually or with some other tools, and the data has been inserted over the years by the bound application; now we want to switch to a different SQL database. Can Flyway be used as a tool to transfer all the tables and databases just by providing connections?
If the answer is no, how can this sort of migration be done, and what are the best practices?
I did a search and went through the Flyway documentation, but it is all vague and doesn't give a clear example of this. Some of the tools I found are used for Salesforce, but I need a tool/library that can be used in Java over a JDBC connection, or in other languages such as Python, as our databases - for security reasons - cannot be accessed directly and are cloud based.
For your information, we are using a range of databases: PostgreSQL, Aurora MySQL, and SQL Server.
No, Flyway can't do this sort of thing.
Flyway is a deployment tool. While it certainly can include data movement, as with the deployment of database objects the scripts supporting data movement have to be completely idempotent or completely isolated in their deployment. Neither of these lends itself to what you're talking about.
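For context, what Flyway does consume is versioned migration scripts that it applies in order and records in its history table. A minimal sketch (the table itself is made up):

```sql
-- V1__create_customers.sql  (Flyway's versioned-migration naming convention)
CREATE TABLE customers (
    id   INT PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);
```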
What you're describing is something like Redgate SQL Compare along with SQL Data Compare. Those two would allow you to compare two databases, identify the differences, and then generate the necessary scripts. I'm aware of no open source tools that do all that, especially across multiple data platforms. And SQL Compare only supports SQL Server (there is a second edition for Oracle, but no others).
The thing is, if you're allowing deployments to occur using manual processes or 3rd party mechanisms, without going through source control as centralized management of your code, you can't use Flyway anyway. Flyway requires a consistent and stable process wherein it is the thing running deployments. Allowing, or even encouraging, drift through out-of-band deployments will break your Flyway deployments.
DISCLOSURE: I work for Redgate, but we're not the solution you're looking for.
Every shop at which I've worked has had their own cobbled-together, haphazard, poorly understood and poorly maintained method for updating production databases.
I've never seen a consistent method for doing this.
So, in the most recent versions of SQL Server, what is the best practice for updating schema changes and migrating data from a development or test server to a production server?
Is there a 3rd party tool which handles this painlessly?
I'd imagine the ultimate tool would be able to
detect schema changes between two DBs and generate DDL to update one to the other.
include the ability to have custom code which performs custom data migration steps
allow versioning so a v1 db could be updated all the way to a v99 database, running all scripts and migration steps in order.
The three things I've used are:
For schemas
Visual Studio Database Projects. Meh. They are okay, but you still have to do a lot of the work yourself.
Red Gate's SQL Compare and the entire SQL Toolbelt. They've worked pretty hard to make this something you can version control. In practice I've found with databases you are usually trying to get from point A in the version timeline to point B. With binaries, you often just clobber whatever is there with point B (an oversimplification I know, but often true).
http://www.red-gate.com/
xSQL is a good place to start if your system is small and perhaps will remain small:
http://www.xsqlsoftware.com/LiteEdition.aspx
I don't work for, know anyone who works for, or get any money from these people. Just telling you what I've done in the past.
For data
Red Gate has SQL Data Compare.
However, if you want something "free" (or included with SQL Server)
I've actually had a lot of success just using BCP and writing a small system that injects and extracts data. Generally when I find myself doing this I ask myself, "Why? If I am changing data, does that mean I am really changing something that is configuration? Can I use a different method here?" But sometimes you can't (maybe it's a legacy system where the original devs thought databases are for everything).
The problem with BCP extracts is that they don't version control very well. There are tricks I've used, like extracting in character mode and stuffing an ORDER BY into the extract query, to pull rows out in an order that makes them somewhat more palatable for version control.
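As a sketch of that trick, the query handed to bcp's queryout mode can impose a stable row order so that repeated extracts diff cleanly (table and key names are illustrative):

```sql
-- Deterministic extract: the same data always serializes in the same order,
-- so the extracted file produces sensible diffs under version control.
SELECT CustomerID, Name, Region
FROM dbo.Customers
ORDER BY CustomerID;
```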
For small projects I have used Red Gate to manage schema and data migrations with a lot of success. Very easy to use and works for most cases.
For larger enterprise systems, for schema and data changes you normally save all the SQL scripts as text files and run them. We also include a rollback script to run in case something goes wrong during the migration. Run this on the UAT server, then the test/staging/pre-prod server, then on production. Saving a copy of all these files plus their rollback scripts should allow you to move between multiple versions of a DB.
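A minimal sketch of that convention, with an upgrade script and its paired rollback (the file names and the column are made up):

```sql
-- 023_add_customer_region.sql  (upgrade)
ALTER TABLE dbo.Customers ADD Region nvarchar(50) NULL;

-- 023_add_customer_region_rollback.sql  (run only if the migration goes wrong)
ALTER TABLE dbo.Customers DROP COLUMN Region;
```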
There is also http://code.google.com/p/migratordotnet/ if you're using .NET; it allows you to define these scripts in code. Very useful if you want to deploy across multiple DBs in an automated way. It makes it easy to say "set my DB to version 23" or "revert my DB to version 5", etc. It works for schema and data, but I would only really use it for a few rows of data.
First you have to think that the requirements between scenarios vary a lot:
Customers purchase v1 of the product at Costco and install it in their home office or small business. When v2 comes out, the customer purchases a box of the product and installs it on a new computer, exports the data from the v1 installation, and imports it into the v2 installation. Even though behind the scenes both v1 and v2 use a SQL Express instance, there is no supported in-place upgrade. Schema changes on the deployed databases are not expected (hidden database, non-technical user) and definitely not supported. The only 'upgrade' path supported is an explicit export/import, which probably uses an XML file or something similar.
A business purchases v1 of the product with a support contract. It installs it on its departmental SQL Server instance, from where the data is accessed by the purchased product and by many more integration services, reports, etc. When v2 is released, the customer runs the prescribed upgrade procedure; if it runs into problems, it calls the product vendor's customer support line, which walks the customer through steps specific to that deployment. Database schema customizations are expected and often supported, including in upgrade scenarios, but the schema changes are made by the customer (and are not known at v2 design time).
A web startup has a database that backs its site. Developers make changes on their personal instances and check the changes in. An automated build with continuous integration picks up the changes, deploys them against a test instance, and runs build validation tests. The main branch build can, at any moment, be deployed into production. Production is the one database that backs the site. The structure of the production database is documented and understood 100%; every single change to the production database schema occurs through the build system and QA process. As a side note, this is the scenario most SO users asking your question have in mind, minus the part about '100% documented and understood'. I give the example of a database backing a web site, but the deployment can really be anything. The gist of it is that there is only one production database (it may include HA/DR copies, and it may consist of multiple actual SQL Server databases), and it is the only database that has to be upgraded.
A successful web startup. Same as above, but the production database has 5 TB of data, and 5 minutes of downtime makes the CNN headlines. Schema changes may involve setting up replicas and copying data into new schemas with continuous updates, followed by an online switch of operations to the replica. Schema changes are designed by MCM experts, and deploying a schema change can be a multi-week process.
I could go on with more scenarios. The point is that the requirements of each of these cases are so vastly different that no 'state of the art' can answer all of them. Some scenarios will be perfectly OK with a schema diff deployment tool like vsdbcmd or SQL Compare. Other scenarios will be much better served by explicit versioning scripts. Others might have such specific requirements (e.g. zero downtime) that each upgrade is a project of its own and has to be custom tailored.
One thing is clear across all scenarios, though: if your shop treats the development database MDF file as 'source' and makes changes to it using the management tools, that is always a major #fail. All changes should be captured explicitly as some sort of source control artifact, and this is why I most favor explicit version scripts, as in Version Control and your Database. But I reckon that the VSDB project's support for compile-time schema validation and its ease of refactoring schema objects make a pretty powerful proposition, and VSDB schema compare deployment may be OK.
Another important approach that has to be addressed is code-first schema modeling from tools like EF or LINQ to SQL. It works brilliantly to deploy v1, but fails miserably at any subsequent version. I strongly discourage these approaches.
But to sum up and answer in brief: as of today, the state of the art sucks.
At Red Gate we'd recommend one of two approaches, depending on your requirements and how formal you need your processes to be. If you have a development database and simply want to push changes to production, SQL Compare is the tool for the job. A level of versioning can be achieved by using schema snapshots.
However, if you want full source control benefits, such as team collaboration, sandboxed environments, an audit trail, compliance, history, rollback, etc., you should consider SQL Source Control. It links development databases to Team Foundation Server or Subversion.
I am using GitHub for maintaining versions and code synchronization.
We are a team of two and we are located in different places.
How can we make sure that our databases stay synchronized?
Update:
I am a Rails developer, but these days I'm working on Drupal projects (where the database is the center of variation). So I want to make sure that the team always has a synchronized database, including the values in the various tables.
I need something which keeps our data values synchronized.
A centralized database is a good solution, but things break down when someone works offline.
If you use Visual Studio, then you can script your database tables, views, stored procedures, and functions as .sql files from a database solution and then check those into version control as well; it's what I currently do at my workplace.
If you don't use Visual Studio, then you can still script your SQL as .sql files (with more work) and then version control them as necessary.
Have a look at Red Gate SQL Source Control - http://www.red-gate.com/products/SQL_Source_Control/
To be honest I've never used it, but their other software is fantastic. And if all you want to do is keep the DB schema in sync (rather than full source control), then I have used their SQL Compare product very successfully in the past.
(ps. I don't work for them!)
You can use SQL Source Control together with SQL Data Compare to source control both schema and data. Here is an article from Redgate: Source controlling data.
These are some of the possibilities.
Using the same database. Set up a central database that everybody can connect to. This way you are sure everybody uses the same database all the time.
After every change, export the database and commit it to the VCS. This option requires discipline and manual labor.
Use some other kind of schema definition. For example, Doctrine for PHP can build the database from a YAML definition, which can be stored in the VCS. This is easier to automate than option 2.
Use some other software/script which updates the database.
I feel your pain. I had terrible trouble getting SQL Server to play nicely with SVN. In the end I opted for a shared database solution. Every day I run an extensive script that backs up all our schema definitions (specifically stored procedures) into text files for version control. Due to the limited number of changes, this works well.
I now use this technique for our major project and personal projects too. The only negative is that it relies on being connected all the time. The other answers suggest that full database versioning is very time consuming and I tend to agree. For "live" upgrades we use the Red Gate tools, they do both schema and data compare and it works very well.
http://www.red-gate.com/products/SQL_Data_Compare/. We were using this tool for keeping databases in sync in our company. Later we had some specific demands, so we had to write our own synchronization code. It depends on how complex your database is and how many changes are happening. It is much simpler if there is a time when no one is working and you can lock the database for synchronization.
Check out OffScale DataGrove.
This product tracks changes to the entire DB: schema and data. You can tag versions at any point in time and return to older states of the DB with a simple command. It also allows you to create virtual, separate copies of the same database, so each team member can have their own separate DB. All the virtual copies are tracked in the same repository, so it's super easy to revert your DB to someone else's version (you simply check out their version, just like you do with your source control). This means all your DBs can always be synchronized.
Regarding a centralized DB - just like you don't want to work on the same source code, you don't want to be working on the same DB. It means you'll constantly break each other's code and builds each time someone changes something in the DB.
I suggest that you go with a separate DB for each developer, and sync them using DataGrove.
Disclaimer - I work at OffScale :-)
Try Wizardby. This is my personal project, but I've used it in my several previous jobs with great deal of success.
Basically, it's a tool which lets you specify all changes to your database schema in a database-independent manner and then apply these changes to all your databases.
I am developing a multi-tenant app. I chose the "Shared Database/Separate Schemas" approach.
My idea is to have a default schema (dbo) and when deploying this schema, to do an update on the tenants' schemas (tenantA, tenantB, tenantC); in other words, to make synchronized schemas.
How can I synchronize the schemas of tenants with the default schema?
I am using SQL Server 2008.
The first thing you will need is a table or other mechanism to store the version information of the schema, if nothing else so that you can bind your application and schema together. There is nothing more painful than running a version of the application against the wrong schema: failures, corrupted data, etc.
The application should reject the connection or shut down if the version isn't right. You might get some blowback when it's not right, but it protects you from the really bad day when the database corrupts your valuable data.
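A minimal sketch of such a version table and the startup check (all names are illustrative):

```sql
-- One row per applied schema version; the application checks this at startup.
CREATE TABLE dbo.SchemaVersion (
    Version   int      NOT NULL,
    AppliedOn datetime NOT NULL DEFAULT GETDATE()
);
INSERT INTO dbo.SchemaVersion (Version) VALUES (1);

-- At startup the application runs:
SELECT MAX(Version) FROM dbo.SchemaVersion;
-- and refuses to start if the result doesn't match the version it was built against.
```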
You'll need a way to track changes, such as Subversion or something else. From SQL Server you can export the initial schema; from there you need a mechanism to track changes, using a nice tool like SQL Compare, and match each schema change to an update of the version number in the target database.
We keep each delta in a separate folder beneath the upgrade utility we built. This utility signs onto the server, reads the version info, and then applies the transform scripts for the next version, repeating until it can find no more upgrade scripts in its subfolder. This gives us the ability to upgrade a database, no matter how old it is, to the current version. If there are data transforms unique to a tenant, these are going to get tricky.
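A rough sketch of how an individual delta script can guard itself against running on the wrong version, building on the version table above (the version numbers and column are made up):

```sql
-- Apply the v22 -> v23 transform only if the database is actually at v22.
IF (SELECT MAX(Version) FROM dbo.SchemaVersion) = 22
BEGIN
    ALTER TABLE dbo.Customers ADD LoyaltyTier nvarchar(20) NULL;
    INSERT INTO dbo.SchemaVersion (Version) VALUES (23);
END
```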
Of course you should always make a backup of the database to an external file, preferably with a human-identifiable version number, so you can find it and restore it when the script(s) go bad. And eventually they will, so plan on figuring out how to recover and restore.
I saw there is some sort of schema upgrader tool in the new VS 2010 but I haven't used it. That might also be useful to you.
There is no magic command to synchronize the schemas as far as I know. You would need to use a tool - either built in house or bought (Check out Red Gate's SQL Compare and SQL Examiner - you need to tweak them to compare different schemas).
Just synchronizing can often be tricky business, though. If you added a column, do you also need to fill that column with data? If you split a column into two new columns, there has to be conversion code for something like that.
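For instance, the split-column case needs one-time conversion code along these lines (the column names and the split-on-first-space rule are purely illustrative):

```sql
-- Add the new columns, then backfill them from the old one.
ALTER TABLE dbo.Customers ADD FirstName nvarchar(50) NULL, LastName nvarchar(50) NULL;
GO
UPDATE dbo.Customers
SET FirstName = LEFT(FullName, CHARINDEX(' ', FullName + ' ') - 1),
    LastName  = SUBSTRING(FullName, CHARINDEX(' ', FullName + ' ') + 1, LEN(FullName));
```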
My suggestion would be to very carefully track any scripts that you run against the dbo schema and make sure that they also get run against the other schemas when appropriate. You can then use a tool like SQL Compare as an occasional sanity check to look for any unexpected differences.
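One way to keep the schemas in lockstep is to write each change once and replay it against every tenant schema with dynamic SQL. A hedged sketch (the 'tenant%' naming filter, table, and column are made up):

```sql
-- Replay one DDL change against every tenant schema.
DECLARE @schema sysname, @sql nvarchar(max);
DECLARE tenant_cursor CURSOR FOR
    SELECT name FROM sys.schemas WHERE name LIKE N'tenant%';
OPEN tenant_cursor;
FETCH NEXT FROM tenant_cursor INTO @schema;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = N'ALTER TABLE ' + QUOTENAME(@schema)
             + N'.Customers ADD Notes nvarchar(max) NULL;';
    EXEC sp_executesql @sql;
    FETCH NEXT FROM tenant_cursor INTO @schema;
END;
CLOSE tenant_cursor;
DEALLOCATE tenant_cursor;
```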
I've recently asked a question about how suitable a DVCS is for the corporate environment, and that has sparked another question for me.
One of the plus sides to a DVCS seems to be that you can easily branch and try out new things. My problem starts when I begin to think about database changes. I've always found it tricky to get a DB into a VCS and it just sounds like it's going to be even harder with a DVCS.
So, whats the best way to work with databases and a DVCS?
EDIT: I've started looking into Migrator.NET. What do people think of projects like this for easily moving between versions, specifically with experimental branches in your DVCS?
I think the best way to deal with this issue is to work with DB Schemas, not the databases themselves. In this case, each developer would have their own database to develop against.
Here are some of the options available:
Migrations framework within Ruby on Rails.
South for Django, in addition to the schema being defined in the model classes themselves.
Visual Studio 2008 Team System Database Edition for .NET: You define the schema and the tool can do a diff on schema and data to generate scripts to go between different versions of the database.
These may give you some inspiration on how to deal with putting a database in version control. Another benefit of dealing with schemas is that you can more readily implement TDD and continuous integration (CI). Your TDD/CI environment would be able to build up a new version of the database and then run tests against the newly generated environment.
Version all the scripts you're using to manage your database. If you need to have "in-development" changes to a DB, make them on your personal DB until such time as you "publish" your changes.
Database version control is always the most difficult thing in a multi-developer environment.
Typically each user will have their own DB, which is a chimera of some but not all of the DB changes. When they make changes, they'll need to commit their change scripts. This gets really awkward. The core problems seem to stem from database changes affecting many aspects of the system, from multiple table changes being dependent on each other, and from how to migrate to the new schema from the old schema. Migrating data to a new schema is typically non-trivial. Often you want to default a column when data is copied to the new schema, but NOT default that column in general for INSERTs, say. These are already difficult production deployment issues, and having to manage the database during development, when the database design could be in major flux, in the same way as a major deployment is a lot more work than you usually need to be doing in development. That time could be better spent ensuring that your database is well designed: constraints, foreign keys, etc.
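A sketch of that default-on-copy-but-not-on-INSERT situation (the table and value are illustrative):

```sql
-- Backfill the new column for existing rows, but deliberately attach no
-- DEFAULT constraint: new INSERTs must supply Status explicitly.
ALTER TABLE dbo.Orders ADD Status nvarchar(20) NULL;
GO
UPDATE dbo.Orders SET Status = 'legacy' WHERE Status IS NULL;
```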
Because the developers are more likely to step on each other with database changes, we always had a database chokepoint - the developers all developed against the SAME development database and made their changes "live". Then the dev database was version controlled independently. This is not really easy when people are offsite or whatever. Another alternative is to have designated database developers who coordinate changes several developers need to the same table - that doesn't need to be their entire job, but gives you better DB design consistency. Or you can coordinate database revisions so that people become more aware of the DB revs other people are doing and time their changes to wait until a DB rev is available from another developer.
The best approach is to not put the database into the VCS in binary form. Period.
If you have a text representation of your database, and you have a special merge tool to resolve conflicts when your database is changed in different branches, then you can start thinking about versioning databases. Otherwise it will be a constant pain in the ass.