Managing databases in Open Source Software Projects

Managing databases in Open Source Software Projects - database

I was wondering how databases are managed in open source projects which are usually hosted in repositories like CVS or SVN. Placing codes in the SVN is very logical as it allows different team members to get updated pieces of code but how about databases?
Are their schemas and contents (.sql files I assume) placed inside the SVN too? In this case if I were creating a web application, would this require developers to keep on updating their local databases with the newest .sql file?
Or, is it more like having a central server which members can modify and their software just connects to over the net?
I'm planning to start an open source web application project (which requires the use of a database) but am a bit confused of how to go about the database management part.

Typically you would include one or two things:
Schema creation scripts
Initial data to be loaded into the database
Both of these should be text files. If your data needs any special processing before it can be loaded into the DBMS, then include a tool for that too.
One thing you should not do is include any particular binary database file in your source control. For example, a SQLite database file would not be appropriate. Binary database files are not normally portable across architectures or versions.

In my experience these types of applications usually include a database set up included with the build. Usually you have to install the DB where you need it then install the client. Also, usually these databases are also open source like MySQL or something.

This depends on the project you are doing. For example if all the developers are in same location in one company, the central database server may be applicaple. If the developers are distributed around the world, then the central database server is probably out of the question and every developer creates his own copy of database for development.
I would think that most common option is that every developer uses his own database.
In any case you'll want to keep the schema creation and initial data creation files in version control. This way all the developers can create a new database easily.

Related

Creating a generic update/install script from a Sql Server Database Project

Can you create a generalized deployment script from a Sql Server Db Project in VS 2015 that doesn't require a schema compare / publish against a specific target database?
Some background:
We are using Sql Server Database projects to manage our database schema. Primarily we are using the projects to generate dacpacs that get pushed out to our development environments. They also get used for brand new installations of our product. Recently we have developed an add-on to our product and have created a new db project for it, referencing our core project. For new installations of our product where clients want the add-on, our new project will be deployed.
The problem we are having is that we need to be able to generate a "generic" upgrade script. Most of our existing installations were not generated via these projects and all contain many "custom" stored procedures/etc specific to that client's installation. I am looking for a way to generate a script that would do an "If Not Exists/Create + Alter" without needing to specify the target database.
Our add-on project only contains stored procedures and a couple tables, all of which will be new to any client opting for this add-on. I need to avoid dropping items not in the project while being able to deploy all of our new "stuff". I've found the option to Include Composite Objects which I can uncheck so that the deployment is specific to our add-on, but publishing still requires me to specify a target database so that a schema compare can be performed and I get scripts that are specific to that particular database. I've played with pretty much every option and cannot find a solution.
Bottom Line: Is there a way for me to generate a generic script that I can give to my deployment team whenever the add-on is requested on an existing install without needing to do a schema compare or publish for each database directly from the project?
Right now I am maintaining a separate set of .sql files in our (non db) project following the if not exists/create+alter paradigm that match the items in the db project. These get concatenated during build of our add on so that we can give our deployment team a script to run. This is proving to be cumbersome and we'd like to be able to make use of the database projects for this, if at all possible.

Best solution is to give the dacpacs to your installers. They run SQLPackage (maybe through a batch file or PowerShell) to point it at the server/DB to update. It would then generate the script or update directly. Sounds like they already have access to the servers so should be able to do this. SQLPackage should also be included on the servers or it can be run locally for the installer as long as they can see the target DB. This might help: schottsql.wordpress.com/2012/11/08/ssdt-publishing-your-project
There are a couple of examples of using PowerShell to do this, but it depends on how much you need to control DB names or Server names. A simple batch file where you edit/replace the Server/DB Names might suffice. I definitely recommend a publish profile and if this is hitting customer databases they could have modified, setting the "do not drop if not in project" options that show up is almost essential. As long as your customers haven't made wholesale changes to core objects, you should be good to go.

How to version control SQL Server database with Visual Studio's Git Source Control Provider

I have Git Source Control Provider setup and running well.
For Visual Studio projects, that is.
The problem is that each such project is tightly tied to a SQL Server database.
I know how to version control a database in its own .git repository but this is neither convenient nor truly robust because ideally I would want the same ADD, COMMIT, TAG and BRANCH commands to operate on both directory trees simultaneously, in a synchronized manner.
Is there a way to Git SQL Server database with Visual Studio's Git Source Control Provider in the manner I described?

You can install the SQL Server Data Tools if you want to, but you don't have to: You can Use the Database Publishing Wizard to script your table data right from Visual Studio into the solution's folder, then Git it just like you do with the other project files in that folder.

You can store your database schema as Visual studio project using SQL Server Data Tools and then version control this project using Git.

Being in the database version control space for 5 years (as director of product management at DBmaestro) and having worked as a DBA for over two decades, I can tell you the simple fact that you cannot treat the database objects as you treat your Java, C# or other files and save the changes in simple DDL scripts.
There are many reasons and I'll name a few:
Files are stored locally on the developer’s PC and the change s/he
makes do not affect other developers. Likewise, the developer is not
affected by changes made by her colleague. In database this is
(usually) not the case and developers share the same database
environment, so any change that were committed to the database affect
others.
Publishing code changes is done using the Check-In / Submit Changes /
etc. (depending on which source control tool you use). At that point,
the code from the local directory of the developer is inserted into
the source control repository. Developer who wants to get the latest
code need to request it from the source control tool. In database the
change already exists and impacts other data even if it was not
checked-in into the repository.
During the file check-in, the source control tool performs a conflict
check to see if the same file was modified and checked-in by another
developer during the time you modified your local copy. Again there
is no check for this in the database. If you alter a procedure from
your local PC and at the same time I modify the same procedure with
code form my local PC then we override each other’s changes.
The build process of code is done by getting the label / latest
version of the code to an empty directory and then perform a build –
compile. The output are binaries in which we copy & replace the
existing. We don't care what was before. In database we cannot
recreate the database as we need to maintain the data! Also the
deployment executes SQL scripts which were generated in the build
process.
When executing the SQL scripts (with the DDL, DCL, DML (for static
content) commands) you assume the current structure of the
environment match the structure when you create the scripts. If not,
then your scripts can fail as you are trying to add new column which
already exists.
Treating SQL scripts as code and manually generating them will cause
syntax errors, database dependencies errors, scripts that are not
reusable which complicate the task of developing, maintaining,
testing those scripts. In addition, those scripts may run on an
environment which is different from the one you though it would run
on.
Sometimes the script in the version control repository does not match
the structure of the object that was tested and then errors will
happen in production!
There are many more, but I think you got the picture.
What I found that works is the following:
Use an enforced version control system that enforces
check-out/check-in operations on the database objects. This will
make sure the version control repository matches the code that was
checked-in as it reads the metadata of the object in the check-in
operation and not as a separated step done manually. This also allow
several developers to work in parallel on the same database while
preventing them to accidently override each other code.
Use an impact analysis that utilize baselines as part of the
comparison to identify conflicts and identify if a difference (when
comparing the object's structure between the source control
repository and the database) is a real change that origin from
development or a difference that was origin from a different path
and then it should be skipped, such as different branch or an
emergency fix.
Use a solution that knows how to perform Impact Analysis for many
schemas at once, using UI or using API in order to eventually
automate the build & deploy process.
An article I wrote on this was published here, you are welcome to read it.

Proper structure of asp.net website and database in visual studio

My main problem is where does database go?
The project will be on SVN and is developed using asp.net mvc repository pattern. Where do I put the sql server database (mdf file)? If I put it in app_data, then my other team mates can check out the source and database and run it with the database being deployed in the vs instance.
The problem with this method are:
I cannot use SQL Management Studio with this database.
Most web hosts require me to deploy the database using their UI or SQL Management studio. Putting it in App Data will make no sense.
Connection String has to be edited each time I'm moving from testing locally to testing on the web host.
If I create the database using SQL Management studio, my problems are:
How do I keep this consistent with the source control (team mates have to re-script the db if the schema changes).
Connection string again. (I'd like to automatically use the string when on production server).
Is there a solution to all my problems above? Maybe some form of patterns of tools that I am missing?

Basically your two points are correct - unless you're working off a central database everyone will have to update their database when changes are made by someone else. If you're working off a central database you can also get into the issues where a database change is made (ie: a column dropped), and the corresponding source code isn't checked in. Then you're all dead in the water until the source code is checked in, or the database is rolled back. Using a central database also means developers have no control over when databsae schema changes are pushed to them.
We have the database installed on each developer's machine (especially good since we target different DBs, each developer has one of the supported databases giving us really good cross platform testing as we go).
Then there is the central 'development' database which the 'development' environment points to. It is build by continuous integration each checkin, and upon successful build/test it publishes to development.
Changes that developers make to the database schema on their local machine need to be checked into source control. They are database upgrade scripts that make the required changes to the database from version X to version Y. The database is versioned. When a customer upgrades, these database scripts are run on their database to bring it up from their current version to the required version they're installing.
These dbpatch files are stored in the following structure:
./dbpatches
./23
./common
./CONV-2345.dbpatch
./pgsql
./CONV-2323.dbpatch
./oracle
./CONV-2323.dbpatch
./mssql
./CONV-2323.dbpatch
In the above tree, version 23 has one common dbpatch that is run on any database (is ANSI SQL), and a specific dbpatch for the three databases that require vendor specific SQL.
We have a database update script that developers can run which runs any dbpatch that hasn't been run on their development machine yet (irrespective of version - since multiple dbpatches may be committed to source control during a single version's development).
Connection strings are maintained in NHibernate.config, however if present, NHibernate.User.config is used instead, however NHibernate.User.config is ignored from source control. Each developer has their own NHibernate.User.config, which points to their local database and sets the appropriate dialects etc.
When being pushed to development we have a NAnt script which does variable substitution in the config templates for us. This same script is used when going to staging as well as when doing packages for release. The NAnt script populates a templates config file with variable values from the environment's settings file.

Use management studio or Visual Studios server explorer. App_Data isn't used much "in the real world".
This is always a problem. Use a tool like SqlCompare from Redgate or the built in Database Compare tools of Visual Studio 2010.
Use Web.Config transformations to automatically update the connection string.

I'm not an expert by any means but here's what my partner and I did for our most recent ASP.NET MVC project:
Connection strings were always the same since we were both running SQL Server Express on our development machines, as were our staging and production servers. You can just use a dot instead of the computer name (eg. ".\SQLEXPRESS" or ".\SQL_Named_Instance").
Alternatively you could also use web.config transformations for deploying to different machines.
As far as the database itself, we just created a "Database Updates" folder in the SVN repository and added new SQL scripts when updates needed to be made. I always thought it was a good idea to have an organized collection of database change scripts anyway.

A common solution to this type of problem is to have the database versioning handled in code rather than storing the database itself in version control. The code is typically executed on app_start but could be triggered in other ways (build/deploy process). Then developers can run their own local databases or use a shared development database. The common term for this is called database migrations (migrating from one version to the next). Here is a stackoverflow question for .net tools/libraries to make this easier: https://stackoverflow.com/questions/8033/database-migration-library-for-net
This is the only way I would handle this on projects with multiple developers. I've used this successfully with teams of over 50 developers and it's worked great.

The Red Gate solution would be to use SQL Source Control, which integrates into SSMS. Its maintains a sql scripts folder structure in source control, which you can keep in the same folder/ respository that you keep your app code in.
http://www.red-gate.com/products/SQL_Source_Control/

Sub version for database (I want something for data values in the database, not for the schema)

I am using github for maintaining versions and code synchronization.
We are team of two and we are located at different places.
How can we make sure that our databases are synchronized.
Update:--
I am rails developer. But these days i m working on drupal projects (where database is the center of variations). So i want to make sure that team must have a synchronized database. Also the values in various tables.
I need something which keep our data values synchronized.
Centralized database is a good solution. But things get disturbed when someone works offline

if you use visual studio then you can script your database tables, views, stored procedures and functions as .sql files from a database solution and then check those into version control as well - its what i currently do at my workplace
In you dont use visual studio then you can still script your sql as .sql files [but with more work] and then version control them as necessary

Have a look at Red Gate SQL Source Control - http://www.red-gate.com/products/SQL_Source_Control/
To be honest I've never used it, but their other software is fantastic. And if all you want to do is keep the DB schema in sync (rather than full source control) then I have used their SQL Compare product very succesfully in the past.
(ps. I don't work for them!)

You can use Sql Source Control together with Sql Data Compare to source control both: schema and data. Here is an article from redgate: Source controlling data.

These are some of the possibilities.
Using the same database. Set-up a central database where everybody can connect to. This way you are sure everybody uses the same database all the time.
After every change, export the database and commit it to the VCS. This option requires discipline and manual labor.
Use some kind of other definition of the schema. For example, Doctrine for php has the ability to build the database from a yaml definition which can be stored in the vcs. This can be easier automated then point 2.
Use some other software/script which updates the database.

I feel your pain. I had terrible trouble getting SQL Server to play nice with SVN. In the end I opted for a shared database solution. Every day I run an extensive script to backup all our schema definitions (specifically stored procedures) for version control into text files. Due to the limited number of changes this works well.
I now use this technique for our major project and personal projects too. The only negative is that it relies on being connected all the time. The other answers suggest that full database versioning is very time consuming and I tend to agree. For "live" upgrades we use the Red Gate tools, they do both schema and data compare and it works very well.

http://www.red-gate.com/products/SQL_Data_Compare/. We were using this tool for keeping databases in sync in our company. Later we had some specific demands so we had to write our own code for synchronization. Depends how complex is you database and how much changes is happening. It is much simpler if you have time when no one is working and you can lock database for syncronization.

Check out OffScale DataGrove.
This product tracks changes to the entire DB - schema and data. You can tag versions in any point in time, and return to older states of the DB with a simple command. It also allows you to create virtual, separate, copies of the same database so each team member can have his own separate DB. All the virtual copies are tracked into the same repository so it's super-easy to revert your DB to someone else's version (you simply check-out their version, just like you do with your source control). This means all your DBs can always be synchronized.
Regarding a centralized DB - just like you don't want to work on the same source code, you don't want to be working on the same DB. It means you'll constantly break each other's code and builds each time someone changes something in the DB.
I suggest that you go with a separate DB for each developer, and sync them using DataGrove.
Disclaimer - I work at OffScale :-)

Try Wizardby. This is my personal project, but I've used it in my several previous jobs with great deal of success.
Basically, it's a tool which lets you specify all changes to your database schema in a database-independent manner and then apply these changes to all your databases.

Transferring changes from a dev DB to a production DB

Say I have a website and a database of that website hosted locally on my computer (for development) and another database hosted (for production)...ie first I do the changes on the dev db and then I do the changes to the prod DB.
What is the best way to transfer the changes that I did on the local database to the hosted database?
If it matters, I am using MS Sql Server (2008)

The correct way to do this with Visual Studio and SQL Server is to add a Database Project to the web app solution. The database project should have SQL files that can recreate the entire database completely on a new server along with all the necessary tables, procedures users and roles.
That way, they are included in the source control for all the rest of the code as well.
There is a Changes sub-folder in the Database Project where I put the SQL files that apply any new alterations or additions to the database for subsequent versions.
The SQL in the files should be written with proper "if exists" blocks such that it can be run safely multiple times on an already updated database without error.
As a rule, you should never make your changes directly in the database - instead modify the SQL script in the project and apply it to the database to make sure your source code (the SQL files) is always up to date.

We do this in the (Ruby on) Rails world by writing "migrations," which capture the changes you make to the DB structure at each point. These are run with a migration tool (a task for rake), which also writes to a DB table so it knows whether a particular migration has been run or not.
You could make a structure like this for your dev platform (.Net?), but I think that in other answers to this question people will suggest available tools for handling database versioning in your development platform, or perhaps for your specific DB.
I don't know any of these, but check out this list. I see a lot of pay things out there, but there must be something free. Also check this out.

I migrate changes via change scripts written by developers when they have tested/verified their changes. (The exception being moving large data.) All scripts are stored in a Source control system. and can be verified by DBAs.
It is manual, sometime time consuming but effective, safe and controled process.
Databases are too vital to copy from dev.
There are tools to help create/verify these scripts.
See http://www.red-gate.com/
I have used their tools to compare 2 databases to create scripts.
Brian

If the changes are small, I sometimes make them by hand. For larger changes, I use Red Gate's SQL Compare to generate change scripts. These are hand-verified and run in the QA environment first to make sure they don't break anything. For large changes, we run a special backup prior to making the change both in QA and in production.

We used to use the approach provided by Ron. It makes sense for a big project with dedicated team of DBAs. But if you do not have a dedicated developers who write code only for DB this approach is time and resource expensive.
The approach to use RedGate DB compare is also not good. You still have a do a lot of manual work you can skip some step by mistake.
It needs something better. This is was the reason why we built the "Agile DB Recreation/Import/Reverse/Export tool"
The tool is free.
Advantages: your developers use any prefered tools to develop DEV DB. Then they run the DB RIRE and it makes reverseengeniring DB (tables, views, stor proc, etc) and export data into XML files. XML files you can keep in the any code repository system.
And the second step is to run DB RIRE one more time to generate difference scripts between structure and data in XML files and in Production DB.
Of course you can make as much iterations as you need.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight