Need good scheme/workflow for managing database objects using Subversion - sql-server

How do you track/manage your stored procedures, views, and functions in SQL Server?
I'd like to use Subversion, but it looks like I would have to just save and commit the CREATE/ALTER statements. That might work okay for me, but I suspect I'd end up doing a lot of nagging to get everyone else to keep committing their changes.
Is anyone using versioning with their databases? Is there a better way?
In the past, people have just commented out parts of the code and left it in. Or, they add little "added on 2/31/2010" comments all over. It drives me nuts, because I know there is a better way.
We do log changes in the object's header, but that's pretty limited. It would make my life easier to be able to diff versions.
Additional Info
We are using SQL Server 2005. I have Subversion (via VisualSVN Server) and TortoiseSVN installed, but I'm open to other suggestions.
By database objects, I specifically mean stored procedures, views, and functions.
There are only a few tables I would need to track. The database is the backend for a commercial application, and we mostly pull information out for reporting.
I found a related question about stored procedure versioning.

We script everything and put it into Subversion. Nothing can be loaded to Prod without a script (developers do not have rights to prod) and the people with rights on prod only accept scripts they loaded from Subversion.

We version our database: schema creation, DW, ETL, and stored procedures, just like any other piece of code, because it's code!
I have also seen people type dates in headers, etc. This is normally due to them completely missing the point of revision control.

Have a look at Liquibase.
It manages your SQL changes/scripts for you, and can apply them in conjunction with SVN via hooks or scripts. It makes doing all sorts of setup easy, and helps eliminate the case of the missing trigger/sproc/etc...
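To give a sense of what that looks like, Liquibase can read a changelog written as a plain SQL file; here is a minimal sketch under that assumption (the author/ID, procedure, and dbo.Orders table are invented for illustration):

--liquibase formatted sql

--changeset alice:1 splitStatements:false
CREATE PROCEDURE dbo.usp_MonthlySales
AS
BEGIN
    -- illustrative query: summarise sales by month
    SELECT YEAR(OrderDate) AS SalesYear, MONTH(OrderDate) AS SalesMonth, SUM(Total) AS Sales
    FROM dbo.Orders
    GROUP BY YEAR(OrderDate), MONTH(OrderDate);
END;
--rollback DROP PROCEDURE dbo.usp_MonthlySales;

Running liquibase update then applies any changesets the target database has not yet seen, which is what eliminates the missing-sproc problem.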

I'm not sure what you all mean by "database objects". Do you mean only the tables, views, procedures, etc., or also data, i.e. data created daily?
Assuming you mean the database schema definition: in my experience there is only one way to handle database schema definitions (if you don't have NHibernate or some similar tool). You write SQL scripts that create your database from scratch and check them in. You use the same scripts for installation of your software. You see the differences by just comparing the script files.
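As a minimal sketch of such a from-scratch script (the table and view are invented for illustration), two revisions of this file diff cleanly in Subversion:

CREATE TABLE dbo.Customer (
    CustomerID int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Name nvarchar(100) NOT NULL,
    IsActive bit NOT NULL DEFAULT (1)
);
GO

CREATE VIEW dbo.ActiveCustomers
AS
    SELECT CustomerID, Name
    FROM dbo.Customer
    WHERE IsActive = 1;
GO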

Whenever I've gone through this exercise, it's come down to three main things that need to be source-controlled:
Stored Procedures / Views / Triggers (more or less anything that can be fairly expressed as "code"). These are fairly simple: include a conditional drop and create at the top of the file (see the sketch after this list).
Table Schema - DROP / CREATE statements as above. You can try to get fancy with ALTER statements, but it tends to get really messy.
The biggest challenge we faced was that this forces you into a system where your DB often goes back to an initial state; if there's a fair amount of work involved in bringing DBs to something usable/testable, it can be a pain. In that case we kept a library of scripts that brought a DB to various usable states, and source-controlled those as well.
Data within tables. We looked at a couple of approaches here - either a series of INSERT statements stored in a file like "TableName_Data.sql" or a CSV file with custom build tooling that parsed and inserted when the DB was rebuilt.
Ultimately we went with the INSERT statements for simplicity's sake.
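A minimal sketch of the conditional drop-and-create pattern from the first point, plus a small data file per the third (all object names are invented for illustration; SQL Server 2005 has no CREATE OR ALTER, hence the conditional drop):

IF OBJECT_ID(N'dbo.usp_GetCustomerOrders', N'P') IS NOT NULL
    DROP PROCEDURE dbo.usp_GetCustomerOrders;
GO

CREATE PROCEDURE dbo.usp_GetCustomerOrders
    @CustomerID int
AS
BEGIN
    SELECT OrderID, OrderDate, Total
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID;
END
GO

-- OrderStatus_Data.sql: reference data reloaded on every rebuild
DELETE FROM dbo.OrderStatus;
INSERT INTO dbo.OrderStatus (StatusID, Name) VALUES (1, 'Open');
INSERT INTO dbo.OrderStatus (StatusID, Name) VALUES (2, 'Shipped');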

Related

Group Stored Procedures based on Module in SQL Server 2008 [duplicate]


Generating database scripts for SQL Server database versioning

In the scope of responsible programming and versioning, I would like to start versioning my database changes, especially since I am developing on my database instance and then moving the changes to production. I haven't found anything that truly makes sense to me on how to do this. I am using Visual Studio 2010 Pro as my IDE. Is there a document that makes this process simple and able to detect changes to the database with relative ease? Or what should I change in my workflow to make this easier?
One way that I've successfully done this sort of thing in the past is via SQL Source Control. Visual Studio does not offer this functionality for you.
Alternatively, you can use SSMS to generate the database scripts for you and save them as a file; then you can check in the script. You would choose whether to generate the whole DB script in one file or to do it on an object-by-object basis. The syncing part will have to be done by you, by executing your scripts in production. In conclusion, a total nightmare.
Red Gate also offers SQL Compare, which is great for syncing databases. Take a look at their products if you or your company can afford them.
We use our own DB solution in-house which brings all the tools required for proper DB versioning. While I realize that it may not be a perfect solution for everyone, I invite you to have a look at it (it is open-source): bsn ModuleStore
The versioning aspect is as follows: the tool can script out the SQL semi-automatically, and it reformats the source code into a uniform format. The files will therefore always be identical for the same source, no matter when or by whom something was scripted; this therefore works nicely with non-locking source control systems (especially SVN, Git, or Mercurial).
The reformat puts all statements in the same form (e.g. optional keywords such as AS, INNER, OUTER etc. are dealt with), scripts everything to the "dbo" schema (even if it was in a different one), puts all identifiers into square brackets ([something]), uppercases all reserved words, handles the indentation, etc.
Besides versioning, the runtime part of the tool can diff the running DB against the CREATE scripts (the DB source code) and apply updates automatically for all non-destructive changes (e.g. updating indexes, constraints, views, stored procedures, triggers, custom types, new tables etc.). Destructive changes have to be scripted manually (table changes, which then usually require data transformations). The runtime makes sure that all updates are performed in a transaction and rolls back if the resulting DB doesn't match the CREATE scripts, so you get the safety of knowing that the DB is exactly on the version required by the application, even if it has been tampered with manually.
Also, multiple "modules" can be used in a single database. Each module is stored as a schema and independent of other schemas, thereby making it possible to add or remove modules from one single DB, and avoiding the need to create multiple databases for different parts of the application. Also, the use of schemas to do this makes sure that there are no name collisions.
It may be worth noting that the toolset has no dependency on SMO; it is autonomous.
Save your database scripts in SVN. Here is a reference: How to use SVN Tortoise.
OR
Save your database scripts in VSS. Here is a reference: What is VSS? How can we use it?
In both cases you can keep track of the changes made, so that in the future you can check the history, which is saved in the form of versions.
You can also use a Red Gate product.
EDIT
How do you pull out what has changed?
Use the comparison feature to check the changes made in previous versions.
How do I apply the changes to the live database server?
Download the latest file from the server.
I hope you are not using DROP statements for tables in your consolidated script, as that would delete all records from the tables.
DROP statements are fine for stored procedures, views, functions, etc.
Please note that you have to run the complete latest database script file on the production server with the action plan below:
1. Remove DROP statements for schema DDL.
2. Add DROP/CREATE statements for stored procedures and views.
3. Include ALTER statements for schema changes.
Hope this helps you.

Sub version for database (I want something for data values in the database, not for the schema)

I am using GitHub for maintaining versions and code synchronization.
We are a team of two, located at different places.
How can we make sure that our databases are synchronized?
Update:
I am a Rails developer, but these days I'm working on Drupal projects (where the database is the center of variation), so I want to make sure the team has a synchronized database, including the values in the various tables.
I need something which keeps our data values synchronized.
A centralized database is a good solution, but things get disturbed when someone works offline.
If you use Visual Studio, you can script your database tables, views, stored procedures, and functions as .sql files from a database solution and then check those into version control as well; it's what I currently do at my workplace.
If you don't use Visual Studio, you can still script your SQL as .sql files (with a bit more work) and then version control them as necessary.
Have a look at Red Gate SQL Source Control - http://www.red-gate.com/products/SQL_Source_Control/
To be honest I've never used it, but their other software is fantastic. And if all you want to do is keep the DB schema in sync (rather than full source control), then I have used their SQL Compare product very successfully in the past.
(P.S. I don't work for them!)
You can use SQL Source Control together with SQL Data Compare to source control both schema and data. Here is an article from Red Gate: Source controlling data.
These are some of the possibilities.
Using the same database. Set up a central database that everybody can connect to. This way you are sure everybody uses the same database all the time.
After every change, export the database and commit it to the VCS. This option requires discipline and manual labor.
Use some other kind of schema definition. For example, Doctrine for PHP can build the database from a YAML definition which can be stored in the VCS. This can be automated more easily than option 2.
Use some other software/script which updates the database.
I feel your pain. I had terrible trouble getting SQL Server to play nice with SVN. In the end I opted for a shared database solution. Every day I run an extensive script to backup all our schema definitions (specifically stored procedures) for version control into text files. Due to the limited number of changes this works well.
I now use this technique for our major project and personal projects too. The only negative is that it relies on being connected all the time. The other answers suggest that full database versioning is very time consuming and I tend to agree. For "live" upgrades we use the Red Gate tools, they do both schema and data compare and it works very well.
http://www.red-gate.com/products/SQL_Data_Compare/. We were using this tool for keeping databases in sync in our company. Later we had some specific demands, so we had to write our own code for synchronization. It depends on how complex your database is and how many changes are happening. It is much simpler if there is a time when no one is working and you can lock the database for synchronization.
Check out OffScale DataGrove.
This product tracks changes to the entire DB - schema and data. You can tag versions at any point in time, and return to older states of the DB with a simple command. It also allows you to create virtual, separate copies of the same database so each team member can have his own separate DB. All the virtual copies are tracked in the same repository, so it's super-easy to revert your DB to someone else's version (you simply check out their version, just like you do with your source control). This means all your DBs can always be synchronized.
Regarding a centralized DB - just like you don't want to work on the same source code, you don't want to be working on the same DB. It means you'll constantly break each other's code and builds each time someone changes something in the DB.
I suggest that you go with a separate DB for each developer, and sync them using DataGrove.
Disclaimer - I work at OffScale :-)
Try Wizardby. This is my personal project, but I've used it in several previous jobs with a great deal of success.
Basically, it's a tool which lets you specify all changes to your database schema in a database-independent manner and then apply these changes to all your databases.

Since SQL Server doesn't have packages, what do programmers do to get around it?

I have a SQL Server database that has a huge proliferation of stored procedures. Large numbers of stored procedures are not a problem in my Oracle databases because of the Oracle "package" feature.
What do programmers do to get around the lack of a "package" feature like that of Oracle?
While SQL Server has nothing to offer by way of the "cool features" of encapsulation and package state like you are used to, you can organize your stored procedures into schemas.
In Enterprise Manager, these procs are still all listed together, which makes for a HUGE treelist if you have hundreds of procs. I too miss the organization and cool features of Oracle packages. However, all platforms have their strengths.
NOTE: Writing stored procedures in a .NET language DOES give you encapsulation and state. However, it still does not separate them in the EM treeview in any special way.
Come up with a good naming convention, use it, and enforce it.
Schemas may be used to organize stored procedures and other objects. Personally, I prefer to use schemas when they organize objects by functional area, and where those functional areas correspond to security boundaries. An example of this is found in the AdventureWorks sample database, which has schemas like "HumanResources" and "Sales". The theory is that a given user may need access to objects in "HumanResources", but may not need access to "Sales" information.
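As a rough sketch of the schema-as-security-boundary idea (the role, procedure, and Employee table are invented for illustration):

CREATE SCHEMA HumanResources AUTHORIZATION dbo;
GO

CREATE PROCEDURE HumanResources.usp_GetEmployee
    @EmployeeID int
AS
    SELECT EmployeeID, FirstName, LastName
    FROM HumanResources.Employee
    WHERE EmployeeID = @EmployeeID;
GO

-- HR staff can execute everything in the HumanResources schema,
-- but get no rights on objects in the Sales schema.
CREATE ROLE HRUsers;
GRANT EXECUTE ON SCHEMA::HumanResources TO HRUsers;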
An alternative is to use a naming convention and enforce it, as James says above. I'll add that SQL Server Management Studio has a filter button that can be used to filter the list of objects displayed. For instance, one can click on the "Stored Procedures" folder and filter on Name contains "Add".
On my current project, I have pulled a number of SQL queries out of SSIS packages and into stored procedures. In order to distinguish between these stored procedures and those that should be of general use, I have prefixed the names with "ssis". It would certainly have been more pleasant if I could have created something similar to a namespace in C# or C++, and created "SSIS.SelectUserLookupData" instead of "ssis_SelectUserLookupData". It would be even nicer if these namespaces could be nested.
If this is one of the features of Packages in Oracle, then perhaps someone would let me know.
I've worked with both SQL Server and Oracle, so I have seen the good and bad of both. As the above comments have been a bit heated, I'll try to keep this as neutral as possible...
So, what's an Oracle Package? Think of it like a database class.
The Package has two elements: a header file and a body file. The header file is your public interface, and contains the signature (name, params and return type if applicable) of all the stored procedures or functions (in Oracle a function returns a value, a stored proc doesn't) that are directly callable. The package body must implement all the procedure signatures in the package header file.
The body element of the package contains all the stored procs and logic that actually do the work. You may have a Save procedure declared in the package header that calls an insert or update proc that exists in the body. The developer can only see the "Save" proc. It's important to keep in mind that the package body can also implement procs or functions not declared in the package header, they're just not accessible outside of the package itself.
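To make the public/private split concrete, here is a minimal PL/SQL sketch (the package, table, and columns are invented for illustration): only save_order appears in the header, so the insert/update helpers cannot be called from outside the package.

CREATE OR REPLACE PACKAGE orders_pkg AS
  -- public interface: the only procedure callers can see
  PROCEDURE save_order(p_order_id IN NUMBER, p_total IN NUMBER);
END orders_pkg;
/

CREATE OR REPLACE PACKAGE BODY orders_pkg AS
  -- private: not declared in the header, so only callable inside the package
  PROCEDURE insert_order(p_order_id IN NUMBER, p_total IN NUMBER) IS
  BEGIN
    INSERT INTO orders (order_id, total) VALUES (p_order_id, p_total);
  END;

  PROCEDURE update_order(p_order_id IN NUMBER, p_total IN NUMBER) IS
  BEGIN
    UPDATE orders SET total = p_total WHERE order_id = p_order_id;
  END;

  PROCEDURE save_order(p_order_id IN NUMBER, p_total IN NUMBER) IS
    l_count PLS_INTEGER;
  BEGIN
    SELECT COUNT(*) INTO l_count FROM orders WHERE order_id = p_order_id;
    IF l_count = 0 THEN
      insert_order(p_order_id, p_total);
    ELSE
      update_order(p_order_id, p_total);
    END IF;
  END;
END orders_pkg;
/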
I found packages to be really useful for a number of reasons:
You've got the concept of a public interface that can be provided to other developers
Packages can mirror your compiled classes. My Orders.Save() C# method will call my Oracle Orders.SaveLineItem method to save each line item and an Oracle SaveOrder method to save the order summary details.
My procs are grouped together in a nice, logical way inside the packages
Personally, I would love MS to implement some kind of package functionality, as I think it makes for a cleaner database.
One additional feature of packages that was not mentioned is the ability to 'wrap' the body. The header is always public and can be viewed by anyone with permissions to execute the package. But that also allows them to view the code in the body. You can wrap the body, encrypting it, and prevent anyone from seeing what the code is actually doing. It's a nice feature where security is a big issue.
The best argument against Oracle packages is that, based on experience and research on the Ask Tom site, one can't update a package without taking it offline. This is unacceptable. With SQL Server, we can update stored procedures on the fly, without interrupting production operation.
I understand the frustration of this statement, but I would not call it "unacceptable". In a true production environment, changes should never be tested in production. Updates should be moved from a test environment to production in a scheduled and orderly manner. In a 24/7 system, a redundant production environment should handle downtime while servers are updated. Not only does the package have to be taken offline, but the new package, if not compiled, will fail when placed back online. There is a DBA element required for Oracle databases. However, I do miss Oracle packages.
It is somewhat funny to see how emotional one can get over such a dry subject.
The fact that Oracle has a feature that SQL Server does not seems to generate all kinds of reactions towards the disputable characteristics of this feature.
For starters, the question was in the style: there is this feature in Oracle that is being missed in SQL Server and what is the recommended approach.
No need to get emotional about it.
For those who do not like the packaging feature in Oracle, for whatever reason, they can still go about it the same way one does with SQL Server.
Getting more into the detail, there could be a follow-up question in the style: when modifying a function or procedure within a package, the entire package is invalidated, and this "sucks"; what would be the recommended way to avoid the sucking aspect?
Personally, I have never seen anyone complaining about not being able to modify a statically linked library in an executable without having to relink it.
Like people have said, schemas are a more logical and ANSI-compliant way to organize database tables and procedures.
Software engineering best practices are that we should never make a change directly on any server. Since all database sprocs are scripted and under configuration control, we can arrange those scripts into any folder structure we want.
(outdated info sourced from AskTom has been removed)
I would thank my lucky stars that SQL Server doesn't have packages. Oracle packages suck.
Hmm, we need a way to take all these procedures and put them in one place. I know! Let's make developers create and maintain two files for each package. They will love us forever!
As long as MS never implements packages like Oracle did, it'll be a win in my book.
EDIT for commenters:
Oracle Packages are simply a way to organize your stored procedures into, well, packages so that you don't have 100 stored procedures sitting around, but maybe 5 packages. They're not stackable like packages in Java or C# code. All packages are at the same level.
A package requires two files: the header file and the body file. This creates frustration when adding new procedures to an existing package, because you cannot add the body without adding the header, even though the header contains the exact same information as the body.
For example, here is a snippet from the header file of one of my packages:
PROCEDURE bulk_approve_events
(
i_last_updated_by IN VARCHAR2,
o_event OUT NUMBER
);
And here's the corresponding procedure in the body:
PROCEDURE bulk_approve_events
(
i_last_updated_by IN VARCHAR2,
o_event OUT NUMBER
) IS
...
BEGIN
...
END;
No difference. The header file is useless and is simply another hurdle for the developer to step over when developing with packages. On my project, we have a convention that all the commented documentation for each procedure goes in the header, along with the details of when it was added and by whom, but that could just as easily be included in the body.

How to keep code base and database schema in synch?

So recently on a project I'm working on, we've been struggling to keep a solution's code base and the associated database schema in synch (Database = SQL Server 2008).
Database changes occur fairly regularly (adding columns, constraints, relationships, etc.), and as a result it's not uncommon for people to do a 'Get Latest' from source control and find that they also need to rebuild the database (and sometimes they forget to do the latter).
We're not using VSTS: Database Edition (DataDude) but the standard Visual Studio database project with a script (batch file) that tears down and recreates the database from T-SQL scripts. The solution is a .NET & ASP.NET solution with LINQ to SQL as the underlying ORM.
Anyone have ideas on an approach to take (automated or not) which would keep everyone up to date with the latest database schema?
Continuous integration with MSBuild is an option, but it only helps pick up any breaking changes committed; it doesn't really help in the scenario I highlighted above.
We are using Team Foundation Server, if that helps..
We try to work forward from the creation scripts; i.e., a change to the database is not authorised unless the script has been tested and checked into source control.
But this assumes that the database team is integrated with your app team which is usually not the case in a large project...
(I was tempted to answer this "with great difficulty")
EDIT: Tools won't help you if your process isn't right.
OK, although it's not the entire solution: you should include an assertion in the application code that connects to the database, asserting that the correct schema is being used. That way it at least becomes obvious, and you avoid silent bugs and people complaining that stuff went crazy all of a sudden.
As for the schema version, you could use some database-specific functionality if available, but I personally prefer to declare a schema version table and keep the version number in there; that way it's portable and can be checked with a simple SELECT statement.
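A minimal sketch of that idea (the table name and version value are invented for illustration):

CREATE TABLE dbo.SchemaVersion (
    Version int NOT NULL,
    AppliedOn datetime NOT NULL DEFAULT (GETDATE())
);
GO
INSERT INTO dbo.SchemaVersion (Version) VALUES (42);

-- At startup the application runs something like:
--   SELECT MAX(Version) FROM dbo.SchemaVersion;
-- and refuses to start if the result differs from the version it was built against.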
Have a look at DB Ghost - you can create a dbp using the scripter in seconds and then manage all your database code with the change manager: www.dbghost.com
This is exactly what DB Ghost was designed to handle.
We basically do things the way you are, with the generation script checked into source control as well. I'm the designated database master, so all changes to the script itself are done through me. People send me scripts of the changes they have made, I update my master copy of the schema, run Generate Scripts in SSMS to produce the new DB script, and then check it in. I keep my copy of the code current with any changes that are being made elsewhere. We're a small shop so this works pretty well for us. I realize that it probably doesn't scale.
If you are not using Visual Studio Database Professional Edition, then you will need another tool that can break the database down into its elemental pieces so that they are manageable and changeable more easily.
I'd recommend seriously considering Redgate's SQL tools if you want to maintain sanity over all your database changes and updates.
SQL Packager
SQL Multi Script
SQL Refactor
Use a tool like Red Gate SQL Compare to generate the change schema between any given versions of the database. You can then check that file into source code control.
Have a look at this question: dynamic patching of databases. I think it's similar enough to your problem to be helpful.
My solution to this problem is simple. Define everything as XML, and make sure that both the database, the ORM and the UI are generated from this XML, no exceptions. That way, you can use code generation tools to quickly regenerate the database creation script, which will alter your schema while (hopefully) preserving some data. It takes some effort to do, but the net result is well worth it.
