Integration Branch or not? - sql-server

In an environment (SQL Server, TFS, SSDT, VS) with around a dozen TFS SQL Server Data Tools Database Projects per solution (with database projects only) with four to five simultaneous branches, would you recommend using an integration branch or not? And why (not)?
Is it worth the overhead?
If not, how do we best perform merges between branches?

It depends entirely on your team's workflow and on how your branches relate to your release strategy. Let me give two examples: one where you would want an integration branch, and one where you might not.
Example #1
You have 5 Feature Branches, some or all of them will be integrated together to create the next release. In this case you clearly want an integration branch.
Example #2
You have 3 branches representing v1 - v3 of the software that are all under development simultaneously. In this case you might not need an integration branch. All v1 code should be merged into v2/v3 branches, but not the other way around.
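A minimal sketch of that merge-forward rule, shown as Git commands driven from Python purely for illustration (the branch names v1-v3 are assumptions, and TFVC users would do the equivalent with tf merge):

```python
# Sketch: merge a released version branch forward into the branches still
# under development, never the other way around. Branch names are illustrative.
import subprocess

def git(*args: str) -> None:
    """Run a git command and fail loudly if it errors."""
    subprocess.run(["git", *args], check=True)

def merge_forward(source: str, targets: list[str]) -> None:
    for target in targets:
        git("checkout", target)
        git("merge", "--no-ff", source)  # keep a merge commit for traceability

if __name__ == "__main__":
    # v1 fixes flow into v2 and v3; v2 and v3 never merge back into v1.
    merge_forward("v1", ["v2", "v3"])
```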

Related

Source Control Branching Strategy

We are trying to determine the best source control branching strategy to use at work. We use a VSO frontend attached to a Git backend. We have 4 database environments: DEV, QA, STAGE and PROD. At any given time we have many teams working on different features that often leapfrog each other, in addition to a lot of ongoing database cleanup work (adding primary and foreign keys, setting columns to non-nullable, etc.).
My idea is to maintain four persistent branches, one for each database environment, that reflect their respective database environments. Any team working on a new feature will branch from DEV and, when the work is done, merge back into the persistent DEV branch. When the work is ready to go to QA it will be merged to QA; when it is ready to move to STAGE it will be merged to STAGE, and so forth. Any non-breaking database update not tied to a feature (like making columns non-nullable) can flow as a changeset without needing a feature branch, but every potentially breaking change will need to work as a feature branch.
Has anyone used this strategy? Did it work?
Is there a better branching model you can recommend?
Well, since no one ended up answering, I will use my own answer as an update. It appears we will end up using this branching strategy, more or less as I originally described. The main difference is that the PROD branch will be called master, and the feature branches will be branched from master rather than DEV.
This is because the master/PROD branch is considered more stable than DEV. The previous environment where I had branched from DEV successfully was a single release train. Since features are expected to leapfrog each other here, we can't do that.
Also, all development will need to be done in feature branches. This is due to a limitation of the VSO Git plugin, since the mechanism for linking Git pushes to VSO tickets requires that work be done in feature branches.
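A minimal sketch of the resulting flow (feature branches cut from master, changes promoted DEV -> QA -> STAGE -> master), again in Python for illustration; the next_stage helper and branch names are my assumptions, not part of the VSO setup described:

```python
# Sketch: enforce the promotion order between the persistent environment branches.
import subprocess

PIPELINE = ["DEV", "QA", "STAGE", "master"]  # master mirrors PROD

def next_stage(branch: str) -> str:
    """Return the only branch a change may be promoted to."""
    i = PIPELINE.index(branch)
    if i == len(PIPELINE) - 1:
        raise ValueError("master is the end of the pipeline")
    return PIPELINE[i + 1]

def promote(branch: str) -> None:
    target = next_stage(branch)
    subprocess.run(["git", "checkout", target], check=True)
    subprocess.run(["git", "merge", "--no-ff", branch], check=True)

promote("QA")  # merges QA into STAGE, and no further
```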

Synchronizing multiple servers and local machines with the same data / source code

Background:
Building a web app with a team of 2 developers. RESTful backend via Flask. Using Linux, Apache, Redis, and Postgres.
3 servers:
1 for production
1 for development
1 for UAT
4 databases:
1 for PROD/UAT server
1 for DEV server
1 for developer A on his local machine
1 for developer B on his local machine
2 local machines / developers
In addition to the 4 databases, the developers each have one additional database that serves testing. This testing database needs to be identical between the two developers at all times.
Developer B has his own fork of the data, sending pull requests to the master repo, which is worked on by developer A.
Problem:
We have no real protocols to easily transfer data between the databases. For example, the developers' test databases are often different, which causes chaos. Moving data from DEV to UAT/PROD is done manually.
Developers work in different environments and on different forks. We use pull requests in GitHub to transfer code to Developer A's main repo.
Question
What do you recommend as a solution to our database woes? Is there a better way to share data? Is there a better way for developer A and developer B to share their environment and source code?
I have been working on this problem for an environment of similar size this year. The technology is different, but in short:
2-3 developers
Each developer has a complete environment
Integration, UAT and production server pairs (all VMs).
Linux, Mongodb, Django and Angular
Code
After a few false starts we have settled on the feature-branch approach, and it has proved a good working proposition. See http://nvie.com/posts/a-successful-git-branching-model/. At the moment we have master for production and a development branch. Only one person works on a 'feature' (or we pair). Features should be short-lived. We might flip between multiple features when necessary, and we can merge from one feature to another when we need to. It all comes out quite cleanly when we merge back into development. The overhead is minimal and we rarely step on each other's toes; when we do, the diff makes it clear what we need to fix manually. A good Git UI tool helps.
Database
You could use a shared database for development. We don't: every developer has an independent environment. We sometimes need collections, but usually from UAT or prod rather than from each other. I have created a comprehensive admin page: on any environment we can export a collection set, and there are mechanisms to send this export where needed. This is used to take prod data and place it in UAT for problem replication; we can then drop back to integration and dev for repair. Similarly, devs can share data using the same application admin page.
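A minimal sketch of what such a collection-set export might look like, using pymongo since the stack above is MongoDB; the URI, database, collection names and output path are placeholders, not the author's actual admin-page code:

```python
# Sketch: export a named set of collections to a JSON file that another
# environment (or another developer) can import.
from bson import json_util  # ships with pymongo; handles ObjectId, dates, etc.
from pymongo import MongoClient

def export_collections(uri: str, db_name: str, names: list[str], out_path: str) -> None:
    db = MongoClient(uri)[db_name]
    dump = {name: list(db[name].find()) for name in names}
    with open(out_path, "w") as f:
        f.write(json_util.dumps(dump, indent=2))

export_collections("mongodb://localhost:27017", "appdb",
                   ["customers", "orders"], "collection_set.json")
```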
Release
Going up the chain from dev -> integration -> uat -> prod is also handled by the application admin page. Any system can export, but only up to the next stage, and the export is not automatically imported: the admin on the target environment is told that a release is available and can import it from the admin page. We do the same for code and database collections.
It is useful to have this integrated rather than having to know which script to run. It could be a separate application if that worked better in your environment.
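A hypothetical sketch of the 'only up to the next stage' gate; the stage names, file layout and inbox convention are assumptions for illustration:

```python
# Sketch: any environment can export a release bundle, but only for the stage
# one step above it, and the bundle is staged rather than imported -- the
# target admin still triggers the import explicitly.
import shutil
from pathlib import Path

STAGES = ["dev", "integration", "uat", "prod"]

def export_release(source: str, bundle: Path, inbox_root: Path) -> Path:
    target = STAGES[STAGES.index(source) + 1]  # IndexError past "prod" is deliberate
    dest = inbox_root / target / bundle.name
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(bundle, dest)  # staged, not imported
    print(f"Release staged for {target}; import it from that environment's admin page.")
    return dest

export_release("uat", Path("release_42.tar.gz"), Path("/srv/release-inbox"))
```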

TFS Branching and Merging Strategy - Multiple Applications, Multiple Development Teams, Monthly Releases (but need to support out-of-release changes too)

We are looking for guidance regarding branching and merging for TFS and multiple applications (some are COTS; some internally developed) with multiple development teams. Note that we currently utilize monthly release windows but are moving to quarterly. We also need to be able to support eFixes and non-release development efforts (i.e., regulatory changes that have to be implemented outside of a window). Based on current research we are focusing on the following 2 options:
Option 1) Release branching per major application, where each application would have MAIN, RELEASE and PRODUCTION branches (the PRODUCTION branch would support an eFix branch for eFixes and off-cycle changes).
Option 2) Release branching for the entire organization - MAIN, RELEASE and PRODUCTION branches would contain ALL applications.
A number of helpful branching suggestions here:
Visual Studio Team Foundation Server Branching and Merging Guide
With regard to your question, my view is:
Option 1) Individual application release branching
This works if your applications do not have too many (if any) dependencies on each other.
Pros:
Builds would be quicker
More flexibility to cherry-pick deploy components
Better for continuous integration builds, as you can build and deploy little and often
Sign off and lock individual applications based on your quality gates
Easier to redeploy critical fixes
Merges smaller and possibly more frequent
Cons:
Possible management overhead for maintaining build/deploy scripts
If there are application dependencies, cherry-picking components could introduce risks
Requires a solid deployment framework for maintaining assembly versions and labels
Multiple deployment asset resources will be involved when deploying (instead of 1 deployment package)
Option 2) All applications release branching
Pros:
Easier to identify the version of all apps at any time
Can deploy using teardown and full deploy
Fewer build and deployment scripts to manage
Fewer deployment asset resources to manage
Easier to track/visualise release-level merges/changesets
Cons:
Builds may take longer
Less release flexibility if an application is not fit for purpose or contains bugs after code freeze/sign-off
Harder to cherry-pick deploy components
Production-critical fixes may take longer to get rebuilt and deployed
Rollback is all or nothing, depending on the deployment framework
May require full regression tests for minor fix deploys (depending on the QA controls)

Migrating a branching strategy from ClearCase to TFS 2010

I am in an "internal" IT shop and we currently use ClearCase for version management. Our branching strategy is common for this setup: the main branch is reserved for live code, and we branch off main for project and hotfix activities. Each project (and they often overlap) has a branch off main; we don't have multi-tiered branching.
We get the situation where we have to merge between integration branches so that the release 4 branch picks up all of the release 3 changes (for example) before release 3 goes live and is thus baselined. And hotfixes happen frequently while a project is in flight and have to be supported.
However, this isn't really going to be possible within the TFS world, as we don't want to have to drop to the command line to do baseless merging; yet we need a highly flexible branching capability - something we have got really used to with ClearCase.
So ideally we want TFS branches that allow us to have a production baseline, to branch off for short-term hotfixes, and to branch off for projects - without knowing in advance which of the branches will go live (and thus be baselined) first. Having worked through all of the MS documents, they all appear to be focused on product-type environments - but we are mostly a support and enhancement shop.
I'm looking for recommendations/pointers - I've been a ClearCase admin and can quite happily juggle branching mentally - but everything I come up with just doesn't look like it will fit with TFS. This is most probably because my mental process is ClearCase-like and isn't in tune with TFS (yet!)
I don't have much experience with TFS 2010, but considering branches are now first-class citizens in TFS 2010, one practical solution would be to consider your enhancement as a "product" and create a patch branch accordingly.
I suppose you have read the TFS2010 Branching Guide.
It does include a branching scenario for addressing hot-fixes issues.
(from the "TFS Branching Guide - Scenarios 2010_20100330.pdf" document)

One DB per developer or not?

In a corporate development environment writing mostly administrative software, should every developer use their own database instance, or should they use a central database instance during development? What are the advantages and disadvantages of each approach? What about other environments and other products?
If you all share the same database, you might have issues if someone makes a structural change to the database and the code is not synchronized with it.
I highly recommend one DB per developer, if only because you don't want to run a "write" test and have someone else overwrite the data right after. A simple example? You try to display products for a website. Everything works until all the products disappear. The problem? Another developer decided to play with the "Active" flag of the products to test something else. In cases like that, a transaction might not even help. End of story: you spend time debugging someone else's actions.
I highly recommend replicating the staging database to the developer databases once in a while to synchronize the structure (or, better, have a tool to rebuild a database from scratch).
Of course, we require scripts for changes to the database, and EVERYTHING is in source control.
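A minimal sketch of what those scripts buy you - rebuilding a developer database by replaying them in order - assuming pyodbc, a local SQL Server instance, and a folder of numbered .sql files; all names here are illustrative, and the scripts are assumed to be single batches with no GO separators:

```python
# Sketch: replay versioned change scripts from source control against a dev DB.
from pathlib import Path
import pyodbc

def rebuild(conn_str: str, script_dir: str) -> None:
    conn = pyodbc.connect(conn_str, autocommit=True)
    cur = conn.cursor()
    # 001_create_tables.sql, 002_add_active_flag.sql, ... applied in order
    for script in sorted(Path(script_dir).glob("*.sql")):
        print(f"applying {script.name}")
        cur.execute(script.read_text())
    conn.close()

rebuild("DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
        "DATABASE=dev_alice;Trusted_Connection=yes", "db/scripts")
```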
The days when database environments should be scarce are long gone. I'm writing this posting on an XW9300 with 5x15k SCSI disks in it. This machine will run a substantial ETL job in a fairly reasonable length of time and (in mid-2007) cost me about £1,700 on eBay including the disks. From a developer's perspective, especially on database-centric projects like data warehousing, the line between a developer and a DBA is quite blurred. As I write this I am building a partition management framework for a SQL Server 2005 data warehouse.
Developers should have one or more development databases of their own for (IMO) these reasons:
Requires people to keep stored procedures, patch scripts and schema definition files in source control. Applying the patches can be automated to a fairly large extent. There are even tools such as Redgate SQL Compare Pro that do much of the grunt work for this.
Encourages an application architecture that facilitates easy configuration management and deployment, as people have to deploy onto their own workstations. Many deployment wrinkles will get sorted out long before they hit production or people even realise they could have gone wrong.
Avoids developers tripping up on each other's work. On something like a data warehouse where people are working with ETL code this is an even bigger win.
It encourages a degree of responsibility, as developers have to learn basic database administration. This also eliminates a lot of the requirement for operational support staff and some of the dev-vs-ops friction.
If you have your own database, there are no gatekeepers obstructing experimentation or other work on it. The politics around managing 'servers' disappear as there are no 'servers'.
This is a productivity win in any environment with significant incumbent bureaucracy.
For small data volumes an ordinary PC is fast enough for this. Developer editions or licences are available for most if not all database management systems and will run on a desktop O/S. If you're working with Linux or Unix this is even less of an issue. For larger data volumes, up to and including most MIS applications, a workstation like an HP XW9400 or Lenovo D10 can be outfitted with five 15k disks for less than the cost of a lot of professional development tooling. (Yes, I know it's dual-licence, but a commercial all-platform licence for Qt is about £4,000 a seat.)
A machine like this will run an ETL process with 10's to 100's of millions of rows faster than you might think.
It facilitates setting up more than one environment for smoke testing or reconciliation purposes. As you have complete control over the machine, you have quite a lot of scope for mocking up conditions in a production environment. For example, I once made a simple emulator for Control-M by just bodging some of its runtime scripts.
Where you have this level of control and transparency over the environment you can produce a fairly robustly tested deployment process which does quite a lot to eliminate opportunities for finger-pointing in production deployment.
I've seen small teams working with 14 environments, and had 7 active on a workstation at the same time. On database-heavy work such as ETL, where you're working with whole tables, working in a single dev environment is a recipe for time wastage or spending your time walking on eggshells.
Also, you can use single user development licences for database platforms, which can save you the cost of the workstations just in database licencing. Most developer licences (Microsoft and OTN are a couple of examples I'm familiar with) will let you use the system on a single workstation for a single developer free or for a nominal price.
Conversely, licencing terms on shared development servers are often somewhat murky and I've seen vendors try to shake customers down for licencing on dev servers on more than one occasion.
Each of our developers has a fully functional database. Changes are scripted and source controlled like any other code.
Ideally, yes, each developer should have a "sandbox" development environment, so they can test their code even before deploying it to a shared testing/staging environment.
Each developer's environment should run scripted tests that reset the database to a known state. This is impossible to do in a shared environment.
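A minimal sketch of such a reset, shown with pytest and an in-memory SQLite database so it is self-contained; a real setup would point the same fixture at the developer's own server and seed scripts:

```python
# Sketch: every test starts from exactly the same known database state.
import sqlite3
import pytest

@pytest.fixture
def db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, active INTEGER);
        INSERT INTO product VALUES (1, 'Widget', 1), (2, 'Gadget', 0);
    """)
    yield conn  # the test runs against this known state
    conn.close()

def test_active_products(db):
    count = db.execute("SELECT COUNT(*) FROM product WHERE active = 1").fetchone()[0]
    assert count == 1  # another developer cannot have flipped the flag
```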
The cost of giving each developer their own instance is less than the cost of the chaos resulting from multiple developers trying to test volatile changes together in a shared environment.
On the other hand, in many IT shops the system uses complex infrastructure, involving multiple application servers or multiple physical nodes. Then the economics change; it's less expensive for people to cooperate and avoid stepping on each other's work than it would be to replicate it for each developer. Especially true if you integrate expensive third-party systems that don't give you licenses for multiple development environments.
So the answer is yes and no. :-) Do give each developer their own environment if that environment can be reproduced inexpensively.
My recommendation is to have 2 levels of development environment:
Each developer has their own personal development system, with its own DB, web servers, etc. This allows them to code against a known setup, write automated (system-level) tests that initialize their database and systems to a known state, and so on.
The development integration environment is shared by all developers and used to make sure everything is working together as expected before handing it off to QA. Code is checked out from source control and installed there, and there's only a single instance of any servers (db or otherwise).
This question hints at what a developer needs to do his/her job. Certainly a private DB instance should be provided. Equally important, I would make sure that the DB is the same product/version as what you intend to deploy to. Don't develop on MySQL 6.x and deploy to MySQL 5.x. (This goes for app servers, and web servers as well!)
Having a developer DB doesn't necessarily mean you need it hosted on your local machine. You could have a central DBMS host machine with all the dev DBs located on it. The pros: a guarantee that you develop against the target DB, less overhead on dev boxes, and more space/horsepower for beefy IDEs and app servers. The cons: a single point of failure for all devs (if the DBMS server goes down, nobody can work), lack of dev exposure to setting up and administering the DBMS, and devs cannot experiment as easily with upcoming DB releases or alternate DB choices to solve tough problems.
Some of the pros can be cons and vice versa depending on your organization and structure. Maybe you don't want devs administering the DBMS. Maybe you do plan to support varying DB platforms. The decision boils down to your organization as well as your target platform choices. If you plan to target a variety of DB/OS/app server combinations, then each dev should not only have their own DB but should work in a unique combination (MySQL/Tomcat/OS X for one, DB2/Jetty/Linux for another, Postgres/Geronimo/WinXP for a third, etc.). If, on the other hand, you set up an ASP (Application Service Provider) type shop on an iSeries, then of course you'll likely have a central host with all the dev DBMSes; still, each dev should have at least a separate DB instance to allow structural changes to the schema.
I have an instance of SQL Server Development Edition installed locally. We have a QA DB server, as well as multiple production servers. All development and integration testing is done using my local server (or other developers' local servers). New releases are staged to the QA server. Each release, after acceptance by the customer, is put into production.
Since I mostly do web development, I use the web server bundled with VS2008 for development and local test, then publish the web app to a QA web server hosted on a VM. Once accepted by the customer, it is published to one of several different production web servers -- some virtual, some not, depending on the application.
My department at my company only has limited development environments, purely because of the cost of support and hardware. We have a couple of environments which are based on t-1 nightly refreshes from production, and some static ones.
Ideally, everyone should have their own, but in many cases, this is going to be impractical when the following are true:
you have a large number of developers needing resources (our department has maybe 80)
each developer needs multiple resources (typically I use 4-5 different DBs each day)
up-to-date data is important (you just can't refresh them fast enough)
In these cases, shared instances and good communication are what's needed.
One advantage of one database per developer: each developer has a snapshot of their own data in a "known" state.
I like the idea of using a local version when a developer must be isolated - developing a schema change, performance testing, setting up specific scenarios, etc...
At other times, use the shared version to ensure everything is in sync.
I think there's a terminology problem here. It's been a while since I've worn my DBA hat (golly gee, almost 10 years) - so someone else can chime in and correct me.
I think everyone is in agreement that each developer should have his own sandbox schema set.
In MySQL and Sybase/MS SQL Server, each database engine can support multiple databases. Each database is (normally) fully independent of the others. So you can have one database engine instance and give each developer his own database space to do with as he wishes. The only problem is if the developers are using tempdb -- there can be collisions there (I think -- you will need to look this up). Just be careful that cross-database queries with fixed database names are not used.
In Oracle, the database engine instance is tied to a particular schema set. If you have multiple developers on the same engine, they are all pointing to the same tables. In this case, yes, you will need to run multiple instances.
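For the MySQL/SQL Server case, a hypothetical provisioning sketch; pyodbc, the server name and the dev_<username> naming scheme are all assumptions for illustration:

```python
# Sketch: one shared engine instance, one independent database per developer.
import pyodbc

def create_dev_db(cursor, username: str) -> str:
    db_name = f"dev_{username}"  # each developer gets an isolated database
    cursor.execute(f"CREATE DATABASE [{db_name}]")
    return db_name

conn = pyodbc.connect("DRIVER={ODBC Driver 17 for SQL Server};"
                      "SERVER=shared-dev-host;Trusted_Connection=yes",
                      autocommit=True)  # CREATE DATABASE cannot run in a transaction
for dev in ["alice", "bob"]:
    print("created", create_dev_db(conn.cursor(), dev))
```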
Each of our developers has a local database. We store the create script AND a dump of the "standard data" in our SVN repo. We have an extensive set of tests that must pass against this test data. We also have a "sandbox" database that is available for people to put data in that they want shared into the standard data. This works well for us and allows us to let developers modify their local copies of data to test things, but we control what gets passed to other developers. We also strictly control schema changes, so we don't encounter the problems that someone else mentioned.
It really depends on the nature of your application. If yours is a client-server architecture in a distributed environment, it is best to have a central database that everyone uses. If the product gives users an environment with local database instances, you can use that. It is best if your development mirrors the real world environment as closely as possible.
It also depends on what stage of development you are in. In the early stages, you probably don't want to get bogged down by connectivity, network and distributed-environment issues and just want to be up and running. In such a case, you can start with a database-instance-per-user model before switching to the central model as the product reaches some level of maturity.
In my company we tend to copy the entire DB when working on non-trivial new features. The reasoning there is that disk space is cheap, whereas accidental data loss (even if it's test data) isn't.
I've worked in both types of development environments. Personally, I prefer to have my own DB/app server. However, there may be some advantages to using a shared infrastructure for development.
The main one is that a shared environment more closely resembles a real-world scenario: you are more likely to uncover problems with locking or transactions when all developers share a DB. Giving each developer their own DB may lead to "it works on my DB" syndrome.
However, if you need to apply and test schema changes or optimisations, then I can see problems in this sort of set-up.
Maybe a compromise solution would work best: all developers share infrastructure, and if someone needs to test schema changes, they create their own temporary DB instance (maybe there is one just sitting there for this purpose?) until they are happy to commit the new schema to source control.
You do have your entire schema (and test data) in source control, right? Right???
I like the compromise solution: all developers share infrastructure, and anyone who needs to test schema changes creates their own temporary DB instance until they are happy to commit the new schema to source control.
One DB per developer. No question. But the real issue is how to script entire databases, "control data", and version them. My solution is here: http://dbsourcetools.codeplex.com/
Have fun. - Nathan.
The database schemas should be held in source control, and developers should own the changesets checked in for code and DB together. Prior to check-in the developer should be working on his own database. After check-in, an automated build (e.g., on check-in, nightly, etc.) should update a central integrated DB, along with the apps themselves.
At the developer-instance level the data loaded should be appropriate for unit testing, at least. At the integrated level, the shared DB should hold data also appropriate for testing, but it should not rely on production replication - that is just a slack substitute for managed test data.
In my experience the only reason that developers opt for a shared DB is that they believe that developing and running against recent production data is somehow more 'real' and means they can put less effort into testing. They prefer to tread on each other's toes and put up with a shared DB that slowly corrupts before the next production refresh rather than write and manage proper tests. It's this kind of management practice that gives the IT world the poor delivery reputation it currently has.
I'd suggest using one instance of the database. You don't want your database to be a moving target.
