We are trying to determine the best source control branching strategy to use at work. We use a VSO front end attached to a Git back end. We have four database environments: DEV, QA, STAGE, and PROD. At any given time we have many teams working on different features that often leapfrog each other, in addition to a lot of ongoing database cleanup work (adding primary and foreign keys, making columns non-nullable, etc.).
My idea is to maintain four persistent branches, one for each database environment, that reflect their respective database environments. Any team working on a new feature will branch from DEV and, when the work is done, merge back into the persistent DEV branch. When the work is ready to go to QA it will be merged to QA; when it is ready to move to STAGE it will be merged to STAGE; and so forth. Any non-breaking database updates not tied to a feature (like making columns non-nullable) can flow as change sets without needing a feature branch, but every potentially breaking change will need to go through a feature branch.
Has anyone used this strategy? Did it work?
Is there a better branching model you can recommend?
Well, since no one ended up answering, I will use my own answer as an update. It appears we will end up using this branching strategy, more or less as I originally described it. The main difference is that the PROD branch will be called master, and feature branches will be branched from master rather than from DEV.
This is because the master/PROD branch is considered more stable than DEV. The previous environment where I successfully branched from DEV was a single release train; since features are expected to leapfrog each other here, we can't do that.
Also, all development will need to be done in feature branches. This is due to a limitation of the VSO Git plugin, however: the mechanism for linking Git pushes to VSO tickets requires that work be done in feature branches.
We are in the process of obsoleting a huge number of ClearCase UCM projects, streams, branches, etc. Is there a benefit to also obsoleting activities within the streams?
The main advantage of a cleartool lock -obsolete is making sure the commands and GUIs listing those objects don't show the obsolete ones by default.
(See FAQ)
That is why you would obsolete branches or streams, labels or baselines: the graphical lsvtree will be much snappier without having to list all those objects.
But activities? Obsoleting them would matter if those activities are still presented as a choice when you check out a file. If they are not (because, for instance, their stream is already obsolete), then it won't have a huge impact.
See "What do we need to lock/obsolete to discontinue a project?"
Another consideration is the number of activities/streams/projects in question. If you're -obsoleting something like 90% of the activities, then you may want to retire the streams in question and create new ones. If you're doing that to 90% of the streams in a project, you might want to consider retiring the project and creating a new one.
Note that an obsolete lock does not prevent the objects in question from being looked up (to get the list of activities in a stream, for example); it just prevents them from being displayed by default in the GUI or on the command line.
I'm working on a web-based Java project that stores end-user data in a MySQL database. I'd like to implement something that gives the user functionality similar to what I have for my source code version control (e.g. Subversion). In other words, I'd like to implement code that allows the user to commit and roll back work and return to an existing branch. Is there an existing framework for this?

It seems like putting the database data into version control and exposing the version control functionality to the end user (i.e. writing code that allows the user to commit, roll back, etc.) could be a reasonable approach, but it also seems there might be some problems with it. For example, how would you allow one user to view a rolled-back version of the data? You can't just replace the data the database is pointing to if only one user wants to look at a rolled-back version.

If given the choice of completely rebuilding the system using any persistence architecture, what could be used to store the data that would make this type of functionality easy to implement?
There are two very common solutions for what you need:
http://www.liquibase.org/
https://flywaydb.org/
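Flyway, for example, exposes a Java API that can apply pending schema migrations when the application starts. A minimal sketch, assuming a recent Flyway version on the classpath (the JDBC URL and credentials are placeholders):

    import org.flywaydb.core.Flyway;

    public class MigrateOnStartup {
        public static void main(String[] args) {
            // Placeholder connection settings; point these at your MySQL database.
            Flyway flyway = Flyway.configure()
                    .dataSource("jdbc:mysql://localhost:3306/appdb", "app_user", "secret")
                    .load();
            // Applies any pending SQL migrations found under classpath:db/migration.
            flyway.migrate();
        }
    }

Note that both tools version the database schema, not the end-user data, so they address the "migrate the database alongside the code" problem rather than per-user branching and rollback.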
Branching and merging the user data
Your question is about solutions for versioning the user data in an application, to give your users capabilities such as branching and merging. You considered exposing a real version control system such as SVN.
The side-effects I can foresee are:
You will have to index things by directory and file name, perhaps using an abstraction where directories are entities and file names are primary keys.
Operating systems (Linux, macOS, and Windows alike) do not handle directories with millions of files well. You will have to partition the entity, usually by hashing the ID (MD5, for example) and taking the beginning of the hash to create a subdirectory; see the sketch after this list. The number of digits to take from the hash depends on the expected size of the entity.
Operating systems are also not prepared for huge overall quantities of files. I tested this once: it took me days to back up and finally remove a file tree with hundreds of millions of files.
You will not be able to have additional indexes beyond the primary key, although you can work around that by building a data mart, as I describe below.
You will not have database constraints, but similar functionality can be implemented through Git/SVN/CVS triggers (hooks).
You will not have strong transactions, but again, similar functionality can be implemented through those triggers.
You will have a working copy for each user, which will consume space in proportion to the size of the repository. Each working copy pins its user to a single point in time.
Git is fast enough at switching between branches that going back in time and forward again will take only seconds (unless the user data is big, of course).
I saw an interview in which Linus Torvalds warned about poor performance in huge Git repositories. It may be best to have one repository per user, or some other means of keeping your application from accumulating a single humongous repository.
Resolution of the changes: I bet that if you create gazillions of versions, any version control system will complain. I do not know what "gazillions" means in practice; you will have to test it.
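To illustrate the hash-partitioning point from the list above, here is a minimal Java sketch (the class and method names are mine, not from any framework) that maps an entity ID to a sharded directory path using the first hex digits of its MD5 hash:

    import java.nio.charset.StandardCharsets;
    import java.nio.file.Path;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    public class ShardedPaths {
        // Turns an id into repoRoot/xx/yy/id, where xx and yy are the
        // first two pairs of hex digits of the id's MD5 hash.
        static Path shardedPath(Path repoRoot, String id) {
            try {
                MessageDigest md5 = MessageDigest.getInstance("MD5");
                StringBuilder hex = new StringBuilder();
                for (byte b : md5.digest(id.getBytes(StandardCharsets.UTF_8))) {
                    hex.append(String.format("%02x", b));
                }
                // Two levels of two hex digits each gives 65,536 buckets;
                // take more digits if the entity is expected to grow larger.
                return repoRoot.resolve(hex.substring(0, 2))
                               .resolve(hex.substring(2, 4))
                               .resolve(id);
            } catch (NoSuchAlgorithmException e) {
                throw new IllegalStateException(e); // MD5 is always available
            }
        }
    }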
Querying the database
A version control working copy is limited to primary-key lookups with the "=" operator and sequential scans. That is not enough for good reports and statistics under any usage pattern I can think of. That is why you need to build a data mart from your application data, and you have two ways of doing that:
A batch process that reads the whole repository history and builds cubes and other views to allow easier querying.
Git/SVN/CVS triggers (hooks) that call programs you write on file addition, modification, deletion, branch creation, and merging. These can update the database whenever a change happens.
The batch process is easier to implement, but the reports and statistics will lag behind the activity. You will probably want to go that way for version 1.0 and move to triggers over time to make things more dynamic. A rough sketch of the batch side follows.
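If the user data lives in Git, the batch job could walk the history with JGit (the pure-Java Git library mentioned in another answer) and feed rows into the data mart. This is only a sketch: the repository path is a placeholder, and the per-commit work is left as a comment.

    import java.io.File;
    import org.eclipse.jgit.api.Git;
    import org.eclipse.jgit.revwalk.RevCommit;

    public class HistoryBatch {
        public static void main(String[] args) throws Exception {
            // Open an existing user repository (placeholder path).
            try (Git git = Git.open(new File("/srv/repos/user42"))) {
                // Iterate every commit reachable from HEAD.
                for (RevCommit commit : git.log().call()) {
                    // Here you would diff the commit against its parent and
                    // upsert the resulting facts into the data-mart tables.
                    System.out.printf("%s %s%n",
                            commit.getName(), commit.getShortMessage());
                }
            }
        }
    }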
Simulating constraints and transactions
Git, SVN, and CVS support triggers (hooks) that execute programs when a new version is submitted. Relationships and consistency can then be checked in order to accept or reject the change.
Alternative Solutions
Since you did not specify the kind of application you have in mind, I will talk about blogs, content portals, and online stores. For those kinds of applications I see little reason to reinvent the wheel and build a custom database. Most of the necessary versioning can be anticipated in the database model; a good event-oriented database design will be enough.
For example, a revision of a blog post could be modeled by setting the end date/time on the current row and inserting a new row for the revised post, with the version number incremented and the previous version's ID recorded. The same strategy can be used for the sales and catalog data of an online store. If you model your application with good logs, you do not need version control.
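A minimal JDBC sketch of that revision pattern, assuming a hypothetical post table with post_id, version, prev_version_id, body, valid_from, and valid_to columns (all of these names are mine):

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class PostRevisions {
        // Closes the current row and inserts the revised one in one transaction.
        static void revise(Connection con, long postId, long prevVersionId,
                           int newVersion, String newBody) throws SQLException {
            con.setAutoCommit(false);
            try (PreparedStatement close = con.prepareStatement(
                     "UPDATE post SET valid_to = NOW() " +
                     "WHERE post_id = ? AND valid_to IS NULL");
                 PreparedStatement insert = con.prepareStatement(
                     "INSERT INTO post (post_id, version, prev_version_id, body, valid_from) " +
                     "VALUES (?, ?, ?, ?, NOW())")) {
                close.setLong(1, postId);
                close.executeUpdate();
                insert.setLong(1, postId);
                insert.setInt(2, newVersion);
                insert.setLong(3, prevVersionId);
                insert.setString(4, newBody);
                insert.executeUpdate();
                con.commit();
            } catch (SQLException e) {
                con.rollback();
                throw e;
            }
        }
    }

Old versions are never updated or deleted, so viewing a past version is just a SELECT on an old row.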
Some developers instead use a row-level trigger that records everything that changes in the database. That is harder on an auditor, who has to reconstruct the past from badly designed logs. I personally do not like this approach, because it is very difficult to index those kinds of queries. I prefer to build the whole application around a well-designed, meaningful log.
For example:
History Table
10/10/2010 [new process] process_id=1; name=john
11/10/2010 [change name] process_id=1; old_name=john; new_name=john doe
12/10/2010 [change name] process_id=1; old_name=john doe; new_name=john doe junior
Process Table after 12/10/2010.
proc_id=1 name=john doe junior
That way I can reconstruct almost everything in the past and still have my operational data in an easy-to-use format.
However, this is not close to the usage pattern you want (branching and merging).
Conclusion
The applicability of version control as a database seems to me very powerful on the one hand and very limited and dangerous on the other. It is inspiring for auditing and error-correction purposes, but my main concerns would be scale and reliability.
It seems like you want version control for your data rather than the database schema. I could find two databases that implement most of the version control features such as fork, clone, branch, merge, push, and pull:
https://github.com/dolthub/dolt - SQL based
https://github.com/terminusdb/terminusdb - graph based
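Dolt is interesting here because it speaks the MySQL wire protocol, so a Java application can drive its branching and committing through the ordinary MySQL JDBC driver. A hedged sketch (the connection details and table are placeholders, and DOLT_CHECKOUT/DOLT_COMMIT are the stored procedures Dolt documents for those operations):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class DoltSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder connection string; a running dolt sql-server listens like MySQL.
            try (Connection con = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/appdb", "root", "");
                 Statement st = con.createStatement()) {
                st.execute("CALL DOLT_CHECKOUT('-b', 'user42-draft')");   // new branch
                st.execute("UPDATE post SET body = 'revised' WHERE post_id = 1");
                st.execute("CALL DOLT_COMMIT('-a', '-m', 'user42 edited post 1')");
            }
        }
    }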
You mentioned Subversion, which is a Centralized Version Control System. But let us focus on Git, because of reasons. Git is a Decentralized Version Control System: a local copy of a Git repository is the same as a remote copy of the repository, if a remote copy exists at all (services such as GitLab and GitHub provide the remote hosting and management of Git projects). With Git you can have version control in an arbitrary directory on your machine, and you can do whatever you are accustomed to doing with SVN, and more, in that directory.
What I am getting at is that you could create per-user directories/repositories on your server programmatically and apply version control in those directories/repositories, keeping a separate repository per user (the specifics of the architecture can be decided later, depending on the structure of each user's "work"). Your application would be in charge of adding and removing files on behalf of the user (e.g. Biography, My Sample Project, etc.), editing files, committing changes, presenting a file's history, and so on, essentially issuing Git commands. Your application would thus interface with the Git repository, exploiting the advanced version control that Git provides, and your database would just make sure that each user is linked to the directory/repository that contains their "work".
To offer a concrete analogy, the GitLab project is an open source web-based Git repository manager with wiki and issue tracking features. GitLab is written in Ruby and uses PostgreSQL (preferably). It is a typical (as in Code - Database - Data directories and files) multiuser web-based application, and its purpose is to manage Git repositories, which are stored in a designated directory on the server. Part of the code is responsible for accessing the Git repositories that the logged-in user is authorized to access (as the owner or as a collaborator). One interesting use case is a user editing a file online, which results in a commit on some branch in some repository. Another is a user checking the history of a file. Yet another is a user reverting a specific commit. All of these actions are performed online, via a web browser.
To provide an interesting real-world use case, Atlas by O'Reilly is an online platform for publishing-related collaboration using GitLab as the backend.
For Java there is JGit, a lightweight, pure Java library implementing the Git version control system. JGit is used by Eclipse for all actions related to managing Git repositories, and it is an extremely active project, supported by many, Google included. Maybe you could look into it.
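A minimal JGit sketch of the per-user-repository idea described above (the repository path, file name, and author identity are placeholders):

    import java.io.File;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import org.eclipse.jgit.api.Git;

    public class UserRepo {
        public static void main(String[] args) throws Exception {
            File dir = new File("/srv/repos/user42");
            dir.mkdirs();
            // Create the user's repository on first use; open it later with Git.open(dir).
            try (Git git = Git.init().setDirectory(dir).call()) {
                Files.write(Paths.get(dir.getPath(), "biography.txt"),
                            "My first draft".getBytes());
                git.add().addFilepattern("biography.txt").call();
                git.commit()
                   .setAuthor("user42", "user42@example.com")
                   .setMessage("Edit biography")
                   .call();
            }
        }
    }

Every other user-facing operation (history, revert, branch) maps to a JGit command in the same way.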
All of the above make sense, if the "work" you refer to is more than some fields in a database table, which the user will fill in and may later change the values of. For instance, it would make sense for structured text, HTML, etc.
If this "work" is not so large-scale, maybe doing something like what is described above is overkill. In that case, you could employ some of the version control concepts in your database design, such as calculating diffs and applying patches (also in reverse, for viewing past versions / rolling back). Your tables should allow for a tree-like structure, to store the diffs, so you could allow for branches. You could have the active version of a file readily available, as well as the active index (what Git calls HEAD), and navigate to another indexed/hashed/tagged version in the file's history by applying all patches sequentially, if moving forward, or applying patches in reverse, and in the reverse chronological order, if moving backwards. If this "work" is really small-scale, you could even ditch the diff concept, and store the whole version of the "work" in the tree-like structure.
Pure fun.
I have been reading many articles about version control systems like SVN and Git and about various branching models (feature-based, release-based, and others), but none of them seemed to fit our project requirements.
We (the team) are going to develop a framework, which will be used as the core for different applications. So there will be one framework and more than one application built on that framework. Each application will have the usual project cycle: builds, releases, and so on. The framework itself won't be released on its own but may have different tagged versions. During the development of an application, we want to commit some common features back to the framework (if we see that a feature is good and future applications should have it).
So each application is like a separate branch of the framework, but it will never be fully merged back (because it's a separate application), and there is a need to make some commits to the framework (trunk). Some online articles give such commits (without merging the whole branch to trunk) as negative examples, so we are confused.
What version control system and branching model do you recommend for such development cycle?
So each application is like a separate branch of the framework, but it will never be fully merged back (because it's a separate application), and there is a need to make some commits to the framework (trunk). Some online articles give such commits (without merging the whole branch to trunk) as negative examples, so we are confused.
This part scares me a bit. If you are going to have a framework, then you need to take care of it like any other lump of code, and you don't want multiple versions running around for any reason except maintenance of existing releases or work on future releases. So each of your "application" projects can have a branch where they modify the framework as required for the application, but I recommend the framework trunk be updated often so that it evolves in a way that best serves the needs of all of your applications. In general, when branching for code going forward, you want to sync up with the master and put code back into the master as quickly as possible to avoid lots of work handling merges and also give others the benefit of the work.
You should put your framework in a separate area (or repository if you are using a DVCS like git or hg) so that it's distinct and may have its own release cycle if necessary.
The DVCSs are all the rage these days, Git and Hg being the most popular, so you should look into them. They have different ways of handling branching. Their power lies in the fact that there is no single centralized repository, which makes them more flexible and reliable for larger teams.
I am in an "internal" IT shop and we currently use ClearCase for version management. Our branching strategy is the common one for this setup: the main branch is reserved for live code, and we branch off main for project and hotfix-type activities. Each project (and they often overlap) has a branch off main; we don't have multi-tiered branching.
We get into situations where we have to merge between integration branches so that, for example, the release 4 branch picks up all of the release 3 changes before release 3 goes live and is thus baselined. And hotfixes frequently happen while a project is in flight and have to be supported.
However, this isn't really going to be possible in the TFS world, as we don't want to have to drop to the command line to do baseless merging; yet we need a highly flexible branching capability - something we have become really used to with ClearCase.
So ideally we want TFS branches that allow us to have a production baseline, to branch off for short-term hotfixes, and to branch off for projects - without knowing in advance which of the branches will go live (and thus be baselined) first. Having worked through all of the MS documents, I find they all appear to be focused on product-type environments - but we are mostly a support and enhancement shop.
I'm looking for recommendations/pointers - I've been a ClearCase admin and can quite happily juggle branching mentally - but everything I come up with just doesn't look like it will fit with TFS. This is most probably because my mental process is ClearCase-like and isn't in tune with TFS (yet!).
I don't have much experience with TFS2010, but considering that branches are now first-class citizens in TFS2010, one practical solution would be to treat your enhancement work as a "product" and create a patch branch accordingly.
I suppose you have read the TFS2010 Branching Guide.
It does include a branching scenario for addressing hot-fixes issues.
(from the "TFS Branching Guide - Scenarios 2010_20100330.pdf" document)