Version control, also known as revision control or source control, is an integral part of software development. Like chess, it is easy to learn the basic principles but takes a lifetime to master. Many teams end up in an impenetrable jungle of branches and merging.
Having a well-thought-out branching strategy is crucial. This article covers the basics of branching and suggests a branching strategy to use: branch by purpose.
Version control is important for several reasons. It allows the codebase to be shared within and between teams. It makes it possible to go back to earlier versions, such as those associated with specific releases. It also enables parallel development.
Version control, used properly, can also help keep code tidy. For example, instead of commenting out blocks of code that “might come in handy”, code can safely be deleted and brought back from the repository if need be.
Code that is commented out only confuses others who read the code, and they won’t dare to delete it. Clean it up!
As a project grows in complexity, the importance of maintaining control of software versions increases as well.
What Is a Branch?
What a branch is technically depends on the version control software being used. Generally speaking, when branching, two separate codelines are created. This allows developers to work on different versions of the code simultaneously. Or, as Wikipedia puts it:
Branching, in revision control and software configuration management, is the duplication of an object under revision control (such as a source code file, or a directory tree) so that modifications can happen in parallel along both branches.
You always start out with a single master branch, also called the trunk. From this, multiple child branches can be created. These, in turn, can have their own children.
In most version control systems the codebase can also be tagged. The technical meaning of this also varies between products. For example, Subversion makes no technical distinction between branches and tags: both are copies of a directory tree, and only convention tells them apart.
Conceptually, a tag is just a label connected to a specific commit. Tags make it easy to bookmark and find points in the code’s history, for example releases.
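To make the difference between a branch and a tag concrete, here is a minimal sketch of a toy repository. It is only an illustration: the `Commit` and `Repo` classes and all names in it are hypothetical, and real version control systems implement branches and tags in their own ways. The idea it shows is that a branch is a pointer that moves with every commit, while a tag stays on the commit it labels.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Commit:
    """One snapshot in the history; `parent` is None for the first commit."""
    id: str
    parent: Optional["Commit"] = None


@dataclass
class Repo:
    """Hypothetical toy repository: branches and tags both map a name to a commit."""
    branches: Dict[str, Commit] = field(default_factory=dict)
    tags: Dict[str, Commit] = field(default_factory=dict)

    def commit(self, branch: str, commit_id: str) -> Commit:
        # A branch is a moving pointer: committing advances it.
        new = Commit(commit_id, self.branches.get(branch))
        self.branches[branch] = new
        return new

    def branch(self, name: str, source: str) -> None:
        # Branching creates a second codeline that starts at the same commit.
        self.branches[name] = self.branches[source]

    def tag(self, name: str, branch: str) -> None:
        # A tag is a fixed label: it keeps pointing at this commit.
        if name in self.tags:
            raise ValueError(f"tag {name!r} already exists")
        self.tags[name] = self.branches[branch]


repo = Repo()
repo.commit("trunk", "c1")
repo.commit("trunk", "c2")
repo.tag("release-1.0", "trunk")  # bookmark commit c2
repo.branch("branch", "trunk")    # second codeline starts at c2
repo.commit("trunk", "c3")        # trunk moves on
repo.commit("branch", "b1")       # so does the branch, independently

print(repo.tags["release-1.0"].id)  # the tag still points at c2
```

After these steps the two branches have diverged from their common commit, while the tag still marks the point in history where it was created.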
Branches are depicted in branch diagrams. The figure below illustrates such a diagram with two branches (named “trunk” and “branch”) and a tag (named “tag” – imagine that). The diagram also shows a merge (the green dashed arrow) from “branch” to “trunk”.
Branches can be created for several reasons, for example when a team works on a specific feature or when the code enters a release cycle.
I divide branches into two main categories: temporary and permanent. A temporary branch will eventually be merged back to its parent branch and then deleted. A permanent branch is never deleted, but it is often involved in merge operations as well.
Merging: What Goes Up Must Come Down
Merging is the process of integrating two branches. The changes from one branch are incorporated into another branch.
While it is easy to create a branch (maybe too easy sometimes), merging can be tricky, cause headaches, and leave the resulting codebase in a volatile state. People can even become afraid of merging and postpone it, a condition sometimes called “merge paranoia”. To quote Martin Fowler, one of the authors of the Agile Manifesto, in his article about continuous integration:
One of the features of version control systems is that they allow you to create multiple branches, to handle different streams of development. This is a useful, nay essential, feature - but it's frequently overused and gets people into trouble. Keep your use of branches to a minimum.
As he points out, creating branches is an essential feature of version control systems, but you shouldn’t overuse it. On the other hand, some people may just accuse Mr. Fowler of suffering from a bad case of merge paranoia :-) Nevertheless, don’t create branches unnecessarily.
Orphan branches and branches that are not properly maintained will sooner or later cause problems.
One way to ease merging is to merge often. For example, in a temporary branch like the one below, merge regularly from the parent branch to the child. Then, right before assimilating the child branch (we are the Borg!), do a final merge from the parent to the child branch. In other words, the integration should be done in the child branch, where the combined code can be tested first. You probably don’t want to break the parent codeline.
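The merge-often advice can be illustrated with a deliberately simplified sketch. It assumes each branch is just a set of change identifiers (the names `trunk`, `feature`, and `merge` are hypothetical) and ignores conflicts entirely, so it only shows how regular parent-to-child merges keep each integration step small.

```python
from typing import List, Set


def merge(source: Set[str], target: Set[str]) -> List[str]:
    """Bring every change in `source` that `target` lacks into `target`.

    Returns the integrated changes in sorted order, so the size of each
    merge is visible.
    """
    incoming = sorted(source - target)
    target.update(incoming)
    return incoming


trunk = {"t1"}
feature = set(trunk)            # branch off: the child starts as a copy

feature.add("f1")               # work in the child
trunk.update({"t2", "t3"})      # meanwhile the parent moves on
first = merge(trunk, feature)   # regular merge, parent -> child

feature.add("f2")
trunk.add("t4")
final = merge(trunk, feature)   # final parent -> child merge; test here

delivered = merge(feature, trunk)  # child -> parent: only feature work is new

print(first, final, delivered)
```

Because the child already contains everything from the parent when it is merged back, the last step brings only the feature's own changes into the parent codeline.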
As mentioned, branches are created for many reasons. For example:
- The code enters a release cycle.
- A team is working on a feature that will take a long time to finish.
- Custom changes for a customer.
- To fix a bug.
- Personal branches to work on different problems.
- Experimental code, to try out ideas.
In my opinion, these branches are not always justified, and you risk ending up in “branch mania”. For example, it is better to incorporate customer-specific changes in the main codeline. If you feel you can’t (the argument often being time-related), you’re going down a dangerous path. But that’s a whole other story.
While the examples above may need to be covered, I wouldn’t call them branching strategies in their own right. I prefer to use the term branching strategy for the overall picture:
- Rules for creating and maintaining branches.
- Types of branches used.
- Naming conventions.
- Best practices and guidelines.
The branching strategy should describe how to handle releases, when it is appropriate to create temporary codelines, and the rules for committing code. For example, in some branches (such as the branch used for normal development) it is OK to commit code as long as it builds and is tested, while other branches may only allow approved changes.
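One way to keep such rules unambiguous is to write them down as data. The sketch below is a hypothetical example, not a prescribed format: the branch prefixes (`trunk`, `release/`, `feature/`), the `BranchPolicy` class, and the two commit rules are assumptions made up for illustration, loosely following the commit rules described above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class CommitRule(Enum):
    BUILDS_AND_TESTED = "builds and is tested"
    APPROVED_ONLY = "approved changes only"


@dataclass(frozen=True)
class BranchPolicy:
    prefix: str      # hypothetical naming convention
    temporary: bool  # temporary branches are merged back and deleted
    rule: CommitRule


# Assumed example policies; a real strategy would define its own.
POLICIES: List[BranchPolicy] = [
    BranchPolicy("trunk", temporary=False, rule=CommitRule.BUILDS_AND_TESTED),
    BranchPolicy("release/", temporary=False, rule=CommitRule.APPROVED_ONLY),
    BranchPolicy("feature/", temporary=True, rule=CommitRule.BUILDS_AND_TESTED),
]


def policy_for(branch: str) -> BranchPolicy:
    """Find the policy whose naming convention matches the branch name."""
    for policy in POLICIES:
        is_prefix = policy.prefix.endswith("/") and branch.startswith(policy.prefix)
        if branch == policy.prefix or is_prefix:
            return policy
    raise ValueError(f"branch {branch!r} does not follow the naming convention")


def may_commit(branch: str, builds: bool, tested: bool, approved: bool) -> bool:
    """Apply the commit rule of the branch to a proposed change."""
    policy = policy_for(branch)
    if policy.rule is CommitRule.APPROVED_ONLY:
        return approved
    return builds and tested


print(may_commit("trunk", builds=True, tested=True, approved=False))
print(may_commit("release/1.0", builds=True, tested=True, approved=False))
```

A branch name that matches none of the conventions is rejected outright, which is one simple way to discourage branches nobody has agreed to maintain.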
I think a good branching strategy should have at least these three characteristics:
- Make life as easy as possible for users.
- Keep branches to a minimum.
- Ensure parallel development is supported.
I will now briefly describe two strategies for release branching: branch by release and branch by purpose. I advocate the latter, partly because it works well with agile development. Feel free to disagree and convince me why some other strategy is better :-) Both models work in conjunction with other temporary branches, and a complete branching strategy needs to describe these types as well. However, I will not cover that here.
Branch by Release
In this model, a new branch is created for the next release (see figure below) while the parent branch follows the lifecycle of the current release. Development continues in the new branch, while the old branch contains the released version. This means developers need to switch branches when a new version of the software is released.
When a maintenance release is required, bugs are fixed in the branch belonging to the release in question. These fixes then need to be propagated to subsequent branches.
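The propagation step is easy to get wrong when several release branches are alive at once. The following sketch lists the branches a fix has to be carried forward to. It assumes a hypothetical naming scheme of the form `release-<major>.<minor>`; the function names are made up for this example.

```python
from typing import List, Tuple


def parse_version(branch: str) -> Tuple[int, ...]:
    # Assumes hypothetical branch names such as "release-1.2".
    version = branch.split("-", 1)[1]
    return tuple(int(part) for part in version.split("."))


def branches_needing_fix(fixed_in: str, all_branches: List[str]) -> List[str]:
    """Return the release branches newer than `fixed_in`, oldest first."""
    fixed = parse_version(fixed_in)
    later = [b for b in all_branches if parse_version(b) > fixed]
    return sorted(later, key=parse_version)


branches = ["release-1.0", "release-1.1", "release-2.0", "release-1.2"]
targets = branches_needing_fix("release-1.1", branches)
print(targets)  # ['release-1.2', 'release-2.0']
```

Every additional live release branch adds one more merge per fix, which is part of the maintenance cost of this model.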
Another important point is that all checked-out code needs to be committed before the codebase is branched off. Otherwise developers risk committing code to a branch that has changed purpose and is now a staging area for a release. (This is because you have to commit code to the branch from which it was checked out.) Not very user-friendly, in my opinion.
Branch by Purpose
With this model, the purpose of a branch never changes. Instead, a new branch is created to host changed requirements (see figure below).
One of the big advantages of this strategy is that normal development is always done in the master branch (trunk). This makes life easy for developers. It also works very well with continuous integration. Martin Fowler again:
In particular have a mainline: a single branch of the project currently under development. Pretty much everyone should work off this mainline most of the time. (Reasonable branches are bug fixes of prior production releases and temporary experiments.)
When using branch by purpose, it may be necessary to introduce a feature freeze period during which no new functionality is added. Only fixes and unfinished work are done after this point. When the code is deemed ready for release, a branch is created and handed over to release management.
To support parallel development, a bridge branch may be created where teams can start working on the next release (see figure below).
Another solution is to branch off immediately when all features are done. This, however, somewhat breaks the branch by purpose paradigm, as it means fixes that could be considered part of normal development are carried out in the release candidate branch. If there are a lot of fixes initially, this also means a lot of merging back down to the master branch (see figure below).
Mastering version control is important for developing code together successfully. Outlining how to manage branches, and choosing a strategy that aids users and facilitates parallel development, becomes increasingly important as a project’s complexity grows.
Finally, branch by purpose is a great model that fulfills the requirements mentioned. Normal development is always done in the same branch. Parallel development is supported across release cycles.
And remember, merge often and merge well! Don’t create branches unless you are willing to maintain them.