For those who don’t know git: it is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency. I use git daily, including for this blog. Have a look at Wikipedia for more background.
Although it requires some overhead, it saves a lot of time once you get the hang of it. Why? Because you can be confident that you can go back to any point in the history of a project. So you can explore new things without the risk of ruining everything. The new things don’t work out? Just go back to the last good point in the history and start over.
Each point in the history is called a commit. A commit contains all essential information on what needs to change to recreate the current state starting from the previous commit. It also contains useful metadata: who created the commit, when and why¹.
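To make this idea concrete, here is a toy model in Python. It is a sketch that follows the description above, not git's actual storage format: the `Commit` class and the `state_at()` function are hypothetical names. Each toy commit stores the files it changed plus the who, when and why, and going back to an earlier point means replaying the changes from the first commit up to the commit you want.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class Commit:
    """Toy commit (hypothetical): changes relative to the previous commit plus metadata."""
    parent: Optional["Commit"]         # previous point in the history (None for the first commit)
    changes: Dict[str, Optional[str]]  # file name -> new content, or None when the file is deleted
    author: str                        # who
    timestamp: datetime                # when
    message: str                       # why


def state_at(commit):
    """Recreate the project at `commit` by replaying all changes from the first commit."""
    history = []
    while commit is not None:          # walk back to the first commit
        history.append(commit)
        commit = commit.parent
    files = {}
    for c in reversed(history):        # replay oldest first
        for name, content in c.changes.items():
            if content is None:
                files.pop(name, None)
            else:
                files[name] = content
    return files


now = datetime.now(timezone.utc)
c1 = Commit(None, {"analysis.R": "x <- 1\n"}, "Alice", now, "Start the analysis")
c2 = Commit(c1, {"analysis.R": "x <- 2\n", "notes.txt": "todo\n"}, "Alice", now, "Update x, add notes")
print(state_at(c2))  # current state
print(state_at(c1))  # going back to the first commit
```

The file names, authors and messages are made up for illustration.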
Git works great with plain text files like R scripts, RMarkdown files, data in txt or csv format, … You can add binary files (Word, Excel, pdf, jpg, …) to a git project, but git handles them less efficiently and with fewer options than plain text files. In the case of a plain text file, git notes which lines in the file were removed and where a line was inserted. A change in a line is a combination of removing the old line and inserting the new line. Have a look at this commit if you want a real life example. Such a granular approach is not available for binary files. Instead, git records that the old version was removed and the new version was added.
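The line-level bookkeeping can be illustrated with Python's standard `difflib` module, which produces a unified diff in a format similar to what `git diff` shows. This is only an illustration: git uses its own diff implementation, and the two versions of `analysis.R` below are hypothetical.

```python
import difflib

# Two hypothetical versions of an R script, as lists of lines.
old = ["library(ggplot2)\n", "x <- read.csv('data.csv')\n", "plot(x)\n"]
new = ["library(ggplot2)\n", "x <- read.csv('data_v2.csv')\n", "plot(x)\n"]

# Unified diff: unchanged lines start with a space, removed lines with "-",
# inserted lines with "+". The edited second line shows up as one removal
# followed by one insertion.
diff = list(difflib.unified_diff(old, new, fromfile="a/analysis.R", tofile="b/analysis.R"))
print("".join(diff), end="")
```

Running it prints the `--- a/analysis.R` and `+++ b/analysis.R` headers, the hunk header `@@ -1,3 +1,3 @@`, and then the second line twice: once prefixed with `-` (old version) and once with `+` (new version).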
Target audience for this workflow
The workflow is useful for anyone with basic computer skills. It does not use all the bells and whistles available in git, only the minimal functionality, all of which is accessible via either a graphical user interface (GUI) or a website. We target ecologists who often write R scripts and have no prior knowledge of version control systems.
This workflow seems to work for a team of scientists who work on the same project and all have write access to that project (a repository in git terminology).
Basic workflow
- Repositories of git novices
- Initial start of a repository
It is no longer valid as soon as more than one user commits to the repository.
The basic workflow is just a simple linear history. The user makes a set of changes and commits those changes. This is repeated over and over until the project is finished. The resulting history will look like fig. 1.
One extra step is to push to another machine. This creates (or updates) a copy of the entire project history on that other machine, which thus serves as a backup. Therefore this should be done at least daily. The easiest way is to use an on-line service like GitHub, Bitbucket, GitLab, … GitHub is free for public repositories and is popular for free and open source projects. Bitbucket offers free private repositories, but only for small teams (max. 5 users). Having the repository on an on-line platform has another benefit: it is easy to share your work and collaborate.
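A push can be pictured as copying every commit the other machine is missing and then moving the branch there to your latest commit. The sketch below models repositories as plain dictionaries; the `push()` and `ancestors()` functions and the short commit ids (`c1`, `c2`, `c3`) are hypothetical, since git identifies commits by hashes. Like git does by default, the sketch refuses the push when the other copy contains commits you don't have locally.

```python
def ancestors(commit_id, commits):
    """All commits reachable from commit_id (itself included) by following parent links."""
    seen, stack = set(), [commit_id]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(commits[c])
    return seen


def push(local, remote, branch):
    """Toy push: copy the commits the remote lacks, then move its branch pointer.

    Only allowed when the remote tip is part of the local history (a fast-forward).
    """
    tip = local["branches"][branch]
    history = ancestors(tip, local["commits"])
    remote_tip = remote["branches"].get(branch)
    if remote_tip is not None and remote_tip not in history:
        raise RuntimeError("rejected: the remote has commits you don't have; pull first")
    for c in history:
        remote["commits"].setdefault(c, local["commits"][c])
    remote["branches"][branch] = tip


# commit id -> tuple of parent ids; branch name -> commit id
local = {"commits": {"c1": (), "c2": ("c1",), "c3": ("c2",)}, "branches": {"master": "c3"}}
remote = {"commits": {"c1": ()}, "branches": {"master": "c1"}}
push(local, remote, "master")
print(sorted(remote["commits"]), remote["branches"])
```

After the push the on-line copy holds the full history, which is exactly what makes it a backup.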
Branching workflow with pull requests
- Working with several people on the same repository
- More experienced git users
- Commits are only created in feature branches, not in the master branch
The basic workflow has a single branch which is called master. Git makes it easy to create new branches. Each branch starts from a specific commit. Each user should create a new branch when he starts working on a new feature in the repository. Because each user works in his own branch, he is the only one writing to this part of the history. This avoids a lot of conflicts. Fig. 2 illustrates what the history looks like when a few branches are created.
Creating branches is fine, but they make the history of the repository diverge. So we need a mechanism to merge branches together. In this workflow we work on a feature branch until it is finished. Then we merge it into the master branch. Fig. 3 illustrates the resulting history. This can be done locally using a merge, but it is safer to do it on-line via a pull request. A pull request is a two-step procedure. First you create the pull request by indicating via the web app which branches you would like to merge. The second step is to merge the pull request. Documentation on how to handle pull requests can be found on the websites of GitHub, Bitbucket and GitLab.
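Behind the scenes, a branch is little more than a name pointing to a commit, and a merge adds a commit with two parents. The toy `Repo` class below is hypothetical (with made-up ids `c1`, `c2`, …) and ignores file contents. It builds a small history with two feature branches that start from master and are merged back one after the other. For simplicity it always creates a merge commit.

```python
import itertools


class Repo:
    """Toy repository: each commit lists its parents, each branch points to a commit."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.parents = {}                                  # commit id -> tuple of parent ids
        self.branches = {"master": self._new_commit(())}   # initial commit c1

    def _new_commit(self, parents):
        cid = f"c{next(self._ids)}"
        self.parents[cid] = tuple(parents)
        return cid

    def branch(self, name, start="master"):
        """Creating a branch only adds a new name for an existing commit."""
        self.branches[name] = self.branches[start]

    def commit(self, branch):
        """Add a commit on top of `branch` and move the branch to it."""
        self.branches[branch] = self._new_commit((self.branches[branch],))

    def merge(self, source, target="master"):
        """Add a merge commit whose two parents are the target tip and the source tip."""
        self.branches[target] = self._new_commit(
            (self.branches[target], self.branches[source]))


repo = Repo()                   # c1 on master
repo.branch("feature_A")
repo.branch("feature_B")
repo.commit("feature_A")        # c2
repo.commit("feature_B")        # c3
repo.commit("feature_A")        # c4
repo.merge("feature_B")         # c5: feature_B merged into master
repo.merge("feature_A")         # c6: feature_A merged into master
print(repo.branches)
print(repo.parents["c6"])
```

Note that no commit is ever added directly to master, in line with the workflow: master only moves forward through merges.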
Pull requests have several advantages over local merges:
- It works only when the branches are pushed to the on-line copy of the repository. This not only ensures a backup but also gives your collaborators access to the latest version.
- All pull requests are done against the common (on-line) master branch. Local merges would create diverging master branches, which lead to a lot of conflicts.
- Since the pull request is a two step procedure, one user can create the pull request and another (e.g. the project leader) can do the actual merge.
- The pull request gives an overview of the aggregated changes of all the commits in the pull request. This makes it easier to get a feel for what has changed within the range of the pull request.
- Most on-line tools allow you to add comments and reviews to a pull request. This is useful to discuss a feature prior to merging it. In case additional changes are required, the user should update his feature branch. The pull request gets updated automatically.
Conflicts arise when a file is changed at the same location in two different branches, with different changes. Git cannot decide which version is correct and therefore blocks the merging of the pull request. It is up to the user to select the correct version and commit the required changes. See on-line tutorials on how to do this. Once the conflicts are resolved, you can go ahead and merge the pull request. This is illustrated in fig. 3. First master is merged back into feature B to handle the merge conflict and then feature B is merged into master.
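The rule behind a conflict can be sketched as a simplified three-way merge: both branches are compared with their common ancestor (the base). A line changed on only one side is taken from that side; a line changed differently on both sides is a conflict. The `merge_lines()` function below is hypothetical and only handles changed lines (real git also aligns inserted and removed lines first). Its conflict markers mimic the `<<<<<<<`, `=======` and `>>>>>>>` markers git writes into a conflicted file.

```python
def merge_lines(base, ours, theirs, ours_name="feature_B", theirs_name="master"):
    """Simplified three-way merge of equally long lists of lines.

    Returns (merged_lines, has_conflict).
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles changed lines, not inserted or removed ones")
    merged, conflict = [], False
    for b, o, t in zip(base, ours, theirs):
        if o == t:        # identical on both sides (changed or not)
            merged.append(o)
        elif o == b:      # only the other branch changed this line
            merged.append(t)
        elif t == b:      # only our branch changed this line
            merged.append(o)
        else:             # both branches changed the same line differently
            conflict = True
            merged += [f"<<<<<<< {ours_name}", o, "=======", t, f">>>>>>> {theirs_name}"]
    return merged, conflict


base = ["a <- 1", "b <- 2", "c <- 3"]      # common ancestor
ours = ["a <- 1", "b <- 20", "c <- 3"]     # feature B changed line 2
theirs = ["a <- 10", "b <- 200", "c <- 3"] # master changed lines 1 and 2
merged, conflict = merge_lines(base, ours, theirs)
print(conflict)
print("\n".join(merged))
```

Line 1 merges cleanly (only master changed it), while line 2 ends up between conflict markers. Resolving the conflict means keeping the correct version, deleting the markers and committing the result.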
What if I choose the wrong version? Don’t panic: both versions remain in the history, so you don’t lose either of them. You can create a new branch starting from the latest commit with the correct version and merge that branch.
Here are a few flowcharts that illustrate several components of the branching workflow with pull requests. Fig. 4 illustrates the steps you need when you want to start working on a project. Once you have a local clone of the repository, you can check out the required feature branch (fig. 5). The last flowchart covers working in a feature branch and merging it when finished (fig. 6).
Rules for collaboration
- Always commit into a feature branch, never into the master branch.
- Always start feature branches from the master branch.
- Only work in your own branches.
- Never merge someone else’s pull request without their consent.
- Don’t wait too long to merge a branch. Keep the scope of a feature branch narrow.
Starting branches not from master
If you want to apply a change to someone else’s branch, create a new branch starting from their branch, add commits and create a pull request. Ask the branch owner to merge the pull request. Basically, you use someone else’s branch as the master branch.
Working with multiple users in the same branch
This is OK as long as users don’t work simultaneously in the branch.
- Person A creates the branch
- Person A adds commits
- Person A pushes and notifies person B
- Person B adds commits
- Person B pushes and notifies the next person
- Person A creates a pull request
¹ Assuming that the user entered a sensible commit message.