Grokking Git Merge Commits And Combined Diffs

“A poor workman blames his tools.”
— anon

The other day, I overheard two developers discussing the pros and cons of various version control systems. I only caught this fragment: “… What sucks about Git is that when you look at a merge commit, you can’t really see what changed!” It wasn’t the first time I had heard such a complaint. Time to demystify Git merge commits.

I’ll use this repository which contains two merge commits (labelled ‘M’):

In my initial commit I added a file called ‘song_of_the_bell.txt’ which contains the first stanza of an English translation of a famous poem by Friedrich Schiller:

The next commit (ddfebcd) just added blank lines after every other line and was done on a branch called ‘better_formatting’. You don’t see ‘better_formatting’ drawn as a separate branch in the graph because the merge into ‘master’ was a so-called fast-forward merge: ‘master’ had not moved in the meantime, so Git simply advanced it to the tip of the branch without creating a merge commit.

Then, I made a change on ‘master’ that replaced the comma on the last line with a period:

and a concurrent change on branch ‘add_title’:

Afterwards, ‘add_title’ was merged into ‘master’. Since there were modifications on both branches, a fast-forward merge was not possible, so there’s an explicit merge commit (fd63230). What do you think you’ll see when you show this merge commit?
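If you want to follow along, here is a minimal sketch that rebuilds a history of the same shape in a scratch repository. It rests on a few assumptions: the file contents are placeholder lines rather than the poem, the author identity is hypothetical, and the resulting hashes will therefore differ from the ones quoted in this post.

```shell
#!/bin/sh
# Sketch: one fast-forward merge followed by one explicit merge commit.
set -eu

repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master      # make sure the first branch is 'master'
git config user.name "Example Author"        # hypothetical identity
git config user.email "author@example.com"

# Initial commit with placeholder text.
printf 'line one\nline two\nline three\nline four,\n' > song_of_the_bell.txt
git add song_of_the_bell.txt
git commit -q -m "Add first stanza"

# Formatting change on a branch; 'master' has not moved, so the merge fast-forwards.
git checkout -q -b better_formatting
printf 'line one\nline two\n\nline three\nline four,\n' > song_of_the_bell.txt
git commit -q -am "Add blank lines"
git checkout -q master
git merge -q --ff-only better_formatting

# Concurrent changes: a title on 'add_title', a period on 'master'.
git checkout -q -b add_title
printf 'SONG OF THE BELL\n\nline one\nline two\n\nline three\nline four,\n' > song_of_the_bell.txt
git commit -q -am "Add title"
git checkout -q master
printf 'line one\nline two\n\nline three\nline four.\n' > song_of_the_bell.txt
git commit -q -am "Replace comma with period"

# Both branches have moved, so Git has to create a merge commit.
git merge -q --no-edit add_title

git log --graph --oneline
# A merge commit lists itself plus two parents: three hashes in total.
parents=$(git rev-list --parents -n 1 HEAD | wc -w | tr -d ' ')
echo "hashes on the merge line: $parents"
```

The graph printed at the end should show a single fork and join for ‘add_title’, while ‘better_formatting’ blends into the straight line of ‘master’.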

Nothing, or rather, not much!

You don’t see the typical diff-like line changes, and that’s what the developers were lamenting. Other version control systems would give them what they want, namely the delta between the merge commit and the previous commit on ‘master’. This would allow them to easily figure out what was changed on ‘master’. Git can do it as well, but for merge commits you have to be explicit; a plain ‘git show’ doesn’t cut it:

fd63230^- is shorthand for the range fd63230^1..fd63230, which translates to “show the difference between the first parent of commit fd63230 and the commit fd63230 itself”. In general, hash^- is short for hash^-n, where n defaults to 1 (the first parent, a.k.a. the merge-to parent). To show the delta between the second (merge-from) parent and the merge commit, use fd63230^-2 instead.
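As a hedged illustration of these parent-relative diffs, the sketch below builds a throwaway merge and diffs it against each parent. The placeholder file contents and the author identity are assumptions, and the ^- notation needs a reasonably recent Git.

```shell
#!/bin/sh
# Sketch: diff a merge commit against its first and second parent.
set -eu

repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"        # hypothetical identity
git config user.email "author@example.com"

printf 'line one\nline two\nline three\nline four,\n' > song_of_the_bell.txt
git add song_of_the_bell.txt
git commit -q -m "Add stanza"

git checkout -q -b add_title
printf 'SONG OF THE BELL\n\nline one\nline two\nline three\nline four,\n' > song_of_the_bell.txt
git commit -q -am "Add title"

git checkout -q master
printf 'line one\nline two\nline three\nline four.\n' > song_of_the_bell.txt
git commit -q -am "Replace comma with period"
git merge -q --no-edit add_title
merge=$(git rev-parse HEAD)

# What the merge brought into 'master': first parent -> merge commit.
to_first=$(git diff "$merge^-")
# The same comparison with both endpoints spelled out.
explicit=$(git diff "$merge^1" "$merge")
# What 'master' contributed from the branch's point of view: second parent -> merge commit.
to_second=$(git diff "$merge^-2")

printf '%s\n\n%s\n' "$to_first" "$to_second"
```

The first diff should contain only the added title, the second only the comma-to-period change.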

There’s a reason why a regular ‘git show’ doesn’t show much. To understand it, we need to talk about combined diffs first. A combined diff shows the delta between the merge commit and both of its parents in a single diff. Let’s produce a combined diff for our merge commit by using the -c option:

First of all, notice the line containing “Merge: 69a7968 6be3af1”: the first hash is the hash of the first parent of the merge commit (a.k.a. the merge-to parent, fd63230^1), while the second hash belongs to the second parent of the merge commit (a.k.a. the merge-from parent, fd63230^2). Next comes the diff output, in which the first column shows the delta between the merge commit and the first parent, whereas the second column shows the delta between the merge commit and the second parent:

The +/- markers are in the first column (i.e. the markers are not indented), which means that “SONG OF THE BELL” and a single blank line were added relative to the first parent (on ‘master’). This change must have come from the merge-from parent (the branch ‘add_title’). Conversely,

shows the difference between the second parent (on ‘add_title’) and the merge commit, since the +/- markers are in the second column (i.e. the markers are indented by one space). These are the changes that were made on ‘master’.
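To see both marker columns side by side, here is a small sketch that produces a combined diff for a throwaway merge. The placeholder text and author identity are assumptions; ‘--format=’ is used only to suppress the commit header so that just the diff is captured.

```shell
#!/bin/sh
# Sketch: read the two marker columns of 'git show -c' on a clean merge.
set -eu

repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"        # hypothetical identity
git config user.email "author@example.com"

printf 'line one\nline two\nline three\nline four,\n' > song_of_the_bell.txt
git add song_of_the_bell.txt
git commit -q -m "Add stanza"

git checkout -q -b add_title
printf 'SONG OF THE BELL\n\nline one\nline two\nline three\nline four,\n' > song_of_the_bell.txt
git commit -q -am "Add title"

git checkout -q master
printf 'line one\nline two\nline three\nline four.\n' > song_of_the_bell.txt
git commit -q -am "Replace comma with period"
git merge -q --no-edit add_title

# Column 1 compares against the first parent, column 2 against the second.
combined=$(git show -c --format= HEAD)
printf '%s\n' "$combined"
```

In the output, the title line carries its ‘+’ in the first column because it is new relative to ‘master’, while the line ending in a period carries its ‘+’ in the second column because it is new relative to ‘add_title’.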

There’s also a --cc option (think “compact combined”) that gives an even tighter output than -c: it omits every hunk in which the merge result simply matches one of the parents. In other words, it’s a combined diff that shows little more than resolved merge conflicts. Since the changes in the merge commit fd63230 are non-conflicting (they’re on different lines), --cc produces no diff at all:

Wait a minute! Isn’t this the same output that we got above when we executed a plain ‘git show fd63230’ without the --cc option? Precisely! When showing merge commits, ‘git show’ defaults to this “compact combined” format, which in practice displays only conflicts. That’s why most merge commits look empty and that’s why there’s so much whining. On the other hand, this little feature makes the life of an integrator much easier, as (s)he can focus on the parts of a merge commit that are critical: conflicts.
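The following sketch checks this behavior on a throwaway merge: a plain ‘git show’ and ‘git show --cc’ should both produce no diff, while -c still does. The placeholder file is deliberately a bit longer here so that the two edits are far apart and end up in separate hunks; the contents, the helper function ‘stanza’ and the author identity are all assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: a clean merge looks empty with 'git show' and '--cc', but not with '-c'.
set -eu

repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"        # hypothetical identity
git config user.email "author@example.com"

# Hypothetical helper: prints twelve placeholder lines; $1 is appended to the last one.
stanza() { printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 11 "12$1"; }

stanza , > song_of_the_bell.txt
git add song_of_the_bell.txt
git commit -q -m "Add stanza"

git checkout -q -b add_title
{ printf 'SONG OF THE BELL\n\n'; stanza ,; } > song_of_the_bell.txt
git commit -q -am "Add title"

git checkout -q master
stanza . > song_of_the_bell.txt
git commit -q -am "Replace comma with period"
git merge -q --no-edit add_title
merge=$(git rev-parse HEAD)

# '--format=' drops the commit header, leaving only the diff part (if any).
plain=$(git show --format= "$merge")
dense=$(git show --format= --cc "$merge")
full=$(git show --format= -c "$merge")

if [ -z "$plain" ] && [ -z "$dense" ]; then echo "git show / git show --cc: no diff"; fi
if [ -n "$full" ]; then echo "git show -c: diff present"; fi
```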

Now let’s take a look at the other merge commit, the one at the top of the history:

Here you do have some output, which means there was a conflict. Again, the first column shows what changed between the first parent (merge-to parent) and the merged version, which is just the addition of the author name “Friedrich Schiller”. Obviously, this change originated from the ‘add_author’ branch. The second column shows what has changed between the second parent (merge-from parent) and the merged version. Clearly, the title “SONG OF THE BELL” was indented on ‘master’. But why is the author name “Friedrich Schiller” marked as a change in the second column as well? It shouldn’t appear, because this is the change that was made in the merge-from parent, right? As always, Git is right: it should. During the merge, as part of the conflict resolution, the author name “Friedrich Schiller” was indented (in the spirit of the change in the merge-to parent, which indented the title). It’s this indentation that has changed in the merge commit compared to the merge-from parent.
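A conflict of this kind can be reproduced with the sketch below. It assumes placeholder stanza lines and a hypothetical author identity, and it resolves the conflict by hand in the same way as described above: the title keeps the indentation from ‘master’ and the author line is indented to match.

```shell
#!/bin/sh
# Sketch: a conflicting merge whose resolution shows up in a plain 'git show'.
set -eu

repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"        # hypothetical identity
git config user.email "author@example.com"

printf 'SONG OF THE BELL\n\nline one\nline two\n' > song_of_the_bell.txt
git add song_of_the_bell.txt
git commit -q -m "Add title and stanza"

# 'add_author' inserts the author directly below the title ...
git checkout -q -b add_author
printf 'SONG OF THE BELL\nFriedrich Schiller\n\nline one\nline two\n' > song_of_the_bell.txt
git commit -q -am "Add author"

# ... while 'master' indents the title. The edits touch adjacent lines, so they conflict.
git checkout -q master
printf '   SONG OF THE BELL\n\nline one\nline two\n' > song_of_the_bell.txt
git commit -q -am "Indent title"

if git merge --no-edit add_author >/dev/null 2>&1; then
    echo "expected a merge conflict" >&2
    exit 1
fi

# Resolve by hand: keep the indented title and indent the author as well.
printf '   SONG OF THE BELL\n   Friedrich Schiller\n\nline one\nline two\n' > song_of_the_bell.txt
git add song_of_the_bell.txt
git commit -q --no-edit

resolved=$(git show --format= HEAD)
printf '%s\n' "$resolved"
```

The indented author line should appear with a ‘+’ in both columns, because in that exact form it exists in neither parent.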

Understanding combined diffs definitely takes a little getting used to. That’s the reason why most people only care about what changed between the merge-to parent and the merge commit. You already know how to obtain these changes painlessly: diff the merge commit against its first parent using the ^- shorthand introduced earlier.