January 2020 - Approxion

“A poor workman blames his tools.”
— anon
The other day, I overheard two developers discussing the pros and cons of various version control systems. I only caught this fragment: “… What sucks about Git is that when you look at a merge commit, you can’t really see what changed!” It wasn’t the first time I had heard such complaints. Time to debunk Git merge commits.

I’ll use this repository which contains two merge commits (labelled ‘M’):


M     7b224b1  [master] Merge branch 'add_author'
|\
| o   7fee5d6  [add_author] Added author
o |   41c338a  Indented title
|/
M     fd63230  Merge branch 'add_title'
|\
| o   6be3af1  [add_title] Added title
o |   69a7968  Replace comma with period
|/
o     ddfebcd  [better_formatting] Inserted blank lines
|
o     f15c2b9  Inital version

In my initial commit I added a file called ‘song_of_the_bell.txt’ which contains the first stanza of an English translation of a famous poem by Friedrich Schiller:


$ git show f15c2b9
...
+ Walled in fast within the earth
+ Stands the form burnt out of clay.
+ This must be the bell’s great birth!
+ Fellows, lend a hand to-day.
+ Sweat must trickle now
+ From the burning brow,

The next commit (ddfebcd) just added blank lines after every other line and was done on a branch called ‘better_formatting’. You don’t see the branch ‘better_formatting’ (graphically as a branch) because the merge to ‘master’ was a so-called fast-forward merge.

Then, I made a change on ‘master’ that replaced the comma on the last line with a period:


$ git show 69a7968
...
@@ -5,4 +5,4 @@ This must be the bell’s great birth!
 Fellows, lend a hand to-day.

 Sweat must trickle now
-From the burning brow,
+From the burning brow.

and a concurrent change on branch ‘add_title’:


$ git show 6be3af1
...
@@ -1,3 +1,5 @@
+SONG OF THE BELL
+
 Walled in fast within the earth
 Stands the form burnt out of clay.

Afterwards, ‘add_title’ was merged into ‘master’. Since there were modifications on both branches, a fast-forward merge was not possible, so there’s an explicit merge commit (fd63230). What do you think you’ll see when you show this merge commit?

Nothing, or rather, not much!


$ git show fd63230
commit fd6323029cf0b3aa380d013cbc6305db0e029687
Merge: 69a7968 6be3af1

    Merge branch 'add_title'

You don’t see the typical diff-like line changes, and that’s what the developers lamented. Other version control systems would give them what they want, namely the delta between the merge commit and the previous commit on ‘master’, which makes it easy to figure out what the merge changed on ‘master’. Git can do it as well, but for merge commits you have to be explicit; ‘git show’ doesn’t cut it:


$ git diff 69a7968 fd63230      # variant 1
$ git diff fd63230^ fd63230     # variant 2
$ git diff fd63230^-            # variant 3

...
@@ -1,3 +1,5 @@
+SONG OF THE BELL
+
 Walled in fast within the earth
 Stands the form burnt out of clay.

fd63230^- is shorthand for the range fd63230^1..fd63230 and translates to “show the difference between the first parent of commit fd63230 and the commit fd63230 itself”. In general, hash^- is short for hash^-n, where n defaults to 1 (the first parent, a.k.a. the merge-to parent). You can show the delta between the second (merge-from) parent and the merge commit like this:


$ git diff fd63230^-2

There’s a reason why a regular ‘git show’ doesn’t show much. To understand it, we need to talk about combined diffs first. A combined diff shows the deltas between the merge commit and both of its parents in a single diff. Let’s produce a combined diff for our merge commit by using the -c option:


$ git show -c fd63230
commit fd6323029cf0b3aa380d013cbc6305db0e029687
Merge: 69a7968 6be3af1
...
diff --combined song_of_the_bell.txt
index c26b26c,a52b0bf..14147a6
--- a/song_of_the_bell.txt
+++ b/song_of_the_bell.txt
@@@ -1,3 -1,5 +1,5 @@@
+ SONG OF THE BELL
+
  Walled in fast within the earth
  Stands the form burnt out of clay.

@@@ -5,4 -7,4 +7,4 @@@ This must be the bell’s great birth
  Fellows, lend a hand to-day.

  Sweat must trickle now
 -From the burning brow,
 +From the burning brow.

First of all, notice the line containing “Merge: 69a7968 6be3af1”: The first hash is the hash of the first parent of the merge commit (aka. the merge-to parent, fd63230^1), while the second hash belongs to the second parent of the merge commit (aka. the merge-from parent, fd63230^2). Next comes the diff output, in which the first column is used to show the delta between the merge commit and the first parent, whereas the second column is used to display the delta between the merge commit and the second parent:


+ SONG OF THE BELL
+

The +/- markers are in the first column (i.e. the markers are not indented), which means that “SONG OF THE BELL” and a single blank line were added relative to the first parent (on ‘master’). This change must have come from the merge-from commit (the branch ‘add_title’). Conversely,


 -From the burning brow,
 +From the burning brow.

shows the difference between the second parent (on ‘add_title’) and the merge commit, since the +/- markers are in the second column (i. e. markers are indented by one space). These are the changes that were done on master.
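To make the column rule concrete, here is a small C++ sketch. It is not part of Git, and the function names describe_marker and explain_combined_line are made up for illustration. It assumes a merge with exactly two parents, so every body line of the combined diff starts with two marker columns:

```cpp
#include <string>

// Hypothetical helper: put one marker column of a combined diff into words.
// '+' means the line is in the merge result but not in that parent,
// '-' means the line is in that parent but not in the merge result.
std::string describe_marker(char marker, const std::string& parent) {
    if (marker == '+') return "added relative to the " + parent;
    if (marker == '-') return "removed relative to the " + parent;
    return "unmarked for the " + parent;
}

// Hypothetical helper: explain one body line of a two-parent combined diff.
// Column 0 belongs to the first (merge-to) parent,
// column 1 to the second (merge-from) parent.
std::string explain_combined_line(const std::string& line) {
    if (line.size() < 2) {
        return "too short for a two-parent combined diff line";
    }
    return describe_marker(line[0], "first parent") + ", " +
           describe_marker(line[1], "second parent");
}
```

Feeding it “+ SONG OF THE BELL” reports an addition relative to the first parent only, whereas “ -From the burning brow,” reports a removal relative to the second parent only.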

There’s also a --cc option (think “compact combined”) that gives an even tighter output than -c in that it only shows modifications that occur on both parents in the same lines. In other words, it’s a combined diff showing only merge conflicts. Since the changes in the merge commit fd63230 are non-conflicting (they’re on different lines), --cc produces no diff at all:


$ git show --cc fd63230
commit fd6323029cf0b3aa380d013cbc6305db0e029687
Merge: 69a7968 6be3af1
 
    Merge branch 'add_title'

Wait a minute! Isn’t this the same output that we got above when we executed a plain ‘git show fd63230’ without the --cc option? Precisely! When showing merge commits, ‘git show’ defaults to “compact combined format”, which displays only conflicts. That’s why most merge commits are empty and that’s why there’s so much whining. On the other hand, this little feature makes the life of an integrator much easier, as (s)he can focus on the parts of a merge commit that are critical: conflicts.

Now let’s take a look at the other merge commit, the one at the top of the history:


$ git show 7b224b1
commit 7b224b1c15b583369fd939ff83167a92fbc586ad (HEAD -> master)
Merge: 41c338a 7fee5d6
...
diff --cc song_of_the_bell.txt
index dde3e9d,c50a1c9..d0e5bd1
--- a/song_of_the_bell.txt
+++ b/song_of_the_bell.txt
@@@ -1,4 -1,5 +1,5 @@@
 -SONG OF THE BELL
 -(Friedrich Schiller)
 +        SONG OF THE BELL
++      (Friedrich Schiller)

  Walled in fast within the earth
  Stands the form burnt out of clay.

Here you do have some output, which means there was a conflict. Again, the first column shows what changed between the first parent (merge-to parent) and the merged version, which is just the addition of the author name “Friedrich Schiller”. Obviously, this change originated from the ‘add_author’ branch. The second column shows what has changed between the second parent (merge-from parent) and the merged version. Clearly, the title “SONG OF THE BELL” was indented on ‘master’.

But why is the author name “Friedrich Schiller” marked as a change in the second column as well? It shouldn’t appear, because this is the change that was done in the merge-from parent, right? As always, Git is right: it should. During the merge, as part of the conflict resolution, the author name “Friedrich Schiller” was indented (in the spirit of the change in the merge-to parent, which indented the title). It’s this indentation that has changed in the merge commit compared to the merge-from parent.

Understanding combined diffs definitely takes a little getting used to. That’s the reason why most people only care about what changed between the merge-to commit and the merge commit. You already know how to obtain these changes painlessly:


$ git diff 7b224b1^-
...
@@ -1,4 +1,5 @@
         SONG OF THE BELL
+      (Friedrich Schiller)

 Walled in fast within the earth
 Stands the form burnt out of clay.

“The Limited Circle Is Pure”
— Franz Kafka

In systems programming, ring buffers are ubiquitous. Like regular queues, they’re first-in first-out data structures but contrary to classic queues they’re fixed in size and don’t grow dynamically. This is especially important in real-time contexts where time and space determinism is paramount.

In today’s circular episode I want to show you my go-to ring buffer implementation. I like it because it’s short and decently efficient. I deliberately used only basic C++ syntax, so it should compile fine even with compilers from the previous millennium (i.e. C++98):


#include <cassert>   // assert
#include <stddef.h>  // size_t in the global namespace

template<typename T, int N>
class ring_buffer {
public:
    ring_buffer() { clear(); }
    size_t capacity() const { return N; }
    bool empty() const { return head_ == tail_; }
    size_t size() const {
        return head_ >= tail_ ?
            head_ - tail_ :
            BUFSIZE - (tail_ - head_);
    }
    void add(const T& item) {
        buffer_[head_] = item;
        advance(head_);
        if (head_ == tail_) {
            advance(tail_); // Drop oldest entry, keep rest.
        }
    }
    const T& remove() {
        assert(!empty());
        size_t old_tail = tail_;
        advance(tail_);
        return buffer_[old_tail];
    }
    void clear() { tail_ = head_ = 0U; }

private:
    static const size_t BUFSIZE = N + 1U;
    void advance(size_t& value) { value = (value + 1) % BUFSIZE; }

    T buffer_[BUFSIZE];
    size_t head_;
    size_t tail_;
};

You specify the data type of the ring buffer elements via template parameter T and the maximum number of elements to hold via template parameter N. Here’s a little usage scenario:


ring_buffer<int, 10> rb;
assert(rb.capacity() == 10);

rb.add(123);
rb.add(42);
rb.add(23);
assert(rb.size() == 3);

while (!rb.empty()) {
    cout << rb.remove() << endl;
}
assert(rb.size() == 0);

A common problem that you face when implementing a ring buffer is discriminating between the empty and full states, as in both cases the head and tail index point to the same location. I solved it by allocating one extra element, which might waste a little bit of space but makes the code easier on the eye and probably faster compared to the alternative approach, which requires you to maintain an extra boolean flag.
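For comparison, here is a minimal sketch of what the flag-based alternative might look like. It is not the implementation used in this post; the class name flagged_ring_buffer and its member full_ are made up for illustration, and N is assumed to be greater than zero:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical alternative: exactly N slots plus a boolean flag that
// tells "full" apart from "empty" when head_ == tail_.
template<typename T, std::size_t N>
class flagged_ring_buffer {
public:
    flagged_ring_buffer() : head_(0), tail_(0), full_(false) {}
    bool empty() const { return !full_ && head_ == tail_; }
    bool full() const { return full_; }
    std::size_t size() const {
        if (full_) return N;
        return head_ >= tail_ ? head_ - tail_ : N - (tail_ - head_);
    }
    void add(const T& item) {
        buffer_[head_] = item;
        head_ = (head_ + 1) % N;
        if (full_) {
            tail_ = head_;        // Oldest entry was just overwritten.
        } else if (head_ == tail_) {
            full_ = true;         // Head caught up with tail: now full.
        }
    }
    const T& remove() {
        assert(!empty());
        std::size_t old_tail = tail_;
        tail_ = (tail_ + 1) % N;
        full_ = false;            // Removing always leaves a free slot.
        return buffer_[old_tail];
    }

private:
    T buffer_[N];
    std::size_t head_;
    std::size_t tail_;
    bool full_;
};
```

Every operation now has to read or update the flag, which is the extra bookkeeping that the one-extra-element approach avoids.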

When the buffer becomes full during an add operation (head == tail), the tail index is immediately advanced by one, thus dropping the oldest element. As this all happens within the add method, from a ring_buffer user’s point of view the head index is only ever equal to the tail index when the ring buffer is empty.
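The overwrite behavior can be checked with a small experiment. The sketch below repeats a condensed copy of the class from above (same logic, fewer methods) so that it compiles on its own; the helper oldest_after_adding is made up for illustration and expects a count of at least 1:

```cpp
#include <cassert>
#include <cstddef>

// Condensed copy of the ring_buffer shown earlier (same logic).
template<typename T, int N>
class ring_buffer {
public:
    ring_buffer() : head_(0), tail_(0) {}
    bool empty() const { return head_ == tail_; }
    std::size_t size() const {
        return head_ >= tail_ ? head_ - tail_ : BUFSIZE - (tail_ - head_);
    }
    void add(const T& item) {
        buffer_[head_] = item;
        advance(head_);
        if (head_ == tail_) {
            advance(tail_);  // Buffer was full: drop the oldest entry.
        }
    }
    const T& remove() {
        assert(!empty());
        std::size_t old_tail = tail_;
        advance(tail_);
        return buffer_[old_tail];
    }

private:
    static const std::size_t BUFSIZE = N + 1U;
    void advance(std::size_t& value) { value = (value + 1) % BUFSIZE; }

    T buffer_[BUFSIZE];
    std::size_t head_;
    std::size_t tail_;
};

// Hypothetical helper: add the integers 1..count to a buffer that holds
// three elements and return the oldest element that survived.
inline int oldest_after_adding(int count) {
    ring_buffer<int, 3> rb;
    for (int i = 1; i <= count; ++i) {
        rb.add(i);
    }
    return rb.remove();
}
```

With three or fewer additions nothing is lost and the oldest element is 1. After five additions, the elements 1 and 2 have been dropped and the oldest survivor is 3.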

Contemporary compilers will replace the potentially costly modulo operation in the advance helper method with an AND operation for buffer sizes that are powers of two. However, keep in mind that the advance method uses the internal buffer size (BUFSIZE), which is one greater than the requested ring buffer size (N), so this is most likely an efficient ring buffer:


ring_buffer<double, 15> rb;     // Internal buffer has 16 entries.
                                // index % 16 reduced to index & 15.

while this isn’t:


ring_buffer<double, 16> rb;     // Internal buffer with 17 entries.
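The equivalence the compiler relies on can be written out by hand. This is only a sketch; the helper names is_power_of_two, advance_mod and advance_mask are made up for illustration and don't appear in the class above:

```cpp
#include <cstddef>

// Hypothetical helper: true if n is a power of two (1, 2, 4, 8, ...).
inline bool is_power_of_two(std::size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// Wrap-around increment using modulo, as done in ring_buffer::advance.
inline std::size_t advance_mod(std::size_t index, std::size_t bufsize) {
    return (index + 1) % bufsize;
}

// Wrap-around increment using a bit mask.
// Only equivalent to advance_mod when bufsize is a power of two.
inline std::size_t advance_mask(std::size_t index, std::size_t bufsize) {
    return (index + 1) & (bufsize - 1);
}
```

For an internal size of 16 both functions agree on every index from 0 to 15. For an internal size of 17 they don't: advancing index 16 yields 0 with the modulo version but 16 with the mask, which is why the compiler has to keep a real modulo (or an equivalent compare-and-reset) in that case.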

[Update 2020-05-29: Part VIII of this series shows how to optimize/simplify ring_buffer::size().]

More circular adventures…


Grokking Git Merge Commits And Combined Diffs

Circular Adventures VII: A Ring Buffer Implementation