Circular Adventures I: The Modulo Operation

“All things from eternity are of like forms and come round in a circle”
–Marcus Aurelius, AD 121-180

In computer science, just like in real life, many things go round in circles. There are ring buffers, circular lists, wrap-around counters and timers, just to name a few. In this multi-part article, I want to explore the theory behind circular behavior. Equipped with this knowledge, I attempt to provide solutions to “recurring” problems.

Let’s start with the very basics; that is, the movement of an index within a ring buffer.

Assume you have a ring buffer containing N = 10 elements and an index i:


0 1 2 3 4 5 6 7 8 9
      ^
      i

0 1 2 3 4 5 6 7 8 9

(For simplicity, I depict circular structures as a flat arrays — just imagine the first and last element are joined to form a ring.)

Question: How do you advance the index by a signed offset n, taking wrap-around into account?
Answer: i = (i + n) mod N

Many circular problems like ring buffer operations can be elegantly expressed by using the ‘mod’ operator. Alas, only theoretically, as in practice, there are at least two problems surrounding the ‘mod’ operator: portability and efficiency.

The ‘mod’ operator, as it is used in the equation above, is the mathematical ‘mod’ operator — the one that is used in number theory. Programming languages, however, sport many different flavors of the ‘mod’ operator, which vary in the way they treat negative operands.

All of them obey this equation:


a mod N = a - ( a div N ) * N

a mod N = a - ( a div N ) * N

Yet the results for negative operands depend on whether the ‘div’ operator rounds towards zero or negative infinity. If the ‘div’ operator rounds towards zero, the expression -5 div 2 yields -2; if it rounds towards negative infinity the result is -3.

The following example illustrates how this influences the calculation of -3 mod 8:


                          Round towards
                        zero  neg infinity
-3 div 8                  0      -1
(-3 div 8) * 8            0      -8
-3 - ((-3 div 8) * 8)    -3       5

Round towards

zero neg infinity

-3 div 8 0 -1

(-3 div 8) * 8 0 -8

-3 - ((-3 div 8) * 8) -3 5

This means that -3 mod 8 might be -3 or +5 depending on the style the implementers of your programming language have chosen. Some languages (like Ada) have even two modulo operators (rem and mod), while others allow to control the behavior at run-time (Perl: ‘use integer’). Still others (like C90 and C++98) leave it as ‘implementation-defined’. Have a look at this for a nice overview on how different programming languages implement the modulo operator.

(And just in case you haven’t guessed it already: C/C++/Java’s approximation of ‘mod’ is ‘%’ and ‘div’ is ‘/’.)

Now, if you only travel through a circle in positive direction (by adding positive offsets), either rounding style will do; however, if you intend to travel backwards (or calculate differences between indices that yield negative values), only the ‘negative-infinity’ modulus operator will do the job; that is, wrap (in the 10 element ring buffer example) from -1, -2, -3 … to 9, 8, 7 …, respectively.

How about efficiency? As can be seen in the table above, the calculation of the remainder is based on a division operation, which is, — alas — a rather expensive operation on many platforms. On most processors, the ‘div’ operation is many times slower than other primitive operations. As an example, some popular ARM processors don’t even have a ‘div’ instruction, so division is done through a software library which consumes up to 40 cycles. Compare this to the 5 cycle multiplication instruction and the other instructions that typically execute in a single cycle.

Due to these portability and efficiency issues, many developers shun the modulo operator in performance-critical code. Instead of


i = (i + n) mod N

i = (i + n) mod N

they use the computationally equivalent


i = i + n
while i >= N
    i = i - N
while i < 0
    i = i + N

i = i + n

while i >= N

i = i - N

while i < 0

i = i + N

or, if it guaranteed that our index is not more than N – 1 off the ends


if i >= N
    i = i - N
if i < 0
    i = i + N

if i >= N

i = i - N

if i < 0

i = i + N

But this approach is still not very efficient, since it uses branches and on modern multi-stage pipelined processors a branch might invalidate already prefetched and preprocessed instructions. Isn’t there a true mathematical ‘mod’ operator out there that is also efficient at the same time? You bet there is, but only if N happens to be a base-2 number.

If you are lucky and N is a base-2 number (like 64, 1024, or 4096) i mod N is computationally equivalent to


i and (N - 1)

i and (N - 1)

where ‘and’ is the binary and operator (‘&’ in C, C++, and Java). This works even if i is negative, but requires that your environment stores negative numbers in two’s complement fashion, which is the case for pretty much all systems that you will ever program for.

As an example, consider -2 mod 16. -2 is 0xFFFFFFFE in 32-bit two’s complement notation and 16 – 1 corresponds to 0x0000000F:


  0xFFFFFFFE
& 0x0000000F
  ----------
  0x0000000E

0xFFFFFFFE

& 0x0000000F

----------

0x0000000E

which is 14, exactly, what a mathematical ‘mod’ operator would yield.

The ‘and’ operation is easy to compute for any processor. In C/C++ we might define an optimized ‘mod’ function like this:


inline int mod_base2(int dividend, int divisor) {
    // Ensure divisor is positive.
    assert(divisor >= 0);
    // Ensure divisor is a base-2 number.
    assert((divisor & (divisor - 1)) == 0);
    return dividend & (divisor - 1);
}

inline int mod_base2(int dividend, int divisor) {

// Ensure divisor is positive.

assert(divisor >= 0);

// Ensure divisor is a base-2 number.

assert((divisor & (divisor - 1)) == 0);

return dividend & (divisor - 1);

}

It is a good idea to use this optimization whenever possible. Sometimes, it makes even sense to round-up the size of circular structures to a base-2 boundary, just to be able to use this kind of optimization.

A variant of this theme is casting a (potentially negative) value to an unsigned type in C/C++. Casting x to an uint8_t is equivalent to calculating x mod 256. While most optimizing compilers will generate the same code for x & 0xFF and (uint8_t)x, there is a certain likelihood that the latter might be a bit faster. The obvious disadvantage of casting to unsigned is, that this approach practically limits N to the value range of uint8_t, uint16_t, and uint32_t, which is 256, 65536, and 4294967296, respectively. Because of the fact that the performance gain in only hypothetical, it is usually much wiser to go for the ‘mod_base2’ optimization, though.

This concludes my first installment, which is mainly about some of the many facets of the ‘mod’ operator. Just like the constant PI appears in all circular problems in mathematics, some variant of the ‘mod’ operator appears in all circular problems in computer science. Next time, I will explore what it means to have two indices into a circular structure, which turns out to be the foundation of many interesting circular use cases.

More circular adventures…

Approxion

Code – People – Everything

Circular Adventures I: The Modulo Operation