Category Archives: C/C++/Embedded

Grokking Integer Overflow

Even experienced C/C++ programmers often mix up the terms integer overflow and wrap-around, and are just as often confused about the ramifications. This post attempts to clear things up a little.

OVERFLOW AND UNDERFLOW

In C (and C++), integer types (like the signed and unsigned versions of char, short, and int) have a fixed bit-size. Because of this, integer types can only support certain value ranges. For unsigned int, this range is 0 to UINT_MAX; for (signed) int, it is INT_MIN to INT_MAX. On a typical platform, these constants have the following values:


UINT_MAX    +2^32 - 1 (+4294967295)
INT_MIN     -2^31     (-2147483648)
INT_MAX     +2^31 - 1 (+2147483647)

The actual values depend on many factors, for instance, the native word size of a platform and whether 2’s complement representation is used for negative values (which is almost universally the case). Consult your compiler’s limits.h header file for details.

Overflow happens when an expression’s value is larger than the largest value supported by a type; conversely, underflow occurs if an expression yields a value that is smaller than the smallest value representable by a type. For instance:


unsigned int ui = UINT_MAX;
++ui;   // Overflow.
ui = 0;
--ui;   // Underflow.

It’s common among programmers to use the term overflow for both overrun and underrun of a type’s value range. And so shall I, for the rest of this discussion.

WRAP-AROUND

Now that we know what overflow is, we can tackle the question of what happens on overflow. One possibility is what is conventionally referred to as wrap-around. Wrap-around denotes that an integer type behaves like a circle; that is, it has no beginning and no end. If you add one to the largest value, you arrive at the smallest; if you subtract one from the smallest value, you get the largest.

Wrap-around is, however, only one way to handle integer overflow. Other possibilities exist, ranging from saturation (the overflowing value is set to the largest/smallest value and stays there), to raising an exception, to doing whatever an implementation fancies.
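To make the difference concrete, here is a minimal sketch of two of these policies for unsigned addition. The function names add_wrapping and add_saturating are made up for this illustration; the saturating variant checks the remaining headroom before adding, so it never depends on what the addition itself does on overflow:

```c
#include <assert.h>
#include <limits.h>

/* Wrap-around: plain unsigned addition, which C reduces
   modulo UINT_MAX + 1. */
unsigned int add_wrapping(unsigned int a, unsigned int b) {
    return a + b;
}

/* Saturation: clamp the result to UINT_MAX instead of wrapping. */
unsigned int add_saturating(unsigned int a, unsigned int b) {
    /* The true sum exceeds UINT_MAX exactly when b > UINT_MAX - a. */
    return (b > UINT_MAX - a) ? UINT_MAX : a + b;
}
```

With these definitions, add_wrapping(UINT_MAX, 1u) arrives at 0, whereas add_saturating(UINT_MAX, 1u) stays at UINT_MAX.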

AND THE C LANGUAGE SAYS…

If you want to find out how C (and C++) handles integer overflow, you have to take a look at section 6.2.5 “Types” of the C99 standard, the following sentence in particular:

“A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type”

Which means in plain English:

0. Apparently, there is overflow (as defined above) and C-overflow (as used by the standard). C-overflow is more like an error condition that occurs if overflow behavior is not defined for a type.

1. Unsigned integer types wrap around on overflow; “reduced modulo the number that is one greater than the largest value” is just a fancy way of saying so. Thus, unsigned integer overflow is well defined and not called overflow by the language standard.

2. Nothing is explicitly said about signed integer types. There are, however, various hints in the standard that signed integer overflow is undefined, for instance:

3.4.3 Undefined Behavior: An example of undefined behavior is the behavior on integer overflow.
J.2 Undefined behavior: The value of the result of an integer arithmetic or conversion function cannot be represented.

To sum it up: on overflow, unsigned integers wrap around, whereas signed integers “overflow” into the realm of undefined behavior (contrary to Java and C#, BTW, where signed integers are guaranteed to wrap around).
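Because unsigned wrap-around is guaranteed, it is legal to detect an unsigned overflow after the fact. The following sketch (the helper name unsigned_add_overflows is an assumption of mine, not something from the standard) relies on the observation that a wrapped sum is always smaller than either operand:

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if the mathematical sum a + b does not fit into
   unsigned int, 0 otherwise. */
int unsigned_add_overflows(unsigned int a, unsigned int b) {
    unsigned int sum = a + b;  /* well defined: reduced modulo UINT_MAX + 1 */
    return sum < a;            /* true only if the sum wrapped around */
}
```

The same after-the-fact test is not an option for signed operands, because there the addition itself is already undefined.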

SIGNED OVERFLOW

You might have believed (and observed) that in C signed integers also wrap around. For instance, these asserts will hold on many platforms:


int i = INT_MIN;
assert(--i == INT_MAX);
assert(++i == INT_MIN);

Both asserts held when I compiled this code on my machine with gcc 7.4; the following one held only with optimizations disabled (-O0):


i = INT_MAX;
assert(i + 42 < i);

From -O2 on, gcc enables the option -fstrict-overflow, which means that it assumes that signed integer expressions cannot overflow. Thus, the expression i + 42 < i is considered false, regardless of the value of i. You can control how gcc treats signed integer overflow; check out the options -fstrict-overflow, -fwrapv, and -ftrapv. For maximum portability, however, you should always steer clear of signed integer overflow and never assume wrap-around.
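If signed overflow is off-limits, the check has to happen before the addition, not after it. Here is one possible sketch; add_would_overflow is a hypothetical helper name, and the comparisons are arranged so that no intermediate expression can overflow:

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if a + b would overflow int, 0 otherwise.
   The addition itself is never performed. */
int add_would_overflow(int a, int b) {
    if (b > 0) {
        return a > INT_MAX - b;  /* INT_MAX - b cannot overflow for b > 0 */
    }
    if (b < 0) {
        return a < INT_MIN - b;  /* INT_MIN - b cannot overflow for b < 0 */
    }
    return 0;
}
```

Unlike i + 42 < i, a test like add_would_overflow(i, 42) cannot be optimized away on the grounds that overflow never happens, because no overflowing expression is ever evaluated.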

SIGNED OVERFLOW THAT ISN’T

What about this code? Does this summon up undefined behavior, too? Doesn’t the resulting sum overflow the value range of the short type?


short x = SHRT_MAX;
short y = SHRT_MAX;
short sum = x + y;

The short (pun intended!) answer is: it depends!

It depends because before adding x and y, a conforming C compiler promotes both operands to int. Thus, to the compiler, the code looks like this:


int isum = (int)x + (int)y;
short sum = (short)(isum);

Adding two integers that hold a value of SHRT_MAX doesn’t overflow, unless — and that’s why it depends — you are hacking away on an ancient 16-bit platform where sizeof(short) == sizeof(int).
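You can observe the promotion directly. In this small sketch (the function names are mine), sizeof reports the size of the type of the expression x + y, and the sum is kept in an int so that nothing is narrowed. The second function only yields the mathematically correct result for large inputs on platforms where int is wider than short:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Size of the type of x + y for two short operands. */
size_t short_sum_size(void) {
    short x = 1, y = 2;
    return sizeof(x + y);  /* both operands are promoted to int first */
}

/* Sum of two shorts, returned as int so the promoted result is kept. */
int short_sum(short x, short y) {
    return x + y;
}
```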

But even on a typical 32- or 64-bit platform, what about the assignment of the large integer result to the short variable sum? This surely overflows, doesn’t it? Doesn’t this yield undefined behavior? The answer in this case is a clear ‘no’. The result is ‘implementation-defined’ instead. Let’s see.

SIGNED INTEGER TYPE CONVERSIONS

In the previous example, a larger signed type is converted into a smaller signed type. This is what the C99 standard has to say about it:

6.3.1.3 Signed and unsigned integers
When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.

Otherwise, if the new type is unsigned […]

Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.

What an implementation will choose to do, in practice, is wrap-around.


int i = SHRT_MAX + 1;
short s = i;
assert(s == SHRT_MIN);

Why does a compiler behave like this? You can find an explanation by Linus Torvalds himself:

“Bit-for-bit copy of a 2’s complement value. Anything else would be basically impossible for an optimizing compiler to do unless it actively _tried_ to screw the user over.”

To sum it up:

1. Unsigned integers wrap around on overflow. 100 percent guaranteed.
2. Signed integer overflow means undefined behavior. Don’t rely on wrap-around.
3. Type conversions to smaller signed types will very likely wrap around. Let’s call this “Torvalds-defined behavior”.
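If you would rather not depend on Torvalds-defined behavior, you can make the narrowing explicit. The sketch below clamps (saturates) instead of wrapping; clamp_to_short is a made-up name, and clamping is just one possible policy, not what your compiler does by default:

```c
#include <assert.h>
#include <limits.h>

/* Narrows an int to short, saturating at the limits of short.
   The cast is only reached for values that short can represent,
   so no implementation-defined conversion takes place. */
short clamp_to_short(int value) {
    if (value > SHRT_MAX) {
        return SHRT_MAX;
    }
    if (value < SHRT_MIN) {
        return SHRT_MIN;
    }
    return (short)value;
}
```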

Pointers in C, Part VII: Being Relaxed About The Strict Aliasing Rule

“I am free, no matter what rules surround me. If I find them tolerable, I tolerate them; if I find them too obnoxious, I break them. I am free because I know that I alone am morally responsible for everything I do.”
― Robert A. Heinlein

The largely unknown “Strict Aliasing Rule” (SAR) has the potential to bring tears to the eyes of even the most seasoned C/C++ developers. Why? Because of it, a lot of the code they have written over the years belongs to the realm of “undefined behavior”.

Despite its name, the term “undefined behavior” itself is well-defined by the C language standard: it’s “behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements”. Which means anything can happen: your program could randomly crash or even send suggestive emails to your boss.

THE PROBLEM

Let’s start with the code snippet that I used in my original post on SAR:


struct measurements_t {
    uint8_t level;
    uint16_t temperature;
    uint32_t force;
};

void convert(const uint8_t* data, struct measurements_t* measurements) {
    /* Fill measurements object with raw data. */
    *measurements = *((struct measurements_t*) &data[0]);
}

Here, data that has been received into a buffer (‘data’) is converted into a high-level data structure (‘measurements’). From the compiler’s point of view, what ‘data’ refers to is just a single ‘uint8_t’ but we access it through a pointer to type ‘struct measurements_t’. What we’ve got here is a clear violation of SAR, which entails undefined behavior.

SAFE ALTERNATIVES

“But, Ralf”, you might respond, “this can’t be true. I write code like this every day and it works flawlessly, even in safety-critical systems like medical devices!”

This doesn’t surprise me in the least. “Undefined behavior” can — get this — also mean “works flawlessly”. But there are no guarantees, whatsoever. It might work on one platform, with a particular compiler or compiler version, but might fail on another platform, or with a different compiler version. Hence, to err on the truly safe side (which you should, especially if you work on safety-critical systems), you should use truly safe alternatives.

One obvious and time-proven approach is to do such so-called type punning through unions. It works by storing data via a member of one type and reading it via another member of a different type:


union receive_buffer_t {
    uint8_t data[BUFSIZE];
    struct measurements_t measurements;
} receive_buffer;

The receiving function would store byte-wise into the ‘receive_buffer.data’ array, while high-level functions would use the ‘receive_buffer.measurements’ member. This will work reliably in any version of C, but it might fail in C++.
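Here is a minimal sketch of how such a union might be used. BUFSIZE is given a hypothetical value and level_from_raw is a made-up helper; only the first struct member is read back, because its offset (zero) does not depend on padding, whereas the position of the other members is up to the compiler:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUFSIZE 16  /* hypothetical size, chosen only for this sketch */

struct measurements_t {
    uint8_t level;
    uint16_t temperature;
    uint32_t force;
};

union receive_buffer_t {
    uint8_t data[BUFSIZE];
    struct measurements_t measurements;
};

/* Stores raw bytes via the 'data' member and reads 'level'
   back via the 'measurements' member. */
uint8_t level_from_raw(const uint8_t *raw, size_t len) {
    union receive_buffer_t buf;
    assert(raw != NULL);
    assert(len <= sizeof(buf.data));
    memset(&buf, 0, sizeof(buf));
    memcpy(buf.data, raw, len);     /* receiving side: byte-wise store */
    return buf.measurements.level;  /* high-level side: typed read */
}
```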

Bulletproof type-punning, one that works in both C and C++, uses ‘memcpy’. ‘memcpy’!? ‘memcpy’, that’s right:


#include <string.h>   /* memcpy */

void convert_memcpy(const uint8_t* data, struct measurements_t* measurements) {
    /* Fill measurements object with raw data. */
    memcpy(measurements, &data[0], sizeof(*measurements));
}

Believe it or not, there’s a high probability that your compiler will optimize out the call to ‘memcpy’. I’ve observed this with ‘gcc’ and ‘clang’, among others, but I’ve also seen compilers that always call ‘memcpy’, even for the smallest amounts of data copied, regardless of the optimization level used (Texas Instruments ARM C/C++ compiler 19.6, for instance). Nevertheless, this is my go-to type-punning technique these days, unless performance is paramount. (You first have to prove by profiling that your code really impacts overall performance. Otherwise, your optimizations are like buying Dwayne Johnson an expensive hair brush: it does no real harm, but it’s not of much use, either.)
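The same technique works for individual fields at arbitrary, even unaligned, offsets in a buffer. A minimal sketch, with read_u32 and write_u32 as hypothetical helper names; the value is interpreted in the byte order of the host, so this is not an endianness conversion:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reads a uint32_t stored in host byte order at 'src' (any alignment). */
uint32_t read_u32(const uint8_t *src) {
    uint32_t value;
    assert(src != NULL);
    memcpy(&value, src, sizeof(value));
    return value;
}

/* Writes 'value' in host byte order to 'dst' (any alignment). */
void write_u32(uint8_t *dst, uint32_t value) {
    assert(dst != NULL);
    memcpy(dst, &value, sizeof(value));
}
```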

BUT I REEELLY, REEELLY MUST USE CASTS

Sometimes, you have to use SAR-breaking casts, if only to maintain social peace in your team. So how likely is it that your compiler will do something obscene?

VERY unlikely, at least in this example. Let me explain.

First of all, compiler vendors know that most developers either haven’t heard about SAR or at least don’t give a foo about it. Therefore, they usually don’t aggressively optimize such instances. This is particularly true for compilers that are part of toolchains used in deeply (bare-metal) embedded systems. However, ‘gcc’ as well as ‘clang’, which are used in all kinds of systems, take advantage of SAR from optimization level 2 on. (You can explicitly disable SAR-related optimizations regardless of the optimization level by passing the ‘-fno-strict-aliasing’ option.)

Second, what ‘convert’ is doing is pretty much well-behaved. Sure, it aliases the ‘data’ and ‘measurements’ pointers, but it never accesses them concurrently. Once the ‘measurements’ pointer has been created, the ‘data’ pointer is not used anymore. If the caller (or the whole call-chain) is equally well-behaved, I don’t see a problem (don’t trust me!).

Third, there’s no aliased read/write access. Even if ‘data’ and ‘measurements’ were used concurrently, it wouldn’t be a problem, as long as both are only used for reading data (don’t trust me on this one, either!). By contrast, this I consider harmful:


uint8_t* data = get_receive_buffer();
if (data[0] == 0) {
    struct measurements_t* measurements = (struct measurements_t*) data;
    measurements->level = 42;  // aliased write.
    assert(data[0] == 42);     // aliased read.
}

To the compiler, ‘data’ and ‘measurements’ are two totally unrelated pointers to unrelated memory areas. The original value of ‘data[0]’ might be cached in a register and not refetched from memory, hence the ‘assert’ might fail. In general, this is what will most likely happen when SAR is violated in contexts where it does matter: instead of suggestive emails being sent to your boss, you are much more likely to get stale values (which of course could lead to crashes later on).

NO PUN INTENDED

Let’s get real about SAR. Here are some relaxed, pragmatic rules on how to deal with the Strict Aliasing Rule:

0. Fully understand SAR
1. Try hard to adhere to SAR
2. Type-pun using ‘memcpy’
3. If you can’t, disable SAR-related compiler optimizations
4. If you can’t, avoid concurrent, aliased read/write access

But don’t assume that just because you didn’t get a ticket for speeding in the past, you will never ever get a ticket for speeding. What you’re doing is against the law. If you get busted someday, don’t whine and don’t complain I didn’t warn you. Rather own your failures and move on.

Breakin’ rocks in the hot sun
I fought the law and the law won
I fought the law and the law won
I USED SOME CASTS FOR TYPE PUN
I fought the law and the law won
I fought the law and the law won

(with apologies to Sonny Curtis)

Bug Hunting Adventures #14: Bitmap [BM]adness (Solution)

It’s a given fact of life that something that’s deemed totally safe in one environment may be totally unsafe in another. Every German who has ever used an American sauna knows what I’m talking about.

Similar (but far less embarrassing!) traps lurk in situations where you reuse perfectly working C++ code in a C environment. Some time ago, I integrated a little home-grown C++ library into a plain C project. However, instead of the expected, proven functionality I got plenty of core dumps. After some assembly-level debugging, I came to the conclusion that I had found a compiler bug. Code along these lines


    p += sizeof('A');
    return *p;

was compiled to this:


    mov rax, QWORD PTR p[rip]
    add rax, 4                 ; 4?
    mov QWORD PTR p[rip], rax
    mov rax, QWORD PTR p[rip]

Why the heck did the compiler insert an offset of 4 instead of 1?

The answer to this question, which is also the answer to our bug hunting adventure, can be found here.
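If you want to check this on your own toolchain, the following sketch isolates the expression. It assumes the file is compiled as C, not C++, and the function names are made up. In C, a character constant such as 'A' has type int, so its size is sizeof(int), which matches the 4 in the listing above; in C++, the same constant has type char:

```c
#include <assert.h>
#include <stddef.h>

/* In C, the character constant 'A' has type int. */
size_t char_constant_size(void) {
    return sizeof('A');
}

/* Casting to char (or using sizeof(char)) gives a one-byte step. */
size_t char_object_size(void) {
    return sizeof((char)'A');
}
```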

Bug Hunting Adventures #14: Bitmap [BM]adness

“What’s the meaning of goodness if there isn’t a little badness to overcome?”
― Anne Revere

The code below is part of a C graphics processing library, which parses data in the venerable bitmap (BMP) file format. A bitmap file consists of two parts: a header and the pixel data block. More specifically, a bitmap file is laid out like this:

Offset	Size	Content
0	1	Character ‘B’
1	1	Character ‘M’
2	4	Size of the bitmap file
6	4	Reserved
10	4	Offset to the first byte of the pixel data (ofs)
14	n	Info block
ofs	m	Pixel data

All multi-byte integer values (like the bitmap file size and the offset to the pixel data) are stored in little-endian format.

The function ‘bmp_pixel_data’ takes a pointer to bitmap file data and returns a pointer to the bitmap’s pixel data area within the bitmap. The size of the pixel data area is returned via the ‘size’ out parameter. In case the provided bitmap file data is malformed, a NULL pointer is returned and the ‘size’ out parameter is set to zero.

As always, the code compiles cleanly without warnings (at ‘-W -Wall’), but when the function ‘bmp_pixel_data’ was put to use, it failed miserably. Where did the programmer goof?


/* First magic byte. */
#define BMP_MAGIC_BYTE1 'B'
/* Second magic byte. */
#define BMP_MAGIC_BYTE2 'M'
/* Offset of first magic byte. */
#define BMP_MAGIC_BYTE1_OFS 0
/* Offset of second magic byte. */
#define BMP_MAGIC_BYTE2_OFS (BMP_MAGIC_BYTE1_OFS + sizeof(BMP_MAGIC_BYTE1))
/* Offset to 4-byte bitmap file size, little-endian. */
#define BMP_FILE_SIZE_OFS (BMP_MAGIC_BYTE2_OFS + sizeof(BMP_MAGIC_BYTE2))
/* Offset to 4-byte pixel data offset, little-endian. */
#define BMP_OFFSET_OFS (BMP_FILE_SIZE_OFS + sizeof(uint32_t) + sizeof(uint32_t))
/* Offset to bitmap info block. */
#define BMP_OFFSET_INFO_BLOCK (BMP_OFFSET_OFS + sizeof(uint32_t))

static inline uint32_t uint32_from_little_endian(const uint8_t* data) {
    assert(data != NULL);
    return ((data[3] << 24U) + (data[2] << 16U) + (data[1] << 8U) + data[0]);
}

const uint8_t* bmp_pixel_data(const uint8_t* bitmap, uint32_t* size) {
    assert(bitmap != NULL);
    assert(size != NULL);

    const uint8_t* p = NULL;

    if (bitmap[BMP_MAGIC_BYTE1_OFS] == BMP_MAGIC_BYTE1 &&
        bitmap[BMP_MAGIC_BYTE2_OFS] == BMP_MAGIC_BYTE2) {
        uint32_t file_size = 
            uint32_from_little_endian(&bitmap[BMP_FILE_SIZE_OFS]);
        uint32_t offset = 
            uint32_from_little_endian(&bitmap[BMP_OFFSET_OFS]);
        if (offset <= file_size)
        {
            *size = file_size - offset;
            p = &bitmap[BMP_MAGIC_BYTE1_OFS + offset];
        }
    }
    if (p == NULL) {
        *size = 0;
    }
    return p;
}

Solution

Bug Hunting Adventures #13: Prime Sums (Solution)

The challenge suffers from what I call a “chain of blunders”, where one blunder leads to another. Here are the exact details, in the traditional format.

The first who got close to the true nature of this bug was reader Shlomo who commented directly on the post, but I held back his comment in order not to spoil the fun for others. (Unfortunately, I couldn’t tell him, because he used a bogus email address—boo!). Christian Hujer, hacker extraordinaire, gave the most precise and extensive account on LinkedIn. While many found the blunder in the Makefile (Joe Nelson was the first), it was apparently such a good smokescreen that many people didn’t look any further. To me, the root blunder that started the chain of blunders is in the C language itself, which should have never allowed implicit zero-initialization of constants in the first place (which was corrected in C++).
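The language behavior in question can be reproduced in a single file. This is only a sketch (DEMO_TABLE and demo_table_sum are made-up names, not part of the challenge): a file-scope const array without initializer and without extern is a tentative definition in C, and if no other definition follows in the same translation unit, it ends up zero-initialized:

```c
#include <assert.h>

/* No initializer, no 'extern': a tentative definition in C.
   With no other definition in this translation unit, all
   elements are implicitly zero. (This does not compile as C++.) */
const unsigned int DEMO_TABLE[3];

/* Sums the table; with the tentative definition above, the result is 0. */
unsigned int demo_table_sum(void) {
    unsigned int sum = 0, i;
    for (i = 0; i < sizeof(DEMO_TABLE) / sizeof(DEMO_TABLE[0]); ++i) {
        sum += DEMO_TABLE[i];
    }
    return sum;
}
```

Declaring the table as extern in the header would turn the header line into a pure declaration, so a table definition missing at link time would show up as a linker error instead of a silent sum of zero.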

Some believed that the preincrement of the loop-counter was the culprit as it would skip the first prime, but that’s not the case. The expression after the second semicolon is always evaluated at the end of the loop body:


for (...; ...; <e>) {
    <body>
}

is equivalent to


for (...; ...;) {
    <body>
    <e>;
}

Substitute ++i or i++ for <e> — there’s no difference!
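A quick way to convince yourself is to write the same loop twice. In this sketch (the function names are mine), both variants visit every element, including the first one:

```c
#include <assert.h>
#include <stddef.h>

/* Sums 'count' values using a preincrement in the loop header. */
unsigned int sum_preincrement(const unsigned int *values, size_t count) {
    unsigned int sum = 0;
    size_t i;
    for (i = 0; i < count; ++i) {
        sum += values[i];
    }
    return sum;
}

/* Same loop with a postincrement; the value of the increment
   expression is discarded, so the behavior is identical. */
unsigned int sum_postincrement(const unsigned int *values, size_t count) {
    unsigned int sum = 0;
    size_t i;
    for (i = 0; i < count; i++) {
        sum += values[i];
    }
    return sum;
}
```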

On a general note, guys, please register by entering your email address in the top right corner to ensure that you will get automatic notifications for new posts as soon as they’re published. I also (usually) announce new posts on LinkedIn, but mostly hours if not days later. Nevertheless, connecting with me on LinkedIn is always a good idea and highly encouraged. Your subscriptions, likes, praise, and criticism keep me motivated to carry on, so don’t hold back!

Bug Hunting Adventures #13: Prime Sums

“Why, yes; and not exactly that either. The fact is, we have all been a good deal puzzled because the affair is so simple, and yet baffles us altogether.”
― Edgar Allan Poe, The Purloined Letter

Below, you’ll find a little C project that doesn’t do what it’s supposed to do, namely print the sum of the first 10 prime numbers. The program builds cleanly with gcc and clang; that is, without any warnings even when using -Wextra -Wall -pedantic -ansi as compiler options. It’s well-formed and doesn’t crash.

What’s the root cause of this bug? What’s the output of the program? Here are the files; you can also find them on GitHub:

prime_table.h:


#ifndef PRIME_TABLE_H
#define PRIME_TABLE_H

const unsigned int PRIME_TABLE[10];

#endif

prime_table.c:


#include "prime_table.h"

const unsigned int PRIME_TABLE[10] = {
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29,
};

prime_sum.c:


#include <stdio.h>
#include "prime_table.h"

#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof(arr[0]))

int main(void) {
    unsigned int sum = 0, i;
    for (i = 0; i < ARRAY_SIZE(PRIME_TABLE); ++i) {
        sum += PRIME_TABLE[i];
    }
    printf("%u\n", sum);
    return 0;
}

Makefile:


CFLAGS := -Wextra -Wall -pedantic -ansi

run: prime_sum
	./prime_sum

prime_sum.o : prime_sum.c prime_table.h
prime_table.o : prime_table.c prime_table.h
prime_sum : prime_sum.o prime_table.h

clean:
	rm -rf prime_sum *.o
