Category Archives: Pointers in C

Pointers in C, Part VII: Being Relaxed About The Strict Aliasing Rule

“I am free, no matter what rules surround me. If I find them tolerable, I tolerate them; if I find them too obnoxious, I break them. I am free because I know that I alone am morally responsible for everything I do.”
― Robert A. Heinlein

The largely unknown “Strict Aliasing Rule” (SAR) has the potential to bring tears to the eyes of even the most seasoned C/C++ developers. Why? Because of it, a lot of the code they have written over the years falls into the realm of “undefined behavior”.

Despite its name, the term “undefined behavior” itself is well-defined by the C language standard: it’s “behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements”. This means anything can happen: your program could randomly crash or even send suggestive emails to your boss.

THE PROBLEM

Let’s start with the code snippet that I used in my original post on SAR:


struct measurements_t {
    uint8_t level;
    uint16_t temperature;
    uint32_t force;
};

void convert(const uint8_t* data, struct measurements_t* measurements) {
    /* Fill measurements object with raw data. */
    *measurements = *((struct measurements_t*) &data[0]);
}

Here, data that has been received into a buffer (‘data’) is converted into a high-level data structure (‘measurements’). From the compiler’s point of view, what ‘data’ refers to is just a single ‘uint8_t’ but we access it through a pointer to type ‘struct measurements_t’. What we’ve got here is a clear violation of SAR, which entails undefined behavior.

SAFE ALTERNATIVES

“But, Ralf”, you might respond, “this can’t be true. I write code like this every day and it works flawlessly, even in safety-critical systems like medical devices!”

This doesn’t surprise me in the least. “Undefined behavior” can — get this — also mean “works flawlessly”. But there are no guarantees, whatsoever. It might work on one platform, with a particular compiler or compiler version, but might fail on another platform, or with a different compiler version. Hence, to err on the truly safe side (which you should, especially if you work on safety-critical systems), you should use truly safe alternatives.

One obvious and time-proven approach is to do such so-called type punning through unions. It works by storing data via a member of one type and reading it via another member of a different type:


union receive_buffer_t {
    uint8_t data[BUFSIZE];
    struct measurements_t measurements;
} receive_buffer;

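The receiving function would store byte-wise into the ‘receive_buffer.data’ array, while high-level functions would use the ‘receive_buffer.measurements’ member. C permits this kind of type punning (the C standard has spelled it out explicitly since C99 TC3), but in C++ reading a union member other than the one most recently written is undefined behavior.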
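
Here is a minimal sketch of the union approach, reduced to a four-byte payload so that padding doesn't get in the way. The names ‘word_buffer_t’ and ‘pun_via_union’ are made up for illustration; the numeric result depends on the byte order of your platform:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-byte buffer: both members occupy the same storage. */
union word_buffer_t {
    uint8_t  data[4];   /* filled byte-wise by the receiving code */
    uint32_t value;     /* read by higher-level code */
};

/* Stores four raw bytes via 'data' and reads them back via 'value'. */
uint32_t pun_via_union(const uint8_t* raw) {
    union word_buffer_t buffer;
    size_t i;

    assert(raw != NULL);
    for (i = 0; i < sizeof(buffer.data); ++i) {
        buffer.data[i] = raw[i];
    }
    return buffer.value;   /* C: the stored bytes are reinterpreted. */
}
```

Feeding it the bytes 0x78, 0x56, 0x34, 0x12 yields 0x12345678 on a little-endian machine and 0x78563412 on a big-endian one.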

Bulletproof type punning, one that works in both C and C++, uses ‘memcpy’. ‘memcpy’!? ‘memcpy’, that’s right:


void convert_memcpy(const uint8_t* data, struct measurements_t* measurements) {
    /* Fill measurements object with raw data. */
    memcpy(measurements, &data[0], sizeof(*measurements));
}

Believe it or not, there’s a high probability that your compiler will optimize out the call to ‘memcpy’. I’ve observed this with ‘gcc’ and ‘clang’, among others, but I’ve also seen compilers that always call ‘memcpy’, even for the smallest amounts of data and regardless of the optimization level (the Texas Instruments ARM C/C++ compiler 19.6, for instance). Nevertheless, this is my go-to type-punning technique these days, unless performance is paramount. (You first have to prove by profiling that your code really impacts overall performance. Otherwise, your optimizations are like buying Dwayne Johnson an expensive hair brush: it does no real harm, but it’s not of much use, either.)
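
For completeness, here is a self-contained sketch of the ‘memcpy’ variant, including the headers it needs. The helper ‘to_bytes’ is a made-up name that merely produces a test buffer; the sketch assumes that sender and receiver agree on struct layout (padding) and byte order:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct measurements_t {
    uint8_t level;
    uint16_t temperature;
    uint32_t force;
};

/* Copies sizeof(struct measurements_t) raw bytes into '*measurements'. */
void convert_memcpy(const uint8_t* data, struct measurements_t* measurements) {
    assert(data != NULL && measurements != NULL);
    memcpy(measurements, data, sizeof(*measurements));
}

/* Hypothetical helper: serializes a struct into a raw byte buffer. */
void to_bytes(const struct measurements_t* measurements, uint8_t* data) {
    assert(data != NULL && measurements != NULL);
    memcpy(data, measurements, sizeof(*measurements));
}
```

A round trip through ‘to_bytes’ and ‘convert_memcpy’ should hand back the same three member values, and no pointer to ‘struct measurements_t’ ever points into the byte buffer.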

BUT I REEELLY, REEELLY MUST USE CASTS

Sometimes, you have to use SAR-breaking casts, if only to maintain social peace in your team. So how likely is it that your compiler will do something obscene?

VERY unlikely, at least in this example. Let me explain.

First of all, compiler vendors know that most developers either haven’t heard about SAR or at least don’t give a foo about it. Therefore, they usually don’t aggressively optimize such instances. This is particularly true for compilers that are part of toolchains used in deeply (bare-metal) embedded systems. However, ‘gcc’ as well as ‘clang’, which are used in all kinds of systems, take advantage of SAR from optimization level 2 on. (You can explicitly disable SAR-related optimizations regardless of the optimization level by passing the ‘-fno-strict-aliasing’ option.)

Second, what ‘convert’ is doing is pretty much well-behaved. Sure, it aliases the ‘data’ and ‘measurements’ pointers, but it never accesses them concurrently. Once the ‘measurements’ pointer has been created, the ‘data’ pointer is not used anymore. If the caller (or the whole call chain) is equally well-behaved, I don’t see a problem (don’t trust me!).

Third, there’s no aliased read/write access. Even if ‘data’ and ‘measurements’ were used concurrently, it wouldn’t be a problem, as long as both are only used for reading data (don’t trust me on this one, either!). By contrast, this I consider harmful:


uint8_t* data = get_receive_buffer();
if (data[0] == 0) {
    struct measurements_t* measurements = (struct measurements_t*) data;
    measurements->level = 42;  // aliased write.
    assert(data[0] == 42);     // aliased read.
}

To the compiler, ‘data’ and ‘measurements’ are two totally unrelated pointers to unrelated memory areas. The original value of ‘data[0]’ might be cached in a register and not refetched from memory, hence the ‘assert’ might fail. In general, this is what will most likely happen when SAR is violated in contexts where it does matter: instead of suggestive emails being sent to your boss, you are much more likely to get stale values (which of course could lead to crashes later on).
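
If you really have to patch a field inside the raw buffer and look at the bytes afterwards, one possible way out (a sketch, not the only option) is to write through ‘memcpy’ at the member's offset. ‘set_level’ is a hypothetical helper name; because the write goes through byte access, the compiler has to assume that ‘data[0]’ may have changed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct measurements_t {
    uint8_t level;
    uint16_t temperature;
    uint32_t force;
};

/* Writes 'level' into the raw buffer at the offset of the 'level' member,
   without ever forming a struct pointer into the buffer. */
void set_level(uint8_t* data, uint8_t level) {
    assert(data != NULL);
    memcpy(data + offsetof(struct measurements_t, level), &level, sizeof(level));
}
```

Since ‘level’ is the first member, its offset is 0, so a subsequent read of ‘data[0]’ sees the new value.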

NO PUN INTENDED

Let’s get real about SAR. Here are some relaxed, pragmatic rules on how to deal with the Strict Aliasing Rule:

0. Fully understand SAR
1. Try hard to adhere to SAR
2. Type-pun using ‘memcpy’
3. If you can’t, disable SAR-related compiler optimizations
4. If you can’t, avoid concurrent, aliased read/write access

But don’t assume that just because you didn’t get a ticket for speeding in the past, you will never ever get a ticket for speeding. What you’re doing is against the law. If you get busted someday, don’t whine and don’t complain I didn’t warn you. Rather own your failures and move on.

Breakin’ rocks in the hot sun
I fought the law and the law won
I fought the law and the law won
I USED SOME CASTS FOR TYPE PUN
I fought the law and the law won
I fought the law and the law won

(with apologies to Sonny Curtis)

Pointers in C, Part VI: Faking ‘restrict’

“But then acting is all about faking. We’re all very good at faking things that we have no competence with.”
— John Cleese

As so often in life, people love to do the exact opposite of what you advise. In my previous post I claimed that the ‘restrict’ feature isn’t that all-important and guess what? I received questions on whether it was possible to achieve the (promised) effects of ‘restrict’ even if a particular C dialect doesn’t support it (in case you are using C++ or some C version predating C99, for example).

YES WE CAN!

We just need to creatively combine the wisdom from the previous “Pointers in C” installments. In particular:

1. Pointer access involving multiple pointers can be optimized if the pointers point to incompatible types.
2. structs with different tag names constitute different, incompatible types, regardless of whether the struct members are identical or not.
3. A pointer can be converted to a pointer to a different type as long as the resulting pointer is suitably aligned for the target type.
4. A pointer to struct points to its initial member.

Do you ‘C’ it?

SPOILER ALERT!

Again, I use the ‘silly’ example to demonstrate the technique. The idea is to wrap the original types (‘int’) in structs with different tag names thus yielding different (incompatible) types:


struct intx { int value; };
struct inty { int value; };

int silly4(struct intx* x, struct inty* y) {
    x->value = 0;
    y->value = 1;
    return x->value;
}

Believe it or not — it works:


gcc -O2 -std=c89 silly4.c -masm=intel -S && cat silly4.s


silly4:
    mov     DWORD PTR [rdi], 0
    mov     DWORD PTR [rsi], 1
    xor     eax, eax
    ret

Why? To the compiler, the pointer types are different (item 2) and hence it doesn’t assume that they point to the same memory (item 1). Because of items 3 and 4 in the list above, the calling code can still pass plain ‘int’ arrays/pointers, just like in the original code. However, nasty casts are required when calling ‘silly4’ from C++ code:


int a, b;
assert(0 == silly4((struct intx*)(&a), (struct inty*)(&b)));

Unlike C++, C is only weakly typed: C compilers have traditionally accepted such conversions without a cast and merely issued a warning. Since you always want to get rid of compiler warnings (don’t you?), write the casts in C as well.

You could argue that these strange ‘struct int’ pointers that are used in the signature of ‘silly4’ kind of document the fact that the passed pointers are “restricted” and mustn’t overlap. However, I prefer to keep the original ‘int*’ interface and hide this struct business from callers altogether:


// silly5.h
#if __STDC_VERSION__ >= 199901L /* >= C99 */
#define RESTRICT restrict
#else /* pre-C99 or C++ */
#define RESTRICT /* just for documentation purposes. */
#endif

// silly5.c
int silly5(int* RESTRICT x, int* RESTRICT y) {
    struct intx { int value; };
    struct inty { int value; };
    struct intx* px = (struct intx*)x;
    struct inty* py = (struct inty*)y;
    px->value = 0;
    py->value = 1;
    return px->value;
}

The generated code is identical to the one derived from ‘silly4’.
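
To try this out, here is a minimal sketch that bundles the macro, ‘silly5’ and a caller into one file. ‘call_silly5’ is a made-up name for illustration, and the macro check additionally tests whether ‘__STDC_VERSION__’ is defined at all, which is my own precaution for pre-C99 and C++ compilers:

```c
#include <assert.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L /* >= C99 */
#define RESTRICT restrict
#else /* pre-C99 or C++ */
#define RESTRICT /* just for documentation purposes. */
#endif

int silly5(int* RESTRICT x, int* RESTRICT y) {
    struct intx { int value; };
    struct inty { int value; };
    struct intx* px = (struct intx*)x;
    struct inty* py = (struct inty*)y;
    px->value = 0;
    py->value = 1;
    return px->value;
}

/* Hypothetical caller: passes two distinct objects, as the contract demands. */
int call_silly5(void) {
    int a = -1;
    int b = -1;
    int result = silly5(&a, &b);
    assert(a == 0 && b == 1);
    return result;
}
```

With two distinct objects the call returns 0, no matter whether the compiler reloads ‘*x’ or not. Passing the same address twice would break the promise made by the interface.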

I still stick with my original recommendation that the ‘restrict’ feature is not that useful as it’s a legally binding contract for the caller while the compiler is free to ignore this plea for performance. If you want to use ‘restrict’ regardless of my advice, even if your compiler doesn’t support it, you now know how to emulate it in a portable fashion.

Pointers in C, Part V: The ‘restrict’ Qualifier

“Le vrai est trop simple, il faut y arriver toujours par le compliqué.”
(“The truth is too simple: one must always get there by a complicated route.”)
― George Sand, Letter to Armand Barbès, 12 May 1867

Exactly one year ago, I started this series on pointers, but what I really wanted to blog about originally was a rather arcane and rarely used keyword that first appeared in the C99 language standard: the ‘restrict’ qualifier. But after trying to digest the formal definition in chapter 6.7.3.1, I decided that taking a little detour would make my life and my readers’ lives much easier.

Let me set the stage for ‘restrict’ by summarizing what I wrote in episode 3 about the “strict aliasing rule”:

1. The compiler might optimize code involving multiple pointers, provided the pointers are not aliased; that is, they don’t point to the same object or memory.

2. The compiler assumes that pointers to incompatible types never alias.

3. The compiler assumes that pointers to compatible types (same types, apart from CV-qualification and signedness) potentially alias.

Therefore, a function with this signature is eligible for compiler optimization:


void transform(const int* input, double* output, size_t nvals);

whereas this one is not:


void transform(const double* input, double* output, size_t nvals);

This is unfortunate, because most likely, the arrays passed to the second version of ‘transform’ are in completely different, non-overlapping memory regions. But the compiler doesn’t know and hence stubbornly adheres to the strict aliasing rule.

The ‘restrict’ qualifier, which — contrary to the ‘const’ and ‘volatile’ qualifiers — can only be applied to pointers, is a promise given by the programmer to the compiler that pointers don’t alias even though they point to objects of the same type. Therefore, this version of ‘transform’ can be optimized by the compiler:


void transform(
    const double* restrict input,
    double* restrict output,
    size_t nvals);
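
Only the signature of ‘transform’ is given here, so the body below is purely an assumption for illustration (it doubles every value). It sketches what a definition with restricted pointers could look like in C99:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical body: writes 2 * input[i] to output[i].
   The caller promises that the two arrays don't overlap. */
void transform(
    const double* restrict input,
    double* restrict output,
    size_t nvals) {
    size_t i;

    assert(input != NULL && output != NULL);
    for (i = 0; i < nvals; ++i) {
        output[i] = 2.0 * input[i];
    }
}
```

Because of the promise, the compiler need not worry that a store to ‘output[i]’ changes a later ‘input’ element, which gives it more freedom to reorder or vectorize the loop.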

Let’s put this to the test with the ‘silly’ example from episode 3:


int silly(int* x, int* y) {
    *x = 0;
    *y = 1;
    return *x;
}

Before knowing about the strict aliasing rule, we were surprised to see that the memory access to ‘x’ in the return statement was not replaced with a simple ‘return 0’. After having learned about the strict aliasing rule, it’s clear: since ‘x’ and ‘y’ point to the same type, the compiler must assume that they may point to the same memory location and hence it loads the value pointed to by ‘x’ from memory afresh:


$ gcc -O2 -masm=intel silly.c -S && cat silly.s


silly:
        mov     DWORD PTR [rdi], 0
        mov     DWORD PTR [rsi], 1
        mov     eax, DWORD PTR [rdi] ; '*x' fetched from memory.
        ret

Now, if we tell the compiler that ‘x’ and ‘y’ never point to the same memory location, optimization is possible:


int silly3(int* restrict x, int* restrict y) {
    *x = 0;
    *y = 1;
    return *x;
}


$ gcc -O2 -std=c99 -masm=intel silly3.c -S && cat silly3.s


silly3:
        mov     DWORD PTR [rdi], 0
        mov     DWORD PTR [rsi], 1
        xor     eax, eax            ; equivalent to mov eax, 0
        ret

Nice, isn’t it?

If you use the ‘restrict’ qualifier on a pointer, you promise that, at least for the lifetime of the restricted pointer, the object pointed to is only accessed through this pointer. Break that promise and you get undefined behavior. (In the ‘silly3’ example, the lifetime of the pointers ‘x’ and ‘y’ ends once the call to ‘silly3’ returns.)
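
The promise is about the objects that are actually accessed, not about whole arrays. The following sketch (the names ‘accumulate’ and ‘sum_halves_demo’ are invented for illustration) passes two restricted pointers into the same array, which is fine as long as the accessed ranges are disjoint:

```c
#include <assert.h>
#include <stddef.h>

/* Adds src[i] to dst[i]; the caller promises the two ranges don't overlap. */
void accumulate(int* restrict dst, const int* restrict src, size_t n) {
    size_t i;

    assert(dst != NULL && src != NULL);
    for (i = 0; i < n; ++i) {
        dst[i] += src[i];
    }
}

/* Hypothetical caller: same array, but disjoint halves. */
int sum_halves_demo(void) {
    int values[4] = {1, 2, 10, 20};

    accumulate(values, values + 2, 2);      /* OK: [0,2) and [2,4) are disjoint. */
    /* accumulate(values, values + 1, 2);      overlapping: undefined behavior.  */
    return values[0] + values[1];
}
```

After the call, the first half holds 11 and 22, so the demo returns 33.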

In the C99 language standard, many functions from the standard library have been revised and now make use of the ‘restrict’ keyword. Take ‘memcpy’, for instance:


void *memcpy(void* restrict dst, const void* restrict src, size_t n);

As everybody knows, ‘memcpy’ can only copy non-overlapping blocks of memory and this fact is nicely highlighted by the use of the ‘restrict’ keyword: during the call to ‘memcpy’ the memory regions src[0] to src[n-1] as well as dst[0] to dst[n-1] are exclusively owned and may not be accessed by other pointers. Since ‘memmove’ can copy overlapping blocks of memory (with a little speed penalty, of course), ‘memmove’ consequently doesn’t declare restricted pointers:


void *memmove(void* dst, const void* src, size_t n);
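
A typical case where the regions do overlap is shifting data within one buffer. Here is a small sketch (‘shift_right_by_one’ is a made-up helper) that therefore has to use ‘memmove’:

```c
#include <assert.h>
#include <string.h>

/* Shifts the first 'count' characters of 'text' one position to the right.
   Source and destination overlap, so memmove (not memcpy) is required.
   The caller must provide room for at least count + 1 characters. */
void shift_right_by_one(char* text, size_t count) {
    assert(text != NULL);
    memmove(text + 1, text, count);
}
```

Applied to an 8-byte buffer holding "abcdef", shifting 6 characters gives "aabcdef". Doing the same with ‘memcpy’ would violate its ‘restrict’ contract.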

Please be aware that ‘restrict’ is not supported by the C++ language standard and it’s unclear whether it ever will be. If you mix C99 and C++ code, you might have to strip the ‘restrict’ keyword from C99 headers to avoid compilation errors:


// MyClass.cpp

extern "C" {
#define restrict
#include "MyC99Library.h"
}

In general, I’m not a big fan of optimization features that the compiler is free to ignore. If utmost performance is important, you want dependable performance. Most likely, your routine is not on the performance critical path, anyway. If you think it is, carefully profile your code and after you proved that it is, you’d better code that part in assembly language. Without such evidence, sprinkling your code with ‘restrict’ is little short of premature optimization. (I complained here about the unnecessarily overused ‘inline’ keyword for the same reason.)

What I do like about the ‘restrict’ keyword, though, is that by unraveling it, we’ve made a beautiful journey through important everyday programming topics like “pointers vs. arrays”, “type qualifiers”, “pointer conversion rules”, and the “strict aliasing rule”. The journey was the destination.

Pointers in C, Part IV: Pointer Conversion Rules

“Failure is the key to success; each mistake teaches us something.”
— Morihei Ueshiba

Sometimes, someone walks up to you and claims that there is a bug in your well-crafted code. Then, after having successfully proved that individual wrong, it occurs to you that there is indeed a bug—albeit a different one! Those are quite humbling experiences, but experiences that we should be most grateful for.

SETTING THE STAGE

This episode was triggered by feedback that I received from a reader regarding a “Dangerously Confusing Interfaces” post. In said post, I advise that instead of accepting a pointer to “uncopied” memory like this:


void WriteAsync(const void* data, size_t len);

‘WriteAsync’ should rather take a pointer to an opaque data structure named ‘uncopied_memory’:


typedef struct {
    void* dummy;
} uncopied_memory;

void WriteAsync(const uncopied_memory* data, size_t len);

“uncopied” memory means that for the sake of efficiency, the called function doesn’t copy the provided data but instead expects you to keep it alive and unchanged while the called function is executed asynchronously. Since the suggested interface change requires an explicit cast to an ‘uncopied_memory’ pointer, it’s a lot less likely that a temporary buffer allocated from the stack is passed accidentally. The idea of the proposed approach is that every call to ‘WriteAsync’ requires an explicit cast that acts as a reminder to the programmer that the buffer’s contents must be preserved.

For instance, if you wanted to pass a structure that I used in the previous installment of this series to ‘WriteAsync’, you would do it like this:


typedef struct {
    uint8_t level;
    uint16_t temperature;
    uint32_t force;
} measurements_t;

extern measurements_t my_measurements;
...
WriteAsync((uncopied_memory*) &my_measurements, sizeof(my_measurements));

But back to the question. The reader was worried that, since ‘measurements_t’ and ‘uncopied_memory’ are by no means compatible, a cast to an ‘uncopied_memory’ pointer might constitute a violation of the “strict aliasing rule”.

Actually, when it comes to the “strict aliasing rule,” the fact that these structs have incompatible members doesn’t really matter—even if you accessed the stored value through a pointer to a struct with an identical set of members you would be in trouble; if the tag names of the structs are different, it already counts as a violation of the “strict aliasing rule.”

The key word here is access. If you just create a pointer to incompatible types, everything is fine. Within ‘WriteAsync’ you just cast the received ‘uncopied_memory’ pointer into a ‘uint8_t’ pointer and access the provided data byte-wise, which is always safe, as you know (if you didn’t know, go back and read my previous post).

So far, so good. We don’t access stored memory through incompatible pointers; we only do pointer conversion, which is always safe, isn’t it? I replied to my reader that everything was fine, there was no violation of the “strict aliasing rule.”

Nevertheless, I couldn’t rid myself of this nagging feeling about whether the conversion/cast is really always safe.

POINTER CONVERSION RULES

The venerable book “The C Programming Language” by Brian Kernighan and Dennis Ritchie has this to say on pointer conversions:

A pointer to one type may be converted to a pointer to another type. The resulting pointer may cause addressing exceptions if the subject pointer does not refer to an object suitably aligned in storage. It is guaranteed that a pointer to an object may be converted to a pointer to an object whose type requires less or equally strict storage alignment and back again without change; the notion of “alignment” is implementation-dependent, but objects of the char types have least strict alignment requirements. As described in Par.A.6.8, a pointer may also be converted to type void * and back again without change.

Let me paraphrase: pointer conversion is safe provided the alignment requirements of the target type are less than or equal to the alignment requirements of the source type. The converted pointer can be converted back to the original pointer without problems.

However, the statement “The resulting pointer may cause addressing exceptions” is not clear to me. What does it mean? If the target type has stricter alignment requirements, do you get “addressing exceptions” when you create the pointer or when you access memory through it? Let’s assume that we are on a typical platform where objects of type ‘double’ are aligned on an 8-byte boundary and ‘char’ objects have no alignment requirements (they are aligned on a 1-byte boundary, so to speak):


double PI = 3.1415927;

char* pc = (char*) &PI;          // (1)
char byte0 = *pc;                // (2)

double* pd = (double*) &byte0;   // (3)
double d = *pd;                  // (4)

The conversion (1) is 100% safe and so is the corresponding read-access (2): the alignment requirements of type ‘char’ are less than the alignment requirements of type ‘double’. (4) is 100% unsafe, but what about (3)? Aren’t we just creating a pointer? To find out, I had to dig deep into my copy of the C99 language standard. Eventually, I found what I call the “pointer conversion rule”:

6.3.2.3/7 A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer. When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object.

There you have it and much more precise than the paragraph from “The C Programming Language.” Believe it or not—statement (3), the sheer pointer conversion already gets you into the realm of undefined behavior. Who knew?
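
If all you want is the ‘double’ value stored at some arbitrary byte address, you don’t need a ‘double’ pointer at all. A possible sketch (‘load_double’ is a hypothetical helper name) copies the bytes into a properly aligned local object instead:

```c
#include <assert.h>
#include <string.h>

/* Reads a double from a possibly misaligned byte address
   without ever creating a misaligned 'double*'. */
double load_double(const char* bytes) {
    double value;

    assert(bytes != NULL);
    memcpy(&value, bytes, sizeof(value));
    return value;
}
```

The source address may be odd; only ‘value’ itself has to be aligned, and the compiler takes care of that.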

So what does this mean regarding the conversion/cast from a ‘measurements_t’ pointer to an ‘uncopied_memory’ pointer? As we know from the standard, it would be safe if the alignment requirements for ‘uncopied_memory’ were less than or equal to the alignment requirements of ‘measurements_t’.
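
For debugging, you can check whether an address happens to satisfy a given alignment. This is only a sketch under assumptions: ‘is_aligned’ is a made-up name, ‘uintptr_t’ is an optional type, and the mapping from pointers to integers is implementation-defined (on common platforms it is simply the address):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns non-zero if 'pointer' is a multiple of 'alignment'.
   Assumes a flat address space where uintptr_t holds the plain address. */
int is_aligned(const void* pointer, size_t alignment) {
    assert(alignment != 0);
    return ((uintptr_t) pointer % alignment) == 0;
}
```

On a typical platform, the address of a ‘uint32_t’ object passes a check for 4-byte alignment, whereas the address one byte further fails it.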

In the previous example, we had to deal with primitive types (‘char’, ‘double’) whose alignment requirements can easily be determined. In order to find out about the alignment requirements for structs, we need to dive once more into the C99 standard document:

6.7.2.1/13 A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.

Meditate on this for a while. Paraphrased, this all means that the alignment requirements of a struct are the same as the alignment requirements of a struct’s first member. So the question boils down to this: Are the alignment requirements of a ‘void’ pointer (‘uncopied_memory’s first member) less or equal to the alignment requirements of a ‘char’ (‘measurements_t’s first member)?

Of course, they’re not! A pointer type (like void*) is more or less just an integer type in disguise that is capable of holding all the addresses of your system and as such, pointer types have the same alignment requirements as regular integer types. On a 32-bit platform, pointers typically comprise 4 bytes. Thus, on typical 32-bit platforms, they will need to be aligned on 4-byte boundaries.

By contrast, a character (like the first element of measurements_t) comprises exactly one byte and thus has no alignment requirement—it can be stored at any address in memory.

Since the alignment requirements of the first element of ‘uncopied_memory’ are stronger than the alignment requirements of ‘measurements_t’, we can conclude that my advice to cast to ‘uncopied_memory’ may yield undefined behavior. Not because of the “strict aliasing rule,” but because of a violation of the “pointer conversion rules.”

To solve the problem, the type of the ‘dummy’ member of ‘uncopied_memory’ needs to be changed to ‘char’, a type that has the weakest alignment requirements. I have updated the “Dangerously Confusing Interfaces” post accordingly.
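
Put together, the repaired interface could look like the sketch below. ‘sum_bytes’ is a hypothetical stand-in for what ‘WriteAsync’ does internally (I don’t show real asynchronous I/O here); it converts the opaque pointer back to a byte pointer and only accesses the data byte-wise:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    char dummy;   /* 'char' has the weakest alignment requirement. */
} uncopied_memory;

/* Hypothetical stand-in for WriteAsync's internals:
   reads the data byte-wise and returns the sum of all bytes. */
unsigned long sum_bytes(const uncopied_memory* data, size_t len) {
    const uint8_t* bytes = (const uint8_t*) data;
    unsigned long sum = 0;
    size_t i;

    assert(data != NULL);
    for (i = 0; i < len; ++i) {
        sum += bytes[i];
    }
    return sum;
}
```

A caller would write something like ‘sum_bytes((const uncopied_memory*) &word, sizeof(word))’; the explicit cast is still there as a reminder, but it can no longer produce a misaligned pointer.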

Pointers in C, Part III: The Strict Aliasing Rule

“Know the rules well, so you can break them effectively.”
— Dalai Lama XIV

One of the lesser-known secrets of the C programming language is the so-called “strict aliasing rule”. This is a shame, because failing to adhere to it takes you (along with your code) straight into the realm of undefined behavior. As no one in their right mind wants to go there, let’s shed some light on it!

POINTER ALIASING DEFINED

First of all, we have to clarify what “aliasing” really means, or rather aliasing of pointers. Take a look at this example:


int value;

int* p1 = &value;   // p1 points to 'value'.
int* p2 = &value;   // p2 as well...

Here, ‘p1’ and ‘p2’ are aliased to the same object ‘value’; that is, they point to the same object. If you update ‘value’ through ‘p1’:


*p1 = 42;

a read through ‘p2’ will reflect this change:


assert((*p1 == *p2) && (value == *p2)); // So true...

Because of the possibility of aliasing, a C compiler is prevented from applying certain optimizations. Consider:


int silly(int* x, int* y) {
    *x = 0;
    *y = 1;
    return *x;
}

You might think that any decent compiler would generate simplified code equivalent to this:


int silly(int* x, int* y) {
    *x = 0;
    *y = 1;
    return 0;   // *x was previously set to 0, so don't load from memory again.
}

It’s not a matter of decency — the compiler just can’t do this optimization! Here’s the assembly output that clearly shows that the return value is loaded from memory:


$ gcc -O2 -masm=intel silly.c -S && cat silly.s


silly:
        mov     DWORD PTR [rdi], 0
        mov     DWORD PTR [rsi], 1
        mov     eax, DWORD PTR [rdi] ; '*x' fetched from memory.
        ret

The optimization is not possible because the caller could call ‘silly’ like so:


int value;
silly(&value, &value);

In this case, ‘x’ and ‘y’ are aliased to the same ‘value’, which means ‘silly’ must return 1, not 0. Consequently, ‘*x’ must be read from memory, every time. Period.
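
You can convince yourself with a few lines of test code. The two wrapper functions are made up for illustration; ‘silly’ is the function from above:

```c
#include <assert.h>

int silly(int* x, int* y) {
    *x = 0;
    *y = 1;
    return *x;
}

/* Both parameters point to the same object. */
int silly_aliased(void) {
    int value = -1;
    int result = silly(&value, &value);
    assert(value == 1);   /* The second store wins. */
    return result;
}

/* The parameters point to two different objects. */
int silly_distinct(void) {
    int a = -1;
    int b = -1;
    int result = silly(&a, &b);
    assert(a == 0 && b == 1);
    return result;
}
```

‘silly_aliased’ returns 1 and ‘silly_distinct’ returns 0, and both results are perfectly legal since two ‘int’ pointers are allowed to alias.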

ROOM FOR IMPROVEMENT

If you think about it, even though it may happen, pointer aliasing won’t happen very often in practice. Why waste so much potential for optimization for the uncommon case? Most likely, the folks from the C standards committee had the same line of thinking. They introduced rules that state when pointer aliasing must not happen. Enter the strict aliasing rule.

To facilitate compiler optimization, the strict aliasing rule demands that (in simple words) pointers to incompatible types never alias. Pointers to compatible types (like the two ‘int’ pointers ‘x’ and ‘y’ in ‘silly’) are assumed to (potentially) alias. Let’s make the pointer types incompatible (‘short*’ vs. ‘int*’):


int silly2(short* x, int* y) {
    *x = 0;
    *y = 1;
    return *x;
}


$ gcc -O2 -masm=intel silly2.c -S && cat silly2.s


silly2:
        xor     eax, eax            ; eax = 0, used for the store and as return value
        mov     WORD PTR [rdi], ax
        mov     DWORD PTR [rsi], 1
        ret

As you can see, this time no load from memory is performed — 0 is returned instead. The optimization is possible because the compiler assumes that aliasing is not allowed in this case.

VIOLATIONS

But what happens if pointers to incompatible types nevertheless alias? After all, this can happen quite easily. Maybe not in the ‘silly’ example, but in real-world production code:


struct measurements_t {
    uint8_t level;
    uint16_t temperature;
    uint32_t force;
};

void convert(const uint8_t* data, struct measurements_t* measurements) {
    /* Fill measurements object with raw data. */
    *measurements = *((struct measurements_t*) &data[0]);
}

In an attempt to convert data stored in a buffer (maybe read over a network connection) into a high-level structure, a pointer to ‘struct measurements_t’ is aliased with a pointer to a ‘uint8_t’. Since both types are incompatible (pointer to struct vs. pointer to ‘uint8_t’) this code is a violation of the strict aliasing rule. Experienced C developers most likely recognized immediately that this code yields undefined behavior, but they would have probably attributed it to struct padding and alignment issues. The real reason, as we know by now, is a violation of the strict aliasing rule.
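
One way to sidestep aliasing, padding and alignment issues in one go is to decode the buffer field by field. The sketch below is not the original ‘convert’; it assumes a wire format that I made up for illustration (7 bytes, no padding, little-endian), and ‘convert_fields’ is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct measurements_t {
    uint8_t level;
    uint16_t temperature;
    uint32_t force;
};

/* Assumed wire format: 1 byte level, 2 bytes temperature, 4 bytes force,
   little-endian, no padding. The buffer is only ever read byte-wise. */
void convert_fields(const uint8_t* data, struct measurements_t* measurements) {
    assert(data != NULL && measurements != NULL);
    measurements->level = data[0];
    measurements->temperature =
        (uint16_t) (data[1] | ((uint16_t) data[2] << 8));
    measurements->force =
        (uint32_t) data[3] |
        ((uint32_t) data[4] << 8) |
        ((uint32_t) data[5] << 16) |
        ((uint32_t) data[6] << 24);
}
```

It’s more typing than a cast, but the result doesn’t depend on the compiler’s struct layout or on the byte order of the machine that runs the code.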

THE FINE PRINT

So what exactly is the strict aliasing rule and what does “type compatibility” mean? Here’s an excerpt from the ISO C99 standard, section 6.5, paragraph 7:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

a type compatible with the effective type of the object,

a qualified version of a type compatible with the effective type of the object,

a type that is the signed or unsigned type corresponding to the effective type of the object,

a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,

an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or

a character type.

Such Standardese is often hard to digest, so let me try to clarify it a bit. Aliased pointer access is fine if:

1. The pointed-at types are identical. Note that typedefs are just type aliases and don’t introduce new types:


typedef int INT;
INT* p = ...
int x = *((int*) p);    // Fine and cast not really necessary!

2. The pointed-at types are identical apart from the “signed-ness” (e.g. ‘int’ vs. ‘unsigned int’).
3. The pointed-at types are identical apart from qualification (e.g. ‘const int’ vs. ‘int’).
4. The rule “an aggregate or union type that includes one of the aforementioned types among its members” is highly confusing and probably doesn’t mean much in practice.
5. The pointed-at types are different, but the access is made through a pointer to a character type (‘char’, ‘signed char’, or ‘unsigned char’):


float f = 3.1415;
unsigned char* p = (unsigned char*) &f;
unsigned char a1 = p[0];   // First byte of 'f'.
unsigned char a2 = p[1];   // :
unsigned char a3 = p[2];   // :
unsigned char a4 = p[3];   // Last byte of 'f'.
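
Rules 2 and 3 can be sketched in the same way. The function names below are made up for illustration; both accesses are fine because the lvalue types differ from ‘int’ only in signedness or qualification:

```c
/* Rule 2: 'unsigned int' is the unsigned type corresponding to 'int'. */
unsigned int read_as_unsigned(int* p) {
    return *(unsigned int*) p;
}

/* Rule 3: 'const int' is a qualified version of 'int'. */
int read_as_const(int* p) {
    const int* cp = p;      /* No cast needed when adding 'const'. */
    return *cp;
}
```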

Conversely, aliased pointer access is not defined if the pointed-at types are fundamentally different. Note that this includes pointers to structs that are identically defined but have different tag names:


struct S1 { int x; }; // tag 'S1'.
struct S2 { int x; }; // tag 'S2'.

struct S1 s1 = { 42 };
struct S2 s2 = *((struct S2*) &s1);   // Undefined behavior!

CONCLUSION

The strict aliasing rule was introduced to give the compiler vendors some leeway regarding optimizations. By default, the compiler assumes that pointers to (loosely speaking) incompatible types never alias. As a consequence, you, the programmer, have to make sure that this rule is obeyed.

Here’s some disquieting news: a lot of existing code doesn’t conform to the strict aliasing rule, but the code works (or appears to work) fine anyway. As an example, the ‘convert’ function above, which aliases a struct to an array of bytes, might work fine on an Intel x86-based platform, which supports unaligned memory access. However, if you use ‘convert’ on an (older) ARM-based platform, you might get a “bus error” exception that could crash your system. In other cases, nonconforming code just works by coincidence, with a particular compiler, or a particular compiler version at a particular optimization level.

To me, knowing about the strict aliasing rule is as important for every systems developer as knowing about the other systems programming “secrets” like alignment, struct padding, and endianness.

Pointers in C, Part II: CV-Qualifiers

“A teacher is never a giver of truth; he is a guide, a pointer to the truth that each student must find for himself.”
— Bruce Lee

In part I of this series, I explained what pointers are in general, how they are similar to arrays, and — more importantly — where, when, and why they are different to arrays. Today, I’ll shed some light on the so-called ‘cv qualifiers’ which are frequently encountered in pointer contexts.

CV-QUALIFIER BASICS

CV-qualifiers allow you to supplement a type declaration with the keywords ‘const’ or ‘volatile’ in order to give a type (or rather an object of a certain type) special treatment. Take ‘const’, for instance:


const double PI = 3.1415927;
PI = 1.23;  // Error, PI is constant.
PI += 1;    // ditto.

‘const’ is a guarantee that a value isn’t (inadvertently) changed by a developer. On top of that, it gives the compiler some leeway to perform certain optimizations, like placing ‘const’ objects in ROM/non-volatile memory instead of (expensive) RAM, or even not storing the object at all and instead inlining the literal value wherever it’s needed.

‘volatile’, on the other hand, prevents optimizations. It’s a hint to the compiler that the value of an object can change in ways not known by the compiler and thus the value must never be cached in a processor register (or inlined) but instead always loaded from memory. Apart from this ‘don’t optimize’ behavior, there’s little that ‘volatile’ guarantees. In particular, and contrary to common belief, it’s no cure for typical race condition problems. It’s mostly used in signal handlers and to access memory-mapped hardware devices.
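
The signal handler use case can be sketched as follows. This is only an illustration: the names ‘got_signal’, ‘on_signal’ and ‘wait_for_flag’ are made up, and ‘raise’ stands in for a signal that would normally arrive from outside the program:

```c
#include <signal.h>

/* Written by the handler, read by normal code: 'volatile' forces the
   loop below to reload the flag from memory on every iteration. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int signo) {
    (void) signo;
    got_signal = 1;
}

int wait_for_flag(void) {
    got_signal = 0;
    if (signal(SIGINT, on_signal) == SIG_ERR) {
        return -1;
    }
    raise(SIGINT);              /* Returns only after the handler has run. */
    while (!got_signal) {
        /* Without 'volatile', the compiler could read the flag once
           and turn this into an endless loop. */
    }
    signal(SIGINT, SIG_DFL);    /* Restore default handling. */
    return got_signal;
}
```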

Even if it sounds silly at first, it’s possible to combine ‘const’ and ‘volatile’. The following code declares a constant that shall not be inlined/optimized:


const volatile int MAX_SENSORS = 4;
...
for (int i = 0; i < MAX_SENSORS; ++i) {  // Always load MAX_SENSORS
                                         // value from memory.
    sum += sensors[i].value;
}

Using both ‘const’ and ‘volatile’ together makes sense when you want to ensure that developers can’t change the value of a constant and at the same time retain the possibility to update the value through some other means, later. In such a setting, you would place ‘MAX_SENSORS’ in a dedicated non-volatile memory section (i.e. flash or EEPROM) that is independent of the code, e.g. a section that only hosts configuration values^*. By combining ‘const’ and ‘volatile’ you ensure that the latest configuration values are used and that these configuration values cannot be altered by the programmer (i.e. from within the software).

To sum it up, ‘const’ means “not modifiable by the programmer” whereas ‘volatile’ denotes “modifiable in unforeseeable ways”.

CV-QUALIFIERS COMBINED WITH POINTERS

Like I stated in the intro, cv-qualifiers often appear in pointer declarations. However, this poses a problem because we have to differentiate between cv-qualifying the pointer and cv-qualifying the pointed-to object. There are “pointers to ‘const’” and “‘const’ pointers”, two terms that are often confused. Here’s code involving a pointer to a constant value:


const int MAX_RATE = 200;
const int MIN_RATE = 10;
int default_rate = 42;

const int* rate;
rate = &MAX_RATE;    // Point to memory containing MAX_RATE.
rate = &MIN_RATE;    // Now point to memory containing MIN_RATE.

*rate = 1000;        // Error: pointer-to-const cannot modify
                     // pointed-to object.

rate = &default_rate; // Point to non-const value.
*rate = 1000;        // Error: pointer-to-const cannot modify
                     // pointed-to object.

Since the pointer is declared as pointing to ‘const’, no changes through this pointer are possible, even if it points to a mutable object in reality.
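
Note that the restriction applies to the pointer, not to the object. A small sketch (the function name is made up for illustration) showing that a mutable object can still change underneath a pointer to ‘const’:

```c
int observe_through_pointer_to_const(void) {
    int default_rate = 42;              /* Mutable object. */
    const int* rate = &default_rate;    /* Read-only view of it. */

    default_rate = 1000;                /* Fine: the object isn't const. */
    return *rate;                       /* Sees the updated value. */
}
```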

Constant pointers, on the other hand, behave differently. Have a look at this example:


int default_rate = 42;  // Non-const value.
int current_rate = 19;  // ditto.

int* const p;                   // Compiles in C (error in C++), but is
                                // useless: 'p' can't be assigned later.
int* const p = &current_rate;   // Fine, point to a non-const value.
*p = 50;                        // Indirectly update current rate.
p = &default_rate;              // Error: const pointers can't be
                                // bound to another object.
++p;                            // ditto.

The takeaway is this: if the ‘const’ keyword appears to the left of the ‘*’, the pointed-to value is ‘const’ and hence we are dealing with a pointer to ‘const’; if the ‘const’ keyword is to the right of the ‘*’, the pointer itself is ‘const’. Of course, it’s possible to have the ‘const’ qualifier on both sides at the same time:


const int * const rate = &MAX_RATE;
*rate = 42;                     // Error: pointer to const can't 
                                // modify value.
++rate;                         // Error: const pointer can't 
                                // point elsewhere.

The same goes for multi-level pointers:


const int * const * v;

Here, ‘v’ is a regular (non-‘const’) pointer to a ‘const’ pointer to a ‘const’ integer.

Yuck! Sometimes, I really wish the inventors of C had used ‘<-’ instead of ‘*’ for pointer declarations; the resulting code would have been easier on the eyes! Consider:


int* p;

versus


int <- p;    // say: "p is a POINTER TO int"


const int <- const <- v;

would read from right to left as “v is a POINTER TO const POINTER TO const int”. Life would be so much simpler… but let’s face reality and stop day-dreaming!

Everything I said about ‘const’ equally applies to pointers to ‘volatile’ and ‘volatile’ pointers: pointers to ‘volatile’ ensure that the pointed-to value is always loaded from memory whenever a pointer is dereferenced; with ‘volatile’ pointers, the pointer itself is always loaded from memory (and never kept in registers).
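
A sketch of the typical pointer-to-‘volatile’ use case, polling a hardware status register. The function ‘poll_status’ is a made-up name, and in a test an ordinary variable can stand in for the register:

```c
#include <stdint.h>

/* 'status' is a pointer to (const) volatile: every '*status' below is a
   real load from memory, never a value cached in a register. */
uint32_t poll_status(const volatile uint32_t* status, unsigned max_polls) {
    uint32_t value = 0;
    while (max_polls-- > 0) {
        value = *status;        /* Fresh read on each iteration. */
        if (value != 0) {
            break;              /* Device reported something. */
        }
    }
    return value;               /* 0 if nothing showed up in time. */
}
```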

Things really get complicated when there is a free mix of ‘volatile’ and ‘const’ keywords with pointers involving more than two levels of indirection:


volatile int * const volatile * volatile * p;

We’d better not go there! If you are in multi-level pointer trouble, remember that there’s a little tool called ‘cdecl’ which I showcased in the previous episode. But now let’s move on to the topic of how and when cv-qualified pointers can be assigned to each other.

ASSIGNMENT COMPATIBILITY I

Pointers are assignable if the pointer on the left hand side of the ‘=’ sign is not more capable than the pointer on the right hand side. In other words: you can assign a less constrained pointer to a more constrained pointer, but not vice versa. If you could, the promise made by the constrained pointer would be broken:


const int* pc;
int* p;

pc = p;     // OK, since 'p' is a read/write pointer and
            // 'pc' is a read-only pointer.
p = pc;     // Error: 'pc' is more constrained than 'p'.

If the last assignment were legal, a programmer could suddenly get write access to a read-only variable:


const int VALUE = 42;
const int* pc = &VALUE;     // Equal restrictiveness on both
                            // sides (i.e. const).
*pc = 43;                   // Error: no write access.
int* p = pc;                // Let's pretend this was legal...
*p = 43;                    // const value updated!

Again, the same restrictions hold for pointers to ‘volatile’. In general, pointers to cv-qualified objects are more constrained than their less qualified counterparts and hence may not be assigned to them. By the same token, the first of the following assignments is not legal:


const volatile int* pcv;
const int* pc;
pc = pcv;               // Error: right hand side is more constrained...
pcv = pc;               // OK.
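
In everyday code, this rule shows up mostly at function boundaries: a parameter declared as pointer to ‘const’ accepts both read-only and read/write pointers. A small sketch (the function ‘sum_rates’ is made up for illustration):

```c
#include <stddef.h>

/* Promises not to modify the pointed-to values, so callers may pass
   either 'int*' or 'const int*'. */
int sum_rates(const int* rates, size_t count) {
    int sum = 0;
    for (size_t i = 0; i < count; ++i) {
        sum += rates[i];
    }
    return sum;
}
```

Going the other way, i.e. passing a ‘const int*’ to a function that takes an ‘int*’, is rejected for the reason shown above.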

ASSIGNMENT COMPATIBILITY II

The rule which requires that the right hand side must not be more constrained than the left hand side might lead you to the conclusion that the following code is perfectly kosher:


int value = 100;
int* p = &value;
int** pp = &p;

const int** ppc = pp;   // Error: incompatible assignment.

However, it’s not, and for good reason, as I will explain shortly. But it’s far from obvious and it’s a conundrum to most — even seasoned — C developers. Why is it possible to assign a pointer to non-const to a pointer to ‘const’:


const int *pc;
int* p;
pc = p;             // OK.

but not a pointer to a pointer to non-const to a pointer to a pointer to ‘const’?


const int** ppc;
int** pp;
ppc = pp;           // Error.

Here is why. Imagine this example:


const int VALUE = 42;
int* p;
const int** ppc;
ppc = &p;           // Error, but let's pretend this was legal.

Graphically, our situation is this. ‘ppc’ points to ‘p’ which in turn points to some random memory location, as it hasn’t been initialized yet:


VALUE       0x00B00010: 00 00 00 2A     // 42
:           :
p           0x00004220: ?? ?? ?? ??     // Points to random location
ppc         0x00004224: 00 00 42 20     // Points to 'p'

Now, when we dereference ‘ppc’ one time, we get to our pointer ‘p’. Let’s point it to ‘VALUE’:


*ppc = &VALUE;

It shouldn’t surprise you that this assignment is valid: the right hand side (pointer to const int) is not more constrained than the left hand side (also pointer to const int). The resulting picture is this:


VALUE       0x00B00010: 00 00 00 2A     // 42
:           :
p           0x00004220: 00 B0 00 10     // Now points to 'VALUE'
ppc         0x00004224: 00 00 42 20     // Points to 'p'

Everything looks safe. If we attempt to update ‘VALUE’, we won’t succeed:


**ppc = 666; // Error: can't update through pointer to 'const'.

But we are far from safe. Remember that we also (indirectly) updated ‘p’, which was declared as pointing to a non-const int? The compiler would happily accept the following assignment:


*p = 666;

which writes to the ‘const’ object ‘VALUE’ and thus leads to undefined behavior, as the C language standard calls it.

This example should have convinced you that it’s a good thing that the compiler rejects the assignment from ‘int**’ to ‘const int**’: it would open up a backdoor for granting write access to more constrained objects. Finding the corresponding words in the C language standard is not so easy, however, and requires some digging. If you feel “qualified” enough (sorry for the pun), look at chapter “6.5.16.1 Simple assignment”, which states the rules of object assignability. You probably also need to have a look at “6.7.5.1 Pointer declarators”, which details pointer type compatibility, as well as “6.7.3 Type qualifiers”, which specifies compatibility of qualified types. Putting this all into a cohesive picture is left as an exercise to the diligent reader.
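
If all you need is read-only access to the ‘int’ at the end of an ‘int**’ chain, one conforming option is to add the ‘const’ one level further down, after the first dereference. A sketch (‘read_target’ is a made-up name):

```c
int read_target(int** pp) {
    const int* pc = *pp;    /* 'int*' to 'const int*': adds a constraint, OK. */
    return *pc;             /* Read-only access to the final 'int'. */
}
```

No backdoor is opened here, because nothing can be stored into ‘*pp’ through ‘pc’.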

________________________________
^*) Separating code from configuration values is generally a good idea in embedded contexts, as it allows you to replace either of them independently.
