Strict aliasing violations in C and how to write conformant code?


The code below is typical of what appears in some arena implementations. One such
example can be found here (blog article on an example implementation).
#include <stdio.h>
#include <stdint.h>
#include <stdalign.h>
struct thing {
    int a;
    int b;
};
char buffer[128];
int main ()
{
    uintptr_t p1 = (uintptr_t)buffer;
    if (p1 % alignof(struct thing)) return 1;
    struct thing *t1 = (void*)buffer;
    t1->a = 10;
    t1->b = 20;
    uintptr_t p2 = (uintptr_t)(buffer + sizeof(struct thing));
    if (p2 % alignof(struct thing)) return 1;
    struct thing *t2 = (void*)(buffer + sizeof(struct thing));
    t2->a = 30;
    t2->b = 40;
    printf("%d\n", t1->a);
    printf("%d\n", t2->a);
    return 0;
}
Edit: the program now returns 1 if either pointer is not properly aligned.
Is this an instance of a strict aliasing violation? And is the only way to place such structures in a buffer, and to obtain a pointer to them that is safe to use, something like this:
struct thing *t1 = memcpy(buffer,&((struct thing){10,20}),sizeof(struct thing));

Is this an instance of a strict aliasing violation
Yes. t1->a etc. access the character array through a type different from its "effective type" (char).
Is the only way to place such structures in a buffer and to retrieve a safe to use pointer to the structure to do for example:
You can also create a union of a raw character array and the type you wish to convert to. Example:
typedef union
{
struct thing t;
char buf[128];
} strict_aliasing_hack;
...
strict_aliasing_hack t1 = *(strict_aliasing_hack*)buffer;
This is OK because strict_aliasing_hack is "an aggregate or union type that includes one of the aforementioned types among its members" (C17 6.5/7): its buf member has a character type, compatible with the effective type of buffer.
Naturally, it is best to stay clear of fishy conversions like this entirely. For example, the chunk of data returned from malloc has no effective type. So the original code is much better written as:
struct thing *t1 = malloc(128);
And now you can lvalue access *t1 in any way you like.

The problem is that per standard only dynamic memory can be used that way.
Section 6.5 Expressions, paragraph 6, says (ref n1570 for C11):
The effective type of an object for an access to its stored value is the declared type of the object, if any.
If you use:
void * buffer = malloc(128);
then the memory buffer points to is guaranteed to be suitably aligned for any standard type and has no declared type.
In that case you can safely store a thing object in buffer without triggering any strict aliasing violation. But in your example code, buffer has a declared type: an array of char. Whatever the alignment, accessing it through a different type is then a strict aliasing violation.

IMO memcpy is always the safest way, and compilers typically optimize it into code that is efficient enough for the particular platform.
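#include <stddef.h>   /* offsetof, size_t */
#include <string.h>   /* memcpy */
typedef struct thing {
    int a;
    int b;
} thing;
/* Copies a whole thing out of the buffer, then returns member a. */
int geta(const void *buff, const size_t offset)
{
    const unsigned char *chbuff = buff;
    thing t;
    memcpy(&t, chbuff + offset, sizeof(t));
    return t.a;
}
/* Copies only member a out of the buffer. */
int geta1(const void *buff, const size_t offset)
{
    const unsigned char *chbuff = buff;
    int a;
    memcpy(&a, chbuff + offset + offsetof(thing, a), sizeof(a));
    return a;
}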
https://godbolt.org/z/x8e96ezW9

Related

Does the stb lib violate Strict Aliasing rules in C?

I'm interested in the technique used by Sean Barrett to make a dynamic array in C for any type. Comments in the current version claim the code is safe to use with strict-aliasing optimizations:
https://github.com/nothings/stb/blob/master/stb_ds.h#L332
You use it like:
int *array = NULL;
arrput(array, 2);
arrput(array, 3);
The allocation it does holds both the array data + a header struct:
typedef struct
{
size_t length;
size_t capacity;
void * hash_table;
ptrdiff_t temp;
} stbds_array_header;
The macros/functions all take a void* to the array and access the header by casting the void* array and moving back one:
#define stbds_header(t) ((stbds_array_header *) (t) - 1)
I'm sure Sean Barrett is far more knowledgeable than the average programmer. I'm just having trouble following how this type of code is not undefined behavior because of the strict aliasing rules in modern C. If this does avoid problems I'd love to understand why it does so I can incorporate it myself (maybe with a few less macros).

Let's follow the expansions of arrput in https://github.com/nothings/stb/blob/master/stb_ds.h :
#define STBDS_REALLOC(c,p,s) realloc(p,s)
#define arrput stbds_arrput
#define stbds_header(t) ((stbds_array_header *) (t) - 1)
#define stbds_arrput(a,v) (stbds_arrmaybegrow(a,1), (a)[stbds_header(a)->length++] = (v))
#define stbds_arrmaybegrow(a,n) ((!(a) || stbds_header(a)->length + (n) > stbds_header(a)->capacity) \
? (stbds_arrgrow(a,n,0),0) : 0)
#define stbds_arrgrow(a,b,c) ((a) = stbds_arrgrowf_wrapper((a), sizeof *(a), (b), (c)))
#define stbds_arrgrowf_wrapper stbds_arrgrowf
void *stbds_arrgrowf(void *a, size_t elemsize, size_t addlen, size_t min_cap)
{
...
b = STBDS_REALLOC(NULL, (a) ? stbds_header(a) : 0, elemsize * min_cap + sizeof(stbds_array_header));
//if (num_prev < 65536) prev_allocs[num_prev++] = (int *) (char *) b;
b = (char *) b + sizeof(stbds_array_header);
if (a == NULL) {
stbds_header(b)->length = 0;
stbds_header(b)->hash_table = 0;
stbds_header(b)->temp = 0;
} else {
STBDS_STATS(++stbds_array_grow);
}
stbds_header(b)->capacity = min_cap;
return b;
}
how this type of code is not undefined behavior because of the strict aliasing
Strict aliasing is about accessing data through an lvalue whose type differs from the effective type of the data stored there. I would argue that the data stored in the memory region pointed to by stbds_header(array) has the effective type of the stbds_array_header structure, so accessing it is fine. The structure members are allocated by realloc and initialized one by one inside stbds_arrgrowf by the stbds_header(b)->length = 0; lines.
how this type of code is not undefined behavior
I think the pointer arithmetic is fine. You can say that the result of realloc points to an array of one stbds_array_header structure. In other words, at the first stbds_header(b)->length = store inside stbds_arrgrowf, the memory returned by realloc "becomes" an array of one stbds_array_header element, because "If a value is stored into an object having no declared type through an lvalue having a type that is not a character type, then the type of the lvalue becomes the effective type of the object for that access" (https://port70.net/~nsz/c/c11/n1570.html#6.5p6).
int *array is assigned inside stbds_arrgrow to point to "one past the last element" of that one-element array of stbds_array_header (which is also where the int array starts). ((stbds_array_header *) (array) - 1) then computes the address of the last array element by subtracting one from that one-past-the-end pointer. I would rewrite it as (char *)(void *)t - sizeof(stbds_array_header) anyway, as (stbds_array_header *) (array) looks like it could trigger a compiler warning.
Assigning to int *array, in the expansion of stbds_arrgrow, a pointer to (char *)result_of_realloc + sizeof(stbds_array_header) could, in theory, produce a pointer that is not properly aligned for the int array, which would break "If the resulting pointer is not correctly aligned for the referenced type, the behavior is undefined" (https://port70.net/~nsz/c/c11/n1570.html#6.3.2.3p7). This is very theoretical: stbds_array_header has size_t, void * and ptrdiff_t members, so on any normal architecture the address right after it is well aligned for int (or any other normal type).
I have only inspected the code in the expansions of arrput. This is 2000 lines of code; there may be other undefined behavior elsewhere.

Need help to resolve warning: dereferencing type-punned pointer will break strict-aliasing rules

I am working on a set of C code to optimize it. I came across a warning while fixing broken code.
The environment is Linux, C99, compiling with -Wall -O2 flags.
Initially a struct text is defined like this:
struct text {
    char count[2];
    char head[5];
    char textdata[5];
};
The code is supposed to set pointers T1 and T2 to the expected head and textdata strings:
int main(void) {
struct text *T1;
char *T2;
char data[] = "02abcdeabcde";
T1 = (struct text *)data;
T2 = T1->textdata;
gettextptr((char *)T1, T2);
printf("\nT1 = %s\nT2 = %s\n", (char *)T1, T2);
return (0);
}
void gettextptr(char *T1, char *T2) {
struct text *p;
int count;
p = (struct text *)T1;
count = (p->count[0] - '0') * 10 + (p->count[1] - '0');
while (count--) {
if (memcmp(T2, T1, 2) == 0) {
T1 += 2;
T2 += 2;
}
}
}
This wasn't working as expected. It was expected to return the addresses of the first 'c' and the last 'e'. Through GDB, I found that once execution returns from gettextptr() to the calling function, T1 and T2 no longer hold the updated addresses. Then I tried another approach, 'call by reference', using double pointers:
int main(void) {
struct text *T1;
char *T2;
char data[] = "02abcdeabcde";
T1 = (struct text *)data;
T2 = T1->textdata;
gettextptr((char **)&T1, &T2);
printf("\nT1 = %s\nT2 = %s\n", (char *)T1, T2);
return (0);
}
void gettextptr(char **T1, char **T2) {
struct text *p;
int count;
p = (struct text *)(*T1);
count = (p->count[0] - '0') * 10 + (p->count[1] - '0');
while (count--) {
if (memcmp(*T2, *T1, 2) == 0) {
*T1 += 2;
*T2 += 2;
}
}
}
When I compile this code with -Wall -O2, I am getting the following GCC warning:
pointer.c: In function ‘main’:
pointer.c:23: warning: dereferencing type-punned pointer will break strict-aliasing rules
So:
Was the code calling by value in first case?
Isn't (char **) permitted for casting while keeping strict aliasing rules?
What am I missing to resolve this warning?

The strict aliasing rule is paragraph 6.5/7 of the Standard. It says basically that you may access an object only through an lvalue of compatible type, possibly with additional qualifiers; the corresponding signed / unsigned type; an array, structure, or union type with one of those among its members, or a character type. The diagnostic you received is saying that your code violates that rule, and it does, multiple times.
You get yourself in trouble very early with:
T1 = (struct text *)data;
That conversion is allowed, though the resulting pointer is not guaranteed to be correctly aligned, but there's not much you can then do with T1 without violating the strict aliasing rule. In particular, if you dereference it with * or -> -- which is in fact the very next thing you do -- then you access a char array as if it were a struct text. That is not allowed, though the reverse would be a different story.
Converting T1 to a char * and accessing the pointed to array through that pointer, as you do later, are some of the few things you may do with it.
gettextptr() is the same (both versions). It performs the same kind of conversion described above, and dereferences the converted pointer when it accesses p->count. The resulting behavior violates the strict aliasing rule, and is therefore undefined. What GCC is actually complaining about in the second case, however, is probably accessing *T1 as if it were a char *, when it is really a struct text * -- another, separate, strict aliasing violation.
So, in response to your specific questions:
Was the code calling by value in first case?
C has only pass by value, so yes. In the first case, you pass two char pointers by value, which you could then use to modify the caller's char data. In the second case, you pass two char ** pointers by value, which you can and do use to modify the caller's char * variables.
Isn't (char **) permitted for casting while keeping strict aliasing rules?
No, absolutely not. Casting to char * (not char **) can allow you to access an object's representation through the resulting pointer, because dereferencing a char * produces an lvalue of character type, but there is no type that can generically be converted from without strict-aliasing implications.
What am I missing to resolve this warning?
You are missing that what you are trying to do is fundamentally disallowed. C does not permit accessing a char array as if it were a struct text, period. Compilers may nevertheless accept code that does so, but its behavior is undefined.
Resolve the warning by abandoning the cast-to-structure approach, which is providing only a dusting of syntactic sugar, anyway. It's actually simpler and clearer to get rid of all the casting and write:
count = ((*T1)[0] - '0') * 10 + ((*T1)[1] - '0');
It's perhaps clearer still to get rid of all the casting and use sscanf:
sscanf(*T1, "%2d", &count);
Note also that even if it were allowed, your specific access pattern seems to make assumptions about the layout of the structure members that are not justified by the language. Implementations may use arbitrary padding between members and after the last member, and your code cannot accommodate that.

Copying char arr[64] to char arr[] can cause a segmentation fault?

typedef struct {
int num;
char arr[64];
} A;
typedef struct {
int num;
char arr[];
} B;
I declared A* a; and then put some data into it. Now I want to cast it to a B*.
A* a;
a->num = 1;
strcpy(a->arr, "Hi");
B* b = (B*)a;
Is this right?
I get a segmentation fault sometimes (not always), and I wonder if this could be the cause of the problem.
I got a segmentation fault even though I didn't try to access char arr[] after casting.

This defines a pointer variable
A* a;
It does not point to anything valid; the pointer is uninitialised.
This accesses whatever it is pointing to
a->num = 1;
strcpy(a->arr, "Hi");
Without allocating anything for the pointer beforehand (e.g. using malloc()), this is asking for segfaults, one possible consequence of the undefined behaviour it invokes.

This is an addendum to Yunnosch's answer, which identifies the problem correctly. Let's assume you do it correctly and either write just
A a;
which gives you an object of automatic storage duration when declared inside a function, or you dynamically allocated an instance of A like this:
A *a = malloc(sizeof *a);
if (!a) return -1; // or whatever else to do in case of allocation error
Then, the next thing is your cast:
B* b = (B*)a;
This is not correct: types A and B are not compatible. Here it will probably work in practice because the struct members are compatible, but beware that strange things can happen, because the compiler is allowed to assume a and b point to different objects since their types are not compatible. For more information, read up on what's commonly called the strict aliasing rule.
You should also know that an incomplete array type (without a size) is only allowed as the very last member of a struct. With a definition like yours:
typedef struct {
int num;
char arr[];
} B;
the member arr is allowed to have any size, but it's your responsibility to allocate it correctly. The size of B (sizeof(B)) doesn't include this member. So if you just write
B b;
you can't store anything in b.arr; no space is reserved for it. This last member is called a flexible array member and can only be used correctly with dynamic allocation, adding the size manually, like this:
B *b = malloc(sizeof *b + 64);
This gives you an instance *b with an arr of size 64. If the array doesn't have the type char, you must multiply manually with the size of your member type -- it's not necessary for char because sizeof(char) is by definition 1. So if you change the type of your array to something different, e.g. int, you'd write this to allocate it with 64 elements:
B *b = malloc(sizeof *b + 64 * sizeof *(b->arr));

It appears that you are confusing two different topics. In C99/C11 char arr[]; as the last member of a structure is a Flexible Array Member (FAM) and it allows you to allocate for the structure itself and N number of elements for the flexible array. However -- you must allocate storage for it. The FAM provides the benefit of allowing one-allocation and one-free where there would normally be two required. (In C89 a similar implementation went by the name struct hack, but it was slightly different).
For example, B *b = malloc (sizeof *b + 64 * sizeof *b->arr); would allocate storage for b plus 64-characters of storage for b->arr. You could then copy the members of a to b using the proper '.' and '->' syntax.
A short example can illustrate:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define NCHAR 64 /* if you need a constant, #define one (or more) */
typedef struct {
int num;
char arr[NCHAR];
} A;
typedef struct {
int num;
char arr[]; /* flexible array member */
} B;
int main (void) {
A a = { 1, "Hi" };
B *b = malloc (sizeof *b + NCHAR * sizeof *b->arr);
if (!b) {
perror ("malloc-b");
return 1;
}
b->num = a.num;
strcpy (b->arr, a.arr);
printf ("b->num: %d\nb->arr: %s\n", b->num, b->arr);
free (b);
return 0;
}
Example Use/Output
$ ./bin/struct_fam
b->num: 1
b->arr: Hi
Look things over and let me know if that helps clear things up. Also let me know if you were asking something different. It is a little unclear exactly where your confusion lies.

Is accessing members through offsetof well defined?

When doing pointer arithmetic with offsetof, is it well defined behavior to take the address of a struct, add the offset of a member to it, and then dereference that address to get to the underlying member?
Consider the following example:
#include <stddef.h>
#include <stdio.h>
typedef struct {
const char* a;
const char* b;
} A;
int main() {
A test[3] = {
{.a = "Hello", .b = "there."},
{.a = "How are", .b = "you?"},
{.a = "I\'m", .b = "fine."}};
for (size_t i = 0; i < 3; ++i) {
char* ptr = (char*) &test[i];
ptr += offsetof(A, b);
printf("%s\n", *(char**)ptr);
}
}
This should print "there.", "you?" and "fine." on three consecutive lines, which it currently does with both clang and gcc, as you can verify yourself on wandbox. However, I am unsure whether any of these pointer casts and arithmetic violate some rule which would cause the behavior to become undefined.

As far as I can tell, it is well-defined behavior. But only because you access the data through a char type. If you had used some other pointer type to access the struct, it would have been a "strict aliasing violation".
Strictly speaking, it is not well-defined to access an array out-of-bounds, but it is well-defined to use a character type pointer to grab any byte out of a struct. By using offsetof you guarantee that this byte is not a padding byte (which could have meant that you would get an indeterminate value).
Note, however, that casting away the const qualifier does result in poorly-defined behavior.
EDIT
Similarly, the cast (char**)ptr is an invalid pointer conversion: this alone is undefined behavior, as it violates strict aliasing. The variable ptr itself was declared as a char*, so you can't lie to the compiler and say "hey, this is actually a char**", because it is not. This is regardless of what ptr points at.
I believe that the correct code with no poorly-defined behavior would be this:
#include <stddef.h>
#include <stdio.h>
#include <string.h>
typedef struct {
const char* a;
const char* b;
} A;
int main() {
A test[3] = {
{.a = "Hello", .b = "there."},
{.a = "How are", .b = "you?"},
{.a = "I\'m", .b = "fine."}};
for (size_t i = 0; i < 3; ++i) {
const char* ptr = (const char*) &test[i];
ptr += offsetof(A, b);
/* Extract the const char* from the address that ptr points at,
and store it inside ptr itself: */
memmove(&ptr, ptr, sizeof(const char*));
printf("%s\n", ptr);
}
}

Given
struct foo {int x, y;} s;
void write_int(int *p, int value) { *p = value; }
nothing in the Standard would distinguish between:
write_int(&s.y, 12);
and
write_int((int*)(((char*)&s)+offsetof(struct foo,y)), 12);
The Standard could be read in such a way as to imply that both of the above violate the lvalue-type rules, since it does not specify that the stored value of a structure can be accessed using an lvalue of a member type, requiring that code wanting to access a structure member be written as:
void write_int(int *p, int value) { memcpy(p, &value, sizeof value); }
I personally think that's preposterous; if &s.y can't be used to access an lvalue of type int, why does the & operator yield an int*?
On the other hand, I also don't think it matters what the Standard says. Neither clang nor gcc can be relied upon to correctly handle code that does anything "interesting" with pointers, even in cases that are unambiguously defined by the Standard, except when invoked with -fno-strict-aliasing. Compilers that make any bona fide effort to avoid any incorrect aliasing "optimizations" in cases which would be defined under at least some plausible readings of the Standard will have no trouble handling code that uses offsetof in cases where all accesses that will be done using the pointer (or other pointers derived from it) precede the next access to the object via other means.

How can I get/set a struct member by offset

Ignoring padding/alignment issues and given the following struct, what is best way to get and set the value of member_b without using the member name.
struct mystruct {
int member_a;
int member_b;
};
struct mystruct *s = malloc(sizeof(struct mystruct));
Put another way; How would you express the following in terms of pointers/offsets:
s->member_b = 3;
printf("%i",s->member_b);
My guess is to
calculate the offset by finding the sizeof the member_a (int)
cast the struct to a single word pointer type (char?)
create an int pointer and set the address (to *charpointer + offset?)
use my int pointer to set the memory contents
but I get a bit confused about casting to a char type, or whether something like memset is more appropriate, or whether I'm approaching this totally wrong.
Cheers for any help

The approach you've outlined is roughly correct, although you should use offsetof instead of attempting to figure out the offset on your own. I'm not sure why you mention memset -- it sets the contents of a block to a specified value, which seems quite unrelated to the question at hand.
Here's some code to demonstrate how it works:
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
typedef struct x {
int member_a;
int member_b;
} x;
int main() {
x *s = malloc(sizeof(x));
char *base;
size_t offset;
int *b;
// initialize both members to known values
s->member_a = 1;
s->member_b = 2;
// get base address
base = (char *)s;
// and the offset to member_b
offset = offsetof(x, member_b);
// Compute address of member_b
b = (int *)(base+offset);
// write to member_b via our pointer
*b = 10;
// print out via name, to show it was changed to new value.
printf("%d\n", s->member_b);
return 0;
}

The full technique:
Get the offset using offsetof:
b_offset = offsetof(struct mystruct, member_b);
Get the address of your structure as a char * pointer.
char *sc = (char *)s;
Add the offset to the structure address, cast the value to a pointer to the appropriate type and dereference:
*(int *)(sc + b_offset)

Ignoring padding and alignment, as you said...
If the elements you're pointing to are entirely of a single type, as in your example, you can just cast the structure to the desired type and treat it as an array:
printf("%i", ((int *)s)[1]);

It's possible to calculate the offset using the struct type and NULL as the reference pointer,
e.g. &(((type *)0)->field)
Example:
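#include <stdio.h>
#include <stddef.h>
struct my_struct {
    int x;
    int y;
    int *m;
    int *h;
};
int main()
{
    /* Cast to size_t (not int) and print with %zu to avoid truncation. */
    printf("offset %zu\n", (size_t) &(((struct my_struct *)0)->h));
    return 0;
}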

In this particular example, you can address it by *((int *) ((char *) s + sizeof(int))). I'm not sure why you want that, so I'm assuming didactic purposes; hence the explanation below.
The bit of code translates as: take the memory starting at address s and treat it as memory pointing to char. To that address, add sizeof(int) char-chunks; you will get a new address. Take the value at the address thus created and treat it as an int.
Note that writing *(s + sizeof(int)) would instead access the memory at s plus sizeof(int) chunks of sizeof(struct mystruct) bytes each.
Edit: as per Andrey's comment, using offsetof:
*((int *) ((char *) s + offsetof(struct mystruct, member_b)))
Edit 2: I replaced all bytes with chars as sizeof(char) is guaranteed to be 1.

It sounds from your comments that what you're really doing is packing and unpacking a bunch of disparate data types into a single block of memory. While you can get away with doing that with direct pointer casts, as most of the other answers have suggested:
void set_int(void *block, size_t offset, int val)
{
char *p = block;
*(int *)(p + offset) = val;
}
int get_int(void *block, size_t offset)
{
char *p = block;
return *(int *)(p + offset);
}
The problem is that this is non-portable. There's no general way to ensure that the types are stored within your block with the correct alignment, and some architectures simply cannot do loads or stores to unaligned addresses. In the special case where the layout of your block is defined by a declared structure, it will be OK, because the struct layout will include the necessary padding to ensure the right alignment. However since you can't access the members by name, it sounds like this isn't actually what you're doing.
To do this portably, you need to use memcpy:
void set_int(void *block, size_t offset, int val)
{
char *p = block;
memcpy(p + offset, &val, sizeof val);
}
int get_int(void *block, size_t offset)
{
char *p = block;
int val;
memcpy(&val, p + offset, sizeof val);
return val;
}
(similar for the other types).
