Does dereferencing a cast to an anonymous structure pointer violate strict aliasing? - c

I have heard conflicting things about the extent to which the C standards guarantee structure layout consistency. Arguments for a limited extent have mentioned strict aliasing rules. For example, compare these two answers: https://stackoverflow.com/a/3766251/1306666 and https://stackoverflow.com/a/3766967/1306666.
In the following code I assume in all structures foo, bar, and struct { char *id; } that char *id is in the same place, making it safe to cast between them if it is the only member accessed.
Regardless of whether the cast will ever result in an error, does it violate strict aliasing rules?
#include <string.h>
struct foo {
char *id;
int a;
};
struct bar {
char *id;
int x, y, z;
};
struct list {
struct list *next;
union {
struct foo *foop;
struct bar *barp;
void *either;
} ptr;
};
struct list *find_id(struct list *l, char *key)
{
while (l != NULL) {
/* cast to anonymous struct and dereferenced */
if (!strcmp(((struct { char *id; } *)(l->ptr.either))->id, key))
return l;
l = l->next;
}
return NULL;
}
gcc -o /dev/null -Wstrict-aliasing test.c
Note gcc gives no errors.

Yes, there are multiple aliasing-related issues in your program. The use of the lvalue with anonymous structure type, which does not match the type of the underlying object, results in undefined behavior. It could be fixed with something like:
*(char**)((char *)either + offsetof(struct { ... char *id; ... }, id))
if you know the id member is at the same offset in all of them (e.g. they all share same prefix). But in your specific case where it's the first member you can just do:
*(char**)either
because it's always valid to convert a pointer to a struct to a pointer to its first member (and back).
A separate issue is that your use of the union is wrong. The biggest issue is that it assumes struct foo *, struct bar *, and void * all have the same size and representation, which is not guaranteed. Also, it's arguably undefined to access a member of the union other than the one which was previously stored, but as a result of interpretations in defect reports, it's probably safe to say it's equivalent to a "reinterpret cast". But that gets you back to the issue of wrongly assuming same size/representation.
You should just remove the union, use a void * member, and convert the value (rather than reinterpret the bits) to the right pointer type to access the pointed-to structure (struct foo * or struct bar *) or its initial id field (char *).
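As a minimal sketch of that suggestion, assuming id stays the first member of every stored structure, the list node can hold a plain void * and the lookup can read the id through a char ** conversion. The member name item is hypothetical and not part of the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct foo { char *id; int a; };
struct bar { char *id; int x, y, z; };

/* Hypothetical replacement for the union-based node: a single void * member. */
struct list {
    struct list *next;
    void *item;              /* points to a struct foo or a struct bar */
};

struct list *find_id(struct list *l, const char *key)
{
    assert(key != NULL);
    while (l != NULL) {
        /* A pointer to a struct, suitably converted, points to its
           first member, so this reads the char *id of either type. */
        char *id = *(char **)l->item;
        if (strcmp(id, key) == 0)
            return l;
        l = l->next;
    }
    return NULL;
}
```

The access goes through an lvalue of type char *, which is the actual type of the id member, so no object is read through a mismatched structure type.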

Related

Type Punning with Unions and Heap

I have read a lot about type punning and how it is not good to just use a cast.
oldType* data = malloc(sizeof(oldType));
((newtype*)data)->newElement;
This results in undefined behavior. So the solution is to use a union, so that the compiler knows that these two pointers are linked to one another and doesn't do funny things with strict aliasing.
That being said the unions also looked like:
union testing
{
struct test1 e;
struct test2 f;
};
Is it defined behavior if pointers are used in the union?
union testing
{
struct test1* e;
struct test2* f;
};
Here is a full example:
#include <stdio.h>
#include <stdlib.h>
struct test1
{
int a;
char b;
};
struct test2
{
int c;
char d;
};
union testing
{
struct test1* e;
struct test2* f;
};
void printer(const struct test2* value);
int main()
{
struct test1* aQuickTest = malloc(sizeof(struct test1));
aQuickTest->a = 42;
aQuickTest->b = 'a';
printer(((union testing)aQuickTest).f);
((union testing)aQuickTest).f->c = 111; // using -> not .
return 0;
}
void printer(const struct test2* value)
{
printf("Int: %i Char: %c",value->c, value->d);
}
Or would I need to use unions without pointers, and then use printer(&(((union testing)aQuickTest).f)); (with the &) to get the address of f?
It is non-conforming to cast to a union type, as your code does:
printer(((union testing)aQuickTest).f);
For that reason, your code does have undefined behavior as far as the Standard is concerned.
More directly to the point, however, no, your approach of putting pointers into a union does not avoid strict aliasing violations with respect to the pointed-to types, even without the casting issue. In your case, the effect is that where your union testing is in scope, implementations cannot assume that objects of type struct test1 ** and struct test2 ** do not alias each other. That does not prevent undefined behavior resulting from accessing an object with effective type struct test1 through an lvalue of type struct test2.
If you want to type-pun between types X and Y, you should use a union:
typedef union {
X x;
Y y;
}X_Y;
This allows you to share the bit representation of X with Y and vice versa.
If you use -
typedef union {
X* x;
Y* y;
}X_Y_p;
you are sharing the bit representation of the pointers. On a system that uses the same bit representation for all pointers, you are essentially casting a pointer to X to a pointer to Y, which you identified as causing undefined behaviour.
It is not illegal to have something like X_Y_p, because X* and Y* are types in their own right. But it achieves something different: it lets you type-pun the pointers themselves, which is not what you want to do (and is not necessary in most cases, because pointers share a representation on most systems). A cast should be fine there.
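The value-level union from this answer can be sketched as follows. This is an illustration only: X is taken to be float, Y to be uint32_t, and the names float_bits and bits_of_float are made up. It also assumes float is a 32-bit IEEE 754 type, which the C standard does not require.

```c
#include <assert.h>
#include <stdint.h>

/* The union holds the values themselves, not pointers to them. */
typedef union {
    float    f;
    uint32_t u;
} float_bits;

/* Return the object representation of a float as a 32-bit integer. */
uint32_t bits_of_float(float value)
{
    float_bits pun;

    assert(sizeof(float) == sizeof(uint32_t)); /* assumption, see above */
    pun.f = value;   /* store through one member */
    return pun.u;    /* read the same bytes back through the other */
}
```

On an IEEE 754 platform, bits_of_float(1.0f) would be expected to yield 0x3F800000.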

C: casting to structure with different size

Quick simple question;
Does this
typedef struct {int a; int b;} S1;
typedef struct {int a;} S2;
((S2*)(POINTER_TO_AN_S1))->a=1;
Always return (and assign) the member a of the structure? Or is it undefined behavior?
In a conforming compiler, if both structure types appear within the complete definition of a union type which is visible where the structure is accessed, and if the target of the pointer happened to be an instance of that union type, behavior would be defined. Note that the Standard does not require that the compiler have any way of knowing that the target of the pointer is actually an object of that union type--merely that the declaration of the complete union type be visible.
Note, however, that gcc does not abide by the Standard here, unless the -fno-strict-aliasing flag is used. Even in cases where the complete union type is visible, and a compiler can see that it is in fact working with objects of the union type, gcc ignores the aliasing. For example, given:
struct s1 {int x;};
struct s2 {int x;};
union u { struct s1 s1; struct s2 s2;};
int read_s1_x(struct s1 *p) { return p->x; }
int read_s2_x(struct s2 *p) { return p->x; }
void write_s1_x(struct s1 *p, int value) { p->x = value; }
void write_s2_x(struct s2 *p, int value) { p->x = value; }
int test(union u *u1, union u *u2)
{
write_s2_x(&u2->s2, 0);
if (!read_s1_x(&u1->s1))
write_s2_x(&u2->s2, 1);
return read_s1_x(&u1->s1);
}
a compiler will decide that it doesn't need to re-read the value of
u1->s1.x after it writes u2->s2.x, even though the complete union type
is visible and even though a compiler can see that both u1 and u2 are
pointers to objects of the union type. I'm not quite sure what the
authors of gcc think the address-of operator is supposed to mean when
applied to a union type if the resulting pointer can't even be used to
immediately access an object of that member type.
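For contrast, here is a minimal sketch of the access pattern that is not in dispute: going through the union object itself rather than through pointers to its members. The function name roundtrip_through_union is made up for this illustration; the struct and union definitions repeat the ones from the example above. Because s1 and s2 share a common initial sequence and both accesses name the union, reading s1.x after writing s2.x should be well defined.

```c
struct s1 { int x; };
struct s2 { int x; };
union u { struct s1 s1; struct s2 s2; };

/* Write through one member of the union, read through the other.
   Both accesses use the union object directly, so the compiler can
   see that they refer to the same storage. */
int roundtrip_through_union(int value)
{
    union u obj;

    obj.s2.x = value;
    return obj.s1.x;
}
```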

Cast between struct pointer in C

Please consider the following code.
typedef struct{
int field_1;
int field_2;
int field_3;
int field_4;
uint8_t* data;
uint32_t data_size;
} my_struct;
void ext_function(inalterable_my_struct* ims, ...);
I want to allow ext_function (written by a third party) to modify only field_3 and field_4 in my_struct. So I do the following:
typedef struct{
const int field_1;
const int field_2;
int field_3;
int field_4;
const uint8_t* data;
const uint32_t data_size;
} inalterable_my_struct;
void ext_function(inalterable_my_struct* ims, ...);
Is it safe to cast pointers between my_struct and inalterable_my_struct before calling ext_function (as shown after)?
void call_ext_function(my_struct* ms){
inalterable_my_struct* ims = (inalterable_my_struct*)ms;
ext_function(ims, ...);
}
I don't think this is a good idea.
The called function can always cast away any const:ness, and modify the data if it wants to.
If you can control the callpoints, it would be better to create a copy and call the function with a pointer to the copy, then copy back the two fields you care about:
void call_ext_function(my_struct* ms)
{
my_struct tmp = *ms;
ext_function(&tmp, ...);
ms->field_3 = tmp.field_3;
ms->field_4 = tmp.field_4;
}
This is much cleaner, and unless you do this thousands of times a second the performance penalty should really be minor.
You might have to fake the pointer-based data too, if the function touches it.
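A self-contained sketch of the copy-and-copy-back idea follows. The body of ext_function here is a hypothetical stand-in for the third-party code (it deliberately overwrites every integer field), and the variadic arguments are dropped to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int field_1;
    int field_2;
    int field_3;
    int field_4;
    uint8_t *data;
    uint32_t data_size;
} my_struct;

/* Hypothetical third-party function: scribbles on all four fields. */
static void ext_function(my_struct *ms)
{
    ms->field_1 = -1;
    ms->field_2 = -1;
    ms->field_3 = 30;
    ms->field_4 = 40;
}

void call_ext_function(my_struct *ms)
{
    assert(ms != NULL);

    my_struct tmp = *ms;        /* work on a private copy */
    ext_function(&tmp);
    ms->field_3 = tmp.field_3;  /* accept only the permitted changes */
    ms->field_4 = tmp.field_4;
}
```

Note that tmp.data still points at the caller's buffer, so the bytes it refers to are not protected by this copy; they would need to be duplicated as well if the function might write to them.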
According to the C99 standard, two separately declared structs do not have compatible types even if their declarations are identical. From section 6.7.7, paragraph 5:
EXAMPLE 2 After the declarations
typedef struct s1 { int x; } t1, *tp1;
typedef struct s2 { int x; } t2, *tp2;
type t1 and the type pointed to by tp1 are compatible. Type t1 is also compatible with type struct s1, but not compatible with the types struct s2, t2, the type pointed to by tp2, or int.
Moreover, two types with different qualifiers are not considered compatible:
For two qualified types to be compatible, both shall have the identically qualified version
of a compatible type; the order of type qualifiers within a list of specifiers or qualifiers
does not affect the specified type.
A cleaner approach would be to hide your struct altogether, replace it with an opaque handle (a typedef on top of void*) and provide functions for manipulating the elements of the struct. This way you would retain full control over the structure of your struct: you would be able to rename its fields at will, change the layout as much and as often as you wish, change underlying types of the fields, and do other things that you normally avoid when the inner layout of the struct is known to your clients.
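A rough sketch of such a handle-based interface is shown here, with the public part and the private part in one file for brevity. Every name in it (my_handle, my_create, my_get_field_1, and so on) is hypothetical, and only a few accessors are shown.

```c
#include <assert.h>
#include <stdlib.h>

/* --- what a public header might expose --- */
typedef void *my_handle;

my_handle my_create(int field_1, int field_2);
void      my_destroy(my_handle h);
int       my_get_field_1(my_handle h);      /* read-only for clients */
int       my_get_field_3(my_handle h);
void      my_set_field_3(my_handle h, int value);

/* --- private implementation, invisible to clients --- */
struct my_struct_impl {
    int field_1, field_2, field_3, field_4;
};

my_handle my_create(int field_1, int field_2)
{
    struct my_struct_impl *p = calloc(1, sizeof *p);
    if (p != NULL) {
        p->field_1 = field_1;
        p->field_2 = field_2;
    }
    return p;
}

void my_destroy(my_handle h) { free(h); }

int my_get_field_1(my_handle h)
{
    assert(h != NULL);
    return ((struct my_struct_impl *)h)->field_1;
}

int my_get_field_3(my_handle h)
{
    assert(h != NULL);
    return ((struct my_struct_impl *)h)->field_3;
}

void my_set_field_3(my_handle h, int value)
{
    assert(h != NULL);
    ((struct my_struct_impl *)h)->field_3 = value;
}
```

Since no setter exists for field_1, client code has no supported way to change it. A pointer to an incomplete struct type would give more compile-time checking than void *, but void * is what the answer describes.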
I don't think it's a good idea, because it is hard to track whether the structure has been cast or not (especially if the code is large). Also casting it into const does not guarantee that it won't be cast to a non-const structure later.
The solution provided by unwind is a very good one. An alternate (and more obvious) solution would be to split the structure into two smaller parts.
typedef struct{
const int field_1;
const int field_2;
const uint8_t* data;
const uint32_t data_size;
} inalterable_my_struct;
typedef struct{
int field_3;
int field_4;
} my_struct;
void ext_function(const inalterable_my_struct* const ims, my_struct* ms, ...);
I have made the pointer also constant in the above call, but that is not necessary.
It will probably work on most compilers even though the standard doesn't say anything about it. You can probably even do something more portable with a union if you really have to. Except it won't change anything.
This is why it won't change anything:
$ cat foo.c
struct foo {
const int a;
int b;
};
void
foo(struct foo *foo)
{
foo->a = 1;
}
$ cc -c foo.c
foo.c: In function ‘foo’:
foo.c:9: error: assignment of read-only member ‘a’
$ cc -Dconst= -c foo.c
$

getting C error: conversion to non-scalar type requested

Hey I am getting this error:
error: conversion to non-scalar type requested
Here are my structs:
typedef struct value_t value;
struct value{
void* x;
int y;
value* next;
value* prev;
};
typedef struct key_t key;
struct key{
int x;
value * values;
key* next;
key* prev;
};
Here is the code that is giving me problems:
struct key new_node = (struct key) calloc(1, sizeof(struct key));
struct key* curr_node = head;
new_node.key = new_key;
struct value head_value = (struct value) calloc(1, sizeof(struct value))
Am I not supposed to use calloc on structs? Also, I have a struct that I have created and then I want to set that to a pointer of that same struct type but getting an error. This is an example of what I am doing:
struct value x;
struct value* y = *x;
this gives me this error
error: invalid type argument of ‘unary *’
When I do y = x, I get this warning:
warning: assignment from incompatible pointer type
You are trying to assign a pointer expression (the return type of malloc() and friends is void*) to an object of struct type (struct key new_node). That is nonsense. Also: the cast is not needed (and possibly dangerous, since it can hide errors).
struct key *new_node = calloc(1, sizeof *new_node);
the same problem with the other malloc() line:
struct value *head_value = calloc(1, sizeof *head_value);
More errors: You are omitting the 'struct' keyword (which is allowed in C++, but nonsense in C):
struct key{
int x;
struct value *values;
struct key *next;
struct key *prev;
};
UPDATE: using structs and pointers to struct.
struct key the_struct;
struct key other_struct;
struct key *the_pointer;
the_pointer = &other_struct; // a pointer should point to something
the_struct.x = 42;
the_pointer->x = the_struct.x;
/* a->b can be seen as shorthand for (*a).b :: */
(*the_pointer).x = the_struct.x;
/* and for the pointer members :: */
the_struct.next = the_pointer;
the_pointer->next = malloc (sizeof *the_pointer->next);
I don't think you've correctly understood typedefs.
The common idiom with using typedefs for convenience naming is this:
struct foo {
int something;
};
typedef struct foo foo_t;
Then you use the type foo_t instead of the less convenient struct foo.
For convenience, you can combine the struct declaration and the typedef into one block:
typedef struct {
int something;
} foo_t;
This defines a foo_t just like the above.
The last token on the typedef line is the name you're assigning. I have no idea what the code you wrote is actually doing to your namespace, but I doubt it's what you want.
Now, as for the code itself: calloc returns a pointer, which means both your cast and your storage type should be struct key* (or, if you fix your naming, key_t). The correct line is struct key* new_node = (struct key*)calloc(1, sizeof(struct key));
For your second, independent, issue, the last line should be struct value* y = &x;. You want y to store the address of x, not the thing at address x. The error message indicates this - you are misusing the unary star operator to attempt to dereference a non-pointer variable.
struct key new_node = (struct key) calloc(1, sizeof(struct key));
calloc returns a pointer value (void *), which you are trying to convert and assign to an aggregate (IOW, non-scalar) type (struct key). To fix this, change the type of new_node to struct key * and rewrite your allocation as follows:
struct key *new_node = calloc(1, sizeof *new_node);
Two things to note. First of all, ditch the cast expression. malloc, calloc, and realloc all return void *, which can be assigned to any object pointer type without need for a cast [1]. In fact, the presence of a cast can potentially mask an error if you forget to include stdlib.h or otherwise don't have a declaration for malloc in scope [2].
Secondly, note that I use the expression *new_node as the argument to sizeof, rather than (struct key). sizeof doesn't evaluate its operand (unless it's a variable-length array type, which this isn't); it just computes the type of the expression. Since the type of the expression *new_node is struct key, sizeof will return the correct number of bytes to store that object. It can save some maintenance headaches if your code is structured like
T *foo;
... // more than a few lines of code
foo = malloc(sizeof (T));
and you change the type of foo in the declaration, but forget to update the malloc call.
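A small sketch of the sizeof *pointer idiom applied to the struct from the question is given here. The helper name key_new is hypothetical; the struct uses the corrected definition with explicit struct keywords.

```c
#include <stdlib.h>

struct key {
    int x;
    struct key *next;
    struct key *prev;
};

/* Hypothetical helper: allocate and initialise one node.
   Returns NULL if the allocation fails. */
struct key *key_new(int x)
{
    /* sizeof *node follows the declared type of node automatically,
       and no cast is needed on the void * that malloc returns. */
    struct key *node = malloc(sizeof *node);

    if (node == NULL)
        return NULL;
    node->x = x;
    node->next = NULL;
    node->prev = NULL;
    return node;
}
```

If the type of node is later changed in its declaration, the malloc call needs no edit.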
Also, it's not clear what you're trying to accomplish with your typedefs and struct definitions. The code
typedef struct value_t value;
struct value{
void* x;
int y;
value* next;
value* prev;
};
isn't doing what you think it is. You're creating a typedef name value which is a synonym for an as-yet-undefined type struct value_t. This value type is different from the struct value type you create later (typedef names and struct tags live in different namespaces). Rewrite your structs to follow this model:
struct value_t {
void *x;
int y;
struct value_t *next;
struct value_t *prev;
};
typedef struct value_t value;
Also, life will be easier if you write your declarations so that the * is associated with the declarator, not the type specifier [3]. A declaration like T* p is parsed as though it were written T (*p). This will save you the embarrassment of writing int* a, b; and expecting both a and b to be pointers (b is just a regular int).
1 - This is one area where C and C++ differ; C++ does not allow implicit conversions between void * and other object pointer types, so if you compile this as C++ code, you'll get an error at compile time. Also, before the 1989 standard was adopted, the *alloc functions returned char *, so in those days a cast was required if you were assigning to a different pointer type. This should only be an issue if you're working on a very old system.
2 - Up until the 1999 standard, if the compiler saw a function call without a preceding declaration, it assumed the function returned an int (which is why you still occasionally see examples like
main()
{
...
}
in some tutorials; main is implicitly typed to return int. As of C99, this is no longer allowed). So if you forget to include stdlib.h and call calloc (and you're not compiling as C99), the compiler will assume the function returns an int and generate the machine code accordingly. If you leave the cast off, the compiler will issue a diagnostic to the effect that you're trying to assign an int value to a pointer, which is not allowed. If you leave the cast in, the code will compile but the pointer value may be munged at runtime (conversions of pointers to int and back to pointers again is not guaranteed to be meaningful).
3 - There are some rare instances, limited to C++, where the T* p style can make code a little more clear, but in general you're better off following the T *p style. Yes, that's a personal opinion, but one that's backed up by a non-trivial amount of experience.
calloc(3) returns a pointer to the memory it allocates.
struct key new_node = (struct key) calloc(1, sizeof(struct key));
should be
struct key* new_node = calloc(1, sizeof(struct key));
You should not assign a pointer to a non-pointer variable. Change new_node to be a pointer.
Also, to use the address of variable, you need &, not *, so change it to struct value* y = &x;
Edit: your typedefs are wrong too. reverse them.
For the second problem, you want to use an ampersand (&) instead of an asterisk (*). An asterisk dereferences a pointer; an ampersand gives you a pointer to the value.

typecheck for return value

I have a list in which i want to be able to put different types. I have a function that returns the current value at index:
void *list_index(const List * list, int index) {
assert(index < list->size);
return list->data[index];
}
In the array there are multiple types, for example:
typedef struct structA { List *x; char *y; List *z; } structA;
typedef struct structB { List *u; char *w; } structB;
Now in order to get data from the array:
structA *A;
structB *B;
for(j=0... ) {
A = list_index(list, j);
B = list_index(list, j);
}
But now how do I find out the type of the return value? Is this possible with typeof (I'm using GCC btw)?
And is this even possible or do i have to make some sort of different construction?
You'll have to use unions like shown here.
The best way to solve this would be to use unions.
Another way would be to memcpy() the list item to an actual struct (i.e., not a pointer) of the appropriate type. This would have the advantage of making each List item as small as possible.
A third way would be to just cast the pointer types as in type punning. C allows this as long as the object is dereferenced with either its correct type or char.
Either way, you will need to put a code in each structure that identifies the type of object. There is no way the compiler can figure out what a pointer points to for you. And even if you could use typeof, you shouldn't. It's not C99.
Technically, if you don't use a union, you will have a problem making a legal C99 access to the type code, because you will need to make a temporary assumption about the type and this will violate the rule that objects must be dereferenced as their actual type, via a union, or via a char *. However, since the type code must by necessity be in the same position in every type (in order to be useful) this common technical violation of the standard will not actually cause an aliasing optimization error in practice.
Actually, if you make the type code a char, make it the first thing in the struct, and access it via a char *, I think you will end up with code that is a bit confusing to read but is perfectly conforming C99.
Here is an example, this passes gcc -Wall -Wextra
#include <stdio.h>
#include <stdlib.h>
struct A {
char typeCode;
int something;
};
struct B {
char typeCode;
double somethingElse;
};
void *getMysteryList();
int main()
{
void **list = getMysteryList();
int i;
for (i = 0; i < 2; ++i)
switch (*(char *) list[i]) {
case 'A':
printf("%d\n", ((struct A *) list[i])->something);
break;
case 'B':
printf("%7.3f\n", ((struct B *) list[i])->somethingElse);
break;
}
return 0;
}
void *getMysteryList()
{
void **v = malloc(sizeof(void *) * 2);
struct A *a = malloc(sizeof(struct A));
struct B *b = malloc(sizeof(struct B));
a->typeCode = 'A';
a->something = 789;
b->typeCode = 'B';
b->somethingElse = 123.456;
v[0] = a;
v[1] = b;
return v;
}
C handles types and typing entirely at compile time (no dynamic typing), so once you've cast a pointer to a 'void *' it's lost any information about the original type. You can cast it back to the original type, but you need to know what that is through some other method.
The usual way to do this is with some kind of type tag or descriptor in the beginning of all the objects that might be stored in your list type. eg:
typedef struct structA { int tag; List *x; char *y; List *z; } structA;
typedef struct structB { int tag; List *u; char *w; } structB;
enum tags { structAtype, structBtype };
You need to ensure that every time you create a structA or a structB, you set the tag field properly. Then, you can cast the void * you get back from list_index to an int * and use that to read the tag.
void *elem = list_index(list, index);
switch (*(int *)elem) {
case structAtype:
/* elem is a structA */
break;
case structBtype:
/* elem is a structB */
break;
}
Make the elements you want to put into the list all inherit from a common base class. Then you can have your base class contain members that identify the actual type.
class base {
public:
typedef enum {
type1,
type2,
type3
} realtype;
virtual realtype whatAmI()=0;
};
class type_one : public base {
public:
virtual base::realtype whatAmI() { return base::type1; };
};
class type_two : public base {
public:
virtual base::realtype whatAmI() { return base::type2; };
};
After that, you'd declare your list type like:
std::list<base *> mylist;
and you can stuff pointers to any of the derived types into the list. Then when you take them out, you can just call 'whatAmI()' to find out what to cast it to.
Please note: Trying to do this in C++ means you are doing something in a way that's not a good match for C++. Any time you deliberately evade the C++ type system like this, it means you're giving up most of the usefulness of C++ (static type checking), and generally means you're creating large amounts of work for yourself later on, not only as you debug the first iteration of this app, but especially at maintenance time.
You have some choices. Keep in mind that C is basically not a dynamically typed language.
You can make a common base for the structs, and put a simple type indicator of your own in it.
struct base {
int type_indication;
};
then
struct structA {
struct base base;
...
};
and then you can cast the pointer to (struct base *).
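The common-base approach might look like the sketch below. The payload members (something, something_else), the enum constants, and the function as_number are all invented for illustration; only struct base and its type_indication member come from the answer.

```c
#include <assert.h>
#include <stddef.h>

enum { TYPE_A, TYPE_B };    /* hypothetical type indicator values */

struct base {
    int type_indication;
};

struct structA {
    struct base base;       /* must be the first member */
    int something;
};

struct structB {
    struct base base;       /* must be the first member */
    double something_else;
};

/* Inspect the shared base first, then convert to the real type. */
double as_number(const void *elem)
{
    assert(elem != NULL);

    /* Valid because every stored struct begins with a struct base. */
    const struct base *b = elem;

    switch (b->type_indication) {
    case TYPE_A:
        return ((const struct structA *)elem)->something;
    case TYPE_B:
        return ((const struct structB *)elem)->something_else;
    default:
        return 0.0;         /* unknown type indicator */
    }
}
```

Whoever creates a structA or structB is responsible for setting base.type_indication correctly; the compiler cannot check that for you.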

Resources