C11 code like the following is undefined behavior:
// test.c
#include <stdio.h>
struct point2d {
int x, y;
};
struct point3d {
int x, y, z;
};
typedef struct point2d point2d;
typedef struct point3d point3d;
int foo(point2d *p, point3d *q) {
p->x = -1;
p->y = -2;
q->x = 1;
q->y = 2;
q->z = 3;
return p->x;
}
int main(void) {
point3d r;
int n = foo((point2d *) &r, &r);
printf("%d\n", n);
return 0;
}
And indeed, it is:
wrc#raspberrypi:~ $ gcc -O0 test.c -o test; ./test
1
wrc#raspberrypi:~ $ gcc -O3 test.c -o test; ./test
-1
The C11 standard says (6.5/7):
An object shall have its stored value accessed only by an lvalue expression that has one of
the following types:
a type compatible with the effective type of the object,
a qualified version of a type compatible with the effective type of the object,
a type that is the signed or unsigned type corresponding to the effective type of the object,
a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
a character type.
My question is about this kind of technique, where you try to hide information by casting a pointer to a larger struct into a pointer to a smaller struct, that does not contain all members of the larger one.
Here is a summary of the relevant struct definitions:
struct _GRealArray
{
guint8 *data;
guint len;
guint alloc;
guint elt_size;
guint zero_terminated : 1;
guint clear : 1;
gint ref_count;
GDestroyNotify clear_func;
};
struct _GArray
{
gchar *data;
guint len;
};
typedef struct _GRealArray GRealArray;
typedef struct _GArray GArray;
Would such a technique already be a violation of the standard? If no: What is the difference? If yes: why doesn't it matter here? Are there some practical guidelines that allow you to de-facto violate the standard in this case with no bad consequences (contrary to the test.c example above)?
The GArray example would be subject to the same rules if it stressed them, but it seems to carefully avoid doing so.
For user code there can never be an aliasing violation between the two types because, if I see this correctly, the GRealArray type is not visible there, so none of these aliasing problems can occur in user code with these two types.
For the internal code of the implementation, there seems to be no point where aliasing can occur, for the simple reason that only one array is ever visible in each function. Or, to say it more directly,
the aliasing rules only apply if there is potential aliasing.
And even if there were, as long as the implementation consistently uses the same type for two different pointers that it handles, the compiler still has to assume that two such pointers can point to the same object.
BTW, this only answers your direct question, not whether the behavior of the code that you are pointing to is well defined. That would require a deep review of that code.
Related
I have read a lot about type punning and how it is not good to just use a cast.
oldType* data = malloc(sizeof(oldType));
((newtype*)data)->newElement;
This results in undefined behavior. So the solution is to use a union, so that the compiler knows that these two pointers are linked to one another and doesn't do funny things with strict aliasing.
That being said the unions also looked like:
union testing
{
struct test1 e;
struct test2 f;
};
Is it defined behavior if pointers are used in the union?
union testing
{
struct test1* e;
struct test2* f;
};
Here is a full example:
#include <stdio.h>
#include <stdlib.h>
struct test1
{
int a;
char b;
};
struct test2
{
int c;
char d;
};
union testing
{
struct test1* e;
struct test2* f;
};
void printer(const struct test2* value);
int main()
{
struct test1* aQuickTest = malloc(sizeof(struct test1));
aQuickTest->a = 42;
aQuickTest->b = 'a';
printer(((union testing)aQuickTest).f);
((union testing)aQuickTest.f)->c = 111; // using -> not .
return 0;
}
void printer(const struct test2* value)
{
printf("Int: %i Char: %c",value->c, value->d);
}
Or would I need to use unions without pointers, and then use printer(&(((union testing)aQuickTest).f)); (with the &) to get the address of f?
It is non-conforming to cast to a union type, as your code does:
printer(((union testing)aQuickTest).f);
For that reason, your code does have undefined behavior as far as the Standard is concerned.
More directly to the point, however, no, your approach of putting pointers into a union does not avoid strict aliasing violations with respect to the pointed-to types, even without the casting issue. In your case, the effect is that where your union testing is in scope, implementations cannot assume that objects of type struct test1 ** and struct test2 ** do not alias each other. That does not prevent undefined behavior resulting from accessing an object with effective type struct test1 through an lvalue of type struct test2.
Suppose you want to type-pun between types X and Y. Then you should use a union of the two:
typedef union {
X x;
Y y;
}X_Y;
This allows you to share the bit representation of X with Y and vice versa.
If you instead use:
typedef union {
X* x;
Y* y;
}X_Y_p;
you are sharing the bit representation of the pointers. On a system that uses the same bit representation for all pointers, you are essentially casting a pointer to X into a pointer to Y, which, as you identified, causes undefined behaviour.
It is not illegal to have something like X_Y_p, because X* and Y* are types in their own right. But it achieves something different: it lets you type-pun the pointers themselves, which is not what you want to do (and not necessary in most cases, because pointers share a representation on most systems). A cast should be fine there.
I have heard conflicting things about the extent to which the C standards guarantee structure layout consistency. Arguments for a limited extent have mentioned strict aliasing rules. For example, compare these two answers: https://stackoverflow.com/a/3766251/1306666 and https://stackoverflow.com/a/3766967/1306666.
In the following code I assume in all structures foo, bar, and struct { char *id; } that char *id is in the same place, making it safe to cast between them if it is the only member accessed.
Regardless of whether the cast will ever result in an error, does it violate strict aliasing rules?
#include <string.h>
struct foo {
char *id;
int a;
};
struct bar {
char *id;
int x, y, z;
};
struct list {
struct list *next;
union {
struct foo *foop;
struct bar *barp;
void *either;
} ptr;
};
struct list *find_id(struct list *l, char *key)
{
while (l != NULL) {
/* cast to anonymous struct and dereferenced */
if (!strcmp(((struct { char *id; } *)(l->ptr.either))->id, key))
return l;
l = l->next;
}
return NULL;
}
gcc -o /dev/null -Wstrict-aliasing test.c
Note gcc gives no errors.
Yes, there are multiple aliasing-related issues in your program. The use of the lvalue with anonymous structure type, which does not match the type of the underlying object, results in undefined behavior. It could be fixed with something like:
*(char**)((char *)either + offsetof(struct { ... char *id; ... }, id))
if you know the id member is at the same offset in all of them (e.g. they all share same prefix). But in your specific case where it's the first member you can just do:
*(char**)either
because it's always valid to convert a pointer to a struct to a pointer to its first member (and back).
A separate issue is that your use of the union is wrong. The biggest issue is that it assumes struct foo *, struct bar *, and void * all have the same size and representation, which is not guaranteed. Also, it's arguably undefined to access a member of the union other than the one which was previously stored, but as a result of interpretations in defect reports, it's probably safe to say it's equivalent to a "reinterpret cast". But that gets you back to the issue of wrongly assuming same size/representation.
You should just remove the union, use a void * member, and convert the value (rather than reinterpret the bits) to the right pointer type to access the pointed-to structure (struct foo * or struct bar *) or its initial id field (char *).
I have a working C code when compiled using GCC, but I am trying to find out if the code works because of pure luck or because GCC handles this code as I expect by design.
NOTE
I am not trying to "fix" it. I am trying to understand the compiler
Here is what I have:
iexample.h
#ifndef IEXAMPLE_H_
#define IEXAMPLE_H_
/* The interface */
struct MyIf
{
int (* init)(struct MyIf* obj);
int (* push)(struct MyIf* obj, int x);
void (* sort)(struct MyIf* obj);
};
/* The object, can be in different header */
struct Obj1
{
struct MyIf myinterface;
int val1;
int val2;
};
struct Obj1* newObj1();
#endif
iexample.c
#include <stdio.h>
#include <stdlib.h>
#include "iexample.h"
/* Functions here are "equivalent" to methods on the Obj1 struct */
int Obj1_init(struct Obj1* obj)
{
printf("Obj1_init()\n");
return 0;
}
int Obj1_push(struct Obj1* obj, int x)
{
printf("Obj1_push()\n");
return 0;
}
void Obj1_sort(struct Obj1* obj)
{
printf("Obj1_sort()\n");
}
struct Obj1* newObj1()
{
struct Obj1* obj = malloc(sizeof(struct Obj1));
obj->myinterface.init = Obj1_init;
obj->myinterface.push = Obj1_push;
obj->myinterface.sort = Obj1_sort;
return obj;
}
main.c
#include "iexample.h"
int main(int argc, char* argv[])
{
struct MyIf* myIf = (struct MyIf*) newObj1();
myIf->init(myIf);
myIf->push(myIf, 3);
myIf->sort(myIf);
/* ... free, return ... */
}
When I compile, as I expect, I get for assigning the pointers in newObj1(),
warning: assignment from incompatible pointer type
The code works as long as I have the "struct MyIf myinterface" to be the first member of the struct, which is by design (I like to shoot myself in the foot)
Now, although I am assigning incompatible pointer types, and the C spec says behavior is undefined, does GCC or other compilers make any design claim on how this case is handled? I can almost swear that this OUGHT TO WORK due to how struct memory is laid out, but I cannot find the proof.
Thanks
C11 standard 6.7.2.1 Structure and union specifiers:
Within a structure object, the non-bit-field members and the
units in which bit-fields reside have addresses that increase in
the order in which they are declared. A pointer to a structure
object, suitably converted, points to its initial member (or
if that member is a bit-field, then to the unit in which it
resides), and vice versa. There may be unnamed padding within
a structure object, but not at its beginning.
So it should work, as long as you access only the first structure member. However, I believe you understand that this is a pretty bad idea. Should you port this code to C++ and make some Obj1 member function virtual, it will immediately fail.
Trying some code, I realized that the following code compiles:
struct { int x, y; } foo(void) {
}
It seems as if we are defining a function named foo which returns an anonymous struct.
Does it only happen to compile with my compiler or is this legal C(99)?
If so, what is the correct syntax for a return statement and how can I correctly assign the returned value to a variable?
The struct you're returning is not an anonymous struct. The C standard defines an anonymous struct as a member of another struct that doesn't use a tag. What you're returning is a struct without a tag, but since it isn't a member, it is not anonymous. GCC uses the name <anonymous> to indicate a struct without a tag.
Let's say you try to declare an identical struct in the function.
struct { int x, y; } foo( void )
{
return ( struct { int x, y; } ){ 0 } ;
}
GCC complains about it: incompatible types when returning type 'struct <anonymous>', but 'struct <anonymous>' was expected.
Apparently the types are not compatible. Looking in the standard we see that:
6.2.7 Compatible type and composite type
1: Two types have compatible type if their types are the same. Additional rules for determining whether two types are compatible are described in 6.7.2 for type specifiers, in 6.7.3 for type qualifiers, and in 6.7.6 for declarators. Moreover, two structure, union, or enumerated types declared in separate translation units are compatible if their tags and members satisfy the following requirements: If one is declared with a tag, the other shall be declared with the same tag. If both are completed anywhere within their respective translation units, then the following additional requirements apply: there shall be a one-to-one correspondence between their members such that each pair of corresponding members are declared with compatible types; if one member of the pair is declared with an alignment specifier, the other is declared with an equivalent alignment specifier; and if one member of the pair is declared with a name, the other is declared with the same name. For two structures, corresponding members shall be declared in the same order. For two structures or unions, corresponding bit-fields shall have the same widths. For two enumerations, corresponding members shall have the same values.
The quoted paragraph explains that two structs declared without a tag, as in this example, must satisfy the additional requirements listed after "If both are completed anywhere within their respective translation units", which they do. But that rule only covers types "declared in separate translation units", and the structs in the example are in the same one. So they are not compatible and the code is not valid.
It is also impossible to make the code correct by declaring a struct beforehand and using it in this function. You would have to give it a tag, which violates the rule that both structs have to have the same tag:
struct t { int x, y; } ;
struct { int x, y; } foo( void )
{
struct t var = { 0 } ;
return var ;
}
Again GCC complains: incompatible types when returning type 'struct t' but 'struct <anonymous>' was expected
This works in my version of GCC, but seems like a total hack. It is perhaps useful in auto-generated code where you don't want to deal with the additional complexity of generating unique structure tags, but I'm sort of stretching to come up with even that rationalization.
#include <stdio.h>
struct { int x,y; }
foo(void) {
typeof(foo()) ret;
ret.x = 1;
ret.y = 10;
return ret;
}
int main(void)
{
typeof(foo()) A;
A = foo();
printf("%d %d\n", A.x, A.y);
return 0;
}
Also, it's dependent on typeof() being present in the compiler -- GCC and LLVM seem to support it, but I'm sure many compilers do not.
You probably cannot explicitly return some aggregate value from your function (unless you use a typeof extension to get the type of the result).
The moral of the story is that even if you can declare a function returning an anonymous struct, you should practically never do that.
Instead, name the struct and code:
struct twoints_st { int x; int y; };
struct twoints_st foo (void) {
return ((struct twoints_st) {2, 3});
}
Notice that it is syntactically OK to have a non-void function without a return statement, but it is undefined behavior if control reaches its closing brace and the caller uses the result (you could avoid that by, e.g., calling exit inside it). But why would you want to code the following (probably legal)?
struct { int xx; int yy; } bizarrefoo(void) { exit(EXIT_FAILURE); }
Here's a way to return anonymous structs in C++14 without any hacks I just discovered.
(Return type deduction with a plain auto requires C++14; C++11 is not sufficient for this.)
In my case a function intersect() returns std::pair<bool, Point> which is not very descriptive so I decided to make a custom type for the result.
I could have made a separate struct but it wasn't worth since I would need it only for this special case; that's why I used an anonymous struct.
auto intersect(...params...) {
struct
{
Point point;
bool intersects = false;
} result;
// do stuff...
return result;
}
And now, instead of the ugly
if (intersection_result.first) {
Point p = intersection_result.second;
I can use the much better looking:
if (intersection_result.intersects) {
Point p = intersection_result.point;
You could define the structure right in the declaration of the function's return type. Moreover, you can use compound literals, introduced in C99, for brevity:
#include<stdio.h>
struct foo { int x, y; }
foo( void )
{
return (struct foo){ 1, 2 } ;
}
int main()
{
struct foo res = foo();
printf("%d %d\n", res.x, res.y);
}
prints:
1 2
The code compiles in pedantic mode for C99 with no warnings. Note that the tag name is the same as the function name. This works because struct tags and ordinary identifiers (such as function names) live in separate name spaces. This way you minimize the chances of an accidental name conflict. You can use something like foo_result if you consider it more suitable.
Or you could create infinite recursion:
struct { int x, y; } foo(void) {
return foo();
}
Which I think is completely legal.
This works up to the newest version of GCC. It is particularly useful for creating dynamic arrays with macros. For instance:
#define ARRAY_DECL(name, type) struct { int count; type *array; } name
Then you can make the array with realloc, etc. This is useful because then you can create a dynamic array with any type, and there is one way to make all of them. Otherwise, you would end up using a lot of void *'s and then writing functions to actually get the values back out with casts and such. You can shortcut all of this with macros; that is their beauty.
Please consider the following code.
typedef struct{
int field_1;
int field_2;
int field_3;
int field_4;
uint8_t* data;
uint32_t data_size;
} my_struct;
void ext_function(inalterable_my_struct* ims, ...);
I want to allow ext_function (written by a third party) to modify only field_3 and field_4 in my_struct. So I do the following:
typedef struct{
const int field_1;
const int field_2;
int field_3;
int field_4;
const uint8_t* data;
const uint32_t data_size;
} inalterable_my_struct;
void ext_function(inalterable_my_struct* ims, ...);
Is it safe to cast pointers between my_struct and inalterable_my_struct before calling ext_function (as shown after)?
void call_ext_function(my_struct* ms){
inalterable_my_struct* ims = (inalterable_my_struct*)ms;
ext_function(ims, ...);
}
I don't think this is a good idea.
The called function can always cast away any const:ness, and modify the data if it wants to.
If you can control the callpoints, it would be better to create a copy and call the function with a pointer to the copy, then copy back the two fields you care about:
void call_ext_function(my_struct* ms)
{
my_struct tmp = *ms;
ext_function(&tmp, ...);
ms->field_3 = tmp.field_3;
ms->field_4 = tmp.field_4;
}
much cleaner, and unless you do this thousands of times a second the performance penalty should really be minor.
You might have to fake the pointer-based data too, if the function touches it.
According to the C99 standard, two structs do not have compatible types even if their declarations are identical. From section 6.7.7, paragraph 5:
EXAMPLE 2 After the declarations
typedef struct s1 { int x; } t1, *tp1;
typedef struct s2 { int x; } t2, *tp2;
type t1 and the type pointed to by tp1 are compatible. Type t1 is also compatible with type struct s1, but not compatible with the types struct s2, t2, the type pointed to by tp2, or int.
Moreover, two types with different qualifiers are not considered compatible:
For two qualified types to be compatible, both shall have the identically qualified version
of a compatible type; the order of type qualifiers within a list of specifiers or qualifiers
does not affect the specified type.
A cleaner approach would be to hide your struct altogether, replace it with an opaque handle (a typedef on top of void*) and provide functions for manipulating the elements of the struct. This way you would retain full control over the structure of your struct: you would be able to rename its fields at will, change the layout as much and as often as you wish, change underlying types of the fields, and do other things that you normally avoid when the inner layout of the struct is known to your clients.
I don't think it's a good idea, because it is hard to track whether the structure has been cast or not (especially if the code is large). Also casting it into const does not guarantee that it won't be cast to a non-const structure later.
The solution provided by unwind is a very good one. An alternate (and more obvious) solution would be to split the structure into two smaller parts.
typedef struct{
const int field_1;
const int field_2;
const uint8_t* data;
const uint32_t data_size;
} inalterable_my_struct;
typedef struct{
int field_3;
int field_4;
} my_struct;
void ext_function(const inalterable_my_struct* const ims, my_struct* ms, ...);
I have made the pointer also constant in the above call, but that is not necessary.
It will probably work on most compilers even though the standard doesn't say anything about it. You can probably even do something more portable with a union if you really have to. Except it won't change anything.
This is why it won't change anything:
$ cat foo.c
struct foo {
const int a;
int b;
};
void
foo(struct foo *foo)
{
foo->a = 1;
}
$ cc -c foo.c
foo.c: In function ‘foo’:
foo.c:9: error: assignment of read-only member ‘a’
$ cc -Dconst= -c foo.c
$