Will this trick work in C?

I want to add a field to a structure in C. So for example I have the following structure.
struct A
{
some_type x;
some_type y;
}
I declare a new structure, like this.
struct B
{
A a;
some_type z;
}
Now say I have a function like this.
int some_function( A * a )
Is it possible to pass a variable of type B to it like this in the program.
B * b;
......
A * a = (A*)b;
some_function( a );
And also be able to use the fields inside some_function by using a->x for example?

Yes, it is valid. Word of the Standard, C99 6.7.2.1/13:
... A pointer to a
structure object, suitably converted, points to its initial member (or if that member is a
bit-field, then to the unit in which it resides), and vice versa. There may be unnamed
padding within a structure object, but not at its beginning.

Yes, it would work. A a will be the first member in the struct.
This is how some people simulated OO inheritance in C.
You may use
&b->a
instead of the cast.
And probably do an ASSERT like
ASSERT ((void *)&b->a == (void *)b)
to be warned if you accidentally break this guarantee.
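A minimal sketch of both suggestions, assuming `int` stands in for `some_type` and a C11 compiler is available for `static_assert`. The helper name `as_a` is made up for illustration. The check runs at compile time, so the build fails if someone later adds a member before `a`.

```c
#include <assert.h>
#include <stddef.h>

struct A { int x; int y; };        /* int stands in for some_type */
struct B { struct A a; int z; };

/* Compile-time guard: the build breaks if a is no longer the first member. */
static_assert(offsetof(struct B, a) == 0,
              "struct A must be the first member of struct B");

/* Hypothetical helper: reaches the embedded A without any cast. */
struct A *as_a(struct B *b)
{
    return &b->a;
}

int some_function(struct A *a)
{
    return a->x + a->y;
}
```

With the guard in place, `some_function(as_a(b))` and `some_function((struct A *)b)` receive the same address.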

Why not just call the method on the member?
some_function( &b->a );
Your code works now, but what if somebody decides to change the members of B? Or add a new member before a?

Yes, this would work, but only by accident.
If I recall the C99 standard correctly, this particular case is specifically required to work as you expect. But it's clear that this is only because sufficiently many people relied on it working before that, and it did work by accident in sufficiently many implementations, that the C99 standards committee felt obliged to legislate for it working de jure as well as de facto.
Don't let that tempt you into thinking that this is a good idea.
Anything which relies on standards edge-cases is permanently teetering on the edge of brokenness, because it looks hacky (and so makes your code's future readers uncomfortable) and looks clever (which makes them nervous of changing/fixing anything). Also it leads folk into making assumptions which, because you're already on the edge of what's legitimate, can tempt folk across the border into broken code. For example, the fact that the first element within the first sub-struct within a struct is aligned as you expect, does not imply that any other sub-elements are lined up. The fact that it works for your compiler does not imply that it'll work for anyone else's, leading to mind-bendingly confusing bugs.
Write:
A *a = &(b->a);
(as the comment above suggests) and your meaning is clear.
If for some obscure reason you have to cast B* to A*, then write a very clear comment explaining why you have no option but to do what you have to do, assuring the reader that it is legitimate, and pointing to the subsubsection of the C99 standard which licenses it.
If you really cannot find that subsection (and finding it is your homework/penance), then comment thus and I'll dig it up.

No it won't work. It would work if you change it a bit:
struct A
{
some_type x;
some_type y;
}; /* <- note semicolon here */
struct B
{
struct A a;
some_type z;
}; /* ... and here */
int some_function(struct A *a ); /* ... and here ... */
struct B *b;
......
struct A *a = (struct A*)b;
some_function( a );

Related

C function that returns a pointer to an array correct syntax?

In C you can declare a variable that points to an array like this:
int int_arr[4] = {1,2,3,4};
int (*ptr_to_arr)[4] = &int_arr;
Although practically it is the same as just declaring a pointer to int:
int *ptr_to_arr2 = int_arr;
But syntactically it is something different.
Now, what would a function look like that returns such a pointer to an array (of int, e.g.)?
A declaration of int is int foo;.
A declaration of an array of 4 int is int foo[4];.
A declaration of a pointer to an array of 4 int is int (*foo)[4];.
A declaration of a function returning a pointer to an array of 4 int is int (*foo())[4];. The () may be filled in with parameter declarations.
As already mentioned, the correct syntax is int (*foo(void))[4]; And as you can tell, it is very hard to read.
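To see the declarator in use, here is a small sketch. The names `get_arr` and `sum_arr` are invented for this example, and the array is a file-scope object so the returned pointer stays valid after the call.

```c
#include <stddef.h>

static int int_arr[4] = {1, 2, 3, 4};

/* Returns a pointer to the whole array, not to its first element. */
int (*get_arr(void))[4]
{
    return &int_arr;
}

/* Sums the elements through the pointer-to-array. */
int sum_arr(void)
{
    int (*p)[4] = get_arr();
    int total = 0;
    /* sizeof *p is the size of the full array, so the length is recoverable. */
    for (size_t i = 0; i < sizeof *p / sizeof (*p)[0]; i++)
        total += (*p)[i];
    return total;
}
```

Unlike a plain `int *`, the pointer-to-array keeps the element count in its type, which is why `sizeof *p` works here.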
Questionable solutions:
Use the syntax as C would have you write it. This is in my opinion something you should avoid, since it's incredibly hard to read, to the point where it is completely useless. This should simply be outlawed in your coding standard, just like any sensible coding standard enforces function pointers to be used with a typedef.
Oh so we just typedef this just like when using function pointers? One might get tempted to hide all this goo behind a typedef indeed, but that's problematic as well. This is because both arrays and pointers are fundamental "building blocks" in C, with a specific syntax that the programmer expects to see whenever dealing with them. The absence of that syntax suggests an object that can be addressed, "lvalue accessed" and copied like any other variable. Hiding them behind a typedef might in the end create even more confusion than the original syntax.
Take this example:
typedef int(*arr)[4];
...
arr a = create(); // calls malloc etc
...
// somewhere later, lets make a hard copy! (or so we thought)
arr b = a;
...
cleanup(a);
...
print(b); // mysterious crash here
So this "hide behind typedef" system heavily relies on us naming types somethingptr to indicate that it is a pointer. Or let's say... LPWORD... and there it is, "Hungarian notation", the heavily criticized type system of the Windows API.
A slightly more sensible work-around is to return the array through one of the parameters. This isn't exactly pretty either, but at least somewhat easier to read since the strange syntax is centralized to one parameter:
void foo (int(**result)[4])
{
...
*result = &arr;
}
That is: a pointer to a pointer-to-array of int[4].
If one is prepared to throw type safety out the window, then of course void* foo (void) solves all of these problems... but creates new ones. Very easy to read, but now the problem is type safety and uncertainty regarding what the function actually returns. Not good either.
So what to do then, if these versions are all problematic? There are a few perfectly sensible approaches.
Good solutions:
Leave allocation to the caller. This is by far the best method, if you have the option. Your function would become void foo (int arr[4]); which is both readable and type safe.
Old school C. Just return a pointer to the first item in the array and pass the size along separately. This may or may not be acceptable from case to case.
Wrap it in a struct. For example this could be a sensible implementation of some generic array type:
typedef struct
{
    size_t size;
    int arr[];
} array_t;
array_t* alloc (size_t items)
{
    array_t* result = malloc(sizeof *result + sizeof(int[items]));
    if (result != NULL)
        result->size = items; /* remember the element count */
    return result;
}
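The first option above, leaving allocation to the caller, might look like the following sketch. The function name `fill_squares` and the values it writes are placeholders, not part of the original answer.

```c
#include <stddef.h>

/* The caller owns the storage; the function only fills it in.
   The parameter is really an int *; the [4] documents the expected length. */
void fill_squares(int arr[4])
{
    for (size_t i = 0; i < 4; i++)
        arr[i] = (int)(i * i);
}
```

A caller would declare `int values[4];` and pass `values` directly, so no unusual return type is needed and the lifetime of the array is obvious at the call site.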
The typedef keyword can make things a lot clearer/simpler in this case:
int int_arr[4] = { 1,2,3,4 };
typedef int(*arrptr)[4]; // Define a pointer to an array of 4 ints ...
arrptr func(void) // ... and use that for the function return type
{
return &int_arr;
}
Note: As pointed out in the comments and in Lundin's excellent answer, using a typedef to hide/bury a pointer is a practice that is frowned-upon by (most of) the professional C programming community – and for very good reasons. There is a good discussion about it here.
However, although, in your case, you aren't defining an actual function pointer (which is an exception to the 'rule' that most programmers will accept), you are defining a complicated (i.e. difficult to read) function return type. The discussion at the end of the linked post delves into the "too complicated" issue, which is what I would use to justify use of a typedef in a case like yours. But, if you should choose this road, then do so with caution.

Does access through pointer change strict aliasing semantics?

With these definitions:
struct My_Header { uintptr_t bits; };
struct Foo_Type { struct My_Header header; int x; };
struct Foo_Type *foo = ...;
struct Bar_Type { struct My_Header header; float x; };
struct Bar_Type *bar = ...;
Is it correct to say that this C code ("case one"):
foo->header.bits = 1020;
...is actually different semantically from this code ("case two"):
struct My_Header *alias = &foo->header;
alias->bits = 1020;
My understanding is that they should be different:
Case One considers the assignment unable to affect the header in a Bar_Type. It only is seen as being able to influence the header in other Foo_Type instances.
Case Two, by forcing access through a generic aliasing pointer, will cause the optimizer to realize all bets are off for any type which might contain a struct My_Header. It would be synchronized with access through any pointer type. (e.g. if you had a Foo_Type which was pointing to what was actually a Bar_Type, it could access through the header and reliably find out which it had--assuming that's something the header bits could tell you.)
This relies on the optimizer not getting "smart" and making case two back into case one.
The code bar->header.bits = 1020; is exactly identical to struct My_Header *alias = &bar->header; alias->bits = 1020;.
The strict aliasing rule is defined in terms of access to objects through lvalues:
6.5p7 An object shall have its stored value accessed
only by an lvalue expression that has one of the following
types:
The only things that matter are the type of the lvalue, and the effective type of the object designated by the lvalue. Not whether you stored some intermediate stages of the lvalue's derivation in a pointer variable.
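As a sketch of that point, the two functions below spell the same store in the two ways the question describes. The function names are made up for illustration. In both, the lvalue doing the write has type `uintptr_t` and designates the same member, so under the rule quoted above they are the same access.

```c
#include <stdint.h>

struct My_Header { uintptr_t bits; };
struct Foo_Type { struct My_Header header; int x; };

/* "Case one": write through the member access expression. */
void set_direct(struct Foo_Type *foo)
{
    foo->header.bits = 1020;
}

/* "Case two": same lvalue type, same object; only the spelling differs. */
void set_via_alias(struct Foo_Type *foo)
{
    struct My_Header *alias = &foo->header;
    alias->bits = 1020;
}
```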
NOTE: The question was edited since the following text was posted. The following text applies to the original question where the space was allocated by malloc, not the current question as of August 23.
Regarding whether the code is correct or not. Your code is equivalent to Q80 effective_type_9.c in N2013 rev 1571, which is a survey of existing C implementations with an eye to drafting improved strict aliasing rules.
Q80. After writing a structure to a malloc’d region, can its members be accessed via a pointer to a different structure type that has the same leaf member type at the same offset?
The stumbling block is whether the code (*bar).header.bits = 1020; sets the effective type of only the int bits; or of the entire *bar. And accordingly, whether reading (*foo).header.bits reads an int, or does it read the entire *foo?
Reading only an int would not be a strict aliasing violation (it's OK to read int as int); but reading a Bar_Type as Foo_Type would be a violation.
The authors of this paper consider the write to set the effective type for the entire *bar, although they don't give their justification for that, and I do not see any text in the C Standard to support that position.
It seems to me there's no definitive answer currently for whether or not your code is correct.
The fact that you have two structures which contain My_Header is a red herring and complicates your thinking without bringing anything new to the table. Your problem can be stated and clarified without any struct (other than My_Header of course).
foo->header.bits = 1020;
The compiler clearly knows which object to modify.
struct My_Header *alias = &foo->header;
alias->bits = 1020;
Again the same is true here: with a very rudimentary analysis the compiler knows exactly which object the alias->bits = 1020; modifies.
The interesting part comes here:
void foo(struct My_Header* p)
{
p->bits = 1020;
}
In this function the pointer p can alias any object (or sub-object) of type My_Header. It really doesn't matter if you have N structures that contain My_Header members or if you have none. Any object of type My_Header could be potentially modified in this function.
E.g.
// global:
struct My_Header* global_p;
uintptr_t foo(struct My_Header* p)
{
    p->bits = 1020;
    global_p->bits = 15;
    return p->bits;
    // the compiler can't just return 1020 here because it doesn't know
    // if `p` and `global_p` both alias the same object or not.
}
To convince you that Foo_Type and Bar_Type are red herrings and don't matter, look at this example, for which the analysis is identical to the previous case, which involves neither Foo_Type nor Bar_Type:
// global:
struct My_Header* global_p;
uintptr_t foo(struct Foo_Type* foo)
{
    foo->header.bits = 1020;
    global_p->bits = 15;
    return foo->header.bits;
    // the compiler can't just return 1020 here because it doesn't know
    // if `foo->header` and `global_p` both alias the same object or not.
}
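The reload can be observed directly. This sketch repeats the first function under the invented name `store_then_reload` and is meant to be called once with the two pointers aliased and once with them pointing at distinct objects.

```c
#include <stdint.h>

struct My_Header { uintptr_t bits; };

struct My_Header *global_p;

uintptr_t store_then_reload(struct My_Header *p)
{
    p->bits = 1020;
    global_p->bits = 15;   /* may or may not be the same object as *p */
    return p->bits;        /* 15 when the pointers alias, 1020 otherwise */
}
```

Both pointers have the same type, so the compiler has to allow for the aliased case and cannot fold the return value to a constant.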
The way N1570 6.5p7 is written, the behavior of code that accesses individual members of structures or unions will only be defined if the accesses are performed using lvalues of character types, or by calling library functions like memcpy. Even if a struct or union has a member of type T, the Standard (deliberately IMHO) refrains from giving blanket permission to access that part of the aggregate's storage using seemingly-unrelated lvalues of type T. Presently, gcc and clang seem to grant blanket permission for accessing structs, but not unions, using lvalues of member type, but N1570 6.5p7 doesn't require that. It applies the same rules to both kinds of aggregates and their members. Because the Standard doesn't grant blanket permission to access structures using unrelated lvalues of member type, and granting such permission impairs useful optimizations, there's no guarantee gcc and clang will continue this behavior with unrelated lvalues of member types.
Unfortunately, as can be demonstrated using unions, gcc and clang are very poor at recognizing relationships among lvalues of different types, even when one lvalue is quite visibly derived from the other. Given something like:
struct s1 {short x; short y[3]; long z; };
struct s2 {short x; char y[6]; };
union U { struct s1 v1; struct s2 v2; } unionArr[100];
int i;
Nothing in the Standard would distinguish between the "aliasing" behaviors of the following pairs of functions:
int test1a(int i)
{
    return unionArr[i].v1.x;
}
void test1b(int j)
{
    unionArr[j].v2.x = 1;
}
int test2a(int i)
{
    struct s1 *p = &unionArr[i].v1;
    return p->x;
}
void test2b(int j)
{
    struct s2 *p = &unionArr[j].v2;
    p->x = 1;
}
Both of them use an lvalue of type short to access the storage associated with objects of type struct s1, struct s2, union U, and union U[100], even though short is not listed as an allowable type for accessing any of those.
While it may seem absurd that even the first form would invoke UB, that shouldn't be a problem if one recognizes support for access patterns beyond those explicitly listed in the Standard as a Quality of Implementation issue. According to the published rationale, the authors of the Standard thought compiler writers would try to produce high-quality implementations, and it was thus not necessary to forbid "conforming" implementations from being of such low quality as to be useless. An implementation could be "conforming" without being able to handle test1a() or test2b() in cases where they would access member v2.x of a union U, but only in the sense that an implementation could be "conforming" while being incapable of correctly processing anything other than some particular contrived and useless program.
Unfortunately, although I think the authors of the Standard would likely have expected that quality implementations would be able to handle code like test2a()/test2b() as well as test1a()/test1b(), neither gcc nor clang supports that pattern reliably(*). The stated purpose of the aliasing rules is to avoid forcing compilers to allow for aliasing in cases where there's no evidence of it, and where the possibility of aliasing would be "dubious" [doubtful]. I've seen no evidence that they intended that quality compilers wouldn't recognize that code which takes the address of unionArr[i].v1 and uses it is likely to access the same storage as other code that uses unionArr[i] (which is, of course, visibly associated with unionArr[i].v2). The authors of gcc and clang, however, seem to think it's possible for something to be a quality implementation without having to consider such things.
(*) Given e.g.
int test(int i, int j)
{
if (test2a(i))
test2b(j);
return test2a(i);
}
neither gcc nor clang will recognize that if i==j, test2b(j) would access the same storage as test2a(i), even though both would access the same element of the same array.

Are struct names pointers to first element?

I found a few similar questions but none of them helped much. Are struct names pointers to the first element of the struct, similar to an array?
struct example {
int foo;
int bar;
};
struct example e;
e.foo = 5;
e.bar = 10;
printf("%d\n%d\n%d\n%d\n%d\n%d\n", e, e.foo, e.bar, &e, &e.foo, &e.bar);
Output:
5
5
10
2033501712
2033501712
2033501716
All of the answers to the other questions said "no", but this output confuses me. All help would be greatly appreciated.
The address of a struct is indeed the address of the first element, though you'll need to know the type of the element in order to safely cast it.
(C17 §6.7.2.1.15: "A pointer to a structure object, suitably
converted, points to its initial member ... and vice versa. There may
be unnamed padding within a structure object, but not at its
beginning.")
While it's kind of ugly, numerous pieces of production software rely on this. QNX, for example, uses this kind of behavior in open control block (OCB) logic when writing resource managers. Gtk also does something similar.
Your current implementation is dangerous though. If you must rely on this behavior, do it like so, and don't attempt to pass a pointer-to-struct as an argument to printf(), as you're intentionally breaking a feature of a language with minimal type-safety.
struct example {
int foo;
int bar;
};
struct example myStruct = { 1, 2 };
int* pFoo = (int*)&myStruct;
printf("%d", *pFoo);
Finally, this only holds for the first element. Subsequent elements may not be situated where you expect them to be, namely due to struct packing and padding.
struct names aren't pointers to anything. You are invoking undefined behaviour by passing a struct to printf with an incompatible format specifier %d. It may seem to "work" because the first member of the struct has the same address as the struct itself.
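For comparison, here is a sketch of how the values and addresses from the question could be printed without the mismatched specifiers. The function names are invented for this example; the printed addresses will differ from run to run and from machine to machine.

```c
#include <stdio.h>

struct example { int foo; int bar; };

/* Returns 1 when the struct and its first member share an address. */
int first_member_shares_address(struct example *e)
{
    return (void *)e == (void *)&e->foo;
}

void print_example(struct example *e)
{
    /* Values use %d; addresses use %p and are passed as void *. */
    printf("%d\n%d\n", e->foo, e->bar);
    printf("%p\n%p\n%p\n", (void *)e, (void *)&e->foo, (void *)&e->bar);
}
```

The struct itself is never passed to `printf`; only its members and explicitly converted addresses are.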

C, Struct pointer polymorphism

NOTE: this is NOT a C++ question, i can't use a C++ compiler, only a C99.
Is this valid(and acceptable, beautiful) code?
typedef struct sA{
int a;
} A;
typedef struct aB{
struct sA a;
int b;
} B;
A aaa;
B bbb;
void init(){
bbb.b=10;
bbb.a.a=20;
set((A*)&bbb);
}
void set(A* a){
aaa=*a;
}
void useLikeB(){
printf("B.b = %d", ((B*)&aaa)->b);
}
In short, is it valid to cast a "sub class" to a "super class" and later cast the "super class" back to the "sub class" when I need its specific behavior?
Thanks
First of all, the C99 standard permits you to cast any struct pointer to a pointer to its first member, and the other way (6.7.2.1 Structure and union specifiers):
13 Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.
In other words, in your code you are free to:
1. Convert B* to A*, and it will always work correctly,
2. Convert A* to B*, but if it doesn't actually point to a B, you're going to get random failures accessing further members,
3. Assign the structure pointed to through A* to an A, but if the pointer was converted from B*, only the common members will be assigned and the remaining members of B will be ignored,
4. Assign the structure pointed to through B* to an A, but you have to convert the pointer first, and note (3).
So, your example is almost correct. But useLikeB() won't work correctly since aaa is a struct of type A which you assigned like stated in point (4). This has two results:
The non-common B members won't be actually copied to aaa (as stated in (3)),
Your program will fail randomly trying to access A like B which it isn't (you're accessing a member which is not there, as stated in (2)).
To explain that in a more practical way: when you declare a variable of type A, the compiler reserves the amount of memory necessary to hold all members of A. B has more members, and thus requires more memory. As aaa is a regular variable, it can't change its size at run time and thus can't hold the remaining members of B.
And as a note, by (1) you can practically take a pointer to the member instead of converting the pointer which is nicer, and it will allow you to access any member, not only the first one. But note that in this case, the opposite won't work anymore!
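The difference between copying the base part and keeping a pointer to the full object can be sketched as follows. The types mirror the question, with the second struct tag written as `sB`, and the function names `copy_base` and `read_b` are made up for illustration.

```c
typedef struct sA { int a; } A;
typedef struct sB { A a; int b; } B;

/* Copies only the A part; the b member of the source object is not carried over. */
A copy_base(const A *src)
{
    return *src;
}

/* Valid only when base really is the first member of a live B object. */
int read_b(A *base)
{
    B *derived = (B *)base;
    return derived->b;
}
```

Calling `read_b(&bbb.a)` is fine because the pointer still refers to storage inside a `B`. Calling it on the address of the copy returned by `copy_base` would read past the end of an `A`, which is the failure described for `useLikeB()`.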
I think this is quite dirty and relatively hazardous. What are you trying to achieve with this? Also, there is no guarantee that aaa is a B; it might also be an A, so when someone calls useLikeB it might fail. Also, depending on the architecture, "int a" and "pointer to struct a" might or might not overlap correctly, which can lead to interesting behavior when you assign to "int a" and then access "struct a".
Why would you do this? Having
set((A*)&bbb);
is not easier to write than the correct
set(&bbb.a);
Other things that you should please avoid when you post here:
you use set before it is declared
aaa=a should be aaa = *a
First of all, I agree with most concerns from previous posters about the safety of these assignments.
With that said, if you need to go that route, I'd add one level of indirection and some type-safety checkers.
static const int struct_a_id = 1;
static const int struct_b_id = 2;
struct MyStructPtr {
    int type;
    union {
        A* ptra;
        B* ptrb;
        //continue if you have more types.
    };
};
The idea is that you manage your pointers by passing them through a struct that contains some "type" information. You can build a tree of classes on the side that describe your class tree (note that given the restrictions for safely casting, this CAN be represented using a tree) and be able to answer questions to ensure you are correctly casting structures up and down. So your "useLikeB" function could be written like this.
struct MyStructPtr the_ptr;
void init_ptr(A* pa)
{
    the_ptr.type = struct_a_id;
    the_ptr.ptra = pa;
}
void useLikeB(){
    //This function should FAIL IF aaa CAN'T BE SAFELY CAST TO B,
    //by checking in your type tree that the stored type is below the
    //b type (not necessarily a direct child).
    assert( is_castable_to(the_ptr.type,struct_b_id ) );
    printf("B.b = %d", the_ptr.ptrb->b);
}
My 2 cents.
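A simpler variant of the same idea, offered only as a sketch, stores the type tag inside the base struct instead of in a separate wrapper, and checks it before converting. This is a different layout from the one above; the tag values and the name `as_b` are hypothetical.

```c
#include <stddef.h>

enum type_id { TYPE_A = 1, TYPE_B = 2 };   /* hypothetical tags */

typedef struct sA { enum type_id type; int a; } A;
typedef struct sB { A base; int b; } B;

/* Returns the object as a B, or NULL when it was not created as one. */
B *as_b(A *obj)
{
    return (obj != NULL && obj->type == TYPE_B) ? (B *)obj : NULL;
}
```

The conversion back to `B *` is only performed when the tag says the `A` is the first member of a `B`, so a plain `A` is rejected instead of being read out of bounds.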

Is it possible to cast pointers from a structure type to another structure type extending the first in C?

If I have structure definitions, for example, like these:
struct Base {
int foo;
};
struct Derived {
int foo; // int foo is common for both definitions
char *bar;
};
Can I do something like this?
void foobar(void *ptr) {
((struct Base *)ptr)->foo = 1;
}
struct Derived s;
foobar(&s);
In other words, can I cast the void pointer to Base * to access its foo member when its type is actually Derived *?
You should do
struct Base {
int foo;
};
struct Derived {
struct Base base;
char *bar;
};
to avoid breaking strict aliasing; it is a common misconception that C allows arbitrary casts of pointer types: although it will work as expected in most implementations, it's non-standard.
This also avoids any alignment incompatibilities due to usage of pragma directives.
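With that layout, the function from the question can take a `struct Base *` directly. A minimal sketch, in which `init_derived` is a made-up wrapper that only exists to show the call site:

```c
#include <stddef.h>

struct Base { int foo; };
struct Derived { struct Base base; char *bar; };

void foobar(struct Base *ptr)
{
    ptr->foo = 1;
}

/* Hypothetical wrapper showing the call: no cast and no void * needed. */
void init_derived(struct Derived *d)
{
    foobar(&d->base);
    d->bar = NULL;
}
```

Because the argument has the right type, the compiler checks the call, and the access inside `foobar` is to a genuine `struct Base` object.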
Many real-world C programs assume the construct you show is safe, and there is an interpretation of the C standard (specifically, of the "common initial sequence" rule, C99 §6.5.2.3 p5) under which it is conforming. Unfortunately, in the five years since I originally answered this question, all the compilers I can easily get at (viz. GCC and Clang) have converged on a different, narrower interpretation of the common initial sequence rule, under which the construct you show provokes undefined behavior. Concretely, experiment with this program:
#include <stdio.h>
#include <string.h>
typedef struct A { int x; int y; } A;
typedef struct B { int x; int y; float z; } B;
typedef struct C { A a; float z; } C;
int testAB(A *a, B *b)
{
b->x = 1;
a->x = 2;
return b->x;
}
int testAC(A *a, C *c)
{
c->a.x = 1;
a->x = 2;
return c->a.x;
}
int main(void)
{
B bee;
C cee;
int r;
memset(&bee, 0, sizeof bee);
memset(&cee, 0, sizeof cee);
r = testAB((A *)&bee, &bee);
printf("testAB: r=%d bee.x=%d\n", r, bee.x);
r = testAC(&cee.a, &cee);
printf("testAC: r=%d cee.x=%d\n", r, cee.a.x);
return 0;
}
When compiling with optimization enabled (and without -fno-strict-aliasing), both GCC and Clang will assume that the two pointer arguments to testAB cannot point to the same object, so I get output like
testAB: r=1 bee.x=2
testAC: r=2 cee.x=2
They do not make that assumption for testAC, but — having previously been under the impression that testAB was required to be compiled as if its two arguments could point to the same object — I am no longer confident enough in my own understanding of the standard to say whether or not that is guaranteed to keep working.
That will work in this particular case. The foo field is the first member of both structures and it has the same type. However this is not true in the general case of fields within a struct (that are not the first member). Items like alignment and packing can make this break in subtle ways.
As you seem to be aiming at object-oriented programming in C, I suggest having a look at the following link:
http://www.planetpdf.com/codecuts/pdfs/ooc.pdf
It goes into detail about ways of handling oop principles in ANSI C.
In particular cases this could work, but in general - no, because of the structure alignment.
You could use different #pragmas to make (actually, attempt to) the alignment identical - and then, yes, that would work.
If you're using Microsoft Visual Studio, you might find this article useful.
There is another little thing that might be helpful or related to what you are doing ..
#define SHARED_DATA int id
typedef struct window_t {
    SHARED_DATA;
    int something;
    void* blah;
} window_t;
typedef struct list_t {
    SHARED_DATA;
    int size;
} list_t;
typedef struct button_t {
    SHARED_DATA;
    int clicked;
} button_t;
/* the union comes last so that the member types are already complete */
typedef union base_t {
    SHARED_DATA;
    window_t win;
    list_t list;
    button_t button;
} base_t;
Now you can put the shared properties into SHARED_DATA and handle the different types via the "superclass" packed into the union.. You could use SHARED_DATA to store just a 'class identifier' or store a pointer.. Either way it turned out handy for generic handling of event types for me at some point. Hope i'm not going too much off-topic with this
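A cut-down sketch of how the shared field might be read, assuming it is used as a class identifier. The identifier values and the function `class_of` are invented here. This relies on the common initial sequence rule (C99 §6.5.2.3 p5), which applies where the union type is visible, as it is in this function.

```c
typedef struct window_t { int id; int something; } window_t;
typedef struct button_t { int id; int clicked; } button_t;

typedef union base_t {
    window_t win;
    button_t button;
} base_t;

enum { ID_WINDOW = 1, ID_BUTTON = 2 };   /* hypothetical class identifiers */

/* Reads the shared leading id through one member, whichever member was stored. */
int class_of(const base_t *obj)
{
    return obj->win.id;
}
```

Code that dispatches on `class_of` can then pick the matching union member before touching any field beyond the shared part.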
I know this is an old question, but in my view there is more that can be said and some of the other answers are incorrect.
Firstly, this cast:
(struct Base *)ptr
... is allowed, but only if the alignment requirements are met. On many compilers your two structures will have the same alignment requirements, and it's easy to verify in any case. If you get past this hurdle, the next is that the result of the cast is mostly unspecified - that is, there's no requirement in the C standard that the pointer once cast still refers to the same object (only after casting it back to the original type will it necessarily do so).
However, in practice, compilers for common systems usually make the result of a pointer cast refer to the same object.
(Pointer casts are covered in section 6.3.2.3 of both the C99 standard and the more recent C11 standard. The rules are essentially the same in both, I believe).
Finally, you've got the so called "strict aliasing" rules to contend with (C99/C11 6.5 paragraph 7); basically, you are not allowed to access an object of one type via a pointer of another type (with certain exceptions, which don't apply in your example). See "What is the strict-aliasing rule?", or for a very in-depth discussion, read my blog post on the subject.
In conclusion, what you attempt in your code is not guaranteed to work. It might be guaranteed to always work with certain compilers (and with certain compiler options), and it might work by chance with many compilers, but it certainly invokes undefined behavior according to the C language standard.
What you could do instead is this:
*((int *)ptr) = 1;
... I.e. since you know that the first member of the structure is an int, you just cast directly to int, which bypasses the aliasing problem since both types of struct do in fact contain an int at this address. You are relying on knowing the struct layout that the compiler will use and you are still relying on the non-standard semantics of pointer casting, but in practice this is significantly less likely to give you problems.
The great/bad thing about C is that you can cast just about anything -- the problem is, it might not work. :) However, in your case, it will*, since you have two structs whose first members are both of the same type; see this program for an example. Now, if struct derived had a different type as its first element -- for example, char *bar -- then no, you'd get weird behavior.
* I should qualify that with "almost always", I suppose; there're a lot of different C compilers out there, so some may have different behavior. However, I know it'll work in GCC.

Resources