Extending a structure - c

typedef struct A
{
int x;
}A;
typedef struct B
{
A a;
int d;
}B;
void fn(void)
{
B obj;
B *b = &obj; /* b must point at a real B before it is dereferenced */
((A*)b)->x = 10;
}
I read the above code snippet on SO. It said that ((A*)b)->x is not good programming style and that b->a.x is, because if anybody adds something before the statement "A a;" in struct B, the cast will no longer work. I don't understand why. I tried it too. Any suggestions, please?

That trick is used to emulate inheritance in C. It makes it possible to pass the address of either an A or a B to a function that expects a pointer to A.
This works because C guarantees that there is no padding before the first member of a struct. So if A is the first member of B, the memory layout at the beginning of B is always the same as that of A.
int doStuff(A * a) {
return a->x + 1;
}
...
B b;
doStuff((A*)&b); // Will work because b and b.a have the same start address
If you changed the declaration of B to:
typedef struct B
{
int d;
A a;
}B;
this would no longer work, because (A*)&b would yield the address of b.d, not b.a.

What you have here is a "poor man's inheritance". Like true inheritance in C++ it's used to define a type which comprises the common features (data, function pointers) of objects which may in fact carry more information than just that common subset.
That technique is widely used in e.g. GhostScript where printer drivers carry some common information and on top special information to control that particular printer model.
The C language mechanism employed here is that a struct is essentially the concatenation of its data in memory, in the order of member declaration. That order is important for getting access right after casting.
The memory layout of your B is |---int x---|---int d ---|. There is no additional information stored. An A * points to the first element, x; so does a B *. You could have a struct C
struct C
{
B b;
float f;
};
whose layout would be |---int x---|---int d ---|-----float f---|. The interesting thing is that you can pass an A *pa to a function which somehow knows that pa actually points to a C, and cast it "down": ((C *)pa)->f. The cast (C *)pa does not change the value of pa; it just tells the compiler what pa points to (on the programmer's responsibility). The knowledge of which type is actually hidden in the object is often encoded in an enum/int data member that is manually set to a magic, type-indicating value when the object is created.

Related

Is multiple-level "struct inheritance" guaranteed to work everywhere?

I know that in C, the first member of a struct is guaranteed to have no padding before it. Thus &mystruct == &mystruct.firstmember is always true.
This allows the "struct inheritance" technique, as described in this question:
typedef struct
{
// base members
} Base;
typedef struct
{
Base base;
// derived members
} Derived;
// ... later
Base* object = (Base*) malloc(sizeof(Derived)); // This is legal
However, I'd like to make sure that this actually works safely with unlimited layers of "inheritance". E.g.:
typedef struct
{
// members
} A;
typedef struct
{
A base;
// members
} B;
typedef struct
{
B base;
// members
} C;
Are all of the following uses guaranteed to work?
A* a = (A*) malloc(sizeof(B));
A* a = (A*) malloc(sizeof(C));
B* b = (B*) malloc(sizeof(C));
C* c = malloc(sizeof(C));
// ... use and access members through the pointers
EDIT:
Let me clarify what I'm asking. Is the following use of "multi-level inheritance" guaranteed to work by the C standard?
C* c = malloc(sizeof(C));
// ... initialize fields in c
A* a = (A*) c;
// ... use A fields in a
B* b = (B*) a;
// ... use B fields in b
B* b = (B*) c;
// ... use B fields in b
c = (C*) a;
// ... go back to using C fields in c
That the kind of "multi-level inheritance" you describe must work follows from the same principles -- explained in the other Q&A you referenced -- that makes this kind of inheritance work at all. Specifically, the standard explicitly provides that casting the addresses of structures and of their initial members between the applicable types has the desired effect:
A pointer to a structure object, suitably
converted, points to its initial member [...] and vice versa.
(paragraph 6.7.2.1/15)
So consider this declaration, relative to the structure definitions provided:
C c;
The quoted provision specifies that &c == (C *) &c.base and (B *) &c == &c.base are both true.
But c.base is a B, so the provision also specifies that (A *) &c.base == &c.base.base and &c.base == (B *) &c.base.base are both true.
Since (B *) &c == &c.base and &c.base == (B *) &c.base.base are both true, it follows that (B *) &c == (B *) &c.base.base is also true.
Casting both sides to either A * or C * then produces also the equalities (A *) &c == &c.base.base and &c == (C *) &c.base.base.
This reasoning can be extended to an arbitrary nesting depth.
One can quibble a bit about dynamically allocated structures vis a vis the strict aliasing rule, but there's no reason to think that it is supposed to work any differently in that case, and as long as one first accesses the dynamically-allocated space via an lvalue of the most specific type (C in this example), I see no scenario that supports a different interpretation of the standard for the dynamic-allocation case than applies to other cases. In practice, I do not expect initial access via the most specific type actually to be required by any implementation.
What the ISO C standard requires to work is the following situation:
union U {
struct X x;
struct Y y;
struct Z z;
/* ... */
};
If the structures share some common initial sequence of members, then that initial sequence can be accessed through any of the members. For instance:
struct X {
/* common members, same as in Y and Z: */
int type;
unsigned flags;
/* different members */
};
If all the structures have type and flags in the same order and of the same types, then this is required to work:
union U u;
u.x.type = 42; /* store through x.type */
foo(u.y.type); /* access through y.type */
Other hacks of this type are not "blessed" by ISO C.
The situation you have there is a little different. It's a question of whether, given a leading member of a structure, we can convert a pointer to the structure to a pointer to that member's type and then use it. The simplest case is something like this:
struct S {
int m;
};
Given an object struct S s, we can take the address of m using &s.m, obtaining an int * pointer. Equivalently, we can obtain the same pointer using (int *) &s.
ISO C does require that a structure has the same address as its first member; a pointer to the structure and a pointer to the first member have a different type, but point to the same address, and we can convert between them.
This isn't restricted by nesting levels. Given an a of this type:
struct A {
struct B {
struct C {
int m;
} c;
} b;
};
the address &a.b.c.m is still the same as the address &a. The pointer &a.b.c.m is the same as (int *) &a.

Correct value type for function pointers in C

I was trying to understand function pointers in C. While reading about them on the internet (mostly Stack Overflow Q&As), I came across two ways to assign a value to a function pointer:
#include <stdio.h>
double add(int a, double b) {
return a + b;
}
double subtract(int a, double b) {
return a - b;
}
int main() {
int op_a = 23;
double op_b = 2.9;
// Case 1. Just pass the function (not its address)
double (*math_op)(int a, double b) = add;
printf("The output of Math Operation %f\n", math_op(op_a, op_b));
// Case 2. Pass the function's address
math_op = &subtract;
printf("The output of Math Operation %f\n", math_op(op_a, op_b));
printf("End of program\n");
return 0;
}
I have a couple of questions.
In the code above, what is the right way to assign a value to a function pointer? I see several answers on Stack Overflow that follow the convention in case 1, and I also see some answers following the convention in case 2. Both seem to work correctly for me. Which one is correct (or preferable)?
Also, I see two ways to call through a function pointer: math_op(op_a, op_b) or (*math_op)(op_a, op_b). Again, is one of them preferable? Both seem to work correctly for me.
I think
Function calls (§6.5.2.2/1):
The expression that denotes the called function shall have type pointer to function returning void or returning an object type other than an array type.
is enough to answer both of your questions. The name of a function is implicitly converted to a pointer-to-function type, so using the address-of operator when assigning the address of a function to a function pointer is superfluous. As for calling functions through function pointers, the syntax is no different from "ordinary" function calls.
This is really a quirk of C. The * and (a single) & are kind of ignored when taking an address of a function.
int main() {
printf("%p %p %p", (void*)add, (void*)************add, (void*)&add);
}
The * are ignored when "dereferencing" a function pointer
int main() {
double (*math_op)(int a, double b) = add;
printf("%p %p %f", (void*)math_op, (void*)*************math_op, (***************math_op)(1, 1.0));
}
It does not matter. However be consistent. Mostly I saw no * nor & used in such contexts.
Case 1 is preferred.
The & is optional to take the address of a function (just like: char buf[100]; char *bp = buf;)
The reason case 1 is preferred is that the syntax is the same for assignment to a function pointer from a function address or another function pointer:
typedef double (*math_func)(int a, double b);
math_func math_op = add;
math_func math_op2 = subtract;
math_func math_op3 = math_op2;
Analogously, that's why you can say: math_op(2,23) as a shorthand for: (*math_op)(2,23)
UPDATE:
Pointer typedefs are not preferred (by me anyway)
They certainly help for function pointers [which Jonathan Leffler makes an exception for].
If you had (e.g.) 10 functions that needed to take an argument that was a function pointer, would you want to spell it out in all 10 function prototypes, particularly if the function pointer took 8 arguments itself?
Suppose [in some API] you had a function pointer:
typedef void (*funcptr)(int x,int y,int z);
Consider that you had functions of the form:
void func(funcptr f,...);
And, you had a call chain of these functions:
a -> b -> c -> d -> apiabc
Functions a-d do not call f--only apiabc does. Only apiabc has knowledge of how to call f properly. The others are merely passing along the opaque [to them] pointer so that apiabc can act on it.
If funcptr ever had to change, changing a-d would be problematic (i.e. they are not part of the API--merely users of it).
On pointer typedefs in general, IMO, they are fine if they are reasonably obvious and consistent.
This would be an example of usages:
typedef struct node *Node; // poor
typedef struct node *pNode; // MS (common but, IMO, poor)
typedef struct node *nodeP; // okay
typedef struct node *nodeptr; // better?
typedef const struct node *nodecptr; // better?
IMO, the MS version is poor because if we sorted all names pNode would get intermixed with all [unrelated] pWhatever and not gravitate to node whereas looking left to right nodeP has the dominant part first (i.e. it's related to a node).
IMO, it is hubris for POSIX to lay claim to whatever_t as its province for type names. But, it's low probability that there will be a collision [for me, anyway, because I always use names that are likely to not conflict]. If there is a collision based on a future POSIX type, I lose, and have to change my definitions.
Thus, my personal conventions are:
typedef struct node node_t;
typedef node_t *node_p;
typedef const node_t *node_pc;
Any such convention can be used so long as it's applied consistently. In a given set of code, if one encounters foo_p for the first time, yes, one has to look in the .h file for the definition. After that, when one sees bar_p, the convention is already known.
For me, this usually isn't a problem, because when I encounter an unfamiliar code base, I always look at the definitions in the .h file before looking at code in a .c

A Simple Object System

I'm working my way through the book Learn C the Hard Way and have run into a few issues on Exercise 19. The author says that ex19 is intended to get learners familiar with macros in C. I have no problem understanding that concept, but I don't understand everything else. I can't understand how the object prototype is created.
Especially, what does the following sentence mean?
Since C puts the Room.proto field first, that means the el pointer is
really only pointing at enough of the block of memory to see a full
Object struct. It has no idea that it's even called proto.
the relevant code is this:
// this seems weird, but we can make a struct of one size,
// then point a different pointer at it to "cast" it
Object *el = calloc(1, size);
*el = proto;
Can anyone tell me how exactly malloc/calloc works? As far as I know, it just allocates the requested amount of memory and returns its starting address. If so, how can the computer know the data structure of the allocated memory? For example, in the code, after Room *arena = NEW(Room, "The arena, with the minotaur"); you can directly write arena->bad_guy = NEW(Monster, "The evil minotaur"); how does the computer know there is a bad_guy?
What is the content of *el after the two statements above (Object *el = calloc(1, size); and *el = proto;)?
Any help will be appreciated!!
the link to the exercise: http://c.learncodethehardway.org/book/ex19.html
calloc has the additional feature that it fills the allocated memory with zero bytes, whereas using the equivalent malloc call would require an additional step if all or some of the allocation needs to be zero initially.
In the code
arena->bad_guy = NEW(Monster, "The evil minotaur");
the compiler knows the layout of the struct because the access is through the arena variable, which is declared as a pointer to Room, which is presumably a typedef of a struct.
For the other part, the guarantee of ordering within structs allows a limited form of inheritance in composite structs, or extended structs.
struct A {
int x;
};
struct B {
int foo;
double baloney;
};
struct B (or a pointer to it) can be cast to a (pointer to a) struct A because they both begin with an int. Of course, if you cast the other way, the struct A must have been originally a struct B or access to the baloney field will be undefined. In other words, struct B essentially begins with a struct A.
This may be easier to see if I rewrite my example like this:
struct A {
int x;
};
struct B {
struct A foo;
double baloney;
};
Now you can get a struct A out of struct B in different ways.
struct A a;
struct B b;
a = b.foo; // regular member variable access
struct A *ap = &a;
struct B *bp = &b;
ap = (struct A *)bp; // cast the pointer
ap = & b.foo; // take a pointer from the member variable
ap = & bp->foo; // take a pointer from the member variable via a pointer
All it does is allocate 1*size bytes. There's nothing magic about malloc/calloc. He is passing sizeof(T) to the function through that NEW macro and putting it in Object_new's size parameter. So all the function knows is the size in bytes.

Is it possible to store different kinds of structs into a flexible-length array?

Let's say we have two struct types as follows:
struct A {
int a;
};
struct B {
int b;
int c;
};
Would it be possible to initialize a flexible-length array to contain instances of both A and B using designated initializers, e.g:
<sometype> my_array[] = {
((struct A){ .a = 10, }),
((struct B){ .b = 1, .c = 5, }),
};
And since I need to know the type of elements in the array, a way to put some char before the structs would be nice too. :)
I know this looks terribly broken, but I am trying to pack some bytecode-like data structures together and this looks like an elegant way to define them (well, with the help of some macros at least).
Edit: To clarify a few points:
Dynamic allocation is not an option
Neither are unions - I want the elements to occupy exactly the space needed by their type
"Variable length array" in the question could have been misleading - the exact denomination would be "flexible length array", according to http://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html. The example code is ideally how I'd like it to look like.
So what I'd basically like is to be able to pack some arbitrary, structured data into a memory area that is allocated in the .data segment of the binary. I do not need random access to elements, just to pack the data from structs - the use of a flexible length array in my example is because this construct seems to be the closest from what I want to achieve. But the declaration could be anything else that does the job (except assembler, I need to retain C portability).
The best way to do this would be to use unions. You could define all your types within a union, then wrap that union, together with the char you want to use to identify the actual type, in a struct.
struct TypesAB {
char type;
union {
struct {
int a;
} A;
struct {
int b;
int c;
} B;
};
};
enum {
TypeA,
TypeB
};
With this struct, you can define your array, and then set the elements.
struct TypesAB array[10];
array[0].type = TypeA;
array[0].A.a = 10;
array[1].type = TypeB;
array[1].B.b = 1;
array[1].B.c = 5;
Note that this memory layout will make you lose some space if your A and B types are not the same size. With the above definition, sizeof(struct TypesAB) is large enough to hold the larger of A and B, plus the char (and any alignment padding). If you use an element as an A, the memory that would have held the c member is wasted. The same memory is used for the a member of A and the b member of B.

Why can I not assign interchangeably with two structs that have identical contents?

I'm trying to learn C and I've come across something weird:
struct
{
int i;
double j;
} x, y;
struct
{
int i;
double j;
} z;
Here, you can see I created two structs that are identical in their elements.
Why is it that when I try to assign x = z it will generate a compile error but x = y does not? They have the same contents, so why can't I assign them back and forth with each other, regardless?
Is there any way I can make this so I can assign x = z? Or do they simply have to be the same struct.
Can any C gurus point me in the right direction?
They have the same content, but not the same type. If they are intended to be of the same type, simply typedef x z;. If they aren't the same thing, but just happen to contain the same fields, it's better to create a separate function that will assign the fields properly.
My usual style for declaring structs in C includes the typedef, so I forgot to mention it (sorry!). Here's the syntax:
typedef struct
{
int foo;
double bar;
} x;
/* Further down, if needed */
typedef x z;
Treating identically structured types as the same type is called "structural typing" (sometimes loosely called "duck typing"). Some languages do this, but C does not.
The compiler does not calculate "type equivalence" between structs that may have the same structure, so as far as it is concerned, the second struct has nothing to do with the first one.
It can do x and y because they are declared at the same time.
Why are you redeclaring the struct? You should probably typedef the struct once (e.g., in a header file) to make it an actual type, and then declare instances.
Here's a good tutorial on typedefs for structs.
struct mystruct
{
int i;
double j;
};
struct mystruct x, y;
struct mystruct z;
If you intend to copy data between them, you must declare them with the same type. You have two declarations; it doesn't matter to the compiler that they look identical. They are two different structures, and the compiler is not supposed to detect their similarity.
C differentiates structs based on name, and if they're anonymous, then different structure definitions are different.
Struct assignment such as x = y has been part of C since ANSI C (C89), but only between objects of the same type. For two distinct struct types with identical layout, you can instead use
#include <string.h>
memcpy(&x, &z, sizeof(x));
