nested struct in C - c

struct s{
int a;
struct s b;
};
The above code segment fails to compile with error: field 'b' has incomplete type, while
struct s{
int a;
struct s *b;
};
doesn't give any error. I don't understand why this is allowed for pointers but not for a non-pointer member.

Class members must have a complete type when they are declared, so that their size can be used to determine the class layout.
Within the class definition, the class itself is incomplete, so you can't declare a member of the same type. That would be impossible anyway (at least if there are any other members), since the class would have to be larger than itself.
A pointer is a complete type, even if the type it points to isn't, so you can declare a class member to be a pointer to the class type.
(Note: I use the word "class" since I'm a C++ programmer. I just noticed that the question is also tagged C, and C++ has since been removed. I believe the answer is still correct in that language, if you replace "class" with "structure", but I'm not completely sure since they are different languages. It would be better if you only asked about one language, since there are differences (sometimes major, sometimes subtle) between languages.)
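As a rough illustration of why the pointer form is useful, the sketch below builds a short chain from a self-referential struct. The names node, sum_chain and demo_sum are hypothetical and not from the question; this is a minimal sketch, assuming a C99 or later compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical self-referential type: the member is a pointer, so its size
   is known even though struct node is still incomplete at that line. */
struct node {
    int value;
    struct node *next;
};

/* Walk a NULL-terminated chain and add up the values. */
int sum_chain(const struct node *head) {
    int total = 0;
    for (const struct node *n = head; n != NULL; n = n->next) {
        total += n->value;
    }
    return total;
}

/* Build a three-element chain on the stack and sum it. */
int demo_sum(void) {
    struct node c = { 3, NULL };
    struct node b = { 2, &c };
    struct node a = { 1, &b };
    assert(a.next->next == &c);
    return sum_chain(&a);
}
```

Calling demo_sum() adds 1 + 2 + 3. Replacing the pointer member with a plain struct node member would bring back the "incomplete type" error from the question.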

Q: What are incomplete types?
A: An incomplete type is a type that has an identifier but lacks the information needed to determine its size.
The ‘void’ type is an incomplete type.
A union or structure type whose members are not yet specified is also incomplete.
The ‘void’ type cannot be completed.
To complete an incomplete type, we need to specify the missing information.
Example:
struct Employee *ptr; // Here 'Employee' is incomplete
C/C++ allows pointers to incomplete types.
To make 'Employee' complete, we need to specify missing information like shown below
typedef struct Employee
{
char name[25];
int age;
int employeeID;
char department[25];
}EMP;
In your case,
struct s
{
int a;
struct s b; // Structure is incomplete.
}; // Until this point the structure is incomplete.
At the line struct s b; the structure s is still incomplete. We can declare a pointer to an incomplete type, but not a variable of that type.

To adequately define s, the compiler needs to know the size of s. In the first example, the size of s depends on the size of s, but not in the second.
In the first example, by definition, sizeof(s) = sizeof(int) + sizeof(s) + padding. If we try to solve this equation for sizeof(s), we get 0 = sizeof(int) + padding, which clearly is impossible.
In the second, sizeof(s) = sizeof(int) + sizeof(s*) + padding. If we assume that sizeof(s*) ~= sizeof(int), then sizeof(s) = 2*sizeof(int) + padding.
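The arithmetic for the pointer version can be checked on a given implementation with a small sketch like the one below. The function names are hypothetical, and the amount of padding is implementation-defined, so the code computes it rather than assuming a value.

```c
#include <assert.h>
#include <stddef.h>

struct s {
    int a;
    struct s *b;
};

/* Bytes of padding the implementation added to struct s (may be zero). */
size_t padding_in_s(void) {
    size_t members = sizeof(int) + sizeof(struct s *);
    assert(sizeof(struct s) >= members);
    return sizeof(struct s) - members;
}

/* The pointer member has the size of a pointer, whatever struct s contains. */
int pointer_member_size_matches(void) {
    struct s x = { 0, NULL };
    return x.b == NULL && sizeof x.b == sizeof(struct s *);
}
```

On a typical 64-bit platform padding_in_s() would be 4 (a 4-byte int followed by an 8-byte pointer), but that figure is an assumption about the platform, not a guarantee.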

I'm going to assume that the extra asterisks in struct s **b** are given for emphasis and not as some kind of demented pointer declaration. (Please don't do that! It's much easier to analyze someone's code if it's presented exactly as it runs.)
When you do this without declaring b as a pointer:
struct s{
int a;
struct s b;
};
the compiler doesn't know how much space it needs to allocate for the b field, since at that point you haven't finished defining struct s. In fact, it would be impossible for the compiler to define this particular structure: no matter how many bytes it allocated for struct s, it would have to add sizeof(int) more (typically 4) to make room for the int a field.
Declaring b to be a pointer to struct s makes things easier:
struct s{
int a;
struct s *b;
};
No matter how many fields you add to struct s, the compiler knows that the b field only needs to contain the address of the struct, and that doesn't change based on how large the struct itself is.

Related

Struct type aliasing / tagged-union without union

For two (or more) structs: Base and Sub with a common first (unnamed) struct, is it safe to convert/cast from Base to Sub and vice versa?
struct Base{
struct{
int id;
// ...
};
char data[]; // necessary?
};
struct Sub{
struct{
int id;
// same '...'
};
// actual data
};
Are these functions guaranteed to be safe and technically correct? (Also: is the zero-length char data[] member necessary and useful?)
struct Base * subToBase(struct Sub * s){
return (struct Base*)s;
}
struct Sub * baseToSub(struct Base * b){
if(b->id != SUB_ID){
return NULL;
}
return (struct Sub*)b;
}
Edit
I have no plans to nest any further than Base within Sub, but rather leave the possibility to add other sub-types (directly under Base) later without needing to change Base. My main concern is whether pointers to the structs can be safely converted back and forth between Base and any sub. References to the (C11) standard would be most appreciated.
Edit v2
Changed the wording slightly to discourage OOP/inheritance discussions. What I want is a tagged-union, without the union so it can be extended later. I have no plans for doing additional nesting. Sub-types that need other sub-types' functionality can do so explicitly, without doing any further nesting.
Context
For a script interpreter I have made a pseudo object-oriented tagged-union type system, without the union. It has an (abstract) generic base type Object with several (specific) sub-types, such as String, Number, List etc. Every type-struct has the following unnamed struct as the first member:
#define OBJHEAD struct{ \
int id; \
int line; \
int column; \
}
The id identifies the type of object, line and column should (also) be self-explanatory. A simplified implementation of various objects:
typedef struct Object{
OBJHEAD;
char data[]; // necessary?
} Object;
typedef struct Number{
OBJHEAD;
int value; // only int for simplicity
} Number;
typedef struct String{
OBJHEAD;
size_t length;
char * string;
} String;
typedef struct List{
OBJHEAD;
size_t size;
Object * elements; // may be any kind and mix of objects
} List;
Object * Number_toObject(Number * num){
return (Object*)num;
}
Number * Number_fromObject(Object * obj){
if(obj->id != TYPE_NUMBER){
return NULL;
}
return (Number*)obj;
}
I know that the most elegant and technically correct way to do this would be to use an enum for the id and a union for the various sub-types. But I want the type-system to be extensible (through some form of type-registry) so that types can be added later without changing all the Object-related code.
A later/external addition could be:
typedef struct File{
OBJHEAD;
FILE * fp;
} File;
without needing to change Object.
Are these conversions guaranteed to be safe?
(As for the small macro-abuse: the OBJHEAD will of course be extensively documented so additional implementers will know what member-names not to use. The idea is not to hide the header, but to save pasting it every time.)
Converting a pointer to one object type to a pointer to a different object type (via a cast, for instance) is permitted, but if the resulting pointer is not correctly aligned then behavior is undefined (C11 6.3.2.3/7). Depending on the members of Base and Sub and on implementation-dependent behavior, it is not necessarily the case that a Base * converted to a Sub * is correctly aligned. For example, given ...
struct Base{
struct{
int id;
};
char data[]; // necessary?
};
struct Sub{
struct{
int id;
};
long long int value;
};
... it may be that the implementation permits Base objects to be aligned on 32-bit boundaries but requires Sub objects to be aligned on 64-bit boundaries, or even on stricter ones.
None of this is affected by whether Base has a flexible array member.
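The alignment concern can be inspected on a particular implementation with a sketch like the following. It assumes a C11 compiler (for _Alignof), uses named members instead of the anonymous struct for brevity, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct Base {
    int id;
    char data[];      /* flexible array member, as in the question */
};

struct Sub {
    int id;
    long long value;
};

size_t base_alignment(void) { return _Alignof(struct Base); }
size_t sub_alignment(void)  { return _Alignof(struct Sub); }

/* Nonzero when a Base * may be too weakly aligned to be converted to Sub *. */
int sub_is_stricter(void) {
    /* A struct is at least as strictly aligned as each of its members. */
    assert(sub_alignment() >= _Alignof(long long));
    return sub_alignment() > base_alignment();
}
```

Whether sub_is_stricter() returns nonzero depends on the platform. Where it does, a Base object that was not allocated as part of a Sub may sit at an address that is invalid for a Sub *.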
It is a different question whether it is safe to dereference a pointer value of one type that was obtained by casting a pointer value of a different type. For one thing, C places rather few restrictions on how implementations choose to lay out structures: members must be laid out in the order they are declared, and there must not be any padding before the first one, but otherwise, implementations have free rein. To the best of my knowledge, in your case there is no requirement that the anonymous struct members of your two structures must be laid out the same way as each other if they have more than one member. (And if they have only one member then why use an anonymous struct?) It is also not safe to assume that Base.data starts at the same offset as the first element following the anonymous struct in Sub.
In practice, dereferencing the result of your subToBase() is probably ok, and you can certainly implement tests to verify that. Also, if you have a Base * that was obtained by conversion from a Sub *, then the result of converting it back, for instance via baseToSub(), is guaranteed to be the same as the original Sub * (C11 6.3.2.3/7 again). In that case, the conversion to Base * and back has no effect on the safety of dereferencing the pointer as a Sub *.
On the other hand, though I'm having trouble finding a reference for it in the standard, I have to say that baseToSub() is very dangerous in the general context. If a Base * that does not actually point to a Sub is converted to Sub * (which in itself is permitted), then it is not safe to dereference that pointer to access members not shared with Base. In particular, given my declarations above, if the referenced object is in fact a Base, then Base.data being declared in no way prevents ((Sub *)really_a_Base_ptr)->value from producing undefined behavior.
To avoid all undefined and implementation-defined behavior, you want an approach that avoids casting and ensures consistent layout. @LoPiTaL's suggestion to embed a typed Base structure inside your Sub structures is a good approach in that regard.
No it is not safe, at least not under all circumstances. If your compiler sees two pointers p and q that have different base type, it may always assume that they don't alias, or stated in other words it may always assume that *p and *q are different objects.
Your cast punches a hole in that assumption. That is if you have a function
double foo(struct A* p, struct B* q) {
double b = q->field0;
*p = (struct A){ 0 };
return b + q->field0; // compiler may return 2*b
}
the optimizer is allowed to avoid the additional read from memory.
If you'd know that no function will ever see the same object through differently typed pointers, you would be safe. But such an assertion is not made easily, so you'd better avoid such hackery.
It is correct, since it is guaranteed to have the same alignment on the first member of the struct, so you can cast from one struct to another.
Nevertheless, the common way to implement your behaviour is to "inherit" the base class:
//Base struct definition
typedef struct Base_{
int id;
// ...
//char data[]; //This is not needed.
}Base;
//Subclass definition
typedef struct Sub_{
Base base; //Note: this is NOT a pointer
// actual data
}Sub;
So now, you can cast a Sub struct into a Base struct, or just return the first member, which already is of type Base, so there is no need of casting anymore.
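A minimal sketch of both directions with the embedded-base layout might look like this. The id constants and the helper names sub_to_base and base_to_sub are assumptions made for illustration, and the long long payload stands in for the "actual data".

```c
#include <assert.h>
#include <stddef.h>

enum { BASE_ID = 0, SUB_ID = 1 };   /* hypothetical type tags */

typedef struct Base_ {
    int id;
} Base;

typedef struct Sub_ {
    Base base;          /* first member, not a pointer */
    long long value;    /* stand-in for the actual data */
} Sub;

/* Upward direction: no cast, just the address of the first member. */
Base *sub_to_base(Sub *s) {
    return &s->base;
}

/* Downward direction: cast only after checking the tag. */
Sub *base_to_sub(Base *b) {
    if (b == NULL || b->id != SUB_ID) {
        return NULL;
    }
    return (Sub *)b;    /* b is the initial member of a Sub */
}

int demo_round_trip(void) {
    Sub s = { { SUB_ID }, 42 };
    Base plain = { BASE_ID };
    Sub *back = base_to_sub(sub_to_base(&s));
    assert(back == &s);
    return back->value == 42 && base_to_sub(&plain) == NULL;
}
```

The tag check only helps if every object's id is set correctly when it is created. A Base whose id wrongly says SUB_ID would still be cast.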
One word of caution: do not abuse MACROS. MACROS are nice and good for a lot of things, but abusing them may lead to difficult to read and maintain code.
In this case, the macro is easily replaced with the base member.
One final word: your macro is error prone, since the member names are now hidden. In the end, you may add new members with the same name and get weird errors without knowing why.
When you further expand your hierarchy into sub-subclasses, you will end up having to write ALL the base classes' MACROS, while if you use the "inherit" approach, you will have to write only the direct base.
Neither of these solutions actually solves your problem: inheritance. The only real solution (and the preferred one) would be to change to a truly OO language. Due to its similarity to C, the best match would be C++, but any other OO language could do.

Are struct names pointers to first element?

I found a few similar questions but none of them helped much. Are struct names pointers to the first element of the struct, similar to an array?
struct example {
int foo;
int bar;
};
struct example e;
e.foo = 5;
e.bar = 10;
printf("%d\n%d\n%d\n%d\n%d\n%d\n", e, e.foo, e.bar, &e, &e.foo, &e.bar);
Output:
5
5
10
2033501712
2033501712
2033501716
All of the answers to the other questions said "no", but this output confuses me. All help would be greatly appreciated.
The address of a struct is indeed the address of the first element, though you'll need to know the type of the element in order to safely cast it.
(C17 §6.7.2.1.15: "A pointer to a structure object, suitably
converted, points to its initial member ... and vice versa. There may
be unnamed padding within a structure object, but not at its
beginning.")
While it's kind of ugly, numerous pieces of production software rely on this. QNX, for example, uses this kind of behavior in open control block (OCB) logic when writing resource managers. GTK also does something similar.
Your current implementation is dangerous though. If you must rely on this behavior, do it like so, and don't attempt to pass a pointer-to-struct as an argument to printf(), as you're intentionally breaking a feature of a language with minimal type-safety.
struct example {
int foo;
int bar;
};
struct example myStruct = { 1, 2 };
int* pFoo = (int*)&myStruct;
printf("%d", *pFoo);
Finally, this only holds for the first element. Subsequent elements may not be situated where you expect them to be, namely due to struct packing and padding.
struct names aren't pointers to anything. You are invoking undefined behaviour by passing a struct to printf with an incompatible format specifier %d. It may seem to "work" because the first member of the struct has the same address as the struct itself.
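For comparison, a version of the experiment with matching format specifiers could look like the sketch below: members are printed with %d, and addresses with %p after conversion to void *. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct example {
    int foo;
    int bar;
};

/* Print members with %d and addresses with %p (which expects void *). */
void print_example(struct example *e) {
    printf("%d\n%d\n", e->foo, e->bar);
    printf("%p\n%p\n%p\n", (void *)e, (void *)&e->foo, (void *)&e->bar);
}

/* Nonzero when the struct and its first member share an address. */
int first_member_shares_address(void) {
    struct example e = { 5, 10 };
    assert(offsetof(struct example, foo) == 0);
    return (void *)&e == (void *)&e.foo;
}
```

The first two addresses printed are equal because there is no padding before the first member. The struct object e itself is never passed to printf.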

Incomplete types in C

6.2.5
At various points within a translation unit an object type may be
incomplete (lacking sufficient information to determine the size of
objects of that type).
Also
6.2.5 19) The void type comprises an empty set of values; it is an incomplete object type that cannot be completed.
And
6.5.3.4 The sizeof operator shall not be applied to an expression that has function type or an incomplete type,
But Visual Studio 2010 prints 0 for
printf("Size of void is %d\n",sizeof(void));
My question is 'What are incomplete types'?
struct temp
{
int i;
char ch;
int j;
};
Is temp incomplete here? If yes, why is it incomplete (we know the size of temp)? I don't have a clear idea of incomplete types. Any code snippet which explains this will be helpful.
Your struct temp is incomplete right up until the point where the closing brace occurs:
struct temp
{
int i;
char ch;
int j;
};// <-- here
The structure is declared (comes into existence) following the temp symbol but it's incomplete until the actual definition is finished. That's why you can have things like:
struct temp
{
int i;
char ch;
struct temp *next; // can use pointers to incomplete types.
};
without getting syntax errors.
C makes a distinction between declaration (declaring that something exists) and definition (actually defining what it is).
Another incomplete type (declared but not yet defined) is:
struct temp;
This case is often used to provide opaque types in C where the type exists (so you can declare a pointer to it) but is not defined (so you can't figure out what's in there). The definition is usually limited to the code implementing it while the header used by clients has only the incomplete declaration.
No, your struct temp example is certainly complete; Assuming int is 4 bytes, and char is 1, I can easily count 9 bytes in that struct (ignoring padding).
Another example of an incomplete type would be:
struct this_is_incomplete;
This tells the compiler, "hey, this struct exists, but you don't know what's in it yet". This is useful for information hiding, but when you need to pass a pointer to the type:
int some_function(struct this_is_incomplete* ptr);
The compiler can correctly generate calls to this function, because it knows a pointer is 4 (or 8) bytes, even though it doesn't know how big the thing is that the pointer points to.
A type can be incomplete when its name is declared but not its definition. This occurs when you forward-declare a type in a header file.
Say, record.h contains:
struct record_t;
void process_record(struct record_t *r);
And record.c contains:
struct record_t {
int data;
};
If, in another module, say "usage.c" you do this:
#include "record.h"
const int rec_size = sizeof(struct record_t); // FAIL
The type record_t is incomplete inside the "usage.c" compilation unit, because it only knows the name record_t, and not what the type is made up of.
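To show how client code typically works with such a type, here is a single-file sketch that mimics the header/implementation split. The helpers record_create, record_data and record_destroy are hypothetical additions and not part of the record.h shown above.

```c
#include <assert.h>
#include <stdlib.h>

/* --- what record.h would expose: an incomplete type plus functions --- */
struct record_t;
struct record_t *record_create(int data);    /* hypothetical helper */
int record_data(const struct record_t *r);   /* hypothetical helper */
void record_destroy(struct record_t *r);     /* hypothetical helper */

/* --- client code: can hold and pass the pointer, nothing more --- */
int client_round_trip(int value) {
    struct record_t *r = record_create(value);
    if (r == NULL) {
        return -1;
    }
    int out = record_data(r);   /* r->data would not compile here */
    record_destroy(r);
    return out;
}

/* --- what record.c would contain: the type is completed here --- */
struct record_t {
    int data;
};

struct record_t *record_create(int data) {
    struct record_t *r = malloc(sizeof *r);
    if (r != NULL) {
        r->data = data;
    }
    return r;
}

int record_data(const struct record_t *r) {
    assert(r != NULL);
    return r->data;
}

void record_destroy(struct record_t *r) {
    free(r);
}
```

Because only the implementation part sees the members, the layout of struct record_t can change without touching client code.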

C, Struct pointer polymorphism

NOTE: this is NOT a C++ question, I can't use a C++ compiler, only a C99 one.
Is this valid(and acceptable, beautiful) code?
typedef struct sA{
int a;
} A;
typedef struct aB{
struct sA a;
int b;
} B;
A aaa;
B bbb;
void init(){
bbb.b=10;
bbb.a.a=20;
set((A*)&bbb);
}
void set(A* a){
aaa=*a;
}
void useLikeB(){
printf("B.b = %d", ((B*)&aaa)->b);
}
In short, is it valid to cast a "sub class" to a "super class" and later cast the "super class" back to the "sub class" when I need its specific behavior?
Thanks
First of all, the C99 standard permits you to cast any struct pointer to a pointer to its first member, and the other way (6.7.2.1 Structure and union specifiers):
13 Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.
In other words, in your code you are free to:
(1) Convert B* to A*, and it will always work correctly,
(2) Convert A* to B*, but if it doesn't actually point to B, you're going to get random failures accessing further members,
(3) Assign the structure pointed through A* to A, but if the pointer was converted from B*, only the common members will be assigned and the remaining members of B will be ignored,
(4) Assign the structure pointed through B* to A, but you have to convert the pointer first, and note (3).
So, your example is almost correct. But useLikeB() won't work correctly since aaa is a struct of type A which you assigned like stated in point (4). This has two results:
The non-common B members won't be actually copied to aaa (as stated in (3)),
Your program will fail randomly trying to access A like B which it isn't (you're accessing a member which is not there, as stated in (2)).
To explain that in a more practical way, when you declare A compiler reserves the amount of memory necessary to hold all members of A. B has more members, and thus requires more memory. As A is a regular variable, it can't change its size during run-time and thus can't hold the remaining members of B.
And as a note, by (1) you can practically take a pointer to the member instead of converting the pointer which is nicer, and it will allow you to access any member, not only the first one. But note that in this case, the opposite won't work anymore!
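The conversions described above can be sketched as follows. This is a minimal illustration with hypothetical names (stored, demo_conversions), not the asker's exact program.

```c
#include <assert.h>

typedef struct sA { int a; } A;
typedef struct sB { A a; int b; } B;

static A stored;   /* has room for the A part only */

void set(const A *a) {
    stored = *a;   /* copies the A members; anything after them is not copied */
}

int demo_conversions(void) {
    B bbb = { { 20 }, 10 };
    set(&bbb.a);            /* pointer to the member, no cast needed */
    A *as_a = (A *)&bbb;    /* B * to A *: points at the first member */
    B *back = (B *)as_a;    /* fine only because as_a really points into a B */
    assert(as_a == &bbb.a);
    return stored.a == 20 && back->b == 10;
}
```

Casting &stored to B * and reading b through it would not be valid, since stored is only an A and has no b member behind it.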
I think this is quite dirty and relatively hazardous. What are you trying to achieve with this? Also, there is no guarantee that aaa is a B; it might also be an A, so when someone calls useLikeB() it might fail. Also, depending on the architecture, "int a" and "pointer to struct a" might or might not overlap correctly, which can lead to surprising results when you assign to "int a" and then access "struct a".
Why would you do this? Having
set((A*)&bbb);
is not easier to write than the correct
set(&bbb.a);
Other things that you should please avoid when you post here:
you use set before it is declared
aaa=a should be aaa = *a
First of all, I agree with most concerns from previous posters about the safety of these assignments.
With that said, if you need to go that route, I'd add one level of indirection and some type-safety checkers.
static const int struct_a_id = 1;
static const int struct_b_id = 2;
struct MyStructPtr {
int type;
union {
A* ptra;
B* ptrb;
//continue if you have more types.
};
};
The idea is that you manage your pointers by passing them through a struct that contains some "type" information. You can build a tree of classes on the side that describe your class tree (note that given the restrictions for safely casting, this CAN be represented using a tree) and be able to answer questions to ensure you are correctly casting structures up and down. So your "useLikeB" function could be written like this.
struct MyStructPtr the_ptr;
void init_ptr(A* pa)
{
the_ptr.type = struct_a_id;
the_ptr.ptra = pa;
}
void useLikeB(){
// This function should FAIL if the stored pointer can't be safely cast to B.
// It checks in your type tree that the stored type is B or sits below B
// (not necessarily as a direct child).
assert( is_castable_to(the_ptr.type, struct_b_id) );
printf("B.b = %d", the_ptr.ptrb->b);
}
My 2 cents.
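The answer leaves is_castable_to unspecified. One possible shape for it, offered only as a sketch, is a parent table walked from the actual type up to the root. The enum constants and the parent_of table are assumptions made for illustration; an enum is used instead of the static const int ids above so the values can index an array initializer.

```c
#include <assert.h>

enum { TYPE_NONE = 0, STRUCT_A_ID = 1, STRUCT_B_ID = 2, TYPE_COUNT };

/* parent_of[t] is the type that t embeds as its first member. */
static const int parent_of[TYPE_COUNT] = {
    [TYPE_NONE]   = TYPE_NONE,
    [STRUCT_A_ID] = TYPE_NONE,
    [STRUCT_B_ID] = STRUCT_A_ID,
};

/* Nonzero when an object of type `actual` may be viewed as `target`,
   i.e. target is actual itself or one of its ancestors. */
int is_castable_to(int actual, int target) {
    assert(actual >= 0 && actual < TYPE_COUNT);
    for (int t = actual; t != TYPE_NONE; t = parent_of[t]) {
        if (t == target) {
            return 1;
        }
    }
    return 0;
}
```

With this table a B can be viewed as an A, but an A cannot be viewed as a B, which is the check useLikeB() needs.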

How is it legal to reference an undefined type inside a structure?

As part of answering another question, I came across a piece of code like this, which gcc compiles without complaint.
typedef struct {
struct xyz *z;
} xyz;
int main (void) {
return 0;
}
This is the means I've always used to construct types that point to themselves (e.g., linked lists) but I've always thought you had to name the struct so you could use self-reference. In other words, you couldn't use xyz *z within the structure because the typedef is not yet complete at that point.
But this particular sample does not name the structure and it still compiles. I thought originally there was some black magic going on in the compiler that automatically translated the above code because the structure and typedef names were the same.
But this little beauty works as well:
typedef struct {
struct NOTHING_LIKE_xyz *z;
} xyz;
What am I missing here? This seems a clear violation since there is no struct NOTHING_LIKE_xyz type defined anywhere.
When I change it from a pointer to an actual type, I get the expected error:
typedef struct {
struct NOTHING_LIKE_xyz z;
} xyz;
qqq.c:2: error: field `z' has incomplete type
Also, when I remove the struct, I get an error (parse error before "NOTHING ...).
Is this allowed in ISO C?
Update: A struct NOSUCHTYPE *variable; also compiles so it's not just inside structures where it seems to be valid. I can't find anything in the c99 standard that allows this leniency for structure pointers.
As the error message for the third snippet says, struct NOTHING_LIKE_xyz is an incomplete type, like void or arrays of unknown size. An incomplete type can only appear in a struct as a type pointed to (C17 6.7.2.1:3), with an exception for arrays of unknown size, which are allowed as the last member of a struct (a flexible array member). The code that follows cannot dereference any pointer to an incomplete type (for good reason).
Incomplete types can offer some datatype encapsulation of sorts in C...
The corresponding paragraph in http://www.ibm.com/developerworks/library/pa-ctypes1/ seems like a good explanation.
The parts of the C99 standard you are after are 6.7.2.3, paragraph 7:
If a type specifier of the form
struct-or-union identifier occurs
other than as part of one of the above
forms, and no other declaration of the
identifier as a tag is visible, then
it declares an incomplete structure or
union type, and declares the
identifier as the tag of that type.
...and 6.2.5 paragraph 22:
A structure or union type of unknown
content (as described in 6.7.2.3) is
an incomplete type. It is completed,
for all declarations of that type, by
declaring the same structure or union
tag with its defining content later in
the same scope.
The 1st and 2nd cases are well-defined, because the size and alignment of a pointer is known. The C compiler only needs the size and alignment info to define a struct.
The 3rd case is invalid because the size of that actual struct is unknown.
But beware that for the 1st case to be logical, you need to give a name to the struct:
// vvv
typedef struct xyz {
struct xyz *z;
} xyz;
otherwise the outer struct and the *z will be considered two different structs.
The 2nd case has a popular use case known as "opaque pointer" (pimpl). For example, you could define a wrapper struct as
typedef struct {
struct X_impl* impl;
} X;
// usually just: typedef struct X_impl* X;
int baz(X x);
in the header, and then in one of the .c,
#include "header.h"
struct X_impl {
int foo;
int bar[123];
...
};
int baz(X x) {
return x.impl->foo;
}
The advantage is that outside that .c file, you cannot mess with the internals of the object. It is a kind of encapsulation.
You do have to name it. In this:
typedef struct {
struct xyz *z;
} xyz;
z will not be able to point to the enclosing struct, because it refers to a completely different type, not to the unnamed struct you just defined. Try this:
int main()
{
xyz me1;
xyz me2;
me1.z = &me2; // this will not compile
}
You'll get an error about incompatible types.
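For contrast, once the struct is given the tag xyz, the same assignment type-checks. The sketch below assumes the tagged definition and wraps the test in a hypothetical function, demo_self_reference.

```c
#include <assert.h>
#include <stddef.h>

typedef struct xyz {
    struct xyz *z;   /* same type as the enclosing struct, thanks to the tag */
} xyz;

int demo_self_reference(void) {
    xyz me1;
    xyz me2;
    me2.z = NULL;
    me1.z = &me2;    /* compatible pointer types */
    assert(me1.z->z == NULL);
    return me1.z == &me2;
}
```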
Well... All I can say is that your previous assumption was incorrect. Every time you use a struct X construct (by itself, or as a part of larger declaration), it is interpreted as a declaration of a struct type with a struct tag X. It could be a re-declaration of a previously declared struct type. Or, it can be a very first declaration of a new struct type. The new tag is declared in scope in which it appears. In your specific example it happens to be a file scope (since C language has no "class scope", as it would be in C++).
The more interesting example of this behavior is when the declaration appears in function prototype:
void foo(struct X *p); // assuming `struct X` has not been declared before
In this case the new struct X declaration has function-prototype scope, which ends at the end of the prototype. If you declare a file-scope struct X later
struct X;
and try to pass a pointer of struct X type to the above function, the compiler will give you a diagnostic about non-matching pointer types
struct X *p = 0;
foo(p); // different pointer types for argument and parameter
This also immediately means that in the following declarations
void foo(struct X *p);
void bar(struct X *p);
void baz(struct X *p);
each struct X declaration is a declaration of a different type, each local to its own function prototype scope.
But if you pre-declare struct X as in
struct X;
void foo(struct X *p);
void bar(struct X *p);
void baz(struct X *p);
all struct X references in all function prototypes will refer to the same previously declared struct X type.
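A small sketch of the pre-declared case, with hypothetical functions read_x and bump_x in place of foo, bar and baz, shows the prototypes, the later definition and a call all agreeing on one type:

```c
#include <assert.h>

struct X;                          /* file-scope declaration comes first */

int read_x(const struct X *p);     /* both prototypes refer to that struct X */
void bump_x(struct X *p);

struct X {                         /* the type is completed later */
    int v;
};

int read_x(const struct X *p) { return p->v; }
void bump_x(struct X *p)      { p->v += 1; }

int demo_prototype_scope(void) {
    struct X x = { 5 };
    bump_x(&x);
    assert(x.v == 6);
    return read_x(&x);
}
```

Removing the first struct X; line would give each prototype its own prototype-scope struct X, and compilers generally diagnose the later definitions and calls as mismatched.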
I was wondering about this too. Turns out that the struct NOTHING_LIKE_xyz * z is forward declaring struct NOTHING_LIKE_xyz. As a convoluted example,
typedef struct {
struct foo * bar;
int j;
} foo;
struct foo {
int i;
};
void foobar(foo * f)
{
f->bar->i;
f->bar->j;
}
Here f->bar refers to the type struct foo, not typedef struct { ... } foo. The first line will compile fine, but the second will give an error. Not much use for a linked list implementation then.
When a variable or field of a structure type is declared, the compiler has to allocate enough bytes to hold that structure. Since the structure may require one byte, or it may require thousands, there's no way for the compiler to know how much space it needs to allocate. Some languages use multi-pass compilers which would be able to find out the size of the structure on one pass and allocate the space for it on a later pass; since C was designed to allow for single-pass compilation, however, that isn't possible. Thus, C forbids the declaration of variables or fields of incomplete structure types.
On the other hand, when a variable or field of a pointer-to-structure type is declared, the compiler has to allocate enough bytes to hold a pointer to the structure. Regardless of whether the structure takes one byte or a million, the pointer will always require the same amount of space. Effectively, the compiler can treat the pointer to the incomplete type as a void* until it gets more information about its type, and then treat it as a pointer to the appropriate type once it finds out more about it. The incomplete-type pointer isn't quite analogous to void*, in that one can do things with void* that one can't do with incomplete types (e.g. if p1 is a pointer to struct s1, and p2 is a pointer to struct s2, one cannot assign p1 to p2) but one can't do anything with a pointer to an incomplete type that one could not do to void*. Basically, from the compiler's perspective, a pointer to an incomplete type is a pointer-sized blob of bytes. It can be copied to or from other similar pointer-sized blobs of bytes, but that's it. The compiler can generate code to do that without having to know what anything else is going to do with the pointer-sized blobs of bytes.
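The "pointer-sized blob" view can be sketched as follows: a pointer to a type that is never completed is stored, copied and compared, but never dereferenced. The names blob, holder and demo_copy_incomplete_pointer are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct blob;   /* hypothetical type, never completed in this file */

struct holder {
    struct blob *payload;   /* storing the pointer needs only its size */
};

int demo_copy_incomplete_pointer(void) {
    void *raw = malloc(16);       /* malloc memory is aligned for any object type */
    if (raw == NULL) {
        return -1;
    }
    struct holder h = { raw };    /* void * converts implicitly */
    struct blob *copy = h.payload;
    int same = (copy == h.payload) && ((void *)copy == raw);
    assert(sizeof h.payload == sizeof(struct blob *));
    free(raw);
    return same;
}
```

Writing *copy or sizeof(struct blob) anywhere in this file would be rejected, because the compiler has no layout for struct blob.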
