Are struct names pointers to first element? - c

I found a few similar questions but none of them helped much. Are struct names pointers to the first element of the struct, similar to an array?
struct example {
int foo;
int bar;
};
struct example e;
e.foo = 5;
e.bar = 10;
printf("%d\n%d\n%d\n%d\n%d\n%d\n", e, e.foo, e.bar, &e, &e.foo, &e.bar);
Output:
5
5
10
2033501712
2033501712
2033501716
All of the answers to the other questions said "no", but this output confuses me. All help would be greatly appreciated.

The address of a struct is indeed the address of the first element, though you'll need to know the type of the element in order to safely cast it.
(C17 §6.7.2.1.15: "A pointer to a structure object, suitably
converted, points to its initial member ... and vice versa. There may
be unnamed padding within a structure object, but not at its
beginning.")
While it's kind of ugly, numerous pieces of production software rely on this. QNX, for example, uses this kind of behavior in open control block (OCB) logic when writing resource managers. Gtk also does something similar.
Your current implementation is dangerous though. If you must rely on this behavior, do it like so, and don't attempt to pass a pointer-to-struct as an argument to printf(), as you're intentionally breaking a feature of a language with minimal type-safety.
struct example {
int foo;
int bar;
};
struct example myStruct = { 1, 2 };
int* pFoo = (int*)&myStruct;
printf("%d", *pFoo);
Finally, this only holds for the first element. Subsequent elements may not be situated where you expect them to be, because of struct packing and padding.

struct names aren't pointers to anything. You are invoking undefined behaviour by passing a struct to printf with an incompatible format specifier %d. It may seem to "work" because the first member of the struct has the same address as the struct itself.
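To make the point above concrete, here is a minimal sketch (the helper names first_member_shares_address and print_example are made up for illustration) that compares the addresses explicitly and prints them with matching format specifiers, instead of passing the struct itself to %d:

```c
#include <stddef.h>
#include <stdio.h>

struct example {
    int foo;
    int bar;
};

/* Returns 1 when the struct and its first member start at the same address. */
int first_member_shares_address(struct example *e)
{
    return (void *)e == (void *)&e->foo;
}

/* Prints the members with %d and the addresses with %p (which expects void *). */
void print_example(struct example *e)
{
    printf("%d\n%d\n", e->foo, e->bar);
    printf("%p\n%p\n%p\n", (void *)e, (void *)&e->foo, (void *)&e->bar);
}
```

offsetof(struct example, foo) is guaranteed to be 0; the offset of bar depends on the padding the implementation chooses, so it should be queried with offsetof rather than assumed.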

Related

Is it legal to implement inheritance in C by casting pointers between one struct that is a subset of another rather than first member?

Now I know I can implement inheritance by casting the pointer to a struct to the type of the first member of this struct.
However, purely as a learning experience, I started wondering whether it is possible to implement inheritance in a slightly different way.
Is this code legal?
#include <stdio.h>
#include <stdlib.h>
struct base
{
double some;
char space_for_subclasses[];
};
struct derived
{
double some;
int value;
};
int main(void) {
struct base *b = malloc(sizeof(struct derived));
b->some = 123.456;
struct derived *d = (struct derived*)(b);
d->value = 4;
struct base *bb = (struct base*)(d);
printf("%f\t%f\t%d\n", d->some, bb->some, d->value);
return 0;
}
This code seems to produce the desired results, but as we know, that is far from proving it is not UB.
The reason I suspect that such code might be legal is that I cannot see any alignment issues that could arise here. But of course, that is far from knowing that no such issues arise, and even if there are indeed no alignment issues, the code might still be UB for some other reason.
Is the above code valid?
If it's not, is there any way to make it valid?
Is char space_for_subclasses[]; necessary? Having removed this line, the code still seems to behave itself.
As I read the standard, chapter §6.2.6.1/P5,
Certain object representations need not represent a value of the object type. If the stored
value of an object has such a representation and is read by an lvalue expression that does
not have character type, the behavior is undefined. [...]
So, as long as space_for_subclasses is a char (array-decays-to-pointer) member and you use it to read the value, you should be OK.
That said, to answer
Is char space_for_subclasses[]; necessary?
Yes, it is.
Quoting §6.7.2.1/P18,
As a special case, the last element of a structure with more than one named member may
have an incomplete array type; this is called a flexible array member. In most situations,
the flexible array member is ignored. In particular, the size of the structure is as if the
flexible array member were omitted except that it may have more trailing padding than
the omission would imply. However, when a . (or ->) operator has a left operand that is
(a pointer to) a structure with a flexible array member and the right operand names that
member, it behaves as if that member were replaced with the longest array (with the same
element type) that would not make the structure larger than the object being accessed; the
offset of the array shall remain that of the flexible array member, even if this would differ
from that of the replacement array. If this array would have no elements, it behaves as if
it had one element but the behavior is undefined if any attempt is made to access that
element or to generate a pointer one past it.
Remove that and you'd be accessing invalid memory, causing undefined behavior. However, in your case (the second snippet), you're not accessing value anyway, so that is not going to be an issue here.
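As a rough illustration of the quoted paragraph, this sketch allocates a struct base with room for extra bytes behind it and accesses them through the flexible array member. The helper name base_alloc is an assumption of this example, not something from the question:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct base {
    double some;
    char space_for_subclasses[];   /* flexible array member */
};

/* Allocates a base object followed by `extra` usable bytes; NULL on failure. */
struct base *base_alloc(size_t extra)
{
    struct base *b = malloc(sizeof *b + extra);
    if (b != NULL) {
        b->some = 0.0;
        if (extra > 0) {
            /* The array behaves as if it had `extra` elements here. */
            memset(b->space_for_subclasses, 0, extra);
        }
    }
    return b;
}
```

With base_alloc(4), indices 0 to 3 of space_for_subclasses are valid; sizeof(struct base) itself does not include those bytes.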
This is more-or-less the same poor man's inheritance used by struct sockaddr, and it is not reliable with the current generation of compilers. The easiest way to demonstrate a problem is like this:
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
struct base
{
double some;
char space_for_subclasses[];
};
struct derived
{
double some;
int value;
};
double test(struct base *a, struct derived *b)
{
a->some = 1.0;
b->some = 2.0;
return a->some;
}
int main(void)
{
void *block = malloc(sizeof(struct derived));
if (!block) {
perror("malloc");
return 1;
}
double x = test(block, block);
printf("x=%g some=%g\n", x, *(double *)block);
return 0;
}
If a->some and b->some were allowed by the letter of the standard to be the same object, this program would be required to print x=2 some=2, but with some compilers and under some conditions (it won't happen at all optimization levels, and you may have to move test to its own file) it will print x=1 some=2 instead.
Whether the letter of the standard does allow a->some and b->some to be the same object is disputed. See http://blog.regehr.org/archives/1466 and the paper it links to.
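For comparison, here is a sketch of the layout that avoids the dispute, assuming you are free to change struct derived so that it embeds struct base as its first member. Both stores then go through the same member of the same struct type, so the compiler has to assume they may refer to the same object:

```c
struct base {
    double some;
};

struct derived {
    struct base base;   /* must be the first member */
    int value;
};

/* a->some and b->base.some are both accesses to struct base's member `some`,
   so the second store must be visible to the final read. */
double test(struct base *a, struct derived *b)
{
    a->some = 1.0;
    b->base.some = 2.0;
    return a->some;
}
```

Calling test(&d.base, &d) on a struct derived d returns 2.0, since a and &b->base then point to the same struct base object.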

Struct type aliasing / tagged-union without union

For two (or more) structs: Base and Sub with a common first (unnamed) struct, is it safe to convert/cast from Base to Sub and vice versa?
struct Base{
struct{
int id;
// ...
};
char data[]; // necessary?
};
struct Sub{
struct{
int id;
// same '...'
};
// actual data
};
Are these functions guaranteed to be safe and technically correct? (Also: is the zero-length char data[] member necessary and useful?)
struct Base * subToBase(struct Sub * s){
return (struct Base*)s;
}
struct Sub * baseToSub(struct Base * b){
if(b->id != SUB_ID){
return NULL;
}
return (struct Sub*)b;
}
Edit
I have no plans to nest any further than Base within Sub, but rather leave the possibility to add other sub-types (directly under Base) later without needing to change Base. My main concern is whether pointers to the structs can be safely converted back and forth between Base and any sub. References to the (C11) standard would be most appreciated.
Edit v2
Changed the wording slightly to discourage OOP/inheritance discussions. What I want is a tagged-union, without the union so it can be extended later. I have no plans for doing additional nesting. Sub-types that need other sub-types' functionality can do so explicitly, without doing any further nesting.
Context
For a script interpreter I have made a pseudo object-oriented tagged-union type system, without the union. It has an (abstract) generic base type Object with several (specific) sub-types, such as String, Number, List etc. Every type-struct has the following unnamed struct as the first member:
#define OBJHEAD struct{ \
int id; \
int line; \
int column; \
}
The id identifies the type of object, line and column should (also) be self-explanatory. A simplified implementation of various objects:
typedef struct Object{
OBJHEAD;
char data[]; // necessary?
} Object;
typedef struct Number{
OBJHEAD;
int value; // only int for simplicity
} Number;
typedef struct String{
OBJHEAD;
size_t length;
char * string;
} String;
typedef struct List{
OBJHEAD;
size_t size;
Object * elements; // may be any kind and mix of objects
} List;
Object * Number_toObject(Number * num){
return (Object*)num;
}
Number * Number_fromObject(Object * obj){
if(obj->id != TYPE_NUMBER){
return NULL;
}
return (Number*)obj;
}
I know that the most elegant and technically correct way to do this would be to use an enum for the id and a union for the various sub-types. But I want the type-system to be extensible (through some form of type-registry) so that types can be added later without changing all the Object-related code.
A later/external addition could be:
typedef struct File{
OBJHEAD;
FILE * fp;
} File;
without needing to change Object.
Are these conversions guaranteed to be safe?
(As for the small macro-abuse: the OBJHEAD will of course be extensively documented so additional implementers will know what member-names not to use. The idea is not to hide the header, but to save pasting it every time.)
Converting a pointer to one object type to a pointer to a different object type (via a cast, for instance) is permitted, but if the resulting pointer is not correctly aligned then behavior is undefined (C11 6.3.2.3/7). Depending on the members of Base and Sub and on implementation-dependent behavior, it is not necessarily the case that a Base * converted to a Sub * is correctly aligned. For example, given ...
struct Base{
struct{
int id;
};
char data[]; // necessary?
};
struct Sub{
struct{
int id;
};
long long int value;
};
... it may be that the implementation permits Base objects to be aligned on 32-bit boundaries but requires Sub objects to be aligned on 64-bit boundaries, or even on stricter ones.
None of this is affected by whether Base has a flexible array member.
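If you want to see what your own implementation does, a sketch along these lines (C11, using _Alignof; the function names are invented for the example, and the anonymous inner struct is dropped for brevity) reports the two alignment requirements and checks a given address against the stricter one:

```c
#include <stddef.h>
#include <stdint.h>

struct Base {
    int id;
    char data[];
};

struct Sub {
    int id;
    long long int value;
};

/* Alignment requirements chosen by the implementation for each type. */
size_t base_alignment(void) { return _Alignof(struct Base); }
size_t sub_alignment(void)  { return _Alignof(struct Sub); }

/* Returns 1 if p is suitably aligned to be used as a struct Sub *.
   The pointer-to-integer conversion is implementation-defined. */
int aligned_for_sub(const void *p)
{
    return (uintptr_t)p % _Alignof(struct Sub) == 0;
}
```

The actual numbers are implementation-specific, so treat the result as a property of one compiler and target, not as a guarantee.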
It is a different question whether it is safe to dereference a pointer value of one type that was obtained by casting a pointer value of a different type. For one thing, C places rather few restrictions on how implementations choose to lay out structures: members must be laid out in the order they are declared, and there must not be any padding before the first one, but otherwise, implementations have free rein. To the best of my knowledge, in your case there is no requirement that the anonymous struct members of your two structures must be laid out the same way as each other if they have more than one member. (And if they have only one member then why use an anonymous struct?) It is also not safe to assume that Base.data starts at the same offset as the first element following the anonymous struct in Sub.
In practice, dereferencing the result of your subToBase() is probably ok, and you can certainly implement tests to verify that. Also, if you have a Base * that was obtained by conversion from a Sub *, then the result of converting it back, for instance via baseToSub(), is guaranteed to be the same as the original Sub * (C11 6.3.2.3/7 again). In that case, the conversion to Base * and back has no effect on the safety of dereferencing the pointer as a Sub *.
On the other hand, though I'm having trouble finding a reference for it in the standard, I have to say that baseToSub() is very dangerous in the general context. If a Base * that does not actually point to a Sub is converted to Sub * (which in itself is permitted), then it is not safe to dereference that pointer to access members not shared with Base. In particular, given my declarations above, if the referenced object is in fact a Base, then Base.data being declared in no way prevents ((Sub *)really_a_Base_ptr)->value from producing undefined behavior.
To avoid all undefined and implementation-defined behavior, you want an approach that avoids casting and ensures consistent layout. @LoPiTaL's suggestion to embed a typed Base structure inside your Sub structures is a good approach in that regard.
No it is not safe, at least not under all circumstances. If your compiler sees two pointers p and q that have different base type, it may always assume that they don't alias, or stated in other words it may always assume that *p and *q are different objects.
Your cast punches a hole in that assumption. That is if you have a function
double foo(struct A* p, struct B* q) {
double b = q->field0;
*p = (struct A){ 0 };
return b + q->field0; // compiler may return 2*b
}
the optimizer is allowed to avoid the additional read from memory.
If you'd know that no function will ever see the same object through differently typed pointers, you would be safe. But such an assertion is not made easily, so you'd better avoid such hackery.
It is correct, since it is guaranteed to have the same alignment on the first member of the struct, so you can cast from one struct to another.
Nevertheless, the common way to implement your behaviour is to "inherit" the base class:
//Base struct definition
typedef struct Base_{
int id;
// ...
//char data[]; //This is not needed.
}Base;
//Subclass definition
typedef struct Sub_{
Base base; //Note: this is NOT a pointer
// actual data
}Sub;
So now, you can cast a Sub struct into a Base struct, or just return the first member, which already is of type Base, so there is no need of casting anymore.
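Spelled out, the two conversion functions from the question could look like this with the embedded base. This is only a sketch: SUB_ID and the payload member are placeholders for whatever tag value and data you actually use.

```c
#include <stddef.h>

enum { SUB_ID = 1 };   /* hypothetical tag value */

typedef struct Base_ {
    int id;
} Base;

typedef struct Sub_ {
    Base base;     /* first member, NOT a pointer */
    int payload;   /* stands in for the "actual data" */
} Sub;

/* Upcast: no cast needed, just take the address of the first member. */
Base *subToBase(Sub *s)
{
    return &s->base;
}

/* Downcast: only meaningful when b really is the first member of a Sub,
   which the id tag is trusted to tell us. */
Sub *baseToSub(Base *b)
{
    if (b == NULL || b->id != SUB_ID) {
        return NULL;
    }
    return (Sub *)b;
}
```

The downcast still relies on every object setting its id honestly; the tag check cannot detect a Base that merely claims to be a Sub.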
One word of caution: do not abuse macros. Macros are nice and good for a lot of things, but abusing them may lead to code that is difficult to read and maintain.
In this case, the macro is easily replaced with the base member.
One final word: your macro is error-prone, since the member names are now hidden. In the end, you may be adding new members with the same name and getting weird errors without knowing why.
When you further expand your hierarchy into sub-subclasses, you will end up having to write the macros of ALL the base classes, while if you use the "inherit" approach, you will have to write only the direct base.
Neither of these solutions actually solves your problem: inheritance. The only real solution (the preferred one) would be to change to a truly OO language. Due to its similarity to C, the best match would be C++, but any other OO language could do.

dereferencing struct pointer to structure variable

I am a little confused about dereferencing a structure pointer into a structure variable.
It will be good if I demonstrate my problem with an example.
So here I am:
struct my_struct{
int num1;
int num2;
}tmp_struct;
void Display_struct(void * dest_var){
struct my_struct struct_ptr;
struct_ptr = *((struct my_struct *)dest_var);
printf("%d\t%d\n",struct_ptr.num1,struct_ptr.num2);
}
int main()
{
tmp_struct.num1 = 100;
tmp_struct.num2 = 150;
Display_struct(&tmp_struct);
return 0;
}
Now when I am running this example I am able to get the code to be compiled in a very clean manner and also the output is correct.
But what I don't understand is whether this is a correct way of dereferencing the structure pointer into a structure variable, as we do for other simple data types like this:
int example_num;
void Display_struct(void * dest_var){
int example_num_ptr;
example_num_ptr = *((int *)dest_var);
printf("%d\n",example_num_ptr);
}
int main()
{
example_num = 100;
Display_struct(&example_num);
return 0;
}
Here we can dereference the int pointer into an int variable as it is a simple data type, but in my opinion we can't just dereference the structure pointer in a similar manner into a structure variable, as it is not a simple data type but a complex data type or data structure.
Please help me in resolving the concept behind this.
The only problem is that you have to guarantee that the passed void* points to a variable of the correct struct type. As long as it does, everything will work fine.
The question is why you would use a void pointer and not the expected struct, but I assume this function is part of some generic programming setup, otherwise it wouldn't make sense to use void pointers.
However, if you would attempt something "hackish" like this:
int arr[2] = {100, 150};
Display_struct(arr); // BAD
Then there are no longer any guarantees: the above code will compile just fine but it invokes undefined behavior and therefore may crash & burn. The struct may contain padding bytes at any place and the code also breaks the "strict aliasing" rules of C.
(Aliasing refers to the rules stated in the C standard, section 6.5 Expressions, paragraph 7.)
You are thinking up a problem where there isn't any. A struct-type (alias an aggregate data type) is technically not very different from any other type.
If we look at things on the lower level, a variable of any type (including a struct type) is just some number of bits in memory.
The type determines the number of bits in a variable and their interpretation.
Effectively, whether you dereference a pointer-to-int or a pointer-to-struct, you just get the chunk of bits your pointer points to.
In your main function, you have the struct variable tmp_struct. It is not a pointer, but that is fine, because you pass the address of tmp_struct to the function void Display_struct(void * dest_var).
The function then takes its input argument, your pointer (void*), which holds the address of tmp_struct.
Then, inside the function, you are dereferencing correctly:
struct_ptr = *((struct my_struct *)dest_var);
You convert the void* to struct my_struct * and dereference it. Your dereferencing is correct because you pass an object of the same type; otherwise it would cause run-time issues.
No matter how complex your data type or data structure is, dereferencing should work fine.
But if the argument type is void*, make sure to pass the address of a struct my_struct to the function.
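Side by side, the struct case and the int case really are the same operation. A small sketch (load_struct and load_int are names made up here) that returns the dereferenced value instead of printing it:

```c
struct my_struct {
    int num1;
    int num2;
};

/* Copies the whole struct that src points to.
   src must really point to a struct my_struct. */
struct my_struct load_struct(const void *src)
{
    return *(const struct my_struct *)src;
}

/* The same pattern for a plain int; src must really point to an int. */
int load_int(const void *src)
{
    return *(const int *)src;
}
```

In both functions the cast only tells the compiler how many bytes to read and how to interpret them; the caller remains responsible for passing a pointer to an object of the matching type.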

nested struct in C

struct s{
int a;
struct s b;
};
The above code segment throws the error error: field 'b' has incomplete type while
struct s{
int a;
struct s *b;
};
doesn't give any error. I don't understand why this is allowed for pointers but not for the non-pointer variable.
Class members must have a complete type when they are declared, so that their size can be used to determine the class layout.
Within the class definition, the class itself is incomplete, so you can't declare a member of the same type. That would be impossible anyway (at least if there are any other members), since the class would have to be larger than itself.
A pointer is a complete type, even if the type it points to isn't, so you can declare a class member to be a pointer to the class type.
(Note: I use the word "class" since I'm a C++ programmer. I just noticed that the question is also tagged C, and C++ has since been removed. I believe the answer is still correct in that language, if you replace "class" with "structure", but I'm not completely sure since they are different languages. It would be better if you only asked about one language, since there are differences (sometimes major, sometimes subtle) between languages.)
Q: What are incomplete types?
A: An incomplete type is a type which has the identifier but lacks information needed to determine the size of the identifier.
The ‘void’ type is an incomplete type.
A union/structure type whose members are not yet specified is also incomplete.
The ‘void’ type cannot be completed.
To complete an incomplete type, we need to specify the missing information.
Example:
struct Employee *ptr; // Here 'Employee' is incomplete
C/C++ allows pointers to incomplete types.
To make 'Employee' complete, we need to specify missing information like shown below
typedef struct Employee
{
char name[25];
int age;
int employeeID;
char department[25];
}EMP;
In your case,
struct s
{
int a;
struct s b; // Structure is incomplete.
}// Till this point the structure is incomplete.
In struct s b;, the structure s is still incomplete. We can declare a pointer to an incomplete type, but not a variable of that type.
To adequately define s, the compiler needs to know the size of s. In the first example, the size of s depends on the size of s, but not in the second.
In the first example, by definition, sizeof(s) = sizeof(int) + sizeof(s) + padding. If we try to solve this equation for sizeof(s), we get 0 = sizeof(int) + padding, which clearly is impossible.
In the second, sizeof(s) = sizeof(int) + sizeof(s*) + padding. If we assume that sizeof(s*) ~= sizeof(int), then sizeof(s) = 2*sizeof(int) + padding.
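You can let the compiler do this arithmetic for the pointer version. The sketch below (padding_in_s is a name invented for the example) computes how much padding was added; the exact figure depends on the platform, e.g. with a 4-byte int and 8-byte pointers it is commonly 4.

```c
#include <stddef.h>

struct s {
    int a;
    struct s *b;   /* a pointer has a known size even while struct s is incomplete */
};

/* Bytes of padding in struct s: total size minus the sizes of the two members. */
size_t padding_in_s(void)
{
    return sizeof(struct s) - sizeof(int) - sizeof(struct s *);
}
```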
I'm going to assume that the extra asterisks in struct s **b** are given for emphasis and not as some kind of demented pointer declaration. (Please don't do that! It's much easier to analyze someone's code if it's presented exactly as it runs.)
When you do this without declaring b as a pointer:
struct s{
int a;
struct s b;
};
the compiler doesn't know how much space it needs to allocate for the b field, since at that point you haven't finished defining struct s. In fact, it would be impossible for the compiler to define this particular structure: no matter how many bytes it allocated for struct s, it would have to add 4 more to make room for the int a field.
Declaring b to be a pointer to struct s makes things easier:
struct s{
int a;
struct s *b;
};
No matter how many fields you add to struct s, the compiler knows that the b field only needs to contain the address of the struct, and that doesn't change based on how large the struct itself is.
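This is exactly what makes linked structures possible. A minimal sketch, where sum_chain is a made-up helper that follows the b pointers until it reaches NULL:

```c
#include <stddef.h>

struct s {
    int a;
    struct s *b;   /* next node, or NULL at the end of the chain */
};

/* Sums the `a` fields by following the `b` pointers. */
int sum_chain(const struct s *node)
{
    int total = 0;
    while (node != NULL) {
        total += node->a;
        node = node->b;
    }
    return total;
}
```

For a two-node chain holding 1 and 2, sum_chain returns 3; each node has the same fixed size no matter how long the chain is.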

C, Struct pointer polymorphism

NOTE: this is NOT a C++ question; I can't use a C++ compiler, only a C99 one.
Is this valid (and acceptable, beautiful) code?
typedef struct sA{
int a;
} A;
typedef struct aB{
struct sA a;
int b;
} B;
A aaa;
B bbb;
void init(){
bbb.b=10;
bbb.a.a=20;
set((A*)&bbb);
}
void set(A* a){
aaa=*a;
}
void useLikeB(){
printf("B.b = %d", ((B*)&aaa)->b);
}
In short, is it valid to cast a "subclass" to a "superclass" and later cast the "superclass" back to the "subclass" when I need the specific behavior of the latter?
Thanks
First of all, the C99 standard permits you to cast any struct pointer to a pointer to its first member, and the other way (6.7.2.1 Structure and union specifiers):
13 Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.
In other words, in your code you are free to:
1. Convert B* to A*, which will always work correctly.
2. Convert A* to B*, but if it doesn't actually point to a B, you're going to get random failures when accessing the further members.
3. Assign the structure pointed to through an A* to an A, but if the pointer was converted from a B*, only the common members will be assigned and the remaining members of B will be ignored.
4. Assign the structure pointed to through a B* to an A, but you have to convert the pointer first; also note (3).
So, your example is almost correct. But useLikeB() won't work correctly, since aaa is a struct of type A which you assigned as described in point (4). This has two results:
1. The non-common B members won't actually be copied to aaa (as stated in (3)).
2. Your program will fail randomly when trying to access an A as a B, which it isn't (you're accessing a member which is not there, as stated in (2)).
To explain that in a more practical way: when you declare an A, the compiler reserves the amount of memory necessary to hold all members of A. B has more members and thus requires more memory. As aaa is a regular variable, it can't change its size at run time and thus can't hold the remaining members of B.
And as a note, by (1) you can practically take a pointer to the member instead of converting the pointer which is nicer, and it will allow you to access any member, not only the first one. But note that in this case, the opposite won't work anymore!
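A short sketch of these points, using the typedefs from the question (copy_base_part and same_address are helper names invented here): copying through the A part keeps only the common member, and the address of the first member equals the converted pointer.

```c
typedef struct sA {
    int a;
} A;

typedef struct aB {
    struct sA a;
    int b;
} B;

/* Copies only the A part of a B; the member b is not carried over. */
A copy_base_part(const B *src)
{
    return src->a;
}

/* Returns 1 if converting the pointer and taking the member's address agree. */
int same_address(B *p)
{
    return (A *)p == &p->a;
}
```

The returned A is sizeof(A) bytes and has no room for b, which is why reading ((B*)&aaa)->b in useLikeB() reaches past the object.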
I think this is quite dirty and relatively hazardous. What are you trying to achieve with this? Also, there is no guarantee that aaa is a B; it might also be an A, so when someone calls useLikeB it might fail. Also, depending on the architecture, "int a" and "pointer to struct a" might or might not overlap correctly, which might result in interesting things happening when you assign to "int a" and then access "struct a".
Why would you do this? Having
set((A*)&bbb);
is not easier to write than the correct
set(&bbb.a);
Other things that you should please avoid when you post here:
you use set before it is declared
aaa=a should be aaa = *a
First of all, I agree with most concerns from previous posters about the safety of these assignments.
With that said, if you need to go that route, I'd add one level of indirection and some type-safety checkers.
static const int struct_a_id = 1;
static const int struct_b_id = 2;
struct MyStructPtr {
int type;
union {
A* ptra;
B* ptrb;
//continue if you have more types.
};
};
The idea is that you manage your pointers by passing them through a struct that contains some "type" information. You can build a tree of classes on the side that describe your class tree (note that given the restrictions for safely casting, this CAN be represented using a tree) and be able to answer questions to ensure you are correctly casting structures up and down. So your "useLikeB" function could be written like this.
struct MyStructPtr the_ptr;
void init_ptr(A* pa)
{
the_ptr.type = struct_a_id;
the_ptr.ptra = pa;
}
void useLikeB(){
//This function should FAIL IF the stored pointer CAN'T BE SAFELY CAST TO B,
//by checking in your type tree that the stored type is B or below
//the B type (not necessarily a direct child).
assert( is_castable_to(the_ptr.type,struct_b_id ) );
printf("B.b = %d", the_ptr.ptrb->b);
}
My 2 cents.
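The answer leaves is_castable_to undefined. One possible shape for it, purely as a sketch with a hypothetical parent table (the enum names and parent_of array are assumptions of this example), is to walk from the actual type up through its bases until the wanted type is found:

```c
enum {
    STRUCT_NONE = 0,   /* marks "no base type" */
    STRUCT_A_ID = 1,
    STRUCT_B_ID = 2,
    STRUCT_COUNT = 3
};

/* parent_of[t] is the direct base of type t, or STRUCT_NONE for a root. */
static const int parent_of[STRUCT_COUNT] = {
    [STRUCT_A_ID] = STRUCT_NONE,
    [STRUCT_B_ID] = STRUCT_A_ID,
};

/* Returns 1 if an object whose actual type is `actual` may be used as `wanted`. */
int is_castable_to(int actual, int wanted)
{
    while (actual > STRUCT_NONE && actual < STRUCT_COUNT) {
        if (actual == wanted) {
            return 1;
        }
        actual = parent_of[actual];   /* move one level up the tree */
    }
    return 0;
}
```

With this table a B may be used as an A, but an A stored in the pointer wrapper is rejected when something asks for a B, which is the failure useLikeB() is supposed to catch.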

Resources