How does the C offsetof macro work? [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Why does this C code work?
How do you use offsetof() on a struct?
I read about this offsetof macro on the Internet, but it doesn't explain what it is used for.
#define offsetof(a,b) ((int)(&(((a*)(0))->b)))
What is it trying to do and what is the advantage of using it?

R.. is correct in his answer to the second part of your question: this code is not advised when using a modern C compiler.
But to answer the first part of your question, what this is actually doing is:
(
(int)( // 4.
&( ( // 3.
(a*)(0) // 1.
)->b ) // 2.
)
)
Working from the inside out, this is ...
1. Casting the value zero to the struct pointer type a*
2. Getting the struct field b of this (illegally placed) struct object
3. Getting the address of this b field
4. Casting the address to an int
Conceptually this is placing a struct object at memory address zero and then finding out what the address of a particular field is. This could allow you to figure out the offsets in memory of each field in a struct so you could write your own serializers and deserializers to convert structs to and from byte arrays.
Of course, if you actually dereferenced a zero pointer, your program would crash, but in practice the compiler computes the address without reading memory, so no zero pointer is dereferenced at runtime.
On many systems that C ran on, an int was the same size as a pointer (for example, 32 bits each), so the cast lost no information. On platforms where pointers are wider than int, such as typical 64-bit systems, the cast to int can truncate the value.

It has no advantages and should not be used, since it invokes undefined behavior (and uses the wrong type - int instead of size_t).
The C standard defines an offsetof macro in stddef.h which actually works, for cases where you need the offset of an element in a structure, such as:
#include <stddef.h>
enum { INT, CHARPTR }; /* field type tags used in the table below */
struct foo {
    int a;
    int b;
    char *c;
};
struct struct_desc {
    const char *name;
    int type;
    size_t off;
};
static const struct struct_desc foo_desc[] = {
    { "a", INT, offsetof(struct foo, a) },
    { "b", INT, offsetof(struct foo, b) },
    { "c", CHARPTR, offsetof(struct foo, c) },
};
which would let you programmatically fill the fields of a struct foo by name, e.g. when reading a JSON file.
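As a rough sketch of what "fill the fields by name" could look like with such a table: the helper foo_set_int below is hypothetical (it is not part of the answer above), and it only handles the INT case. It looks the name up in the table, steps to the member in bytes, and stores through a pointer of the member's real type.

```c
#include <stddef.h>
#include <string.h>

struct foo {
    int a;
    int b;
    char *c;
};

enum field_type { INT, CHARPTR };

struct struct_desc {
    const char *name;
    enum field_type type;
    size_t off;
};

static const struct struct_desc foo_desc[] = {
    { "a", INT,     offsetof(struct foo, a) },
    { "b", INT,     offsetof(struct foo, b) },
    { "c", CHARPTR, offsetof(struct foo, c) },
};

/* Hypothetical helper: store value into the INT field called name.
   Returns 0 on success, -1 if there is no INT field with that name. */
static int foo_set_int(struct foo *f, const char *name, int value)
{
    size_t n = sizeof foo_desc / sizeof foo_desc[0];
    for (size_t i = 0; i < n; i++) {
        if (foo_desc[i].type == INT && strcmp(foo_desc[i].name, name) == 0) {
            /* Step to the member in bytes, then view it as its real type. */
            int *field = (int *)((char *)f + foo_desc[i].off);
            *field = value;
            return 0;
        }
    }
    return -1;
}
```

A JSON reader could then call foo_set_int(&f, key, number) for each numeric key it encounters, and a similar helper would cover the CHARPTR fields.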

It's finding the byte offset of a particular member of a struct. For example, if you had the following structure:
struct MyStruct
{
double d;
int i;
void *p;
};
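Then, on a typical 32-bit platform, you'd have offsetof(struct MyStruct, d) == 0, offsetof(struct MyStruct, i) == 8, and offsetof(struct MyStruct, p) == 12 (that is, the member named d is 0 bytes from the start of the structure, etc.). The exact values depend on the platform's type sizes and alignment rules.
The way that it works is it pretends that an instance of your structure exists at address 0 (the ((a*)(0)) part), and then it takes the address of the intended structure member and casts it to an integer. Although dereferencing an object at address 0 would ordinarily be an error, in practice compilers compute the member's address without touching memory, because the address-of operator & is applied directly to the result of the member access ->. The C standard does not guarantee this for the hand-written macro, which is one reason to prefer the offsetof from <stddef.h>.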
It's typically used for generalized serialization frameworks. If you have code for converting between some kind of wire data (e.g. bytes in a file or from the network) and in-memory data structures, it's often convenient to create a mapping from member name to member offset, so that you can serialize or deserialize values in a generic manner.
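Here is a minimal sketch of that kind of generic mapping, assuming a table of (offset, size) pairs. The names member_desc, wire_members, pack and unpack are made up for illustration, and the bytes are copied in native byte order, so this is not a portable wire format.

```c
#include <stddef.h>
#include <string.h>

struct MyStruct {
    double d;
    int i;
    void *p;
};

/* Hypothetical table entry: where a member lives and how big it is. */
struct member_desc {
    size_t off;
    size_t size;
};

/* Only d and i go on the wire; the pointer p is deliberately skipped. */
static const struct member_desc wire_members[] = {
    { offsetof(struct MyStruct, d), sizeof(double) },
    { offsetof(struct MyStruct, i), sizeof(int) },
};

#define WIRE_COUNT (sizeof wire_members / sizeof wire_members[0])

/* Copy the listed members back to back into out; returns bytes written.
   The caller must supply a large enough buffer. */
static size_t pack(const struct MyStruct *s, unsigned char *out)
{
    size_t pos = 0;
    for (size_t k = 0; k < WIRE_COUNT; k++) {
        memcpy(out + pos, (const char *)s + wire_members[k].off,
               wire_members[k].size);
        pos += wire_members[k].size;
    }
    return pos;
}

/* Inverse of pack: copy the bytes back into the listed members. */
static size_t unpack(struct MyStruct *s, const unsigned char *in)
{
    size_t pos = 0;
    for (size_t k = 0; k < WIRE_COUNT; k++) {
        memcpy((char *)s + wire_members[k].off, in + pos,
               wire_members[k].size);
        pos += wire_members[k].size;
    }
    return pos;
}
```

The loops never mention d or i by name, which is the point: adding a member to the wire format only means adding a row to the table.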

The implementation of the offsetof macro is really irrelevant.
The C standard (C99 7.17, paragraph 3) defines it as:
offsetof(type, member-designator)
which expands to an integer constant expression that has type size_t, the value of which is the offset in bytes, to the structure member (designated by member-designator), from the beginning of its structure (designated by type). The type and member designator shall be such that given static type t; then the expression &(t.member-designator) evaluates to an address constant.
Trust Adam Rosenfield's answer.
R is completely wrong, and it has many uses - especially being able to tell when code is non-portable among platforms.
(OK, it's C++, but we use it in static template compile time assertions to make sure our data structures do not change size between platforms/versions.)
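The same kind of compile-time layout check can be sketched in plain C with the C11 _Static_assert keyword. The struct packet_header and its expected offsets below are invented for illustration; the idea is only that the build fails if a platform or a later edit changes the layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk header whose layout must not drift. */
struct packet_header {
    uint32_t magic;
    uint16_t version;
    uint16_t flags;
    uint32_t length;
};

/* Each check is evaluated by the compiler; a mismatch stops the build. */
_Static_assert(offsetof(struct packet_header, magic) == 0,
               "magic must be the first member");
_Static_assert(offsetof(struct packet_header, version) == 4,
               "version moved");
_Static_assert(offsetof(struct packet_header, length) == 8,
               "length moved");
_Static_assert(sizeof(struct packet_header) == 12,
               "unexpected padding in packet_header");
```

This works because offsetof with a plain member name is an integer constant expression, so it is allowed anywhere a constant is required.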

Related

What exactly does the C Structure Dot Operator Do (Lower Level Perspective)?

I have a question regarding structs in C. So when you create a struct, you are essentially defining the framework of a block of memory. Thus when you create an instance of a struct, you are creating a block of memory such that it is capable of holding a certain number of elements.
However, I'm somewhat confused on what the dot operator is doing. If I have a struct Car and have a member called GasMileage (which is an int member), I am able to get the value of GasMileage by doing something like,
int x = CarInstance.GasMileage;
However, I'm confused as to what is actually happening with this dot operator. Does the dot operator simply act as an offset from the base address? And how exactly is it able to deduce that it is an int?
I guess I'm curious as to what is going on behind the scenes. Would it be possible to reference GasMileage by doing something else? Such as
int *GasMileagePointer = (&carInstance + offsetInBytes(GasMileage));
int x = *GasMileagePointer;
This is just something I quickly made up. I've searched hard for a good explanation, but nothing seems to explain it any further than treating the dot operator as magic.

When you use the . operator, the compiler translates this to an offset inside the struct, based on the size of the fields (and padding) that precede it.
For example:
struct Car {
    char model[52];
    int doors;
    int GasMileage;
};
Assuming an int is 4 bytes and no padding, the offset of model is 0, the offset of doors is 52, and the offset of GasMileage is 56.
So if you know the offset of the member, you could get a pointer to it like this:
int *GasMileagePointer = (int*)((char *)&carInstance + offsetof(struct Car, GasMileage));
The cast to char * is necessary so that pointer arithmetic goes 1 byte at a time instead of sizeof(carInstance) bytes at a time. Then the result needs to be cast to the correct pointer type, in this case int *.

Yes, the dot operator simply applies an offset from the base of the structure, and then accesses the value at that address.
int x = CarInstance.GasMileage;
is equivalent to:
int x = *(int *)((char*)&CarInstance + offsetof(Car, GasMileage));
For a member with some other type T, the only difference is that the cast (int *) becomes (T *).
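To make the equivalence concrete, here is a small sketch that reads the member both ways. The struct layout is borrowed from the other answer, and read_gas_mileage is a made-up name.

```c
#include <stddef.h>

struct Car {
    char model[52];
    int doors;
    int GasMileage;
};

/* Read GasMileage without using the . or -> operators:
   start at the base address, step offsetof bytes, reinterpret as int. */
static int read_gas_mileage(const struct Car *car)
{
    const char *base = (const char *)car;
    return *(const int *)(base + offsetof(struct Car, GasMileage));
}
```

For any Car object c, read_gas_mileage(&c) and c.GasMileage yield the same value; the compiler simply does the offset arithmetic for you when you write the dot.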

The dot operator simply selects the member.
Since the compiler has information about the type (and consequently size) of the member (all members, actually), it knows the offset of the member from the start of the struct and can generate appropriate instructions. It may generate a base+offset access, but it also may access the member directly (or even have it cached in a register). The compiler has all those options since it has all the necessary information at compile time.
If it hasn't, like for incomplete types, you'll get a compile-time error.

When it works, the behavior of the "." operator is equivalent to taking the address of the structure, offsetting it by the offset of the member, converting that to a pointer to the member type, and dereferencing it. The Standard, however, provides that there are situations where that isn't guaranteed to work. For example, given:
struct s1 { int x, y; };
struct s2 { int x, y; };
void test1(struct s1 *p1, struct s2 *p2)
{
    p1->x++;
    p2->x ^= 1;
    p1->x--;
    p2->x ^= 1;
}
a compiler may decide that there's no legitimate way that p1->x and p2->x
can identify the same object, so it may reorder the code so that the ++
and -- operations on p1->x cancel, and the ^=1 operations on p2->x cancel,
thus leaving a function that does nothing.
Note that the behavior is different when using unions, since given:
union u { struct s1 v1; struct s2 v2; };
void test2(union u *uv)
{
    uv->v1.x ^= 1;
    uv->v2.x++;
    uv->v1.x ^= 1;
    uv->v2.x--;
}
the common-initial-subsequence rule indicates that since uv->v1 and uv->v2
start with fields of the same types, an access to such a field in uv->v1 is
equivalent to an access to the corresponding field in uv->v2. Thus, a
compiler is not allowed to resequence things. On the other hand, given
void test1(struct s1 *p1, struct s2 *p2);
void test3(union u *uv)
{
    test1(&uv->v1, &uv->v2);
}
the fact that uv->v1 and uv->v2 start with matching fields doesn't guard against
a compiler's assumption that the pointers won't alias.
Note that some compilers offer an option to force generation of code where
member accesses always behave equivalently to the aforementioned pointer
operations. For gcc, the option is -fno-strict-aliasing. If code will need
to access common initial members of varying structure types, omitting that
switch may cause one's code to fail in weird, bizarre, and unpredictable
ways.

getting the name of a typedef in C? [duplicate]

This question already has answers here:
runtime determine type for C
(2 answers)
Closed 8 years ago.
I am writing a large program in which I have to typedef a lot of structs and then use void pointers to variables of these struct types, for example:
typedef struct {
int age;
double height;
}human_t;
Then I declare and initialize a variable of type human_t:
human_t peter = {
21,
1.95 };
Then I make a void pointer to peter:
void* ptr = &peter;
What I later need is to know that ptr is a pointer to a variable of type human_t. How can I do this? Is there some kind of predefined method in C? Sorry for my ignorance :) I'm still a beginner.

No, that is impossible "out of the box".
It would require run-time type information to be somehow associated with the pointer, and that simply does not exist. A void pointer is a memory address, and nothing more.
You can, of course, as with many things that exist in higher-level languages, implement it yourself.
For this case, you can require that each supported struct begins with an enum that specifies its type. You can then convert the pointer to struct into a pointer to that enum, read its value, and then know which type you're dealing with:
typedef enum {
ObjectType_Human,
ObjectType_Alien,
ObjectType_Predator,
ObjectType_Smurf,
} ObjectType;
typedef struct {
ObjectType type;
int age;
char name[32];
} human;
Then you can do:
void print_name(const void *obj)
{
const ObjectType *tp = obj; /* No cast required! */
switch(*tp)
{
case ObjectType_Human:
printf("the human is called %s\n", ((const human *) obj)->name);
break;
/* and so on ... */
}
}
You could also, for instance, put the type information in a map, hashed on the pointer value.
There are many approaches; you're going to have to analyze and pick the best one for your particular application.
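As a rough illustration of the "side table keyed on the pointer value" idea, here is a sketch that uses a small linear table instead of a real hash map, to keep it short. All names (type_id, register_object, type_of, MAX_TRACKED) are invented for this example, and nothing here removes entries when objects go away.

```c
#include <stddef.h>

typedef enum { TYPE_UNKNOWN, TYPE_HUMAN } type_id;   /* hypothetical tags */

typedef struct {
    int age;
    double height;
} human_t;

#define MAX_TRACKED 64

/* Side table: remembers which type each registered address has. */
static struct {
    const void *ptr;
    type_id type;
} registry[MAX_TRACKED];
static size_t registry_len;

/* Record the type of an object; returns -1 when the table is full. */
static int register_object(const void *ptr, type_id type)
{
    if (registry_len == MAX_TRACKED)
        return -1;
    registry[registry_len].ptr = ptr;
    registry[registry_len].type = type;
    registry_len++;
    return 0;
}

/* Look an address up; unregistered pointers come back as TYPE_UNKNOWN. */
static type_id type_of(const void *ptr)
{
    for (size_t i = 0; i < registry_len; i++)
        if (registry[i].ptr == ptr)
            return registry[i].type;
    return TYPE_UNKNOWN;
}
```

Compared with the enum-as-first-member approach, this leaves the struct definitions untouched but costs a lookup and requires every object to be registered (and, in real code, unregistered) explicitly.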

Alas, this is not easy to implement in C.
One approach is to standardise your structures so that the first element is a type field.
The C standard guarantees that a pointer to a structure, suitably converted, points to its first member (i.e. there is no padding before the first structure member). So it's safe to first convert the void* to a pointer to the type field and, depending on the value found there, cast to the structure type of your choice.

Is there a way to make GCC/Clang aware of inheritance in C?

I'm writing a C library that uses some simple object-oriented inheritance much like this:
struct Base {
int x;
};
struct Derived {
struct Base base;
int y;
};
And now I want to pass a Derived* to a function that takes a Base* much like this:
int getx(struct Base *arg) {
return arg->x;
};
int main() {
struct Derived d;
return getx(&d);
};
This works, and is typesafe of course, but the compiler doesn't know this. Is there a way to tell the compiler that this is typesafe? I'm focusing just on GCC and clang here so compiler-specific answers are welcome. I have vague memories of seeing some code that did this using __attribute__((inherits(Base)) or something of the sort but my memory could be lying.

This is safe in C except that you should cast the argument to Base *. The rule that prohibits aliasing (or, more precisely, that excludes it from being supported in standard C) is in C 2011 6.5, where paragraph 7 states:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
— a type compatible with the effective type of the object,
…
This rule prevents us from taking a pointer to float, converting it to pointer to int, and dereferencing the pointer to int to access the float as an int. (More precisely, it does not prevent us from trying, but it makes the behavior undefined.)
It might seem that your code violates this since it accesses a Derived object using a Base lvalue. However, converting a pointer to Derived to a pointer to Base is supported by C 2011 6.7.2.1 paragraph 15, which states:
… A pointer to a structure object, suitably converted, points to its initial member…
So, when we convert the pointer to Derived to a pointer to Base, what we actually have is not a pointer to the Derived object using a different type than it is (which is prohibited) but a pointer to the first member of the Derived object using its actual type, Base, which is perfectly fine.
About the edit: Originally I stated function arguments would be converted to the parameter types. However, C 2011 6.5.2.2 paragraph 2 requires that each argument have a type that may be assigned to an object with the type of its corresponding parameter (with any qualifications like const removed), and 6.5.16.1 requires that, when assigning one pointer to another, they have compatible types (or meet other conditions not applicable here). Thus, passing a pointer to Derived to a function that takes a pointer to Base violates standard C constraints. However, if you perform the conversion yourself, it is legal. If desired, the conversion could be built into a preprocessor macro that calls the function, so that the code still looks like a simple function call.
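One way such a wrapper macro might look is sketched below. GETX is a hypothetical name, and the cast inside it accepts any pointer type without complaint, so it relies on the caller only passing structs that really start with a struct Base.

```c
struct Base {
    int x;
};

struct Derived {
    struct Base base;   /* must stay the first member for the cast to be valid */
    int y;
};

static int getx(struct Base *arg)
{
    return arg->x;
}

/* Hypothetical wrapper: performs the pointer conversion explicitly so the
   call site still reads like an ordinary function call. */
#define GETX(p) getx((struct Base *)(p))
```

With this in place, GETX(&d) compiles cleanly for a struct Derived d. A stricter variant could expand to getx(&(p)->base), which keeps the compiler's type checking at the price of requiring every derived struct to name its first member base.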

Give address of a base member (truly type-safe option):
getx(&d.base);
Or use void pointer:
int getx(void * arg) {
struct Base * temp = arg;
return temp->x;
};
int main() {
struct Derived d;
return getx(&d);
};
It works because C requires that there is never padding before the first struct member. This won't increase type safety, but it removes the need for casting.

As noted above by user694733, you are probably best off to conform to standards and type safety by using the address of the base field as in (repeating for future reference)
struct Base{
    int x;
};
struct Derived{
    int y;
    struct Base b; /* look ma, not the first field! */
};
struct Derived d = {0}, *pd = &d;
void getx (struct Base* b);
and now despite the base not being the first field you can still do
getx (&d.b);
or if you are dealing with a pointer
getx(&pd->b);
This is a very common idiom. You have to be careful if the pointer is NULL, however, because &pd->b just computes
(struct Base*)((char*)pd + offsetof(struct Derived, b))
so &((struct Derived*)NULL)->b becomes
((struct Base*)offsetof(struct Derived, b)) != NULL.
IMO it is a missed opportunity that C has adopted anonymous structs but not the Plan 9 anonymous struct model, which is
struct Derived{
    int y;
    struct Base; /* look ma, no fieldname */
} d;
It allows you to just write getx(&d) and the compiler will adjust the Derived pointer to a base pointer i.e. it means exactly the same as getx(&d.b) in the example above. In other words it effectively gives you inheritance but with a very concrete memory layout model. In particular, if you insist on not embedding (== inheriting) the base struct at the top, you have to deal with NULL yourself. As you expect from inheritance it works recursively so for
struct TwiceDerived{
struct Derived;
int z;
} td;
you can still write getx(&td). Moreover, you may not need the getx as you can write d.x (or td.x or pd->x).
Finally using the typeof gcc extension you can write a little macro for downcasting (i.e. casting to a more derived struct)
#define TO(T,p) \
({ \
typeof(p) nil = (T*)0; \
(T*)((char*)(p) - ((char*)nil - (char*)0)); \
})
so you can do things like
struct Base b = {0}, *pb = &b;
struct Derived* pd = TO(struct Derived, pb);
which is useful if you try to do virtual functions with function pointers.
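For comparison, the same downcast can be sketched in standard C with offsetof, along the lines of the well-known container_of idiom. CONTAINER_OF and get_y_via_base are names chosen for this example, and unlike the macro above the member name has to be spelled out.

```c
#include <stddef.h>

struct Base {
    int x;
};

struct Derived {
    int y;
    struct Base b;   /* deliberately not the first member */
};

/* Step back from a member's address to the start of the enclosing struct.
   Only valid when ptr really points at that member of a `type` object. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Example use: recover the Derived from its embedded Base and read y. */
static int get_y_via_base(struct Base *pb)
{
    struct Derived *pd = CONTAINER_OF(pb, struct Derived, b);
    return pd->y;
}
```

Passing a Base that is not embedded in a Derived yields a pointer to memory that is not a Derived, so callers need some other way to know the dynamic type.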
On gcc you can experiment with the Plan 9 extensions using -fplan9-extensions. Unfortunately they do not seem to have been implemented in clang.

Why does a non-constant offsetof expression work?

Why does this work:
#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
typedef struct x {
int a;
int b[128];
} x_t;
int function(int i)
{
size_t a;
a = offsetof(x_t, b[i]);
return a;
}
int main(int argc, char **argv)
{
printf("%d\n", function(atoi(argv[1])));
}
If I remember the definition of offsetof correctly, it's a compile time construct. Using 'i' as the array index results in a non-constant expression. I don't understand how the compiler can evaluate the expression at compile time.
Why isn't this flagged as an error?

The C standard does not require this to work, but it likely works in some C implementations because offsetof(type, member) expands to something like:
type t; // Declare an object of type "type".
char *start = (char *) &t; // Find starting address of object.
char *p = (char *) &t.member; // Find address of member.
p - start; // Evaluate offset from start to member.
I have separated the above into parts to display the essential logic. The actual implementation of offsetof would be different, possibly using implementation-dependent features, but the core idea is that the address of a fictitious or temporary object would be subtracted from the address of the member within the object, and this results in the offset. It is designed to work for members but, as an unintended effect, it also works (in some C implementations) for elements of arrays in structures.
It works for these elements simply because the construction used to find the address of a member also works to find the address of an element of an array member, and the subtraction of the pointers works in a natural way.
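If you would rather not depend on that unintended effect, the element offset can be built from pieces that are not in doubt. The sketch below uses made-up function names; it adds the constant offset of the array member to an ordinary run-time index calculation, and also shows the pointer-subtraction idea on a real object for comparison.

```c
#include <stddef.h>

typedef struct x {
    int a;
    int b[128];
} x_t;

/* Constant offset of the array plus a run-time step to element i. */
static size_t offset_of_b_elem(size_t i)
{
    return offsetof(x_t, b) + i * sizeof(int);
}

/* The same value computed by subtracting addresses inside a real object,
   which is the core idea described in the answer above. */
static size_t offset_by_subtraction(size_t i)
{
    static x_t t;
    return (size_t)((char *)&t.b[i] - (char *)&t);
}
```

Both functions agree for every valid index, and neither passes a non-constant expression to offsetof.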

it's a compile time construct
AFAICS, there are no such constraints. All the standard says is:
[C99, 7.17]:
The macro...
offsetof(type, member-designator)
...
The type and member designator shall be such that given
static type t;
then the expression &(t.member-designator) evaluates to an address constant.

offsetof (type,member)
Return member offset: This macro with functional form returns the offset value in bytes of member member in the data structure or union type type.
http://www.cplusplus.com/reference/cstddef/offsetof/
(C, C++98 and C++11 standards)

I think I understand this now.
In this usage, the offsetof() macro does not evaluate to a constant; the compiler turns it into a run-time expression that computes the offset. Thus, as long as type.member is valid syntax, the compiler doesn't care what it is, and you can use arbitrary expressions for the array index. I had thought it was like sizeof and had to be constant at compile time.

There has been some confusion on what exactly is permitted as a member-designator. Here are two papers I am aware of:
DR 496
Offsetof for Pointers to Members
However, even quite old versions of GCC, clang, and ICC support calculating array elements with dynamic offset. Based on Raymond's blog I guess that MSVC has long supported it too.
I believe it is based out of pragmatism. For those not familiar, the "struct hack" and flexible array members use variable-length data in the last member of a struct:
struct string {
size_t size;
const char data[];
};
This type is often allocated with something like this:
struct string *string_alloc(size_t size) {
    struct string *s = malloc(offsetof(struct string, data[size]));
    if (s != NULL)
        s->size = size;
    return s;
}
Admittedly, this latter part is just a theory. It's such a useful optimization that I imagine that initially it was permitted on purpose for such cases, or it was accidentally supported and then found to be useful for exactly such cases.
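For readers who want the same allocation without a run-time index inside offsetof, here is a sketch that adds the payload size to the constant offset of the flexible array member. string_new is a hypothetical helper, and the data member is left non-const here so that it can be filled in after allocation.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct string {
    size_t size;
    char data[];    /* flexible array member (C99) */
};

/* Allocate header and payload in one block and copy text into it.
   Returns NULL if the allocation fails; the caller frees the result. */
static struct string *string_new(const char *text)
{
    size_t size = strlen(text);
    /* Constant offset of data, plus the characters and the terminating NUL. */
    struct string *s = malloc(offsetof(struct string, data) + size + 1);
    if (s == NULL)
        return NULL;
    s->size = size;
    memcpy(s->data, text, size + 1);
    return s;
}
```

The size computation gives the same number of bytes as offsetof(struct string, data[size + 1]) on compilers that accept the indexed form.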

C, Struct pointer polymorphism

NOTE: this is NOT a C++ question. I can't use a C++ compiler, only a C99 one.
Is this valid (and acceptable, beautiful) code?
typedef struct sA{
int a;
} A;
typedef struct aB{
struct sA a;
int b;
} B;
A aaa;
B bbb;
void init(){
bbb.b=10;
bbb.a.a=20;
set((A*)&bbb);
}
void set(A* a){
aaa=*a;
}
void useLikeB(){
printf("B.b = %d", ((B*)&aaa)->b);
}
In short, is it valid to cast a "sub class" to a "super class" and afterwards cast the "super class" back to the "sub class" when I need its specific behavior?
Thanks

First of all, the C99 standard permits you to cast any struct pointer to a pointer to its first member, and the other way (6.7.2.1 Structure and union specifiers):
13 Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.
In other words, in your code you are free to:
1. Convert B* to A*, which will always work correctly,
2. Convert A* to B*, but if it doesn't actually point to a B, you're going to get random failures accessing further members,
3. Assign the structure pointed to through A* to an A, but if the pointer was converted from B*, only the common members will be assigned and the remaining members of B will be ignored,
4. Assign the structure pointed to through B* to an A, but you have to convert the pointer first, and note (3).
So, your example is almost correct. But useLikeB() won't work correctly since aaa is a struct of type A which you assigned as described in point (4). This has two results:
The non-common B members won't be actually copied to aaa (as stated in (3)),
Your program will fail randomly trying to access A like B which it isn't (you're accessing a member which is not there, as stated in (2)).
To explain that in a more practical way, when you declare aaa, the compiler reserves the amount of memory necessary to hold all members of A. B has more members, and thus requires more memory. As aaa is a regular variable, it can't change its size at run time and thus can't hold the remaining members of B.
And as a note, by (1) you can practically take a pointer to the member instead of converting the pointer which is nicer, and it will allow you to access any member, not only the first one. But note that in this case, the opposite won't work anymore!
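The difference between copying through the base pointer and keeping the pointer can be sketched as follows. The helper names copy_base and read_b_through_base are invented for this illustration.

```c
typedef struct sA {
    int a;
} A;

typedef struct sB {
    struct sA a;    /* the "super class" part comes first */
    int b;
} B;

/* Copying through the base pointer keeps only the A part (the slicing
   described in the list above); the b member is not carried along. */
static A copy_base(const B *src)
{
    const A *as_a = &src->a;
    return *as_a;
}

/* Converting back is only valid when pa really points at the start of a B. */
static int read_b_through_base(A *pa)
{
    B *pb = (B *)pa;
    return pb->b;
}
```

So a design that needs the B behavior later should store the A* (still pointing into the original B) rather than a copy of the A.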

I think this is quite dirty and relatively hazardous. What are you trying to achieve with this? Also, there is no guarantee that aaa is a B; it might also be an A, so when someone calls useLikeB it might fail. Also, depending on the architecture, "int a" and "pointer to struct a" might or might not overlap correctly, which could produce surprising results when you assign to "int a" and then access "struct a".

Why would you do this? Having
set((A*)&bbb);
is not easier to write than the correct
set(&bbb.a);
Other things that you should please avoid when you post here:
you use set before it is declared
aaa=a should be aaa = *a

First of all, I agree with most concerns from previous posters about the safety of these assignments.
With that said, if you need to go that route, I'd add one level of indirection and some type-safety checkers.
static const int struct_a_id = 1;
static const int struct_b_id = 2;
struct MyStructPtr {
    int type;
    union {
        A* ptra;
        B* ptrb;
        //continue if you have more types.
    };
};
The idea is that you manage your pointers by passing them through a struct that contains some "type" information. You can build a tree of classes on the side that describe your class tree (note that given the restrictions for safely casting, this CAN be represented using a tree) and be able to answer questions to ensure you are correctly casting structures up and down. So your "useLikeB" function could be written like this.
struct MyStructPtr the_ptr;
void init_ptr(A* pa)
{
    the_ptr.type = struct_a_id;
    the_ptr.ptra = pa;
}
void useLikeB(){
    //This function should FAIL IF aaa CAN'T BE SAFELY CAST TO B
    //by checking in your type tree that the stored type is the
    //b type or below it (not necessarily a direct child).
    assert( is_castable_to(the_ptr.type, struct_b_id) );
    printf("B.b = %d", the_ptr.ptrb->b);
}
My 2 cents.
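The answer above leaves is_castable_to undefined. One possible sketch, assuming the "class tree" is stored as a parent table indexed by type id, is shown below. The TYPE_* constants, parent_of and as_b are made up for this example, and the union is given a name so the code also compiles as C99.

```c
#include <stddef.h>

typedef struct sA { int a; } A;
typedef struct sB { A a; int b; } B;

enum { TYPE_NONE = 0, TYPE_A = 1, TYPE_B = 2 };   /* hypothetical type ids */

/* parent_of[t] is the "super class" of t; TYPE_NONE marks the root. */
static const int parent_of[] = { TYPE_NONE, TYPE_NONE, TYPE_A };

/* Walk from the actual type up to the root, looking for the wanted type. */
static int is_castable_to(int actual, int wanted)
{
    for (int t = actual; t != TYPE_NONE; t = parent_of[t])
        if (t == wanted)
            return 1;
    return 0;
}

struct MyStructPtr {
    int type;
    union {
        A *ptra;
        B *ptrb;
    } u;
};

/* Checked downcast: returns NULL instead of asserting when p is not a B. */
static B *as_b(struct MyStructPtr p)
{
    return is_castable_to(p.type, TYPE_B) ? p.u.ptrb : NULL;
}
```

A B can be used as an A (is_castable_to(TYPE_B, TYPE_A) holds) but not the other way round, which is exactly the failure useLikeB should catch.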
