In the QEMU source code, I found the following macro named offsetof. Can anybody tell me what it does?
#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *) 0)->MEMBER)
It's used in this manner :
offsetof(CPUState, icount_decr.u32)
where CPUState is a struct.
I think it gives the offset of the member inside a struct, but I'm not sure.
EDIT: Yeah, I found out what was happening. The definition of CPUState contained a macro, which I had missed, that declared the member icount_decr.
It gets the offset of a member of a struct. It does so by casting address zero to a pointer to that struct type, then taking the address of the member.
Your thinking is correct! And the name of the macro gives a good hint, too. ;)
It's defined in §7.17/3:
offsetof(type, member-designator)
which expands to an integer constant expression that has type size_t, the value of
which is the offset in bytes, to the structure member (designated by member-designator),
from the beginning of its structure (designated by type). The type and member designator
shall be such that given
static type t;
then the expression &(t.member-designator) evaluates to an address constant. (If the
specified member is a bit-field, the behavior is undefined.)
Because the standard library doesn't necessarily have to follow the language rules, an implementation is free to get the result however it pleases.
So the result of this particular implementation is not undefined behavior, because you aren't supposed to care how it's implemented. (In other words, your implementation makes the guarantee that taking the address of an indirection through a null pointer is well-defined. You of course can't assume this in your own programs.)
If some other library has (re)defined offsetof, they've made your program's behavior undefined and should be using the standard library instead. (The dummies.)
I ran into a problem while reading a piece of C code. The code is shown below:
#define size_of_attribute(Struct, Attribute) sizeof(((Struct*)0)->Attribute)
This macro gets the size of an attribute in the struct. I know what the macro is for, but I can't understand the meaning of "((Struct*)0)".
I would appreciate it if you could give me some explanation :).
The constant value 0 qualifies as a null pointer constant. The expression (Struct*)0 is therefore casting that null pointer constant to a pointer of type Struct *. The expression then gets the Attribute member.
Attempting to evaluate ((Struct*)0)->Attribute would result in a null pointer dereference; however, this expression is the argument to the sizeof operator. This means the expression is not actually evaluated but simply examined to determine its type.
So sizeof(((Struct*)0)->Attribute) gives you the size of the Attribute member of the struct named Struct without having to have an object of that type.
This is basically accessing a member variable type without actually mentioning / creating any variable of that structure type.
Here,
the 0 is cast to the structure type pointer, and
then that pointer is used to access the member variable
which is used as the operand of sizeof operator.
Since sizeof is a compile time operation, the NULL dereference never actually executes at runtime.
It's casting a null pointer to the Struct* type so it can determine the size of the attribute of that struct. Normally, reading an attribute from NULL is illegal, but for sizeof, it doesn't actually read anything, it just looks at the definition of the struct to determine the statically defined size of the attribute of any such struct.
At least for C++, this is useful because unlike a non-pointer-based:
sizeof(Struct{}.Attribute)
it doesn't require Struct to have a default constructor. A pointer can be made with no knowledge of how to construct the object, while an actual object (even if none is actually constructed) must still be constructed in a valid way, and you can't say with any reliability how an arbitrary struct can be legally constructed.
In particular I'd like to know what ->val does in the
sizeof(((stoken_t*)(0))->val)
and what the stoken_t*(0) pointer does; in particular, what does the (0) mean?
I hope I have formulated my question clearly enough.
This is a way of accessing a member of a structure at compile time, without needing to have a variable defined of that structure type.
The cast (stoken_t*) to a value of 0 emulates a pointer of that structure type, allowing you to make use of the -> operator on that, just like you would use it on a pointer variable of that type.
To add, as sizeof is a compile time operator, the expression is not evaluated at run-time, so unlike other cases, here there is no null-pointer dereference happening.
It is analogous to something like
stoken_t * ptr;
sizeof(ptr->val);
In detail:
(stoken_t*)(0) simply casts 0 to a pointer to stoken_t (any integer literal would do, since the expression is never evaluated). ((stoken_t*)(0))->val is then an expression whose type is that of the val member of stoken_t, and sizeof returns the number of bytes this type occupies in memory. In short, this expression finds the size of a struct member at compile time without the need for an instance of that struct type.
I have a question about register keyword in C.
I found that the name of a register array (e.g. array) can be assigned to a pointer variable, while &array[0] cannot.
Can you explain why the array name can be assigned to a pointer? Or, if someone has already explained it, please point me to the links so that I can read the answer. Thank you.
Here is what I tried:
I read cppreference which explains register keyword and it says:
register arrays are not convertible to pointers.
Also, I read C89 draft which says:
The implementation may treat any register declaration simply as an auto declaration. However, whether or not addressable storage is actually used, the address of any part of an object declared with storage-class specifier register may not be computed, either explicitly (by use of the unary & operator as discussed in 3.3.3.2) or implicitly (by converting an array name to a pointer as discussed in 3.2.2.1). Thus the only operator that can be applied to an array declared with storage-class specifier register is sizeof.
This looks like I can't assign register array name to pointer to get its address.
Moreover, to find the answer, I searched here and found this question:
Address of register variable. There are good answers there, but I still couldn't find the answer I wanted.
Here is the code which I tested. I compiled this code through Clang with flag -std=c89:
register int array[10];
int* p;
p = array; // Compiled without warning or error
p = &array[0]; // Compiled with error which I expected
I expected both p = array; and p = &array[0]; to cause a compile error, but only p = &array[0]; did.
This is a bug in the Clang compiler.
GCC shows an error on both lines.
In discussing the automatic conversion of “array of type” to “pointer to type”, C 2018 6.3.2.1 says:
… If the array object has register storage class, the behavior is undefined.
Further, C 2018 footnote 124 says:
… the address of any part of an object declared with storage-class specifier register cannot be computed, either explicitly (by use of the unary & operator as discussed in 6.5.3.2) or implicitly (by converting an array name to a pointer as discussed in 6.3.2.1)…
Obviously, p = array; converts array to a pointer, as discussed in 6.3.2.1, but this footnote (which is non-normative but tells us the intent in this case quite explicitly) says the value for that pointer cannot be computed.
Since the behavior is not defined by the C standard, a C implementation could define it as an extension. However, the inconsistent behavior between array and &array[0] suggests this is not a deliberate extension by Clang but is a mistake.
Members of a struct must be properly aligned, so the struct can be padded with unused bytes.
The size of the struct is always the same (since sizeof yields a constant expression).
So I presume the compiler must use the same padding every time, such as when:
placing a single struct instance
passing a struct instance by value to a function
returning a struct instance by value from a function
in the array of structs
If I'm correct with this presumption, can you please point me where does C standard says so? Are there any strict rules for padding placement that guarantee this property?
C locks down what the offsets of fields are in a particular structure type via the specification of the offsetof macro, in §7.17 of this version of the draft, which is describing the required macros in <stddef.h>.
The macros are
NULL
which expands to an implementation-defined null pointer constant; and
offsetof(type, member-designator)
which expands to an integer constant expression that has type size_t, the value of
which is the offset in bytes, to the structure member (designated by member-designator),
from the beginning of its structure (designated by type). The type and member designator
shall be such that given
static type t;
then the expression &(t.member-designator) evaluates to an address constant. (If the
specified member is a bit-field, the behavior is undefined.)
Since it is a constant expression, every use of a particular structure type must use the same offset (or this would all be completely crazy).
Two different struct types, even if declared the same, are not constrained by this to have the same layout. (For example, if the types were declared under the control of different structure layout pragmas, they would not be expected to necessarily have the same layout, yet the declarations would look prima facie identical. typedef is important!)
In ANSI C, offsetof is defined as below.
#define offsetof(st, m) \
((size_t) ( (char *)&((st *)(0))->m - (char *)0 ))
Why won't this throw a segmentation fault since we are dereferencing a NULL pointer? Or is this some sort of compiler hack where it sees that only address of the offset is taken out, so it statically calculates the address without actually dereferencing it? Also is this code portable?
At no point in the above code is anything dereferenced. A dereference occurs when * or -> is used on an address value to find the referenced value. The only use of * above is in a type name, for the purpose of casting.
The -> operator is used above but it's not used to access the value. Instead it's used to grab the address of the value. Here is a non-macro code sample that should make it a bit clearer
SomeType *pSomeType = GetTheValue();
int* pMember = &(pSomeType->SomeIntMember);
The second line does not actually cause a dereference (implementation dependent). It simply returns the address of SomeIntMember within the pSomeType value.
What you see is a lot of casting between arbitrary types and char pointers. The reason for char is that it's one of the only types (perhaps the only type) in the C89 standard which has an explicit size. The size is 1. By ensuring the size is one, the above code can do the evil magic of calculating the true offset of the value.
Although that is a typical implementation of offsetof, it is not mandated by the standard, which just says:
The following types and macros are defined in the standard header <stddef.h> [...]
offsetof(type,member-designator)
which expands to an integer constant expression that has type size_t, the value of
which is the offset in bytes, to the structure member (designated by member-designator),
from the beginning of its structure (designated by type). The type and member designator
shall be such that given
static type t;
then the expression &(t.member-designator) evaluates to an address constant. (If the specified member is a bit-field, the behavior is undefined.)
Read P J Plauger's "The Standard C Library" for a discussion of it and the other items in <stddef.h> which are all border-line features that could (should?) be in the language proper, and which might require special compiler support.
It's of historic interest only, but I used an early ANSI C compiler on 386/IX (see, I told you of historic interest, circa 1990) that crashed on that version of offsetof but worked when I revised it to:
#define offsetof(st, m) ((size_t)((char *)&((st *)(1024))->m - (char *)1024))
That was a compiler bug of sorts, not least because the header was distributed with the compiler and didn't work.
In ANSI C, offsetof is NOT defined like that. One of the reasons it's not defined like that is that some environments will indeed throw null pointer exceptions, or crash in other ways. Hence, ANSI C leaves the implementation of offsetof( ) open to compiler builders.
The code shown above is typical for compilers/environments that do not actively check for NULL pointers, but fail only when bytes are read from a NULL pointer.
To answer the last part of the question, the code is not portable.
The result of subtracting two pointers is defined and portable only if the two pointers point to objects in the same array or point to one past the last object of the array (7.6.2 Additive Operators, H&S Fifth Edition)
Listing 1: A representative set of offsetof() macro definitions
// Keil 8051 compiler
#define offsetof(s,m) (size_t)&(((s *)0)->m)
// Microsoft x86 compiler (version 7)
#define offsetof(s,m) (size_t)(unsigned long)&(((s *)0)->m)
// Diab Coldfire compiler
#define offsetof(s,memb) ((size_t)((char *)&((s *)0)->memb-(char *)0))
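#include <stddef.h>  /* offsetof */
#include <stdio.h>   /* printf */
typedef struct
{
    int i;
    float f;
    char c;
} SFOO;
int main(void)
{
    /* Prints the byte offset of member f within SFOO. */
    printf("Offset of 'f' is %zu\n", offsetof(SFOO, f));
    return 0;
}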
The various operators within the macro are evaluated in an order such that the following steps are performed:
((s *)0) takes the integer zero and casts it as a pointer to s.
((s *)0)->m dereferences that pointer to point to structure member m.
&(((s *)0)->m) computes the address of m.
(size_t)&(((s *)0)->m) casts the result to an appropriate data type.
By definition, the structure itself resides at address 0. It follows that the address of the field pointed to (Step 3 above) must be the offset, in bytes, from the start of the structure.
It doesn't segfault because you're not dereferencing it. The pointer address is being used as a number that's subtracted from another number, not used to address memory operations.
It calculates the offset of the member m relative to the start address of the representation of an object of type st.
((st *)(0)) refers to a NULL pointer of type st *.
&((st *)(0))->m refers to the address of member m in this object. Since the start address of this object is 0 (NULL), the address of member m is exactly the offset.
The char * conversion and the subtraction calculate the offset in bytes. Under the rules of pointer arithmetic, when you subtract two pointers of type T *, the result is the number of objects of type T that fit between the two addresses held by the operands.
Quoting the C standard for the offsetof macro:
C standard, section 6.6, paragraph 9
An address constant is a null pointer, a pointer to an lvalue designating an object of static storage duration, or a pointer to a function designator; it shall be created explicitly using the unary & operator or an integer constant cast to pointer type, or implicitly by the use of an expression of array or function type. The array-subscript [] and member-access . and -> operators, the address & and indirection * unary operators, and pointer casts may be used in the creation of an address constant, but the value of an object shall not be accessed by use of these operators.
The macro is defined as
#define offsetof(type, member) ((size_t)&((type *)0)->member)
and the expression comprises the creation of an address constant.
Strictly speaking, the result is not an address constant, because it does not point to an object of static storage duration. Even so, the same principle applies: the value of an object is not accessed, so the integer constant cast to pointer type is never dereferenced.
Also, consider this quote from the C standard:
C standard, section 7.19, paragraph 3
The type and member designator shall be such that given
static type t;
then the expression &(t.member-designator) evaluates to an address constant. (If the
specified member is a bit-field, the behavior is undefined.)
A struct in C is a composite data type (or record) declaration that defines a physically grouped list of variables under one name in a block of memory, allowing the different variables to be accessed via a single pointer or by the struct declared name which returns the same address.
From the compiler perspective, the struct declared name is an address and the member designator is an offset from that address.