What does ((struct name *)0)->member) do in C? [duplicate] - c

This question already has answers here:
How does the C offsetof macro work? [duplicate]
(4 answers)
Closed 7 years ago.
What does ((struct name *)0)->member) do in C?
The actual C statement I came across was this:
(char *) &((struct name *)0)->member)

This is a trick for getting the offset of a struct member named member. The idea is to have the compiler compute the address of member on the assumption that the structure itself is located at address zero.
Using offsetof offers a more readable alternative to this syntax:
size_t offset = offsetof(struct name, member);

(struct name *)0 is casting 0 to pointer to struct name.
&((struct name *)0)->member) is getting the address of member member.
(char *) &((struct name *)0)->member) is casting that address to char *.
In case anyone thinks that the above expression dereferences a NULL pointer, note that nothing is read through the pointer here. The expression only computes the address of the member named member.

(struct name*)0 gives you a struct pointer.
((struct name*)0)->member gives you the member of the struct where the pointer points to.
Adding & gives you the address of that member, and lastly
(char *) is to cast the obtained address to a char pointer.

On many compilers, the above expression will yield a char* which, while it isn't a valid pointer, has one or both of the following properties:
Casting the pointer directly to an integer type will yield the displacement of the indicated member within the structure.
Subtracting (char*)0 from the pointer will yield the displacement of the indicated member within the structure.
Note that the C Standard imposes no requirements with regard to what may happen if code forms an invalid pointer value via the above means or any other, even if the code makes no attempt to dereference the pointer. While many compilers produce pointers with the indicated qualities, such behavior is not universal. Even on platforms where there exists a clear natural relationship between pointers and integers, some compiler vendors may decide that, rather than having programs behave as implied by such a relationship, it would be more "efficient" to have the compiler assume that a program will never receive inputs that would cause the generation of invalid pointers and, consequently, that any code which would only be relevant if such inputs were received should be omitted.

Related

Determining offset of a struct member via casting NULL to a struct pointer [duplicate]

This question already has answers here:
Does &((struct name *)NULL -> b) cause undefined behaviour in C11?
(6 answers)
Getting the offset of a variable inside a struct is based on the NULL pointer, but why?
(5 answers)
Closed 6 months ago.
Consider the following code:
#include <inttypes.h> // intptr_t, PRIiPTR
#include <stddef.h>   // NULL
#include <stdio.h>    // printf
struct test {
    int a;
    long b;
};
int main(void){
    intptr_t ptr = (intptr_t) &(((struct test *)NULL)->b);
    printf("%"PRIiPTR"\n", ptr); //prints 8 on my platform
    //...
}
I've been using this sort of construction for a pretty long time to determine the offset of a struct member, but I now question whether this construction is well defined according to the Standard.
The thing that's not clear to me is why is it legitimate to perform indirection on a NULL pointer with later taking address?
This is not valid code, as it dereferences a NULL pointer and attempts to obtain an lvalue for a non-existent object.
There is a language construct that gives the offset of a struct member, namely the offsetof macro.
Although it is possible that such a macro could expand to an expression similar to what you have, that macro is considered part of the implementation, and such code can do things that "user" code is not allowed to do.

what does this line of code "#define LIBINJECTION_SQLI_TOKEN_SIZE sizeof(((stoken_t*)(0))->val)" do?

In particular I'd like to know what ->val does in the
sizeof(((stoken_t*)(0))->val)
and what the (stoken_t*)(0) pointer does, in particular what the (0) means?
I hope I have formulated my question clearly enough.
This is a way of accessing a member of a structure at compile time, without needing to have a variable defined of that structure type.
Casting the value 0 to (stoken_t*) emulates a pointer of that structure type, allowing you to use the -> operator on it, just as you would on a pointer variable of that type.
To add, as sizeof is a compile time operator, the expression is not evaluated at run-time, so unlike other cases, here there is no null-pointer dereference happening.
It is analogous to something like
stoken_t * ptr;
sizeof(ptr->val);
In detail:
(stoken_t*)(0) simply casts 0 to a pointer to stoken_t. ((stoken_t*)(0))->val then designates the val member of stoken_t, and sizeof returns the number of bytes that member's type occupies in memory. In short, this expression finds the size of a struct member at compile time without the need for an instance of that struct type.

Pointer manipulation in C

I am new to C programming. While solving one of my class assignments, I came across the following code snippet. I did not understand what it does.
Can any one tell me what is the meaning of following C syntax,
((char *)0 + 1) or ((int *)0 + 1)
The (char *) 0 part creates a pointer to character data, at address 0. This address is then incremented by one, triggering undefined behavior since pointers to address 0 (also known as NULL in C) cannot be used in pointer arithmetic. The second part does the same but for pointer to integer data.
If the compiler simply treats NULL as the address (which is common but, again, not required which is why this is undefined behavior) the resulting addresses, if viewed numerically, will not be the same, since pointer arithmetic in C is done in terms of the type being pointed at, and typically sizeof (int) > sizeof (char).
Can any one tell me what is the meaning of following C syntax,
((char *)0 + 1) or ((int *)0 + 1)
Nothing by the terms of the C standard, because it's not defined. This code invokes undefined behavior on part of the C compiler. Let me explain:
In C every pointer may either point to some object of the type the pointer dereferences to or it may be 0, which is then called a null pointer. Null pointers cannot be used in pointer arithmetic.
Note that the actual representation of a null pointer on the metal, i.e. the bits the variable has on the machine, may be something other than all zeros. But on the C side of things a null pointer always compares equal to the integer constant 0. Moreover, null pointers of different types compare equal once converted to a common type. Comparing pointers to incompatible types without a cast, however, is not something the standard gives a meaning to. Also, you can convert any object pointer to void* and back. Where the implementation provides uintptr_t, you can also convert a void* to that type and back. Converting a pointer to type A into a pointer to type B is itself allowed when the result is correctly aligned, but accessing an object through a pointer of the wrong type (other than a character type) invokes undefined behavior.
The special function malloc is defined by the C language specification to return a void* pointer that can be converted to any object pointer type, though. But say you use it to allocate some memory, store one type in it and later read it through a pointer to an incompatible type: this may again invoke undefined behavior.
Now you may ask: "What is undefined behavior?". Well, it just means that the language standard doesn't define it and an implementer may go about it in any way seen fit. On most platforms writing something like ((char*)0 + 1) may do something naively expected (creating a pointer, pointing to address 1), but it may as well make the compiler build an artificial intelligence that at first chases you down the street, then gains consciousness and finally takes over the world, turning humans into batteries. So be careful about what you do ;)
In C you sometimes have to tell the compiler which type you mean to use; this is called "casting".
For example:
char *c; //define c as "char pointer" (pointer to char)
c = ((char *)0 + 1); //this casts 0 to the "char pointer" type and then adds 1 to the resulting pointer

Extraneous (void *) in: type_a sample; type_b *sample_b = (type_b *) ((void*) &sample);

I was reading this thread: Typecasting variable with another typedef
type_b *sample_b = (type_b *) ((void *) &sample);
Isn't (void *) extraneous? &sample would return a pointer of type: type_a, which can be cast directly to (type_b *). Why the extra (void *)? I feel it's wrong, but am not confident enough in my C - hence the extra verification.
GCC can optimize code on the assumption that two pointers to incompatible types do not point to the same memory location (strict aliasing). Using both pointers to access the same value may then produce a warning about an aliasing violation ("type punning").
Sometimes, putting a (void*) cast in between, before casting to the other pointer type, silences false-positive warnings in cases where you can legally do such an overlapping access.
Casting the pointer to void * hides the pointer's true type from the compiler, so you can cast it to whatever different type you want and the compiler will not have enough information to know if it's valid or not, thus it will allow it.
Without doing this, the compiler may, at a minimum, complain that you are trying to cast a pointer from one type to another incompatible type.
IMPORTANT NOTE:
The original poster of the question you referenced is trying to do something dangerous. The assumption is that the layout of the structures in memory will be the same, allowing them to access the two individual chars as if they were a single array of two chars. This may be true in theory and will probably work in a small test program, but if this is done in a piece of production code, it will absolutely end in tears and gnashing of teeth at 3am.

Why does this implementation of offsetof() work?

In ANSI C, offsetof is defined as below.
#define offsetof(st, m) \
((size_t) ( (char *)&((st *)(0))->m - (char *)0 ))
Why won't this throw a segmentation fault since we are dereferencing a NULL pointer? Or is this some sort of compiler hack where it sees that only address of the offset is taken out, so it statically calculates the address without actually dereferencing it? Also is this code portable?
At no point in the above code is anything dereferenced. A dereference occurs when the * or -> is used on an address value to find the referenced value. The only use of * above is in a type declaration for the purpose of casting.
The -> operator is used above but it's not used to access the value. Instead it's used to grab the address of the value. Here is a non-macro code sample that should make it a bit clearer:
SomeType *pSomeType = GetTheValue();
int* pMember = &(pSomeType->SomeIntMember);
The second line does not actually cause a dereference (implementation dependent). It simply returns the address of SomeIntMember within the pSomeType value.
What you see is a lot of casting between arbitrary types and char pointers. The reason for char is that it's one of the only types (perhaps the only type) in the C89 standard which has an explicit size. The size is 1. Because the size is one, subtracting two char pointers gives a distance in bytes, so the above code can do the evil magic of calculating the true offset of the value.
Although that is a typical implementation of offsetof, it is not mandated by the standard, which just says:
The following types and macros are defined in the standard header <stddef.h> [...]
offsetof(type,member-designator)
which expands to an integer constant expression that has type size_t, the value of
which is the offset in bytes, to the structure member (designated by member-designator),
from the beginning of its structure (designated by type). The type and member designator
shall be such that given
static type t;
then the expression &(t.member-designator) evaluates to an address constant. (If the specified member is a bit-field, the behavior is undefined.)
Read P J Plauger's "The Standard C Library" for a discussion of it and the other items in <stddef.h> which are all border-line features that could (should?) be in the language proper, and which might require special compiler support.
It's of historic interest only, but I used an early ANSI C compiler on 386/IX (see, I told you of historic interest, circa 1990) that crashed on that version of offsetof but worked when I revised it to:
#define offsetof(st, m) ((size_t)((char *)&((st *)(1024))->m - (char *)1024))
That was a compiler bug of sorts, not least because the header was distributed with the compiler and didn't work.
In ANSI C, offsetof is NOT defined like that. One of the reasons it's not defined like that is that some environments will indeed throw null pointer exceptions, or crash in other ways. Hence, ANSI C leaves the implementation of offsetof( ) open to compiler builders.
The code shown above is typical for compilers/environments that do not actively check for NULL pointers, but fail only when bytes are read from a NULL pointer.
To answer the last part of the question, the code is not portable.
The result of subtracting two pointers is defined and portable only if the two pointers point to objects in the same array or point to one past the last object of the array (7.6.2 Additive Operators, H&S Fifth Edition).
Listing 1: A representative set of offsetof() macro definitions
// Keil 8051 compiler
#define offsetof(s,m) (size_t)&(((s *)0)->m)
// Microsoft x86 compiler (version 7)
#define offsetof(s,m) (size_t)(unsigned long)&(((s *)0)->m)
// Diab Coldfire compiler
#define offsetof(s,memb) ((size_t)((char *)&((s *)0)->memb-(char *)0))
#include <stddef.h> /* offsetof, size_t */
#include <stdio.h>  /* printf */
typedef struct
{
    int i;
    float f;
    char c;
} SFOO;
int main(void)
{
    printf("Offset of 'f' is %zu\n", offsetof(SFOO, f));
    return 0;
}
The various operators within the macro are evaluated in an order such that the following steps are performed:
((s *)0) takes the integer zero and casts it as a pointer to s.
((s *)0)->m dereferences that pointer to point to structure member m.
&(((s *)0)->m) computes the address of m.
(size_t)&(((s *)0)->m) casts the result to an appropriate data type.
By definition, the structure itself resides at address 0. It follows that the address of the field pointed to (Step 3 above) must be the offset, in bytes, from the start of the structure.
It doesn't segfault because you're not dereferencing it. The pointer address is being used as a number that's subtracted from another number, not used to address memory operations.
It calculates the offset of the member m relative to the start address of the representation of an object of type st.
((st *)(0)) refers to a NULL pointer of type st *.
&((st *)(0))->m refers to the address of member m in this object. Since the start address of this object is 0 (NULL), the address of member m is exactly the offset.
The char * conversion and the subtraction calculate the offset in bytes. According to the rules for pointer operations, when you subtract two pointers of type T *, the result is the number of objects of type T between the two addresses held by the operands.
Quoting the C standard for the offsetof macro:
C standard, section 6.6, paragraph 9
An address constant is a null pointer, a pointer to an lvalue designating an object of static storage duration, or a pointer to a function designator; it shall be created explicitly using the unary & operator or an integer constant cast to pointer type, or implicitly by the use of an expression of array or function type. The array-subscript [] and member-access . and -> operators, the address & and indirection * unary operators, and pointer casts may be used in the creation of an address constant, but the value of an object shall not be accessed by use of these operators.
The macro is defined as
#define offsetof(type, member) ((size_t)&((type *)0)->member)
and the expression comprises the creation of an address constant.
Strictly speaking, though, the result is not an address constant, because it does not point to an object of static storage duration. It is still understood that the value of an object shall not be accessed, so the integer constant cast to pointer type will not be dereferenced.
Also, consider this quote from the C standard:
C standard, section 7.19, paragraph 3
The type and member designator shall be such that given
static type t;
then the expression &(t.member-designator) evaluates to an address constant. (If the
specified member is a bit-field, the behavior is undefined.)
A struct in C is a composite data type (or record) declaration that defines a physically grouped list of variables under one name in a block of memory, allowing the different variables to be accessed via a single pointer or by the struct declared name which returns the same address.
From the compiler perspective, the struct declared name is an address and the member designator is an offset from that address.
