Address of members of a struct via NULL pointer

Address of members of a struct via NULL pointer - c

Why the following expression is not a (null pointer) runtime error?
typedef struct{
int a,b,c;
} st;
st obj={10,12,15};
st *ptr1=&obj;
st *ptr2=NULL;
printf("%d",*(int *)((char*)ptr1+(int)&ptr2->b));

Because you are performing pointer arithmetic on a NULL pointer which invokes undefined behavior - and that's not required to crash.

actually, in GNU C, &ptr2->b while st *ptr2=NULL; produce the data member 'b' 's byte offset in the struct, it is 4 here

Unfortunately, dereferencing a null pointer doesn't always crash! Sometimes it just happily moves ahead, accessing invalid memory!
You're probably seeing the output value '12'. What the compiler is doing when you dereference a pointer is adding an offset to get the address of member.
ints are typically 4 bytes long, so if ptr2 had an address of, say, 10, then ptr2->a must also be at address 10, ptr2->b must be at 14, and ptr2->c must be at address 18.
Since ptr2 is NULL, it's adding 4 to get to member b, and NULL+4=4. Then you're adding 4 to ptr1 and dereferencing that, which gets you ptr1's b member!
But this is not portable, and you shouldn't do this!

This is perfectly legal usage because the "take address of"-operator actually means: "Do not load/store the value of the variable, just give me the address you have computed to access it." It stops the access before the actual undefined behaviour is invoked.
This is such a typical idiom in programming that it was actually encapsulated in the offsetof() macro (which is used quite extensively in linux) and has since made its way into the compilers themselves to provide more meaningful error messages. See http://en.wikipedia.org/wiki/Offsetof for more details, even though I disagree with this article on the legality of the statement.

Related

Getting the offset of a variable inside a struct is based on the NULL pointer, but why?

I found a trick on a youtube video explaining how you can get the offset of a struct member by using a NULL pointer. I understand the code snippit below (the casts, the ampersand, and so on), but I do not understand why this works with the NULL pointer. I thought that the NULL pointer could not point to anything. So I cannot mentally visualize how it works. Second, the NULL pointer is not always represented by the compiler as being 0, somtimes it is a non-zero value. But than how could this piece of code work correctly ? Or wouldn't it work correctly anymore ?
#include <stdio.h>
int main(void)
{
/* Getting the offset of a variable inside a struct */
typedef struct {
int a;
char b[23];
float c;
} MyStructType;
unsigned offset = (unsigned)(&((MyStructType * )NULL)->c);
printf("offset = %u\n", offset);
return 0;
}

I found a trick on a youtube video explaining how you can get the
offset of a struct member by using a NULL pointer.
Well, at least you came here to ask about the random Internet advice you turned up. We're an Internet resource ourselves, of course, but I like to think that our structure and reputation gives you a basis for estimating the reliability of what we have to say.
I understand the
code snippit below (the casts, the ampersand, and so on), but I do not
understand why this works with the NULL pointer. I thought that the
NULL pointer could not point to anything.
Yes, from the perspective of C semantics, a null pointer definitely does not point to anything, and NULL is a null pointer constant.
So I cannot mentally
visualize how it works.
The (flawed) idea is that
NULL is equivalent to a pointer to address 0 in a flat address space (unsafe assumption);
((MyStructType * )NULL)->c designates the member c of an altogether hypothetical object of type MyStructType residing at that address (not supported by the standard);
applying the & operator yields the address that such a member would have if it in fact existed (not supported by the standard); and
converting the resulting address to an integer yields an address in the assumed flat address space, expressed in units the size of a C char (in no way guaranteed);
so that the resulting integer simultaneously represents both an absolute address and an offset (follows from the previous assumptions, because the supposed base address of the hypothetical structure is 0).
Second, the NULL pointer is not always
represented by the compiler as being 0, somtimes it is a non-zero
value.
Quite right, that is one of the flaws in the scheme presented.
But than how could this piece of code work correctly ? Or
wouldn't it work correctly anymore ?
Although the Standard provides no basis to justify relying on the code to behave as advertised, that does not mean that it must necessarily fail. C implementations do need to be internally consistent about how they represent null pointers, and -- to a certain degree -- about how they convert between pointers and integer. It turns out to be fairly common that the code's assumptions about those things are in fact satisfied by implementations.
So in practice, the code does work with many C implementations. But it systematically produces the wrong answer with some others, and there may be some in which it produces the right answer some appreciable fraction of the time, but the wrong answer the rest of the time.

Note that this code is actually undefined behaviour. Dereferencing a NULL pointer is never allowed, even if no value is accessed, only the address (this was a root cause for a linux kernel exploit)
Use offsetof instead for a save alternative.
As to why it seems works with a NULL pointer: it assumes that NULL is 0. Basically you could use any pointer and calculate:
MyStructType t;
unsigned off = (unsigned)(&(&t)->c) - (unsigned)&t;
if &t == 0, this becomes:
unsigned off = (unsigned)(&(0)->c) - 0;
Substracting 0 is a no-op

This code is platform specific. This code might cause undefined behaviour on one platform and it might work on others.
That's why the C standard requires every library to implement the offsetof macro which could expand to code like derefering the NULL pointer, at least you can be sure the code will not crash on any platform
typedef struct Struct
{
double d;
} Struct;
offsetof(Struct, d)

This question resembles me to something seen more than 30 years ago:
#define XtOffset(p_type,field) \
((Cardinal) (((char *) (&(((p_type)NULL)->field))) - ((char *) NULL)))
#ifdef offsetof
#define XtOffsetOf(s_type,field) offsetof(s_type,field)
#else
#define XtOffsetOf(s_type,field) XtOffset(s_type*,field)
#endif
from xorg-libXt/include/X11/Intrinsic.h X11R4.
They took into account that a NULL Pointer could be different to 0x0 and included that in the definition of the XtOffsetOf macro.

This is a dirty hack and might not necessarily work.
(MyStructType * )NULL creates a null pointer. Null pointer and null pointer constant are two different terms. NULL is guaranteed to be a null pointer constant equivalent to 0, but the obtained null pointer we get when casting it to another type can be any implementation-defined value.
So it happened to work by luck on your specific system, you could as well have gotten any strange value.
The offsetof macro has been standard C since 1989 so maybe your Youtube hacker is still stuck in the early 1980s.

Invalid pointer becoming valid again

int *p;
{
int x = 0;
p = &x;
}
// p is no longer valid
{
int x = 0;
if (&x == p) {
*p = 2; // Is this valid?
}
}
Accessing a pointer after the thing it points to has been freed is undefined behavior, but what happens if some later allocation happens in the same area, and you explicitly compare the old pointer to a pointer to the new thing? Would it have mattered if I cast &x and p to uintptr_t before comparing them?
(I know it's not guaranteed that the two x variables occupy the same spot. I have no reason to do this, but I can imagine, say, an algorithm where you intersect a set of pointers that might have been freed with a set of definitely valid pointers, removing the invalid pointers in the process. If a previously-invalidated pointer is equal to a known good pointer, I'm curious what would happen.)

By my understanding of the standard (6.2.4. (2))
The value of a pointer becomes indeterminate when the object it points to (or just past) reaches the end of its lifetime.
you have undefined behaviour when you compare
if (&x == p) {
as that meets these points listed in Annex J.2:
— The value of a pointer to an object whose lifetime has ended is used (6.2.4).
— The value of an object with automatic storage duration is used while it is indeterminate (6.2.4, 6.7.9, 6.8).

Okay, this seems to be interpreted as a two- make that three part question by some people.
First, there were concerns if using the pointer for a comparison is defined at all.
As is pointed out in the comments, the mere use of the pointer is UB, since $J.2: says use of pointer to object whose lifetime has ended is UB.
However, if that obstacle is passed (which is well in the range of UB, it can work after all and will on many platforms), here is what I found about the other concerns:
Given the pointers do compare equal, the code is valid:
C Standard, §6.5.3.2,4:
[...] If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.
Although a footnote at that location explicitly says. that the address of an object after the end of its lifetime is an invalid pointer value, this does not apply here, since the if makes sure the pointer's value is the address of x and thus is valid.
C++ Standard, §3.9.2,3:
If an object of type T is located at an address A, a pointer of type cv T* whose value is the address A is said to point to that object, regardless of how the value was obtained. [ Note: For instance, the address one past the end of an array (5.7) would be considered to point to an unrelated object of the array’s element type that might be located at that address.
Emphasis is mine.

It will probably work with most of the compilers but it still is undefined behavior. For the C language these x are two different objects, one has ended its lifetime, so you have UB.
More seriously, some compilers may decide to fool you in a different way than you expect.
The C standard says
Two pointers compare equal if and only if both are null pointers, both
are pointers to the same object (including a pointer to an object and
a subobject at its beginning) or function, both are pointers to one
past the last element of the same array object, or one is a pointer to
one past the end of one array object and the other is a pointer to the
start of a different array object that happens to immediately follow
the first array object in the address space.
Note in particular the phrase "both are pointers to the same object". In the sense of the standard the two "x"s are not the same object. They may happen to be realized in the same memory location, but this is to the discretion of the compiler. Since they are clearly two distinct objects, declared in different scopes the comparison should in fact never be true. So an optimizer might well cut away that branch completely.
Another aspect that has not yet been discussed of all that is that the validity of this depends on the "lifetime" of the objects and not the scope. If you'd add a possible jump into that scope
{
int x = 0;
p = &x;
BLURB: ;
}
...
if (...)
...
if (something) goto BLURB;
the lifetime would extend as long as the scope of the first x is reachable. Then everything is valid behavior, but still your test would always be false, and optimized out by a decent compiler.
From all that you see that you better leave it at argument for UB, and don't play such games in real code.

It would work, if by work you use a very liberal definition, roughly equivalent to that it would not crash.
However, it is a bad idea. I cannot imagine a single reason why it is easier to cross your fingers and hope that the two local variables are stored in the same memory address than it is to write p=&x again. If this is just an academic question, then yes it's valid C - but whether the if statement is true or not is not guaranteed to be consistent across platforms or even different programs.
Edit: To be clear, the undefined behavior is whether &x == p in the second block. The value of p will not change, it's still a pointer to that address, that address just doesn't belong to you anymore. Now the compiler might (probably will) put the second x at that same address (assuming there isn't any other intervening code). If that happens to be true, it's perfectly legal to dereference p just as you would &x, as long as it's type is a pointer to an int or something smaller. Just like it's legal to say p = 0x00000042; if (p == &x) {*p = whatever;}.

The behaviour is undefined. However, your question reminds me of another case where a somewhat similar concept was being employed. In the case alluded, there were these threads which would get different amounts of cpu times because of their priorities. So, thread 1 would get a little more time because thread 2 was waiting for I/O or something. Once its job was done, thread 1 would write values to the memory for the thread two to consume. This is not "sharing" the memory in a controlled way. It would write to the calling stack itself. Where variables in thread 2 would be allocated memory. Now, when thread 2 eventually got round to execution,all its declared variables would never have to be assigned values because the locations they were occupying had valid values. I don't know what they did if something went wrong in the process but this is one of the most hellish optimizations in C code I have ever witnessed.

Winner #2 in this undefined behavior contest is rather similar to your code:
#include <stdio.h>
#include <stdlib.h>
int main() {
int *p = (int*)malloc(sizeof(int));
int *q = (int*)realloc(p, sizeof(int));
*p = 1;
*q = 2;
if (p == q)
printf("%d %d\n", *p, *q);
}
According to the post:
Using a recent version of Clang (r160635 for x86-64 on Linux):
$ clang -O realloc.c ; ./a.out
1 2
This can only be explained if the Clang developers consider that this example, and yours, exhibit undefined behavior.

Put aside the fact if it is valid (and I'm convinced now that it's not, see Arne Mertz's answer) I still think that it's academic.
The algorithm you are thinking of would not produce very useful results, as you could only compare two pointers, but you have no chance to determine if these pointers point to the same kind of object or to something completely different. A pointer to a struct could now be the address of a single char for example.

Is it dangerous to use NULL pointer dereferencing at rvalue side ?

I have the following code.
int *a = NULL;
int b;
b = *a;
Is it undefined in any way?
I am trying to access the location pointed by a only for reading. What wrong it can do ?

Is it undefined in any way?
Yes, the behavior is undefined.
I am trying to access the location pointed by a only for reading.
The location pointed to does not exist. There's nothing there. To read something, there has to be a value to read, but a null pointer points... nowhere. It doesn't point at anything.

Implementation chooses the physical values of null-pointers specifically to make them "point" to address(es) that contains nothing of any value on the given platform. That means that there's nothing to see there. Moreover, on a modern virtual memory platform there might not even be a specific "there" associated with that address.
If you are doing it out of pure curiosity, it still won't work on a typical modern platform: that [virtual] memory region is typically "not yours", meaning that hardware/OS protection mechanism will prevent you from reading it (by crashing your program).
Meanwhile, the language states that any attempts to dereference a null-pointer leads to undefined behavior, regardless of what you are dereferencing it for.

Yes, it's totally undefined behaviour. You are dereferencing a null pointer. Anything could happen.

Don't dereference NULL pointers. It's like crossing the streams, but worse. I'm not even sure what you're trying to accomplish? Are you trying to initialize two pointers to NULL?

The result is undefined and will be different under different platforms.
NULL is equal to 0. Therefore saying int* a = NULL means that you make a pointer to address 0, which should contain an integer.
By dereferecing a by b = *a you instruct the program to go to the address 0 and get the data.
Now, depending on the platform, address 0 will most probably be inaccessible to the user code. If you are not running on top of an operating system (as the case is for example on a microcontroller environment), then address 0 is normally the beginning of the internal ROM. In other cases it can be the beginning of the stack etc.
In most of the cases you are going to get a null pointer exception when you dereference a null pointer.

You have two alarms that pop up here. NULL and all other forms of null pointers are strickly forbidden by the standard. Not because they are often realized internally by a bit pattern of all 0 for the representation of such a pointer but because that is the one-and-single reserved value for pointers that indicate that it isn't pointing anywhere. Don't do this.
Having a pointer point to the address that corresponds to the integer value 0 is a different thing. You could only enforce such a value and interpretation of a point by some trick, e.g
union over {
int *a;
uintptr_t b;
} ov;
ov.b = 0;
int c = *ov.a;
What then happens is completely depending on your platform. If 0 is a valid address in your address space this could work, it probably wouldn't. Don't do it unless you know better.

The state of a pointer after deallocation

What does the pointer of a dynamically allocated memory points to after calling the free() function.
Does the pointer points to NULL, or it still points to the same place it pointed before the deallocation.
does the implementation of free() has some kind of standard for this, or it's implemented differently in different platforms.
uint8_t * pointer = malloc(12);
printf("%p", pointer); // The current address the pointer points to
free (pointer);
printf("%p", pointer); // must it be NULL or the same value as before ?
Edit:
I know that the printf will produce the same results, I just want to know if I can count on that on different implementations.

The pointer value is not modified. You pass the pointer (memory address) by value to free(). The free() function does not have access to the pointer variable, so it cannot set it to NULL.
The two printf() calls should produce identical output.

According to the standard (6.2.4/2 of C99):
The value of a pointer becomes indeterminate when the object it points
to reaches the end of its lifetime.
In practice, all implementations I know of will print the same value twice for your example code. However, it is permitted that when you free the memory, the pointer value itself becomes a trap representation, and the implementation does something strange when you try to use the value of the pointer (even though you don't dereference it).
Supposing that an implementation wants to do something unusual, a hardware exception or program abort would be the most plausible I think. You probably have to imagine an implementation/hardware that does a lot of extra work, though, so that every time a pointer value is loaded into a register, it somehow checks whether the address is valid. That could be by checking it in the memory map (in which case I suppose my hypothetical implementation would only trap if the whole page containing the allocation has been released and unmapped), or some other means.

free() only deallocates the memory. The pointer is still pointing to the old location (dangling pointer), which you should manually set to NULL.
Setting the pointer to NULL is a good practice. As the memory location may be reused for other object, you may be able to access and modify data which doesn't belong to you. This is especially hard to debug, since it won't produce a crash, or produce crash at some point which is irrelevant. Setting to NULL will guarantee a re-producible crash if you ever access the non-existent object.

I tried this code in C++
#include<iostream>
using namespace std;
int main(void)
{
int *a=new int;
*a=5;
cout<<*a<<" "<<a<<"\n";
delete a;
*a=45;
cout<<*a<<" "<<a<<"\n";
}
The output was something like this
5 0x1066c20
45 0x1066c20
Running once more yielded a similar result
5 0x13e3c20
45 0x13e3c20
Hence it seems that in gcc the pointer still points to the same memory location after deallocation.
Moreover we can modify the value at that location.

Real thing about "->" and "."

I always wanted to know what is the real thing difference of how the compiler see a pointer to a struct (in C suppose) and a struct itself.
struct person p;
struct person *pp;
pp->age, I always imagine that the compiler does: "value of pp + offset of atribute "age" in the struct".
But what it does with person.p? It would be almost the same. For me "the programmer", p is not a memory address, its like "the structure itself", but of course this is not how the compiler deal with it.
My guess is it's more of a syntactic thing, and the compiler always does (&p)->age.
I'm correct?

p->q is essentially syntactic sugar for (*p).q in that it dereferences the pointer p and then goes to the proper field q within it. It saves typing for a very common case (pointers to structs).
In essence, -> does two deferences (pointer dereference, field dereference) while . only does one (field dereference).
Due to the multiple-dereference factor, -> can't be completely replaced with a static address by the compiler and will always include at least address computation (pointers can change dynamically at runtime, thus the locations will also change), whereas in some cases, . operations can be replaced by the compiler with an access to a fixed address (since the base struct's address can be fixed as well).

Updated (see comments):
You have the right idea, but there is an important difference for global and static variables only: when the compiler sees p.age for a global or static variable, it can replace it, at compile time, with the exact address of the age field within the struct.
In contrast, pp->age must be compiled as "value of pp + offset of age field", since the value of pp can change at runtime.

The two statements are not equivalent, even from the "compiler perspective". The statement p.age translates to the address of p + the offset of age, while pp->age translates to the address contained in pp + the offset of age.
The address of a variable and the address contained in a (pointer) variable are very different things.
Say the offset of age is 5. If p is a structure, its address might be 100, so p.age references address 105.
But if pp is a pointer to a structure, its address might be 100, but the value stored at address 100 is not the beginning of a person structure, it's a pointer. So the value at address 100 (the address contained in pp) might be, for example, 250. In that case, pp->age references address 255, not 105.

Since p is a local (automatic) variable, it is stored in the stack. Therefore the compiler accesses it in terms of offset with regard to the stack pointer (SP) or frame pointer (FP or BP, in architectures where it exists). In contrast, *p refers to a memory address [usually] allocated in the heap, so the stack registers are not used.

This is a question I've always asked myself.
v.x, the member operator, is valid only for structs.
v->x, the member of pointer operator, is valid only for struct pointers.
So why have two different operators, since only one is needed? For example, only the . operator could be used; the compiler always knows the type of v, so it knows what to do: v.x if v is a struct, (*v).x if v is a struct pointer.
I have three theories:
temporary shortsightedness by K&R (which theory I'd like to be false)
making the job easier for the compiler (a practical theory, given the conception time of C :)
readability (which theory I prefer)
Unfortunately, I don't know which one (if any) is true.

In both cases the structure and its members are addressed by
address(person) + offset(age)
Using p with a struct stored in the stack memory gives the compiler more options to optimize memory usage. It could store the age only, instead of the whole struct if nothing else is used - this makes addressing with the above function fail (I think reading the address of a struct stops this optimization).
A struct on the stack may have no memory address at all. If the struct is small enough and only lives a short time it can be mapped to some of the processors registers (same as for the optimization above for reading the address).
The short answer: when the compiler does not optimize you are right. As soon as the compiler starts optimizing only what the c standard specifies is guaranteed.
Edit: Removed flawed stack/heap location for "pp->" since the pointed to struct can be on both heap and stack.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight