How are void pointers implemented? - c

I was wondering how void pointers are implemented. I tried to find it out on Godbolt with x86-64 (you can see it here), but it didn't reveal anything. How are void pointers implemented?
Edit:
This is the code I used:
int main() {
    int volatile y = 123;
    void* volatile x = &y;
}
All I was trying to see here is what x would look like. I put volatile so that gcc wouldn't eliminate it as dead code.

Generally speaking, all pointers on an x86-64 processor are simply 8-byte values containing some memory address. Only the compiler cares about what they point to.

From my perspective, the asm clearly reveals that void* uses the same object-representation as other pointers, such as int*, so casting to and from void* is a no-op that just keeps the compiler's type system happy.
Everything in asm is just bytes that you can do integer operations on if you want. Pointers are just integers that you can dereference. e.g. in x86-64 asm, +1 to a uintptr_t is no different than +1 to a char*, because C defines sizeof(char) as 1, and x86-64 is byte addressable so every integer increment to a pointer is a new byte. (So C implementations on x86-64 use CHAR_BIT=8).
void* is just a type that can hold any pointer value, but that doesn't let you do math on it with +1 or whatever. OTOH in C you don't need to cast to assign other pointer types to/from it.
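To make the "no cast needed" point concrete, here is a minimal sketch. The helper name through_void is made up for this illustration. The C standard guarantees that an object pointer converted to void* and back compares equal to the original, which is what the assert checks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sends an int* through void* and back again. */
int *through_void(int *p)
{
    void *generic = p;    /* implicit conversion to void*, no cast needed in C */
    int *back = generic;  /* implicit conversion back to int* */

    assert(back == p);    /* guaranteed by the standard for object pointers */
    return back;
}
```

On x86-64 a compiler would typically turn the body into a plain register move, since both types share one representation there.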
All of this follows from x86-64 having a flat memory model, not seg:off or something. And that function pointers have the same representation as data pointers. On hypothetical machines (or C implementations) you might see something special for void*.
e.g. a machine that emulated CHAR_BIT=8 on top of word-addressable memory might have sizeof(char*) > sizeof(int*), but void* would have to be wide enough to hold any possible pointer and might even have a different format than either. (Having a narrow char that you can't actually store in a thread-safe way (without a non-atomic RMW of the containing word) wouldn't be viable for C11.)
Semi-related: Does C have an equivalent of std::less from C++? talks about how C pointers might work in other hypothetical non-simple implementations. A seg:off pointer model wouldn't make void* different from int* though.

Adding to the answer above, memory has no type whatsoever. And generally speaking, every pointer type is basically an unsigned long integer value holding some address in memory.
Originally C language didn't have any void*, char* was the generic pointer type. And to quote from C: A Reference Manual:
the problem with this use of char* is that the compiler cannot check that
programmers always convert the pointer type properly

Related

Why is it possible to store the information content of an int pointer to an int variable in c?

Let us consider the following piece of code:
#include <stdio.h>
int main()
{
    int v1, v2, *p;
    p = &v1;
    v2 = &v1;
    printf("%d\t%d\n",p,v2);
    printf("%d\t%d\n",sizeof(v2),sizeof(p));
    return 0;
}
We can see, as expected, that the v2 variable (int) occupies 4 bytes and that the p variable (int pointer) occupies 8 bytes.
So, if a pointer occupies more than 4 bytes of memory, why we can store its content in an int variable?
In the underlying implementation, does the pointer variables store only the memory address of another variable, or it stores something else?
We can see, as expected, that the v2 variable (int) occupies 4 bytes
and that the p variable (int pointer) occupies 8 bytes.
I'm not sure what exactly the source of your expectation is there. The C language does not specify the sizes of ints or pointers. Its requirements on the range of representable values of type int afford int size as small as two 8-bit bytes, and historically, that was once a relatively common size for int. Some implementations these days have larger ints (and maybe also larger char, which is the unit of measure for sizeof!).
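As a small sketch of the point about units: sizeof counts in chars, and CHAR_BIT says how many bits a char has, so the storage width of int in bits is their product. The helper name int_storage_bits is an invention for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* The standard requires at least 8 bits per char. */
static_assert(CHAR_BIT >= 8, "C requires CHAR_BIT >= 8");

/* Hypothetical helper: storage size of int in bits on this implementation. */
size_t int_storage_bits(void)
{
    /* sizeof is measured in chars, so multiply by the bits per char. */
    return sizeof(int) * CHAR_BIT;
}
```

The only portable lower bound is 16 bits, which follows from the required range of int; many current desktop implementations would report 32.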
I suppose that your point here is that in the implementation tested, the size of int is smaller than the size of int *. Fair enough.
So, if a pointer occupies more than 4 bytes of memory, why we can
store its content in an int variable?
Who says the code stores the pointer's (entire) content in the int? It converts the pointer to an int,* but that does not imply that the result contains enough information to recover the original pointer value.
Exactly the same applies to converting a double to an int or an int to an unsigned char (for example). Those assignments are allowed without explicit type conversion, but they are not necessarily value-preserving.
Perhaps your confusion is reflected in the word "content". Assignment does not store the representation of the right-hand side to the left-hand object. It converts the value, if necessary, to the target object's type, and stores the result.
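The following sketch illustrates that assignment converts values rather than copying representations. The helper names to_int and to_uchar are made up for this example. The result for 300 depends on UCHAR_MAX, which is 255 on implementations with 8-bit char.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: converts a double to int by plain assignment. */
int to_int(double d)
{
    /* Out-of-range values would be undefined behavior, so check first. */
    assert(d > (double)INT_MIN - 1.0 && d < (double)INT_MAX + 1.0);

    int i = d;            /* the fractional part is discarded */
    return i;
}

/* Hypothetical helper: converts an int to unsigned char by plain assignment. */
unsigned char to_uchar(int v)
{
    unsigned char c = v;  /* value is reduced modulo UCHAR_MAX + 1 */
    return c;
}
```

Neither function needs a cast, and neither one preserves every input value, which is the same situation as the pointer-to-int assignment in the question.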
In the underlying implementation, does the pointer variables store
only the memory address of another variable, or it stores something
else?
Implementations can and have varied, and so too the meaning of "address" for different machines. But most commonly these days, pointers are represented as binary numbers designating locations in a flat address space.
But that's not really relevant. C specifies that pointers can be converted to integers and vice versa. It also provides integer types intptr_t and uintptr_t (in stdint.h) that support full-fidelity round trip void * to integer to void * conversion. Pointer representation is irrelevant to all that. It is the implementation's responsibility to implement the types and conversions involved so that they behave as required, and there is more than one way to do that.
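A minimal sketch of that round trip, assuming the implementation provides uintptr_t (the type is optional in the standard). The helper name through_uintptr is made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: converts a void* to uintptr_t and back again. */
void *through_uintptr(void *p)
{
    uintptr_t as_integer = (uintptr_t)p;  /* explicit cast is required */
    void *back = (void *)as_integer;      /* and again on the way back */

    assert(back == p);  /* the round trip is guaranteed to compare equal */
    return back;
}
```

Nothing here says what the integer value looks like; only the round trip is specified.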
*C actually requires an explicit conversion -- that is, a typecast -- between pointers and integers. The language specification does not define the meaning of the cast-less assignment in the example code, but some compilers do accept that and perform the needed conversion implicitly. My remarks assume such an implementation.
There is always a warning, see below.
main.c: In function ‘main’:
main.c:6:8: warning: assignment to ‘int’ from ‘int *’ makes integer from pointer without a cast [-Wint-conversion]
v2 = &v1;
main.c: In function ‘main’:
main.c:6:10: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
v2 = (int) &v1;
In the first case, just setting an integer value to a pointer value is not appropriate, because it is not a compatible type.
In the second case, with a cast of the pointer to an integer, the compiler recognizes the problem of the different sizes, which means v2 can not completely hold (int) &v1;
Conclusion: Both cases are "bad" in terms of creating an undesired behaviour.
About your question "So, if a pointer occupies more than 4 bytes of memory, why we can store its content in an int variable?" - It can NOT completely be stored in an int variable.
About your question "In the underlying implementation, does the pointer variables store only the memory address of another variable, or it stores something else?" - A pointer just points to an address. (It could be the address of another variable or not. It does not matter. It just points to an address.)
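For comparison, here is a sketch of how the pointer and the sizes could be formatted without any pointer-to-int conversion. This is not the asker's program. The helper name describe_pointer and the buffer-based interface are assumptions made for this example. It relies on %p for pointers (which expects a void*) and %zu for size_t values.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes "<pointer>\t<sizeof int>\t<sizeof int*>" to buf. */
int describe_pointer(char *buf, size_t buf_size, int *p)
{
    assert(buf != NULL && buf_size > 0);

    /* %p expects void*, and sizeof yields size_t, which %zu prints. */
    return snprintf(buf, buf_size, "%p\t%zu\t%zu",
                    (void *)p, sizeof(int), sizeof p);
}
```

The exact text produced by %p is implementation-defined, so the output should be treated as diagnostic only.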
The key to understanding what's going on here is that C is an abstraction layer on top of the underlying ISA. Most architectures have little more than registers and memory addresses[1] to work with, all of which are of a fixed size. When manipulating "variables", you're really just expressing your intent which the compiler translates into more concrete instructions.
On x86_64, a common architecture, an int is in actuality either a portion of a 64-bit register, or it's a 4-byte location in memory that's aligned on a 4-byte boundary. An int* is a 64-bit value, or 8-byte location in memory with corresponding alignment constraints.
Putting an int* value into a suitably sized variable, such as uint64_t, is allowed. Putting that value back into a pointer and dereferencing that pointer may not be permitted; it depends on your architecture.
From the programmer's perspective a pointer is just 64 bits of data. From the CPU's perspective it may contain more than that, with modern architectures having things like internal "Pointer Authentication Codes" (PACs) that ensure pointers cannot be injected from external sources. It gets quite a bit more complicated under the hood.
In general it's best to treat pointers as opaque, that is their actual value is as good as random and irrelevant to the internal operation of your program. It's only when you're doing deeper analysis at the architectural level with sufficiently robust profiling tools that the actual internals of the pointer can be informative or relevant.
There are several well-defined operations you can do on pointers, like p[n] to access specific offsets within the bounds of a structure or allocation, but outside of that you're pretty limited in what you can do, or even infer. Remember that modern CPUs and operating systems use virtual memory, so pointer addresses are "fake" and don't represent where they are in physical memory. On top of that, many systems deliberately randomize the address-space layout (ASLR) to make addresses harder to guess.
[1] This disregards VLIW, SIMD, and other extensions which are not so simple.
So, if a pointer occupies more than 4 bytes of memory, why we can store its content in an int variable?
You cannot, indeed the code you post is not legal.
#include <stdio.h>
int main()
{
int v1, v2, *p;
this declares two int variables and a pointer to int called p.
p = &v1;
this is legal, as you assign to p the address of the integer variable v1.
v2 = &v1; /* INCORRECT!!! */
this is not. It assigns to an int variable the address of another variable (which is a pointer, and as you rightly say, that is not possible). The most probable intention of the code writer was:
v2 = *p;
which assigns to v2 the integer value stored at the address pointed to by p (p points to v1, so it assigns to v2 the value stored in v1).

C data type sizes intmax_t vs any other integer, void * vs any other pointer

Is it true that:
No data pointer is larger than sizeof(void *)?
No data pointer is larger than sizeof(char *)?
No integer is larger than sizeof(intmax_t)?
Neither float nor double are larger than sizeof(long double)?
I know that intmax_t can store any value any other signed integer can store, but is it required that the size is also the largest size of any integer? Or is it possible that some other integer uses so many more padding bits than intmax_t that the size of that integer becomes larger than intmax_t even when this integer can't hold any value that intmax_t can't hold?
Similarly for pointers, I know that any other data pointer can be converted to char * and void * and back again without losing information, but does that mean the size of char * and void * has to be the largest possible size of all data pointers?
I ask this question because I have a function in a library that converts a string to a different type, which is indicated by an integer. To test whether this conversion can be done for a given string (i.e. the format is correct), there is a function that tests this for different data types. That function should reserve enough memory for all possible types, do the conversion, check for errors, and free the memory again. Is it enough to make sure it is large enough for intmax_t to cover all integer types?
There's an intptr_t now that's guaranteed to work; however the platforms of years past where intptr_t would need to be larger than ptrdiff_t don't have intptr_t because they're frozen in time, and the native compilers for the target platforms just don't have it. If you have a modern compiler such as OpenWatcom targeting the old architectures it would work.
While they did fix this stuff for modern compilers cross-compiling to embedded CPUs, you end up with other aberrations that could exist instead that are just as bad and equally hard to test for. The compilers I've had to deal with in the embedded world had some real nasties; if these are typical, trying to make a platform-neutral library that doesn't assume flat architecture is fraught with peril:
NULL is not 0. This means that memset() doesn't initialize pointers in structures to NULL and neither does calloc(); nor are they NULL when declared as static unless explicitly initialized to NULL.
sizeof(char *) < sizeof(const char *) and sizeof(void *) < sizeof(const void *); this particular compiler didn't give an intptr_t and ptrdiff_t could not contain the result of arbitrary subtraction of two pointers in the same character array (but was avoidable by making the character array no bigger than PTRDIFF_MAX).
Compiler didn't implement C99 and didn't have an intptr_t and ptrdiff_t was too small to hold a pointer.
free() didn't do anything
The compiler statically removed null pointer tests where it could prove the pointer pointed to a struct, but NULL was a possible address of a global variable.
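The first item in that list suggests a defensive habit: assign null pointers explicitly instead of relying on all-bits-zero. A minimal sketch follows. The struct node type and the node_init helper are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct for illustration. */
struct node {
    struct node *next;
    void *payload;
    int value;
};

/* Hypothetical helper: initializes every member by assignment. */
void node_init(struct node *n)
{
    assert(n != NULL);

    /* Assigning NULL lets the compiler pick the right null representation;
       memset(n, 0, sizeof *n) would only write all-bits-zero. */
    n->next = NULL;
    n->payload = NULL;
    n->value = 0;
}
```

On a conforming compiler, an initializer such as `struct node n = {0};` should also produce null pointers, though the list above shows that not every embedded compiler can be trusted on such details.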
I've never tried to make a nontrivial library work in the embedded world and certainly never one that messed with pointers like this.
TL;DR All your four assertions ought to be true, but when put to the test by fire where it matters you end up dealing with ugliness.
It occurs to me if you're converting between pointer and string, sprintf and sscanf have format specifiers that can do this.
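A sketch of that idea, using the %p conversion in both directions. The helper name through_text and the 64-byte buffer size are assumptions made for this example. The standard says that a pointer printed with %p and read back with %p during the same program execution compares equal to the original.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: prints p as text, parses the text, returns the result. */
void *through_text(void *p)
{
    char text[64];        /* assumed to be large enough for a %p rendering */
    void *parsed = NULL;

    int written = snprintf(text, sizeof text, "%p", p);
    assert(written > 0 && (size_t)written < sizeof text);

    int matched = sscanf(text, "%p", &parsed);
    assert(matched == 1);

    return parsed;
}
```

The textual format itself is implementation-defined, so such strings are not portable between programs or platforms.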

C - Why cast to uintptr_t vs char* when doing pointer arithmetic

I am working on a program where I have to read and modify the memory of a target process.
So far I am using void* for storing addresses and casting those to char* if I need to change them (add an offset or modify them in general).
I have heard of the uintptr_t type defined in stdint.h, but I don't see the advantage of using it for pointer arithmetic over the char* conversion (which seems more C89-friendly, at least to me).
So my question: Which of these two methods should I use for pointer arithmetic? Should I even consider using uintptr_t over char* in any case?
EDIT 1
Basically I just need to know what the comparisons below yield, where 0x00F00BAA is a hard-coded memory address in the target process:
void* x = (void*)0x00F00BAA;
char* y = (void*)0x00F00BAA;
x = (uintptr_t)x + 0x123;
y = (char*)y + 0x123;
x == y?
x == (void*)0x00F00CCD?
y == (void*)0x00F00CCD?
In comments user R.. points out that the following is likely incorrect if the addresses the code is dealing with are not valid within the current process. I've asked the OP for clarification.
Do not use uintptr_t for pointer arithmetic if you care about the portability of your code. uintptr_t is an integer type. Any arithmetic operations on it are integer arithmetic, not pointer arithmetic.
If you have a void* value and you want to add a byte offset to it, casting to char* is the correct approach.
It's likely that arithmetic on uintptr_t values will work the same way as char* arithmetic, but it absolutely is not guaranteed. The only guarantee that the C standard provides is that you can convert a void* value to uintptr_t and back again, and the result will compare equal to the original pointer value.
And the standard doesn't guarantee that uintptr_t exists. If there is no integer type wide enough to hold a converted pointer value without loss of information, the implementation just won't define uintptr_t.
I've actually worked on systems (Cray vector machines) where arithmetic on uintptr_t wouldn't necessarily work. The hardware had 64-bit words, with a machine address containing the address of a word. The Unix-like OS needed to support 8-bit bytes, so byte pointers (void*, char*) contained a word address with a 3-bit offset stored in the otherwise unused high-order 3 bits of the 64-bit word. Pointer/integer conversions simply copied the representation. The result was that adding 1 to a char* pointer would cause it to point to the next byte (with the offset handled in software), but converting to uintptr_t and adding 1 would cause it to point to the next word.
Bottom line: If you need pointer arithmetic, use pointer arithmetic. That's what it's for.
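As a minimal sketch of that advice, here is a byte offset applied through a character pointer. The helper name add_byte_offset is made up for this example. It assumes base points into an object of the current process and that the result stays inside that object or one past its end.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns base advanced by offset bytes. */
void *add_byte_offset(void *base, size_t offset)
{
    assert(base != NULL);

    /* The arithmetic is done on unsigned char*, so one step is one byte. */
    return (unsigned char *)base + offset;
}
```

For example, `add_byte_offset(arr, 2 * sizeof(int))` on an int array arr yields the address of arr[2].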
(Incidentally, gcc has an extension that permits pointer arithmetic on void*. Don't use it in portable code. It also causes some odd side effects, like sizeof (void) == 1.)

C Pointers about the different data type

A pointer always stores an integer value, i.e. an address, so why do we need to declare pointers with different data types?
Like
int a=3,*p=&a;
char c=r,*cha=&r;
why can't we do like
int *c;
char r=a;
c=&r;
Essentially it's because
Pointer arithmetic would not work if the pointer types were not explicit.
The member access operator -> would not work for struct pointer types.
The alignment requirements for types can differ. It might be possible to store a char at a location where it's not possible to store an int.
The sizes of pointers are not guaranteed to be the same by the C standard: i.e. sizeof(int*) is not necessarily the same as sizeof(char*). This allows C to be used on exotic architectures.
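The first reason can be made visible with a short sketch: the same "+ 1" moves a pointer by a different number of bytes depending on the pointed-to type. The helper names int_step_bytes and char_step_bytes are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte distance covered by p + 1 for an int*. */
ptrdiff_t int_step_bytes(int *p)
{
    assert(p != NULL);
    return (char *)(p + 1) - (char *)p;  /* equals sizeof(int) */
}

/* Hypothetical helper: byte distance covered by p + 1 for a char*. */
ptrdiff_t char_step_bytes(char *p)
{
    assert(p != NULL);
    return (p + 1) - p;                  /* always 1 */
}
```

Without the pointed-to type, the compiler would have no way to know which of the two step sizes to use.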
To tell the "user" of the pointer what kind of data the pointer points to, which determines how the pointed-to data is interpreted and handled. You can always use a pointer to void, but then that information is not available, and the user of that pointer has to know from other sources what it points to and how to work with it.
Bathsheba has already given 4 good reasons. Let me give some less important ones:
function pointer and data pointer are not even compatible according to the standard (Harvard architecture with separate code and data address spaces, memory models of segmented architectures, etc.).
the optimizer can exploit the fact that pointers to different types generally do not alias, and can elide a lot of memory loads that way (https://en.wikipedia.org/wiki/Pointer_aliasing)
int *c;
char r=a;
c=&a;
In your declaration, a is just a character, not a variable.
A pointer, once declared, can only store the address of a specific type of variable. If you declare c as int *c, the pointer can only point to variables of type int.
Of course, with a cast you can assign an address value to a pointer of any type.
int d;
char* cp = (char*)&d; // OK
float* fp = (float*)&d; // OK
But when you get content from the address, you MUST know the type. Otherwise the compiler will not know how to interpret it.
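To show how much the access type matters, here is a sketch that reads one object byte by byte through unsigned char*, which C permits for any object. The helper name byte_sum is made up for this example. The value mentioned in the usage note assumes 8-bit bytes and no padding bits in unsigned long.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: adds up the individual bytes of the object at obj. */
unsigned long byte_sum(const void *obj, size_t size)
{
    const unsigned char *bytes = obj;  /* any object may be read as bytes */
    unsigned long sum = 0;

    assert(obj != NULL);
    for (size_t i = 0; i < size; i++)
        sum += bytes[i];               /* each read yields one byte value */
    return sum;
}
```

For an unsigned long holding 0x01020304, the object read as unsigned long is 16909060, while the same memory read as bytes gives 1, 2, 3, 4 and some zeros in an order that depends on endianness, so byte_sum returns 10.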

Is dereferencing a pointer to a pointer to an int the same thing as dereferencing a pointer to a pointer to a char, in the C language?

Let's say in that situation you are wanting to store an address after you dereference your pointer to pointer. Does it make any difference, functionally and in any other way, to dereference it as:
*(int **)(/* some void ptr */) = // some other address;
versus
*(char **)(/* same void ptr */) = // some other address;
Because I know when you dereference just a pointer to an int, and just a pointer to a char, they are completely different things! But when dealing with pointers to pointers, there is a layer in between, and since all pointers (on the same machine) are of the same size, I was wondering if there was any difference at all in those two approaches
->edit: Literally could you even replace it with unsigned ptr ptr, long ptr ptr, and void ptr ptr, and get the same thing?
Your assumption that "all pointers (on the same machine) are of the same size" is incorrect. It may well be true for your particular machine, and it's fairly likely to be true on any machine you're likely to use, but the C standard makes no such guarantee, and there have been real-world machines where it's not true. For example, on a word-addressed machine, an int* might be represented as a word pointer, but a char* or void* might require additional information to specify which byte within a word it refers to. And function pointers, on some architectures, are quite different beasts than data pointers.
If you find yourself casting pointer types, it's likely that there's something wrong with your design, and the pointer should have been of the correct type in the first place.
That's not always true, of course; there are cases where that kind of low-level access is appropriate and necessary, and one of C's great strengths is that it allows you to do that kind of thing.
But it's difficult to tell whether it's appropriate without a more concrete example.
The point of having distinct types isn't just that they have different sizes or representations; the point is that they're used for different purposes. An int* points to an int; a char* points to a char.
The difference between your examples is that one refers to an int* object, and the other refers to a char* object. They're different types, and they're not interchangeable, even if they happen to have the same size and representation. Your commented-out some other address on the right hand side of the assignment has to be some actual expression, and it has to be of some type.
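A minimal sketch of doing this with matching types, assuming the void pointer really does point to an int* object. The helper name store_int_pointer is made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: slot must point to an object of type int*. */
void store_int_pointer(void *slot, int *value)
{
    assert(slot != NULL);

    int **typed_slot = slot;  /* convert back to the slot's real type */
    *typed_slot = value;      /* store an int* into an int* object */
}
```

If the object behind slot were actually a char*, the store would go through the wrong type, even on machines where both pointer types happen to share a size and representation.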
What are you trying to accomplish?
You can cast a pointer to whatever other pointer type you'd like. In most modern implementations, every pointer type has the same size, regardless of what the pointer points to.
However, C++ is stricter than C when it comes to type correctness, so you'll probably see an "invalid conversion" error if you were to blindly assign the pointer of one type to another one.
Yep, all the same size - size of the pointer, funny memory models aside. And yes you will get the same thing.

Resources