I have a void pointer pointing to a memory address. Then, I do
int pointer = the void pointer
float pointer = the void pointer
and then dereference them to get the values.
#include <stdio.h>
int main(void)
{
int x = 25;
void *p = &x;
int *pi = p;
float *pf = p;
double *pd = p;
printf("x: n%d\n", x);
printf("*p: %d\n", *(int *)p);
printf("*pi: %d\n", *pi);
printf("*pf: %f\n", *pf);
printf("*pd: %f\n", *pd);
return 0;
}
The output of dereferencing pi(int pointer) is 25.
However the output of dereferencing pf(float pointer) is 0.000.
Also dereferencing pd(double pointer) outputs a negative fraction that keeps
changing?
Why is this and is it related to endianness(my CPU is little endian)?
As per the C standard, you're allowed to convert any object pointer to void * and convert it back; the result compares equal to the original pointer.
To quote C11, chapter §6.3.2.3
[...] A pointer to
any object type may be converted to a pointer to void and back again; the result shall
compare equal to the original pointer.
That is why, when you convert the void pointer to int *, dereference it and print the result, it prints properly.
However, the standard does not guarantee that you can dereference that pointer as a different data type. Doing so essentially invokes undefined behaviour.
So, dereferencing pf or pd to get a float or double is undefined behavior, as you're trying to read the memory allocated for an int as a float or double. There's a clear mismatch, which leads to the UB.
To elaborate, int and float (and double) have different internal representations, so converting the pointer to another type and then dereferencing it to get the value as that other type won't work.
Related: C11, chapter §6.5.3.3
[...] If the operand has type ‘‘pointer to type’’, the result has type ‘‘type’’. If an
invalid value has been assigned to the pointer, the behavior of the unary * operator is
undefined.
and for the invalid value part, (emphasis mine)
Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an
address inappropriately aligned for the type of object pointed to, and the address of an object after the
end of its lifetime.
In addition to the answers before, I think that what you were expecting could not be accomplished because of the way the float numbers are represented.
Integers are typically stored in two's complement, which basically means that the number is stored as one piece. Floats, on the other hand, are stored differently, using a sign, an exponent and a fraction (mantissa).
So the conversion you had in mind is impossible: you take a number represented as raw bits (for positive values) and look at it as if it were encoded differently. This gives unexpected results even if the conversion were legitimate.
So... here's probably what's going on.
However the output of dereferencing pf(float pointer) is 0.000
It's not 0. It's just really tiny.
You have 4-byte integers. On your little-endian machine, the integer 25 looks like this in memory...
25 0 0 0
00011001 00000000 00000000 00000000
The least significant byte comes first, so the 32-bit pattern is 0x00000019. Interpreted as an IEEE 754 float, that looks like...
sign exponent fraction
0 00000000 0000000 00000000 00011001
+ 2**-149 * 25
An all-zero exponent field marks a subnormal (denormal) number. So, you're outputting a float, but it's incredibly tiny. It's 25 * 2^-149, about 3.5e-44, which is virtually indistinguishable from 0.
If you try printing the float with printf("*pf: %e\n", *pf); then it should give you something meaningful, but small, around 3.503246e-44.
Also dereferencing pd(double pointer) outputs a negative fraction that keeps changing?
Doubles are 8 bytes, but you're only defining 4 bytes. The changing negative fraction is the result of also reading 4 bytes of memory that lie outside x. The value of that memory is arbitrary and it's normal to see it change with every run.
There are two kinds of UBs going on here:
1) Strict aliasing
What is the strict aliasing rule?
"Strict aliasing is an assumption, made by the C (or C++) compiler, that dereferencing pointers to objects of different types will never refer to the same memory location (i.e. alias each other.)"
However, strict aliasing can be turned off as a compiler extension, like -fno-strict-aliasing in GCC. In this case, your pf version would work, although implementation-defined, assuming nothing else has gone wrong (float and int are both 32-bit types with 32-bit alignment on most computers). If your computer uses IEEE 754 single precision, you get a very small denormal floating-point number, which explains the result you observe.
Strict aliasing is a controversial feature of standard C (and considered a bug by a lot of people), and it makes reinterpret casts (aka type punning) harder and more hacky to do in C than before.
Until you understand type punning well and know how it behaves with your compiler version and hardware, you should avoid doing it.
2) Memory out of bounds
Your pointer points to a memory space as large as an int, but you dereference it as a double, which is usually twice the size of an int. You are basically reading half a double of garbage from whatever happens to follow x in memory, which is why your double keeps changing.
The types int, float, and double have different memory layouts, representations, and interpretations.
On my machine, int is 4 bytes, float is 4 bytes, and double is 8 bytes.
Here is how you explain the results you are seeing.
Dereferencing the int pointer works, obviously, because the original data was an int.
Dereferencing the float pointer, the compiler generates code to interpret the contents of the 4 bytes in memory as a float. The value in those 4 bytes, when interpreted as a float, is so small that it prints as 0.00. Look up how float is represented in memory.
Dereferencing the double pointer, the compiler generates code to interpret the contents in memory as a double. Because a double is larger than an int, this accesses the 4 bytes of the original int and an extra 4 bytes on the stack. Because the contents of these extra 4 bytes depend on the state of the stack and are unpredictable from run to run, you see varying values that correspond to interpreting the entire 8 bytes as a double.
In the following,
printf("x: n%d\n", x); //OK
printf("*p: %d\n", *(int *)p); //OK
printf("*pi: %d\n", *pi); //OK
printf("*pf: %f\n", *pf); // UB
printf("*pd: %f\n", *pd); // UB
The accesses in the first 3 printfs are fine, as you are accessing an int through an lvalue of type int. But the next 2 are not fine, as they violate §6.5, paragraph 7 (Expressions).
An int is not compatible with float or double. So the accesses in the last two printf() calls cause undefined behaviour.
C11, §6.5, paragraph 7 states:
An object shall have its stored value accessed only by an lvalue
expression that has one of the following types:
— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the object,
— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
— a character type.
The term "C" is used to describe two languages: one invented by K&R, in which pointers identify physical memory locations, and one derived from it, which works the same way when pointers are read and written in ways that abide by certain rules, but may behave in arbitrary fashion if they are used in other ways. While the latter language is the one defined by the Standards, the former is what became popular for microcomputer programming in the 1980s.
One of the major impediments to generating efficient machine code from C code is that compilers can't tell what pointers might alias what variables. Thus, any time code accesses a pointer that might point to a given variable, generated code is required to ensure that the contents of the memory identified by the pointer and the contents of the variable match. That can be very expensive. The people writing the C89 Standard decided that compilers should be allowed to assume that named variables (static and automatic) will only be accessed using pointers of their own type or character types; the people writing C99 decided to add additional restrictions for allocated storage as well.
Some compilers offer means by which code can ensure that accesses using different types will go through memory (or at least behave as though they are doing so), but unfortunately I don't think there's any standard for that. C11 added a memory model for use with multi-threading which should be capable of achieving required semantics, but I don't think compilers are required to honor such semantics in cases where they can tell that there's no way for outside threads to access something [even if going through memory would be necessary to achieve correct single-thread semantics].
If you're using gcc and want to have memory semantics that work as K&R intended, use the "-fno-strict-aliasing" command-line option. To make code efficient it will be necessary to make substantial use of the "restrict" qualifier which was added in C99. While the authors of gcc seem to have focused more on type-based aliasing rules than "restrict", the latter should allow more useful optimizations.
Related
Let us consider the following piece of code:
#include <stdio.h>
int main()
{
int v1, v2, *p;
p = &v1;
v2 = &v1;
printf("%d\t%d\n",p,v2);
printf("%d\t%d\n",sizeof(v2),sizeof(p));
return 0;
}
We can see, as expected, that the v2 variable (int) occupies 4 bytes and that the p variable (int pointer) occupies 8 bytes.
So, if a pointer occupies more than 4 bytes of memory, why we can store its content in an int variable?
In the underlying implementation, does the pointer variables store only the memory address of another variable, or it stores something else?
We can see, as expected, that the v2 variable (int) occupies 4 bytes
and that the p variable (int pointer) occupies 8 bytes.
I'm not sure what exactly the source of your expectation is there. The C language does not specify the sizes of ints or pointers. Its requirements on the range of representable values of type int afford int size as small as two 8-bit bytes, and historically, that was once a relatively common size for int. Some implementations these days have larger ints (and maybe also larger char, which is the unit of measure for sizeof!).
I suppose that your point here is that in the implementation tested, the size of int is smaller than the size of int *. Fair enough.
So, if a pointer occupies more than 4 bytes of memory, why we can
store its content in an int variable?
Who says the code stores the pointer's (entire) content in the int? It converts the pointer to an int,* but that does not imply that the result contains enough information to recover the original pointer value.
Exactly the same applies to converting a double to an int or an int to an unsigned char (for example). Those assignments are allowed without explicit type conversion, but they are not necessarily value-preserving.
Perhaps your confusion is reflected in the word "content". Assignment does not store the representation of the right-hand side to the left-hand object. It converts the value, if necessary, to the target object's type, and stores the result.
In the underlying implementation, does the pointer variables store
only the memory address of another variable, or it stores something
else?
Implementations can and have varied, and so too the meaning of "address" for different machines. But most commonly these days, pointers are represented as binary numbers designating locations in a flat address space.
But that's not really relevant. C specifies that pointers can be converted to integers and vice versa. It also provides integer types intptr_t and uintptr_t (in stdint.h) that support full-fidelity round trip void * to integer to void * conversion. Pointer representation is irrelevant to all that. It is the implementation's responsibility to implement the types and conversions involved so that they behave as required, and there is more than one way to do that.
*C actually requires an explicit conversion -- that is, a cast -- between pointers and integers. The language specification does not define the meaning of the cast-less assignment in the example code, but some compilers do accept that and perform the needed conversion implicitly. My remarks assume such an implementation.
There is always a warning, see below.
main.c: In function ‘main’:
main.c:6:8: warning: assignment to ‘int’ from ‘int *’ makes integer from pointer without a cast [-Wint-conversion]
v2 = &v1;
main.c: In function ‘main’:
main.c:6:10: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
v2 = (int) &v1;
In the first case, just setting an integer value to a pointer value is not appropriate, because it is not a compatible type.
In the second case, with a cast of the pointer to an integer, the compiler recognizes the problem of the different sizes, which means v2 can not completely hold (int) &v1;
Conclusion: Both cases are "bad" in terms of creating an undesired behaviour.
About your question "So, if a pointer occupies more than 4 bytes of memory, why we can store its content in an int variable?" - It can NOT completely be stored in an int variable.
About your question "In the underlying implementation, does the pointer variables store only the memory address of another variable, or it stores something else?" - A pointer just points to an address. (It could be the address of another variable or not. It does not matter. It just points to an address.)
The key to understanding what's going on here is that C is an abstraction layer on top of the underlying ISA. Most architectures have little more than registers and memory addresses [1] to work with, all of which are of a fixed size. When manipulating "variables", you're really just expressing your intent which the compiler translates into more concrete instructions.
On x86_64, a common architecture, an int is in actuality either a portion of a 64-bit register, or it's a 4-byte location in memory that's aligned on a 4-byte boundary. An int* is a 64-bit value, or 8-byte location in memory with corresponding alignment constraints.
Putting an int* value into a suitably sized variable, such as uint64_t, is allowed. Putting that value back into a pointer and exercising that pointer may not be permitted; it depends on your architecture.
From the programmer's perspective a pointer is just 64 bits of data. From the CPU's perspective it may contain more than that, with modern architectures having things like internal "Pointer Authentication Codes" (PACs) that ensure pointers cannot be injected from external sources. It gets quite a bit more complicated under the hood.
In general it's best to treat pointers as opaque, that is their actual value is as good as random and irrelevant to the internal operation of your program. It's only when you're doing deeper analysis at the architectural level with sufficiently robust profiling tools that the actual internals of the pointer can be informative or relevant.
There are several well-defined operations you can do on pointers, like p[n] to access specific offsets within the bounds of a structure or allocation, but outside of that you're pretty limited in what you can do, or even infer. Remember that modern CPUs and operating systems use virtual memory, so pointer addresses are "fake" and don't represent where they are in physical memory. In fact, they're deliberately scrambled to make them harder to guess.
[1] This disregards VLIW, SIMD, and other extensions which are not so simple.
So, if a pointer occupies more than 4 bytes of memory, why we can store its content in an int variable?
You cannot; indeed, the code you posted is not valid C.
#include <stdio.h>
int main()
{
int v1, v2, *p;
this declares two int variables and a pointer to int called p.
p = &v1;
this is legal, as you assign to p the address of the integer variable v1.
v2 = &v1; /* INCORRECT!!! */
this is not. It assigns the address of another variable (a pointer value) to an int variable, which, as you rightly say, is not possible. The most probable intention of the code writer was:
v2 = *p;
which assigns to v2 the integer value stored at the address pointed to by p (p points to v1, so it assigns v2 the value stored in v1).
Which of the following ways of treating and trying to recover a C pointer, is guaranteed to be valid?
1) Cast to void pointer and back
int f(int *a) {
void *b = a;
a = b;
return *a;
}
2) Cast to appropriately sized integer and back
int f(int *a) {
uintptr_t b = a;
a = (int *)b;
return *a;
}
3) A couple of trivial integer operations
int f(int *a) {
uintptr_t b = a;
b += 99;
b -= 99;
a = (int *)b;
return *a;
}
4) Integer operations nontrivial enough to obscure provenance, but which will nonetheless leave the value unchanged
int f(int *a) {
uintptr_t b = a;
char s[32];
// assume %lu is suitable
sprintf(s, "%lu", b);
b = strtoul(s, NULL, 10);
a = (int *)b;
return *a;
}
5) More indirect integer operations that will leave the value unchanged
int f(int *a) {
uintptr_t b = a;
for (uintptr_t i = 0;; i++)
if (i == b) {
a = (int *)i;
return *a;
}
}
Obviously case 1 is valid, and case 2 surely must be also. On the other hand, I came across a post by Chris Lattner - which I unfortunately can't find now - saying something similar to case 5 is not valid, that the standard licenses the compiler to just compile it to an infinite loop. Yet each case looks like an unobjectionable extension of the previous one.
Where is the line drawn between a valid case and an invalid one?
Added based on discussion in comments: while I still can't find the post that inspired case 5, I don't remember what type of pointer was involved; in particular, it might have been a function pointer, which might be why that case demonstrated invalid code whereas my case 5 is valid code.
Second addition: okay, here's another source that says there is a problem, and this one I do have a link to. https://www.cl.cam.ac.uk/~pes20/cerberus/notes30.pdf - the discussion about pointer provenance - says, and backs up with evidence, that no, if the compiler loses track of where a pointer came from, it's undefined behavior.
According to the C11 draft standard:
Example 1
Valid, by §6.5.16.1, even without an explicit cast.
Example 2
The intptr_t and uintptr_t types are optional. Assigning a pointer to an integer requires an explicit cast (§6.5.16.1), although gcc and clang will only warn you if you don’t have one. With those caveats, the round-trip conversion is valid by §7.20.1.4. ETA: John Bellinger brings up that the behavior is only specified when you do an intermediate cast to void* both ways. However, both gcc and clang allow the direct conversion as a documented extension.
Example 3
Safe, but only because you’re using unsigned arithmetic, which cannot overflow, and are therefore guaranteed to get the same object representation back. An intptr_t could overflow! If you want to do pointer arithmetic safely, you can convert any kind of pointer to char* and then add or subtract offsets within the same structure or array. Remember, sizeof(char) is always 1. ETA: The standard guarantees that the two pointers compare equal, but your link to Chisnall et al. gives examples where compilers nevertheless assume the two pointers do not alias each other.
Example 4
Always, always, always check for buffer overflows whenever you read from and especially whenever you write to a buffer! If you can prove by static analysis that overflow cannot happen, then write out the assumptions that justify that, explicitly, and assert() or static_assert() that they haven’t changed. Use snprintf(), not the unsafe sprintf()! If you remember nothing else from this answer, remember that!
To be absolutely pedantic, the portable way to do this would be to use the format specifiers in <inttypes.h> and define the buffer length in terms of the maximum value of any pointer representation. In the real world, you would print pointers out with the %p format.
The answer to the question you intended to ask is yes, though: all that matters is that you get back the same object representation. Here’s a less contrived example:
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
int i = 1;
const uintptr_t u = (uintptr_t)(void*)&i;
uintptr_t v;
memcpy( &v, &u, sizeof(v) );
int* const p = (int*)(void*)v;
assert(p == &i);
*p = 2;
printf( "%d = %d.\n", i, *p );
return EXIT_SUCCESS;
}
All that matters is the bits in your object representation. This code also follows the strict aliasing rules in §6.5. It compiles and runs fine on the compilers that gave Chisnall et al. trouble.
Example 5
This works, same as above.
One extremely pedantic footnote that will never ever be relevant to your coding: some obsolete esoteric hardware has one’s-complement or sign-and-magnitude representation of signed integers, and on these, there might be a distinct value of negative zero that might or might not trap. On some CPUs, this might be a valid pointer or null pointer representation distinct from positive zero. And on some CPUs, positive and negative zero might compare equal.
PS
The standard says:
Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.
Furthermore, if the two array objects are consecutive rows of the same multidimensional array, one past the end of the first row is a valid pointer to the start of the next row. Therefore, even a pathological implementation that deliberately sets out to cause as many bugs as the standard allows could only do so if your manipulated pointer compares equal to the address of an array object, in which case the implementation might in theory decide to interpret it as one-past-the-end of some other array object instead.
The intended behavior is clearly that the pointer comparing equal both to &array1+1 and to &array2 is equivalent to both: it means to let you compare it to addresses within array1 or dereference it to get array2[0]. However, the Standard does not actually say that.
PPS
The standards committee has addressed some of these issues and proposes that the C standard explicitly add language about pointer provenance. This would nail down whether a conforming implementation is allowed to assume that a pointer created by bit manipulation does not alias another pointer.
Specifically, the proposed corrigendum would introduce pointer provenance and allow pointers with different provenance not to compare equal. It would also introduce a -fno-provenance option, which would guarantee that any two pointers compare equal if and only if they have the same numeric address. (As discussed above, two object pointers that compare equal alias each other.)
1) Cast to void pointer and back
This yields a valid pointer equal to the original. Paragraph 6.3.2.3/1 of the standard is clear on this:
A pointer to void may be converted to or from a pointer to any object type. A pointer to any object type may be converted to a pointer to void and back again; the result shall compare equal to the original pointer.
2) Cast to appropriately sized integer and back
3) A couple of trivial integer operations
4) Integer operations nontrivial enough to obscure provenance, but which will nonetheless leave the value unchanged
5) More indirect integer operations that will leave the value unchanged
[...] Obviously case 1 is valid, and case 2 surely must be also. On the other hand, I came across a post by Chris Lattner - which I unfortunately can't find now - saying case 5 is not valid, that the standard licenses the compiler to just compile it to an infinite loop.
C does require a cast when converting either way between pointers and integers, and you have omitted some of those in your example code. In that sense your examples (2) - (5) are all non-conforming, but for the rest of this answer I'll pretend the needed casts are there.
Still, being very pedantic, all of these examples have implementation-defined behavior, so they are not strictly conforming. On the other hand, "implementation-defined" behavior is still defined behavior; whether that means your code is "valid" or not depends on what you mean by that term. In any event, what code the compiler might emit for any of the examples is a separate matter.
These are the relevant provisions of the standard from section 6.3.2.3 (emphasis added):
An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.
Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.
The definition of uintptr_t is also relevant to your particular example code. The standard describes it this way (C2011, 7.20.1.4/1; emphasis added):
an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer.
You are converting back and forth between int * and uintptr_t. int * is not void *, so 7.20.1.4/1 does not apply to these conversions, and the behavior is implementation-defined per section 6.3.2.3.
Suppose, however, that you convert back and forth via an intermediate void *:
uintptr_t b = (uintptr_t)(void *)a;
a = (int *)(void *)b;
On an implementation that provides uintptr_t (which is optional), that would make your examples (2 - 5) all strictly conforming. In that case, the result of the integer-to-pointer conversions depends only on the value of the uintptr_t object, not on how that value was obtained.
As for the claims you attribute to Chris Lattner, they are substantially incorrect. If you have represented them accurately, then perhaps they reflect a confusion between implementation-defined behavior and undefined behavior. If the code exhibited undefined behavior then the claim might hold some water, but that is not, in fact, the case.
Regardless of how its value was obtained, b has a definite value of type uintptr_t, and the loop must eventually increment i to that value, at which point the if block will run. In principle, the implementation-defined behavior of a conversion from uintptr_t directly to int * could be something crazy such as skipping the next statement (thus causing an infinite loop), but such behavior is entirely implausible. Every implementation you ever meet will either fail at that point, or store some value in variable a, and then, if it didn't crash, it will execute the return statement.
Because different application fields require the ability to manipulate pointers in different ways, and because the best implementations for some purposes may be totally unsuitable for some others, the C Standard treats the support (or lack thereof) for various kinds of manipulations as a Quality of Implementation issue. In general, people writing an implementation for a particular application field should be more familiar than the authors of the Standard with what features would be useful to programmers in that field, and people making a bona fide effort to produce quality implementations suitable for writing applications in that field will support such features whether the Standard requires them to or not.
In the pre-Standard language invented by Dennis Ritchie, all pointers of a particular type that identified the same address were equivalent. If any sequence of operations on a pointer would end up producing another pointer of the same type that identified the same address, that pointer would--essentially by definition--be equivalent to the first. The C Standard specifies some situations, however, where pointers can identify the same location in storage and be indistinguishable from each other without being equivalent. For example, given:
int foo[2][4] = {0};
int *p = foo[0]+4, *q=foo[1];
both p and q will compare equal to each other, and to foo[0]+4, and to foo[1]. On the other hand, despite the fact that evaluation of p[-1] and q[0] would have defined behavior, evaluation of p[0] or q[-1] would invoke UB. Unfortunately, while the Standard makes clear that p and q would not be equivalent, it does nothing to clarify whether performing various sequences of operations on e.g. p will yield a pointer that is usable in all cases where p was usable, a pointer that's usable in all cases where either p or q would be usable, a pointer that's only usable in cases where q would be usable, or a pointer that's only usable in cases where both p and q would be usable.
Quality implementations intended for low-level programming should generally process pointer manipulations other than those involving restrict pointers in a way that would yield a pointer that would be usable in any case where a pointer that compares equal to it could be usable. Unfortunately, the Standard provides no means by which a program can determine if it's being processed by a quality implementation that is suitable for low-level programming and refuse to run if it isn't, so most forms of systems programming must rely upon quality implementations processing certain actions in a documented manner characteristic of the environment even when the Standard would impose no requirements.
Incidentally, even if the normal constructs for manipulating pointers wouldn't have any way of creating pointers where the principles of equivalence shouldn't apply, some platforms may define ways of creating "interesting" pointers. For example, if an implementation that would normally trap operations on null pointers were run in an environment where it might sometimes be necessary to access an object at address zero, it might define a special syntax for creating a pointer that could be used to access any address, including zero, within the context where it was created. A "legitimate pointer to address zero" would likely compare equal to a null pointer (even though they're not equivalent), but performing a round-trip conversion to another type and back would likely convert what had been a legitimate pointer to address zero into a null pointer. If the Standard had mandated that a round-trip conversion of any pointer must yield one that's usable in the same way as the original, that would require that the compiler omit null traps on any pointers that could have been produced in such a fashion, even if they would far more likely have been produced by round-tripping a null pointer.
Incidentally, from a practical perspective, "modern" compilers, even in -fno-strict-aliasing, will sometimes attempt to track provenance of pointers through pointer-integer-pointer conversions in such a way that pointers produced by casting equal integers may sometimes be assumed incapable of aliasing.
For example, given:
#include <stdint.h>
extern int x[],y[];
int test(void)
{
if (!x[0]) return 999;
uintptr_t upx = (uintptr_t)x;
uintptr_t upy = (uintptr_t)(y+1);
//Consider code with and without the following line
if (upx == upy) upy = upx;
if ((upx ^ ~upy)+1) // Will return if upx != upy
return 123;
int *py = (int*)upy;
*py += 1;
return x[0];
}
In the absence of the marked line, gcc, icc, and clang will all assume, even when using -fno-strict-aliasing, that an operation on *py can't affect x[0], even though the only way that code could be reached would be if upx and upy held the same value (meaning that py was produced by casting a uintptr_t value equal to the one obtained from x). Adding the marked line causes icc and clang to recognize that x and py could identify the same object, but gcc assumes that the assignment can be optimized out, even though it should mean that py would be derived from x, a situation a quality compiler should have no trouble recognizing as implying possible aliasing.
I'm not sure what realistic benefit compiler writers expect from their efforts to track provenance of uintptr_t values, given that I don't see much purpose for doing such conversions outside of cases where the results of conversions may be used in "interesting" ways. Given compiler behavior, however, I'm not sure I see any nice way to guarantee that conversions between integers and pointers will behave in a fashion consistent with the values involved.
I'm trying to find the distance in memory between two variables. Specifically I need to find the distance between a char[] array and an int.
char data[5];
int a = 0;
printf("%p\n%p\n", &data[5], &a);
long int distance = &a - &data[5];
printf("%ld\n", distance);
When I run my program without the last two lines I get the proper memory addresses of the two variables, something like this:
0x7fff5661aac7
0x7fff5661aacc
Now I understand, if I'm not wrong, that there are 5 bytes of distance between the two (0x7fff5661aac8, 0x7fff5661aac9, 0x7fff5661aaca, 0x7fff5661aacb, 0x7fff5661aacc).
Why can't I subtract a pointer of type (int *) and one of type (char *)? Both refer to memory addresses. What should I do in order to calculate the distance, in bytes, between the two? I tried casting one of the two pointers but it's not working.
I get: "error: 'char *' and 'int *' are not pointers to compatible types". Thanks to everyone who will help me.
Nopes, this is not possible.
First, you can only subtract pointers of (to) "compatible" types, an int and a char are not compatible types here. Hence the subtraction is not possible.
That said, even if both are pointers to compatible type, then also, the following comes into picture.
So, secondly, you cannot just subtract two arbitrary pointers; they need to be essentially part of (address for elements of) the same array. Otherwise, it invokes undefined behavior.
Quoting C11, chapter §6.5.6, Additive operators
When two pointers are subtracted, both shall point to elements of the same array object,
or one past the last element of the array object; the result is the difference of the
subscripts of the two array elements. [....]
Thirdly, another important point, the result of subtraction of two pointers is of type ptrdiff_t, a signed integer type.
[...] The size of the result is implementation-defined,
and its type (a signed integer type) is ptrdiff_t defined in the <stddef.h> header. [...]
so, to print the result, you need to use %td format specifier.
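As a small illustration of the points above, here is a minimal sketch of a well-defined pointer subtraction printed with %td. The helper names element_distance and print_element_distance are made up for this example, and both pointers are assumed to point into the same int array (or one past its end).

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: number of elements between two pointers that
   must both point into (or one past the end of) the same int array. */
ptrdiff_t element_distance(const int *first, const int *last)
{
    return last - first;   /* result type is ptrdiff_t */
}

/* Prints the distance using %td, the conversion for ptrdiff_t. */
void print_element_distance(const int *first, const int *last)
{
    printf("%td\n", element_distance(first, last));
}
```

For an array int arr[8], element_distance(&arr[2], &arr[7]) is the difference of the subscripts, not a byte count.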
Pointer subtraction is only defined for pointers within the same array (or just past the last element of an array). Any other use is undefined behavior. Let's ignore that for your experimentation.
When two pointers of the same type to elements of the same array object are subtracted, the result is the difference of the array indices. You could add that signed integer result (of type ptrdiff_t) to the first pointer and get the value of the second pointer, or subtract the result from the second pointer and get the value of the first pointer. So in effect, the result is the difference in the byte address of the two pointers divided by the size of the object being pointed to. This is why it makes no sense to allow subtraction of pointers of incompatible type, particularly when the referenced object types are of different size. How could you divide the difference in byte address by the size of the object being pointed to when the pointers being subtracted are referring to differently sized objects?
Still, for experimentation purposes, you can cast both pointers (pointing to different objects) to char * and subtract them, and many compilers will just give you the difference in their byte address as a number. However, the result could overflow an integer of type ptrdiff_t. Alternatively, you can convert both pointers to an integer of type intptr_t and subtract the integers to get the difference in byte address. Again, it's theoretically possible that the result of the subtraction could overflow an integer of type intptr_t.
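The two approaches can be sketched roughly as follows. The function names are invented for this example. The first helper is only well defined when both pointers refer to the same object. The second relies on the implementation-defined pointer-to-integer mapping, and on intptr_t being available, which is assumed here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte distance between two pointers into the SAME
   object (array, struct, or allocated block). Defined only in that case. */
ptrdiff_t byte_distance_same_object(const void *from, const void *to)
{
    return (const char *)to - (const char *)from;
}

/* Hypothetical helper: difference of the integer values of two addresses.
   The pointer-to-integer mapping is implementation-defined, so the result
   is only meaningful on platforms with a flat address space. */
intptr_t address_difference(const void *from, const void *to)
{
    return (intptr_t)to - (intptr_t)from;
}
```

For the question's variables, address_difference(&data[5], &a) would give the experimental byte distance on a typical PC, with no guarantee from the standard about its value.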
On a standard PC nothing keeps you from casting both pointers to an integer type which can hold the pointer value, and subtracting the two integers.
Such an integer type is not guaranteed to exist on all architectures (but on many common systems it does) — imagine segmented memory with more information than just a single number. If the integer type doesn't fit, the behavior of the cast is undefined.
From the standard draft n1570, 6.3.2.3/6:
Any pointer type may be converted to an integer type. Except as
previously specified, the result is implementation-defined. If the
result cannot be represented in the integer type, the behavior is
undefined. The result need not be in the range of values of any
integer type.
Often the difference between addresses will be what one expects (variables declared in succession are next to each other in memory) and can be used to tell the direction the stack grows etc.
It may be interesting to explore what else you can do with integers and pointers.
Olaf commented that if you "cast [an arithmetic computation result] back to a pointer, you invoke UB." That is not necessarily so; it depends on the integer value. The standard draft says the following in 6.3.2.3/5:
An integer may be converted to any pointer type. Except as previously
specified, the result is implementation-defined, might not be
correctly aligned, might not point to an entity of the referenced
type, and might be a trap representation
(Emphasis by me.) If we compute the address of a struct member by adding an offset to the struct's address we have obviously taken care of the mentioned issues, so it's up to the implementation. It is certainly not UB; a good many embedded systems would fail if we couldn't use an integer -> pointer conversion, and access that memory through the resulting pointer. We must make sure that the system allows it, and that the addresses are sound.
The paragraph has a footnote:
The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to
be consistent with the addressing structure of the execution environment.
That is, they are meant to not surprise the user. While in theory the addresses of unrelated objects adjacent in memory could be projected to wildly different integer values, they are not supposed to. A user can for example reasonably expect that linear memory is projected into a linear integer number space, keeping ordering and distances.
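To make the struct member case mentioned above concrete, here is a minimal sketch. struct sample and value_member_via_integer are hypothetical names, and the code assumes a platform where uintptr_t exists and the integer mapping is linear, as the footnote intends.

```c
#include <stddef.h>
#include <stdint.h>

struct sample {              /* hypothetical struct for illustration */
    char tag;
    int  value;
};

/* Computes the address of s->value with integer arithmetic on the
   struct's address, then converts the integer back to a pointer.
   offsetof accounts for padding, so the result is correctly aligned. */
int *value_member_via_integer(struct sample *s)
{
    uintptr_t base = (uintptr_t)s;
    uintptr_t addr = base + offsetof(struct sample, value);
    return (int *)addr;      /* implementation-defined conversion */
}
```

On common implementations the returned pointer compares equal to &s->value and can be used to read or write the member.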
I should also emphasize (as I did in one comment) that the standard is not the world. It must accommodate and give guarantees for a wide range of machines. Therefore the standard can only be the smallest common denominator. If we can narrow the range of architectures we consider, we can make much better guarantees.
One common example is the possible presence of trap values in integer registers, or flags indicating a read from an uninitialized register, which also traps; these are responsible for a wide range of UB cases in the standard which simply do not apply, for example, to your PC.
uint8_t * ptr = ...;
uint8_t * ptr2 = ptr + 5;
Now if ptr was 100, what will ptr2 be? Correct, it will be 105. But now look at that code:
uint32_t * ptr = ...;
uint32_t * ptr2 = ptr + 5;
Again, if ptr was 100, what will ptr2 be? Wrong! It won't be 105, it will be 120.
Why? Pointer arithmetic is not integer arithmetic!
ptr2 = ptr + 5;
Actually means:
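ptr2 = int_to_ptr(ptr_to_int(ptr) + (sizeof(*ptr) * 5));
Functions int_to_ptr and ptr_to_int don't really exist, I'm just using them for demonstration purposes, so you better understand what is going on behind the scenes. Note that the scale factor is sizeof(*ptr), the size of the element pointed to, not the size of the pointer itself.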
So if you subtract two pointers, the result is not the difference of their addresses, it's the number of elements between them:
uint32_t test[50];
ptrdiff_t diff = &test[20] - &test[10];
diff will be 10, as there are 10 elements in between them (one element is one uint32_t value) but that doesn't mean there are 10 bytes between test[10] and test[20], there are 40 bytes between them as every uint32_t value takes up 4 bytes of memory.
Now you may understand why subtracting pointers of different types makes no sense, as different types have different element sizes and what shall such a subtraction then return?
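The scaling described above can be checked with a small sketch like the following. The helper names are made up for this example, and all pointers are assumed to stay within one uint32_t array.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes that `p + n` advances for a uint32_t pointer, measured by viewing
   both addresses as char pointers. Equals n * sizeof *p. */
size_t bytes_advanced_u32(const uint32_t *p, size_t n)
{
    return (size_t)((const char *)(p + n) - (const char *)p);
}

/* Elements between two pointers into the same uint32_t array. */
ptrdiff_t elements_between_u32(const uint32_t *lo, const uint32_t *hi)
{
    return hi - lo;
}
```

With uint32_t test[50], elements_between_u32(&test[10], &test[20]) is 10, while bytes_advanced_u32(test, 5) is 20 on platforms with 8-bit bytes.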
If you want to know how many bytes are in between two pointers, you need to cast them both to a data type that has one-byte elements (e.g. uint8_t * or char * would work) or cast them to void * (GNU extension but many compilers support that as well), which means the data type is unknown and thus the element size is unknown as well, and in that case the compiler will assume byte-sized elements. So this may work:
ptrdiff_t diff = (void *)ptr2 - (void *)ptr1;
yet this
ptrdiff_t diff = (char *)ptr2 - (char *)ptr1;
is more portable.
It will compile, it will deliver a result. If that result is meaningful, that's a different topic. Unless both pointers point to the same memory "object" (same struct, same array, same allocated memory region), it is not, as the standard says that in that case, the result is undefined. That means diff could (legally) have any value, so a compiler may as well always set diff to 0 in that case, that would be allowed by the standard.
If you want defined behavior, try this instead:
intptr_t diff = (intptr_t)ptr2 - (intptr_t)ptr1;
That avoids the undefined pointer subtraction. Any pointer can be converted to an integer type, and intptr_t (declared in <stdint.h>) is a signed integer type that can hold any valid pointer to void (don't ever use int, long, or ptrdiff_t for that purpose, they do not make any such guarantee!). This code converts both pointers to integers and then subtracts them. The mapping from pointers to integers is implementation-defined, so I still don't see anything useful you can do with diff now, but that code at least will deliver a result without undefined pointer arithmetic, yet maybe not the result you might be expecting.
Try typecasting each address to void *
long int distance = (void *)&a - (void *)&data[5];
As others will point out, this is dangerous and undefined, but if you're just exploring how memory works, it should be fine.
The sizes of the types that an int pointer and a char pointer point to are different. On a system where int is 4 bytes, int_pointer++ increases the address by 4 bytes, while char_ptr++ increments the address by 1 byte. Hence you are getting the error.
This is because pointer arithmetic is about offsets.
For example, if you have an array and a pointer to that array like:
int array[3] = { 1, 2, 3};
int *ptr = array;
and then you increment ptr, you expect the next value from the array, e.g. array[1] after array[0], no matter what type is stored in it. So when you subtract pointers you don't get e.g. bytes, but an offset in elements.
Don't subtract pointers that are not part of the same array.
unsigned int addr = 0x1000;
int temp = *((volatile int *) addr + 3);
Does it treat the incremented pointer (ie addr + 3 * sizeof(int)), as a pointer to volatile int (while dereferencing). In other words can I expect the hardware updated contents of say (0x1012) in temp ?
Yes.
Pointer arithmetic does not affect the type of the pointer, including any type qualifiers. Given an expression of the form A + B, if A has the type qualified pointer to T and B is an integral type, the expression A + B will also be a qualified pointer to T -- same type, same qualifiers.
From 6.5.6/8 of the C spec (draft n1570):
When an expression that has integer type is added to or subtracted from a pointer, the
result has the type of the pointer operand.
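One way to observe this at compile time is a C11 _Generic selection, sketched below. The macro IS_VOLATILE_INT_PTR and the function offset_keeps_volatile are hypothetical names introduced only for this illustration.

```c
/* Requires C11 for _Generic. Evaluates to 1 if expr has type
   `volatile int *`, and to 0 for any other type. The controlling
   expression is not evaluated, only its type is inspected. */
#define IS_VOLATILE_INT_PTR(expr) \
    _Generic((expr), volatile int *: 1, default: 0)

/* Adding an integer to a pointer to volatile int keeps the qualifier. */
int offset_keeps_volatile(volatile int *base)
{
    return IS_VOLATILE_INT_PTR(base + 3);
}
```

The same macro applied to an expression of type int * selects the default branch, which shows that the qualifier comes from the pointer operand and not from the addition.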
Yes, presuming addr is an integer (variable or constant) with a value your implementation can safely convert to an int * (see below).
Consider
volatile int a[4] = { 1, 2, 3, 4 };
int i = a[3];
This is exactly the same, except for the explicit conversion integer to volatile int * (a pointer to ...). For the index operator, the name of the array decays to a pointer to the first element of a. This is volatile int * (type qualifiers in C apply to the elements of an array, never the array itself).
This is the same as the cast. Leaves 2 differences:
The conversion integer to "pointer". This is implementation defined, thus if your compiler supports it correctly (it should document this), and the value is correct, it is fine.
Finally the access. The underlying object is not volatile, only the pointer (i.e. the access) is. This actually is a defect in the standard (see DR476), which requires the object to be volatile, not the access. This is in contrast to the documented intention (read the link) and C++ semantics (which should be identical). Luckily all (or almost all) implementations generate code as one would expect and perform the access as intended. Note this is a common idiom on embedded systems.
So if the prerequisites are fulfilled, the code is correct. But please see below for better(in terms of maintainability and safety) options.
Notes: A better approach would be to
use uintptr_t to guarantee the integer can hold a pointer, or - better -
#define ARRAY_ADDR ((volatile int *)0x1000)
The latter avoids accidental modification to the integer and states the implications clear. It also can be used easier. It is a typical construct in low-level peripheral register definitions.
Re. your incrementing: addr is not a pointer! Thus you increment an integer, not a pointer. Apart from being more to type than using a true pointer, this is also error-prone and obfuscates your code. If you need a pointer, use a pointer:
volatile int *p = ARRAY_ADDR + 3;
As a personal note: Everybody passing such code (the one with the integer addr) in a company with at least some quality standards would have a very serious talk with her team leader.
First note that conversions from integers to pointers are not necessarily safe. It is implementation-defined what will happen. In some cases such conversions can even invoke undefined behavior, in case the integer value cannot be represented as a pointer, or in case the pointer ends up with a misaligned address.
It is safer to use the integer type uintptr_t to store pointers and addresses, as it is guaranteed to be able to store a pointer for the given system.
Given that your compiler implements a safe conversion for this code (for example, most embedded systems compilers do), then the code will indeed behave as you expect.
Pointer arithmetic will be done on a pointer to volatile int, and therefore + 3 means increase the address by sizeof(volatile int) * 3 bytes. If an int is 4 bytes on your system, you will end up reading the contents of address 0x100C. Not sure where you got 0x1012 from, mixing up decimal and hex notation?
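The address computation can be sketched as below. Both function names are made up for this example. third_int_address only does the integer arithmetic and assumes a flat address space, and read_third takes an ordinary array as a stand-in for the hardware block, since dereferencing 0x1000 is only valid on a system that actually maps it.

```c
#include <stdint.h>

/* Integer value of the address that ((volatile int *)addr + 3) designates,
   assuming a flat address space. No pointer is formed or dereferenced. */
uintptr_t third_int_address(uintptr_t addr)
{
    return addr + 3 * sizeof(volatile int);
}

/* Reads element 3 through a pointer to volatile int. The compiler must
   perform the read and may not cache or drop it. */
int read_third(volatile int *base)
{
    return *(base + 3);
}
```

With a 4-byte int, third_int_address(0x1000) gives 0x100C, matching the address named above.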
I have a question about type casting in C:
void *buffer;
(int *)((int)buffer);
What is this type casting doing? And what is the ((int)buffer) doing?
Imagine you are on a Linux/x86-64 computer like mine. Then pointers are 64 bits and int is 32 bits wide.
So, the buffer variable has been initialized to some location; Perhaps 0x7fff4ec52020 (which could be the address of some local variable, perhaps inside main).
The cast (int)buffer gives you an int, probably the least significant 32 bits, i.e. 0x4ec52020
You are casting again with (int*)((int)buffer), which gives you the bogus address 0x000000004ec52020 which does not point into valid memory. If you dereference that bogus pointer, you'll very probably get a SIGSEGV.
So on some machines (notably mine) (int *)((int)buffer) is not the same as (int*)buffer;
Fortunately, as a statement, (int *)((int)buffer); has no visible side effect and will be optimized (by "removing" it) by the compiler (if you ask it to optimize).
So such code is a huge mistake (could become an undefined behavior, e.g. if you dereference that pointer). If the original coder really intended the weird semantics described here, he should add a comment (and such code is unportable)!
Perhaps #include-ing <stdint.h> and using intptr_t or uintptr_t could make more sense.
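A rough sketch of that suggestion follows. The helper names are hypothetical, and uintptr_t is assumed to be provided by the implementation (it is optional in the standard but present on mainstream platforms).

```c
#include <limits.h>
#include <stdint.h>

/* Round-trips a pointer through uintptr_t. For a valid void pointer the
   result is guaranteed to compare equal to the original. */
void *round_trip_uintptr(void *buffer)
{
    uintptr_t bits = (uintptr_t)buffer;
    return (void *)bits;
}

/* Returns 1 if the integer value of this address would fit in an int,
   0 otherwise. Uses only unsigned comparison, so it never performs the
   problematic (int) conversion itself. */
int fits_in_int(const void *buffer)
{
    return (uintptr_t)buffer <= (uintptr_t)INT_MAX;
}
```

On a typical 64-bit Linux system, fits_in_int would be expected to return 0 for stack addresses such as the one in the example above, which is exactly the case where (int)buffer loses information.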
Let's see what the C Standard has to say. On page 55 of the last freely published version of the C11 standard http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf we have
5 An integer may be converted to any pointer type. Except as previously specified, the
result is implementation-defined, might not be correctly aligned, might not point to an
entity of the referenced type, and might be a trap representation.
6 Any pointer type may be converted to an integer type. Except as previously specified, the
result is implementation-defined. If the result cannot be represented in the integer type,
the behavior is undefined. The result need not be in the range of values of any integer
type.
What does this mean in your example? Paragraph 6 says that the cast (int)buffer will compile, but if the result cannot be represented in an int (which is likely on 64-bit machines), the behavior is undefined. The final (int*) casts back to a pointer.
Paragraph 5 says that the result of that final conversion is implementation-defined; on typical implementations, if the integer was big enough to hold the intermediate result, the result is the same as just casting to (int*) from the outset.
In short the cast (int) is at best useless and at worst causes the pointer value to be lost.
In straight C code this is pointless because conversions to and from void* are implicit. The following would compile just fine
int* p = buffer;
Worse, though, this code potentially introduces errors. Consider the case of a 64 bit platform. The conversion to int will typically truncate the pointer to 32 bits, and the truncated value is then converted to an int*. The original pointer value is lost, which will almost certainly lead to errors.