Which of the following ways of treating and trying to recover a C pointer, is guaranteed to be valid?
1) Cast to void pointer and back
int f(int *a) {
void *b = a;
a = b;
return *a;
}
2) Cast to appropriately sized integer and back
int f(int *a) {
uintptr_t b = a;
a = (int *)b;
return *a;
}
3) A couple of trivial integer operations
int f(int *a) {
uintptr_t b = a;
b += 99;
b -= 99;
a = (int *)b;
return *a;
}
4) Integer operations nontrivial enough to obscure provenance, but which will nonetheless leave the value unchanged
int f(int *a) {
uintptr_t b = a;
char s[32];
// assume %lu is suitable
sprintf(s, "%lu", b);
b = strtoul(s);
a = (int *)b;
return *a;
}
5) More indirect integer operations that will leave the value unchanged
int f(int *a) {
uintptr_t b = a;
for (uintptr_t i = 0;; i++)
if (i == b) {
a = (int *)i;
return *a;
}
}
Obviously case 1 is valid, and case 2 surely must be also. On the other hand, I came across a post by Chris Lattner - which I unfortunately can't find now - saying something similar to case 5 is not valid, that the standard licenses the compiler to just compile it to an infinite loop. Yet each case looks like an unobjectionable extension of the previous one.
Where is the line drawn between a valid case and an invalid one?
Added based on discussion in comments: while I still can't find the post that inspired case 5, I don't remember what type of pointer was involved; in particular, it might have been a function pointer, which might be why that case demonstrated invalid code whereas my case 5 is valid code.
Second addition: okay, here's another source that says there is a problem, and this one I do have a link to. https://www.cl.cam.ac.uk/~pes20/cerberus/notes30.pdf - the discussion about pointer provenance - says, and backs up with evidence, that no, if the compiler loses track of where a pointer came from, it's undefined behavior.
According to the C11 draft standard:
Example 1
Valid, by §6.5.16.1, even without an explicit cast.
Example 2
The intptr_t and uintptr_t types are optional. Assigning a pointer to an integer requires an explicit cast (§6.5.16.1), although gcc and clang will only warn you if you don’t have one. With those caveats, the round-trip conversion is valid by §7.20.1.4. ETA: John Bellinger brings up that the behavior is only specified when you do an intermediate cast to void* both ways. However, both gcc and clang allow the direct conversion as a documented extension.
Example 3
Safe, but only because you’re using unsigned arithmetic, which cannot overflow, and are therefore guaranteed to get the same object representation back. An intptr_t could overflow! If you want to do pointer arithmetic safely, you can convert any kind of pointer to char* and then add or subtract offsets within the same structure or array. Remember, sizeof(char) is always 1. ETA: The standard guarantees that the two pointers compare equal, but your link to Chisnall et al. gives examples where compilers nevertheless assume the two pointers do not alias each other.
Example 4
Always, always, always check for buffer overflows whenever you read from and especially whenever you write to a buffer! If you can mathematically prove that overflow cannot happen by static analysis? Then write out the assumptions that justify that, explicitly, and assert() or static_assert() that they haven’t changed. Use snprintf(), not the deprecated, unsafe sprintf()! If you remember nothing else from this answer, remember that!
To be absolutely pedantic, the portable way to do this would be to use the format specifiers in <inttypes.h> and define the buffer length in terms of the maximum value of any pointer representation. In the real world, you would print pointers out with the %p format.
The answer to the question you intended to ask is yes, though: all that matters is that you get back the same object representation. Here’s a less contrived example:
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
int i = 1;
const uintptr_t u = (uintptr_t)(void*)&i;
uintptr_t v;
memcpy( &v, &u, sizeof(v) );
int* const p = (int*)(void*)v;
assert(p == &i);
*p = 2;
printf( "%d = %d.\n", i, *p );
return EXIT_SUCCESS;
}
All that matter are the bits in your object representation. This code also follows the strict aliasing rules in §6.5. It compiles and runs fine on the compilers that gave Chisnall et al trouble.
Example 5
This works, same as above.
One extremely pedantic footnote that will never ever be relevant to your coding: some obsolete esoteric hardware has one’s-complement or sign-and-magnitude representation of signed integers, and on these, there might be a distinct value of negative zero that might or might not trap. On some CPUs, this might be a valid pointer or null pointer representation distinct from positive zero. And on some CPUs, positive and negative zero might compare equal.
PS
The standard says:
Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.
Furthermore, if the two array objects are consecutive rows of the same multidimensional array, one past the end of the first row is a valid pointer to the start of the next row. Therefore, even a pathological implementation that deliberately sets out to cause as many bugs as the standard allows could only do so if your manipulated pointer compares equal to the address of an array object, in which case the implementation might in theory decide to interpret it as one-past-the-end of some other array object instead.
The intended behavior is clearly that the pointer comparing equal both to &array1+1 and to &array2 is equivalent to both: it means to let you compare it to addresses within array1 or dereference it to get array2[0]. However, the Standard does not actually say that.
PPS
The standards committee has addressed some of these issues and proposes that the C standard explicitly add language about pointer provenance. This would nail down whether a conforming implementation is allowed to assume that a pointer created by bit manipulation does not alias another pointer.
Specifically, the proposed corrigendum would introduce pointer provenance and allow pointers with different provenance not to compare equal. It would also introduce a -fno-provenance option, which would guarantee that any two pointers compare equal if and only if they have the same numeric address. (As discussed above, two object pointers that compare equal alias each other.)
1) Cast to void pointer and back
This yields a valid pointer equal to the original. Paragraph 6.3.2.3/1 of the standard is clear on this:
A pointer to void may be converted to or from a pointer to any object type. A pointer to any object type may be converted to a pointer to void and back again; the result shall compare equal to the original pointer.
2) Cast to appropriately sized integer and back
3) A couple of trivial integer operations
4) Integer operations nontrivial enough to obscure provenance, but which will nonetheless leave the value unchanged
5) More indirect integer operations that will leave the value unchanged
[...] Obviously case 1 is valid, and case 2 surely must be also. On the other hand, I came across a post by Chris Lattner - which I unfortunately can't find now - saying case 5 is not valid, that the standard licenses the compiler to just compile it to an infinite loop.
C does require a cast when converting either way between pointers and integers, and you have omitted some of those in your example code. In that sense your examples (2) - (5) are all non-conforming, but for the rest of this answer I'll pretend the needed casts are there.
Still, being very pedantic, all of these examples have implementation-defined behavior, so they are not strictly conforming. On the other hand, "implementation-defined" behavior is still defined behavior; whether that means your code is "valid" or not depends on what you mean by that term. In any event, what code the compiler might emit for any of the examples is a separate matter.
These are the relevant provisions of the standard from section 6.3.2.3 (emphasis added):
An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.
Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.
The definition of uintptr_t is also relevant to your particular example code. The standard describes it this way (C2011, 7.20.1.4/1; emphasis added):
an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer.
You are converting back and forth between int * and uintptr_t. int * is not void *, so 7.20.1.4/1 does not apply to these conversions, and the behavior is implementation-defined per section 6.3.2.3.
Suppose, however, that you convert back and forth via an intermediate void *:
uintptr_t b = (uintptr_t)(void *)a;
a = (int *)(void *)b;
On an implementation that provides uintptr_t (which is optional), that would make your examples (2 - 5) all strictly conforming. In that case, the result of the integer-to-pointer conversions depends only on the value of the uintptr_t object, not on how that value was obtained.
As for the claims you attribute to Chris Lattner, they are substantially incorrect. If you have represented them accurately, then perhaps they reflect a confusion between implementation-defined behavior and undefined behavior. If the code exhibited undefined behavior then the claim might hold some water, but that is not, in fact, the case.
Regardless of how its value was obtained, b has a definite value of type uintptr_t, and the loop must eventually increment i to that value, at which point the if block will run. In principle, the implementation-defined behavior of a conversion from uintptr_t directly to int * could be something crazy such as skipping the next statement (thus causing an infinite loop), but such behavior is entirely implausible. Every implementation you ever meet will either fail at that point, or store some value in variable a, and then, if it didn't crash, it will execute the return statement.
Because different application fields require the ability to manipulate pointers in different ways, and because the best implementations for some purposes may be totally unsuitable for some others, the C Standard treats the support (or lack thereof) for various kinds of manipulations as a Quality of Implementation issue. In general, people writing an implementation for a particular application field should be more familiar than the authors of the Standard with what features would be useful to programmers in that field, and people making a bona fide effort to produce quality implementations suitable for writing applications in that field will support such features whether the Standard requires them to or not.
In the pre-Standard language invented by Dennis Ritchie, all pointers of a particular type that were identified the same address were equivalent. If any sequence of operations on a pointer would end up producing another pointer of the same type that identified the same address, that pointer would--essentially by definition--be equivalent to the first. The C Standard specifies some situations, however, where pointers can identify the same location in storage and be indistinguishable from each other without being equivalent. For example, given:
int foo[2][4] = {0};
int *p = foo[0]+4, *q=foo[1];
both p and q will compare equal to each other, and to foo[0]+4, and to foo[1]. On the other hand, despite the fact that evaluation of p[-1] and q[0] would have defined behavior, evaluation of p[0] or q[-1] would invoke UB. Unfortunately, while the Standard makes clear that p and q would not be equivalent, it does nothing to clarify whether performing various sequences of operations on e.g. p will yield a pointer that is usable in all cases where p was usable, a pointer that's usable in all cases where either p or q would be usable, a pointer that's only usable in cases where q would be usable, or a pointer that's only usable in cases where both p and q would be usable.
Quality implementations intended for low-level programming should generally process pointer manipulations other than those involving restrict pointers in a way that would yield a pointer that would be usable in any case where a pointer that compares equal to it could usable. Unfortunately, the Standard provides no means by which a program can determine if it's being processed by a quality implementations that is suitable for low-level programming and refuse to run if it isn't, so most forms of systems programming must rely upon quality implementations processing certain actions in documented manner characteristic of the environment even when the Standard would impose no requirements.
Incidentally, even if the normal constructs for manipulating pointers wouldn't have any way of creating pointers where the principles of equivalence shouldn't apply, some platforms may define ways of creating "interesting" pointers. For example, if an implementation that would normally trap operations on null pointers were run on in an environment where it might sometimes be necessary to access an object at address zero, it might define a special syntax for creating a pointer that could be used to access any address, including zero, within the context where it was created. A "legitimate pointer to address zero" would likely compare equal to a null pointer (even though they're not equivalent), but performing a round-trip conversion to another type and back would likely convert what had been a legitimate pointer to address zero into a null pointer. If the Standard had mandated that a round-trip conversion of any pointer must yield one that's usable in the same way as the original, that would require that the compiler omit null traps on any pointers that could have been produced in such a fashion, even if they would far more likely have been produced by round-tripping a null pointer.
Incidentally, from a practical perspective, "modern" compilers, even in -fno-strict-aliasing, will sometimes attempt to track provenance of pointers through pointer-integer-pointer conversions in such a way that pointers produced by casting equal integers may sometimes be assumed incapable of aliasing.
For example, given:
#include <stdint.h>
extern int x[],y[];
int test(void)
{
if (!x[0]) return 999;
uintptr_t upx = (uintptr_t)x;
uintptr_t upy = (uintptr_t)(y+1);
//Consider code with and without the following line
if (upx == upy) upy = upx;
if ((upx ^ ~upy)+1) // Will return if upx != upy
return 123;
int *py = (int*)upy;
*py += 1;
return x[0];
}
In the absence of the marked line, gcc, icc, and clang will all assume--even when using -fno-strict-aliasing, that an operation on *py can't affect *px, even though the only way that code could be reached would be if upx and upy held the same value (meaning that both px and py were produced by casting the same uintptr_t value). Adding the marked line causes icc and clang to recognize that px and py could identify the same object, but gcc assumes that the assignment can be optimized out, even though it should mean that py would be derived from px--a situation a quality compiler should have no trouble recognizing as implying possible aliasing.
I'm not sure what realistic benefit compiler writers expect from their efforts to track provenance of uintptr_t values, given that I don't see much purpose for doing such conversions outside of cases where the results of conversions may be used in "interesting" ways. Given compiler behavior, however, I'm not sure I see any nice way to guarantee that conversions between integers and pointers will behave in a fashion consistent with the values involved.
Related
For example, you might want to align a void * that is aligned to 4-byte boundary to 16-byte boundary.
int *align16(void *p) {
return (int *)((char *)p + 16 - (uintptr_t)p % 16);
}
Normally, accessing an int * casted from a char * violates strict aliasing rules and invokes undefined behaviour, while the opposite is safe.
Is it still undefined behaviour in cases such as above?
I found an old question similar to this where the reply says it's fine as far as the alignment is kept. However, that answer does not mention anything about strict aliasing and did not reply to a comment related to the standard specification.
A side question. The optimized compiler output of align16 can be decompiled as follows.
int *align16(void *p) {
return (int *)((uintptr_t)p + 16 & -16);
}
How does the standard deal with such case, accessing a pointer modified while casted to a uintptr_t?
Does casting to a char pointer to increment a pointer by a certain amount and then accessing as a different type violate strict aliasing?
Not inherently so.
Normally, accessing an int * casted from a char * violates strict aliasing rules
Not necessarily. Strict aliasing is about the (effective) type of the pointed-to object. It is quite possible for the object to which a char * points to be an int, or compatible with int, or to be assigned effective type int as a consequence of the (write) access. In such cases, casting to int * and dereferencing the result is perfectly valid.
There are, yes, lots of cases in which casting a char * to an int * and then dereferencing the result would constitute a strict-aliasing violation, but it is not specifically because of the involvement of, or the casting to or from, type char *.
The above applies regardless of how the particular char * value was obtained, so in your particular example case, too. If the result of your pointer computation is a valid pointer, and the object to which it points is genuinely an (effective) int or is compatible with int in one of the specific ways documented in section 6.5 of the language spec, then reading the pointed-to value via the pointer is fine. Otherwise, it is a strict-aliasing violation.
Attempting to dereference a pointer value that is not correctly aligned for its type is a potential issue in general with pointer manipulation, but the strict aliasing rule is stronger than and effectively inclusive of pointer alignment considerations. If you have an access that satisfies the strict aliasing rule then the pointer involved must be satisfactorily aligned for its type. The reverse is not necessarily true.
Do note, however, that although on many platforms, your align16() will indeed attempt to perform a read of a 16-byte-aligned object, the C language specifications do not require that to be so. Pointer-to-integer and integer-to-pointer conversions are explicitly allowed, but their results are implementation defined. It is not necessarily the case that value on the integer side of such a conversion reports on or controls the alignment of the pointer on the other side.
How does the standard deal with such case, accessing a pointer modified while casted to a uintptr_t?
See above. Pointer-to-integer and integer-to-pointer conversions have implementation-defined effect as far as the language spec is concerned. However, on most implementations you're likely to meet, your two versions of align16() will have equivalent behavior.
Operations which displace a structure pointer by an amount which is not a multiple of the structure's size and then make use of the resulting pointer had unambiguously defined semantics in the language defined by the 1974 C Reference Manual. The language would be useless for many purposes if such operations could not be expected to behave usefully in at least some cases, but some existing implementations do not process them meaningfully in all cases, and nothing in any formal description of the language indicates when such operations should or should not be expected to behave meaningfully (in a fashion analogous to the 1974 language). The Standard does allow implementations to impose certain constraints, and deviate from the 1974 behavior if they are violated, but none seem applicable here.
Given a function like
struct foo { int x,y; };
int test(struct foo *p1, struct foo *p2)
{
p1->x = 1;
p2->y = 2;
return p1->x;
}
no constraint would be violated if p1->x and p2->y happened to coincide, but neither clang nor gcc would produce code that would return 2 in that case. Even if the code were something like:
struct foo { int x,y; };
void test(struct foo *p1, struct foo *p2, int mode)
{
p1->x = 1;
p2->y = 2;
if (mode)
p1->x = 1;
}
whose behavior should be defined if calling code access the storage exclusively via p2->y in cases where mode is zero, and exclusively via p1->x otherwise, clang would generate code that would the shared storage location holding 2 even when mode is zero.
If actions which derive a pointer of one type from an object of another, manipulate it in "weird" ways, and then access the resulting pointer were regarded as unsequenced with regard to intervening actions that access the storage via other means, then the above constructs would both invoke Undefined Behavior in the storage-overlap case because they involve unsequenced accesses to partially-overlapping objects of type struct foo. Unfortunately, even though such a rule would allow nearly all of the useful type-based aliasing optimizations without interfering with useful type-punning constructs, and even though compilers don't behave reliably in cases that would violate such a rule, the way the Standard is worded uses a different abstraction model which fits neither programmer needs nor the behavior of actual compilers.
Let's say I want to move a void* pointer by 4 bytes. Are the following equivalent:
A:
void* new_address(void* in_ptr) {
intptr_t tmp = (intptr_t)in_ptr;
intptr_t new_address = tmp + 4;
return (void*)new_address;
}
B:
void* new_address(void* in_ptr) {
char* tmp = (char*)in_ptr;
char* new_address = tmp + 4;
return (void*)new_address;
}
Are both defined behavior? Is one more popular/accepted convention? Any other reason to use one over the other?.
Let's only consider 64bit systems. If intptr_t is not available we can use int64_t instead.
The context is a custom memory allocator which needs to move the pointer before allocating new block of memory to a specific address (for alignment purposes). We don't know what object the resulting pointer is going to point to yet but we know we need to move it to a specific location which in the examples above is 4 bytes.
Michael Kerrisk says on page 1415 that,
The C standards make one exception to the rule that pointers of
different types need not have the same representation: pointers of the
types char * and void * are required to have the same internal
representation.
All the C standard guarantees (7.18.1.4) is that you can convert void* values to intptr_t (or uintptr_t) and back again and end up with an equal value for the pointer.
The nuance is here that we cannot apply mathematical operations (including ==) if void* is in use.
Is casting a pointer to intptr_t [...] defined behavior?
Converting a pointer to any integer type is defined and the result is implementation defined, except when result can't be represented in integer type, then it's undefined behavior. See C11 6.3.2.3p6. But intptr_t has to be able to represent void* - the behavior is defined.
, doing arithmetic on it and then casting back, defined behavior?
Any integer may be converted to any pointer type. The resulting pointer is implementation defined - there is no guarantee that adding 4 to intptr_t will increment the pointer value by 4. See C11 6.3.2.3p5.
Are both defined behavior?
Yes, however the result is implementation defined.
Is one more popular/accepted convention?
Subjective: I say using uintptr_t is more popular then intptr_t. Converting a pointer to uintptr_t or to char* to do some arithmetic happens in some code, I can't say which is more popular.
Any other reason to use one over the other?.
Not really, but I think go with char*.
When it comes to actually accessing the data behind the resulting pointer - it depends. If the resulting pointer points within the same object then you're fine (remember, conversion is implementation defined). If the resulting pointer does not point to the same object, I believe the best interpretation would be from reading c2263 Clarifying Pointer Provenance v4 2.2.3Q5 and I think that's: the current C11 standard does not clearly specify that, which would make the behavior not defined.
Because you tagged gcc, both code snippets should compile to equivalent code - I believe on all architectures pointers are converted 1:1 to (u)intptr_t on gcc. Gcc docs implementation defined behavior 4.7 arrays and pointers states casting from pointer to integer and back again, the resulting pointer must reference the same object as the original pointer, otherwise the behavior is undefined - so you're safe as long as the resulting pointer points to the same object.
The context is a custom memory allocator
See implementations of container_of and offsetof macros. Do not hardcode + 4 in your code, and if you do, do not depend on alignment requirements on accessing the resulting pointers - remember to use memcpy to safely copy the context or handle alignment properly. Do not reinvent the wheel - when in doubt see other implementations like glibc malloc.c or newlib malloc.c - they both calculate on char* in mem2chunk macro, but also happen to do calculations on uintptr_t integers.
No 'strictly conforming program uses A. Using the result may be Undefined Behaviour as there is no requirement for addition against intptr_t to be reflected in a pointer value if that intptr_ is converted back to a pointer.
It is both unspecified behaviour and implementation-defined.
If the optional type intptr_t is defined all you are guaranteed is that you can convert void * to intptr_t and then convert that value back to void * and the two values will compare equal (==).
The strictly conforming way to perform pointer arithmetic is B. B is guaranteed to work if and only if the pointer int_ptr is valid and for the largest enclosing object there are 3 or more bytes in that object beyond that value. It's 3 because it's valid to point to (but not dereference) to the address that is (logically) one byte beyond the end of an object.
Object includes a declared object (including array) or block of memory such as returned by malloc().
All good practice is to prefer to write 'strictly conforming' programs where possible. So all good practice is to prefer B over A.
According to the standard the use of the pointer (as a pointer) may result in Undefined Behaviour because it may be (implementation defined) to be a trap representation.
A strictly conforming program is defined as "A strictly conforming program shall use only those features of the language and library specified in this International Standard.3) It shall not produce output dependent on any unspecified, undefined, or implementation-defined behavior, and shall not exceed any minimum implementation limit.
There's some disagreement about whether the code offered for A is unspecified or implementation defined. The standard says both because implementation-defined behaviour is a sub-category of unspecified. However because the implementation may document it as a trap representation using the value may result in Undefined Behaviour.
But I hope that is swept aside by the fact that 'strictly conforming programs' don't depend on unspecified, undefined or implementation defined behaviour.
So good practice here is certainly B.
Consider a secure environment that encrypts pointer values to deliberately confound the de-referencing of arbitrary pointer values. In principle it could provide intptr_t and be conformant.
Though I still maintain that if A doesn't work then intptr_t being an optional type it would be better to not provide it. Whether it is defined is unspecified and implementation dependent. That's because no 'strictly conforming program' uses it and it has no practical use other than to manipulate a pointer as an arithmetic type in a way not supported by pointer arithmetic on a compatible pointer type char *. The snippet in A falls into that category.
To store a void * declare a void * or char[sizeof(void*)] or malloc() or similar. To overlay a void * over an arithmetic type, declare a union and benefit that the union will be aligned for a void *.
But according to the specification it is unspecified, implementation-defined no 'strictly conforming program' can rely on it and may result in Undefined Behaviour.
A very long winded way of saying the answer, here, is B.
Unfortunately, I haven't found anything like std-discussion for the ISO C standard, so I'll ask here.
Before answering the question, make sure you are familiar with the idea of pointer provenance (see DR260 and/or http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2263.htm).
6.3.2.3(Pointers) paragraph 7 says:
When a pointer to an object is converted to a pointer to a character type,
the result points to the lowest addressed byte of the object. Successive increments of the
result, up to the size of the object, yield pointers to the remaining bytes of the object.
The questions are:
What does "a pointer to an object" mean? Does it mean that if we want to get the pointer to the lowest addressed byte of an object of type T, we shall convert a pointer of type cvT* to a pointer to a character type, and converting a pointer to void, obtained from the pointer of type T*, won't give us desired result? Or "a pointer to an object" is the value of a pointer which follows from the pointer provenance and cast to void* does not change the value (analogous to how it was recently formalized in C++17)?
Why the paragraph explicitly mentions increments? Does it mean that adding value greater than 1 is undefined? Does it mean that decrementing (after we have incremented the result several times so that we won't go beyond the lower object boundary) is undefined? In short: is the sequences of bytes composing an object an array?
The general description of pointer addition suggests that for any values/types of pointer p and signed integers x and y where ((ptr+x)+y) and (x+y) are both defined by the Standard, (ptr+(x+y)) would behave equivalent to ((ptr+x)+y). While it might be possible to argue that the Standard doesn't explicitly say that incrementing a pointer five times would be equivalent to adding 5, there is nothing in the Standard that would suggest that quality implementations should not be expected to behave in that fashion.
Note that the authors of the Standard didn't try to make it "language-lawyer-proof". Further, they didn't think anyone would care about whether or not an obviously-inferior implementation was "conforming". An implementation that only works reliably if bytes of an object are accessed sequentially would be less versatile than one which supported reliable indexing, while offering no plausible advantage. Consequently, there should be no need for the Standard to mandate support for indexing, because anyone trying to produce a quality implementation would support it whether the Standard mandated it or not.
Of course, there are some constructs which programmers in the 1990s--or even the authors of the Standard themselves--expected quality compilers to handle reliably, but which some of today's "clever" compilers don't. Whether that means such expectations were unreasonable, or whether they remain accurate when applied to quality compilers, is a matter of opinion. In this particular case, I think the implication that positive indexing should behave like repeated incrementing is strong enough that I wouldn't expect compiler writers to argue otherwise, but I'm not 100% certain that no compiler would ever be "clever"/obtuse enough to look at something like:
int test(unsigned char foo[5][5], int x)
{
foo[1][0] = 1;
// Following should yield a pointer that can be used to access the entire
// array 'foo', but an obtuse compiler writer could perhaps argue that the
// code is converting the address of foo[0] into a pointer to the first
// element of that sub-array, and that the resulting pointer is thus only
// usable to access items within that sub-array.
unsigned char *p = (unsigned char*)foo;
// Following should be able to access any element of the object [i.e. foo]
// whose address was taken
p[x] = 2;
return foo[1][0];
}
and decide that it could skip the second read of foo[1][0] since p[x] wouldn't possibly access any element of foo beyond the first row. I would, however, say that programmers should not try to code around the possibility of vandals writing a compiler that would behave that way. No program can be made bullet-proof against vandals writing obtuse-but-"conforming" compilers, and the fact that a program can be undermined by such vandals should hardly be viewed as a defect.
Take a non-char c object and create a pointer to it, i.e.
int obj;
int *objPtr = &obj;
convert the pointer to object to pointer to char:
char *charPtr = (char *)objPtr;
now, charPtr points to the lowest byte or the int obj.
increment it:
charPtr++;
now it points to the next byte of the object. and so on till you reach the size of the object:
int i;
for (i = 0; i < sizeof(obj); i++)
printf("%d", *charPtr++);
unsigned int addr = 0x1000;
int temp = *((volatile int *) addr + 3);
Does it treat the incremented pointer (ie addr + 3 * sizeof(int)), as a pointer to volatile int (while dereferencing). In other words can I expect the hardware updated contents of say (0x1012) in temp ?
Yes.
Pointer arithmetic does not affect the type of the pointer, including any type qualifiers. Given an expression of the form A + B, if A has the type qualified pointer to T and B is an integral type, the expression A + B will also be a qualified pointer to T -- same type, same qualifiers.
From 6.5.6.8 of the C spec (draft n1570):
When an expression that has integer type is added to or subtracted from a pointer, the
result has the type of the pointer operand.
Presuming addr is either an integer (variable or constant) with a value your implementation can safely convert to an int * (see below).
Consider
volatile int a[4] = [ 1,2,3,4};
int i = a[3];
This is exactly the same, except for the explicit conversion integer to volatile int * (a pointer to ...). For the index operator, the name of the array decays to a pointer to the first element of a. This is volatile int * (type qualifiers in C apply to the elements of an array, never the array itself).
This is the same as the cast. Leaves 2 differences:
The conversion integer to "pointer". This is implementation defined, thus if your compiler supports it correctly (it should document this), and the value is correct, it is fine.
Finally the access. The underlying object is not volatile, but the pointer/resp. access. This actually is a defect in the standard (see DR476 which requires the object to be volatile, not the access. This is in contrast to the documented intention (read the link) and C++ semantics (which should be identical). Luckily all(most all) implementations generate code as one would expect and perform the access as intended. Note this is a common ideom on embedded systems.
So if the prerequisites are fulfilled, the code is correct. But please see below for better(in terms of maintainability and safety) options.
Notes: A better approach would be to
use uintptr_t to guarantee the integer can hold a pointer, or - better -
#define ARRAY_ADDR ((volatile int *)0x1000)
The latter avoids accidental modification to the integer and states the implications clear. It also can be used easier. It is a typical construct in low-level peripheral register definitions.
Re. your incrementing: addr is not a pointer! Thus you increment an integer, not a pointer. Left apart this is more to type than using a true pointer, it also is error-prone and obfuscates your code. If you need a pointer, use a pointer:
int *p = ARRAY_ADDR + 3;
As a personal note: Everybody passing such code (the one with the integer addr) in a company with at least some quality standards would have a very serious talk with her team leader.
First note that conversions from integers to pointers are not necessarily safe. It is implementation-defined what will happen. In some cases such conversions can even invoke undefined behavior, in case the integer value cannot be represented as a pointer, or in case the pointer ends up with a misaligned address.
It is safer to use the integer type uintptr_t to store pointers and addresses, as it is guaranteed to be able to store a pointer for the given system.
Given that your compiler implements a safe conversion for this code (for example, most embedded systems compilers do), then the code will indeed behave as you expect.
Pointer arithmetic will be done on a type that is volatile int, and therefore + 3 means increase the address by sizeof(volatile int) * 3 bytes. If an int is 4 bytes on your system, you will end up reading the contents of address 0x100C. Not sure where you got 0x1012 from, mixing up decimal and hex notation?
I have a void pointer pointing to a memory address. Then, I do
int pointer = the void pointer
float pointer = the void pointer
and then, dereference them go get the values.
{
int x = 25;
void *p = &x;
int *pi = p;
float *pf = p;
double *pd = p;
printf("x: n%d\n", x);
printf("*p: %d\n", *(int *)p);
printf("*pi: %d\n", *pi);
printf("*pf: %f\n", *pf);
printf("*pd: %f\n", *pd);
return 0;
}
The output of dereferencing pi(int pointer) is 25.
However the output of dereferencing pf(float pointer) is 0.000.
Also dereferncing pd(double pointer) outputs a negative fraction that keeps
changing?
Why is this and is it related to endianness(my CPU is little endian)?
As per C standard, you'er allowed to convert any pointer to void * and convert it back, it'll have the same effect.
To quote C11, chapter §6.3.2.3
[...] A pointer to
any object type may be converted to a pointer to void and back again; the result shall
compare equal to the original pointer.
That is why, when you cast the void pointer to int *, de-reference and print the result, it prints properly.
However, standard does not guarantee that you can dereference that pointer to be of a different data type. It is essentially invoking undefined behaviour.
So, dereferencing pf or pd to get a float or double is undefined behavior, as you're trying to read the memory allocated for an int as a float or double. There's a clear case of mismtach which leads to the UB.
To elaborate, int and float (and double) has different internal representations, so trying to cast a pointer to another type and then an attempt to dereference to get the value in other type won't work.
Related , C11, chapter §6.5.3.3
[...] If the operand has type ‘‘pointer to type’’, the result has type ‘‘type’’. If an
invalid value has been assigned to the pointer, the behavior of the unary * operator is
undefined.
and for the invalid value part, (emphasis mine)
Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an
address inappropriately aligned for the type of object pointed to, and the address of an object after the
end of its lifetime.
In addition to the answers before, I think that what you were expecting could not be accomplished because of the way the float numbers are represented.
Integers are typically stored in Two's complement way, basically it means that the number is stored as one piece. Floats on the other hand are stored using a different way using a sign, base and exponent, Read here.
So the main idea of convertion is impossible since you try to take a number represented as raw bits (for positive) and look at it as if it was encoded differently, this will result in unexpected results even if the convertion was legit.
So... here's probably what's going on.
However the output of dereferencing pf(float pointer) is 0.000
It's not 0. It's just really tiny.
You have 4-byte integers. Your integer looks like this in memory...
5 0 0 0
00000101 00000000 00000000 00000000
Which interpreted as a float looks like...
sign exponent fraction
0 00001010 0000000 00000000 00000000
+ 2**-117 * 1.0
So, you're outputting a float, but it's incredibly tiny. It's 2^-117, which is virtually indistinguishable from 0.
If you try printing the float with printf("*pf: %e\n", *pf); then it should give you something meaningful, but small. 7.006492e-45
Also dereferncing pd(double pointer) outputs a negative fraction that keeps changing?
Doubles are 8-bytes, but you're only defining 4-bytes. The negative fraction change is the result of looking at uninitialized memory. The value of uninitialized memory is arbitrary and it's normal to see it change with every run.
There are two kinds of UBs going on here:
1) Strict aliasing
What is the strict aliasing rule?
"Strict aliasing is an assumption, made by the C (or C++) compiler, that dereferencing pointers to objects of different types will never refer to the same memory location (i.e. alias each other.)"
However, strict aliasing can be turned off as a compiler extension, like -fno-strict-aliasing in GCC. In this case, your pf version would function well, although implementation defined, assuming nothing else has gone wrong (usually float and int are both 32 bit types and 32 bit aligned on most computers, usually). If your computer uses IEEE754 single, you can get a very small denorm floating point number, which explains for the result you observe.
Strict aliasing is a controversial feature of recent versions of C (and considered a bug by a lot of people) and makes it very difficult and more hacky than before to do reinterpret cast (aka type punning) in C.
Before you are very aware of type punning and how it behaves with your version of compiler and hardware, you shall avoid doing it.
2) Memory out of bound
Your pointer points to a memory space as large as int, but you dereference it as double, which is usually twice of the size of an int, you are basically reading half a double of garbage from somewhere in the computer, which is why your double keeps changing.
The types int, float, and double have different memory layouts, representations, and interpretations.
On my machine, int is 4 bytes, float is 4 bytes, and double is 8 bytes.
Here is how you explain the results you are seeing.
Derefrencing the int pointer works, obviously, because the original data was an int.
Derefrencing the float pointer, the compiler generates code to interpret the contents of 4 bytes in memory as a float. The value in the 4 bytes, when interpreted as a float, gives you 0.00. Lookup how float is represented in memory.
Derefrencing the double pointer, the compiler generates code to interpret the contents in memory as a double. Because a double is larger than an int, this accesses the 4 bytes of the original int, and an extra 4 bytes on the stack. Because the contents of these extra 4 bytes is dependent on the state of the stack, and is unpredictable from run to run, you see the varying values that correspond to interpreting the entire 8 bytes as a double.
In the following,
printf("x: n%d\n", x); //OK
printf("*p: %d\n", *(int *)p); //OK
printf("*pi: %d\n", *pi); //OK
printf("*pf: %f\n", *pf); // UB
printf("*pd: %f\n", *pd); // UB
The accesses in the first 3 printfs are fine as you are accessing int through the lvalue type of type int. But the next 2 are not fine as the violate 6.5, 7, Expressions.
An int * is not a compatible type with a float * or double *. So the accesses in the last two printf() calls cause undefined behaviour.
C11, $6.5, 7 states:
An object shall have its stored value accessed only by an lvalue
expression that has one of the following types:
— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the object,
— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
— a character type.
The term "C" is used to describe two languages: one invented by K&R in which pointers identify physical memory locations, and one which is derived from that which works the same in cases where pointers are either read and written in ways that abide by certain rules, but may behave in arbitrary fashion if they are used in other ways. While the latter language is defined the by the Standards, the former language is what became popular for microcomputer programming in the 1980s.
One of the major impediments to generating efficient machine code from C code is that compilers can't tell what pointers might alias what variables. Thus, any time code accesses a pointer that might point to a given variable, generated code is required to ensure that the contents of the memory identified by the pointer and the contents of the variable match. That can be very expensive. The people writing the C89 Standard decided that compilers should be allowed to assume that named variables (static and automatic) will only be accessed using pointers of their own type or character types; the people writing C99 decided to add additional restrictions for allocated storage as well.
Some compilers offer means by which code can ensure that accesses using different types will go through memory (or at least behave as though they are doing so), but unfortunately I don't think there's any standard for that. C14 added a memory model for use with multi-threading which should be capable of achieving required semantics, but I don't think compilers are required to honor such semantics in cases where they can tell that there's no way for outside threads to access something [even if going through memory would be necessary to achieve correct single-thread semantics].
If you're using gcc and want to have memory semantics that work as K&R intended, use the "-fno-strict-aliasing" command-line option. To make code efficient it will be necessary to make substantial use of the "restrict" qualifier which was added in C99. While the authors of gcc seem to have focused more on type-based aliasing rules than "restrict", the latter should allow more useful optimizations.