Pointers, casting and different compilers - c

I am currently taking an ANSI C programming course and am trying to run this code from my lecturer's slide:
#include<stdio.h>
int main()
{
int a[5] = {10, 20, 30, 40, 50};
double *p;
for (p = (double*)a; p<(double*)(a+5); ((int*)p)++)
{
printf("%d",*((int*)p));
}
return 0;
}
Unfortunately, it doesn't work. On macOS with Xcode (Clang) I get the error "Assignment to cast is illegal, lvalue casts are not supported", and on Ubuntu with gcc I get the error "lvalue required as increment operand".
I suspect the issue is the compiler, since we are learning ANSI C, which has its own requirements that may conflict with other standards.

Regarding ((int*)p)++:
The result of (int *) p is a value. This is different from an lvalue. An lvalue (potentially) designates an object. For example, after int x = 3;, the name x designates the object that we defined. We can use it in an expression generally, such as y = 2*x, and then the x is used for its value. But we can also use it in an assignment, such as x = 5, and then the x is used for the object.
The expression (int *) p takes p and converts it to a pointer to int. The result is only a value. It is not an lvalue, so it does not represent an object that can be modified.
The ++ operator modifies an object. So it can only be applied to an lvalue. Since (int *) p is not an lvalue, ++ cannot be applied to it.
The code from the slide, as you have shown it, is incorrect, and I would not expect it to work in any C implementation. (C does permit implementations to make many extensions, but extending C to permit this operation would be unusual.)
Regarding (double*)a and (int*)p:
C does permit you to convert pointers to objects to pointers to different kinds of objects. There are various rules about this. An important one is that the resulting pointer must have the correct alignment for the type it points to.
Objects have various alignment requirements, meaning they must be placed at particular addresses in memory. Commonly, char objects may have any address, int objects must be at multiples of four bytes, and double objects must be at multiples of eight bytes. These requirements vary from C implementation to C implementation. I will use these values for illustration.
When you convert a proper double * to an int *, we know that the resulting pointer is a multiple of four, because it started as a multiple of eight (assuming the requirements stated above). So that is a safe conversion. When you convert an int * to a double *, it may have the wrong alignment. In particular, given an array a of int, we know that either a[0] or a[1] must be improperly aligned for a double, because, if one of them is at a multiple of eight bytes, the other one must be off by four bytes from a multiple of eight. Therefore, the conversions from int * to double * in this code are not defined by the C standard.
They might work in many C implementations, but you should not rely on them.
The C rules also state that when a pointer to an object is converted back to its original type, the result is equal to the original pointer. So, the round-trip conversions in the sample code would be okay if the alignment rules had been obeyed: You can convert an int * to a double * and back to an int *, provided alignment requirements are obeyed.
If the double * had been used to access a double, that would violate aliasing rules. Generally, C does not define the behavior when an object of one type is accessed as if it were another type. There are some exceptions, notably with character types. However, simply converting pointers back and forth without using them to access objects is okay, except for the alignment problem.

Related

Why can't we use * on non-pointers?

Let's say I have the code:
int x = 5;
int* p = &x;
then writing *p will return 5 and allow me to modify x (as expected). Say, for whatever reason that I then write:
int y = p; // y holds x's address
*y = 3; // this is invalid and throws an error when compiling
*((int*)y) = 3; // this is okay
(when compiling on gcc 9.2)
My question is: why does C not allow us to use * on non-pointer types?
C is a strongly typed language, which means that the operations which are allowed on an object (and the interpretation of those operations) are a function of the object's type. That's literally what it means for an object to have a type: the type determines the operations you can do with the object.
Unary * (the pointer indirection operator) is defined for pointer types, and it's not defined for integer types.
If you want to treat an integer's value as if it were a pointer, you can use an explicit cast, as in the *((int *)y) = 3; example you mentioned in your question.
There are two reasons the unary * operator is not defined for integers:
1) Taking an integer and pretending it's a pointer is generally a bad idea, not something to be encouraged. If you really want to do it, the extra cost imposed on you -- namely that you have to use that pointer cast -- is appropriate.
2) The bare expression *y doesn't contain enough information to know how big the pointed-to object might be. If you write *y = 3 and it were legal, how would the compiler know to assign an int, a short, or a char?
Point 2 is key. It's important to remember that C does not have one "pointer" type. Every pointer type incorporates a specification of the type of object which the pointer will point to. That's no accident, it's fundamental, and there's no way around it.
So you can't implicitly treat an integer as if it were a pointer, and even if you do it explicitly -- that is, with a cast, as in *((int *)y) = 3, you may still be on shaky ground, especially if integers and pointers don't have the same size on your machine.
These days, this is all generally such a bad idea that the compilers are slowly dropping their old "the programmer must know what he's doing" attitude, and getting somewhat hissy with warnings. For example, int y = p will generally get you a warning about a pointer-to-int assignment, and even with the explicit cast, *((int *)y) = 3 might get you a warning about "cast to pointer from integer of different size".
Boring answer - because that's how the language is defined:
6.5.3.2 Address and indirection operators
Constraints
1 The operand of the unary & operator shall be either a function designator, the result of a
[] or unary * operator, or an lvalue that designates an object that is not a bit-field and is
not declared with the register storage-class specifier.
2 The operand of the unary * operator shall have pointer type.
C 2011 Online Draft
Slightly-less boring answer:
Pointers are not integers; they do not behave like integers. The operations on pointers and integers are different. While there is such a thing as pointer arithmetic, it does not behave like integer arithmetic. Pointers are abstractions of memory addresses, which do not have to have integer representation.
Type matters in C (not as much as in some other languages, but it does matter). Operations on integer types do not apply to pointer types and vice versa, just like operations on aggregate (struct or array types) do not apply to integer types.
You can't use * on an integer operand for the same reason you can't use [] or () or . or -> on an integer operand; those operations are not defined for integer types.
The reason we can't use * on an integer is that pointers and integers are different types.
You cannot use int y to store an address, because an integer can't hold addresses; it can only take integer values.
int y = p; // y holds x's address
*y = 3; // this is invalid and throws an error when compiling
*((int*)y) = 3; // this is okay
Pointers, on the other hand, are designed to store the address of a variable, so you can use * to access the value stored at that address, whereas an integer cannot store an address.
The operations on integers and pointers are different from each other.
Regarding "*((int*)y) = 3; // this is okay": no, this is not OK, it causes UB. Just because the compiler does not complain does not mean the code is OK or is free from UB.

What treatment can a pointer undergo and still be valid?

Which of the following ways of treating and trying to recover a C pointer, is guaranteed to be valid?
1) Cast to void pointer and back
int f(int *a) {
void *b = a;
a = b;
return *a;
}
2) Cast to appropriately sized integer and back
int f(int *a) {
uintptr_t b = a;
a = (int *)b;
return *a;
}
3) A couple of trivial integer operations
int f(int *a) {
uintptr_t b = a;
b += 99;
b -= 99;
a = (int *)b;
return *a;
}
4) Integer operations nontrivial enough to obscure provenance, but which will nonetheless leave the value unchanged
int f(int *a) {
uintptr_t b = a;
char s[32];
// assume %lu is suitable
sprintf(s, "%lu", b);
b = strtoul(s);
a = (int *)b;
return *a;
}
5) More indirect integer operations that will leave the value unchanged
int f(int *a) {
uintptr_t b = a;
for (uintptr_t i = 0;; i++)
if (i == b) {
a = (int *)i;
return *a;
}
}
Obviously case 1 is valid, and case 2 surely must be also. On the other hand, I came across a post by Chris Lattner - which I unfortunately can't find now - saying something similar to case 5 is not valid, that the standard licenses the compiler to just compile it to an infinite loop. Yet each case looks like an unobjectionable extension of the previous one.
Where is the line drawn between a valid case and an invalid one?
Added based on discussion in comments: while I still can't find the post that inspired case 5, I don't remember what type of pointer was involved; in particular, it might have been a function pointer, which might be why that case demonstrated invalid code whereas my case 5 is valid code.
Second addition: okay, here's another source that says there is a problem, and this one I do have a link to. https://www.cl.cam.ac.uk/~pes20/cerberus/notes30.pdf - the discussion about pointer provenance - says, and backs up with evidence, that no, if the compiler loses track of where a pointer came from, it's undefined behavior.
According to the C11 draft standard:
Example 1
Valid, by §6.5.16.1, even without an explicit cast.
Example 2
The intptr_t and uintptr_t types are optional. Assigning a pointer to an integer requires an explicit cast (§6.5.16.1), although gcc and clang will only warn you if you don’t have one. With those caveats, the round-trip conversion is valid by §7.20.1.4. ETA: John Bellinger brings up that the behavior is only specified when you do an intermediate cast to void* both ways. However, both gcc and clang allow the direct conversion as a documented extension.
Example 3
Safe, but only because you’re using unsigned arithmetic, which cannot overflow, and are therefore guaranteed to get the same object representation back. An intptr_t could overflow! If you want to do pointer arithmetic safely, you can convert any kind of pointer to char* and then add or subtract offsets within the same structure or array. Remember, sizeof(char) is always 1. ETA: The standard guarantees that the two pointers compare equal, but your link to Chisnall et al. gives examples where compilers nevertheless assume the two pointers do not alias each other.
Example 4
Always, always, always check for buffer overflows whenever you read from and especially whenever you write to a buffer! If you can mathematically prove that overflow cannot happen by static analysis? Then write out the assumptions that justify that, explicitly, and assert() or static_assert() that they haven’t changed. Use snprintf(), not the deprecated, unsafe sprintf()! If you remember nothing else from this answer, remember that!
To be absolutely pedantic, the portable way to do this would be to use the format specifiers in <inttypes.h> and define the buffer length in terms of the maximum value of any pointer representation. In the real world, you would print pointers out with the %p format.
The answer to the question you intended to ask is yes, though: all that matters is that you get back the same object representation. Here’s a less contrived example:
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
int i = 1;
const uintptr_t u = (uintptr_t)(void*)&i;
uintptr_t v;
memcpy( &v, &u, sizeof(v) );
int* const p = (int*)(void*)v;
assert(p == &i);
*p = 2;
printf( "%d = %d.\n", i, *p );
return EXIT_SUCCESS;
}
All that matter are the bits in your object representation. This code also follows the strict aliasing rules in §6.5. It compiles and runs fine on the compilers that gave Chisnall et al trouble.
Example 5
This works, same as above.
One extremely pedantic footnote that will never ever be relevant to your coding: some obsolete esoteric hardware has one’s-complement or sign-and-magnitude representation of signed integers, and on these, there might be a distinct value of negative zero that might or might not trap. On some CPUs, this might be a valid pointer or null pointer representation distinct from positive zero. And on some CPUs, positive and negative zero might compare equal.
PS
The standard says:
Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.
Furthermore, if the two array objects are consecutive rows of the same multidimensional array, one past the end of the first row is a valid pointer to the start of the next row. Therefore, even a pathological implementation that deliberately sets out to cause as many bugs as the standard allows could only do so if your manipulated pointer compares equal to the address of an array object, in which case the implementation might in theory decide to interpret it as one-past-the-end of some other array object instead.
The intended behavior is clearly that the pointer comparing equal both to &array1+1 and to &array2 is equivalent to both: it means to let you compare it to addresses within array1 or dereference it to get array2[0]. However, the Standard does not actually say that.
PPS
The standards committee has addressed some of these issues and proposes that the C standard explicitly add language about pointer provenance. This would nail down whether a conforming implementation is allowed to assume that a pointer created by bit manipulation does not alias another pointer.
Specifically, the proposed corrigendum would introduce pointer provenance and allow pointers with different provenance not to compare equal. It would also introduce a -fno-provenance option, which would guarantee that any two pointers compare equal if and only if they have the same numeric address. (As discussed above, two object pointers that compare equal alias each other.)
1) Cast to void pointer and back
This yields a valid pointer equal to the original. Paragraph 6.3.2.3/1 of the standard is clear on this:
A pointer to void may be converted to or from a pointer to any object type. A pointer to any object type may be converted to a pointer to void and back again; the result shall compare equal to the original pointer.
2) Cast to appropriately sized integer and back
3) A couple of trivial integer operations
4) Integer operations nontrivial enough to obscure provenance, but which will nonetheless leave the value unchanged
5) More indirect integer operations that will leave the value unchanged
[...] Obviously case 1 is valid, and case 2 surely must be also. On the other hand, I came across a post by Chris Lattner - which I unfortunately can't find now - saying case 5 is not valid, that the standard licenses the compiler to just compile it to an infinite loop.
C does require a cast when converting either way between pointers and integers, and you have omitted some of those in your example code. In that sense your examples (2) - (5) are all non-conforming, but for the rest of this answer I'll pretend the needed casts are there.
Still, being very pedantic, all of these examples have implementation-defined behavior, so they are not strictly conforming. On the other hand, "implementation-defined" behavior is still defined behavior; whether that means your code is "valid" or not depends on what you mean by that term. In any event, what code the compiler might emit for any of the examples is a separate matter.
These are the relevant provisions of the standard from section 6.3.2.3 (emphasis added):
An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.
Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.
The definition of uintptr_t is also relevant to your particular example code. The standard describes it this way (C2011, 7.20.1.4/1; emphasis added):
an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer.
You are converting back and forth between int * and uintptr_t. int * is not void *, so 7.20.1.4/1 does not apply to these conversions, and the behavior is implementation-defined per section 6.3.2.3.
Suppose, however, that you convert back and forth via an intermediate void *:
uintptr_t b = (uintptr_t)(void *)a;
a = (int *)(void *)b;
On an implementation that provides uintptr_t (which is optional), that would make your examples (2 - 5) all strictly conforming. In that case, the result of the integer-to-pointer conversions depends only on the value of the uintptr_t object, not on how that value was obtained.
As for the claims you attribute to Chris Lattner, they are substantially incorrect. If you have represented them accurately, then perhaps they reflect a confusion between implementation-defined behavior and undefined behavior. If the code exhibited undefined behavior then the claim might hold some water, but that is not, in fact, the case.
Regardless of how its value was obtained, b has a definite value of type uintptr_t, and the loop must eventually increment i to that value, at which point the if block will run. In principle, the implementation-defined behavior of a conversion from uintptr_t directly to int * could be something crazy such as skipping the next statement (thus causing an infinite loop), but such behavior is entirely implausible. Every implementation you ever meet will either fail at that point, or store some value in variable a, and then, if it didn't crash, it will execute the return statement.
Because different application fields require the ability to manipulate pointers in different ways, and because the best implementations for some purposes may be totally unsuitable for some others, the C Standard treats the support (or lack thereof) for various kinds of manipulations as a Quality of Implementation issue. In general, people writing an implementation for a particular application field should be more familiar than the authors of the Standard with what features would be useful to programmers in that field, and people making a bona fide effort to produce quality implementations suitable for writing applications in that field will support such features whether the Standard requires them to or not.
In the pre-Standard language invented by Dennis Ritchie, all pointers of a particular type that identified the same address were equivalent. If any sequence of operations on a pointer would end up producing another pointer of the same type that identified the same address, that pointer would--essentially by definition--be equivalent to the first. The C Standard specifies some situations, however, where pointers can identify the same location in storage and be indistinguishable from each other without being equivalent. For example, given:
int foo[2][4] = {0};
int *p = foo[0]+4, *q=foo[1];
both p and q will compare equal to each other, and to foo[0]+4, and to foo[1]. On the other hand, despite the fact that evaluation of p[-1] and q[0] would have defined behavior, evaluation of p[0] or q[-1] would invoke UB. Unfortunately, while the Standard makes clear that p and q would not be equivalent, it does nothing to clarify whether performing various sequences of operations on e.g. p will yield a pointer that is usable in all cases where p was usable, a pointer that's usable in all cases where either p or q would be usable, a pointer that's only usable in cases where q would be usable, or a pointer that's only usable in cases where both p and q would be usable.
Quality implementations intended for low-level programming should generally process pointer manipulations other than those involving restrict pointers in a way that would yield a pointer that would be usable in any case where a pointer that compares equal to it could be usable. Unfortunately, the Standard provides no means by which a program can determine if it's being processed by a quality implementation that is suitable for low-level programming and refuse to run if it isn't, so most forms of systems programming must rely upon quality implementations processing certain actions in a documented manner characteristic of the environment even when the Standard would impose no requirements.
Incidentally, even if the normal constructs for manipulating pointers wouldn't have any way of creating pointers where the principles of equivalence shouldn't apply, some platforms may define ways of creating "interesting" pointers. For example, if an implementation that would normally trap operations on null pointers were run in an environment where it might sometimes be necessary to access an object at address zero, it might define a special syntax for creating a pointer that could be used to access any address, including zero, within the context where it was created. A "legitimate pointer to address zero" would likely compare equal to a null pointer (even though they're not equivalent), but performing a round-trip conversion to another type and back would likely convert what had been a legitimate pointer to address zero into a null pointer. If the Standard had mandated that a round-trip conversion of any pointer must yield one that's usable in the same way as the original, that would require that the compiler omit null traps on any pointers that could have been produced in such a fashion, even if they would far more likely have been produced by round-tripping a null pointer.
Incidentally, from a practical perspective, "modern" compilers, even in -fno-strict-aliasing, will sometimes attempt to track provenance of pointers through pointer-integer-pointer conversions in such a way that pointers produced by casting equal integers may sometimes be assumed incapable of aliasing.
For example, given:
#include <stdint.h>
extern int x[],y[];
int test(void)
{
if (!x[0]) return 999;
uintptr_t upx = (uintptr_t)x;
uintptr_t upy = (uintptr_t)(y+1);
//Consider code with and without the following line
if (upx == upy) upy = upx;
if ((upx ^ ~upy)+1) // Will return if upx != upy
return 123;
int *py = (int*)upy;
*py += 1;
return x[0];
}
In the absence of the marked line, gcc, icc, and clang will all assume--even when using -fno-strict-aliasing, that an operation on *py can't affect *px, even though the only way that code could be reached would be if upx and upy held the same value (meaning that both px and py were produced by casting the same uintptr_t value). Adding the marked line causes icc and clang to recognize that px and py could identify the same object, but gcc assumes that the assignment can be optimized out, even though it should mean that py would be derived from px--a situation a quality compiler should have no trouble recognizing as implying possible aliasing.
I'm not sure what realistic benefit compiler writers expect from their efforts to track provenance of uintptr_t values, given that I don't see much purpose for doing such conversions outside of cases where the results of conversions may be used in "interesting" ways. Given compiler behavior, however, I'm not sure I see any nice way to guarantee that conversions between integers and pointers will behave in a fashion consistent with the values involved.

De-referencing pointer to a volatile int after increment

unsigned int addr = 0x1000;
int temp = *((volatile int *) addr + 3);
Does it treat the incremented pointer (i.e. addr + 3 * sizeof(int)) as a pointer to volatile int when dereferencing? In other words, can I expect the hardware-updated contents of, say, 0x1012 in temp?
Yes.
Pointer arithmetic does not affect the type of the pointer, including any type qualifiers. Given an expression of the form A + B, if A has the type qualified pointer to T and B is an integral type, the expression A + B will also be a qualified pointer to T -- same type, same qualifiers.
From 6.5.6.8 of the C spec (draft n1570):
When an expression that has integer type is added to or subtracted from a pointer, the
result has the type of the pointer operand.
Yes, presuming addr is an integer (variable or constant) with a value your implementation can safely convert to an int * (see below).
Consider
volatile int a[4] = { 1, 2, 3, 4 };
int i = a[3];
This is exactly the same, except for the explicit conversion from integer to volatile int *. For the index operator, the name of the array decays to a pointer to the first element of a, which has type volatile int * (type qualifiers in C apply to the elements of an array, never to the array itself).
That matches the result of the cast, which leaves two differences:
The conversion from integer to pointer. This is implementation-defined, so if your compiler supports it correctly (it should document this) and the value is correct, it is fine.
Finally, the access. The underlying object is not volatile; only the pointer (and thus the access) is. This actually is a defect in the standard (see DR476), which requires the object to be volatile, not the access. This is in contrast to the documented intention and to C++ semantics (which should be identical). Luckily, almost all implementations generate code as one would expect and perform the access as intended. Note that this is a common idiom on embedded systems.
So if the prerequisites are fulfilled, the code is correct. But please see below for better options (in terms of maintainability and safety).
Notes: A better approach would be to
use uintptr_t to guarantee the integer can hold a pointer, or - better -
#define ARRAY_ADDR ((volatile int *)0x1000)
The latter avoids accidental modification of the integer and states the implications clearly. It is also easier to use. It is a typical construct in low-level peripheral register definitions.
Regarding your increment: in your description (addr + 3 * sizeof(int)), addr is not a pointer, so you would be incrementing an integer, not a pointer. Apart from being more to type than using a true pointer, this is also error-prone and obfuscates your code. If you need a pointer, use a pointer:
volatile int *p = ARRAY_ADDR + 3;
As a personal note: Everybody passing such code (the one with the integer addr) in a company with at least some quality standards would have a very serious talk with her team leader.
First note that conversions from integers to pointers are not necessarily safe. It is implementation-defined what will happen. In some cases such conversions can even invoke undefined behavior, in case the integer value cannot be represented as a pointer, or in case the pointer ends up with a misaligned address.
It is safer to use the integer type uintptr_t to store pointers and addresses, as it is guaranteed to be able to store a pointer for the given system.
Given that your compiler implements a safe conversion for this code (for example, most embedded systems compilers do), then the code will indeed behave as you expect.
Pointer arithmetic will be done on a type that is volatile int, and therefore + 3 means increase the address by sizeof(volatile int) * 3 bytes. If an int is 4 bytes on your system, you will end up reading the contents of address 0x100C. Not sure where you got 0x1012 from, mixing up decimal and hex notation?

void pointer = int pointer = float pointer

I have a void pointer pointing to a memory address. Then, I do
int pointer = the void pointer
float pointer = the void pointer
and then dereference them to get the values.
#include <stdio.h>

int main(void)
{
int x = 25;
void *p = &x;
int *pi = p;
float *pf = p;
double *pd = p;
printf("x: %d\n", x);
printf("*p: %d\n", *(int *)p);
printf("*pi: %d\n", *pi);
/* The next two lines read an int object as a float and as a double. */
printf("*pf: %f\n", *pf);
printf("*pd: %f\n", *pd);
return 0;
}
The output of dereferencing pi (the int pointer) is 25.
However, the output of dereferencing pf (the float pointer) is 0.000.
Also, dereferencing pd (the double pointer) outputs a negative fraction that keeps changing.
Why is this, and is it related to endianness (my CPU is little-endian)?
As per the C standard, you're allowed to convert any object pointer to void * and convert it back; the result will compare equal to the original pointer.
To quote C11, chapter §6.3.2.3
[...] A pointer to
any object type may be converted to a pointer to void and back again; the result shall
compare equal to the original pointer.
That is why, when you cast the void pointer to int *, de-reference and print the result, it prints properly.
However, the standard does not guarantee that you can dereference that pointer as a different data type. Doing so essentially invokes undefined behaviour.
So, dereferencing pf or pd to get a float or double is undefined behavior, as you're trying to read the memory allocated for an int as a float or double. There's a clear mismatch, which leads to the UB.
To elaborate, int and float (and double) have different internal representations, so casting a pointer to another type and then dereferencing it to get a value of that other type won't work.
Related: C11, chapter §6.5.3.2
[...] If the operand has type ‘‘pointer to type’’, the result has type ‘‘type’’. If an
invalid value has been assigned to the pointer, the behavior of the unary * operator is
undefined.
and for the invalid value part, (emphasis mine)
Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an
address inappropriately aligned for the type of object pointed to, and the address of an object after the
end of its lifetime.
In addition to the answers before, I think that what you were expecting could not be accomplished because of the way the float numbers are represented.
Integers are typically stored in two's complement, which basically means that the number is stored as one piece. Floats, on the other hand, are stored differently, using a sign, an exponent and a fraction.
So the conversion you had in mind is impossible: you take a number represented as raw bits (for positive values) and look at it as if it were encoded differently, which gives unexpected results even if the conversion were legitimate.
So... here's probably what's going on.
However the output of dereferencing pf(float pointer) is 0.000
It's not 0. It's just really tiny.
You have 4-byte integers. On your little-endian machine, x = 25 looks like this in memory (lowest address first)...
25 0 0 0
00011001 00000000 00000000 00000000
Read as a 32-bit value, that is 0x00000019, which interpreted as an IEEE 754 float looks like...
sign exponent fraction
0 00000000 0000000 00000000 00011001
An all-zero exponent marks a subnormal number, so the value is 25 * 2**-149.
So, you're outputting a float, but it's incredibly tiny. It's about 3.5e-44, which is virtually indistinguishable from 0.
If you try printing the float with printf("*pf: %e\n", *pf); then it should give you something meaningful, but small: 3.503246e-44
Also dereferencing pd(double pointer) outputs a negative fraction that keeps changing?
Doubles are 8-bytes, but you're only defining 4-bytes. The negative fraction change is the result of looking at uninitialized memory. The value of uninitialized memory is arbitrary and it's normal to see it change with every run.
There are two kinds of UBs going on here:
1) Strict aliasing
What is the strict aliasing rule?
"Strict aliasing is an assumption, made by the C (or C++) compiler, that dereferencing pointers to objects of different types will never refer to the same memory location (i.e. alias each other.)"
However, strict aliasing can be turned off with a compiler option, such as -fno-strict-aliasing in GCC. In that case your pf version would work, although the result is implementation-defined, assuming nothing else has gone wrong (float and int are both 32-bit types with 32-bit alignment on most computers). If your computer uses IEEE 754 single precision, you get a very small denormal floating-point number, which explains the result you observe.
Strict aliasing is a controversial feature of standard C (and considered a bug by a lot of people). It makes reinterpret casts (aka type punning) in C more difficult and more hacky than before.
Until you understand type punning well and know how it behaves with your compiler version and hardware, you should avoid it.
2) Out-of-bounds memory access
Your pointer points to a memory region as large as an int, but you dereference it as a double, which is usually twice the size of an int. You are basically reading half a double of garbage from somewhere in memory, which is why your double keeps changing.
The types int, float, and double have different memory layouts, representations, and interpretations.
On my machine, int is 4 bytes, float is 4 bytes, and double is 8 bytes.
Here is how you explain the results you are seeing.
Dereferencing the int pointer works, obviously, because the original data was an int.
Dereferencing the float pointer, the compiler generates code to interpret the contents of 4 bytes in memory as a float. The value in the 4 bytes, when interpreted as a float, is so small that it prints as 0.00. Look up how float is represented in memory.
Dereferencing the double pointer, the compiler generates code to interpret the contents in memory as a double. Because a double is larger than an int, this accesses the 4 bytes of the original int and an extra 4 bytes on the stack. Because the contents of these extra 4 bytes depend on the state of the stack and are unpredictable from run to run, you see varying values that correspond to interpreting the entire 8 bytes as a double.
In the following,
printf("x: n%d\n", x); //OK
printf("*p: %d\n", *(int *)p); //OK
printf("*pi: %d\n", *pi); //OK
printf("*pf: %f\n", *pf); // UB
printf("*pd: %f\n", *pd); // UB
The accesses in the first 3 printfs are fine, as you are accessing an int through an lvalue of type int. But the next 2 are not fine, as they violate 6.5p7, Expressions.
float and double are not types compatible with int, so accessing the int object through a float * or double * in the last two printf() calls causes undefined behaviour.
C11, §6.5, paragraph 7 states:
An object shall have its stored value accessed only by an lvalue
expression that has one of the following types:
— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the object,
— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
— a character type.
The term "C" is used to describe two languages: one invented by K&R in which pointers identify physical memory locations, and one derived from it which works the same in cases where pointers are read and written in ways that abide by certain rules, but may behave in arbitrary fashion if they are used in other ways. While the latter language is the one defined by the Standards, the former is what became popular for microcomputer programming in the 1980s.
One of the major impediments to generating efficient machine code from C code is that compilers can't tell what pointers might alias what variables. Thus, any time code accesses a pointer that might point to a given variable, generated code is required to ensure that the contents of the memory identified by the pointer and the contents of the variable match. That can be very expensive. The people writing the C89 Standard decided that compilers should be allowed to assume that named variables (static and automatic) will only be accessed using pointers of their own type or character types; the people writing C99 decided to add additional restrictions for allocated storage as well.
Some compilers offer means by which code can ensure that accesses using different types will go through memory (or at least behave as though they are doing so), but unfortunately I don't think there's any standard for that. C11 added a memory model for use with multi-threading which should be capable of achieving the required semantics, but I don't think compilers are required to honor such semantics in cases where they can tell that there's no way for outside threads to access something [even if going through memory would be necessary to achieve correct single-thread semantics].
If you're using gcc and want to have memory semantics that work as K&R intended, use the "-fno-strict-aliasing" command-line option. To make code efficient it will be necessary to make substantial use of the "restrict" qualifier which was added in C99. While the authors of gcc seem to have focused more on type-based aliasing rules than "restrict", the latter should allow more useful optimizations.

Pointer difference across members of a struct?

The C99 standard states that:
When two pointers are subtracted, both shall point to elements of the same array object, or one past the last element of the array object
Consider the following code:
struct test {
int x[5];
char something;
short y[5];
};
...
struct test s = { ... };
char *p = (char *) s.x;
char *q = (char *) s.y;
printf("%td\n", q - p);
This obviously breaks the above rule, since the p and q pointers are pointing to different "array objects", and, according to the rule, the q - p difference is undefined.
But in practice, why should such a thing ever result in undefined behaviour? After all, the struct members are laid out sequentially (just as array elements are), with any potential padding between the members. True, the amount of padding will vary across implementations and that would affect the outcome of the calculations, but why should that result be "undefined"?
My question is, can we suppose that the standard is just "ignorant" of this issue, or is there a good reason for not broadening this rule? Couldn't the above rule be rephrased to "both shall point to elements of the same array object or members of the same struct"?
My only suspicion are segmented memory architectures where the members might end up in different segments. Is that the case?
I also suspect that this is the reason why GCC defines its own __builtin_offsetof, in order to have a "standards compliant" definition of the offsetof macro.
EDIT:
As already pointed out, arithmetic on void pointers is not allowed by the standard. It is a GNU extension that fires a warning only when GCC is passed -std=c99 -pedantic. I'm replacing the void * pointers with char * pointers.
Subtraction and relational operators (on type char*) between addresses of members of the same struct are well defined.
Any object can be treated as an array of unsigned char.
Quoting N1570 6.2.6.1 paragraph 4:
Values stored in non-bit-field objects of any other object type
consist of n × CHAR_BIT bits, where n is the size of an object of that
type, in bytes. The value may be copied into an object of type
unsigned char [ n ] (e.g., by memcpy); the resulting set of bytes is
called the object representation of the value.
...
My only suspicion are segmented memory architectures where the members
might end up in different segments. Is that the case?
No. For a system with a segmented memory architecture, normally the compiler will impose a restriction that each object must fit into a single segment. Or it can permit objects that occupy multiple segments, but it still has to ensure that pointer arithmetic and comparisons work correctly.
Pointer subtraction requires that the two pointers point into the same object because it doesn't make sense otherwise.
The quoted section of the standard specifically refers to two unrelated objects such as int a[5]; and int b[5];. Pointer arithmetic requires knowing the type of the object that the pointers point to (I am sure you are aware of this already).
i.e.
int a[5];
int *p = &a[1]+1;
Here p is calculated by knowing that &a[1] refers to an int object, so the address is incremented by 4 bytes (assuming sizeof(int) is 4).
Coming to the struct example, I don't think it can possibly be defined in a way to make pointer arithmetic between struct members legal.
Let's take the example,
struct test {
int x[5];
char something;
short y[5];
};
Pointer arithmetic is not allowed on void pointers by the C standard (compiling with gcc -Wall -pedantic test.c would catch that). I think you are using gcc, which treats void* like char* and allows it.
So,
printf("%td\n", q - p);
is equivalent to
printf("%td", (char*)q - (char*)p);
as pointer arithmetic is well defined if the pointers point within the same object and are character pointers (char* or unsigned char*).
Using correct types, it would be:
struct test s = { ... };
int *p = s.x;
short *q = s.y;
printf("%td\n", q - p);
Now, how can q-p be performed? Based on sizeof(int) or sizeof(short)? How can the size of char something;, which sits in the middle of these two arrays, be accounted for?
That should explain why it's not possible to perform pointer arithmetic on objects of different types.
Even if all members are of the same type (thus no type issue as stated above), it's better to use the standard macro offsetof (from <stddef.h>) to get the difference between struct members, which has a similar effect to pointer arithmetic between members:
printf("%zu\n", offsetof(struct test, y) - offsetof(struct test, x));
So I see no necessity to define pointer arithmetic between struct members by the C standard.
Yes, you are allowed to perform pointer arithmetic on structure bytes:
N1570 - 6.3.2.3 Pointers p7:
... When a pointer to an object is converted to a pointer to a character type,
the result points to the lowest addressed byte of the object. Successive increments of the
result, up to the size of the object, yield pointers to the remaining bytes of the object.
This means that for the programmer, the bytes of the structure shall be seen as a continuous area, regardless of how it may have been implemented in the hardware.
Not with void* pointers though; that is a non-standard compiler extension. As the paragraph from the standard says, it applies only to character type pointers.
Edit:
As mafso pointed out in the comments, the above is only true as long as ptrdiff_t, the type of the subtraction result, has enough range for the result. Since the range of size_t can be larger than that of ptrdiff_t, the addresses may be too far apart if the structure is big enough.
Because of this it's preferable to use the offsetof macro on the structure members and calculate the result from those.
I believe the answer to this question is simpler than it appears. The OP asks:
but why should that result be "undefined"?
Well, let's look at the definition of undefined behavior in the draft C99 standard, section 3.4.3:
behavior, upon use of a nonportable or erroneous program construct or
of erroneous data, for which this International Standard imposes no
requirements
It is simply behavior for which the standard does not impose a requirement, which perfectly fits this situation: the results are going to vary depending on the architecture, and attempting to specify them would probably have been difficult if not impossible in a portable manner. This leaves the question: why would they choose undefined behavior as opposed to, let's say, implementation-defined or unspecified behavior?
Most likely it was made undefined behavior to limit the number of ways an invalid pointer could be created. This is consistent with the fact that we are provided with offsetof, which removes the one potential need for pointer subtraction of unrelated objects.
Although the standard does not really define the term invalid pointer, we get a good description in Rationale for International Standard—Programming Languages—C which in section 6.3.2.3 Pointers says (emphasis mine):
Implicit in the Standard is the notion of invalid pointers. In
discussing pointers, the Standard typically refers to “a pointer to an
object” or “a pointer to a function” or “a null pointer.” A special
case in address arithmetic allows for a pointer to just past the end
of an array. Any other pointer is invalid.
The C99 rationale further adds:
Regardless how an invalid pointer is created, any use of it yields
undefined behavior. Even assignment, comparison with a null pointer
constant, or comparison with itself, might on some systems result in
an exception.
This strongly suggests to us that a pointer to padding would be an invalid pointer, although it is difficult to prove that padding is not an object. The definition of object says:
region of data storage in the execution environment, the contents of
which can represent values
and notes:
When referenced, an object may be interpreted as having a particular
type; see 6.3.2.1.
I don't see how we can reason about the type or the value of padding between elements of a struct, and therefore padding is not an object, or at least this strongly indicates that padding is not meant to be considered an object.
I should point out the following:
from the C99 standard, section 6.7.2.1:
Within a structure object, the non-bit-field members and the units in which bit-fields
reside have addresses that increase in the order in which they are declared. A pointer to a
structure object, suitably converted, points to its initial member (or if that member is a
bit-field, then to the unit in which it resides), and vice versa. There may be unnamed
padding within a structure object, but not at its beginning.
It isn't so much that the result of pointer subtraction between members is undefined as that it is unreliable (i.e. the amount of padding, and therefore the result of the same arithmetic, is not guaranteed to be the same across different implementations).

Resources