Storing Pointers difference in integers?

Storing Pointers difference in integers? - c

This is the code:
#include<stdio.h>
#include<conio.h>
int main()
{
int *p1,*p2;
int m=2,n=3;
m=p2-p1;
printf("\np2=%u",p2);
printf("\np1=%u",p1);
printf("\nm=%d",m);
getch();
return 0;
}
This gives the output as:
p2= 2686792
p1= 1993645620
m= -497739707
I have two doubts with the code and output:
Since 'm' is an int, it shouldn't take p2-p1 as an input since p1 and p2 both are pointers and m is an integer it should give an error like "invalid conversion from 'int' to 'int' " but it isn't. why?
Even after it takes the input, the difference isn't valid. Why is it?

Since 'm' is an int, it shouldn't take p2-p1 as an input since p1 and
p2 both are pointers and m is an integer it should give an error like
"invalid conversion from 'int' to 'int' " but it isn't. why?
This type of error or warning is depends on the compiler you are using. C compilers often times give programmers plenty of rope to hang themselves with...
Even after it takes the input, the difference isn't valid. Why is it?
Actually, the difference is correct! It is using pointer arithmetic to perform the calculation. So for this example..
p2= 2686792
p1= 1993645620
Since the pointers are not initialized, they are assigned some garbage values like the ones above. Now, you want to perform the operation p2 - p1, i.e. you are asking for the memory address that comes exactly p1 memory blocks before p2. Since p1 and p2 are pointers to integers, the size of a memory block is sizeof(int) (almost always 4 bytes). Therefore:
p2 - p1 = (2686792 - 1993645620) / sizeof(int) = (2686792 - 1993645620) / 4 = -497739707

Since 'm' is an int, it shouldn't take p2-p1 as an input since p1 and p2 both are pointers and m is an integer it should give an error like "invalid conversion from 'int' to 'int' " but it isn't. why?
The C++ spec declares this to be legal.
From the C++11 spec 5.7.6:
When two pointers to elements of the same array object are subtracted, the result is the difference of the subscripts of the two array elements. The type of the result is an implementation-defined signed integral type; this type shall be the same type that is defined as std::ptrdiff_t in the header (18.2).
... later in that paragraph...
Unless both pointers point to elements of the same array object, or
one past the last element of the array object, the behavior is undefined.
The result of p2 - p1 is of the same type as std::ptrdiff_t, but nothing says the compiler cannot have defined it as
namspace std {
typedef ptrdiff_t int;
}
However, you also do not get a guarentee that this will work on all platforms. For example, some platforms (especially 64-bit ones) will use a long for ptrdiff_t. On those platforms your code would fail to compile because it depended on an implementation defined type for ptrdiff_t.
As for your second question, the wording in the C++ spec 5.7.6 suggests why they work the way they work. The wording indicates that the writers of the language wanted pointer arithmetic to support quick arithmetic while working ones way through an array. Accordingly, the defined the result of pointer subtraction to yield convenient results when used in the context of manipulating an array. You could build a language where the difference between two pointers is the difference between their memory addresses in bytes, and you would have a consistent working language. However, a good language always gets the best bang for their buck. The authors felt clean manipulation of arrays was more valuable than being able to get the byte-difference. For example:
double* findADouble(double* begin, double* end, double valueToSearchFor)
{
for (double* iter = begin; iter != end; iter++) {
if (*iter == valueToSearchFor)
return iter;
}
return 0;
}
Would have to have a sizeof in it, and we would have to use += instead of ++.
double* findADouble(double* begin, double* end, double valueToSearchFor)
{
for (double* iter = begin; iter != end; iter += sizeof(double)) {
if (*iter == valueToSearchFor)
return iter;
}
return 0;
}
Also worth noting in their decision: when the rule was created in C, optimizing compilers were not very good at their job. iter += sizeof(double) could be compiled down to a much less efficient assembly code than ++iter, even if the two operations are fundamentally doing the same thing. Modern optimizers have no trouble with this, but the syntax stays.

Related

Does casting actually DO anything?

Consider the following snippet:
char x[100];
double *p = &x;
As expected, this yields this warning:
f.c:3:15: warning: initialization of ‘double *’ from incompatible pointer type ‘char (*)[100]’
[-Wincompatible-pointer-types]
3 | double *p = &x;
| ^
This is very easy to solve by just changing to
double *p = (double*)&x;
My question here is, does the casting actually DO anything? Would the code be invalid without the cast? Or is it just a way to make the compiler quiet? When is casting necessary?
I know that you can have some effect with snippets like this:
int x = 666;
int y = (char)x;
But isn't this the same as this?
int x = 666;
char c = x;
int y = c;
If it is the same, then the casting does something, but it's not necessary. Right?
Please help me understand this.

Casting can do several different things. As other answers have mentioned, it almost always changes the type of the value being cast (or, perhaps, an attribute of the type, such as const). It may also change the numeric value in some way. But there are many possible interpretations:
Sometimes it merely silences a warning, performing no "real" conversion at all (as in many pointer casts).
Sometimes it silences a warning, leaving only a type change but no value change (as in other pointer casts).
Sometimes the type change, although it involves no obvious value change, implies very different semantics for use of the value later (again, as in many pointer casts).
Sometimes it requests a conversion which is meaningless or impossible.
Sometimes it performs a conversion that the compiler would have performed by itself (with or without a warning).
But sometimes it forces a conversion that the compiler wouldn't have performed.
Also, sometimes those warnings that the compiler tried to make, that a cast silences, are innocuous and/or a nuisance, but sometimes they're quite real, and the code is likely to fail (that is, as the silenced warning was trying to tell you).
For some more specific examples:
A pointer cast that changes the type, but not the value:
char *p1 = ... ;
const char *p2 = (const char *)p;
And another:
unsigned char *p3 = (unsigned char *)p;
A pointer cast that changes the type in a more significant way, but that's guaranteed to be okay (on some architectures this might also change the value):
int i;
int *ip = &i;
char *p = (char *)ip;
A similarly significant pointer cast, but one that's quite likely to be not okay:
char c;
char *cp = &c;
int *ip = (int *)cp;
*ip = 5; /* likely to fail */
A pointer cast that's so meaningless that the compiler refuses to perform it, even with an explicit cast:
float f = 3.14;
char *p = (char)f; /* guaranteed to fail */
A pointer cast that makes a conversion, but one that the compiler would have made anyway:
int *p = (int *)malloc(sizeof(int));
(This one is considered a bad idea, because in the case where you forget to include <stdlib.h> to declare malloc(), the cast can silence a warning that might alert you to the problem.)
Three casts from an integer to a pointer, that are actually well-defined, due to a very specific special case in the C language:
void *p1 = (void *)0;
char *p2 = (void *)0;
int *p3 = (int *)0;
Two casts from integer to pointer that are not necessarily valid, although the compiler will generally do something obvious, and the cast will silence the otherwise warning:
int i = 123;
char *p1 = (char *)i;
char *p2 = (char *)124;
*p1 = 5; /* very likely to fail, except when */
*p2 = 7; /* doing embedded or OS programming */
A very questionable cast from a pointer back to an int:
char *p = ... ;
int i = (int)p;
A less-questionable cast from a pointer back to an integer that ought to be big enough:
char *p = ... ;
uintptr_t i = (uintptr_t)p;
A cast that changes the type, but "throws away" rather than "converting" a value, and that silences a warning:
(void)5;
A cast that makes a numeric conversion, but one that the compiler would have made anyway:
float f = (float)0;
A cast that changes the type and the interpreted value, although it typically won't change the bit pattern:
short int si = -32760;
unsigned short us = (unsigned short)si;
A cast that makes a numeric conversion, but one that the compiler probably would have warned about:
int i = (int)1.5;
A cast that makes a conversion that the compiler would not have made:
double third = (double)1 / 3;
The bottom line is that casts definitely do things: some of them useful, some of them unnecessary but innocuous, some of them dangerous.
These days, the consensus among many C programmers is that most casts are or should be unnecessary, meaning that it's a decent rule to avoid explicit casts unless you're sure you know what you're doing, and it's reasonable to be suspicious of explicit casts you find in someone else's code, since they're likely to be a sign of trouble.
As one final example, this was the case that, back in the day, really made the light bulb go on for me with respect to pointer casts:
char *loc;
int val;
int size;
/* ... */
switch(size) {
case 1: *loc += val; break;
case 2: *(int16_t *)loc += val; break;
case 4: *(int32_t *)loc += val; break;
}
Those three instances of loc += val do three pretty different things: one updates a byte, one updates a 16-bit int, and one updates a 32-bit int. (The code in question was a dynamic linker, performing symbol relocation.)

The cast does at least 1 thing - it satisfies the following constraint on assignment:
6.5.16.1 Simple assignment
Constraints
1 One of the following shall hold:112)
...
— the left operand has atomic, qualified, or unqualified pointer type, and (considering
the type the left operand would have after lvalue conversion) both operands are
pointers to qualified or unqualified versions of compatible types, and the type pointed
to by the left has all the qualifiers of the type pointed to by the right;
112) The asymmetric appearance of these constraints with respect to type qualifiers is due to the conversion
(specified in 6.3.2.1) that changes lvalues to ‘‘the value of the expression’’ and thus removes any type
qualifiers that were applied to the type category of the expression (for example, it removes const but
not volatile from the type int volatile * const).
That's a compile-time constraint - it affects whether or not the source code is translated to an executable, but it doesn't necessarily affect the translated machine code.
It may result in an actual conversion being performed at runtime, but that depends on the types involved in the expression and the host system.

Casting changes the type, which can be very important when signed or unsigned type matters,
For example, character handling functions such as isupper() are defined as taking an unsigned char value or EOF:
The header <ctype.h> declares several functions useful for classifying and mapping characters. In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.
Thus code such as
int isNumber( const char *input )
{
while ( *input )
{
if ( !isdigit( *input ) )
{
return( 0 );
}
input++;
}
// all digits
return( 1 );
}
should properly cast the const char value of *input to unsigned char:
int isNumber( const char *input )
{
while ( *input )
{
if ( !isdigit( ( unsigned char ) *input ) )
{
return( 0 );
}
input++;
}
// all digits
return( 1 );
}
Without the cast to unsigned char, when *input is promoted to int, an char value (assuming char is signed and smaller than int) that is negative will be sign-extended to a negative value that can not be represented as an unsigned char value and therefore invoke undefined behavior.
So yes, the cast in this case does something. It changes the type and therefore - on almost all current systems - avoids undefined behavior for input char values that are negative.
There are also cases where float values can be cast to double (or the reverse) to force code to behave in a desired manner.*
* - I've seen such cases recently - if someone can find an example, feel free to add your own answer...

The cast may or may not change the actual binary value. But that is not its main purpose, just a side effect.
It tells the compiler to interpret a value as a value of a different type. Any changing of binary value is a side effect of that.
It is for you (the programmer) to let the compiler know: I know what I'm doing. So you can shoot yourself in the foot without the compiler questioning you.
Don't get me wrong, cast are absolutely necessary in real world code, but they must be used with care and knowledge. Never cast just to get rid of a warning, make sure you understand the consequences.

It is theoretically possible that a system would use a different representation for a void * and a char * than for some other pointer type. The possibility exists that there could be a system that normally uses a narrow width register to hold pointer values. But, the narrow width may be insufficient if the code needed to address every single byte, and so a void * or char * would use a wider representation.
One case where casting a pointer value is useful is when the function takes a variable number of pointer arguments, and is terminated by NULL, such as with the execl() family of functions.
execl("/bin/sh", "sh", "-c", "echo Hello world!", (char *)NULL);
Without the cast, the NULL may expand to 0, which would be treated as an int argument. When the execl() function retrieves the last parameter, it may extract the expected pointer value incorrectly, since an int value was passed.

My question here is, does the casting actually DO anything?
Yes. It tells the compiler, and also other programmers including the future you, that you think you know what you're doing and you really do intend to treat a char as an int or whatever. It may or may not change the compiled code.
Would the code be invalid without the cast?
That depends on the cast in question. One example that jumps to mind involves division:
int a = 3;
int b = 5;
float c = a / b;
Questions about this sort of thing come up all the time on SO: people wonder why c gets a value of 0. The answer, of course, is that both a and b are int, and the result of integer division is also an integer that's only converted to a float upon assignment to c. To get the expected value of 0.6 in c, cast a or b to float:
float c = a / (float)b;
You might not consider the code without the cast to be invalid, but the next computation might involve division by c, at which point a division by zero error could occur without the cast above.
Or is it just a way to make the compiler quiet?
Even if the cast is a no-op in terms of changing the compiled code, preventing the compiler from complaining about a type mismatch is doing something.
When is casting necessary?
It's necessary when it changes the object code that the compiler generates. It might also be necessary if your organization's coding standards require it.

An example where cast makes a difference.
int main(void)
{
unsigned long long x = 1 << 33,y = (unsigned long long)1 << 33;
printf("%llx, %llx\n", x, y);
}
https://godbolt.org/z/b3qcPn

A cast is simply a type conversion: The implementation will represent the argument value by means of the target type. The expression of the new type (let's assume the target type is different) may have
a different size
a different value
and/or a different bit pattern representing the value.
These three changes are orthogonal to each other. Any subset, including none and all of them, can occur (all examples assume two's bit complement):
None of them: (unsigned int)1;
Size only: (char)1
Value only: (unsigned int)-1
Bit pattern only: (float)1 (my machine has sizeof(int) == sizeof(float))
Size and value, but not bit pattern (the bits present in the original value): (unsigned int)(char)-4
Size and bit pattern, but not value: (float)1l
value and bit pattern, but not size: (float)1234567890) (32 bit ints and floats)
All of them: (float)1234567890l (long is 64 bits).
The new type may, of course, influence expressions in which it is used, and will often have different text representations (e.g. via printf), but that's not really surprising.
Pointer conversions may deserve a little more discussion: The new pointer typically has the same value, size and bit representation (although, as Eric Postpischli correctly pointed out, it theoretically may not). The main intent and effect of a pointer cast is not to change the pointer; it is to change the semantics of the memory accessed through it.
In some cases a cast is the only means to perform a conversion (non-compatible pointer types, pointer vs. integer).
In other cases like narrowing arithmetic conversions (which may lose information or produce an overflow) a cast indicates intent and thus silences warnings. In all cases where a conversion could also be implicitly performed the cast does not alter the program's behavior — it is optional decoration.

What treatment can a pointer undergo and still be valid?

Which of the following ways of treating and trying to recover a C pointer, is guaranteed to be valid?
1) Cast to void pointer and back
int f(int *a) {
void *b = a;
a = b;
return *a;
}
2) Cast to appropriately sized integer and back
int f(int *a) {
uintptr_t b = a;
a = (int *)b;
return *a;
}
3) A couple of trivial integer operations
int f(int *a) {
uintptr_t b = a;
b += 99;
b -= 99;
a = (int *)b;
return *a;
}
4) Integer operations nontrivial enough to obscure provenance, but which will nonetheless leave the value unchanged
int f(int *a) {
uintptr_t b = a;
char s[32];
// assume %lu is suitable
sprintf(s, "%lu", b);
b = strtoul(s);
a = (int *)b;
return *a;
}
5) More indirect integer operations that will leave the value unchanged
int f(int *a) {
uintptr_t b = a;
for (uintptr_t i = 0;; i++)
if (i == b) {
a = (int *)i;
return *a;
}
}
Obviously case 1 is valid, and case 2 surely must be also. On the other hand, I came across a post by Chris Lattner - which I unfortunately can't find now - saying something similar to case 5 is not valid, that the standard licenses the compiler to just compile it to an infinite loop. Yet each case looks like an unobjectionable extension of the previous one.
Where is the line drawn between a valid case and an invalid one?
Added based on discussion in comments: while I still can't find the post that inspired case 5, I don't remember what type of pointer was involved; in particular, it might have been a function pointer, which might be why that case demonstrated invalid code whereas my case 5 is valid code.
Second addition: okay, here's another source that says there is a problem, and this one I do have a link to. https://www.cl.cam.ac.uk/~pes20/cerberus/notes30.pdf - the discussion about pointer provenance - says, and backs up with evidence, that no, if the compiler loses track of where a pointer came from, it's undefined behavior.

According to the C11 draft standard:
Example 1
Valid, by §6.5.16.1, even without an explicit cast.
Example 2
The intptr_t and uintptr_t types are optional. Assigning a pointer to an integer requires an explicit cast (§6.5.16.1), although gcc and clang will only warn you if you don’t have one. With those caveats, the round-trip conversion is valid by §7.20.1.4. ETA: John Bellinger brings up that the behavior is only specified when you do an intermediate cast to void* both ways. However, both gcc and clang allow the direct conversion as a documented extension.
Example 3
Safe, but only because you’re using unsigned arithmetic, which cannot overflow, and are therefore guaranteed to get the same object representation back. An intptr_t could overflow! If you want to do pointer arithmetic safely, you can convert any kind of pointer to char* and then add or subtract offsets within the same structure or array. Remember, sizeof(char) is always 1. ETA: The standard guarantees that the two pointers compare equal, but your link to Chisnall et al. gives examples where compilers nevertheless assume the two pointers do not alias each other.
Example 4
Always, always, always check for buffer overflows whenever you read from and especially whenever you write to a buffer! If you can mathematically prove that overflow cannot happen by static analysis? Then write out the assumptions that justify that, explicitly, and assert() or static_assert() that they haven’t changed. Use snprintf(), not the deprecated, unsafe sprintf()! If you remember nothing else from this answer, remember that!
To be absolutely pedantic, the portable way to do this would be to use the format specifiers in <inttypes.h> and define the buffer length in terms of the maximum value of any pointer representation. In the real world, you would print pointers out with the %p format.
The answer to the question you intended to ask is yes, though: all that matters is that you get back the same object representation. Here’s a less contrived example:
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
int i = 1;
const uintptr_t u = (uintptr_t)(void*)&i;
uintptr_t v;
memcpy( &v, &u, sizeof(v) );
int* const p = (int*)(void*)v;
assert(p == &i);
*p = 2;
printf( "%d = %d.\n", i, *p );
return EXIT_SUCCESS;
}
All that matter are the bits in your object representation. This code also follows the strict aliasing rules in §6.5. It compiles and runs fine on the compilers that gave Chisnall et al trouble.
Example 5
This works, same as above.
One extremely pedantic footnote that will never ever be relevant to your coding: some obsolete esoteric hardware has one’s-complement or sign-and-magnitude representation of signed integers, and on these, there might be a distinct value of negative zero that might or might not trap. On some CPUs, this might be a valid pointer or null pointer representation distinct from positive zero. And on some CPUs, positive and negative zero might compare equal.
PS
The standard says:
Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.
Furthermore, if the two array objects are consecutive rows of the same multidimensional array, one past the end of the first row is a valid pointer to the start of the next row. Therefore, even a pathological implementation that deliberately sets out to cause as many bugs as the standard allows could only do so if your manipulated pointer compares equal to the address of an array object, in which case the implementation might in theory decide to interpret it as one-past-the-end of some other array object instead.
The intended behavior is clearly that the pointer comparing equal both to &array1+1 and to &array2 is equivalent to both: it means to let you compare it to addresses within array1 or dereference it to get array2[0]. However, the Standard does not actually say that.
PPS
The standards committee has addressed some of these issues and proposes that the C standard explicitly add language about pointer provenance. This would nail down whether a conforming implementation is allowed to assume that a pointer created by bit manipulation does not alias another pointer.
Specifically, the proposed corrigendum would introduce pointer provenance and allow pointers with different provenance not to compare equal. It would also introduce a -fno-provenance option, which would guarantee that any two pointers compare equal if and only if they have the same numeric address. (As discussed above, two object pointers that compare equal alias each other.)

1) Cast to void pointer and back
This yields a valid pointer equal to the original. Paragraph 6.3.2.3/1 of the standard is clear on this:
A pointer to void may be converted to or from a pointer to any object type. A pointer to any object type may be converted to a pointer to void and back again; the result shall compare equal to the original pointer.
2) Cast to appropriately sized integer and back
3) A couple of trivial integer operations
4) Integer operations nontrivial enough to obscure provenance, but which will nonetheless leave the value unchanged
5) More indirect integer operations that will leave the value unchanged
[...] Obviously case 1 is valid, and case 2 surely must be also. On the other hand, I came across a post by Chris Lattner - which I unfortunately can't find now - saying case 5 is not valid, that the standard licenses the compiler to just compile it to an infinite loop.
C does require a cast when converting either way between pointers and integers, and you have omitted some of those in your example code. In that sense your examples (2) - (5) are all non-conforming, but for the rest of this answer I'll pretend the needed casts are there.
Still, being very pedantic, all of these examples have implementation-defined behavior, so they are not strictly conforming. On the other hand, "implementation-defined" behavior is still defined behavior; whether that means your code is "valid" or not depends on what you mean by that term. In any event, what code the compiler might emit for any of the examples is a separate matter.
These are the relevant provisions of the standard from section 6.3.2.3 (emphasis added):
An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.
Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.
The definition of uintptr_t is also relevant to your particular example code. The standard describes it this way (C2011, 7.20.1.4/1; emphasis added):
an unsigned integer type with the property that any valid pointer to void can be converted to this type, then converted back to pointer to void, and the result will compare equal to the original pointer.
You are converting back and forth between int * and uintptr_t. int * is not void *, so 7.20.1.4/1 does not apply to these conversions, and the behavior is implementation-defined per section 6.3.2.3.
Suppose, however, that you convert back and forth via an intermediate void *:
uintptr_t b = (uintptr_t)(void *)a;
a = (int *)(void *)b;
On an implementation that provides uintptr_t (which is optional), that would make your examples (2 - 5) all strictly conforming. In that case, the result of the integer-to-pointer conversions depends only on the value of the uintptr_t object, not on how that value was obtained.
As for the claims you attribute to Chris Lattner, they are substantially incorrect. If you have represented them accurately, then perhaps they reflect a confusion between implementation-defined behavior and undefined behavior. If the code exhibited undefined behavior then the claim might hold some water, but that is not, in fact, the case.
Regardless of how its value was obtained, b has a definite value of type uintptr_t, and the loop must eventually increment i to that value, at which point the if block will run. In principle, the implementation-defined behavior of a conversion from uintptr_t directly to int * could be something crazy such as skipping the next statement (thus causing an infinite loop), but such behavior is entirely implausible. Every implementation you ever meet will either fail at that point, or store some value in variable a, and then, if it didn't crash, it will execute the return statement.

Because different application fields require the ability to manipulate pointers in different ways, and because the best implementations for some purposes may be totally unsuitable for some others, the C Standard treats the support (or lack thereof) for various kinds of manipulations as a Quality of Implementation issue. In general, people writing an implementation for a particular application field should be more familiar than the authors of the Standard with what features would be useful to programmers in that field, and people making a bona fide effort to produce quality implementations suitable for writing applications in that field will support such features whether the Standard requires them to or not.
In the pre-Standard language invented by Dennis Ritchie, all pointers of a particular type that were identified the same address were equivalent. If any sequence of operations on a pointer would end up producing another pointer of the same type that identified the same address, that pointer would--essentially by definition--be equivalent to the first. The C Standard specifies some situations, however, where pointers can identify the same location in storage and be indistinguishable from each other without being equivalent. For example, given:
int foo[2][4] = {0};
int *p = foo[0]+4, *q=foo[1];
both p and q will compare equal to each other, and to foo[0]+4, and to foo[1]. On the other hand, despite the fact that evaluation of p[-1] and q[0] would have defined behavior, evaluation of p[0] or q[-1] would invoke UB. Unfortunately, while the Standard makes clear that p and q would not be equivalent, it does nothing to clarify whether performing various sequences of operations on e.g. p will yield a pointer that is usable in all cases where p was usable, a pointer that's usable in all cases where either p or q would be usable, a pointer that's only usable in cases where q would be usable, or a pointer that's only usable in cases where both p and q would be usable.
Quality implementations intended for low-level programming should generally process pointer manipulations other than those involving restrict pointers in a way that would yield a pointer that would be usable in any case where a pointer that compares equal to it could usable. Unfortunately, the Standard provides no means by which a program can determine if it's being processed by a quality implementations that is suitable for low-level programming and refuse to run if it isn't, so most forms of systems programming must rely upon quality implementations processing certain actions in documented manner characteristic of the environment even when the Standard would impose no requirements.
Incidentally, even if the normal constructs for manipulating pointers wouldn't have any way of creating pointers where the principles of equivalence shouldn't apply, some platforms may define ways of creating "interesting" pointers. For example, if an implementation that would normally trap operations on null pointers were run on in an environment where it might sometimes be necessary to access an object at address zero, it might define a special syntax for creating a pointer that could be used to access any address, including zero, within the context where it was created. A "legitimate pointer to address zero" would likely compare equal to a null pointer (even though they're not equivalent), but performing a round-trip conversion to another type and back would likely convert what had been a legitimate pointer to address zero into a null pointer. If the Standard had mandated that a round-trip conversion of any pointer must yield one that's usable in the same way as the original, that would require that the compiler omit null traps on any pointers that could have been produced in such a fashion, even if they would far more likely have been produced by round-tripping a null pointer.
Incidentally, from a practical perspective, "modern" compilers, even in -fno-strict-aliasing, will sometimes attempt to track provenance of pointers through pointer-integer-pointer conversions in such a way that pointers produced by casting equal integers may sometimes be assumed incapable of aliasing.
For example, given:
#include <stdint.h>
extern int x[],y[];
int test(void)
{
if (!x[0]) return 999;
uintptr_t upx = (uintptr_t)x;
uintptr_t upy = (uintptr_t)(y+1);
//Consider code with and without the following line
if (upx == upy) upy = upx;
if ((upx ^ ~upy)+1) // Will return if upx != upy
return 123;
int *py = (int*)upy;
*py += 1;
return x[0];
}
In the absence of the marked line, gcc, icc, and clang will all assume--even when using -fno-strict-aliasing, that an operation on *py can't affect *px, even though the only way that code could be reached would be if upx and upy held the same value (meaning that both px and py were produced by casting the same uintptr_t value). Adding the marked line causes icc and clang to recognize that px and py could identify the same object, but gcc assumes that the assignment can be optimized out, even though it should mean that py would be derived from px--a situation a quality compiler should have no trouble recognizing as implying possible aliasing.
I'm not sure what realistic benefit compiler writers expect from their efforts to track provenance of uintptr_t values, given that I don't see much purpose for doing such conversions outside of cases where the results of conversions may be used in "interesting" ways. Given compiler behavior, however, I'm not sure I see any nice way to guarantee that conversions between integers and pointers will behave in a fashion consistent with the values involved.

void pointer = int pointer = float pointer

I have a void pointer pointing to a memory address. Then, I do
int pointer = the void pointer
float pointer = the void pointer
and then, dereference them go get the values.
{
int x = 25;
void *p = &x;
int *pi = p;
float *pf = p;
double *pd = p;
printf("x: n%d\n", x);
printf("*p: %d\n", *(int *)p);
printf("*pi: %d\n", *pi);
printf("*pf: %f\n", *pf);
printf("*pd: %f\n", *pd);
return 0;
}
The output of dereferencing pi(int pointer) is 25.
However the output of dereferencing pf(float pointer) is 0.000.
Also dereferncing pd(double pointer) outputs a negative fraction that keeps
changing?
Why is this and is it related to endianness(my CPU is little endian)?

As per C standard, you'er allowed to convert any pointer to void * and convert it back, it'll have the same effect.
To quote C11, chapter §6.3.2.3
[...] A pointer to
any object type may be converted to a pointer to void and back again; the result shall
compare equal to the original pointer.
That is why, when you cast the void pointer to int *, de-reference and print the result, it prints properly.
However, standard does not guarantee that you can dereference that pointer to be of a different data type. It is essentially invoking undefined behaviour.
So, dereferencing pf or pd to get a float or double is undefined behavior, as you're trying to read the memory allocated for an int as a float or double. There's a clear case of mismtach which leads to the UB.
To elaborate, int and float (and double) has different internal representations, so trying to cast a pointer to another type and then an attempt to dereference to get the value in other type won't work.
Related , C11, chapter §6.5.3.3
[...] If the operand has type ‘‘pointer to type’’, the result has type ‘‘type’’. If an
invalid value has been assigned to the pointer, the behavior of the unary * operator is
undefined.
and for the invalid value part, (emphasis mine)
Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an
address inappropriately aligned for the type of object pointed to, and the address of an object after the
end of its lifetime.

In addition to the answers before, I think that what you were expecting could not be accomplished because of the way the float numbers are represented.
Integers are typically stored in Two's complement way, basically it means that the number is stored as one piece. Floats on the other hand are stored using a different way using a sign, base and exponent, Read here.
So the main idea of convertion is impossible since you try to take a number represented as raw bits (for positive) and look at it as if it was encoded differently, this will result in unexpected results even if the convertion was legit.

So... here's probably what's going on.
However the output of dereferencing pf(float pointer) is 0.000
It's not 0. It's just really tiny.
You have 4-byte integers. Your integer looks like this in memory...
5 0 0 0
00000101 00000000 00000000 00000000
Which interpreted as a float looks like...
sign exponent fraction
0 00001010 0000000 00000000 00000000
+ 2**-117 * 1.0
So, you're outputting a float, but it's incredibly tiny. It's 2^-117, which is virtually indistinguishable from 0.
If you try printing the float with printf("*pf: %e\n", *pf); then it should give you something meaningful, but small. 7.006492e-45
Also dereferncing pd(double pointer) outputs a negative fraction that keeps changing?
Doubles are 8-bytes, but you're only defining 4-bytes. The negative fraction change is the result of looking at uninitialized memory. The value of uninitialized memory is arbitrary and it's normal to see it change with every run.

There are two kinds of UBs going on here:
1) Strict aliasing
What is the strict aliasing rule?
"Strict aliasing is an assumption, made by the C (or C++) compiler, that dereferencing pointers to objects of different types will never refer to the same memory location (i.e. alias each other.)"
However, strict aliasing can be turned off as a compiler extension, like -fno-strict-aliasing in GCC. In this case, your pf version would function well, although implementation defined, assuming nothing else has gone wrong (usually float and int are both 32 bit types and 32 bit aligned on most computers, usually). If your computer uses IEEE754 single, you can get a very small denorm floating point number, which explains for the result you observe.
Strict aliasing is a controversial feature of recent versions of C (and considered a bug by a lot of people) and makes it very difficult and more hacky than before to do reinterpret cast (aka type punning) in C.
Before you are very aware of type punning and how it behaves with your version of compiler and hardware, you shall avoid doing it.
2) Memory out of bound
Your pointer points to a memory space as large as int, but you dereference it as double, which is usually twice of the size of an int, you are basically reading half a double of garbage from somewhere in the computer, which is why your double keeps changing.

The types int, float, and double have different memory layouts, representations, and interpretations.
On my machine, int is 4 bytes, float is 4 bytes, and double is 8 bytes.
Here is how you explain the results you are seeing.
Derefrencing the int pointer works, obviously, because the original data was an int.
Derefrencing the float pointer, the compiler generates code to interpret the contents of 4 bytes in memory as a float. The value in the 4 bytes, when interpreted as a float, gives you 0.00. Lookup how float is represented in memory.
Derefrencing the double pointer, the compiler generates code to interpret the contents in memory as a double. Because a double is larger than an int, this accesses the 4 bytes of the original int, and an extra 4 bytes on the stack. Because the contents of these extra 4 bytes is dependent on the state of the stack, and is unpredictable from run to run, you see the varying values that correspond to interpreting the entire 8 bytes as a double.

In the following,
printf("x: n%d\n", x); //OK
printf("*p: %d\n", *(int *)p); //OK
printf("*pi: %d\n", *pi); //OK
printf("*pf: %f\n", *pf); // UB
printf("*pd: %f\n", *pd); // UB
The accesses in the first 3 printfs are fine as you are accessing int through the lvalue type of type int. But the next 2 are not fine as the violate 6.5, 7, Expressions.
An int * is not a compatible type with a float * or double *. So the accesses in the last two printf() calls cause undefined behaviour.
C11, $6.5, 7 states:
An object shall have its stored value accessed only by an lvalue
expression that has one of the following types:
— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the object,
— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
— a character type.

The term "C" is used to describe two languages: one invented by K&R in which pointers identify physical memory locations, and one which is derived from that which works the same in cases where pointers are either read and written in ways that abide by certain rules, but may behave in arbitrary fashion if they are used in other ways. While the latter language is defined the by the Standards, the former language is what became popular for microcomputer programming in the 1980s.
One of the major impediments to generating efficient machine code from C code is that compilers can't tell what pointers might alias what variables. Thus, any time code accesses a pointer that might point to a given variable, generated code is required to ensure that the contents of the memory identified by the pointer and the contents of the variable match. That can be very expensive. The people writing the C89 Standard decided that compilers should be allowed to assume that named variables (static and automatic) will only be accessed using pointers of their own type or character types; the people writing C99 decided to add additional restrictions for allocated storage as well.
Some compilers offer means by which code can ensure that accesses using different types will go through memory (or at least behave as though they are doing so), but unfortunately I don't think there's any standard for that. C14 added a memory model for use with multi-threading which should be capable of achieving required semantics, but I don't think compilers are required to honor such semantics in cases where they can tell that there's no way for outside threads to access something [even if going through memory would be necessary to achieve correct single-thread semantics].
If you're using gcc and want to have memory semantics that work as K&R intended, use the "-fno-strict-aliasing" command-line option. To make code efficient it will be necessary to make substantial use of the "restrict" qualifier which was added in C99. While the authors of gcc seem to have focused more on type-based aliasing rules than "restrict", the latter should allow more useful optimizations.

why to cast pointers?

I have a little basic question about C pointers and casting.
I didn't understand why should I cast the pointers?
I just tried this code and I got the same output for each option:
#include <stdio.h>
int main(void)
{
int x = 3;
int *y = &x;
double *s;
option 1: s = (double*)y;
option 1: printf("%p, %d\n", (int*)s, *((int*)s));
option 2: s = y;
option 2: printf("%p, %d", s, *s);
return 0;
}
My question is why do I have to do: s = (double*)y?
Intuitively, the address is the same for all the variables. The difference should be about how many bytes to read from this address. Also, about the printing - if I use %d it will take automatically as an int.
Why do I need to cast it as an int*?
Am I wrong?

There is a great deal of confusion in your question. int and double are different things: they represent numbers in memory using different bit patterns and in most cases even a different number of bytes. The compiler produces code that converts between these representations when you implicitly or explicitly require that:
int x = 3;
double d;
d = x; // implicit conversion from integer representation to double
d = (double)x; // explicit conversion generates the same code
Here is a more subtle example:
d = x / 2; // equivalent to d = (double)(x / 2);
will store the double representation of 1 into d. Whereas this:
d = (double)x / 2; // equivalent to d = (double)x / 2.0;
will store the double representation of 1.5 into d.
Casting pointers is a completely different thing. It is not a conversion, but merely a coertion. You are telling the compiler to trust you know what you are doing. It does not change the address they point to, not does it affect the bytes in memory at that address. C does not let you store a pointer to int into a pointer to double because it is most likely an error: When you dereference the pointer using the wrong type, the content of memory is interpreted the wrong way, potentially yielding a different value or even causing a crash.
You can override the compiler's refusal by using an explicit cast to (double*), but you are asking for trouble.
When you later dereference s, you may access invalid memory and the value read is definitely meaningless.
Further confusion involves the printf function: the format specifiers in the format string are promisses by the programmer about the actual value types he passes as extra parameters.
printf("%p,%d",s,*s)
Here you pass a double (with no meaningful value) and you tell printf you passed an int. A blatant case of multiple undefined behaviour. Anything can happen, including printf producing similar output as the other option, leading to complete confusion. On 64 bit Intel systems, the way doubles and ints are passed to printf is different and quite complicated.
To avoid this kind of mistake in printf, compile with -Wall. gcc and clang will complain about type mismatches between the format specifiers and the actual values passed to printf.
As others have commented, you should not use pointer casts in C. Even the typical case int *p = (int *)malloc(100 * sizeof(int)) does not require a cast in C.

Explicit pointer casting is necessary to avoid compiler warnings. And compiler warnings on pointer type conflict exist to show possible bugs. With pointer casting you tell the compiler: "I know what I am doing here..."

In C, whether the behavior of adding a number i to a pointer adds i*sizeof(datatype) is compiler dependent?

In the following pseudo code, when a number i is added to the pointer abc, the next array index is pointed.
My application is expected to work as compiler independent and also similar snippet may be used for different data types. So is this a compiler dependent behavior ?
disp( void* abc)
{
int i = 0;
int n = 8;
for (i=0; i<n; ++i)
{
printf("\n %d",*((int*)abc+i) );
}
}
int main()
{
int abc[]={1,2,3,4,5,6,7,8};
disp(abc);
return 0;
}

Nope, you are safe. It is given by the standard:
6.5.6 Additive operators:
"8. When an expression that has integer type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array
is large enough, the result points to an element offset from the
original element such that the difference of the subscripts of the
resulting and original array elements equals the integer expression.
[...]"
This basically means, that a pointer to a Type is interpreted as an array of Type, and additions and subtractions can be viewed as operations on the index of such an array.
Since you cast the void* to something meaningful it's ok, because the pointer gets reinterpreted.

No, it's defined by the C standard. It will work the same on all compilers that implement C.
It's the basis for all pointer arithmetic in C, which is very commonly used.

Assuming the code is well-formed (you're missing a return type but it's okay for a pseudocode)
void disp( void* abc )
the answer is: you're using pointers arithmetic which is a standard-compliant behavior implemented by (I believe) every C compiler, thus your code is not compiler dependent. There are some differences (mostly regarding issues where the standard doesn't dictate something specific or leaves it as implementation dependent or no diagnostic required) where compilers might differ, but if you follow the C standard you should be good to go.
A word of advice on platform dependent issues: if you did something like
// Assuming char = 1 byte, int = 4 bytes
printf("\n %d", *((char*)abc+4*i) );
that would assume an integer == 4 bytes. And that is not platform independent (you can't assume an integer is always 4 bytes on every machine on the planet).

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

Storing Pointers difference in integers? - c

Related

Does casting actually DO anything?

What treatment can a pointer undergo and still be valid?

void pointer = int pointer = float pointer

why to cast pointers?

In C, whether the behavior of adding a number i to a pointer adds i*sizeof(datatype) is compiler dependent?

Categories

Resources