Does casting actually DO anything? - c

Consider the following snippet:
char x[100];
double *p = &x;
As expected, this yields this warning:
f.c:3:15: warning: initialization of ‘double *’ from incompatible pointer type ‘char (*)[100]’
[-Wincompatible-pointer-types]
3 | double *p = &x;
| ^
This is very easy to solve by just changing to
double *p = (double*)&x;
My question here is, does the casting actually DO anything? Would the code be invalid without the cast? Or is it just a way to make the compiler quiet? When is casting necessary?
I know that you can have some effect with snippets like this:
int x = 666;
int y = (char)x;
But isn't this the same as this?
int x = 666;
char c = x;
int y = c;
If it is the same, then the casting does something, but it's not necessary. Right?
Please help me understand this.

Casting can do several different things. As other answers have mentioned, it almost always changes the type of the value being cast (or, perhaps, an attribute of the type, such as const). It may also change the numeric value in some way. But there are many possible interpretations:
Sometimes it merely silences a warning, performing no "real" conversion at all (as in many pointer casts).
Sometimes it silences a warning, leaving only a type change but no value change (as in other pointer casts).
Sometimes the type change, although it involves no obvious value change, implies very different semantics for use of the value later (again, as in many pointer casts).
Sometimes it requests a conversion which is meaningless or impossible.
Sometimes it performs a conversion that the compiler would have performed by itself (with or without a warning).
But sometimes it forces a conversion that the compiler wouldn't have performed.
Also, sometimes those warnings that the compiler tried to make, that a cast silences, are innocuous and/or a nuisance, but sometimes they're quite real, and the code is likely to fail (that is, as the silenced warning was trying to tell you).
For some more specific examples:
A pointer cast that changes the type, but not the value:
char *p1 = ... ;
const char *p2 = (const char *)p;
And another:
unsigned char *p3 = (unsigned char *)p;
A pointer cast that changes the type in a more significant way, but that's guaranteed to be okay (on some architectures this might also change the value):
int i;
int *ip = &i;
char *p = (char *)ip;
A similarly significant pointer cast, but one that's quite likely to be not okay:
char c;
char *cp = &c;
int *ip = (int *)cp;
*ip = 5; /* likely to fail */
A pointer cast that's so meaningless that the compiler refuses to perform it, even with an explicit cast:
float f = 3.14;
char *p = (char)f; /* guaranteed to fail */
A pointer cast that makes a conversion, but one that the compiler would have made anyway:
int *p = (int *)malloc(sizeof(int));
(This one is considered a bad idea, because in the case where you forget to include <stdlib.h> to declare malloc(), the cast can silence a warning that might alert you to the problem.)
Three casts from an integer to a pointer, that are actually well-defined, due to a very specific special case in the C language:
void *p1 = (void *)0;
char *p2 = (void *)0;
int *p3 = (int *)0;
Two casts from integer to pointer that are not necessarily valid, although the compiler will generally do something obvious, and the cast will silence the otherwise warning:
int i = 123;
char *p1 = (char *)i;
char *p2 = (char *)124;
*p1 = 5; /* very likely to fail, except when */
*p2 = 7; /* doing embedded or OS programming */
A very questionable cast from a pointer back to an int:
char *p = ... ;
int i = (int)p;
A less-questionable cast from a pointer back to an integer that ought to be big enough:
char *p = ... ;
uintptr_t i = (uintptr_t)p;
A cast that changes the type, but "throws away" rather than "converting" a value, and that silences a warning:
(void)5;
A cast that makes a numeric conversion, but one that the compiler would have made anyway:
float f = (float)0;
A cast that changes the type and the interpreted value, although it typically won't change the bit pattern:
short int si = -32760;
unsigned short us = (unsigned short)si;
A cast that makes a numeric conversion, but one that the compiler probably would have warned about:
int i = (int)1.5;
A cast that makes a conversion that the compiler would not have made:
double third = (double)1 / 3;
The bottom line is that casts definitely do things: some of them useful, some of them unnecessary but innocuous, some of them dangerous.
These days, the consensus among many C programmers is that most casts are or should be unnecessary, meaning that it's a decent rule to avoid explicit casts unless you're sure you know what you're doing, and it's reasonable to be suspicious of explicit casts you find in someone else's code, since they're likely to be a sign of trouble.
As one final example, this was the case that, back in the day, really made the light bulb go on for me with respect to pointer casts:
char *loc;
int val;
int size;
/* ... */
switch(size) {
case 1: *loc += val; break;
case 2: *(int16_t *)loc += val; break;
case 4: *(int32_t *)loc += val; break;
}
Those three instances of loc += val do three pretty different things: one updates a byte, one updates a 16-bit int, and one updates a 32-bit int. (The code in question was a dynamic linker, performing symbol relocation.)

The cast does at least 1 thing - it satisfies the following constraint on assignment:
6.5.16.1 Simple assignment
Constraints
1 One of the following shall hold:112)
...
— the left operand has atomic, qualified, or unqualified pointer type, and (considering
the type the left operand would have after lvalue conversion) both operands are
pointers to qualified or unqualified versions of compatible types, and the type pointed
to by the left has all the qualifiers of the type pointed to by the right;
112) The asymmetric appearance of these constraints with respect to type qualifiers is due to the conversion
(specified in 6.3.2.1) that changes lvalues to ‘‘the value of the expression’’ and thus removes any type
qualifiers that were applied to the type category of the expression (for example, it removes const but
not volatile from the type int volatile * const).
That's a compile-time constraint - it affects whether or not the source code is translated to an executable, but it doesn't necessarily affect the translated machine code.
It may result in an actual conversion being performed at runtime, but that depends on the types involved in the expression and the host system.

Casting changes the type, which can be very important when signed or unsigned type matters,
For example, character handling functions such as isupper() are defined as taking an unsigned char value or EOF:
The header <ctype.h> declares several functions useful for classifying and mapping characters. In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.
Thus code such as
int isNumber( const char *input )
{
while ( *input )
{
if ( !isdigit( *input ) )
{
return( 0 );
}
input++;
}
// all digits
return( 1 );
}
should properly cast the const char value of *input to unsigned char:
int isNumber( const char *input )
{
while ( *input )
{
if ( !isdigit( ( unsigned char ) *input ) )
{
return( 0 );
}
input++;
}
// all digits
return( 1 );
}
Without the cast to unsigned char, when *input is promoted to int, an char value (assuming char is signed and smaller than int) that is negative will be sign-extended to a negative value that can not be represented as an unsigned char value and therefore invoke undefined behavior.
So yes, the cast in this case does something. It changes the type and therefore - on almost all current systems - avoids undefined behavior for input char values that are negative.
There are also cases where float values can be cast to double (or the reverse) to force code to behave in a desired manner.*
* - I've seen such cases recently - if someone can find an example, feel free to add your own answer...

The cast may or may not change the actual binary value. But that is not its main purpose, just a side effect.
It tells the compiler to interpret a value as a value of a different type. Any changing of binary value is a side effect of that.
It is for you (the programmer) to let the compiler know: I know what I'm doing. So you can shoot yourself in the foot without the compiler questioning you.
Don't get me wrong, cast are absolutely necessary in real world code, but they must be used with care and knowledge. Never cast just to get rid of a warning, make sure you understand the consequences.

It is theoretically possible that a system would use a different representation for a void * and a char * than for some other pointer type. The possibility exists that there could be a system that normally uses a narrow width register to hold pointer values. But, the narrow width may be insufficient if the code needed to address every single byte, and so a void * or char * would use a wider representation.
One case where casting a pointer value is useful is when the function takes a variable number of pointer arguments, and is terminated by NULL, such as with the execl() family of functions.
execl("/bin/sh", "sh", "-c", "echo Hello world!", (char *)NULL);
Without the cast, the NULL may expand to 0, which would be treated as an int argument. When the execl() function retrieves the last parameter, it may extract the expected pointer value incorrectly, since an int value was passed.

My question here is, does the casting actually DO anything?
Yes. It tells the compiler, and also other programmers including the future you, that you think you know what you're doing and you really do intend to treat a char as an int or whatever. It may or may not change the compiled code.
Would the code be invalid without the cast?
That depends on the cast in question. One example that jumps to mind involves division:
int a = 3;
int b = 5;
float c = a / b;
Questions about this sort of thing come up all the time on SO: people wonder why c gets a value of 0. The answer, of course, is that both a and b are int, and the result of integer division is also an integer that's only converted to a float upon assignment to c. To get the expected value of 0.6 in c, cast a or b to float:
float c = a / (float)b;
You might not consider the code without the cast to be invalid, but the next computation might involve division by c, at which point a division by zero error could occur without the cast above.
Or is it just a way to make the compiler quiet?
Even if the cast is a no-op in terms of changing the compiled code, preventing the compiler from complaining about a type mismatch is doing something.
When is casting necessary?
It's necessary when it changes the object code that the compiler generates. It might also be necessary if your organization's coding standards require it.

An example where cast makes a difference.
int main(void)
{
unsigned long long x = 1 << 33,y = (unsigned long long)1 << 33;
printf("%llx, %llx\n", x, y);
}
https://godbolt.org/z/b3qcPn

A cast is simply a type conversion: The implementation will represent the argument value by means of the target type. The expression of the new type (let's assume the target type is different) may have
a different size
a different value
and/or a different bit pattern representing the value.
These three changes are orthogonal to each other. Any subset, including none and all of them, can occur (all examples assume two's bit complement):
None of them: (unsigned int)1;
Size only: (char)1
Value only: (unsigned int)-1
Bit pattern only: (float)1 (my machine has sizeof(int) == sizeof(float))
Size and value, but not bit pattern (the bits present in the original value): (unsigned int)(char)-4
Size and bit pattern, but not value: (float)1l
value and bit pattern, but not size: (float)1234567890) (32 bit ints and floats)
All of them: (float)1234567890l (long is 64 bits).
The new type may, of course, influence expressions in which it is used, and will often have different text representations (e.g. via printf), but that's not really surprising.
Pointer conversions may deserve a little more discussion: The new pointer typically has the same value, size and bit representation (although, as Eric Postpischli correctly pointed out, it theoretically may not). The main intent and effect of a pointer cast is not to change the pointer; it is to change the semantics of the memory accessed through it.
In some cases a cast is the only means to perform a conversion (non-compatible pointer types, pointer vs. integer).
In other cases like narrowing arithmetic conversions (which may lose information or produce an overflow) a cast indicates intent and thus silences warnings. In all cases where a conversion could also be implicitly performed the cast does not alter the program's behavior — it is optional decoration.

Related

Is conversion of a function pointer to a uintptr_t / intptr_t invalid?

Microsoft extensions to C and C++:
To perform the same cast and also maintain ANSI compatibility, you can cast the function pointer
to a uintptr_t before you cast it to a data pointer:
int ( * pfunc ) ();
int *pdata;
pdata = ( int * ) (uintptr_t) pfunc;
Rationale for C, Revision 5.10, April-2003:
Even with an explicit cast, it is invalid to convert a function pointer to an object pointer
or a pointer to void, or vice versa.
C11:
7.20.1.4 Integer types capable of holding object pointers
Does it mean that pdata = ( int * ) (uintptr_t) pfunc; in invalid?
As Steve Summit says:
The C standard is written to assume that pointers to different object types, and especially pointers to function as opposed to object types, might have different representations.
While pdata = ( int * ) pfunc; leads to UB, it seems that pdata = ( int * ) (uintptr_t) pfunc; leads to IB. This is because "Any pointer type may be converted to an integer type" and "An integer may be converted to any pointer type" and uintptr_t is integer type.
Given the definitions
int (*pfunc)();
int *pdata;
, the assignments
pdata = (int *)pfunc;
pdata = (int *)(uintptr_t)pfunc;
are, IMO, equivalent. On a platform where data pointers are of the same size as, or larger than, function pointers, both assignments will work as desired. But on a platform where data pointers are smaller than function pointers, both assignments will inevitably scrape off some of the bits of the function pointer, resulting in a data pointer which can not be converted back to the original function pointer later.
In particular, I believe that both assignments are equivalent despite the presence of the (uintptr_t) cast in the second one. I believe that cast accomplishes precisely nothing.
On a platform where data pointers are smaller than function pointers, and where type uintptr_t is of the same size as data pointers, in the assignment
pdata = (int *)(uintptr_t)pfunc;
, the cast to (uintptr_t) will scrape off some of the bits of pfunc's value.
On a platform where data pointers are smaller than function pointers, and where type uintptr_t is of the same size as function pointers, in the assignment
pdata = (int *)(uintptr_t)pfunc;
, the cast to (int *) will scrape off some of the bits of pfunc's value.
In both cases pdata will end up with only some fraction of pfunc's original value.
(Here I disregard the possibility of architectures with padding bits or the like. On some bizarre, hypothetical platform where function pointers are larger than data pointers, but the extra bits are always 0, both assignments would again work.)
(I've also disregarded the possibility that int * is a different size than void *. I'm not sure whether that would affect the answer, whether a "detour" via void * is more or less un- or necessary when attempting a conversion from int (*)() to int *.)
Casting to uintptr_t only works if this type is defined, which may not be the case on legacy systems using ancient compilers. Note however that uintptr_t must be large enough for any object pointer, especially char * or void *, but may be smaller than function pointers. Such architectures are rare today and Microsoft compilers probably no longer support them, but they were common place in the 16-bit world (MS/DOS, Windows 1, 2 and 3.x) where the medium model had 32-bit segmented code pointers and 16-bit data pointers.
Note also that the C Standard allows for int * and void * to have a different size and representation albeit no Microsoft compiler supports such exotic targets.
On current systems with modern compilers, all data pointers and code pointers have almost always the same size and representation. This is actually a requirement for POSIX compatibility, so the recommendation to use an intermediary cast to (uintptr_t) is valid and effective.
For complete portability, if the goal is to pass a function pointer via an opaque void *, you can always allocate an object of the proper function pointer type, initialize it with pfunc and pass its address:
// setting up the void *
int (*pfunc)();
void *pdata = malloc(sizeof pfunc);
memcpy(pdata, &pfunc, sizeof pfunc);
// using the void *
int (**ppfunc)() = pdata;
(*ppfunc)(); // equivalent to (**ppfunc)();
Is conversion of a function pointer to a uintptr_t / intptr_t invalid?
No. It may be valid. It may be undefined behavior.
Conversion of a function pointer to ìnt* is not defined. Nor to any object pointer. Nor to void *.
pdata = ( int * ) pfunc; is undefined behavior.
Conversion of a function pointer to an integer type is allowed, with restrictions:
Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type. C17dr 6.3.2.3 6
Also integer to a pointer type is allowed.
An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation. C17dr 6.3.2.3 6
void * to integer to void * is defined. Object pointer to/from void* is defined. Then the optional (u)intptr_t types are sufficient for round-trip success. Yet we are concerned about a function pointer. Often enough function pointers are wider than an int *.
Thus converting a function pointer to int * only makes sense through an integer type, wider the better.
VS may recommend through the optional type uintptr_t and is likely sufficient if information is lossless on other platforms. Yet uintmax_t may afford less loss of information, especially in the function pointer to integer step, so I pedantically suggest:
pdata = ( int * ) (uintmax_t) pfunc;
Regardless of the steps taken, code is likely to become implementation specific and deserves guards.
#ifdef this && that
pdata = ( int * ) (uintmax_t) pfunc;
#else
#error TBD code
#endif
Migrating the solution from the question to an answer:
Here is the answer from Microsoft:
Q: How exactly "cast the function pointer to a uintptr_t before you cast it to a data pointer" leads to maintaining "ANSI compatibility"?
A: Without the cast to uintptr_t it’s possible that the code will fail to compile with other compilers, even if they use the same pointer model. For example: https://gcc.godbolt.org/z/9EjTe1s4x - if you add the uintptr_t it compiles without warnings/errors.

Is comparing two pointers with < undefined behavior if they are both cast to an integer type?

Let's say I have this code that copies one block of memory to another in a certain order based on their location:
void *my_memmove(void *dest, const void *src, size_t len)
{
const unsigned char *s = (const unsigned char *)src;
unsigned char *d = (unsigned char *)dest;
if(dest < src)
{
/* copy s to d forwards */
}
else
{
/* copy s to d backwards */
}
return dest;
}
This is undefined behavior if src and dest do not point to members of the same array(6.8.5p5).
However, let's say I cast these two pointers to uintptr_t types:
#include <stdint.h>
void *my_memmove(void *dest, const void *src, size_t len)
{
const unsigned char *s = (const unsigned char *)src;
unsigned char *d = (unsigned char *)dest;
if((uintptr_t)dest < (uintptr_t)src)
{
/* copy s to d forwards */
}
else
{
/* copy s to d backwards */
}
return dest;
}
Is this still undefined behavior if they're not members of the same array? If it is, what are some ways that I could compare these two locations in memory legally?
I've seen this question, but it only deals with equality, not the other comparison operators (<, >, etc).
The conversion is legal but there is, technically, no meaning defined for the result. If instead you convert the pointer to void * and then convert to uintptr_t, there is slight meaning defined: Performing the reverse operations will reproduce the original pointer (or something equivalent).
It particular, you cannot rely on the fact that one integer is less than another to mean it is earlier in memory or has a lower address.
The specification for uintptr_t (C 2018 7.20.1.4 1) says it has the property that any valid void * can be converted to uintptr_t, then converted back to void *, and the result will compare equal to the original pointer.
However, when you convert an unsigned char * to uintptr_t, you are not converting a void * to uintptr_t. So 7.20.1.4 does not apply. All we have is the general definition of pointer conversions in 6.3.2.3, in which paragraphs 5 and 6 say:
An integer may be converted to any pointer type. Except as previously specified [involving zero for null pointers], the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.
Any pointer type may be converted to an integer type. Except as previously specified [null pointers again], the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.
So these paragraphs are no help except they tell you that the implementation documentation should tell you whether the conversions are useful. Undoubtedly they are in most C implementations.
In your example, you actually start with a void * from a parameter and convert it to unsigned char * and then to uintptr_t. So the remedy there is simple: Convert to uintptr_t directly from the void *.
For situations where we have some other pointer type, not void *, then 6.3.2.3 1 is useful:
A pointer to void may be converted to or from a pointer to any object type. A pointer to any object type may be converted to a pointer to void and back again; the result shall compare equal to the original pointer.
So, converting to and from void * is defined to preserve the original pointer, so we can combine it with a conversion from void * to uintptr_t:
(uintptr_t) (void *) A < (uintptr_t) (void *) B
Since (void *) A must be able to produce the original A upon conversion back, and (uintptr_t) (void *) A must be able to produce its (void *) A, then (uintptr_t) (void *) A and (uintptr_t) (void *) B must be different if A and B are different.
And that is all we can say from the C standard about the comparison. Converting from pointers to integers might produce the address bits out of order or some other oddities. For example, they might produce a 32-bit integer contain a 16-bit segment address and a 16-bit offset. Some of those integers might have higher values for lower addresses while others have lower values for lower addresses. Worse, the same address might have two representations, so the comparison might indicate “less than” even though A and B refer to the same object.
No. Each results in an implementation-defined value, and comparison of integers is always well-defined (as long as their values are not indeterminate). Since the values are implementation-defined, the result of the comparison need not be particularly meaningful in regard to the pointers; however, it must be consistent with the properties of integers and the values that the implementation-defined conversions produced. Moreover, the C standard expresses an intent that conversions of pointers to integers should respect the address model of the implementation, making them somewhat meaningful if this is followed. See footnote 67 under 6.3.2.3 Pointers:
The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.
However, some current compilers wrongly treat this as undefined behavior, at least under certain conditions, and there is a movement from compiler folks to sloppily formalize that choice via a notion of "provenance", which is gratuitously internally inconsistent and a disaster in the making (it could be made internally consistent and mostly non-problematic with trivial changes that are cost-free to code where it matters, but the people who believe in this stuff are fighting that for Reasons(TM)).
I'm not up-to-date on the latest developments in the matter, but you can search for "pointer provenance" and find the draft documents.
Comparing two pointers converted to uintptr_t should not have undefined behaviour at all. It does not even should have unspecified behaviour. Note that you should first cast the values to void * to ensure the same presentation, before casting to uintptr_t. However, compilers have had behaviour where two pointers were deemed to be unequal even though they pointed to the same address, and likewise, these pointers cast to uintptr_t compared unequal to each other (GCC 4.7.1 - 4.8.0). The latter is however not allowed by the standard. However there is *ongoing debate on the extent of pointer provenance tracking and this is part of it.
The intent of the standard according to C11 footnote 67 is that this is "to be consistent with the addressing structure of the execution environment". The conversion from pointer to integer is implementation-defined and you must check the implementation for the meaning of the cast. For example for GCC, it is defined as follows:
The result of converting a pointer to an integer or vice versa (C90
6.3.4, C99 and C11 6.3.2.3).
A cast from pointer to integer discards most-significant bits if the
pointer representation is larger than the integer type, sign-extends 2)
if the pointer representation is smaller than the integer type,
otherwise the bits are unchanged.
A cast from integer to pointer discards most-significant bits if the
pointer representation is smaller than the integer type, extends
according to the signedness of the integer type if the pointer
representation is larger than the integer type, otherwise the bits are
unchanged.
When casting from pointer to integer and back again, the resulting
pointer must reference the same object as the original pointer,
otherwise the behavior is undefined. That is, one may not use integer
arithmetic to avoid the undefined behavior of pointer arithmetic as
proscribed in C99 and C11 6.5.6/8.
For example on x86-32, x86-64 and GCC we can be assured that the behaviour of a pointer converted to uintptr_t is that the linear offset is converted as-is.
The last clause refers to pointer provenance, i.e. the compiler can track the identity of pointer stored in an (u)intptr_t, just like it can track the identity of a pointer in any other variable. This is totally allowed by C standard as it states just that you are ever guaranteed to be able to cast a pointer to void to (u)intptr_t and back again.
I.e.
char foo[4] = "abc";
char bar[4] = "def";
if (foo + 4 == bar) {
printf("%c\n", foo[4]); // undefined behaviour
}
and given that foo + 4 compares equal to bar (allowed by the C standard), you cannot dereference foo[4] because it does not alias bar[0]. Likewise even if foo + 4 == bar you cannot do
uintptr_t foo_as_int = (uintptr_t)(void *)foo;
if (foo_as_int + 4 == (uintptrt_t)(void *)bar) {
char *bar_alias = (void *)(foo_as_int + 4);
printf("%c\n", bar_alias[0]); // undefined behaviour
}
There is no guarantee that the numeric value produced by converting a pointer to uintptr_t have any meaningful relationship to the pointer in question. A conforming implementation with enough storage could make the first pointer-to-integer conversion yield 1, the second one 2, etc. if it kept a list of all the pointers that were converted.
Practical implementations, of course, almost always perform pointer-to-uintptr_t conversions in representation-preserving fashion, but because the authors of the Standard didn't think it necessary to officially recognize a category of programs that would be portable among commonplace implementations for commonplace platforms, some people regard any such code as "non-portable" and "broken". This completely contradicts the intention of the Standard's authors, who made it clear that they did not wish to demean programs that were merely conforming but not strictly conforming, but it is unfortunately the prevailing attitude among some compiler maintainers who don't need to satisfy customers in order to get paid.
No, it's only implementation-defined behavior. However, if you use == to make sure the objects overlap before comparing them with < or >, then it is neither implementation-defined behavior or undefined behavior. This is how you would implement such a solution:
#include <string.h>
void *my_memmove(void *dest, const void *src, size_t len)
{
const unsigned char *s = src;
unsigned char *d = dest;
size_t l;
if(dest == src)
goto end;
/* Check for overlap */
for( l = 0; l < len; l++ )
{
if( s + l == d || s + l == d + len - 1 )
{
/* The two objects overlap, so we're allowed to
use comparison operators. */
if(s > d)
{
/* copy forwards */
break;
}
else /* (s < d) */
{
/* copy backwards */
s += len;
d += len;
while(len--)
{
*--d = *--s;
}
goto end;
}
}
}
/* They don't overlap or the source is after
the destination, so copy forwards */
while(len--)
{
*s++ = *d++;
}
end:
return dest;
}

Why typecasting between char *ptr to int and int *ptr to char works completely fine? when they are designed to point to specific datatyped variables

I have just started learning pointers. I have some questions regarding pointers typecasting. Consider below program.
int main(){
int a = 0xff01;
char *s = &a;
char *t = (int *) &a;
printf("%x",*(int *)s);
printf(" %x",*(int *)t);
return 0;
}
The statement char *s = &a gives
warning: incompatible pointer conversion type.
But noticed that the two printf() statements works fine they give me the right output. The question is
char *t , char *s both are pointers to character type.
Why does 'C' compilers lets me to assign integer variable to char *p ? why dont they raise an error and restrict the programmer?
We have int *ptr to point to integer variables, then why do they still allow programmer to make char *ptr point to integer variable?
// Another sample code
char s = 0x02;
int *ptr = (char *)&s;
printf("%x",*(char *)ptr); // prints the right output
Why does an int *ptr made point to character type? it works. why compiler dont restrict me?
I really think this leads me to confusion. If the pointer types are interchangeable with a typecast then what is the point to have two different pointers char *ptr , int *ptr ?
when we could retrieve values using (int *) or (char *).
All pointers are of same size 4bytes(on 32bite machine). Then one could use void pointer.
Yes people told me, void pointers always needs typecasting when retrieving values from memory. When we know the type of variable we go for that specific pointer which eliminates the use of casting.
int a = 0x04;
int *ptr = &a;
void *p = &a;
printf("%x",*ptr); // does not require typecasting.
printf("%x",*(int *)p); // requires typecasting.
Yes, I have read that back in old days char *ptrs played role of void pointers. Is this one good reason? why still compilers support typecasting between pointers? Any help is greatly appreciated.
Compiling with GCC 4.9.1 on Mac OS X 10.9.5, using this mildly modified version of your code (different definition of main() so it compiles with my preferred compiler options, and included <stdio.h> which I assume was omitted for brevity in the question — nothing critical) in file ptr.c:
#include <stdio.h>
int main(void)
{
int a = 0xff01;
char *s = &a;
char *t = (int *) &a;
printf("%x",*(int *)s);
printf(" %x",*(int *)t);
return 0;
}
I get the compilation errors:
$ gcc -O3 -g -std=c11 -Wall -Wextra -Wmissing-prototypes -Wstrict-prototypes \
-Wold-style-definition -Werror ptr.c -o ptr
ptr.c: In function ‘main’:
ptr.c:6:15: error: initialization from incompatible pointer type [-Werror]
char *s = &a;
^
ptr.c:7:14: error: initialization from incompatible pointer type [-Werror]
char *t = (int *) &a;
^
cc1: all warnings being treated as errors
$
So, both assignments are the source of a warning; my compilation options turn that into an error.
All pointers other than void * are typed; they point to an object of a particular type. Void pointers don't point to any type of object and must be cast to a type before they can be dereferenced.
In particular, char * and int * point to different types of data, and even when they hold the same address, they are not the same pointer. Under normal circumstances (most systems, most compilers — but there are probably exceptions if you work hard enough, but you're unlikely to be running into one of them)…as I was saying, under normal circumstances, the types char * and int * are not compatible because they point to different types.
Given:
int data = 0xFF01;
int *ip = &data;
char *cp = (char *)&data;
the code would compile without complaint. The int data line is clearly unexceptional (unless you happen to have 16-bit int types — but I will assume 32-bit systems). The int *ip line assigns the address of data to ip; that is assigning a pointer to int to a pointer to int, so no cast is necessary.
The char *cp line forces the compiler's hand to treat the address of data as a char pointer. On most modern systems, the value in cp is the same as the value in ip. On the system I learned C on (ICL Perq), the value of a char * address to a memory location was different from the 'anything else pointer' address to the same memory location. The machine was word-oriented, and byte-aligned addresses had extra bits set in the high end of the address. (This was in the days when the expansion of memory from 1 MiB to 2 MiB made a vast improvement because 750 KiB were used by the O/S, so we actually got about 5 times as much memory after as before for programs to use! Gigabytes and gibibytes were both fantasies, whether for disk or memory.)
Your code is:
int a = 0xff01;
char *s = &a;
char *t = (int *) &a;
Both the assignments have an int * on the RHS. The cast in the second line is superfluous; the type of &a is int * both before and after the cast. Assigning an int * to a char * is a warnable offense — hence the compiler warned. The types are different. Had you written char *t = (char *)&a;, then you would have gotten no warning from the compiler.
The printing code works because you take the char * values that were assigned to s and t and convert them back to the original int * before dereferencing them. This will usually work; the standard guarantees it for conversions to void * (instead of char *), but in practice it will normally work for anything * where anything is an object type, not a function type. (You are not guaranteed to be able to convert function pointers to data pointers and back again.)
The statement char *s = &a gives
warning: incompatible pointer conversion type.
In this case, the warning indicates a constraint violation: A compiler must complain and may refuse to compile. For initialization (btw, a declaration is not a statement), the same conversion rules as for assignment apply, and there is no implicit conversion from int * to char * (or the other way round). That is, an explicit cast is required:
char *s = (char *)&a;
Why do C compilers let me assign an integer variable to char *p? Why don’t they raise an error and restrict the programmer?
Well, you’ve got a warning. At the very least, a warning means you must understand why it is there before you ignore it. And as said above, in this case a compiler may refuse to compile.*)
We have int *ptr to point to integer variables, then why do they still allow programmer to make char *ptr point to integer variable?
Pointers to a character type are special, they are allowed to point to objects of every type. That you’re allowed to do so, doesn’t mean it’s a good idea, the cast is required to keep you from doing such a conversion accidently. For pointer-to-pointer conversions in general, see below.
int *ptr = (char *)&s;
Here, ptr is of type int *, and is initialized with a value of type char *. This is, again, a constraint violation.
printf("%x",*(char *)ptr); // prints the right output
If a conversion from a pointer to another is valid, the conversion back also is and always yields the original value.
If the pointer types are interchangeable with a typecast then what is the point to have two different pointers char *ptr, int *ptr?
Types exist to save you from errors. Casts exist to give you a way to tell the compiler that you know what you’re doing.
All pointers are of same size 4bytes(on 32bite machine). Then one could use void pointer.
That’s true for many architectures, but quite not for all the C standard addresses. Having only void pointers would be pretty useless, as you cannot really do anything with them: no arithmetic, no dereferencing.
Yes, I have read that back in old days char *ptrs played role of void pointers. Is this one good reason?
Perhaps a reason. (If a good one, is another question…)
When pointer-to-pointer conversions are allowed:
C11 (N1570) 6.3.2.3 p7
A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned×) for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer. When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.
×) In general, the concept “correctly aligned” is transitive: if a pointer to type A is correctly aligned for a pointer to type B, which in turn is correctly aligned for a pointer to type C, then a pointer to type A is correctly aligned for a pointer to type C.
Pointers to character types and pointers to void, as mentioned above, are always correctly aligned (and so are int8_t and uint8_t if they exist). There are platforms, on which a conversion from an arbitrary char pointer to an int pointer may violate alignment restrictions and cause a crash if executed.
If a converted pointer satisfies alignment requirements, this does not imply that it’s allowed to dereference that pointer, the only guarantee is that it’s allowed to convert it back to what it originally pointed to. For more information, look for strict-aliasing; in short, this means you’re not allowed to access an object with an expression of the wrong type, the only exception being the character types.
*) I don’t know the reasons in your particular case, but as an example, where it’s useful to give implementations such latitude in how to treat ill-formed programmes, see for example object-pointer-to-function-pointer conversions: They are a constraint violation (so they require a diagnostic message from the compiler) but are valid on POSIX systems (which guarantess well-defined semantics for such conversions). If the C standard required a conforming implementation to abort compilation, POSIX had to contradict ISO C (cf. POSIX dlsym for an example why these conversions can be useful), which it explicitly doesn’t intend to.
Pointers are not having any types, types described with pointer in program actually means that to which kind of data pointer is pointing. Pointers will be of same size.
When you write,
char *ptr;
it means that is is pointer to character type data and when dereferenced, it will fetch one bytes data from memory
Similarly,
double *ptr;
is pointer to double type data. So when you dereference, they will fetch 8 bytes starting from the location pointed by pointer.
Now remember that all the pointer are of 4 bytes on 32 bit machines irrespective of the type of data to which it is pointing. So if you store integer variable's address to a pointer which is pointing to character, it is completely legal and if you dereference it, it will get only one byte from memory. That is lowest byte of integer on little endian pc and highest byte of integer on big endian pc.
Now you are type casting your pointer to int type explicitly. So while dereferencing it will get while integer and print it. There is nothing wrong with this and this is how pointers work in c.
In your second example you are doing the same. Assigning address of character type variable to pointer which is pointing to integer. Again you are type casting pointer to character type so by dereference it will get only one byte from that location which is your character.
And frankly speaking, i dont know any practical usage of void pointer but as far as i know, void pointers are used when many type of data is to be dereferenced using a single pointer.
Consider that you want to store an integer variable's address to pointer. So you will declare pointer of integer type. Now later in program there is a need to store a double variable's address to pointer. So instead of declaring a new pointer you store its address in int type pointer then if you dereference using it, there will be a big problem and result in logical error which may get unnoticed by you if you have forgot to type cast it to double type. This is not case with void pointer. If you use void pointer, you have to compulsarily type cast it to particular type inorder to fetch data from memory. Otherwise compiler will show error. So in such cases using void pointer reminds you that you have to type cast it to proper type every time otherwise compiler will show you error. But no error will be shown in previous case

Is the strict aliasing rule really a "two-way street"?

In these comments user #Deduplicator insists that the strict aliasing rule permits access through an incompatible type if either of the aliased or the aliasing pointer is a pointer-to-character type (qualified or unqualified, signed or unsigned char *). So, his assertion is basically that both
long long foo;
char *p = (char *)&foo;
*p; // just in order to dereference 'p'
and
char foo[sizeof(long long)];
long long *p = (long long *)&foo[0];
*p; // just in order to dereference 'p'
are conforming and have defined behavior.
In my read, however, it is only the first form that is valid, that is, when the aliasing pointer is a pointer-to-char; however, one can't do that in the other direction, i. e. when the aliasing pointer points to an incompatible type (other than a character type), the aliased pointer being a char *.
So, the second snippet above would have undefined behavior.
What's the case? Is this correct? For the record, I have already read this question and answer, and there the accepted answer explicitly states that
The rules allow an exception for char *. It's always assumed that char * aliases other types. However this won't work the other way, there's no assumption that your struct aliases a buffer of chars.
(emphasis mine)
You are correct to say that this is not valid. As you yourself have quoted (so I shall not re-quote here) the guaranteed valid cast is only from any other type to char*.
The other form is indeed against standard and causes undefined behaviour. However as a little bonus let us discuss a little behind this standard.
Chars, on every significant architecture is the only type that allows completely unaligned access, this is due to the read byte instructions having to work on any byte, otherwise they would be all but useless. This means that an indirect read to a char will always be valid on every CPU I know of.
However the other way around this will not apply, you cannot read a uint64_t unless the pointer is aligned to 8 bytes on most arches.
However, there is a very common compiler extension allowing you to cast properly aligned pointers from char to other types and access them, however this is non-standard. Also note, if you cast a pointer to any type to a pointer to char and then cast it back the resultant pointer is guaranteed to be equal to the original object. Therefore this is ok:
struct x *mystruct = MakeAMyStruct();
char * foo = (char *)mystruct;
struct x *mystruct2 = (struct mystruct *)foo;
And mystruct2 will equal mystruct. This also guarantees the struct is properly aligned for it's needs.
So basically, if you want a pointer to char and a pointer to another type, always declare the pointer to the other type then cast to char. Or even better use a union, that is what they are basically for...
Note, there is a notable exception to the rule however. Some old implementations of malloc used to return a char*. This pointer is always guaranteed to be castable to any type successfully without breaking aliasing rules.
Deduplicator is correct. The undefined behaviour that allows compilers to implement "strict aliasing" optimizations doesn't apply when character values are being used to produce a representation of an object.
Certain object representations need not represent a value of the object type. If the
stored value of an object has such a representation and is read by an lvalue expression
that does not have character type, the behavior is undefined. If such a representation
is produced by a side effect that modifies all or any part of the object by an lvalue
expression that does not have character type, the behavior is undefined. Such a
representation is called a trap representation.
However your second example has undefined behaviour because foo is uninitialized. If you initialize foo then it only has implementation defined behaviour. It depends on the implementation defined alignment requirements of long long and whether long long has any implementation defined pad bits.
Consider if you change your second example to this:
long long bar() {
char *foo = malloc(sizeof(long long));
char c;
for(c = 0; c < sizeof(long long); c++)
foo[c] = c;
long long *p = (long long *) p;
return *p;
}
Now alignment is no longer issue and this example is only dependent of the implementation defined representation of long long. What value is returned depends on the representation of long long but if that representation is defined as having no pad bits them this function must always return the same value and it must also always be a valid value. Without pad bits this function can't generate a trap representation, and so the compiler cannot perform any strict aliasing type optimizations on it.
You have to look pretty hard to find a standard conforming implementation of C that has implementation defined pad bits in any of its integer types. I doubt you'll find one that implements any sort of strict aliasing type of optimization. In other words, compilers don't use the undefined behaviour caused by accessing a trap representation to allow strict-aliasing optimizations because no compiler that implements strict-aliasing optimizations has defined any trap representations.
Note also that had buf been initialized with all zeros ('\0' characters) then this function wouldn't have any undefined or implementation defined behaviour. An all-bits-zero representation of a integer type is guaranteed not to be a trap representation and guaranteed to have the value 0.
Now for a strictly conforming example that uses char values to create a guaranteed valid (possibly non-zero) representation of a long long value:
#include <stdio.h>
#include <stdlib.h>
int
main(int argc, char **argv) {
int i;
long long l;
char *buf;
if (argc < 2) {
return 1;
}
buf = malloc(sizeof l);
if (buf == NULL) {
return 1;
}
l = strtoll(argv[1], NULL, 10);
for (i = 0; i < sizeof l; i++) {
buf[i] = ((char *) &l)[i];
}
printf("%lld\n", *(long long *)buf);
return 0;
}
This example has no undefined behaviour and is not dependent on the alignment or representation of long long. This is the sort of code that the character type exception on accessing objects was created for. In particular this means that Standard C lets you implement your own memcpy function in portable C code.

What's the difference between "(type)variable" and "*((type *)&variable)", if any?

I would like to know if there is a difference between:
Casting a primitive variable to another primitive type
Dereferencing a cast of a primitive variable's address to a pointer of another primitive type
I would also like to know if there is a good reason to ever use (2) over (1). I have seen (2) in legacy code which is why I was wondering. From the context, I couldn't understand why (2) was being favored over (1). And from the following test I wrote, I have concluded that at least the behavior of an upcast is the same in either case:
/* compile with gcc -lm */
#include <stdio.h>
#include <math.h>
int main(void)
{
unsigned max_unsigned = pow(2, 8 * sizeof(unsigned)) - 1;
printf("VALUES:\n");
printf("%u\n", max_unsigned + 1);
printf("%lu\n", (unsigned long)max_unsigned + 1); /* case 1 */
printf("%lu\n", *((unsigned long *)&max_unsigned) + 1); /* case 2 */
printf("SIZES:\n");
printf("%d\n", sizeof(max_unsigned));
printf("%d\n", sizeof((unsigned long)max_unsigned)); /* case 1 */
printf("%d\n", sizeof(*((unsigned long *)&max_unsigned))); /* case 2 */
return 0;
}
Output:
VALUES:
0
4294967296
4294967296
SIZES:
4
8
8
From my perspective, there should be no differences between (1) and (2), but I wanted to consult the SO experts for a sanity check.
The first cast is legal; the second cast may not be legal.
The first cast tells the compiler to use the knowledge of the type of the variable to make a conversion to the desired type; the compiler does it, provided that a proper conversion is defined in the language standard.
The second cast tells the compiler to forget its knowledge of the variable's type, and re-interpret its internal representation as that of a different type *. This has limited applicability: as long as the binary representation matches that of the type pointed by the target pointer, this conversion will work. However, this is not equivalent to the first cast, because in this situation value conversion never takes place.
Switching the type of the variable being cast to something with a different representation, say, a float, illustrates this point well: the first conversion produces a correct result, while the second conversion produces garbage:
float test = 123456.0f;
printf("VALUES:\n");
printf("%f\n", test + 1);
printf("%lu\n", (unsigned long)test + 1);
printf("%lu\n", *((unsigned long *)&test) + 1); // Undefined behavior
This prints
123457.000000
123457
1206984705
(demo)
* This is valid only when one of the types is a character type and the pointer alignment is valid, type conversion is trivial (i.e. when there is no conversion), when you change qualifiers or signedness, or when you cast to/from a struct/union with the first member being a valid conversion source/target. Otherwise, this leads to undefined behavior. See C 2011 (N1570), 6.5 7, for complete description. Thanks, Eric Postpischil, for pointing out the situations when the second conversion is defined.
Let's look at two simple examples, with int and float on modern hardware (no funny business).
float x = 1.0f;
printf("(int) x = %d\n", (int) x);
printf("*(int *) &x = %d\n", *(int *) &x);
Output, maybe... (your results may differ)
(int) x = 1
*(int *) &x = 1065353216
What happens with (int) x is you convert the value, 1.0f, to an integer.
What happens with *(int *) &x is you pretend that the value was already an integer. It was NOT an integer.
The floating point representation of 1.0 happens to be the following (in binary):
00111111 100000000 00000000 0000000
Which is the same representation as the integer 1065353216.
This:
(type)variable
takes the value of variable and converts it to type type. This conversion does not necessarily just copy the bits of the representation; it follows the language rules for conversions. Depending on the source and target types, the result may have the same mathematical value as variable, but it may be represented completely differently.
This:
*((type *)&variable)
does something called aliasing, sometimes informally called type-punning. It takes the chunk of memory occupied by variable and treats it as if it were an object of type type. It can yield odd results, or even crash your program, if the source and target types have different representations (say, an integer and a floating-point type), or even if they're of different sizes. For example, if variable is a 16-bit integer (say, it's of type short), and type is a 32-bit integer type, then at best you'll get a 32-bit result containing 16 bits of garbage -- whereas a simple value conversion would have given you a mathematically correct result.
The pointer cast form can also give you alignment problems. If variable is byte-aligned and type requires 2-byte or 4-byte alignment, for example, you can get undefined behavior, which could result either in a garbage result or a program crash. Or, worse yet, it might appear to work (which means you have a hidden bug that may show up later and be very difficult to track down).
You can examine the representation of an object by taking its address and converting it to unsigned char*; the language specifically permits treating any object as an array of character type.
But if a simple value conversion does the job, then that's what you should use.
If variable and type are both arithmetic, the cast is probably unnecessary; you can assign an expression of any arithmetic type to an object of any arithmetic type, and the conversion will be done implicitly.
Here's an example where the two forms have very different behavior:
#include <stdio.h>
int main(void) {
float x = 123.456;
printf("d = %g, sizeof (float) = %zu, sizeof (unsigned int) = %zu\n",
x, sizeof (float), sizeof (unsigned int));
printf("Value conversion: %u\n", (unsigned int)x);
printf("Aliasing : %u\n", *(unsigned int*)&x);
}
The output on my system (it may be different on yours) is:
d = 123.456, sizeof (float) = 4, sizeof (unsigned int) = 4
Value conversion: 123
Aliasing : 1123477881
What's the difference between “(type)variable” and “*((type *)&variable)”, if any?
The second expression may lead to alignment and aliasing issues.
The first form is the natural way to convert a value to another type. But assuming there is no violation of alignment or aliasing, in some cases the second expression has an advantage over the first form. *((type *)&variable) will yield a lvalue whereas (type)variable will not yield a lvalue (the result of a cast is never a lvalue).
This allows you do things like:
(*((type *)& expr)))++
See for example this option from Apple gcc manual which performs a similar trick:
-fnon-lvalue-assign (APPLE ONLY): Whenever an lvalue cast or an lvalue conditional expression is encountered, the compiler will issue a deprecation warning
and then rewrite the expression as follows:
(type)expr ---becomes---> *(type *)&expr
cond ? expr1 : expr2 ---becomes---> *(cond ? &expr1 : &expr2)
Casting the pointer makes a difference when working on a structure:
struct foo {
int a;
};
void foo()
{
int c;
((struct foo)(c)).a = 23; // bad
(*(struct foo *)(&c)).a = 42; // ok
}
First one ((type)variable is simple casting a variable to desired type and second one (*(type*)&variable) is derefencing a pointer after being casted by the desired pointer type.
The difference is that in the second case you may have undefined behavior. The reason being that unsinged is the same as unsigned int and an unsigned long may be larger than the the unsigned int, and when casting to a pointer which you dereference you read also the uninitialized part of the unsigned long.
The first case simply converts the unsigned int to an unsigned long with extends the unsigned int as needed.

Resources