Tricky pointer arithmetic in C: **k

Assume k is a pointer to an integer in C.
If the expression **k appears on the right side of an assignment operator (=), is it illegal?
Here is my thought:
**k is actually *(*k). When we dereference k, we get the value of an integer. Then we try to dereference an integer, which is an illegal operation.
But my textbook says this expression on the right side is actually legal.
Why so?

The C 2018 standard says, in clause 6.5.3.2, paragraph 2, “The operand of the unary * operator shall have pointer type.” If k is a pointer to an integer, then *k is an integer, which is not a pointer type, so it cannot be the operand of a unary * operator. Thus an expression such as x = **k violates this rule.
The rule in 6.5.3.2 paragraph 2 is a constraint. This means a conforming compiler must produce a diagnostic message for any violation of it, and the C standard does not define the behavior.
Technically, a C compiler could issue the diagnostic message, then accept the expression anyway and give it whatever meaning it likes. I am not aware of any compiler that does so, and no common compiler does.
The characters **k might also appear in some larger expression where they do not both act as unary * operators. For example, x = y**k is equivalent to x = y * *k, and there the first * is a binary multiplication operator. You should show us the exact text from your textbook.

Beyond what the standard says, dereferencing an int fundamentally makes no sense. Suppose the compiler were willing to assume that ints can be converted directly to pointers (a BIG assumption). What TYPE of pointer would the compiler assume it is? The only safe assumption is void*. Dereferencing a void* makes no sense either, because the pointer's type still matters even when you assign the result to a known type:
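#include <stdio.h>
int main(void)
{
    unsigned int n = 0xFFFFFFFF;
    void *pN = &n;
    unsigned int fromIntPtr = *(unsigned int*)pN;   // reads every byte of n
    unsigned int fromCharPtr = *(unsigned char*)pN; // reads only the first byte of n
    printf("%X\n", fromIntPtr);
    printf("%X\n", fromCharPtr);
    return 0;
}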
Output:
FFFFFFFF
FF
Lacking a type of pointer, the compiler could perhaps infer unsigned int* based on the LHS expression. That's 1) a wildly stupid inference and 2) C doesn't really infer types. (Shoving a typed RHS into a differently typed LHS value isn't inference :))

Related

Why can't we use * on non-pointers?

Let's say I have the code:
int x = 5;
int* p = &x;
then writing *p will return 5 and allow me to modify x (as expected). Say, for whatever reason that I then write:
int y = p; // y holds x's address
*y = 3; // this is invalid and throws an error when compiling
*((int*)y) = 3; // this is okay
(when compiling on gcc 9.2)
My question is: why does C not allow us to use * on non-pointer types?
C is a strongly typed language. This means that the operations allowed on an object, and how those operations are interpreted, depend on the object's type. That's literally what it means for an object to have a type: the type determines the operations you can do with the object.
Unary * (the pointer indirection operator) is defined for pointer types, and it's not defined for integer types.
If you want to treat an integer's value as if it were a pointer, you can use an explicit cast, as in the *((int *)y) = 3; example you mentioned in your question.
There are two reasons the unary * operator is not defined for integers:
1. Taking an integer and pretending it's a pointer is generally a bad idea, not something to be encouraged. If you really want to do it, the extra cost imposed on you -- namely that you have to use that pointer cast -- is appropriate.
2. The bare expression *y doesn't contain enough information to know how big the pointed-to object might be. If you write *y = 3 and it were legal, how would the compiler know to assign an int, a short, or a char?
Point 2 is key. It's important to remember that C does not have one "pointer" type. Every pointer type incorporates a specification of the type of object which the pointer will point to. That's no accident, it's fundamental, and there's no way around it.
So you can't implicitly treat an integer as if it were a pointer, and even if you do it explicitly -- that is, with a cast, as in *((int *)y) = 3, you may still be on shaky ground, especially if integers and pointers don't have the same size on your machine.
These days, this is all generally such a bad idea that the compilers are slowly dropping their old "the programmer must know what he's doing" attitude, and getting somewhat hissy with warnings. For example, int y = p will generally get you a warning about a pointer-to-int assignment, and even with the explicit cast, *((int *)y) = 3 might get you a warning about "cast to pointer from integer of different size".
Boring answer - because that's how the language is defined:
6.5.3.2 Address and indirection operators
Constraints
1 The operand of the unary & operator shall be either a function designator, the result of a [] or unary * operator, or an lvalue that designates an object that is not a bit-field and is not declared with the register storage-class specifier.
2 The operand of the unary * operator shall have pointer type.
C 2011 Online Draft
Slightly-less boring answer:
Pointers are not integers; they do not behave like integers. The operations on pointers and integers are different. While there is such a thing as pointer arithmetic, it does not behave like integer arithmetic. Pointers are abstractions of memory addresses, which do not have to have integer representation.
Type matters in C (not as much as in some other languages, but it does matter). Operations on integer types do not apply to pointer types and vice versa, just like operations on aggregate (struct or array types) do not apply to integer types.
You can't use * on an integer operand for the same reason you can't use [] or () or . or -> on an integer operand; those operations are not defined for integer types.
We can't use * on an integer because pointers and integers are different types.
You cannot use int y to store an address. An int holds integer values, not addresses, and it is not guaranteed to be wide enough to hold a converted pointer.
int y = p; // y holds x's address
*y = 3; // this is invalid and throws an error when compiling
*((int*)y) = 3; // this is okay
Pointers, on the other hand, are designed to store the address of a variable, so you can use * to access the value stored at that address. An integer cannot store an address in that way.
The operations defined for integers and pointers are different from each other.
Regarding *((int*)y) = 3; // this is okay: no, this is not OK; it causes UB. Just because the compiler does not complain does not mean the code is OK or free from UB.

Didn't understand the output of this code [duplicate]

When a pointer to a particular type (say int, char, float, ..) is incremented, its value is increased by the size of that data type. If a void pointer which points to data of size x is incremented, how does it get to point x bytes ahead? How does the compiler know to add x to value of the pointer?
Final conclusion: arithmetic on a void* is illegal in both C and C++.
GCC allows it as an extension, see Arithmetic on void- and Function-Pointers (note that this section is part of the "C Extensions" chapter of the manual). Clang and ICC likely allow void* arithmetic for the purposes of compatibility with GCC. Other compilers (such as MSVC) disallow arithmetic on void*, and GCC disallows it if the -pedantic-errors flag is specified, or if the -Werror=pointer-arith flag is specified (this flag is useful if your code base must also compile with MSVC).
The C Standard Speaks
Quotes are taken from the n1256 draft.
The standard's description of the addition operation states:
6.5.6-2: For addition, either both operands shall have arithmetic type, or one operand shall be a pointer to an object type and the other shall have integer type.
So, the question here is whether void* is a pointer to an "object type", or equivalently, whether void is an "object type". The definition for "object type" is:
6.2.5-1: Types are partitioned into object types (types that fully describe objects), function types (types that describe functions), and incomplete types (types that describe objects but lack information needed to determine their sizes).
And the standard defines void as:
6.2.5-19: The void type comprises an empty set of values; it is an incomplete type that cannot be completed.
Since void is an incomplete type, it is not an object type. Therefore it is not a valid operand to an addition operation.
Therefore you cannot perform pointer arithmetic on a void pointer.
Notes
Originally, it was thought that void* arithmetic was permitted, because of these sections of the C standard:
6.2.5-27: A pointer to void shall have the same representation and alignment requirements as a pointer to a character type.
However,
The same representation and alignment requirements are meant to imply interchangeability as arguments to functions, return values from functions, and members of unions.
So this means that printf("%s", x) has the same meaning whether x has type char* or void*, but it does not mean that you can do arithmetic on a void*.
Pointer arithmetic is not allowed on void* pointers.
Cast it to a char pointer, then increment that pointer to move x bytes ahead.
The C standard does not allow void pointer arithmetic. GNU C allows it, however, by treating the size of void as 1.
C11 standard §6.2.5, paragraph 19:
The void type comprises an empty set of values; it is an incomplete object type that cannot be completed.
The following program works with the GCC compiler (as a GNU extension):
#include<stdio.h>
int main()
{
int arr[2] = {1, 2};
void *ptr = &arr;
ptr = ptr + sizeof(int);
printf("%d\n", *(int *)ptr);
return 0;
}
Other compilers may reject it with an error.
You can't do pointer arithmetic on void * types, for exactly this reason!
Void pointers can point to any memory chunk. Hence the compiler does not know how many bytes to increment/decrement when we attempt pointer arithmetic on a void pointer. Therefore void pointers must be first typecast to a known type before they can be involved in any pointer arithmetic.
#include <stdlib.h>
void *p = malloc(sizeof(char)*10);
p++; // not valid standard C: the compiler does not know how many bytes to move the pointer
char * c = (char *)p;
c++; // the compiler will increment c by 1, since the size of char is 1 byte
You have to cast it to another type of pointer before doing pointer arithmetic.
[answer copied from a comment on a later, duplicate question]
Allowing arithmetic on void pointers is a controversial, nonstandard extension. If you're thinking in assembly language, where pointers are just addresses, arithmetic on void pointers makes sense, and adding 1 just adds 1. If you're thinking in C terms, though, C's model of pointer arithmetic applies. Adding 1 to any pointer p actually adds sizeof(*p) to the address, and that is what you want pointer arithmetic to do. Since void has no size (sizeof(void) is not valid in standard C), this model breaks down for void pointers.
If you're thinking in C terms you don't mind that it breaks down, and you don't mind inserting explicit casts to (char *) if that's the arithmetic you want. But if you're thinking in assembler you want it to just work, which is why the extension (though a departure from the proper definition of pointer arithmetic in C) is desirable in some circles, and provided by some compilers.
Pointer arithmetic is not allowed on void pointers.
Reason: pointer arithmetic is not the same as normal arithmetic. It happens relative to the base address and is scaled by the size of the pointed-to type.
Solution: use a type cast at the point of the arithmetic. The cast makes the base data type known for the expression doing the pointer arithmetic.
For example, if point is a void pointer that points to an int:
*point = *point + 1;               // not valid: cannot dereference a void *
*(int *)point = *(int *)point + 1; // valid
The compiler knows from the type cast. Given a void *x:
x+1 adds one byte to x, so the pointer goes to byte x+1 (GNU extension only)
(int*)x+1 adds sizeof(int) bytes, so the pointer goes to byte x + sizeof(int)
(float*)x+1 adds sizeof(float) bytes,
etc.
The first item is not portable: standard C forbids arithmetic on void *, and GCC rejects it under -pedantic-errors. GCC and compilers compatible with it nevertheless accept it as an extension, and -Wpointer-arith makes GCC warn about it.

Using Increment operator with de-referencing in C

My function receives a void pointer. I would like to make it point to the next location, treating the incoming pointer as a char pointer.
int doSomething( void * somePtr )
{
((char*)somePtr)++; // Gives Compilation error
}
I get the following compilation error:
Error[Pe137]: expression must be a modifiable lvalue
Is this an issue with the priority of operators?
A cast does not yield an lvalue (see section 6.5.4 footnote 104 of C11 standard), therefore you can't apply post increment ++ operator to its result.
c-faq: 4.5:
In C, a cast operator does not mean "pretend these bits have a different type, and treat them accordingly"; it is a conversion operator, and by definition it yields an rvalue, which cannot be assigned to, or incremented with ++. (It is either an accident or a deliberate but nonstandard extension if a particular compiler accepts expressions such as the above.)
Try this instead
char *charPtr = ((char*)somePtr);
charPtr++;
If you want to move the pointer to the next element, you can use:
ptr++;
(where ptr is a char * rather than a void *). If you want to copy the advanced pointer position into another variable instead:
char *abc = (char*)def + 1;
It really depends on what you are trying to do.

Pass integer address in method expecting unsigned char pointer

Recently I had code (in C) where I passed the address of an int to a function expecting a pointer to unsigned char. Is this not valid? Is this UB or what?
e.g.,
void f(unsigned char*p)
{
// do something
}
// Call it somewhere
int x = 0; // actually it was uint32_t if it makes difference
f(&x);
I did get a warning though ... Compiled in Xcode
int * and unsigned char * are not considered compatible types, so implicit conversion will issue a diagnostic. However, the standard does allow explicit casting between different pointers, subject to two rules (C11 section 6.3.2.3):
Converting a type "pointer to A" to type "pointer to B" and back to "pointer to A" shall result in the same original pointer, provided the intermediate pointer is correctly aligned for B. (i.e., if p is of type int * and is suitably aligned for double, then (int *)(double *)p will yield p)
Converting a pointer to an object into a char * yields a pointer to the lowest-addressed byte of the object.
So, in your case, an explicit (unsigned char *) cast will yield a conforming program without any undefined behavior.
The cast is required, see C11 (n1570) 6.5.2.2 p.2:
[…] Each argument shall have a type such that its value may be assigned to an object with the unqualified version of the type of its corresponding parameter.
This refers to the rules for assignment, the relevant part is (ibid. 6.5.16.1 p.1)
One of the following shall hold:
[…]
the left operand has atomic, qualified, or unqualified pointer type, and (considering the type the left operand would have after lvalue conversion) both operands are pointers to qualified or unqualified versions of compatible types, and the type pointed to by the left has all the qualifiers of the type pointed to by the right.
[…]
And unsigned char isn’t compatible to int.
These rules both appear in a “constraint” section, where “shall” means that the compiler has to give a “diagnostic message” (cf. C11 5.1.1.3) and may stop compiling (or whatever, everything beyond that diagnostic is, strictly speaking, out of the scope of the C standard). Your code is an example of a constraint violation.
Other examples of constraint violations are calling a (prototyped and non-variadic) function with the wrong number of arguments, using bitwise operators on doubles, or redeclaring an identifier with an incompatible type in the same scope, ibid. 5.1.1.3 p.2:
Example
An implementation shall issue a diagnostic for the translation unit:
char i;
int i;
because in those cases where wording in this International Standard describes the behavior for a construct as being both a constraint error and resulting in undefined behavior, the constraint error shall be diagnosed.
Syntax violations are treated equally.
So, strictly speaking, your program is as invalid as
int foo(int);
int main() {
It's my birthday!
foo(0.5 ^ 42, 12);
}
which a conforming implementation very well may compile, maybe to a program having undefined behavior, as long as it gives at least one diagnostic (e.g. a warning).
For e.g. gcc, a warning is a diagnostic (you can turn syntax and constraint violations into errors with -pedantic-errors).
The term ill-formed may be used to refer to either a syntax or a constraint violation. The C standard doesn't use this term, but cf. C++11 (n3242):
1.3.9
ill-formed program
program that is not well formed
1.3.26
well-formed program
C++ program constructed according to the syntax rules, diagnosable semantic rules, and the One Definition Rule.
The language-lawyer attitude aside, your code will probably always either fail to compile (which should be reason enough to add the cast) or show the expected behavior.
C11, §6.5.2.2:
2 Each argument shall have a type such that its value may be assigned to an object with the unqualified version of the type of its corresponding parameter.
§6.5.16.1 describes assignment in terms of a list of constraints, including
the left operand has atomic, qualified, or unqualified pointer type, and (considering the type the left operand would have after lvalue conversion) both operands are pointers to qualified or unqualified versions of compatible types, and the type pointed to by the left has all the qualifiers of the type pointed to by the right
int and unsigned char are not compatible types, so the program is not well-formed and the Standard doesn't even guarantee that it will compile.
Some would say "it is undefined behavior according to the standard". Here, though, is what happens de facto (answering by example):
Safe:
void f(char* p)
{
char r, w = 0;
r = p[0]; // read access
p[0] = w; // write access
}
...
int x = 0;
f((char*)&x); // the cast is only there to avoid the compilation warning
This code is safe as long as you access memory with p[i], where 0 <= i <= sizeof(int)-1.
Unsafe:
void f(int* p)
{
int r, w = 0;
r = p[0]; // read access
p[0] = w; // write access
}
...
char x[sizeof(int)] = {0};
f((int*)&x); // the cast is only there to avoid the compilation warning
This code is unsafe because although the allocated variable is large enough to accommodate an int, its address in memory is not necessarily a multiple of sizeof(int). As a result, unless the compiler (as well as the underlying HW architecture) supports unaligned load/store operations, a memory access violation will occur during runtime if the address of this variable in memory is indeed not properly aligned.

Is (int *)0 a null pointer?

This could be thought of as an extension to this question (I'm interested in C only, but adding C++ to complete the extension)
The C11 standard at 6.3.2.3p3 says:
An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant.
My personal take on this is that 0 and (void *)0 represent the null pointer, whose integer value may not actually be 0. That reading, however, doesn't cover 0 cast to any other type.
But, the standard then continues:
If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, ...
which covers (int *)0 as null pointer since cast is an explicit conversion (C11, 6.3) which is listed under conversion methods.
However, what still makes me wonder is the following phrase
... or such an expression cast to type void * ...
With the above semantics, this phrase seems completely useless. The question is, is this phrase completely useless? If not, what implications does it have? Consequently, is (int *)0 the null pointer or not?
Another question that can help the discussion is the following. Is (long long)123 considered "123 converted to long long", or "123 with type long long". In other words, is there any conversion in (long long)123? If there is none, then the second quote above doesn't cover (int *)0 as a null pointer.
Short answer:
In both C and C++, (int *)0 is a constant expression whose value is a null pointer. It is not, however, a null pointer constant. The only observable difference between a constant-expression-whose-value-is-a-null-pointer and a null-pointer-constant, that I know of, is that a null-pointer-constant can be assigned to an lvalue of any pointer type, but a constant-expression-whose-value-is-a-null-pointer has a specific pointer type and can only be assigned to an lvalue with a compatible type. In C, but not C++, (void *)0 is also a null pointer constant; this is a special case for void * consistent with the general C-but-not-C++ rule that void * is assignment compatible with any other pointer-to-object type.
For example:
long *a = 0; // ok, 0 is a null pointer constant
long *b = (long *)0; // ok, (long *)0 is a null pointer with appropriate type
long *c = (void *)0; // ok in C, invalid conversion in C++
long *d = (int *)0; // invalid conversion in both C and C++
And here's a case where the difference between the null pointer constant (void *)0 and a constant-expression-whose-value-is-a-null-pointer with type void * is visible, even in C:
typedef void (*fp)(void); // any pointer-to-function type will show this effect
fp a = 0; // ok, null pointer constant
fp b = (void *)0; // ok in C, invalid conversion in C++
fp c = (void *)(void *)0; // invalid conversion in both C and C++
Also, it's moot nowadays, but since you brought it up: No matter what the bit representation of long *'s null pointer is, all of these assertions behave as indicated by the comments:
// 'x' is initialized to a null pointer
long *x = 0;
// 'y' is initialized to all-bits-zero, which may or may not be the
// representation of a null pointer; moreover, it might be a "trap
// representation", UB even to access
long *y;
memset(&y, 0, sizeof y);
assert (x == 0); // must succeed
assert (x == (long *)0); // must succeed
assert (x == (void *)0); // must succeed in C, unspecified behavior in C++
assert (x == (int *)0); // invalid comparison in both C and C++
assert (memcmp(&x, &y, sizeof y) == 0); // unspecified
assert (y == 0); // UNDEFINED BEHAVIOR: y may be a trap representation
assert (y == x); // UNDEFINED BEHAVIOR: y may be a trap representation
"Unspecified" comparisons do not provoke undefined behavior, but the standard doesn't say whether they evaluate true or false, and the implementation is not required to document which of the two it is, or even to pick one and stick to it. It would be perfectly valid for the above memcmp to alternate between returning 0 and 1 if you called it many times.
Long answer with standard quotes:
To understand what a null pointer constant is, you first have to understand what an integer constant expression is, and that's pretty hairy -- a complete understanding requires you to read sections 6.5 and 6.6 of C99 in detail. This is my summary:
A constant expression is any C expression which the compiler can evaluate to a constant without knowing the value of any object (const or otherwise; however, enum values are fair game), and which has no side effects. (This is a drastic simplification of roughly 25 pages of standardese and may not be exact.)
Integer constant expressions are a restricted subset of constant expressions, conveniently defined in a single paragraph, C99 6.6p6 and its footnote:
An integer constant expression(96) shall have integer type and shall only have operands that are integer constants, enumeration constants, character constants, sizeof expressions whose results are integer constants, and floating constants that are the immediate operands of casts. Cast operators in an integer constant expression shall only convert arithmetic types to integer types, except as part of an operand to the sizeof operator.
(96) An integer constant expression is used to specify the size of a bit-field member of a structure, the value of an enumeration constant, the size of an array, or the value of a case constant. Further constraints that apply to the integer constant expressions used in #if are discussed in 6.10.1.
For purpose of this discussion, the important bit is
Cast operators ... shall only convert arithmetic types to integer types
which means that (int *)0 is not an integer constant expression, although it is a constant expression.
The C++98 definition appears to be more or less equivalent, modulo C++ features and deviations from C. For instance, the stronger separation of character and boolean types from integer types in C++ means that the C++ standard speaks of "integral constant expressions" rather than "integer constant expressions", and then sometimes requires not just an integral constant expression, but an integral constant expression of integer type, excluding char, wchar_t, and bool (and maybe also signed char and unsigned char? it's not clear to me from the text).
Now, the C99 definition of null pointer constant is what this question is all about, so I'll repeat it: 6.3.2.3p3 says
An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant. If a null pointer constant is converted to a pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.
Standardese is very, very literal. Those two sentences mean exactly the same thing as:
An integer constant expression with the value 0 is called a null pointer constant.
An integer constant expression with the value 0, cast to type void *, is also a null pointer constant.
When any null pointer constant is converted to a pointer type, the resulting pointer is called a null pointer and is guaranteed to compare unequal ...
So what that means is, in C, (long *)0 and (long *)(void *)0 are two ways of writing exactly the same thing, namely the null pointer with type long *.
C++ is different. The equivalent text is C++98 4.10 [conv.ptr]:
A null pointer constant is an integral constant expression (5.19) rvalue of integer type that evaluates to zero.
That's all. "Integral constant expression rvalue of integer type" is very nearly the same thing as C99's "integer constant expression", but there are a few things that qualify in C but not C++: for instance, in C the character literal '\x00' is an integer constant expression, and therefore a null pointer constant, but in C++ it is not an integral constant expression of integer type, so it is not a null pointer constant either.
More to the point, though, C++ doesn't have the "or such an expression cast to void *" clause. That means that ((void *)0) is not a null pointer constant in C++. It is still a null pointer, but it is not assignment compatible with any other pointer type. This is consistent with C++'s generally pickier type system.
C++11 (but not, AFAIK, C11) revised the concept of "null pointer", adding a special type for them (nullptr_t) and a new keyword which evaluates to a null pointer constant (nullptr). I do not fully understand the changes and am not going to try to explain them, but I am pretty sure that a bare 0 is still a valid null pointer constant in C++11.
Evaluating the expression (int*)0 yields a null pointer of type int*.
(int*)0 is not a null pointer constant.
A null pointer constant is a particular kind of expression that may appear in C source code. A null pointer is a value that may occur in a running program.
C and C++ (being two distinct languages) have slightly different rules in this area. C++ doesn't have the "or such an expression cast to type void*" wording. But I don't think that affects the answer to your question.
As for your question about (long long)123, I'm not sure how it's related, but the expression 123 is of type int, and the cast specifies a conversion from int to long long.
I think the core confusion is an assumption that the cast in (int*)0 does not specify a conversion, since 0 is already a null pointer constant. But a null pointer constant is not necessarily an expression of pointer type. In particular, the expression 0 is both a null pointer constant and an expression of type int; it is not of any pointer type. The term null pointer constant needs to be thought of as a single concept, not a phrase whose meaning depends on the individual words that make it up.
