Variables in memory - c

If i have two variables a i b both int, and one pointer ptr that points to &b. If we would increment ptr++ like that it should be pointing at a,if i'm not wrong. I thought it's possible because when compiling a i b are in stack and b has 4 bytes less than a. But when i print that pointer in next line i only get address.
Code:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int a = 52;
int b = 12;
int *ptr;
ptr = &b;
printf("%d\n",*ptr);
ptr++;
printf("\n%d",*ptr);
return 0;
}
but if i put printf("%d",&a); then last printf is printed good and it prints value of a
Code:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int a = 52;
int b = 12;
printf("%d\n",&a);
int *ptr;
ptr = &b;
printf("%d\n",*ptr);
ptr++;
printf("\n%d",*ptr);
return 0;
}
Can someone explain me why this happens?
Pictures:

The compiler is free to arrange local variables in any order it chooses on the stack. In fact the C standard doesn't even mention a stack. That's an implementation detail left up to the compiler.
Adding a seemingly unrelated line of code can result in the compiler deciding to place variables on the stack in a different order than it did without the additional code. So you can't depend on this behavior when writing your code. Doing so is undefined behavior, which you have experienced.
Also, performing pointer arithmetic on variables that are not part of the same array is also undefined behavior.

C11 draft standard n1570:
6.5.2.4 Postfix increment and decrement operators
2
[...] See the discussions of additive operators and compound assignment for
information on constraints, types, and conversions and the effects of operations on
pointers.[...]
6.5.6 Additive operators
7
For the purposes of these operators, a pointer to an object that is not an element of an
array behaves the same as a pointer to the first element of an array of length one with the
type of the object as its element type.
8
[...] If both the pointer
operand and the result point to elements of the same array object, or one past the last
element of the array object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is evaluated.
After ptr = &b; and ptr++;, dereferencing ptr in printf("\n%d",*ptr); is undefined behavior.

You can't guarantee that a and b variables are stored anywhere near in memory, and it's plain unsafe to try to "travel" from one to another by pointer increments, and rely on the results. What you're doing is dwelling into the realm of undefined behavior, you shouldn't do that.

Related

Question on "array objects" and undefined behavior

In C, suppose for a pointer p we do *p++ = 0. If p points to an int variable, is this defined behavior?
You can do arithmetic resulting in pointing one past the end of an "array object" per the standard, but I am unable to find a really precise definition of "array object" in the standard. I don't think in this context it means just an object explicitly defined as an array, because p=malloc(sizeof(int)); ++p; pretty clearly is intended to be defined behavior.
If a variable does not qualify as an "array object", then as far as I can tell *p++ = 0 is undefined behavior.
I am using the C23 draft, but an answer citing the C11 standard would probably answer the question too.
Yes it is well-defined. Pointer arithmetic is defined by the additive operators so that's where you need to look.
C17 6.5.6/7
For the purposes of these operators, a pointer to an object that is not an element of an array behaves
the same as a pointer to the first element of an array of length one with the type of the object as its
element type.
That is, int x; is to be regarded as equivalent to int x[1]; for the purpose of determining valid pointer arithmetic.
Given int x; int* p = &x; *p++ = 0; then it is fine to point 1 item past it but not to de-reference that item:
C17 6.5.6/8
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation
shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
This behavior has not changed in the various revisions of the standard. It's the very same from C90 to C23.
There are two separate questions: 1. What constructs does the Standard specify that correct conforming implementations should process meaningfully, and 2. What constructs do clang and gcc actually process meaningfully. The clear intention of the Standard is to define the behavior of a pointer "one past" an array object and a pointer to the start of another array object that happens to immediately follow it. The actual behavior of clang and gcc tells another story, however.
Given the source code:
#include <stdint.h>
extern int x[],y[];
int test1(int *p)
{
y[0] = 1;
if (p == x+1)
*p = 2;
return y[0];
}
int test2(int *p)
{
y[0] = 1;
uintptr_t p1 = 3*(uintptr_t)(x+1);
uintptr_t p2 = 5*(uintptr_t)p;
if (5*p1 == 3*p2)
*p = 2;
return y[0];
}
both clang and gcc will recognize in both functions that the *p=2 assignment will only run if p happens to be equal to a one-past pointer to x, and will conclude as a consequence that it would be impossible for p to equal y. Construction of an executable example where clang and gcc would erroneously make this assumption is difficult without the ability to execute a program containing two compilation units, but examination of the generated machine code at https://godbolt.org/z/x78GMqbrv will reveal that every ret instruction is immediately preceded by mov eax,1, which loads the return value with 1.
Note that the code in test2 doesn't compare pointers, nor even compare integers that are directly formed from pointers, but the fact that clang and gcc are able to show that the numbers being compared can only be equal if the pointers happened to be equal is sufficient for test2() to, as perceived by clang or gcc, invoke UB if the function is passed a pointer to y, and y happens to equal x+1.

does decrementing a NULL pointer lead to undefined behavior?

Decrementing a NULL pointer on my machine still gives a NULL pointer, I wonder if this is well defined.
char *p = NULL;
--p;
Yes, the behavior is undefined.
--p is equivalent to p = p - 1 (except that p is only evaluated once, which doesn't matter in this case).
N1570 6.5.6 paragraph 8, discussing additive operators, says:
When an expression that has integer type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array
is large enough, the result points to an element offset from the
original element such that the difference of the subscripts of the
resulting and original array elements equals the integer expression.
[...]
If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the array
object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined.
Since your pointer value p doesn't point to an element of an array object or one past the last element of an array object, the behavior of p - 1 is undefined.
(Incidentally, I'd be surprised if your code caused p to be a null pointer -- though since the behavior is undefined the language certainly permits it. I can imagine an optimizing compiler ignoring the --p; because it knows its behavior is undefined, but I haven't seen that myself. How do you know p is null?)
As far as I see with GCC it does not generate a null pointer. Decrementing is just subtracting a number. With underflow the number just wraps around. You can see that here.
#include "stdio.h"
#include <inttypes.h>
int main()
{
char *p = NULL;
printf("%zx\n", (uintptr_t)p);
--p;
printf("%zx\n", (uintptr_t)p);
}
Output is
0
ffffffffffffffff
https://wandbox.org/permlink/gNzc38RWGSBi9tS3

C - Assigning pointers to arrays? Based on four examples

Assume following Code:
#include <stdio.h>
#include <stdlib.h>
int main (int argc, char **argv)
{
int arrayXYZ[10];
int i;
int *pi;
int intVar;
for (i=0; i<10; i++){
arrayXYZ[i] = i;
}
pi = arrayXYZ; // Reference 1
pi++; // Reference 2
arrayXYZ++; // Reference 3
arrayXYZ = pi; // Reference 4
}
Reference 1 is correct: pi points to first element in arrayXYZ -> *pi = 0
Reference 2 is correct: element to which pi points is incremented -> *pi = 1
Reference 3 is not correct: I am not completely sure why. Every integer needs 4 bits of memory. Hence, we cannot increment the address of the head of the array by just one? Assume, we had a char array with sizeof(char)=1 -> Would the head of the array point to the next bucket?
Reference 4 is not correct: I am not completely sure why. Why cannot the head of the array point to the address to which pi points?
Thanks for all clarifications!
I am a new member, so if my question doesn't follow the Stackoverflow guidelines, feel free to tell me how I can improve my next questions!
arrayXYZ++;
This is equivalent to:
arrayXYZ += 1;
which is equivalent to:
arrayXYZ = arrayXYZ + 1;
This is not allowed because the C language does not allow it. An array can not be assigned to.
arrayXYZ = pi;
This fails for the same reason. An array can not be assigned to.
The other assignments work because you are allowed to assign to a pointer.
Also keep in mind that arrays and pointers are distinct datatypes. In C, there are circumstances where arrays decay into a pointer to their first element for convenience purposes. Which is why this works:
pi = arrayXYZ;
However, this is just an automatic conversion, so that you don't have to write:
pi = &arrayXYZ[0];
This automatic conversion does not mean that arrays are the same thing as pointers.
From C11 standard §6.3.2.1 (N1570)
An lvalue is an expression (with an object type other than void) that potentially designates an object;64) if an lvalue does not designate an object when it is evaluated, the behavior is undefined. When an object is said to have a particular type, the type is specified by the lvalue used to designate the object. A modifiable lvalue is an lvalue that does not have array type, does not have an incomplete type, does not have a const- qualified type, and if it is a structure or union, does not have any member (including, recursively, any member or element of all contained aggregates or unions) with a const- qualified type.
And also From §6.5.2.4
The operand of the postfix increment or decrement operator shall have atomic, qualified, or unqualified real or pointer type, and shall be a modifiable lvalue.
As pointed out here these are the reasons why those statements are illegal. Same way for assignment operation the left one has to be modifiable. Here it is not. That's why the problem.
Now to explain why the other two works - there is a thing called array decay. Array in most situations (exceptions are when used in operand of &, sizeof etc) are converted to pointer to the first element of the array and that pointer is being assigned to the pi. This is modifiable. And that's why you can easily apply ++ over it.

Is it OK to access past the size of a structure via member address, with enough space allocated?

Specifically, is the following code, the line below the marker, OK?
struct S{
int a;
};
#include <stdlib.h>
int main(){
struct S *p;
p = malloc(sizeof(struct S) + 1000);
// This line:
*(&(p->a) + 1) = 0;
}
People have argued here, but no one has given a convincing explanation or reference.
Their arguments are on a slightly different base, yet essentially the same
typedef struct _pack{
int64_t c;
} pack;
int main(){
pack *p;
char str[9] = "aaaaaaaa"; // Input
size_t len = offsetof(pack, c) + (strlen(str) + 1);
p = malloc(len);
// This line, with similar intention:
strcpy((char*)&(p->c), str);
// ^^^^^^^
The intent at least since the standardization of C in 1989 has been that implementations are allowed to check array bounds for array accesses.
The member p->a is an object of type int. C11 6.5.6p7 says that
7 For the purposes of [additive operators] a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
Thus
&(p->a)
is a pointer to an int; but it is also as if it were a pointer to the first element of an array of length 1, with int as the object type.
Now 6.5.6p8 allows one to calculate &(p->a) + 1 which is a pointer to just past the end of the array, so there is no undefined behaviour. However, the dereference of such a pointer is invalid. From Appendix J.2 where it is spelt out, the behaviour is undefined when:
Addition or subtraction of a pointer into, or just beyond, an array object and an integer type produces a result that points just beyond the array object and is used as the operand of a unary * operator that is evaluated (6.5.6).
In the expression above, there is only one array, the one (as if) with exactly 1 element. If &(p->a) + 1 is dereferenced, the array with length 1 is accessed out of bounds and undefined behaviour occurs, i.e.
behavior [...], for which [The C11] Standard imposes no requirements
With the note saying that:
Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
That the most common behaviour is ignoring the situation completely, i.e. behaving as if the pointer referenced the memory location just after, doesn't mean that other kind of behaviour wouldn't be acceptable from the standard's point of view - the standard allows every imaginable and unimaginable outcome.
There has been claims that the C11 standard text has been written vaguely, and the intention of the committee should be that this indeed be allowed, and previously it would have been alright. It is not true. Read the part from the committee response to [Defect Report #017 dated 10 Dec 1992 to C89].
Question 16
[...]
Response
For an array of arrays, the permitted pointer arithmetic in
subclause 6.3.6, page 47, lines 12-40 is to be understood by
interpreting the use of the word object as denoting the specific
object determined directly by the pointer's type and value, not other
objects related to that one by contiguity. Therefore, if an expression
exceeds these permissions, the behavior is undefined. For example, the
following code has undefined behavior:
int a[4][5];
a[1][7] = 0; /* undefined */
Some conforming implementations may
choose to diagnose an array bounds violation, while others may
choose to interpret such attempted accesses successfully with the
obvious extended semantics.
(bolded emphasis mine)
There is no reason why the same wouldn't be transferred to scalar members of structures, especially when 6.5.6p7 says that a pointer to them should be considered to behave the same as a pointer to the first element of an array of length one with the type of the object as its element type.
If you want to address the consecutive structs, you can always take the pointer to the first member and cast that as the pointer to the struct and advance that instead:
*(int *)((S *)&(p->a) + 1) = 0;
This is undefined behavior, as you are accessing something that is not an array (int a within struct S) as an array, and out of bounds at that.
The correct way to achieve what you want, is to use an array without a size as the last struct member:
#include <stdlib.h>
typedef struct S {
int foo; //avoid flexible array being the only member
int a[];
} S;
int main(){
S *p = malloc(sizeof(*p) + 2*sizeof(int));
p->a[0] = 0;
p->a[1] = 42; //Perfectly legal.
}
C standard guarantees that
§6.7.2.1/15:
[...] A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.
&(p->a) is equivalent to (int *)p. &(p->a) + 1 will be address of the element of the second struct. In this case, only one element is there, there will not be any padding in the structure so this will work but where there will be padding this code will break and leads to undefined behaviour.

Does applying post-decrement on a pointer already addressing the base of an array invoke undefined behavior?

After hunting for a related or duplicate question concerning the following to no avail (I can only do marginal justice to describe the sheer number of pointer-arithmetic and post-decrement questions tagged with C, but suffice it to say "boatloads" does a grave injustice to that result set count) I toss this in the ring in hopes of clarification or a referral to a duplicate that eluded me.
If the post-decrement operator is applied to a pointer such as below, a simple reverse-iteration of an array sequence, does the following code invoke undefined behavior?
#include <stdio.h>
#include <string.h>
int main()
{
char s[] = "some string";
const char *t = s + strlen(s);
while(t-->s)
fputc(*t, stdout);
fputc('\n', stdout);
return 0;
}
It was recently proposed to me that 6.5.6.p8 Additive operators, in conjunction with 6.5.2.p4, Postfix increment and decrement operators, specifies even performing a post-decrement upon t when it already contains the base-address of s invokes undefined behavior, regardless of whether the resulting value of t (not the t-- expression result) is evaluated or not. I simply want to know if that is indeed the case.
The cited portions of the standard were:
6.5.6 Additive Operators
If both the pointer operand and the result point to elements of the
same array object, or one past the last element of the array object,
the evaluation shall not produce an overflow; otherwise, the behavior
is undefined.
and its nearly tightly coupled relationship with...
6.5.2.4 Postfix increment and decrement operators Constraints
The operand of the postfix increment or decrement operator shall have
atomic, qualified, or unqualified real or pointer type, and shall be a
modifiable lvalue.
Semantics
The result of the postfix ++ operator is the value of the operand. As a side effect, the value of the operand object is incremented (that is, the value 1 of the appropriate type is added to it). See the discussions of additive operators and compound assignment for information on constraints, types, and conversions and the effects of operations on pointers. The value computation of the result is sequenced before the side effect of updating the stored value of the operand. With respect to an indeterminately-sequenced function call, the operation of postfix ++ is a single evaluation. Postfix ++ on an object with atomic type is a read-modify-write operation with memory_order_seq_cst memory order semantics.98)
The postfix -- operator is analogous to the postfix ++ operator, except that the value of the operand is decremented (that is, the value 1 of the appropriate type is subtracted from it).
Forward references: additive operators (6.5.6), compound assignment (6.5.16.2).
The very reason for using the post-decrement operator in the posted sample is to avoid evaluating an eventually-invalid address value against the base address of the array. For example, the code above was a refactor of the following:
#include <stdio.h>
#include <string.h>
int main()
{
char s[] = "some string";
size_t len = strlen(s);
char *t = s + len - 1;
while(t >= s)
{
fputc(*t, stdout);
t = t - 1;
}
fputc('\n', stdout);
}
Forgetting for a moment this has a non-zero-length string for s, this general algorithm clearly has issues (perhaps not as clearly to some). If s[] were instead "", then t would be assigned a value of s-1, which itself is not in the valid range of s through its one-past-address, and the evaluation for comparison against s that ensues is no good. If s has non-zero length, that addresses the initial s-1 problem, but only temporarily, as eventually this is still counting on that value (whatever it is) being valid for comparison against s to terminate the loop. It could be worse. it could have naively been:
size_t len = strlen(s) - 1;
char *t = s + len;
This has disaster written all over it if s were a zero-length string. The refactored code of this question opened with was intended to address all of these issues. But...
My paranoia may be getting to me, but it isn't paranoia if they're really all out to get you. So, per the standard (these sections, or perhaps others), does the original code (scroll to the top of this novel if you forgot what it looks like by now) indeed invoke undefined behavior or not?
I am pretty certain that the result of the post-decrement in this case is indeed undefined behaviour. The post-decrement clearly subtracts one from a pointer to the beginning of an object, so the result does not point to an element of the same array, and by the definition of pointer arithmetic (§6.5.6/8, as cited in the OP) that's undefined behaviour. The fact that you never use the resulting pointer is irrelevant.
What's wrong with:
char *t = s + strlen(s);
while (t > s) fputc(*--t, stdout);
Interesting but irrelevant fact: The implementation of reverse iterators in the standard C++ library usually holds in the reverse iterator a pointer to one past the target element. This allows the reverse iterator to be used normally without ever involving a pointer to "one before the beginning" of the container, which would be UB, as above.

Resources