Question on "array objects" and undefined behavior

In C, suppose for a pointer p we do *p++ = 0. If p points to an int variable, is this defined behavior?
You can do arithmetic resulting in pointing one past the end of an "array object" per the standard, but I am unable to find a really precise definition of "array object" in the standard. I don't think in this context it means just an object explicitly defined as an array, because p=malloc(sizeof(int)); ++p; pretty clearly is intended to be defined behavior.
If a variable does not qualify as an "array object", then as far as I can tell *p++ = 0 is undefined behavior.
I am using the C23 draft, but an answer citing the C11 standard would probably answer the question too.

Yes it is well-defined. Pointer arithmetic is defined by the additive operators so that's where you need to look.
C17 6.5.6/7
For the purposes of these operators, a pointer to an object that is not an element of an array behaves
the same as a pointer to the first element of an array of length one with the type of the object as its
element type.
That is, int x; is treated like int x[1]; for the purpose of determining valid pointer arithmetic.
Given int x; int* p = &x; *p++ = 0; the store goes through the original value of p (that is, into x itself), and p is then left pointing one item past x. Forming that pointer is fine, but it must not be dereferenced:
C17 6.5.6/8
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation
shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
This behavior has not changed in the various revisions of the standard. It's the very same from C90 to C23.
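To make the two paragraphs concrete, here is a minimal sketch. The function name zero_and_advance is made up for illustration; it only forms and subtracts the one-past pointer and never dereferences it.

```c
#include <stddef.h>

/* Hypothetical helper: zeroes *target through a post-incremented pointer
   and reports how far the pointer advanced, in elements. */
ptrdiff_t zero_and_advance(int *target)
{
    int *p = target;
    *p++ = 0;            /* the store uses the old value of p, so it hits *target */
    /* p now points one past the "array of length one"; it may be compared
       or subtracted, but not dereferenced. */
    return p - target;   /* both operands belong to the same array object */
}
```

Called with the address of a plain int, this should return 1 and leave the int set to 0.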

There are two separate questions: 1. What constructs does the Standard specify that correct conforming implementations should process meaningfully, and 2. What constructs do clang and gcc actually process meaningfully. The clear intention of the Standard is to define the behavior of a pointer "one past" an array object and a pointer to the start of another array object that happens to immediately follow it. The actual behavior of clang and gcc tells another story, however.
Given the source code:
#include <stdint.h>
extern int x[],y[];
int test1(int *p)
{
y[0] = 1;
if (p == x+1)
*p = 2;
return y[0];
}
int test2(int *p)
{
y[0] = 1;
uintptr_t p1 = 3*(uintptr_t)(x+1);
uintptr_t p2 = 5*(uintptr_t)p;
if (5*p1 == 3*p2)
*p = 2;
return y[0];
}
both clang and gcc will recognize in both functions that the *p=2 assignment will only run if p happens to be equal to a one-past pointer to x, and will conclude as a consequence that it would be impossible for p to equal y. Construction of an executable example where clang and gcc would erroneously make this assumption is difficult without the ability to execute a program containing two compilation units, but examination of the generated machine code at https://godbolt.org/z/x78GMqbrv will reveal that every ret instruction is immediately preceded by mov eax,1, which loads the return value with 1.
Note that the code in test2 doesn't compare pointers, nor even compare integers that are directly formed from pointers, but the fact that clang and gcc are able to show that the numbers being compared can only be equal if the pointers happened to be equal is sufficient for test2() to, as perceived by clang or gcc, invoke UB if the function is passed a pointer to y, and y happens to equal x+1.

Related

does decrementing a NULL pointer lead to undefined behavior?

Decrementing a NULL pointer on my machine still gives a NULL pointer, I wonder if this is well defined.
char *p = NULL;
--p;

Yes, the behavior is undefined.
--p is equivalent to p = p - 1 (except that p is only evaluated once, which doesn't matter in this case).
N1570 6.5.6 paragraph 8, discussing additive operators, says:
When an expression that has integer type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array
is large enough, the result points to an element offset from the
original element such that the difference of the subscripts of the
resulting and original array elements equals the integer expression.
[...]
If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the array
object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined.
Since your pointer value p doesn't point to an element of an array object or one past the last element of an array object, the behavior of p - 1 is undefined.
(Incidentally, I'd be surprised if your code caused p to be a null pointer -- though since the behavior is undefined the language certainly permits it. I can imagine an optimizing compiler ignoring the --p; because it knows its behavior is undefined, but I haven't seen that myself. How do you know p is null?)
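If the goal is to walk backwards safely, one option is to check before decrementing. This is only a sketch; step_back and its base parameter are hypothetical names, and it assumes base points to the first element of the array that p points into.

```c
#include <stddef.h>

/* Hypothetical helper: moves p back by one element only when the result
   still points into the array that starts at base. */
const char *step_back(const char *base, const char *p)
{
    if (p == NULL || p == base)
        return p;        /* p - 1 would be undefined here, so leave p alone */
    return p - 1;
}
```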

As far as I can see, GCC does not produce a null pointer here. In practice the decrement is just a subtraction on the address, and on underflow the number wraps around. You can see that here.
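#include <stdio.h>
#include <inttypes.h>
int main(void)
{
char *p = NULL;
/* PRIxPTR is the printf format that matches uintptr_t; %zx is for size_t */
printf("%" PRIxPTR "\n", (uintptr_t)p);
--p; /* undefined behavior; kept only to observe what this compiler does */
printf("%" PRIxPTR "\n", (uintptr_t)p);
}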
Output is
0
ffffffffffffffff
https://wandbox.org/permlink/gNzc38RWGSBi9tS3
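The wraparound itself can be observed without any undefined behavior by doing the subtraction on the integer instead of on the pointer. A small sketch, assuming the implementation provides uintptr_t (it is optional in the standard); decrement_as_integer is a made-up name.

```c
#include <stdint.h>

/* Hypothetical helper: subtracts one from an address-sized unsigned
   integer. Unsigned arithmetic is defined to wrap modulo 2^N. */
uintptr_t decrement_as_integer(uintptr_t value)
{
    return value - 1u;
}
```

On implementations where (uintptr_t)NULL is 0, decrement_as_integer((uintptr_t)NULL) gives UINTPTR_MAX, which matches the ffffffffffffffff seen above on a 64-bit target.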

Is it OK to access past the size of a structure via member address, with enough space allocated?

Specifically, is the following code, the line below the marker, OK?
struct S{
int a;
};
#include <stdlib.h>
int main(){
struct S *p;
p = malloc(sizeof(struct S) + 1000);
// This line:
*(&(p->a) + 1) = 0;
}
People have argued about this here, but no one has given a convincing explanation or reference.
Their arguments are about slightly different code, but it is essentially the same:
typedef struct _pack{
int64_t c;
} pack;
int main(){
pack *p;
char str[9] = "aaaaaaaa"; // Input
size_t len = offsetof(pack, c) + (strlen(str) + 1);
p = malloc(len);
// This line, with similar intention:
strcpy((char*)&(p->c), str);
// ^^^^^^^

The intent at least since the standardization of C in 1989 has been that implementations are allowed to check array bounds for array accesses.
The member p->a is an object of type int. C11 6.5.6p7 says that
7 For the purposes of [additive operators] a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.
Thus
&(p->a)
is a pointer to an int; but it is also as if it were a pointer to the first element of an array of length 1, with int as the object type.
Now 6.5.6p8 allows one to calculate &(p->a) + 1 which is a pointer to just past the end of the array, so there is no undefined behaviour. However, the dereference of such a pointer is invalid. From Appendix J.2 where it is spelt out, the behaviour is undefined when:
Addition or subtraction of a pointer into, or just beyond, an array object and an integer type produces a result that points just beyond the array object and is used as the operand of a unary * operator that is evaluated (6.5.6).
In the expression above, there is only one array, the one (as if) with exactly 1 element. If &(p->a) + 1 is dereferenced, the array with length 1 is accessed out of bounds and undefined behaviour occurs, i.e.
behavior [...], for which [The C11] Standard imposes no requirements
With the note saying that:
Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
That the most common behaviour is ignoring the situation completely, i.e. behaving as if the pointer referenced the memory location just after, doesn't mean that other kind of behaviour wouldn't be acceptable from the standard's point of view - the standard allows every imaginable and unimaginable outcome.
There have been claims that the C11 standard text is vaguely written, that the committee's intention was for this to be allowed, and that it would have been fine under earlier standards. That is not true. Read this part of the committee response to [Defect Report #017 dated 10 Dec 1992 to C89].
Question 16
[...]
Response
For an array of arrays, the permitted pointer arithmetic in
subclause 6.3.6, page 47, lines 12-40 is to be understood by
interpreting the use of the word object as denoting the specific
object determined directly by the pointer's type and value, not other
objects related to that one by contiguity. Therefore, if an expression
exceeds these permissions, the behavior is undefined. For example, the
following code has undefined behavior:
int a[4][5];
a[1][7] = 0; /* undefined */
Some conforming implementations may
choose to diagnose an array bounds violation, while others may
choose to interpret such attempted accesses successfully with the
obvious extended semantics.
(bolded emphasis mine)
There is no reason why the same wouldn't be transferred to scalar members of structures, especially when 6.5.6p7 says that a pointer to them should be considered to behave the same as a pointer to the first element of an array of length one with the type of the object as its element type.
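For the a[1][7] case from the DR, the same element can be reached without leaving any inner array by normalizing the indices first. The sketch below uses made-up names (ROWS, COLS, set_normalized) and assumes row * COLS + col does not overflow size_t.

```c
#include <stddef.h>

enum { ROWS = 4, COLS = 5 };

/* Hypothetical helper: treats (row, col) as a flat position and stores
   value at the equivalent in-bounds a[r][c]. Returns 1 on success,
   0 if the position lies outside the whole 2D array. */
int set_normalized(int a[ROWS][COLS], size_t row, size_t col, int value)
{
    size_t flat = row * COLS + col;
    if (flat >= (size_t)ROWS * COLS)
        return 0;
    a[flat / COLS][flat % COLS] = value;   /* every subscript is in range */
    return 1;
}
```

With this, the intent of a[1][7] = 0 becomes set_normalized(a, 1, 7, 0), which writes a[2][2].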
If you want to address the consecutive structs, you can always take the pointer to the first member and cast that as the pointer to the struct and advance that instead:
*(int *)((struct S *)&(p->a) + 1) = 0;
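If the extra bytes after the struct are just raw storage, another route is to address them through an unsigned char pointer to the start of the allocation and copy with memcpy. This is a sketch under the common reading that malloc'd storage may be accessed as an array of bytes; roundtrip_after_struct is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

struct S { int a; };

/* Hypothetical helper: allocates a struct S plus room for one int,
   writes extra just past the struct, reads it back into *out.
   Returns 1 on success, 0 if the allocation failed. */
int roundtrip_after_struct(int extra, int *out)
{
    struct S *p = malloc(sizeof *p + sizeof extra);
    if (p == NULL)
        return 0;
    p->a = 1;
    unsigned char *bytes = (unsigned char *)p;        /* whole allocation as bytes */
    memcpy(bytes + sizeof *p, &extra, sizeof extra);  /* write after the struct */
    memcpy(out, bytes + sizeof *p, sizeof extra);     /* read it back */
    free(p);
    return 1;
}
```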

This is undefined behavior, as you are accessing something that is not an array (int a within struct S) as an array, and out of bounds at that.
The correct way to achieve what you want, is to use an array without a size as the last struct member:
#include <stdlib.h>
typedef struct S {
int foo; //avoid flexible array being the only member
int a[];
} S;
int main(){
S *p = malloc(sizeof(*p) + 2*sizeof(int));
p->a[0] = 0;
p->a[1] = 42; //Perfectly legal.
}

The C standard guarantees that
§6.7.2.1/15:
[...] A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.
&(p->a) is equivalent to (int *)p. &(p->a) + 1 will be the address of the first member of the next struct. In this case there is only one member, so there is no padding in the structure and this will work; where there is padding, this code will break and lead to undefined behaviour.
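The guarantee about the initial member, and the possibility of padding elsewhere, can be checked with offsetof. A small sketch; struct Padded and both function names are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical struct, chosen so that padding is likely after tag. */
struct Padded {
    char tag;
    double value;
};

/* Returns 1 when the struct pointer and the pointer to its first member
   compare equal after conversion, as 6.7.2.1 guarantees. */
int first_member_at_start(const struct Padded *p)
{
    return (const void *)p == (const void *)&p->tag
        && offsetof(struct Padded, tag) == 0;
}

/* Offset of the second member; larger than sizeof(char) whenever the
   implementation inserts padding after tag. */
size_t second_member_offset(void)
{
    return offsetof(struct Padded, value);
}
```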

Variables in memory

If I have two variables a and b, both int, and one pointer ptr that points to b (ptr = &b), then incrementing it with ptr++ should make it point at a, if I'm not wrong. I thought this was possible because, after compiling, a and b are on the stack and b's address is 4 bytes lower than a's. But when I print the pointed-to value on the next line, I only get what looks like an address.
Code:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int a = 52;
int b = 12;
int *ptr;
ptr = &b;
printf("%d\n",*ptr);
ptr++;
printf("\n%d",*ptr);
return 0;
}
But if I add printf("%d",&a); then the last printf works and prints the value of a.
Code:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int a = 52;
int b = 12;
printf("%d\n",&a);
int *ptr;
ptr = &b;
printf("%d\n",*ptr);
ptr++;
printf("\n%d",*ptr);
return 0;
}
Can someone explain to me why this happens?

The compiler is free to arrange local variables in any order it chooses on the stack. In fact the C standard doesn't even mention a stack. That's an implementation detail left up to the compiler.
Adding a seemingly unrelated line of code can result in the compiler deciding to place variables on the stack in a different order than it did without the additional code. So you can't depend on this behavior when writing your code. Doing so is undefined behavior, which you have experienced.
Also, pointer arithmetic that moves a pointer beyond one past the object it points to, or dereferencing that one-past pointer, is undefined behavior; two separate variables are not elements of the same array.

C11 draft standard n1570:
6.5.2.4 Postfix increment and decrement operators
2
[...] See the discussions of additive operators and compound assignment for
information on constraints, types, and conversions and the effects of operations on
pointers.[...]
6.5.6 Additive operators
7
For the purposes of these operators, a pointer to an object that is not an element of an
array behaves the same as a pointer to the first element of an array of length one with the
type of the object as its element type.
8
[...] If both the pointer
operand and the result point to elements of the same array object, or one past the last
element of the array object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element of the array object, it
shall not be used as the operand of a unary * operator that is evaluated.
After ptr = &b; and ptr++;, dereferencing ptr in printf("\n%d",*ptr); is undefined behavior.

You can't guarantee that the variables a and b are stored anywhere near each other in memory, and it's plainly unsafe to try to "travel" from one to the other by pointer increments and rely on the results. What you're doing is delving into the realm of undefined behavior, and you shouldn't do that.
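If the two values really need to be neighbours, put them in an array, where adjacency is guaranteed. A minimal sketch with a made-up function name:

```c
/* Hypothetical helper: steps from the first array element to the second
   and returns the value found there. */
int next_in_array(void)
{
    int values[2] = {12, 52};   /* elements of one array are contiguous */
    int *ptr = &values[0];
    ptr++;                      /* now points at values[1], still in bounds */
    return *ptr;
}
```

Here the increment lands on a real element, so the dereference is defined and the function returns 52.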

Array declared as int v[100] but &(v[100]) gives no warning

I have the following program:
#include <stdio.h>
int main() {
int v[100];
int *p;
for (p = &(v[0]); p != &(v[100]); ++p)
if ((*p = getchar()) == EOF) {
--p;
break;
}
while (p != v)
putchar(*--p);
return 0;
}
And this is the output of gcc --version on the terminal:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.3.0
Thread model: posix
Why does taking the address of the element one past the last of an array give me no warning, while taking, for example, the address of v[101] gives me the following warning?
test.c:8:29: warning: array index 101 is past the end of the array (which
contains 100 elements) [-Warray-bounds]
for(p = &(v[0]); p != &(v[101]); ++p)
^ ~~~
test.c:5:5: note: array 'v' declared here
int v[100];
^
1 warning generated.
I know that indexing elements out of the bounds of a buffer is undefined behaviour, so why isn't the compiler complaining about the first case?

Moving a pointer to one past the last element of an array is allowed as long as you don't dereference it, so your program is valid if one or more characters are read before hitting EOF.
N1256 6.5.2.1 Array subscripting
The definition of the subscript operator []
is that E1[E2] is identical to (*((E1)+(E2))).
N1256 6.5.3.2 Address and indirection operators
If the operand is the result of a unary * operator,
neither that operator nor the & operator is evaluated and the result is as if both were
omitted, except that the constraints on the operators still apply and the result is not an
lvalue. Similarly, if the operand is the result of a [] operator, neither the & operator nor
the unary * that is implied by the [] is evaluated and the result is as if the & operator
were removed and the [] operator were changed to a + operator.
N1256 6.5.6 Additive operators
Moreover, if the expression P points to the last
element of an array object, the expression (P)+1 points one past the last element of the
array object, and if the expression Q points one past the last element of an array object,
the expression (Q)-1 points to the last element of the array object
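Put together, the three quotes mean &v[N] is evaluated as v + N with no dereference. A short sketch of the resulting loop idiom, using a made-up function and a 4-element array instead of 100:

```c
/* Hypothetical helper: sums an array by walking up to the one-past-the-end
   pointer, which is formed with &v[4] and never dereferenced. */
int sum_with_end_pointer(void)
{
    int v[4] = {1, 2, 3, 4};
    int *end = &v[4];           /* treated as v + 4 per 6.5.3.2 */
    int sum = 0;
    for (int *p = v; p != end; ++p)
        sum += *p;              /* p always points at a real element here */
    return sum;
}
```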

It's about compatibility with sloppily written code.
As MikeCAT cited, for an array int ar[N], the expression ar+N is valid and results in a pointer that points to the past-the-end position. While this pointer cannot be dereferenced, it can be compared to any other pointer into the array, which allows you to write the nice for (p = ar; p != ar+N; ++p) loop.
Also, programmers like to write readable code, and arguably, if you want a pointer to the ith element of an array, writing &ar[i] conveys your intention more clearly than writing ar + i.
Combine these two, and you will get programmers who write &ar[N] to get the past-the-end pointer, and while this is technically accessing an invalid array index, no compiler will ever implement this as anything else than ar + N - in fact, the compiler would have to go out of its way to do it differently. Quite far in fact.
So, since any compiler that doesn't reason very strictly about undefined behavior will do what programmers expect for the expression, there's no reason not to write it, and so lots of people wrote it. And now we have massive code bases that use this idiom, which means that even modern compilers with their value tracking and reasoning about undefined behavior have to support this idiom for compatibility. And since Clang's warnings are meant to be useful rather than pedantic, this particular warning was written so as not to fire on a case that will work anyway.

Undefined behavior: when attempting to access the result of function call

The following compiles and prints "string" as an output.
#include <stdio.h>
struct S { int x; char c[7]; };
struct S bar() {
struct S s = {42, "string"};
return s;
}
int main()
{
printf("%s", bar().c);
}
Apparently this invokes undefined behavior according to
C99 6.5.2.2/5 If an attempt is made to modify the result of a function
call or to access it after the next sequence point, the behavior is
undefined.
I don't understand what it says about the "next sequence point". What's going on here?

You've run into a subtle corner of the language.
An expression of array type is, in most contexts, implicitly converted to a pointer to the first element of the array object. The exceptions, none of which apply here, are:
When the array expression is the operand of a unary & operator (which yields the address of the entire array);
When it's the operand of a unary sizeof or (as of C11) _Alignof operator (sizeof arr yields the size of the array, not the size of a pointer); and
When it's a string literal in an initializer used to initialize an array object (char str[6] = "hello"; doesn't convert "hello" to a char*.)
(The N1570 draft incorrectly adds _Alignof to the list of exceptions. In fact, for reasons that are not clear, _Alignof can only be applied to a type name, not to an expression.)
Note that there's an implicit assumption: that the array expression refers to an array object in the first place. In most cases, it does (the simplest case is when the array expression is the name of a declared array object) -- but in this one case, there is no array object.
If a function returns a struct, the struct result is returned by value. In this case, the struct contains an array, giving us an array value with no corresponding array object, at least logically. So the array expression bar().c decays to a pointer to the first element of ... er, um, ... an array object that doesn't exist.
The 2011 ISO C standard addresses this by introducing "temporary lifetime", which applies only to "A non-lvalue expression with structure or union type, where the structure or union
contains a member with array type" (N1570 6.2.4p8). Such an object may not be modified, and its lifetime ends at the end of the containing full expression or full declarator.
So as of C2011, your program's behavior is well defined. The printf call gets a pointer to the first element of an array that's part of a struct object with temporary lifetime; that object continues to exist until the printf call finishes.
But as of C99, the behavior is undefined -- not necessarily because of the clause you quote (as far as I can tell, there is no intervening sequence point), but because C99 doesn't define the array object that would be necessary for the printf to work.
If your goal is to get this program to work, rather than to understand why it might fail, you can store the result of the function call in an explicit object:
const struct S result = bar();
printf("%s", result.c);
Now you have a struct object with automatic, rather than temporary, storage duration, so it exists during and after the execution of the printf call.
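As a self-contained version of that workaround, here is a sketch that repeats the struct and bar() from the question and wraps the fix in a function. member_length is a hypothetical name, and strlen stands in for the printf call so the result is easy to check.

```c
#include <string.h>

struct S { int x; char c[7]; };

static struct S bar(void)
{
    struct S s = {42, "string"};
    return s;
}

/* Hypothetical helper: copies the returned struct into a named object
   before touching its array member. */
size_t member_length(void)
{
    const struct S result = bar();   /* automatic object, lives until return */
    return strlen(result.c);         /* result.c decays to a pointer into it */
}
```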

The sequence point occurs at the end of the full expression, i.e., when printf returns in this example. There are other cases where sequence points occur.
Effectively, this rule states that function temporaries do not live beyond the next sequence point, which in this case occurs well after its use, so your program has quite well-defined behaviour.
Here's a simple example of not well-defined behaviour:
char* c = bar().c; *c = 5; // UB
Here, the sequence point is met after c is created, and the memory it points to is destroyed, but we then attempt to access c, resulting in UB.

In C99 there is a sequence point at the call to a function, after the arguments have been evaluated (C99 6.5.2.2/10).
So, when bar().c is evaluated, it results in a pointer to the first element in the char c[7] array in the struct returned by bar(). However, that pointer gets copied into an argument (a nameless argument as it happens) to printf(), and by the time the call is actually made to the printf() function the sequence point mentioned above has occurred, so the member that the pointer was pointing to may no longer be alive.
As Keith Thompson mentions, C11 (and C++) make stronger guarantees about the lifetime of temporaries, so the behavior under those standards would not be undefined.