Let's consider the following code:
struct M {
unsigned char a;
unsigned char b;
};
void pass_by_value(struct M);
int main() {
struct M m;
m.a = 0;
pass_by_value(m);
return 0;
}
In the function pass_by_value m.b is initialized before used.
However, since m is passed by value the compiler copies it to the stack already.
No variable has storage class register here. a and b are of type unsigned char.
Does that have to be considered UB in C90? (Please note: I am specifically asking for C90)
This question is very similar to Returning a local partially initialized struct from a function and undefined behavior, but actually the other way around.
The C 1990 standard (and the C 199 standard) does not contain the sentence that first appears in C 2011 that makes the behavior of using some uninitialized values undefined.
C 2011 6.3.2.1 2 includes:
… If the lvalue has an incomplete type and does not have array type, the behavior is undefined. If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.
The whole of the corresponding paragraph in C 1990, clause 6.2.2.1, second paragraph, is:
Except when it is the operand of the sizeof operator, the unary & operator, the ++ operator, the -- operator, or the left operand of the . operator or an assignment operator, an lvalue that does not have array type is converted to the value stored in the designated object (and is no longer an lvalue). If the lvalue has qualified type, the value has the unqualified version of the type of the lvalue; otherwise, the value has the type of the lvalue. If the lvalue has an incomplete type and does not have array type, the behavior is undefined.
Therefore, the behavior of the code in the question would seem to be defined, inasmuch that it passes the value stored in the structure.
In the absence of explicit statements in the standard, common practice helps guide interpretation. It is perfectly normal not to initialize all members of a structure yet to expect the structure to represent useful data, and therefore the behavior of using the structure as a value must be defined if at least one of its members is initialized. The equivalent question for C 2011 contains mention (from a C defect report) of the standard struct tm in one of its answers. The struct tm may be used to represent a specific date by filling in all of date fields (year, month, day of month) and possibly the time fields (hour, minute, second, even Daylight Savings Time indication) but leaving the day of week and day of year fields uninitialized.
In defining undefined behavior in 3.16, the 1990 standard does say it is “Behavior, upon use … of indeterminately valued objects, for which this International Standard imposes no requirements.” And 6.5.7 says “… If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate…” However, a structure with automatic storage duration in which one member, but not another, has been initialized is neither fully initialized nor not initialized. Given the intended uses of structures, I would say we should not consider use of the value of a partially initialized structure to be subject to being made undefined by 3.16.
Under C90, if an object held Indeterminate Value, each and every bit could independently be zero or one, regardless of whether or not they would in combination represent a valid bit pattern for the object's type. If an implementation specified the behavior of attempting to read each and every one of the 2ⁿ individual possible bit patterns an object could hold, the behavior of reading an object with Indeterminate Value would be equivalent to reading the value of an arbitrarily chosen bit pattern. If there were any bit patterns for which an implementation did not specify the effect of an attempted read, then the effects of trying to read an object that might hold such bit patterns would be likewise unspecified.
Code generation efficiency could be improved in some cases by specifying the behavior of uninitialized objects more loosely, in a way which would not otherwise be consistent with sequential program execution as specified but would nonetheless meet program requirements. For example, given something like:
struct foo { short dat[16]; } x,y,z;
void test1(int a, int b, int c, int d)
{
struct foo temp;
temp.dat[a] = 1;
temp.dat[b] = 2;
temp.dat[c] = 3;
temp.dat[d] = 4;
x=temp;
y=temp;
}
void test2(int a, int b, int c, int d)
{
test1(a,b,c,d);
z=x;
}
If client code only cares about the values of x and y that correspond to values of temp that were written, efficiency might be improved, while still meeting requirements, if the code were rewritten as:
void test1(int a, int b, int c, int d)
{
x.dat[a] = 1;
y.dat[a] = 1;
x.dat[b] = 2;
y.dat[b] = 1;
x.dat[c] = 3;
y.dat[c] = 1;
x.dat[d] = 4;
y.dat[d] = 1;
}
The fact that the original function test1 doesn't do anything to initialize temp suggests that it won't care about what is yielded by any individual attempt to read it. On the other hand, nothing within the code for test2 would imply that client code wouldn't care about whether all members of x held the same values as corresponding values of y. Thus, such an inference would more likely be dangerous there.
The C Standard makes no attempt to define behavior in situations where an optimization might yield program behavior which, although useful, would be inconsistent with sequential processing of non-optimized code. Instead, the principle that optimizations must never affect any defined behavior is taken to imply that the Standard must characterize as Undefined all actions whose behavior would be visibly affected by optimization, leaving implementor discretion the question of what aspects of behavior should or should not be defined in what circumstances. Ironically, the only time the Standard's laxity with regard to this behavior would allow more efficient code generation outside contrived scenarios would be in cases where implementations treat the behavior as at least loosely defined, and programmers are able to exploit that. If a programmer had to explicitly initialize all elements of temp to avoid having the compiler behave in completely nonsensical fashion, that would eliminate any possibility of optimizing out the unnecessary writes to unused elements of x and y.
Related
Consider the following function:
int bar(const int* __restrict x, void g())
{
int result = *x;
g();
result += *x;
return result;
}
Do we need to read twice from x because of the call to g()? Or is the __restriction enough to guarantee the invocation of g() does not access/does not alter the value at address x?
At this link we see the most popular compilers have to say about this (GodBolt; language standard C99, platform AMD64):
clang 7.0: Restriction respected.
GCC 8.3: No restriction.
MSVC 19.16: No restriction.
Is clang rightly optimizing the second read away, or isn't it? I'm asking both for C and C++ here, as the behavior is the same (thanks #PSkocik).
Related information and some notes:
Readers unfamiliar with __restrict (or __restrict__) may want to have a look at: What does the restrict keyword mean in C++?
GCC's documentation page on restricted pointers in C++.
I've opened a bug against GCC on this point.
The fact that x is marked const is not significant here - we get the same behavior if we drop the const and the question stands as is.
I think this is effectively a C question, since C is effectively the language that has restrict, with a formal spec attached to it.
The part of the C standard that governs the use of restrict is 6.7.3.1:
1 Let D be a declaration of an ordinary identifier that provides a
means of designating an object P as a restrict-qualified pointer to
type T.
2 If D appears inside a block and does not have storage class extern,
let B denote the block. If D appears in the list of parameter
declarations of a function definition, let B denote the associated
block. Otherwise, let B denote the block of main (or the block of
whatever function is called at program startup in a freestanding
environment).
3 In what follows, a pointer expression E is said to be based on
object P if (at some sequence point in the execution of B prior to the
evaluation of E) modifying P to point to a copy of the array object
into which it formerly pointed would change the value of E.137) Note
that ''based'' is defined only for expressions with pointer types.
4 During each execution of B, let L be any lvalue that has &L based on
P. If L is used to access the value of the object X that it
designates, and X is also modified (by any means), then the following
requirements apply: T shall not be const-qualified. Every other lvalue
used to access the value of X shall also have its address based on P.
Every access that modifies X shall be considered also to modify P, for
the purposes of this subclause. If P is assigned the value of a
pointer expression E that is based on another restricted pointer
object P2, associated with block B2, then either the execution of B2
shall begin before the execution of B, or the execution of B2 shall
end prior to the assignment. If these requirements are not met, then
the behavior is undefined.
5 Here an execution of B means that portion of the execution of the
program that would correspond to the lifetime of an object with scalar
type and automatic storage duration associated with B.
The way I read it, the execution of g() falls under the execution of the bar's block, so g() is disallowed from modifying *x and clang is right to optimize out the second load (IOW, if *x refers to a non-const global, g() must not modify that global).
In these two examples, does accessing members of the struct by offsetting pointers from other members result in Undefined / Unspecified / Implementation Defined Behavior?
struct {
int a;
int b;
} foo1 = {0, 0};
(&foo1.a)[1] = 1;
printf("%d", foo1.b);
struct {
int arr[1];
int b;
} foo2 = {{0}, 0};
foo2.arr[1] = 1;
printf("%d", foo2.b);
Paragraph 14 of C11 § 6.7.2.1 seems to indicate that this should be implementation-defined:
Each non-bit-field member of a structure or union object is aligned in an implementation-defined manner appropriate to its type.
and later goes on to say:
There may be unnamed padding within a structure object, but not at its beginning.
However, code like the following appears to be fairly common:
union {
int arr[2];
struct {
int a;
int b;
};
} foo3 = {{0, 0}};
foo3.arr[1] = 1;
printf("%d", foo3.b);
(&foo3.a)[1] = 2; // appears to be illegal despite foo3.arr == &foo3.a
printf("%d", foo3.b);
The standard appears to guarantee that foo3.arr is the same as &foo3.a, and it doesn't make sense that referring to it one way is legal and the other not, but equally it doesn't make sense that adding the outer union with the array should suddenly make (&foo3.a)[1] legal.
My reasoning for thinking the first examples must also therefore be legal:
foo3.arr is guaranteed to be the same as &foo.a
foo3.arr + 1 and &foo3.b point to the same memory location
&foo3.a + 1 and &foo3.b must therefore point to the same memory location (from 1 and 2)
struct layouts are required to be consistent, so &foo1.a and &foo1.b should be laid out exactly the same as &foo3.a and &foo3.b
&foo1.a + 1 and &foo1.b must therefore point to the same memory location (from 3 and 4)
I've come across some outside sources that suggest that both the foo3.arr[1] and (&foo3.a)[1] examples are illegal, however I haven't been able to find a concrete statement in the standard that would make it so.
Even if they were both illegal though, it's also possible to construct the same scenario with flexible array pointers which, as far as I can tell, does have standard-defined behavior.
union {
struct {
int x;
int arr[];
};
struct {
int y;
int a;
int b;
};
} foo4;
The original application is considering whether or not a buffer overflow from one struct field into another is strictly speaking defined by the standard:
struct {
char buffer[8];
char overflow[8];
} buf;
strcpy(buf.buffer, "Hello world!");
println(buf.overflow);
I would expect this to output "rld!" on nearly any real-world compiler, but is this behavior guaranteed by the standard, or is it an undefined or implementation-defined behavior?
Introduction: The standard is inadequate in this area, and there is decades of history of argument on this topic and strict aliasing with no convincing resolution or proposal to fix.
This answer reflects my view rather than any imposition of the Standard.
Firstly: it's generally agreed that the code in your first code sample is undefined behaviour due to accessing outside the bounds of an array via direct pointer arithmetic.
The rule is C11 6.5.6/8 . It says that indexing from a pointer must remain within "the array object" (or one past the end). It doesn't say which array object but it is generally agreed that in the case int *p = &foo.a; then "the array object" is foo.a, and not any larger object of which foo.a is a subobject.
Relevant links:
one, two.
Secondly: it's generally agreed that both of your union examples are correct. The standard explicitly says that any member of a union may be read; and whatever the contents of the relevant memory location are are interpreted as the type of the union member being read.
You suggest that the union being correct implies that the first code should be correct too, but it does not. The issue is not with specifying the memory location read; the issue is with how we arrived at the expression specifying that memory location.
Even though we know that &foo.a + 1 and &foo.b are the same memory address, it's valid to access an int through the second and not valid to access an int through the first.
It's generally agreed that you can access the int by computing its address in other ways that don't break the 6.5.6/8 rule, e.g.:
((int *)((char *)&foo + offsetof(foo, b))[0]
or
((int *)((uintptr_t)&foo.a + sizeof(int)))[0]
Relevant links: one, two
It's not generally agreed on whether ((int *)&foo)[1] is valid. Some say it's basically the same as your first code, since the standard says "a pointer to an object, suitably converted, points to the element's first object". Others say it's basically the same as my (char *) example above because it follows from the specification of pointer casting. A few even claim it's a strict aliasing violation because it aliases a struct as an array.
Maybe relevant is N2090 - Pointer provenance proposal. This does not directly address the issue, and doesn't propose a repeal of 6.5.6/8.
According to C11 draft N1570 6.5p7, an attempt to access the stored value of a struct or union object using anything other than an lvalue of character type, the struct or union type, or a containing struct or union type, invokes UB even if behavior would otherwise be fully described by other parts of the Standard. This section contains no provision that would allow an lvalue of a non-character member type (or any non-character numeric type, for that matter) to be used to access the stored value of a struct or union.
According to the published Rationale document, however, the authors of the Standard recognized that different implementations offered different behavioral guarantees in cases where the Standard imposed no requirements, and regarded such "popular extensions" as a good and useful thing. They judged that questions of when and how such extensions should be supported would be better answered by the marketplace than by the Committee. While it may seem weird that the Standard would allow an obtuse compiler to ignore the possibility that someStruct.array[i] might affect the stored value of someStruct, the authors of the Standard recognized that any compiler whose authors aren't deliberately obtuse will support such a construct whether the Standard mandates or not, and that any attempt to mandate any kind of useful behavior from obtusely-designed compilers would be futile.
Thus, a compiler's level of support for essentially anything having to do with structures or unions is a quality-of-implementation issue. Compiler writers who are focused on being compatible with a wide range of programs will support a wide range of constructs. Those which are focused on maximizing the performance of code that needs only those constructs without which the language would be totally useless, will support a much narrower set. The Standard, however, is devoid of guidance on such issues.
PS--Compilers that are configured to be compatible with MSVC-style volatile semantics will interpret that qualifier as a indicating that an access to the pointer may have side-effects that interact with objects whose address has been taken and that aren't guarded by restrict, whether or not there is any other reason to expect such a possibility. Use of such a qualifier when accessing storage in "unusual" ways may make it more obvious to human readers that the code is doing something "weird" at the same time as it will thus ensure compatibility with any compiler that uses such semantics, even if such compiler would not otherwise recognize that access pattern. Unfortunately, some compiler writers refuse to support such semantics at anything other than optimization level 0 except with programs that demand it using non-standard syntax.
How can *i and u.i print different numbers in this code, even though i is defined as int *i = &u.i;? I can only assuming that I'm triggering UB here, but I can't see how exactly.
(ideone demo replicates if I select 'C' as the language. But as #2501 pointed out, not if 'C99 strict' is the language. But then again, I get the problem with gcc-5.3.0 -std=c99!)
// gcc -fstrict-aliasing -std=c99 -O2
union
{
int i;
short s;
} u;
int * i = &u.i;
short * s = &u.s;
int main()
{
*i = 2;
*s = 100;
printf(" *i = %d\n", *i); // prints 2
printf("u.i = %d\n", u.i); // prints 100
return 0;
}
(gcc 5.3.0, with -fstrict-aliasing -std=c99 -O2, also with -std=c11)
My theory is that 100 is the 'correct' answer, because the write to the union member through the short-lvalue *s is defined as such (for this platform/endianness/whatever). But I think that the optimizer doesn't realize that the write to *s can alias u.i, and therefore it thinks that *i=2; is the only line that can affect *i. Is this a reasonable theory?
If *s can alias u.i, and u.i can alias *i, then surely the compiler should think that *s can alias *i? Shouldn't aliasing be 'transitive'?
Finally, I always had this assumption that strict-aliasing problems were caused by bad casting. But there is no casting in this!
(My background is C++, I'm hoping I'm asking a reasonable question about C here. My (limited) understanding is that, in C99, it is acceptable to write through one union member and then reading through another member of a different type.)
The disrepancy is issued by -fstrict-aliasing optimization option. Its behavior and possible traps are described in GCC documentation:
Pay special attention to code like this:
union a_union {
int i;
double d;
};
int f() {
union a_union t;
t.d = 3.0;
return t.i;
}
The practice of reading from a different union member than the one
most recently written to (called “type-punning”) is common. Even with
-fstrict-aliasing, type-punning is allowed, provided the memory is accessed through the union type. So, the code above works as expected.
See Structures unions enumerations and bit-fields implementation. However, this code might
not:
int f() {
union a_union t;
int* ip;
t.d = 3.0;
ip = &t.i;
return *ip;
}
Note that conforming implementation is perfectly allowed to take advantage of this optimization, as second code example exhibits undefined behaviour. See Olaf's and others' answers for reference.
C standard (i.e. C11, n1570), 6.5p7:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
...
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
a character type.
The lvalue expressions of your pointers are not union types, thus this exception does not apply. The compiler is correct exploiting this undefined behaviour.
Make the pointers' types pointers to the union type and dereference with the respective member. That should work:
union {
...
} u, *i, *p;
Strict aliasing is underspecified in the C Standard, but the usual interpretation is that union aliasing (which supersedes strict aliasing) is only permitted when the union members are directly accessed by name.
For rationale behind this consider:
void f(int *a, short *b) {
The intent of the rule is that the compiler can assume a and b don't alias, and generate efficient code in f. But if the compiler had to allow for the fact that a and b might be overlapping union members, it actually couldn't make those assumptions.
Whether or not the two pointers are function parameters or not is immaterial, the strict aliasing rule doesn't differentiate based on that.
This code indeed invokes UB, because you do not respect the strict aliasing rule. n1256 draft of C99 states in 6.5 Expressions §7:
An object shall have its stored value accessed only by an lvalue expression that has one of
the following types:
— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the
object,
— a type that is the signed or unsigned type corresponding to a qualified version of the
effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its
members (including, recursively, a member of a subaggregate or contained union), or
— a character type.
Between the *i = 2; and the printf(" *i = %d\n", *i); only a short object is modified. With the help of the strict aliasing rule, the compiler is free to assume that the int object pointed by i has not been changed, and it can directly use a cached value without reloading it from main memory.
It is blatantly not what a normal human being would expect, but the strict aliasing rule was precisely written to allow optimizing compilers to use cached values.
For the second print, unions are referenced in same standard in 6.2.6.1 Representations of types / General §7:
When a value is stored in a member of an object of union type, the bytes of the object
representation that do not correspond to that member but do correspond to other members
take unspecified values.
So as u.s has been stored, u.i have taken a value unspecified by standard
But we can read later in 6.5.2.3 Structure and union members §3 note 82:
If the member used to access the contents of a union object is not the same as the member last used to
store a value in the object, the appropriate part of the object representation of the value is reinterpreted
as an object representation in the new type as described in 6.2.6 (a process sometimes called "type
punning"). This might be a trap representation.
Although notes are not normative, they do allow better understanding of the standard. When u.s have been stored through the *s pointer, the bytes corresponding to a short have been changed to the 2 value. Assuming a little endian system, as 100 is smaller that the value of a short, the representation as an int should now be 2 as high order bytes were 0.
TL/DR: even if not normative, the note 82 should require that on a little endian system of the x86 or x64 families, printf("u.i = %d\n", u.i); prints 2. But per the strict aliasing rule, the compiler is still allowed to assumed that the value pointed by i has not changed and may print 100
You are probing a somewhat controversial area of the C standard.
This is the strict aliasing rule:
An object shall have its stored value accessed only by an lvalue
expression that has one of the following types:
a type compatible with the effective type of the object,
a qualified version of a type compatible with the effective type of the object,
a type that is the signed or unsigned type corresponding to
the effective type of the object,
a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union),
a character type.
(C2011, 6.5/7)
The lvalue expression *i has type int. The lvalue expression *s has type short. These types are not compatible with each other, nor both compatible with any other particular type, nor does the strict aliasing rule afford any other alternative that allows both accesses to conform if the pointers are aliased.
If at least one of the accesses is non-conforming then the behavior is undefined, so the result you report -- or indeed any other result at all -- is entirely acceptable. In practice, the compiler must produce code that reorders the assignments with the printf() calls, or that uses a previously loaded value of *i from a register instead of re-reading it from memory, or some similar thing.
The aforementioned controversy arises because people will sometimes point to footnote 95:
If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called ‘‘type punning’’). This might be a trap representation.
Footnotes are informational, however, not normative, so there's really no question which text wins if they conflict. Personally, I take the footnote simply as an implementation guidance, clarifying the meaning of the fact that the storage for union members overlaps.
Looks like this is a result of the optimizer doing its magic.
With -O0, both lines print 100 as expected (assuming little-endian). With -O2, there is some reordering going on.
gdb gives the following output:
(gdb) start
Temporary breakpoint 1 at 0x4004a0: file /tmp/x1.c, line 14.
Starting program: /tmp/x1
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x2aaaaaaab000
Temporary breakpoint 1, main () at /tmp/x1.c:14
14 {
(gdb) step
15 *i = 2;
(gdb)
18 printf(" *i = %d\n", *i); // prints 2
(gdb)
15 *i = 2;
(gdb)
16 *s = 100;
(gdb)
18 printf(" *i = %d\n", *i); // prints 2
(gdb)
*i = 2
19 printf("u.i = %d\n", u.i); // prints 100
(gdb)
u.i = 100
22 }
(gdb)
0x0000003fa441d9f4 in __libc_start_main () from /lib64/libc.so.6
(gdb)
The reason this happens, as others have stated, is because it is undefined behavior to access a variable of one type through a pointer to another type even if the variable in question is part of a union. So the optimizer is free to do as it wishes in this case.
The variable of the other type can only be read directly via a union which guarantees well defined behavior.
What's curious is that even with -Wstrict-aliasing=2, gcc (as of 4.8.4) doesn't complain about this code.
Whether by accident or by design, C89 includes language which has been interpreted in two different ways (along with various interpretations in-between). At issue is the question of when a compiler should be required to recognize that storage used for one type might be accessed via pointers of another. In the example given in the C89 rationale, aliasing is considered between a global variable which is clearly not part of any union and a pointer to a different type, and nothing in the code would suggest that aliasing could occur.
One interpretation horribly cripples the language, while the other would restrict the use of certain optimizations to "non-conforming" modes. If those who didn't to have their preferred optimizations given second-class status had written C89 to unambiguously match their interpretation, those parts of the Standard would have been widely denounced and there would have been some sort of clear recognition of a non-broken dialect of C which would honor the non-crippling interpretation of the given rules.
Unfortunately, what has happened instead is since the rules clearly don't require compiler writers apply a crippling interpretation, most compiler writers have for years simply interpreted the rules in a fashion which retains the semantics that made C useful for systems programming; programmers didn't have any reason to complain that the Standard didn't mandate that compilers behave sensibly because from their perspective it seemed obvious to everyone that they should do so despite the sloppiness of the Standard. Meanwhile, however, some people insist that since the Standard has always allowed compilers to process a semantically-weakened subset of Ritchie's systems-programming language, there's no reason why a standard-conforming compiler should be expected to process anything else.
The sensible resolution for this issue would be to recognize that C is used for sufficiently varied purposes that there should be multiple compilation modes--one required mode would treat all accesses of everything whose address was taken as though they read and write the underlying storage directly, and would be compatible with code which expects any level of pointer-based type punning support. Another mode could be more restrictive than C11 except when code explicitly uses directives to indicate when and where storage that has been used as one type would need to be reinterpreted or recycled for use as another. Other modes would allow some optimizations but support some code that would break under stricter dialects; compilers without specific support for a particular dialect could substitute one with more defined aliasing behaviors.
In trying to answer a recent question (Passing restrict qualified pointers to functions?), I could not find how the C11 standard is consistent with practice.
I'm not trying to call out the standard or anything, most things that look inconsistent I just am not understanding right, but my question is best posed as an argument against the definition used in the standard, so here it is.
It seems to be commonly accepted that a function can take a restrict qualified pointer and both work on it and have its own function calls work on it. For example,
// set C to componentwise sum and D to componentwise difference
void dif_sum(float* restrict C, float* restrict D, size_t n)
{
size_t i = 0;
while(i<n) C[i] = C[i] - D[i],
D[i] += C[i] + D[i],
++i;
}
// set A to componentwise sum of squares of A and B
// set B to componentwise product of A and B
void prod_squdif(float* restrict A, float* restrict B, size_t n)
{
size_t i = 0;
float x;
dif_sum(A,B,n);
while(i<n) x = ( (A[i]*=A[i]) - B[i]*B[i] )/2,
A[i] -= x,
B[i++] = x/2;
}
What seems to be the common understanding is that restrict pointers need to reference independent space within their declaring block. So, prod_sqdif is valid because nothing lexically within its defining block accesses the arrays identified by A or B other than those pointers.
To demonstrate my concern with the standard, here is the standard formal definition of restrict (according to the committee draft, if you have the final version and it is different, let me know!):
6.7.3.1 Formal definition of restrict
1 Let D be a declaration of an ordinary identifier that provides a means of designating an object P as a restrict-qualified pointer to type T.
2 If D appears inside a block and does not have storage class extern, let B denote the block. If D appears in the list of parameter declarations of a function definition, let B denote the associated block. Otherwise, let B denote the block of main (or the block of whatever function is called at program startup in a freestanding environment).
3 In what follows, a pointer expression E is said to be based on object P if (at some sequence point in the execution of B prior to the evaluation of E) modifying P to point to a copy of the array object into which it formerly pointed would change the value of E. Note that ‘‘based’’ is defined only for expressions with pointer types.
4 During each execution of B, let L be any lvalue that has &L based on P. If L is used to access the value of the object X that it designates, and X is also modified (by any means), then the following requirements apply: T shall not be const-qualified. Every other lvalue used to access the value of X shall also have its address based on P. Every access that modifies X shall be considered also to modify P, for the purposes of this subclause. If P is assigned the value of a pointer expression E that is based on another restricted pointer object P2, associated with block B2, then either the execution of B2 shall begin before the execution of B, or the execution of B2 shall end prior to the assignment. If these
requirements are not met, then the behavior is undefined.
5 Here an execution of B means that portion of the execution of the program that would correspond to the lifetime of an object with scalar type and automatic storage duration associated with B.
6 A translator is free to ignore any or all aliasing implications of uses of restrict.
[Examples not included because they are not formally significant.]
Identifying execution of B with expressions lexically contained therein might be seen as supported by the following excerpt from 6.2.4, item 6:
"...Entering an enclosed block or calling a function suspends, but does not end,
execution of the current block..."
However, part 5 of the formal definition of restrict explicitly defines the block B to correspond to the lifetime of an object with automatic storage declared in B (in my example, B is the body of prod_squdif). This clearly overrides any definition of the execution of a block found elsewhere in the standard. The following excerpt from the standard defines lifetime of an object.
6.2.4 Storage durations of objects, item 2
The lifetime of an object is the portion of program execution during which storage is guaranteed to be reserved for it. An object exists, has a constant address, and retains its last-stored value throughout its lifetime. 34) If an object is referred to outside of its lifetime, the behavior is undefined. The value of a pointer becomes indeterminate when the object it points to (or just past) reaches the end of its lifetime.
Then the execution of dif_sum is clearly included in the execution of B. I don't think there is any question there. But then the lvalues in dif_sum that read and modify elements of A and B (via C and D) are clearly not based on A and B (they follow sequence points where A and B could have been repointed to copies of their content without changing the locations identified by the lvalues). This is undefined behavior. Note that what lvalues or sequence points are discussed in item 4 is not restricted; as it is stated, there is no reason to restrict lvalues and sequence points to those lexically corresponding to the block B, and so lvalues and sequence points within a function call play just like they do in the body of the calling function.
On the other hand, the generally accepted use of restrict seems implied by the fact that the formal definition explicitly allows C and D to be assigned the values of A and B. This suggests that some meaningful access to A and B through C and D is allowed. However, as argued above, such access is undefined for any element modified through either the outer or inner function call, and at least read by the inner call. This seems contrary to the apparent intent of allowing the assignment in the first place.
Of course, intent has no formal place in the standard, but it does seem suggestive that the common interpretation of restrict, rather than what seems actually defined, is what is intended.
In summary, interpreting the execution of B as the execution of each statement during the lifetime of B's automatic storage, then function calls can't work with the contents of restrict pointers passed to them.
It seems unavoidable to me that there should be some exception stating reads and writes within functions or sub blocks are not considered, but that at most one assignment within such a sub block (and other sub blocks, recursively) may be based on any particular restrict pointer in the outer block.
I have really gone over the standard, both today and yesterday. I really can't see how the formal definition of restrict could possibly be consistent with the way it seems to be understood and implemented.
EDIT: As has been pointed out, violating the restrict contract results in undefined behavior. My question is not about what happens when the contract is violated. My question can be restated as follows:
How can the formal definition of restrict be consistent with access to array elements through function calls? Does such access, within a calling function, not constitute access not based on the restrict pointer passed to the function?
I am looking for an answer based in the standard, as I agree that restrict pointers should be able to be passed through function calls. It just seems that this is not the consequence of the formal definition in the standard.
EDIT
I think the main problem with communicating my question is related to the definition of "based on". I will try to present my question a little differently here.
The following is an informal tracking of a particular call to prod_squdif. This is not intended as C code, it is just an informal description of the execution of the function's block.
Note that this execution includes the execution of the called function, per item 5 of the formal definition of restrict: "Here an execution of B means that portion of the execution of the program that would correspond to the lifetime of an object with scalar type and automatic storage duration associated with B."
// 1. prod_squdif is called
prod_squdif( (float[1]){2}, (float[1]){1}, 1 )
// 2. dif_sum is called
dif_sum(A,B,n) // assigns C=A and D=B
// 3. while condition is evaluated
0<1 // true
// 4. 1st assignment expression
C[0] = C[0] - D[0] // C[0] == 0
// 5. 2nd assignment expression
D[0] += C[0] + D[0] // D[0] == 1
// 6. increment
++i // i == 1
// 7. test
1<1 // false
// return to calling function
// 8. test
0<1 // true
// 9. 1st assignment expression
x = ( (A[0]*=A[0]) - B[1]*B[1] )/2 // x == -.5
// 10. 2nd assignment expression
A[0] -= -.5 // A[0] == .5
// 11. 3rd assignment expression
B[i++/*evaluates to 0*/] = -.5/2 // B[0] == -.25
// 12. test
1<1 // false
// prod_squdif returns
So, the test for the restrict contract is given by item 4 in the formal definition of restrict: "During each execution of B, let L be any lvalue that has &L based on P. If L is used to access the value of the object X that it designates, and X is also modified (by any means), then the following requirements apply: ... Every other lvalue used to access the value of X shall also have its address based on P..."
Let L be the lvalue on the left of the portion of the execution marked '4' above (C[0]). Is &L based on A? I.e., is C based on A?
See item 3 of the formal definition of restrict: "...a pointer expression E is said to be based on object P if (at some sequence point in the execution of B prior to the evaluation of E) modifying P to point to a copy of the array object into which it formerly pointed would change the value of E...".
Take as a sequence point the end of item 3 above. (At this sequence point) modifying A to point to a coppy of the array object into which it formerly pointed would NOT change the value of C.
Thus C is not based on A. So A[0] is modified by an lvalue not based on A. Since it is also read by an lvalue that is based on A (item 10), this is undefined behavior.
My question is: Is it correct to therefore conclude that my example invokes undefined behavior and thus the formal definition of restrict is not consistent with common implementation?
Suppose we have a function with nested blocks like this:
void foo()
{
T *restrict A = getTptr();
{
T *restrict B = A;
{
#if hypothetical
A = copyT(A);
#endif
useTptr(B + 1);
}
}
}
It would seem that, at the point where useTptr(B + 1) is called, the hypothetical change to A would no longer affect the value of B + 1. However, a different sequence point can be found, such that a change to A does affect the value of B + 1:
void foo()
{
T *restrict A = getTptr();
#if hypothetical
A = copyT(A);
#endif
{
T *restrict B = A;
{
useTptr(B + 1);
}
}
}
and C11 draft standard n1570 6.7.3.1 Formal definition of restrict only demands that there be some such sequence point, not that all sequence points exhibit this behavior.
I'm really not sure exactly what your question is.
It sounds like you're asking:
Q: Gee, will "restrict" still apply if I violate the "restrict"
contract? Like in the "remove_zeroes()" example?
The answer, of course, is "No - it won't".
Here are two links that might clarify the discussion. Please update your post with (a) more explicit question(s):
Realistic usage of the C99 'restrict' keyword?
Is it legal to assign a restricted pointer to another pointer, and use the second pointer to modify the value?
https://en.wikipedia.org/wiki/Restrict
A commonly-used macro in the linux kernel (and other places) is container_of, which is (basically) defined as follows:
#define container_of(ptr, type, member) (((type) *)((char *)(ptr) - offsetof((type), (member))))
Which basically allows recovery of a "parent" structure given a pointer to one of its members:
struct foo {
char ch;
int bar;
};
...
struct foo f = ...
int *ptr = &f.bar; // 'ptr' points to the 'bar' member of 'struct foo' inside 'f'
struct foo *g = container_of(ptr, struct foo, bar);
// now, 'g' should point to 'f', i.e. 'g == &f'
However, it's not entirely clear whether the subtraction contained within container_of is considered undefined behavior.
On one hand, because bar inside struct foo is only a single integer, then only *ptr should be valid (as well as ptr + 1). Thus, the container_of effectively produces an expression like ptr - sizeof(int), which is undefined behavior (even without dereferencing).
On the other hand, §6.3.2.3 p.7 of the C standard states that converting a pointer to a different type and back again shall produce the same pointer. Therefore, "moving" a pointer to the middle of a struct foo object, then back to the beginning should produce the original pointer.
The main concern is the fact that implementations are allowed to check for out-of-bounds indexing at runtime. My interpretation of this and the aforementioned pointer equivalence requirement is that the bounds must be preserved across pointer casts (this includes pointer decay - otherwise, how could you use a pointer to iterate across an array?). Ergo, while ptr may only be an int pointer, and neither ptr - 1 nor *(ptr + 1) are valid, ptr should still have some notion of being in the middle of a structure, so that (char *)ptr - offsetof(struct foo, bar) is valid (even if the pointer is equal to ptr - 1 in practice).
Finally, I came across the fact that if you have something like:
int arr[5][5] = ...
int *p = &arr[0][0] + 5;
int *q = &arr[1][0];
while it's undefined behavior to dereference p, the pointer by itself is valid, and required to compare equal to q (see this question). This means that p and q compare the same, but can be different in some implementation-defined manner (such that only q can be dereferenced). This could mean that given the following:
// assume same 'struct foo' and 'f' declarations
char *p = (char *)&f.bar;
char *q = (char *)&f + offsetof(struct foo, bar);
p and q compare the same, but could have different boundaries associated with them, as the casts to (char *) come from pointers to incompatible types.
To sum it all up, the C standard isn't entirely clear about this type of behavior, and attempting to apply other parts of the standard (or, at least my interpretations of them) leads to conflicts. So, is it possible to define container_of in a strictly-conforming manner? If so, is the above definition correct?
This was discussed here after comments on my answer to this question.
TLDR
It is a matter of debate among language lawyers as to whether programs using container_of are strictly conforming, but pragmatists using the container_of idiom are in good company and are unlikely to run into issues running programs compiled with mainstream tool chains on mainstream hardware. In other words:
strictly conforming: debated
conforming: yes, for all practical purposes, in most situations
What can be said today
There is no language in the standard C17 standard that unambiguously requires support for the container_of idiom.
There are defect reports that suggest the standard intends to allow implementations room to forbid the container_of idiom by tracking "provenance" (i.e. the valid bounds) of objects along with pointers. However, these alone are not normative.
There is recent activity in the C memory object model study group that aims to provide more rigor to this and similar questions. See Clarifying the C memory object model - N2012 from 2016, Pointers are more abstract than you might expect from 2018, and A Provenance-aware Memory Object Model for C - N2676 from 2021.
Depending on when you read this, there may be newer documents available at the WG14 document log. Additionally, Peter Sewell collects related reference material here: https://www.cl.cam.ac.uk/~pes20/cerberus/. These documents do not change what a strictly conforming program is today (in 2021, for versions C17 and older), but they suggest that the answer may change in newer versions of the standard.
Background
What is the container_of idiom?
This code demonstrates the idiom by expanding the contents of the macro usually seen implementing the idiom:
#include <stddef.h>
struct foo {
long first;
short second;
};
void container_of_idiom(void) {
struct foo f;
char* b = (char*)&f.second; /* Line A */
b -= offsetof(struct foo, second); /* Line B */
struct foo* c = (struct foo*)b; /* Line C */
}
In the above case, a container_of macro would typically take a short* argument intended to point to the second field of a struct foo. It would also take arguments for struct foo and second, and would expand to an expression returning struct foo*. It would employ the logic seen in lines A-C above.
The question is: is this code strictly conforming?
First, let's define "strictly conforming"
C17 4 (5-7) Conformance
A strictly conforming program shall use only those features of the language and library specified in this International Standard. It shall not produce output dependent on any unspecified, undefined, or implementation-defined behavior, and shall not exceed any minimum implementation limit.
[...] A conforming hosted implementation shall accept any strictly conforming program. [...] A conforming implementation may have extensions (including additional library functions), provided they do not alter the behavior of any strictly conforming program.
A conforming program is one that is acceptable to a conforming implementation.
(For brevity I omitted the definition of "freestanding" implementations, as it concerns limitations on the standard library not relevant here.)
From this we see that strict conformance is quite strict, but a conforming implementation is allowed to define additional behavior as long as it does not alter the behavior of a strictly conforming program. In practice, almost all implementations do this; this is the "practical" definition that most C programs are written against.
For the purposes of this answer I'll contain my answer to strictly conforming programs, and talk about merely conforming programs at the end.
Defect reports
The language standard itself is somewhat unclear on the question, but several defect reports shed more light on the issue.
DR 51
DR 51 ask questions of this program:
#include <stdlib.h>
struct A {
char x[1];
};
int main() {
struct A *p = (struct A *)malloc(sizeof(struct A) + 100);
p->x[5] = '?'; /* This is the key line */
return p->x[5];
}
The response to the DR includes (emphasis mine):
Subclause 6.3.2.1 describes limitations on pointer arithmetic, in connection with array subscripting. (See also subclause 6.3.6.) Basically, it permits an implementation to tailor how it represents pointers to the size of the objects they point at. Thus, the expression p->x[5] may fail to designate the expected byte, even though the malloc call ensures that the byte is present. The idiom, while common, is not strictly conforming.
Here we have the first indication that the standard allows implementations to "tailor" pointer representations based on the objects pointed at, and that pointer arithmetic that "leaves" the valid range of the original object pointed to is not strictly conforming.
DR 72 ask questions of this program:
#include <stddef.h>
#include <stdlib.h>
typedef double T;
struct hacked {
int size;
T data[1];
};
struct hacked *f(void)
{
T *pt;
struct hacked *a;
char *pc;
a = malloc(sizeof(struct hacked) + 20 * sizeof(T));
if (a == NULL) return NULL;
a->size = 20;
/* Method 1 */
a->data[8] = 42; /* Line A /*
/* Method 2 */
pt = a->data;
pt += 8; /* Line B /*
*pt = 42;
/* Method 3 */
pc = (char *)a;
pc += offsetof(struct hacked, data);
pt = (T *)pc; /* Line C */
pt += 8; /* Line D */
*pt = 6 * 9;
return a;
}
Astute readers will notice that /* Method 3 */ above is much like the container_of idiom. I.e. it takes a pointer to a struct type, converts it to char*, does some pointer arithmetic that takes the char* outside the range of the original struct, and uses the pointer.
The committee responded by saying /* Line C */ was strictly conforming but /* Line D */ was not strictly conforming by the same argument given for DR 51 above. Further, the committee said that the answers "are not affected if T has char type."
Verdict: container_of is not strictly conforming (probably)
The container_of idiom takes a pointer to a struct's subobject, converts the pointer to char*, and performs pointer arithmetic that moves the pointer outside the subobject. This is the same set of operations discussed in DR 51 and 72 apply. There is clear intent on the part of the committee. They hold that the standard "permits an implementation to tailor how it represents pointers to the size of the objects they point at" and thus "the idiom, while common, is not strictly conforming."
One might argue that container_of side steps the issue by doing the pointer arithmetic in the domain of char* pointers, but the committee says the answer is "not affected if T has char type."
May the container_of idiom be used in practice?
No, if you want to be strict and use only code that is not clearly strictly conforming according to current language standards.
Yes, if you are a pragmatist and believe that an idiom widely used in Linux, FreeBSD, Microsoft Windows C code is enough to label the idiom conforming in practice.
As noted above, implementations are allowed to guarantee behavior in ways not required by the standard. On a practical note, the container_of idiom is used in the Linux kernel and many other projects. It is easy for implementations to support on modern hardware. Various "sanitizer" systems such as Address Sanitizer, Undefined Behavior Sanitizer, Purify, Valgrind, etc., all allow this behavior. On systems with flat address spaces, and even segmented ones, various "pointer games" are common (e.g. converting to integral values and masking off low order bits to find page boundaries, etc). These techniques are so common in C code today that it is very unlikely that such idioms will cease to function on any commonly supported system now or in the future.
In fact, I found one implementation of a bounds checker that gives a different interpretation of C semantics in its paper. The quotes are from the following paper: Richard W. M. Jones and Paul H. J. Kelly. Backwards-compatible bounds checking for arrays and pointers in C programs. In Third International Workshop on Automated Debugging (editors M. Kamkarand D. Byers), volume 2 (1997), No. 009 of Linköping Electronic Articles in Computer and Information Science. Linköping University Electronic Press, Linköping, Sweden. ISSN 1401-9841, May 1997 pp. 13–26. URL http://www.ep.liu.se/ea/cis/1997/009/02/
ANSI C conveniently allows us to define an object as the fundamental unit of memory allocation. [...] Operations are permitted which manipulate pointers within objects, but pointer operations are not permitted to cross between two objects. There is no ordering defined between objects, and the programmer should never be allowed to make assumptions about how objects are arranged in memory.
Bounds checking is not blocked or weakened by the use of a cast (i.e. type coercion). Cast can properly be used to change the type of the object to which a pointer refers, but cannot be used to turn a pointer to one object into a pointer to another. A corollary is that bounds checking is not type checking: it does not prevent storage from being declared with one data structure and used with another. More subtly, note that for this reason, bounds checking in C cannot easily validate use of arrays of structs which contain arrays in turn.
Every valid pointer-valued expression in C derives its result from exactly one original storage object. If the result of the pointer calculation refers to a different object, it is invalid.
This language is quite definitive but take note the paper was published in 1997, before the DR reports above were written and responded to. The best way to interpret the bounds checking system described in the paper is as a conforming implementation of C, but not one that detects all non-strictly conforming programs. I do see similarities between this paper and A Provenance-aware Memory Object Model for C - N2676 from 2021, however, so in the future the ideas similar to the ones quoted above might be codified in the language standard.
The C memory object model study group is a treasure trove of discussions related to container_of and many other closely related problems. From their mailing list archive we have these mentions of the container_of idiom:
2.5.4 Q34 Can one move among the members of a struct using representation-pointer arithmetic and casts?
The standard is ambiguous on the interaction between the allowable pointer arithmetic (on unsigned char* representation pointers) and subobjects. For example, consider:
Example cast_struct_inter_member_1.c
#include <stdio.h>
#include <stddef.h>
typedef struct { float f; int i; } st;
int main() {
st s = {.f=1.0, .i=1};
int *pi = &(s.i);
unsigned char *pci = ((unsigned char *)pi);
unsigned char *pcf = (pci - offsetof(st,i))
+ offsetof(st,f);
float *pf = (float *)pcf;
*pf = 2.0; // is this free of undefined behaviour?
printf("s.f=%f *pf=%f s.i=%i\n",s.f,*pf,s.i);
}
This forms an unsigned char* pointer to the second member (i) of a struct, does arithmetic on that using offsetof to form an unsigned char* pointer to the first member, casts that into a pointer to the type of the first member (f), and uses that to write.
In practice we believe that this is all supported by most compilers and it is used in practice, e.g. as in the Container idiom of Chisnall et al. [ASPLOS 2015], where they discuss container macros that take a pointer to a structure member and compute a pointer to the structure as a whole. They see it heavily used by one of the example programs they studied. We are told that Intel's MPX compiler does not support the container macro idiom, while Linux, FreeBSD, and Windows all rely on it.
The standard says (6.3.2.3p7): "...When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.". This licenses the construction of the unsigned char* pointer pci to the start of the representation of s.i (presuming that a structure member is itself an "object", which itself is ambiguous in the standard), but allows it to be used only to access the representation of s.i.
The offsetof definition in stddef.h, 7.19p3, " offsetof(type,member-designator) which expands to an integer constant expression that has type size_t, the value of which is the offset in bytes, to the structure member (designated by member-designator, from the beginning of its structure (designated by type", implies that the calculation of pcf gets the correct numerical address, but does not say that it can be used, e.g. to access the representation of s.f. As we saw in the discussion of provenance, in a post-DR260 world, the mere fact that a pointer has the correct address does not necessarily mean that it can be used to access that memory without giving rise to undefined behaviour.
Finally, if one deems pcf to be a legitimate char* pointer to the representation of s.f, then the standard says that it can be converted to a pointer to any object type if sufficiently aligned, which for float* it will be. 6.3.2.3p7: "A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned (68) for the referenced type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer....". But whether that pointer has the right value and is usable to access memory is left unclear.
This example should be allowed in our de facto semantics but is not clearly allowed in the ISO text.
What needs to be changed in the ISO text to clarify this?
More generally, the ISO text's use of "object" is unclear: does it refer to an allocation, or are struct members, union members, and array elements also "objects"?
Key phrase being "This example should be allowed in our de facto semantics but is not clearly allowed in the ISO text." i.e. I take this to mean the the group documenets like N2676 wish to see container_of supported.
However, in a later message:
2.2 Provenance and subobjects: container-of casts
A key question is whether one can cast from a pointer to the first member of a struct to the struct as a whole, and then use that to access other members. We discussed it previously in N2222 Q34 Can one move among the members of a struct using representation-pointer arithmetic and casts?, N2222 Q37 Are usable pointers to a struct and to its first member interconvertable?, N2013, and N2012. Some of us had thought that that was uncontroversially allowed in ISO C, by 6.7.2.1p15 ...A pointer to a structure object, suitably converted, points to its initial member..., and vice versa..., but others disagree. In practice, this seems to be common in real code, in the "container-of" idiom.
Though someone suggested that the IBM XL C/C++ compiler does not support it. Clarification from WG14 and compiler teams would be very helpful on this point.
With this, the group sums it up nicely: the idiom is widely used, but there is disagreement about what the standard says about it.
I think its strictly conforming or there's a big defect in the standard. Referring to your last example, the section on pointer arithmetic doesn't give the compiler any leeway to treat p and q differently. It isn't conditional on how the pointer value was obtained, only what object it points to.
Any interpretation that p and q could be treated differently in pointer arithmetic would require an interpretation that p and q do not point to the same object. Since since there's no implementation dependent behaviour in how you obtained p and q then that would mean they don't point to the same object on any implementation. That would in turn require that p == q be false on all implementations, and so would make all actual implementations non-conforming.
I just want to answer this bit.
int arr[5][5] = ...
int *p = &arr[0][0] + 5;
int *q = &arr[1][0];
This is not UB. It is certain that p is a pointer to an element of the array, provided only that it is within bounds. In each case it points to the 6th element of a 25 element array, and can safely be dereferenced. It can also be incremented or decremented to access other elements of the array.
See n3797 S8.3.4 for C++. The wording is different for C, but the meaning is the same. In effect arrays have a standard layout and are well-behaved with respect to pointers.
Let us suppose for a moment that this is not so. What are the implications? We know that the layout of an array int[5][5] is identical to int[25], there can be no padding, alignment or other extraneous information. We also know that once p and q have been formed and given a value, they must be identical in every respect.
The only possibility is that, if the standard says it is UB and the compiler writer implements the standard, then a sufficiently vigilant compiler might either (a) issue a diagnostic based on analysing the data values or (b) apply an optimisation which was dependent on not straying outside the bounds of sub-arrays.
Somewhat reluctantly I have to admit that (b) is at least a possibility. I am led to the rather strange observation that if you can conceal from the compiler your true intentions this code is guaranteed to produce defined behaviour, but if you do it out in the open it may not.