Is this behavior of clang standard compliant? - c

This is going to be a long, language lawyerish question, so I'd like to quickly state why I find it relevant. I am working on a project where strict standard compliance is crucial (writing a language that compiles to C). The example I am going to give seems like a standard violation on the part of clang, and so, if this is the case, I'd like to confirm it.
gcc says that a conditional with a pointer to a restrict qualified pointer can not co-inhabit a conditional statement with a void pointer. On the other hand, clang compiles such things fine. Here is an example program:
#include <stdlib.h>
int main(void){
int* restrict* A = malloc(8);
A ? A : malloc(8);
return 0;
}
For gcc, the options -std=c11 and -pedantic may be included or not in any combination, likewise for clang and the options -std=c11 and -Weverything. In any case, clang compiles with no errors, and gcc gives the following:
tem-2.c: In function ‘main’:
tem-2.c:7:2: error: invalid use of ‘restrict’
A ? A : malloc(8);
^
The c11 standard says the following with regard to conditional statements, emphasis added:
6.5.15 Conditional operator
...
One of the following shall hold for the second and third operands:
— both operands have arithmetic type;
— both operands have the same structure or union type;
— both operands have void type;
— both operands are pointers to qualified or unqualified versions of compatible types;
— one operand is a pointer and the other is a null pointer constant; or
— one operand is a pointer to an object type and the other is a pointer to a qualified or unqualified version of void.
...
If both the second and third operands are pointers or one is a null pointer constant and the
other is a pointer, the result type is a pointer to a type qualified with all the type qualifiers
of the types referenced by both operands. Furthermore, if both operands are pointers to
compatible types or to differently qualified versions of compatible types, the result type is
a pointer to an appropriately qualified version of the composite type; if one operand is a
null pointer constant, the result has the type of the other operand; otherwise, one operand
is a pointer to void or a qualified version of void, in which case the result type is a
pointer to an appropriately qualified version of void.
...
The way I see it, the first bold portion above says that the two types can go together, and the second bold portion defines the result to be a pointer to a restrict qualified version of void. However, as the following states, this type can not exist, and so the expression is correctly identified as erroneous by gcc:
6.7.3 Type qualifiers, paragraph 2
Types other than pointer types whose referenced type is an object type shall not be restrict-qualified.
Now, the problem is that a "shall not" condition is violated by this example program, and so is required to produce an error, by the following:
5.1.1.3 Diagnostics, paragraph 1
A conforming implementation shall produce at least one diagnostic message (identified in
an implementation-defined manner) if a preprocessing translation unit or translation unit
contains a violation of any syntax rule or constraint, even if the behavior is also explicitly
specified as undefined or implementation-defined. Diagnostic messages need not be
produced in other circumstances.
It seems clang is not standard compliant by treating an erroneous type silently. That makes me wonder what else clang does silently.
I am using gcc version 5.4.0 and clang version 3.8.0, on an x86-64 Ubuntu machine.

Yes it looks like a bug.
Your question more briefly: can void be restrict qualified? Since void is clearly not a pointer type, the answer is no. Because this violates a constraint, the compiler should give a diagnostic.
I was able to trick clang to confess its sins by using a _Generic expression
puts(_Generic(A ? A : malloc(8), void* : "void*"));
and clang tells me
static.c:24:18: error: controlling expression type 'restrict void *' not compatible with any generic association type
puts(_Generic(A ? A : malloc(8), void* : "void*"));
which shows that clang here really tries to match a nonsense type restrict void*.
Please file them a bug report.

While a compiler could satisfy all obligations surrounding restrict by ignoring the qualifier altogether, a compiler which wants to keep track of what it is or is not allowed to do needs to keep track of which pointers hold copies of restrict pointers. Given something like:
int *foo;
int *bar;
int wow(int *restrict p)
{
foo = p;
...
*p = 123;
*foo = 456;
*p++;
*bar = 890;
return *p;
}
since foo is derived from p, a compiler must allow for accesses made via
foo to alias accesses via p. A compiler need not make such allowances
for accesses made via bar, since that is known not to hold an address derived from p.
The rules surrounding restrict get murky in cases where a pointer may or
may not be derived from another. A compiler would certainly be allowed to
simply ignore a restrict qualifier in cases where it can't track all of
the pointers derived from a pointer; I'm not sure if any such cases would
invoke UB even if nothing ever modifies the storage identified by the
pointer. If a syntactic construct is structurally guaranteed to invoke
UB, having a compiler squawk may be more useful than having it act in an
arbitrary fashion (though having a compiler simply ignore any restrict
qualifiers it can't fully handle might be more useful yet).

Related

In C language, is it semantically possible to create an lvalue with incomplete type?

In the C89 standard, I found the following section:
3.2.2.1 Lvalues and function designators
Except when it is the operand of the sizeof operator, the unary & operator, the ++ operator, the -- operator, or the left operand of the . operator or an assignment operator, an lvalue that does not have array type is converted to the value stored in the designated object (and is no longer an lvalue). If the lvalue has qualified type, the value has the unqualified version of the type of the lvalue; otherwise the value has the type of the lvalue. If the lvalue has an incomplete type and does not have array type, the behavior is undefined.
If I read it correctly, it allows us to create an lvalue and applies some operators on it, which compiles and can cause undefined behavior during runtime.
Problem is that, I can't think of an example of "an lvalue with incomplete type" which can pass compiler's semantic check and triggers undefined behavior.
Consider that an lvalue is
An lvalue is an expression (with an object type or an incomplete type other than void) that designates an object.
and that incomplete type is
Types are partitioned into object types (types that describe objects), function types (types that describe functions), and incomplete types (types that describe objects but lack information needed to determine their sizes).
A failed program I tried:
struct i_am_incomplete;
int main(void)
{
struct i_am_incomplete *p;
*(p + 1);
return 0;
}
and got the following error:
error: arithmetic on a pointer to an incomplete type 'struct i_am_incomplete'
*(p + 1);
~ ^
Anyone can think of an example on this ? An example of "an lvalue with incomplete type" which can pass compiler's semantic check and triggers undefined behavior.
UPDATE:
As #algrid said in the answer, I misunderstood undefined behavior, which contains compile error as an option.
Maybe I'm splitting hairs, I still wonder the underlying motivation here to prefer undefined behavior over disallowing an lvalue to have an incomplete type.
I believe this program demonstrates the case:
struct S;
struct S *s, *f();
int main(void)
{
s = f();
if ( 0 )
*s; // here
}
struct S { int x; };
struct S *f() { static struct S y; return &y; }
On the marked line, *s is an lvalue of incomplete type, and it does not fall under any of the "Except..." cases in your quote of 3.2.2.1 (which is 6.3.2.1/2 in the current standard). Therefore it is undefined behaviour.
I tried my program in gcc and clang and they both rejected it with the error that a pointer to incomplete type cannot be dereferenced; but I cannot find anywhere in the Standard which would make that a constraint violation, so I believe the compilers are incorrect to reject the program. Or possibly the standard is defective by omitting such a constraint, which would make sense.
(Since the code is inside an if(0), that means the compiler cannot reject it merely on the basis of it being undefined behaviour).
Some build systems may have been designed in a way would allow code like:
extern struct foo x;
extern use_foo(struct foo x); // Pass by value
...
use_foo(x);
to be processed successfully without the compiler having to know or care
about the actual representation of struct foo [for example, some systems may process pass-by-value by having the caller pass the address of an object and requiring the called function to make a copy if it's going to modify it].
Such a facility may be useful on systems that could support it, and I don't think the authors of the Standard wanted to imply that code which used that feature was "broken", but they also didn't want to mandate that all C implementations support such a feature. Making the behavior undefined would allow implementations to support it when practical, without requiring that they do so.
"Undefined behavior" term includes compilation error as an option. From the C89 standard:
Undefined behavior - behavior, upon use of a nonportable or erroneous program construct, of erroneous data, or of indeterminately-valued objects, for which the Standard imposes no requirements. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
As you can see "terminating a translation" is ok.
In this case I believe the compilation error you get for you sample code is an example of "undefined behavior" implemented as compile time error.
Sure, array types can be that:
extern double A[];
...
A[0] = 1; // lvalue conversion of A
This has well defined behavior, even if the definition of A is not visible to the compiler. So inside this TU the array type is never completed.

Pass integer address in method expecting unsigned char pointer

Recently I had code (in C) where I passed the address of an int to a function expecting a pointer to unsigned char. Is this not valid? Is this UB or what?
e.g.,
void f(unsigned char*p)
{
// do something
}
// Call it somewhere
int x = 0; // actually it was uint32_t if it makes difference
f(&x);
I did get a warning though ... Compiled in Xcode
int * and unsigned char * are not considered compatible types, so implicit conversion will issue a diagnostic. However, the standard does allow explicit casting between different pointers, subject to two rules (C11 section 6.3.2.3):
Converting a type "pointer to A" to type "pointer to B" and back to "pointer to A" shall result in the same original pointer. (i.e., if p is of type int *, then (int *)(double *)p will yield p)
Converting any pointer to a char * will point to the lowest-addressable byte of the object.
So, in your case, an explicit (unsigned char *) cast will yield a conforming program without any undefined behavior.
The cast is required, see C11 (n1570) 6.5.2.2 p.2:
[…] Each argument shall have a type such that its value may be assigned to an object with the unqualified version of the type of its corresponding parameter.
This refers to the rules for assignment, the relevant part is (ibid. 6.5.16.1 p.1)
One of the following shall hold:
[…]
the left operand has atomic, qualified, or unqualified pointer type, and (considering the type the left operand would have after lvalue conversion) both operands are pointers to qualified or unqualified versions of compatible types, and the type pointed to by the left has all the qualifiers of the type pointed to by the right.
[…]
And unsigned char isn’t compatible to int.
These rules both appear in a “constraint” section, where “shall” means that the compiler has to give a “diagnostic message” (cf. C11 5.1.1.3) and may stop compiling (or whatever, everything beyond that diagnostic is, strictly speaking, out of the scope of the C standard). Your code is an example of a constraint violation.
Other examples of constraint violations are calling a (prototyped and non-variadic) function with the wrong number of arguments, using bitwise operators on doubles, or redeclaring an identifier with an incompatible type in the same scope, ibid. 5.1.1.3 p.2:
Example
An implementation shall issue a diagnostic for the translation unit:
char i;
int i;
because in those cases where wording in this International Standard describes the behavior for a construct as being both a constraint error and resulting in undefined behavior, the constraint error shall be diagnosed.
Syntax violations are treated equally.
So, strictly speaking, your program is as invalid as
int foo(int);
int main() {
It's my birthday!
foo(0.5 ^ 42, 12);
}
which a conforming implementation very well may compile, maybe to a program having undefined behavior, as long as it gives at least one diagnostic (e.g. a warning).
For e.g. gcc, a warning is a diagnostic (you can turn syntax and constraint violations into errors with -pedantic-errors).
The term ill-formed may be used to refer to either a syntax or a constraint violation, the C standard doesn't use this term, but cf. C++11 (n3242):
1.3.9
ill-formed program
program that is not well formed
1.3.26
well-formed program
C++ program constructed according to the syntax rules, diagnosable semantic rules, and the One Definition Rule.
The language-lawyer attitude aside, your code will probably always either be not compiled at all (which should be reason enough to do the cast), or show the expected behavior.
C11, §6.5.2.2:
2 Each argument shall have a type such that its value may be assigned to an object with the unqualified version of the type of its corresponding parameter.
§6.5.16.1 describes assignment in terms of a list of constraints, including
the left operand has atomic, qualified, or unqualified pointer type, and (considering the type the left operand would have after lvalue conversion) both operands are pointers to qualified or unqualified versions of compatible types, and the type pointed to by the left has all the qualifiers of the type pointed to by the right
int and unsigned char are not compatible types, so the program is not well-formed and the Standard doesn't even guarantee that it will compile.
Although some would say "it is undefined behavior according to the standard", here is what happens de-facto (answering by an example):
Safe:
void f(char* p)
{
char r, w = 0;
r = p[0]; // read access
p[0] = w; // write access
}
...
int x = 0;
f((char*)&x); // the casting is just in order to emit the compilation warning
This code is safe as long as you access memory with p[i], where 0 <= i <= sizeof(int)-1.
Unsafe:
void f(int* p)
{
int r, w = 0;
r = p[0]; // read access
p[0] = w; // write access
}
...
char x[sizeof(int)] = {0};
f((int*)&x); // the casting is just in order to emit the compilation warning
This code is unsafe because although the allocated variable is large enough to accommodate an int, its address in memory is not necessarily a multiple of sizeof(int). As a result, unless the compiler (as well as the underlying HW architecture) supports unaligned load/store operations, a memory access violation will occur during runtime if the address of this variable in memory is indeed not properly aligned.

Sizeof a function that returns void in C [duplicate]

What would this statement yield?
void *p = malloc(sizeof(void));
Edit: An extension to the question.
If sizeof(void) yields 1 in GCC compiler, then 1 byte of memory is allocated and the pointer p points to that byte and would p++ be incremented to 0x2346? Suppose p was 0x2345. I am talking about p and not *p.
The type void has no size; that would be a compilation error. For the same reason you can't do something like:
void n;
EDIT.
To my surprise, doing sizeof(void) actually does compile in GNU C:
$ echo 'int main() { printf("%d", sizeof(void)); }' | gcc -xc -w - && ./a.out
1
However, in C++ it does not:
$ echo 'int main() { printf("%d", sizeof(void)); }' | gcc -xc++ -w - && ./a.out
<stdin>: In function 'int main()':
<stdin>:1: error: invalid application of 'sizeof' to a void type
<stdin>:1: error: 'printf' was not declared in this scope
If you are using GCC and you are not using compilation flags that remove compiler specific extensions, then sizeof(void) is 1. GCC has a nonstandard extension that does that.
In general, void is a incomplete type, and you cannot use sizeof for incomplete types.
Although void may stand in place for a type, it cannot actually hold a value. Therefore, it has no size in memory. Getting the size of a void isn’t defined.
A void pointer is simply a language construct meaning a pointer to untyped memory.
void has no size. In both C and C++, the expression sizeof (void) is invalid.
In C, quoting N1570 6.5.3.4 paragraph 1:
The sizeof operator shall not be applied to an expression that
has function type or an incomplete type, to the parenthesized name of
such a type, or to an expression that designates a bit-field member.
(N1570 is a draft of the 2011 ISO C standard.)
void is an incomplete type. This paragraph is a constraint, meaning that any conforming C compiler must diagnose any violation of it. (The diagnostic message may be a non-fatal warning.)
The C++ 11 standard has very similar wording. Both editions were published after this question was asked, but the rules go back to the 1989 ANSI C standard and the earliest C++ standards. In fact, the rule that void is an incomplete type to which sizeof may not be applied goes back exactly as far as the introduction of void into the language.
gcc has an extension that treats sizeof (void) as 1. gcc is not a conforming C compiler by default, so in its default mode it doesn't warn about sizeof (void). Extensions like this are permitted even for fully conforming C compilers, but the diagnostic is still required.
Taking the size of void is a GCC extension.
sizeof() cannot be applied to incomplete types. And void is incomplete type that cannot be completed.
In C, sizeof(void) == 1 in GCC, but this appears to depend on your compiler.
In C++, I get:
In function 'int main()':
Line 2: error: invalid application of 'sizeof' to a void type
compilation terminated due to -Wfatal-errors.
To the 2nd part of the question: Note that sizeof(void *)!= sizeof(void).
On a 32-bit arch, sizeof(void *) is 4 bytes, so p++, would be set accordingly.The amount by which a pointer is incremented is dependent on the data it is pointing to. So, it will be increased by 1 byte.
while sizeof(void) perhaps makes no sense in itself, it is important when you're doing any pointer math.
eg.
void *p;
while(...)
p++;
If sizeof(void) is considered 1 then this will work.
If sizeof(void) is considered 0 then you hit an infinite loop.
Most C++ compilers choosed to raise a compile error when trying to get sizeof(void).
When compiling C, gcc is not conforming and chose to define sizeof(void) as 1. It may look strange, but has a rationale. When you do pointer arithmetic adding or removing one unit means adding or removing the object pointed to size. Thus defining sizeof(void) as 1 helps defining void* as a pointer to byte (untyped memory address). Otherwise you would have surprising behaviors using pointer arithmetic like p+1 == p when p is void*. Such pointer arithmetic on void pointers is not allowed in c++ but works fine with when compiling C with gcc.
The standard recommended way would be to use char* for that kind of purpose (pointer to byte).
Another similar difference between C and C++ when using sizeof occurs when you defined an empty struct like:
struct Empty {
} empty;
Using gcc as my C compiler sizeof(empty) returns 0.
Using g++ the same code will return 1.
I'm not sure what states both C and C++ standards on this point, but I believe defining the size of some empty structs/objects helps with reference management to avoid that two references to differing consecutive objects, the first one being empty, get the same address. If reference are implemented using hidden pointers as it is often done, ensuring different address will help comparing them.
But this is merely avoiding a surprising behavior (corner case comparison of references) by introduction another one (empty objects, even PODs consume at least 1 byte memory).

Equivalence of p[0] and *p for incomplete array types

Consider the following code (it came about as a result of this discussion):
#include <stdio.h>
void foo(int (*p)[]) { // Argument has incomplete array type
printf("%d\n", (*p)[1]);
printf("%d\n", p[0][1]); // Line 5
}
int main(void) {
int a[] = { 5, 6, 7 };
foo(&a); // Line 10
}
GCC 4.3.4 complains with the error message:
prog.c: In function ‘foo’:
prog.c:5: error: invalid use of array with unspecified bounds
Same error message in GCC 4.1.2, and seems to be invariant of -std=c99, -Wall, -Wextra.
So it's unhappy with the expression p[0], but it's happy with *p, even though these should (in theory) be equivalent. If I comment out line 5, the code compiles and does what I would "expect" (displays 6).
Presumably one of the following is true:
My understanding of the C standard(s) is incorrect, and these expressions aren't equivalent.
GCC has a bug.
I'd place my money on (1).
Question: Can anyone elaborate on this behaviour?
Clarification: I'm aware that this can be "solved" by specifying an array size in the function definition. That's not what I'm interested in.
For "bonus" points: Can anyone confirm that MSVC 2010 is in error when it rejects line 10 with the following message?
1><snip>\prog.c(10): warning C4048: different array subscripts : 'int (*)[]' and 'int (*)[3]'
Section 6.5.2.1 of n1570, Array subscripting:
Constraints
One of the expressions shall have type ‘‘pointer to complete object type’’, the other
expression shall have integer type, and the result has type ‘‘type’’.
So the standard forbids the expression p[0] if p is a pointer to an incomplete type. There is no such restriction for the indirection operator *.
In older versions/drafts of the standard, however, (n1256 and C99), the word "complete" is absent in that paragraph. Not being involved in any way in the standard procedure, I can only guess whether it's a breaking change or the correction of an omission. The behaviour of the compiler suggests the latter. That is reinforced by the fact that p[i] is per the standard identical to *(p + i) and the latter expression doesn't make sense for a pointer to an incomplete type, so for p[0] to work if p is a pointer to an incomplete type, an explicit special case would be needed.
My C is a bit rusty, but my reading is that when you have an int (*p)[] this:
(*p)[n]
Says "dereference p to get an array of ints, then take the nth one". Which seems naturally to be well defined. Whereas this:
p[n][m]
Says "take the nth array in p, then take the mth element of that array". Which doesn't seem well-defined at all; you have to know how big the arrays are to find where the nth one starts.
This could work for the specific special case where n = 0, because the 0th array is easy to find regardless of how big the arrays are. You've simply found that GCC isn't recognising this special case. I don't know the language spec in detail, so I don't know whether that's a "bug" or not, but my personal tastes in language design are that p[n][m] should either work or not, not that it should work when n is statically known to be 0 and not otherwise.
Is *p <===> p[0] really a definitive rule from the language specification, or just an observation? I don't think of dereferencing and indexing-by-zero as the same operation when I'm programming.
For your "bonus points" question (you probably should have asked this as a separate question), MSVC10 is in error. Note that MSVC only implements C89, so I have used that standard.
For the function call, C89 §3.3.2.2 tells us:
Each argument shall have a type such that its value may be assigned to
an object with the unqualified version of the type of its
corresponding parameter.
The constraints for assignment are in C89 §3.3.16:
One of the following shall hold: ... both operands are pointers to
qualified or unqualified versions of compatible types, and the type
pointed to by the left has all the qualifiers of the type pointed to
by the right;
So we can assign two pointers (and thus call a function with a pointer parameter using a pointer argument) if the two pointers point to compatible types.
The compatibility of various array types is defined in C89 §3.5.4.2:
For two array types to be compatible, both shall have compatible
element types, and if both size specifiers are present, they shall
have the same value.
For the two array types int [] and int [3] this condition clearly holds. Therefore the function call is legal.
void foo(int (*p)[])
{
printf("%d\n", (*p)[1]);
printf("%d\n", p[0][1]); // Line 5
}
Here, p is a pointer to an array of an unspecified number of ints. *p accesses that array, so (*p)[1] is the 2nd element in the array.
p[n] adds p and n times the size of the pointed-to array, which is unknown. Even before considering the [1], it's broken. It's true that zero times anything is still 0, but the compiler's obviously checking the validity of all the terms without short-circuiting as soon as it sees zero. So...
So it's unhappy with the expression p[0], but it's happy with *p, even though these should (in theory) be equivalent.
As explained, they're clearly not equivalent... think of p[0] as p + 0 * sizeof *p and it's obvious why....
For "bonus" points: Can anyone confirm that MSVC 2010 is in error when it rejects line 10 with the following message?
1>\prog.c(10): warning C4048: different array subscripts : 'int ()[]' and 'int ()[3]'
Visual C++ (and other compilers) are free to warn about things that they think aren't good practice, things that have been found empirically to be often erroneous, or things the compiler writers just had an irrational distrust of, even if they're entirely legal re the Standard.... Examples that may be familiar include "comparing signed and unsigned" and "assignment within a conditional (suggest surrounding with extra parentheses)"

Can I do arithmetic on void * pointers in C?

is this valid
void *p = &X; /* some thing */
p += 12;
and if so what does p now point to?
I have (third party) code that does this (and compiles cleanly) and my guess is that the void * was treated as a char *. My trusty K&R is silent(ish) on the topic
EDIT: My little test app runs fine on gcc 4.1.1 and treats void * as char *. But g++ barfs
I know how to do it properly. I need to know if I have to clean this code base to find all the places its done.
BTW gcc -pedantic throws up a warning
Summary:
The C spec is ambiguous. It says that in terms of representation and use as function parameters void* =char*. But it is silent regarding pointer arithmetic.
gcc (4) permits it and treats it as
char *
g++ refuses it
gcc -pedantic warns about it
vs2010 both c and c++
refuses it
No this is not legal. A void* cannot be arbitrarily incremented. It needs to be cast to a specific type first.
If you want to increment it by a specific number of bytes then this is the solution I use.
p = ((char*)p) + 12;
The char type is convenient because it has a defined size of 1 byte.
EDIT
It's interesting that it runs on gcc with a warning. I tested on Visual Studio 2010 and verified it does not compile. My limited understanding of the standard would say that gcc in the error here. Can you add the following compilation flags
-Wall -ansi -pedantic
To quote from the spec:
§6.5.6/2: For addition, either both operands shall have arithmetic type, or one operand shall be a pointer to an object type and the other shall have integer type. (Incrementing is equivalent to adding 1.)
A pointer to void is not a pointer to an object type, as per these excerpts:
§6.2.5/1: [...] Types
are partitioned into object types (types that fully describe objects), function types (types
that describe functions), and incomplete types (types that describe objects but lack
information needed to determine their sizes).
§6.2.5/19: The void type comprises an empty set of values; it is an incomplete type that cannot be
completed.
Therefore, pointer arithmetic is not defined for pointer to void types.
It depends on compiler. Those that allow it consider sizeof(*(void *)) as 1.
EDIT: it's only for void pointer arithmetic. It would have no sense using in this case steps of sizeof(int) or of 0. The common expectations of someone who uses it would be the smallest possible step.
Your guess is correct.
In the standard ISO C99, section 6.2.5 paragraph 26, it declares that void pointers and character pointers will have the same representation and alignment requirements (paraphrasing).
You may want to have a look at the top voted answer for this question
Pointer arithmetic for void pointer in C
I don't think you can, because it doesn't know its type, therefore can not seek the correct amount of bytes.
Cast it to a type first, i.e. (int).

Resources