Apparently undocumented GCC behaviour with "constant" initializers in C

Consider the following C code:
#include <stdio.h>
int x = 5;
int y = x-x+10;
int z = x*0+5;
int main()
{
printf("%d\n", y);
printf("%d\n", z);
return 0;
}
The ANSI C90 standard states "All the expressions for an object that has static storage duration [...] shall be constant expressions" (6.5.7 constraint 3).
Clearly the initializers for y and z are not constant expressions. And indeed, trying to compile the above C code with clang main.c or clang -ansi main.c gives an error for this reason.
However, compiling with gcc main.c or even gcc main.c -ansi -pedantic -Wextra -Wall gives no errors at all, and runs, printing 10 and 5.
On the other hand, trying something like the following:
#include <stdio.h>
int x = 5;
int main()
{
int y[x-x+2];
printf("%lu\n", sizeof(y));
return 0;
}
gives a warning when compiled with gcc -ansi -pedantic ... or clang -ansi -pedantic ....
So gcc randomly performs the mathematically correct cancellations in order to pretend that something is a constant expression, even when asked not to (-ansi). Why is this? Is this a bug?
By the way, my gcc version is 9.4.0 and my clang version is 10.0.0-4.

Without looking, there's one simple explanation: the code that checks whether the initializer is an acceptable constant expression operates on an internal representation after a pass that does arithmetic simplification, so it never "sees" your x-x or x*0.
This probably makes the implementation of the rule in GCC simpler: it just has to ask "is this node one that represents a constant?" rather than "is this tree one that I could evaluate as a constant later on?". It also facilitates the behavior that they probably want for the later standards (see below), and a special case for -ansi would probably add an undesirable amount of code complexity.
Is it a bug? Arguably. But it's one with such a small impact that it's not especially likely to get fixed. It works correctly for valid code, and it errors correctly on "really" invalid code that could actually cause a problem. It only deviates from the standard in a fairly harmless way, and only for C90 (since the C99 and later standards say "an implementation may accept other forms of constant expressions", which gives GCC latitude to allow an expression that mentions things not on the laundry list, as long as it has a constant value).
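If that explanation is right, an initializer that does not fold down to a literal should still be rejected even without -ansi, and a quick sketch (against the GCC 9.x from the question; w is a made-up name) seems to bear that out:
#include <stdio.h>
int x = 5;
int y = x - x + 10;   /* folds to the literal 10 before the constancy check, so GCC accepts it */
/* int w = x + 1; */  /* would not fold to a literal; uncommenting this makes GCC report
                         "initializer element is not constant" even without -ansi */
int main(void)
{
    printf("%d\n", y);
    return 0;
}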

Related

gfortran: pass logical argument to Fortran function from C

What argument type should I use in C when calling a Fortran function that takes logical arguments, specifically with gfortran? Where is this documented for gfortran?
Here's an example program that doesn't compile without warnings:
Contents of one.f:
subroutine proc1(x)
logical x
end
Contents of main.c:
void proc1_(_Bool *x);
int main() {
_Bool x;
proc1_(&x);
return 0;
}
If I compile using GCC as follows, with LTO enabled, I get a warning about mismatching function prototypes:
gfortran -flto -c one.f
gcc -flto -c main.c
gcc -flto main.o one.o
The warning I get:
main.c:2:6: warning: type of 'proc1_' does not match original declaration [-Wlto-type-mismatch]
2 | void proc1_(_Bool *x);
| ^
one.f:2:22: note: 'proc1' was previously declared here
2 | subroutine proc1(x)
| ^
one.f:2:22: note: code may be misoptimized unless '-fno-strict-aliasing' is used
Note that enabling LTO allows the linker to verify that argument types match between prototypes. Unfortunately, whether to use LTO is not our choice: CRAN requires the submitted code to compile without these warnings with LTO enabled.
I only see problems when trying to use logical arguments. real, integer and character are all fine.
gfortran can be asked to produce C prototypes, and this is the output it gives me:
gfortran -flto -fc-prototypes-external -c one.f
void proc1_ (int_fast32_t *x);
Using int_fast32_t in the C prototype doesn't work either. No type that I tried did, neither int, nor _Bool. Usually, when there is a type mismatch between prototypes, the error message mentions what the type should be—but not in this case.
How can I find what is the correct type to use?
For real modern C-Fortran interoperability you should use the types (kinds) supplied by the iso_c_binding module and make your Fortran procedure bind(c). That way you can use logical(c_bool).
In the old style, the best approach is to work with integers: pass an int from C and convert from integer to logical inside the Fortran code. Old C did not have any bool type; it was added later.
With minimal changes:
subroutine proc1(x)
use iso_c_binding
logical(c_bool) x
end
#include <stdbool.h>
void proc1_(bool *x);
int main() {
bool x;
proc1_(&x);
return 0;
}
> gfortran -flto -c one.f
> gcc -flto -c main.c
> gcc -flto main.o one.o
issues no warning on my Linux and GCC 7 and 10.
Or after further changes:
subroutine proc1(x) bind(C, name="proc1")
use iso_c_binding
logical(c_bool), value :: x
end
#include <stdbool.h>
void proc1(bool x);
int main() {
bool x;
proc1(x);
return 0;
}
The change to pass-by-value is, of course, appropriate only when it really is just an input parameter.
The correct and guaranteed to be portable solution is, as explained in the answer by Vladimir F, to create a Fortran wrapper routine that uses ISO_C_BINDING. This wrapper can also take the opportunity to make a more idiomatic C interface, e.g. using the value specifier to pass scalars by value.
However, for the quick and dirty solution that works on GFortran WITHOUT LTO (and somewhat likely on other compilers, but no guarantees), see https://gcc.gnu.org/onlinedocs/gfortran/Internal-representation-of-LOGICAL-variables.html#Internal-representation-of-LOGICAL-variables . That is, you can pass a C integer variable of the appropriate size containing 1 for true and 0 for false. Appropriate size here means that, unless you have compiled your Fortran code with -fdefault-integer-8 or similar options, the default-kind GFortran logical will be 4 bytes, so a plain C int should be good (or int32_t if you really want to be sure, though I don't think GFortran supports any targets where the C int is not 32 bits).
The reason this doesn't work with LTO is that, in the bowels of GCC, Fortran LOGICAL variables are almost the same as integers, but not quite: in practice they are special integer variables with a maximum value of 1 and a minimum value of 0, even though they take up more space (as specified by their kind parameter). This kind of type mismatch is likely what LTO complains about. Unfortunately there is no workaround for that, other than the correct solution above via ISO_C_BINDING.
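A minimal sketch of that quick-and-dirty approach (no LTO, default 4-byte LOGICAL kind, the original one.f left unchanged; the variable name flag is made up):
/* main.c -- old-style call into the unchanged one.f */
void proc1_(int *x);    /* default-kind LOGICAL(4) travels as a pointer to a 32-bit integer */
int main(void)
{
    int flag = 1;       /* 1 for .true., 0 for .false. */
    proc1_(&flag);
    return 0;
}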

Why are the results of this code different with and without "-fsanitize=undefined,address"?

I found that this code produces different results with "-fsanitize=undefined,address" and without it.
int printf(const char *, ...);
union {
long a;
short b;
int c;
} d;
int *e = &d.c;
int f, g;
long *h = &d.a;
int main() {
for (; f <= 0; f++) {
*h = g;
*e = 6;
}
printf("%d\n", d.b);
}
The command line is:
$ clang -O0 -fsanitize=undefined,address a.c -o out0
$ clang -O1 -fsanitize=undefined,address a.c -o out1
$ clang -O1 a.c -o out11
$ ./out0
6
$ ./out1
6
$ ./out11
0
The Clang version is:
$ clang -v
clang version 13.0.0 (/data/src/llvm-dev/llvm-project/clang 3eb2158f4fea90d56aeb200a5ca06f536c1df683)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /data/bin/llvm-dev/bin
Found candidate GCC installation: /opt/rh/devtoolset-7/root/usr/lib/gcc/x86_64-redhat-linux/7
Selected GCC installation: /opt/rh/devtoolset-7/root/usr/lib/gcc/x86_64-redhat-linux/7
Candidate multilib: .;#m64
Candidate multilib: 32;#m32
Selected multilib: .;#m64
Found CUDA installation: /usr/local/cuda, version 10.2
The OS and platform are:
CentOS Linux release 7.8.2003 (Core), x86_64 GNU/Linux
My questions:
Is there something wrong with my code? Is taking the address of multiple members of the union invalid in C?
If there is something wrong with my code, how do I get LLVM (or GCC) to warn me? I have used -Wall -Wextra but LLVM and GCC show no warning.
Is there something wrong with the code?
For practical purposes, yes.
I think this is the same underlying issue as Is it undefined behaviour to call a function with pointers to different elements of a union as arguments?
As Eric Postpischil points out, the C standard as read literally seems to permit your code, and require it to print out 6 (assuming that's consistent with how your implementation represents integer types and how it lays out unions). However, this literal reading would render the strict aliasing rule almost entirely impotent, so in my opinion it's not what the standard authors would have intended.
The spirit of the strict aliasing rule is that the same object may not be accessed through pointers to different types (with certain exceptions for character types, etc) and that the compiler may optimize on the assumption that this never happens. Although d.a and d.c are not strictly speaking "the same object", they do have overlapping storage, and I think compiler authors interpret the rule as also not allowing overlapping objects to be accessed through pointers to different types. Under that interpretation your code would have undefined behavior.
In Defect Report 236 the committee considered a similar example and stated that it has undefined behavior, because of its use of pointers that "have different types but designate the same region of storage". However, wording to clarify this does not seem to have ever made it into any subsequent version of the standard.
Anyhow, I think the practical upshot is that you cannot expect your code to work "correctly" under modern compilers that enforce their interpretations of the strict aliasing rule. Whether or not this is a clang bug is a matter of opinion, but even if you do think it is, then it's a bug that they are probably not ever going to fix.
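For contrast, here is a minimal sketch of the clear-cut kind of access the rule is uncontroversially aimed at, with no union or overlapping storage involved; a compiler is entitled to assume the store below cannot modify a long:
long x = 1;

void f(void)
{
    int *p = (int *)&x;   /* p points into x's storage, but with a different type */
    *p = 2;               /* accessing a long object through an int lvalue is a
                             strict aliasing violation: undefined behavior */
}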
Why does it behave this way?
If you use the -fno-strict-aliasing flag, then you get back to the 6 behavior. My guess is that the sanitizers happen to inhibit some of these optimizations, which is why you don't see the 0 behavior when using those options.
What seems to have happened under the hood with -O1 is the compiler assumed that the stores to *h and *e don't interact (because of their different types) and therefore can be freely reordered. So it hoisted *h = g outside the loop, since after all multiple stores to the same address, with no intervening load, are redundant and only the last one needs to be kept. It happened to put it after the loop, presumably because it can't prove that e doesn't point to g, so the value of g needs to be reloaded after the loop. So the final value of d.b is derived from *h = g which effectively does d.a = 0.
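Roughly speaking, and only as a sketch of the effective transformation rather than the compiler's actual output, the loop ends up behaving as if it had been rewritten like this:
for (; f <= 0; f++)
    *e = 6;    /* the store through the int pointer stays in the loop */
*h = g;        /* the store through the long pointer is moved after the loop,
                  so d.a finally holds g, i.e. 0, and d.b reads as 0 */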
How to get a warning?
Unfortunately, compilers are not good at checking, either statically or at runtime, for violations of (their interpretation of) the strict aliasing rule. I'm not aware of any way to get a warning for such code. With clang you can use -Weverything to enable every warning option that it supports (many of which are useless or counterproductive), and even with that, it gives no relevant warnings about your program.
Another example
In case anyone is curious, here's another test case that doesn't rely on any type pun, reinterpretation, or other implementation-defined behavior.
#include <stdio.h>
short int zero = 0;
void a(int *pi, long *pl) {
for (int x = 0; x < 1000; x++) {
*pl = x;
*pi = zero;
}
}
int main(void) {
union { int i; long l; } u;
a(&u.i, &u.l);
printf("%d\n", u.i);
}
Try on godbolt
As read literally, this code would appear to print 0 on any implementation: the last assignment in a() was to u.i, so u.i should be the active member, and the printf should output the value 0 which was assigned to it. However, with clang -O2, the stores are reordered and the program outputs 999.
Just as a counterpoint, though, if you read the standard so as to make the above example UB, then this leads to the somewhat absurd conclusion that u.l = 0; u.i = 5; print(u.i); is well defined and prints 5, but that *&u.l = 0; *&u.i = 5; print(u.i); is UB. (Recall that the "cancellation rule" of & and * applies to &*p but not to *&x.)
The whole situation is rather unsatisfactory.
I will rewrite the code for ease of reading:
int printf(const char *, ...);
union
{
long l;
short s;
int i;
} u;
long *ul = &u.l;
int *ui = &u.i;
int counter, zero;
int main(void)
{
for (; counter <= 0; counter++)
{
*ul = zero;
*ui = 6;
}
printf("%d\n", u.s);
}
The only questionable code here is the use of u.s in the printf, when u.s is not the last member of the union that was stored. That is defined by C 2018 6.5.2.3, which says the value of u.s is that of the named member, and note 99 clarifies this means that, if s is not the member last used to store a value, the appropriate bytes are reinterpreted as a short. This is well established.
The other code is ordinary: *ul = zero; stores a value in a union member. There is no aliasing violation, because ul points to a long and is used to access a long. *ui = 6; stores a value in another union member and is also not an aliasing violation.
The specific bytes used to represent 6 in an int are implementation-defined in regard to ordering and padding bits. However, whatever they are, they should be the same with or without Clang’s “sanitization” and the same in optimization levels 0 and 1. Therefore, the same result should be obtained in all compilations.
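The reinterpretation can be checked with a well-defined memcpy equivalent (a sketch for the x86-64 target from the question, which is little-endian with no padding bits); it prints 6, the value this reasoning predicts for u.s:
#include <stdio.h>
#include <string.h>

int main(void)
{
    int i = 6;
    short s;
    memcpy(&s, &i, sizeof s);   /* reinterpret the first bytes of the int as a short */
    printf("%d\n", s);          /* prints 6 on a little-endian target such as x86-64 */
    return 0;
}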
This is a compiler bug.
I agree with other comments and answer that this is likely a defect in the C standard, as it makes the aliasing rule largely useless. Nonetheless, the sample code conforms to the requirements of the C standard and ought to work as described.

"initialiser element is not constant" error in C, when using static const variable - Sometimes - Compiler settings?

I'm posting this because I couldn't find a suitable answer elsewhere, not because similar things haven't been asked before.
A project compiles just fine with the following:
#include <stdint.h>
void foo(void)
{ if (bar)
{ static const uint8_t ConstThing = 20;
static uint8_t StaticThing = ConstThing;
//...
}
}
But a cloned project does not, throwing the above error. Looks like we've not completely cloned compiler settings / warning levels etc, but can't find the difference right now.
Using arm-none-eabi-gcc (4.7.3) with -std=gnu99. Compiling for Kinetis.
If anyone knows which settings control cases when this is legal and illegal in the same compiler, I'm all ears. Thanks in advance.
Found the difference.
If optimisation is -O0 it doesn't compile.
If optimisation is -OS it does.
I'm guessing it produces 'what you were asking for, a better way' and fixes it.
Didn't see that coming. Thanks for your input everyone.
Converting some of my comments into an answer.
In standard C, ConstThing is a constant integer, but not an integer constant, and you can only initialize static variables with integer constants. The rules in C++ are different, as befits a different language.
C11 §6.7.9 Initialization ¶4 states:
All the expressions in an initializer for an object that has static or thread storage duration shall be constant expressions or string literals.
§6.4.4.1 Integer constants defines integer constants.
§6.6 Constant expressions defines constant expressions.
…I'm not sure I understand the difference between a 'constant integer' and an 'integer constant'.
Note that ConstThing is not one of the integer constants defined in §6.4.4.1 — so, whatever else it is, it is not an integer constant. Since it is a const-qualified int, it is a constant integer, but that is not the same as an integer constant. Sometimes, the language of the standard is surprising, but it is usually very precise.
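If you need a name that qualifies as an integer constant in the standard's sense, the usual portable workarounds are an enumeration constant or a macro. A minimal sketch (the names are illustrative, not from the original project):
#include <stdint.h>

enum { CONST_THING = 20 };   /* an enumeration constant is an integer constant expression */

void foo(void)
{
    static uint8_t StaticThing = CONST_THING;   /* accepted by every conforming C compiler */
    (void)StaticThing;
    /* ... */
}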
The code in the question was compiled by GCC 4.7.3, and apparently compiling with -O0 triggers the error and compiling with -Os (-OS is claimed in the question, but not supported in standard GCC — it requires the optional argument to -O to be a non-negative integer, or s, g or fast) does not. Getting different views on the validity of the code depending on the optimization level is not a comfortable experience — changing the optimization should not change the meaning of the code.
So, the result is compiler dependent — and not required by the C standard. As long as you know that you are limiting portability (in theory, even if not in practice), then that's OK. If you don't realize that you're breaking the standard rules and portability matters, then you have problems of the "Don't Do It" variety. Personally, I wouldn't risk it — code should compile with or without optimization, and should not depend on a specific optimization flag. It's too fragile otherwise.
Having said that, if it's any consolation, GCC 10.2.0 and Apple clang version 11.0.0 (clang-1100.0.33.17) both accept the code with options
gcc -std=c11 -pedantic-errors -pedantic -Werror -Wall -Wextra -O3 -c const73.c
with any of -O0, -O1, -O2, -O3, -Os, -Og, -Ofast. That surprises me — I don't think it should be accepted in pedantic (strictly) standard-conforming mode (it would be different with -std=gnu11; then extensions are deemed valid). Even adding -Weverything to the clang compilations does not trigger an error. That really does surprise me. The options are intended to diagnose extensions over the standard, but are not completely successful. Note that GCC 4.7.3 is quite old; it was released 2013-04-11. Also, GCC 7.2.0 and v7.3.0 complain about the code under -O0, but not under -Os, -O1, -O2, or -O3 etc, while GCC 8.x.0, 9.x.0 and 10.x.0 do not.
extern int bar;
extern int baz;
extern void foo(void);
#include <stdio.h>
#include <stdint.h>
void foo(void)
{
if (bar)
{
static const uint8_t ConstThing = 20;
static uint8_t StaticThing = ConstThing;
baz = StaticThing++;
}
if (baz)
printf("Got a non-zero baz (%d)\n", baz);
}
However, I suspect that you get away with it because of the limited scope of ConstThing. (See also the comment by dxiv.)
If you use extern const uint8_t ConstThing; (at file scope, or inside the function) with the initializer value omitted, you get the warning that started the question.
extern int bar;
extern int baz;
extern void foo(void);
#include <stdio.h>
#include <stdint.h>
extern const uint8_t ConstThing; // = 20;
void foo(void)
{
if (bar)
{
static uint8_t StaticThing = ConstThing;
baz = StaticThing++;
}
if (baz)
printf("Got a non-zero baz (%d)\n", baz);
}
None of the compilers accepts this at any optimization level.

What is exactly undefined when using a C function forwardly declarated without its arguments?

While fiddling around with some of C's old and strange compatibility behaviours, I ended up with this piece of code:
#include <stdio.h>
int f();
int m() {
return f();
}
int f(int a) {
return a;
}
int main() {
f(2);
printf("%i\n", m());
}
I'm sure that the call to f() in m() is undefined behavior, since f takes exactly one argument, but:
on x86, neither GCC 9.1 nor clang 8.0.1 shows any warning (not even with -Wextra, -Weverything or the like), except when using GCC with -O3. The output is then 2 without -O3 and 0 with it. On Windows, MSVC prints no error and the program outputs seemingly random numbers.
on ARM (Raspberry Pi 3), with GCC 6.3.0 and clang 3.8.1, I observe the same behavior regarding diagnostics; the option -O3 still outputs 0, but normal compilation yields 2 with GCC and... 66688 with clang.
When the warning is present, it's pretty much what you would expect (pretty funny, as a does not appear in the printed lines):
foo.c: In function ‘m’:
foo.c:4:9: warning: ‘a’ is used uninitialized in this function [-Wuninitialized]
return f();
^~~
foo.c: In function ‘main’:
foo.c:11:2: warning: ‘a’ is used uninitialized in this function [-Wuninitialized]
printf("%i\n", m());
My guess is that -O3 leads GCC to inline the calls, thus making it understand that a problem occurs; and that the leftovers on the stack or in the registers are used as if they were the argument to the call. But how can it still compile? Is this really the (un)expected behavior?
The specific rule violated is C 2018 6.5.2.2 (Function calls) 6:
If the expression that denotes the called function has a type that does not include a prototype, the integer promotions are performed on each argument, and arguments that have type float are promoted to double. These are called the default argument promotions. If the number of arguments does not equal the number of parameters, the behavior is undefined.…
Since this is not a constraint, the compiler is not required to produce a diagnostic—the behavior is entirely undefined by the C standard, meaning the standard imposes no requirements at all.
Since the standard imposes no requirements, both ignoring the issue (or failing to recognize it) and diagnosing a problem are permissible.
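For completeness, a sketch of how to avoid the issue altogether: give f a full prototype in the forward declaration, so the compiler knows the number and types of the arguments at every call site (a mismatched call such as f() then becomes a constraint violation that must be diagnosed):
#include <stdio.h>

int f(int a);              /* full prototype, so calls are checked */

int m(void)
{
    return f(2);           /* must now pass exactly one int argument */
}

int f(int a)
{
    return a;
}

int main(void)
{
    printf("%i\n", m());   /* reliably prints 2 */
    return 0;
}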

Unused Variable Error in C .. Simple Question

I get an error in C (Error: Unused Variable) for a variable when I type in the following code:
int i=10;
but when I do this (break it up into two statements):
int i;
i=10;
the error goes away.
I am using Xcode (ver. 4.1) on Mac OS X Lion.
Is something wrong with Xcode?
No, nothing is wrong; the compiler just warns you that you declared a variable and you are not using it.
It is just a warning, not an error.
While nothing is wrong, you should avoid declaring variables that you do not need: they clutter the code, and they may occupy memory and add overhead if the compiler does not optimize them away.
The compiler isn't wrong, but it is missing an opportunity to print a meaningful error.
Apparently it warns if you declare a variable but never "use" it -- and assigning a value to it qualifies as using it. The two code snippets are equivalent; the first just happens to make it a bit easier for the compiler to detect the problem.
It could issue a warning for a variable whose value is never read. And I wouldn't be surprised if it did so at a higher optimization level. (The analysis necessary for optimization is also useful for discovering this kind of problem.)
It's simply not possible for a compiler to detect all possible problems of this kind; doing so would be equivalent to solving the Halting Problem. (I think.) Which is why language standards typically don't require warnings like this, and different compilers expend different levels of effort detecting such problems.
(Actually, a compiler probably could detect all unused variable problems, but at the expense of some false positives, i.e., issuing warnings for cases where there isn't really a problem.)
UPDATE, 11 years later:
Using gcc 11.3.0 with -Wall, I get warnings on both:
$ cat a.c
int main() {
int i = 10;
}
$ gcc -Wall -c a.c
a.c: In function ‘main’:
a.c:2:9: warning: unused variable ‘i’ [-Wunused-variable]
2 | int i = 10;
| ^
$ cat b.c
int main() {
int i;
i = 10;
}
$ gcc -Wall -c b.c
b.c: In function ‘main’:
b.c:2:9: warning: variable ‘i’ set but not used [-Wunused-but-set-variable]
2 | int i;
| ^
$
But clang 8.0.1 does not warn on the second program. (XCode probably uses clang.)
The language does not require a warning, but it would certainly make sense to issue one in this case. Tests on godbolt.org indicate that clang issues a warning for the second program starting with version 13.0.0.
(void) i;
You can cast the unused variable to void to suppress the error.
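For example (a minimal sketch of the same snippet with the cast added):
int main(void)
{
    int i = 10;
    (void)i;   /* counts as a use of i, so -Wunused-variable stays quiet */
    return 0;
}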

Resources