In C, does initialising a variable to it's own value make sense? If yes, what for?
Allow me to elaborate. In Git sources there are some examples of initialising a variable to it's own undefined value, as seen in transport.c or wt-status.c. I removed assignments from those declarations and run tests. Seeing no regressions, I thought that those assignments were redundant.
On the other hand, I did some simple tests with GCC 4.6 and Clang 2.9.
#include <stdio.h>
int main() {
printf("print to increase probability of registers being non-zero\n");
int status = status;
return printf("%i\n", status);
}
Compiling with -Wall -std=c99 and various -O levels prints no warnings and shows that status == 0. Clang with a non-zero optimisation level prints some garbage values though. It makes me infer that results of such expressions are undefined.
I can imagine that such assignment can suppress an uninitialised variable warning, but it's not the case for the examples taken from Git. Removing assignments doesn't introduce any warnings.
Are such assignments an undefined behaviour? If not, what do you use them for?
I've suggested the change on the Git mailing list. Here's what I've learned.
This compiles because Standard C99 §6.2.1/7 says:
Any identifier that is not a structure, union, or enumeration tag "has scope that begins just after the completion of its declarator." The declarator is followed by the initializer.
However, value of status is Indeterminate. And you cannot rely on it being initialized to something meaningful.
How does it work?
int status creates an space for the variable to exist on the stack(local storage) which is then further read to perform status = status, status might get initialized to any value that was present in the stack frame.
How can you guard against such self Initialization?
gcc provides a specific setting to detect self Initializations and report them as errors:
-Werror=uninitialized -Winit-self
Why is it used in this code?
The only reason I can think it is being used in the said code is to suppress the unused variable warning for ex: In transport.c, if the control never goes inside the while loop then in that control flow cmp will be unused and the compiler must be generating a warning for it. Same scenario seems to be with status variable in wt-status.c
For me the only reason of such self-assigning initialization is to avoid a warning.
In the case of your transport.c, I don't even understand why it is useful. I would have left cmp uninitialized.
My own habit (at least in C) is to initialize all the variables, usually to 0. The compiler will optimize unneeded initialization, and having all variables initialized makes debugging easier.
There is a case when I want a variable to remain uninitialized, and I might self-assign it: random seeds:
unsigned myseed = myseed;
srand(myseed);
On MacOS X 10.7.2, I tried this example - with the result shown...
$ cat x3.c
#include <stdio.h>
int status = -7;
int main()
{
printf("status = %d\n", status);
int status = status;
printf("status = %d\n", status);
return 0;
}
$ make x3
gcc -O -std=c99 -Wall -Wextra x3.c -o x3
$ ./x3
status = -7
status = 1787486824
$
The stack space where the local status in main() has been used by printf() so the self-initialization copies garbage around.
I think status = status doesn't change the value of status (compared to int status;). I think it is used to suppress the unused variable warning.
Related
I'm writing my own test-runner for my current project. One feature (that's probably quite common with test-runners) is that every testcase is executed in a child process, so the test-runner can properly detect and report a crashing testcase.
I want to also test the test-runner itself, therefore one testcase has to force a crash. I know "crashing" is not covered by the C standard and just might happen as a result of undefined behavior. So this question is more about the behavior of real-world implementations.
My first attempt was to just dereference a null-pointer:
int c = *((int *)0);
This worked in a debug build on GNU/Linux and Windows, but failed to crash in a release build because the unused variable c was optimized out, so I added
printf("%d", c); // to prevent optimizing away the crash
and thought I was settled. However, trying my code with clang instead of gcc revealed a surprise during compilation:
[CC] obj/x86_64-pc-linux-gnu/release/src/test/test/test_s.o
src/test/test/test.c:34:13: warning: indirection of non-volatile null pointer
will be deleted, not trap [-Wnull-dereference]
int c = *((int *)0);
^~~~~~~~~~~
src/test/test/test.c:34:13: note: consider using __builtin_trap() or qualifying
pointer with 'volatile'
1 warning generated.
And indeed, the clang-compiled testcase didn't crash.
So, I followed the advice of the warning and now my testcase looks like this:
PT_TESTMETHOD(test_expected_crash)
{
PT_Test_expectCrash();
// crash intentionally
int *volatile nptr = 0;
int c = *nptr;
printf("%d", c); // to prevent optimizing away the crash
}
This solved my immediate problem, the testcase "works" (aka crashes) with both gcc and clang.
I guess because dereferencing the null pointer is undefined behavior, clang is free to compile my first code into something that doesn't crash. The volatile qualifier removes the ability to be sure at compile time that this really will dereference null.
Now my questions are:
Does this final code guarantee the null dereference actually happens at runtime?
Is dereferencing null indeed a fairly portable way for crashing on most platforms?
I wouldn't rely on that method as being robust if I were you.
Can't you use abort(), which is part of the C standard and is guaranteed to cause an abnormal program termination event?
The answer refering to abort() was great, I really didn't think of that and it's indeed a perfectly portable way of forcing an abnormal program termination.
Trying it with my code, I came across msvcrt (Microsoft's C runtime) implements abort() in a special chatty way, it outputs the following to stderr:
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
That's not so nice, at least it unnecessarily clutters the output of a complete test run. So I had a look at __builtin_trap() that's also referenced in clang's warning. It turns out this gives me exactly what I was looking for:
LLVM code generator translates __builtin_trap() to a trap instruction if it is supported by the target ISA. Otherwise, the builtin is translated into a call to abort.
It's also available in gcc starting with version 4.2.4:
This function causes the program to exit abnormally. GCC implements this function by using a target-dependent mechanism (such as intentionally executing an illegal instruction) or by calling abort.
As this does something similar to a real crash, I prefer it over a simple abort(). For the fallback, it's still an option trying to do your own illegal operation like the null pointer dereference, but just add a call to abort() in case the program somehow makes it there without crashing.
So, all in all, the solution looks like this, testing for a minimum GCC version and using the much more handy __has_builtin() macro provided by clang:
#undef HAVE_BUILTIN_TRAP
#ifdef __GNUC__
# define GCC_VERSION (__GNUC__ * 10000 \
+ __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
# if GCC_VERSION > 40203
# define HAVE_BUILTIN_TRAP
# endif
#else
# ifdef __has_builtin
# if __has_builtin(__builtin_trap)
# define HAVE_BUILTIN_TRAP
# endif
# endif
#endif
#ifdef HAVE_BUILTIN_TRAP
# define crashMe() __builtin_trap()
#else
# include <stdio.h>
# define crashMe() do { \
int *volatile iptr = 0; \
int i = *iptr; \
printf("%d", i); \
abort(); } while (0)
#endif
// [...]
PT_TESTMETHOD(test_expected_crash)
{
PT_Test_expectCrash();
// crash intentionally
crashMe();
}
you can write memory instead of reading it.
*((int *)0) = 0;
No, dereferencing a NULL pointer is not a portable way of crashing a program. It is undefined behavior, which means just that, you have no guarantees what will happen.
As it happen, for the most part under any of the three main OS's used today on desktop computers, that being MacOS, Linux and Windows NT (*) dereferencing a NULL pointer will immediately crash your program.
That said: "The worst possible result of undefined behavior is for it to do what you were expecting."
I purposely put a star beside Windows NT, because under Windows 95/98/ME, I can craft a program that has the following source:
int main()
{
int *pointer = NULL;
int i = *pointer;
return 0;
}
that will run without crashing. Compile it as a TINY mode .COM files under 16 bit DOS, and you'll be just fine.
Ditto running the same source with just about any C compiler under CP/M.
Ditto running that on some embedded systems. I've not tested it on an Arduino, but I would not want to bet either way on the outcome. I do know for certain that were a C compiler available for the 8051 systems I cut my teeth on, that program would run fine on those.
The program below should work. It might cause some collateral damage, though.
#include <string.h>
void crashme( char *str)
{
char *omg;
for(omg=strtok(str, "" ); omg ; omg=strtok(NULL, "") ) {
strcat(omg , "wtf");
}
*omg =0; // always NUL-terminate a NULL string !!!
}
int main(void)
{
char buff[20];
// crashme( "WTF" ); // works!
// crashme( NULL ); // works, too
crashme( buff ); // Maybe a bit too slow ...
return 0;
}
gcc version - gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4
Compiling below code with -Wall -Wuninitialized is throwing warning as expected
warning: ‘test’ is used uninitialized in this function [-Wuninitialized]
#include<stdio.h>
#include<stdlib.h>
void check (int test )
{
printf("%d",test);
}
int
main()
{
int test;
check(test);
return 0;
}
But compiling below code with -Wall -Wuninitialized is not throwing warning
#include<stdio.h>
#include<stdlib.h>
void check (int test )
{
printf("%d",test);
}
int
main()
{
int test;
int condition = 0;
if(condition == 27)
test = 10;
check(test);
return 0;
}
Shouldn't it throw warning?. is it anyway related to compiler optimization?
What an user understands as a false positive may be different for the particular
user. Some users are interested in cases that are hidden because of
actions of the optimizers combined with the current environment.
However, many users aren't, since that case is hidden because it
cannot arise in the compiled code. The canonical example is (MM05):
int x;
if (f ())
x = 3;
return x;
where 'f' always return non-zero for the current environment, and
thus, it may be optimised away. Here, a group of users would like to
get an uninitialized warning since 'f' may return zero when compiled
elsewhere. Yet, other group of users would consider spurious a warning
about a situation that cannot arise in the executable being compiled.
https://gcc.gnu.org/wiki/Better_Uninitialized_Warnings#Proposal
EDIT: MMO5 is already fixed
https://gcc.gnu.org/wiki/Better_Uninitialized_Warnings#MM05
Yes, gcc is not warning you because the code is optimized away. Some people think this is the right thing, because it might be that way by a compile time configuration value. Others think it's the wrong thing, because it hides possible mistakes.
You are not the first to notice this problem with GCC. There's a whole proposal about it.
clang with -Wall detects the problem and even suggests a fix.
$ make
cc -Wall -g test.c -o test
test.c:15:7: warning: variable 'test' is used uninitialized whenever 'if' condition is false
[-Wsometimes-uninitialized]
if(condition == 27)
^~~~~~~~~~~~~~~
test.c:18:10: note: uninitialized use occurs here
check(test);
^~~~
test.c:15:4: note: remove the 'if' if its condition is always true
if(condition == 27)
^~~~~~~~~~~~~~~~~~~
test.c:12:12: note: initialize the variable 'test' to silence this warning
int test;
^
= 0
1 warning generated.
I am trying to understand why the following code generates an "argument might be clobbered.." warning. Here's a minimal sample:
#include <unistd.h>
extern char ** environ;
int test_vfork(char **args, char **newEnviron)
{
pid_t pid = vfork();
if (pid == 0) {
execve(args[0], args, (newEnviron != NULL)? newEnviron : environ);
}
return 0;
}
And this is the output (this is gcc 4.3.3):
$ gcc -O2 -Wclobbered -c test.c -o test.o
test.c: In function ‘test_vfork’:
test.c:5: warning: argument ‘newEnviron’ might be clobbered by ‘longjmp’ or ‘vfork’
Also I found out that the warning goes away if I replace the execve line with the following:
if (newEnviron != NULL)
execve(commandLine[0], commandLine, newEnviron);
else
execve(commandLine[0], commandLine, environ);
Not sure why gcc likes this better than the original. Can anyone shed some light?
From C99's longjmp description:
All accessible objects have values, and all other components of the abstract
machine218) have state, as of the time the longjmp function was called, except
that the values of objects of automatic storage duration that are local to the
function containing the invocation of the corresponding setjmp macro that do not
have volatile-qualified type and have been changed between the setjmp invocation
and longjmp call are indeterminate.
If you make newEnviron a volatile object, this paragraph indicates that newEnviron will not be clobbered by longjmp. The specification or implementation of vfork might have a similar caveat.
--EDIT--
Because you have optimizations enabled and because newEnviron is non-volatile and because it is not accessed after its use in the ternary conditional operator, one optimization the implementation could perform for your conditional operator would be to actually re-assign newEnviron with the value of environ. Something like:
execve(args[0], args, (newEnviron = newEnviron ? newEnviron : environ));
But we know from the manual for vfork that modifying most objects before the execve results in undefined behaviour.
args doesn't suffer from the same concern because there is no such conditional test.
Your if statement has more sequence points and besides that, might not be as easily recognized as an optimization opportunity. However, I'd hazard a guess that the sequence points play a strong role in the optimization.
The optimization, by the way, is that the storage for newEnviron is repurposed for the result of the conditional operator, meaning neither another register is used (if registers would normally be used) nor additional stack-space is required (for systems using stacks).
I'll bet that if you were able to convince gcc that you needed to access the value of newEnviron after the execve line, the optimization would not be possible and your warning would disappear.
But of course, using volatile is a simple solution, too.
Can I detect a possible segmentation fault at compile-time?
I understand the circumstance of a segmentation fault. But I am curious if GCC as a compiler has some flags to check for the basic scenarios resulting in segmentation faults.
This would help enormously to take precautions before releasing a library.
Can I detect a possible segmentation fault at compile time?
Sometimes, but no, you can't flawlessly detect these scenarios at compile time. Consider the general case in this C code:
volatile extern int mem[];
void foo (int access)
{
mem[access];
}
A compiler would be too noisy if it were to warn about this access at compile time, the code is valid C and a warning is, in general, inappropriate. Static analysis can't do anything with this code unless you have a mechanism for whole-program or link-time analysis.
An additonal optimization flag in GCC 4.8 which can sometimes catch a few out-of-bounds access in loops is `-faggressive-loop-optimizations'. This found a number of issues in the SPEC benchmark suite last year (http://blog.regehr.org/archives/918)
I understand the circumstance of segmentation fault. But i am curious if GCC as a compiler has some flags to check for the basic scenarios resulting in segmention faults.
GCC 4.8 comes with an address sanitizer which can help catch some of these run-time only issues (out of bounds/use-after-free bugs). You can use it with
-fsanitize=address.
http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/Debugging-Options.html#Debugging-Options
GCC 4.9 (which will be released within the next few months) comes with an undefined behaviour sanitizer and more aggressive optimization of NULL pointer paths, which might help you catch some more issues. When it comes, it will be available with -fsanitize=undefined
http://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html#Debugging-Options
Note however that neither of these are "compile-time" solutions, they both rely on instrumenting the binary and performing run-time checks.
Yes, there are ways of detecting some faults that may cause runtime errors such as segmentation faults. Those ways are called warnings. Many warnings messages are places where you have undefined behavior, and undefined behavior is often the leading cause of runtime crashes.
When I build, I always use the -Wall, -Wextra and -pedantic flags.
Other than that, there are really no good way of detecting all places that may cause segmentation faults (or other runtime errors), except strict coding guidelines, code reviews and plenty of testing.
gcc -Wall -Werror as mention by Joachim Pileborg are very good ideas. You could also use another compiler maybe. some reports more memory issues. I think you can not do a lot more at compile time.
At running time, I highly recommend Valgrind, which is a amazing tool for detecting memory issues. (don't forget to compile with the -g option)
Can I detect a possible segmentation fault at compile-time?
Yes, it is possible. Unfortunately, it is very limited what the compiler can do. Here is a buggy code example and the output from gcc and clang:
#include <stdlib.h>
int main() {
int a[4];
int x, y;
a[5]=1;
if(x)
y = 5;
x = a[y];
int* p = malloc(3*sizeof(int));
p[5] = 0;
free(p);
free(p);
}
For this buggy code, gcc -Wall -Wextra corrupt.c gives
corrupt.c: In function ‘main’:
corrupt.c:13:1: warning: control reaches end of non-void function [-Wreturn-type]
corrupt.c:6:7: warning: ‘x’ is used uninitialized in this function [-Wuninitialized]
clang catches more:
corrupt.c:5:5: warning: array index 5 is past the end of the array (which contains 4 elements) [-Warray-bounds]
a[5]=1;
^ ~
corrupt.c:3:5: note: array 'a' declared here
int a[4];
^
corrupt.c:6:8: warning: variable 'y' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
if(x)
^
corrupt.c:8:11: note: uninitialized use occurs here
x = a[y];
^
corrupt.c:6:5: note: remove the 'if' if its condition is always true
if(x)
^~~~~
corrupt.c:4:13: note: initialize the variable 'y' to silence this warning
int x, y;
^
= 0
corrupt.c:6:8: warning: variable 'x' is uninitialized when used here [-Wuninitialized]
if(x)
^
corrupt.c:4:10: note: initialize the variable 'x' to silence this warning
int x, y;
^
= 0
3 warnings generated.
I believe the above code example gives you insight what to expect. (Even though I tried, I could not get the static analyzer in clang to work.)
This would help enormously to take precautions before releasing a library.
As you can see above, it won't be an enormous help, unfortunately. I can only confirm that instrumentation is currently the best way to debug your code. Here is another code example:
#include <stdlib.h>
int main() {
int* p = malloc(3*sizeof(int));
p[5] = 0; /* line 4 */
free(p);
p[1]=42; /* line 6 */
free(p); /* line 7 */
}
Compiled as clang -O0 -fsanitize=address -g -Weverything memsen.c. (GCC 4.8 also has address santizier but I only have gcc 4.7.2.) The output:
==3476==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60200000f004 at pc 0x4887a7 bp 0x7fff9544be30 sp 0x7fff9544be28
WRITE of size 4 at 0x60200000f004 thread T0
#0 0x4887a6 in main /home/ali/tmp/memsen.c:4
[...]
Awesome, we know what went wrong (heap-buffer-overflow) and where (in main /home/ali/tmp/memsen.c:4). Now, I comment out line 4 and get:
==3481==ERROR: AddressSanitizer: heap-use-after-free on address 0x60200000eff4 at pc 0x4887d7 bp 0x7fff27a00d50 sp 0x7fff27a00d48
WRITE of size 4 at 0x60200000eff4 thread T0
#0 0x4887d6 in main /home/ali/tmp/memsen.c:6
[...]
Again, we see what went wrong and where. Finally, I comment out line 6.
==3486==ERROR: AddressSanitizer: attempting double-free on 0x60200000eff0 in thread T0:
#0 0x46dba1 in free /home/ali/llvm/llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:65
#1 0x48878c in main /home/ali/tmp/memsen.c:7
[...]
Also caught the problem.
If your code has tests, or at least you can run your code with different inputs on your machine before releasing the library, you could probably track down a significant portion of the bugs. Unfortunately, it is not a compile-time solution and you probably don't want to release instrumented code (code compiled with -fsanitize=* flag). So if the user runs your code with an input that triggers a bug, the program will still crash with a segmentation fault.
We want to start using -Wall -Werror on a large project.
Due to the size, this change has to be phased, and we want to start with the most important warnings first.
The best way to do it seems to be using -Wall -Werror, with exceptions for specific warnings. The exceptional warnings are those which we have a lot of (so fixing them all is hard and risky), and we don't consider them very dangerous.
I'm not saying we don't want to fix all these warnings - just not on the first phase.
I know two ways to exclude a warning from -Werror - the best is -Wno-error=xxx, and if it doesn't work - -Wno-xxx (of course, we prefer to see the warning and ignore it, rather than hide it).
My problem is with warnings which are enabled by default, and don't have a -Wxxx flag related to them. I couldn't find any way to alllow them when -Werror is used.
I'm specifically concerned about two specific warnings. Here's a program that exhibits them and the compiler output:
#include <stdio.h>
void f(int *p) { printf("%p\n", p); }
int main(int argc, char *argv[]) {
const int *p = NULL;
const unsigned int *q = NULL;
f(p); /* Line 7: p is const, f expects non const */
if (p == q) { /* Line 8: p is signed, q is unsigned */
printf("Both NULL\n");
}
return 0;
}
% gcc warn.c
warn.c: In function 'main':
warn.c:7: warning: passing argument 1 of 'f' discards qualifiers from pointer target type
warn.c:8: warning: comparison of distinct pointer types lacks a cast
I know the best solution is to fix these warnings, but it's much easier said than done. In order for this change to be successful, we have to do this phased, and can't do too many changes at once.
Any suggestions?
Thanks.
What about phasing on a compilation unit/module/library basis instead of per warning? Is triggering a subtarget compilation an option (a good-enough build system in place)?
It might be folly, but ...
Why not a simple grep ?
something like
gcc teste.c 2>&1 | grep -v 'comparison of distinct' | grep -v 'some_other_string'
You probably want to hide these greps in a script, and call the script from your makefile instead of gcc
According to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43245 it will be "-Wdiscarded-qualifiers", but since the bug-fixed entry is from May 1, 2014, the gcc compiler you are using might not support it.