I am writing a C program where I want to trigger an exception such that the program crashes on Windows and invokes the Windows Error Reporting.
I thought one good way to trigger a crash would be to divide by zero. However, when I try that, the compiler refuses to compile the program and reports error C2124.
Is there a way that I can force the compiler to compile the program and ignore the above divide by zero statement in the code?
Or should I find another way to crash my program? :)
#include <stdio.h>
int main(int argc, char **argv)
{
return 1/0;
}
abort, defined in <stdlib.h>, is the preferred way to cause abnormal program termination. However, if you specifically want a divide-by-zero, you can conceal the divisor from the compiler:
volatile int x = 0;
return 1/x;
The keyword volatile tells the compiler that x may be changed in ways unknown to the compiler. This prevents the compiler from knowing that it is zero, so the compiler must generate code to perform the division at run-time in case x is changed to something other than zero.
Of course, the behavior of division by zero is not defined, so it is not guaranteed to cause program termination.
Also keep in mind that “If you lie to the compiler, it will get its revenge.” (Henry Spencer).
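A minimal sketch of the volatile approach, wrapped in a helper so the division can also be exercised with harmless values. The name divide_hidden is an assumption made for illustration, not something from the question. Calling it with a divisor of 0 is the intended crash trigger, but as noted above the behavior in that case is undefined and a crash is not guaranteed.

```c
/* Hypothetical helper: divides through a volatile copy of the divisor,
   so the compiler cannot fold the division at compile time. */
int divide_hidden(int numerator, int divisor)
{
    volatile int hidden = divisor;  /* forces a real run-time read */
    return numerator / hidden;      /* undefined behavior if hidden == 0 */
}
```

In main, return divide_hidden(1, 0); would then attempt the division at run time.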
1/0 is evaluated at compile time. The compiler is smart enough to detect that it's not going to work.
I'd do either:
int a = 0;
return 1/a;
(you may get a warning but that's going to compile)
Or trigger a null pointer access:
int *p = NULL;
return *p;
It's hard to get a C program to crash in a well-defined way. Really it's a contradiction in terms.
Favourite ways are inducing stack overflows, null pointer dereferences, and divisions by zero. But compilers are getting better at spotting these tricks. Recursions can get unrolled into loops, and since 1/0 is a compile-time evaluable constant expression whose behaviour is undefined, a compiler is allowed to fail compilation.
A more reliable way is to use void abort(void) from <stdlib.h>. That at least is guaranteed by the C standard not to return to the caller:
#include <stdlib.h>
int main(void)
{
abort();
}
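If the crash should happen only on demand, for instance when testing error reporting, the abort call can be placed behind a check. The sketch below assumes a made-up "--crash" argument and a hypothetical helper name, crash_if_requested. Whether Windows Error Reporting is actually invoked depends on how the C runtime in use implements abort, which this sketch does not try to control.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: aborts when the argument equals "--crash",
   otherwise returns 0 so the caller can continue normally. */
int crash_if_requested(const char *arg)
{
    if (arg != NULL && strcmp(arg, "--crash") == 0) {
        abort();  /* does not return to the caller */
    }
    return 0;
}
```

A main function could call crash_if_requested(argc > 1 ? argv[1] : NULL) before doing anything else.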
Related
The program below behaves differently at different optimization levels. When I compile it with -O3, it never terminates. When I compile it with -O0, it always terminates very quickly.
#include <stdio.h>
#include <pthread.h>
void *f(void *v) {
int *i = (int *)v;
*i = 0;
printf("set to 0!\n");
return NULL;
}
int main() {
const int c = 1;
int i = 0;
pthread_t thread;
void *ptr = (void *)&c;
while (c) {
i++;
if (i == 1000) {
pthread_create(&thread, NULL, &f, ptr);
}
}
printf("done\n");
}
This is the result of running it with different optimization flags.
username#hostname:/src$ gcc -O0 main.c -o main
username#hostname:/src$ ./main
done
set to 0!
set to 0!
username#hostname:/src$ gcc -O3 main.c -o main
username#hostname:/src$ ./main
set to 0!
set to 0!
set to 0!
set to 0!
set to 0!
set to 0!
^C
username#hostname:/src$
The answer given by the professor's slide is like this:
Will it always terminate?
Depends of gcc options
With –O3 (all optimisations): no
Why?
The variable c is likely to stay local in a register, hence it will not be shared.
Solution « volatile »
Thank you for your replies. I now realize that volatile is a keyword in C. The description of the volatile keyword:
A volatile specifier is a hint to a compiler that an object may change its values in ways not specified by the language so that aggressive optimizations must be avoided.
According to my understanding, there is a shared register that stores the value of c when we use the -O0 flag, so the main thread and the sub-thread share it. In this case, if a sub-thread sets c to 0, the main thread reads 0 when it evaluates c in the while(c) statement, and the loop stops.
When we use the -O3 flag, there is no register storing c that is shared by the main thread and the sub-threads. Even though c is modified by a sub-thread, the change may stay in a register and never be written to memory, or it is written to memory while the main thread keeps using the old value it read and saved in a register. As a result, the loop is infinite.
If I declare c as const volatile int c = 1;, the program eventually terminates even when compiled with -O3. I guess all threads then read c from main memory and write it back to main memory when they change it.
I know that, according to the rules of the C language, we are not allowed to modify a value that is declared with the const keyword. But I don't understand what undefined behavior is.
I wrote a test program:
#include "stdio.h"
int main() {
const int c = 1;
int *i = &c;
*i = 2;
printf("c is : %d\n", c);
}
output
username#hostname:/src$ gcc test.c -o test
test.c: In function ‘main’:
test.c:9:14: warning: initialization discards ‘const’ qualifier from pointer target type [-Wdiscarded-qualifiers]
9 | int *i = &c;
| ^
username#hostname:/src$ ./test
c is : 2
username#hostname:/src$
The result is 2, which means a variable declared const can be modified, but doing so is discouraged, right?
I also tried changing the loop condition. If it is changed from while(c){ to while (1){, the loop is infinite regardless of whether I use -O0 or -O3.
This program is not a good one as it violates the specifications or rules of C language. Actually it comes from the lecture about software security.
Can I understand it like this? All threads share the same register storing c when we compile the program with -O0.
With -O3, the value of c lives in unshared registers, so the main thread is not informed when a sub-thread modifies c. Or while(c){ is replaced by while(1){ under -O3, so the loop is infinite.
I know this question can be solved easily if I check the generated assembly code. But I am not good at it.
This is undefined behavior. Per 6.7.3 Type qualifiers, paragraph 6 of the (draft) C11 standard:
If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined.
There's no requirement for any particular behavior on the program. How it behaves is literally outside the specifications of the C language.
Your professor's observation of how it behaves may be correct. But he goes off the rails. There is no "why" for undefined behavior. What happens can change with changes to compiler options, particulars of the source code, time of day, or phase of the moon. Anything. Any expectation for any particular behavior is unfounded.
And
Solution « volatile »
is flat-out WRONG.
volatile does not provide sufficient guarantees for multithreaded access. See Why is volatile not considered useful in multithreaded C or C++ programming?.
volatile can appear to "work" because of particulars of the system, or just because any race conditions just don't happen to be triggered in an observable manner, but that doesn't make it correct. It doesn't "work" - you just didn't observe any failure. "I didn't see it break" does not mean "it works".
Note that some C implementations do define volatile much more extensively than the C standard requires. Microsoft in particular defines volatile much more expansively, making volatile much more effective and even useful and correct in multithreaded programs.
But that does not apply to all C implementations. And if you read that link, you'll find it doesn't even apply to Microsoft-compiled code running on ARM hardware...
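For comparison, here is a sketch of the same stop-the-loop idea written with C11 atomics from <stdatomic.h>, which is one portable way to share a flag between threads. This is a swapped-in technique rather than anything the professor's slide or the question proposes, and the names keep_running, clear_flag and spin_until_cleared are invented for the example. The flag is an ordinary non-const object, so writing to it is well defined.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Shared flag: atomic, and not const, so another thread may clear it. */
static atomic_int keep_running = 1;

/* Thread body: clears the flag so the spinning thread can leave its loop. */
static void *clear_flag(void *unused)
{
    (void)unused;
    atomic_store(&keep_running, 0);
    return NULL;
}

/* Spins until the helper thread clears the flag.
   Returns the number of loop iterations, or -1 if no thread could be created. */
long spin_until_cleared(void)
{
    pthread_t thread;
    long iterations = 0;

    atomic_store(&keep_running, 1);
    while (atomic_load(&keep_running)) {
        iterations++;
        if (iterations == 1000) {
            if (pthread_create(&thread, NULL, clear_flag, NULL) != 0) {
                return -1;
            }
        }
    }
    pthread_join(thread, NULL);  /* the loop only ends after the thread ran */
    return iterations;
}
```

Because every read of the flag is an atomic load, the compiler may not replace the loop condition with a constant, whatever the optimization level. On some toolchains the program has to be built with -pthread.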
The professor's explanation is not quite right.
The initial value of c is 1, which is truthy. It's declared as a constant, so its value can't change. Thus, the condition in while (c) is guaranteed to always be true, so there's no need to test the variable at all when the program is running. Just generate code for an infinite loop.
This optimization of not reading the variable is not done when optimization is disabled. In practice, declaring the variable volatile also forces it to be read whenever the variable is referenced in code.
Note that optimizations are implementation-dependent. Assigning to a const variable by accessing it through a non-const pointer results in undefined behavior, so any result is possible.
The typical use of a const volatile variable is for variables that reference read-only hardware registers that can be changed asynchronously (e.g. I/O ports on microcontrollers). This allows the application to read the register but code that tries to assign to the variable will not compile.
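A small sketch of that const volatile pattern follows. Since no real hardware is available here, an ordinary variable named fake_status_register stands in for the device register. That variable, the function name device_ready and the meaning of bit 0 are all assumptions made for illustration.

```c
#include <stdint.h>

/* Stand-in for a memory-mapped status register. On real hardware this
   would be a fixed address taken from the device documentation. */
static uint32_t fake_status_register = 0x1u;

/* The pointee is const (this code may not write through the pointer)
   and volatile (each call performs an actual read of the location). */
int device_ready(const volatile uint32_t *status)
{
    return (*status & 0x1u) != 0;  /* bit 0 is assumed to mean "ready" */
}
```

Inside device_ready, an assignment such as *status = 0; would be rejected by the compiler, while something outside the function (here the test code, on hardware the device itself) can still change the value between reads.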
The explanation of "The variable c is likely to stay local in a register, hence it will not be shared." is not quite right. Or I'm having trouble parsing its precise meaning.
Once you take a pointer to it, the compiler has to put it into memory, unless it can convince itself that the pointer will not be used.
Here https://godbolt.org/z/YavbYxqoE
mov DWORD PTR [rsp+4], 1
and
lea rcx, [rsp+4]
suggest to me that the compiler has put the variable on the stack.
It's just that the while loop is not checking it for changes due to it being advertised as const.
In C, this pattern is fairly common:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int init_ptr_or_return_err(int *p) {
srand(time(NULL));
// random to make code compile/demonstrate the question
int error = rand() % 2;
if (error) {
return 1;
}
*p = 10;
return 0;
}
int main() {
int a;
init_ptr_or_return_err(&a);
printf("a = %d\n", a);
return 0;
}
In the main function above, without checking the return code of the function, accessing the value of a might be undefined at runtime (but this is not statically determinable). So, it is usually wrapped in a block such as:
if (init_ptr_or_return_err(&a)) {
// error handling
} else {
// access a
}
In this case, the compiler knows that a is initialized in the else because the function returns 0 if and only if it sets a. So, technically, accessing a in the else is defined, but accessing a in the if is undefined. However, return 0 could easily be "return some fixed, but statically unknown value from a file" (and then check just that value before accessing a). So in either case, it isn't statically determinable whether a is initialized or not.
Therefore, it seems to me like in general, the compiler cannot statically decide if this is undefined behavior or not and therefore should not be able to e.g. optimize it out.
What are the exact semantics of such code (is it undefined behavior, something else, or is there a difference between static and runtime undefined behavior) and where does the standard specify this? If this is not defined by the standard, I am using gcc, so answers in the context of gcc would be helpful.
The vast majority of undefined behavior is not statically determinate. Most undefined behavior is of the form "if this statement is reached, and these conditions are met, the program has undefined behavior".
That's the case here. When the program is invoked at a time such that rand() returns an odd number, it has undefined behavior. When it's invoked at a time such that rand() returns an even number, the behavior is well-defined.
Further, the compiler is free to assume you will only invoke the program at a time when rand() returns an even number. For example it might optimize out the branch to the return 1; case, and thereby always print 10.
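To make the well-defined path concrete, here is a sketch in which the failure is passed in explicitly instead of coming from rand(), and the value is read only after the return code has been checked. The names init_or_error and value_or_fallback are hypothetical variants of the question's function, not part of the original code.

```c
/* Hypothetical variant of init_ptr_or_return_err with the failure made
   explicit. Returns 0 on success and 1 on error. */
int init_or_error(int should_fail, int *out)
{
    if (should_fail) {
        return 1;   /* *out is left untouched */
    }
    *out = 10;
    return 0;
}

/* Reads the value only on the success path; returns fallback otherwise. */
int value_or_fallback(int should_fail, int fallback)
{
    int a;          /* deliberately uninitialized, as in the question */

    if (init_or_error(should_fail, &a) != 0) {
        return fallback;   /* a is never read on this path */
    }
    return a;              /* a was assigned by init_or_error */
}
```

On every path through value_or_fallback, a is either assigned before it is read or never read at all, so the question of undefined behavior does not arise.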
This code does not necessarily invoke undefined behavior. It is a wide-spread myth that reading an uninitialized variable always invokes undefined behavior. UB only occurs in two special cases:
Reading uninitialized variables that didn't have their address taken, or
Running on exotic systems with trap representations for plain integers, where an indeterminate value could be a trap.
On mainstream 2's complement platforms (x86/x64, ARM, PowerPC, almost anything...), this code will merely use an unspecified value and invoke unspecified behavior. This is because the variable had its address taken. This is explained in detail here.
Meaning that the result isn't reliable, but the code will execute as expected, without optimizations going bananas etc.
Indeed the compiler will most likely not optimize out the function, but that is because of the time() call and not because of some poorly-defined behavior.
I've got the following code in C
#include <stdio.h>
#include <limits.h>
long checks();
void main() {
int results = checks();
printf("%d", results);
}
long checks(){
return LLONG_MAX;
}
It gives the output of -1
Despite the function prototype being declared, how is this file even compiling?
int results = checks(); is supposed to give an error!
Moreover, the return types don't match!
In C, doesn't the return type of a function have to be the same as the data type of the value it returns?
C is very "forgiving" with things like that. Implicit type conversions are usually not errors in C. Of course, this is a philosophical issue, the modern answer to which is diametrically opposite to what it used to be when the first C compilers were being written. Nowadays we do not call this forgiving, we actually call it extremely unforgiving, because it lets errors go undetected.
You can have most modern C compilers issue warnings for things like that, but you need to examine the documentation of your compiler to figure out how to enable warnings. (It might be something like "-Wall".)
For anyone looking for a quick fix,
Most of the things already said by Mike....
It should give a warning regarding this implicit conversion...
I had to enable -Wconversion flag for the warnings to show up :)
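When the narrowing is intentional, it can be made explicit instead of being left to the implicit conversion. The sketch below clamps a long into the range of int before casting. The function name clamp_long_to_int is an assumption for this example, and clamping is only one possible policy (reporting an error would be another).

```c
#include <limits.h>

/* Converts a long to an int, saturating at the limits of int instead of
   relying on the implementation-defined result of an out-of-range
   conversion. */
int clamp_long_to_int(long value)
{
    if (value > INT_MAX) {
        return INT_MAX;
    }
    if (value < INT_MIN) {
        return INT_MIN;
    }
    return (int)value;  /* in range, so the cast preserves the value */
}
```

With this helper, int results = clamp_long_to_int(checks()); states the intent in the code, and the conversion no longer depends on how the platform truncates.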
I'm not sure I quite understand the extent to which undefined behavior can jeopardize a program.
Let's say I have this code:
#include <stdio.h>
int main()
{
int v = 0;
scanf("%d", &v);
if (v != 0)
{
int *p;
*p = v; // Oops
}
return v;
}
Is the behavior of this program undefined for only those cases in which v is nonzero, or is it undefined even if v is zero?
I'd say that the behavior is undefined only if the user enters a number different from 0. After all, if the offending code section is not actually run, the conditions for UB aren't met (i.e. the uninitialized pointer is neither created nor dereferenced).
A hint of this can be found in the standard, at 3.4.3:
behavior, upon use of a nonportable or erroneous program construct or of erroneous data,
for which this International Standard imposes no requirements
This seems to imply that, if such "erroneous data" was instead correct, the behavior would be perfectly defined - which seems pretty much applicable to our case.
Additional example: integer overflow. Any program that does an addition with user-provided data without doing extensive check on it is subject to this kind of undefined behavior - but an addition is UB only when the user provides such particular data.
Since this has the language-lawyer tag, I have an extremely nitpicking argument that the program's behavior is undefined regardless of user input, but not for the reasons you might expect -- though it can be well-defined (when v==0) depending on the implementation.
The program defines main as
int main()
{
/* ... */
}
C99 5.1.2.2.1 says that the main function shall be defined either as
int main(void) { /* ... */ }
or as
int main(int argc, char *argv[]) { /* ... */ }
or equivalent; or in some other implementation-defined manner.
int main() is not equivalent to int main(void). The former, as a declaration, says that main takes a fixed but unspecified number and type of arguments; the latter says it takes no arguments. The difference is that a recursive call to main such as
main(42);
is a constraint violation if you use int main(void), but not if you use int main().
For example, these two programs:
int main() {
if (0) main(42); /* not a constraint violation */
}
int main(void) {
if (0) main(42); /* constraint violation, requires a diagnostic */
}
are not equivalent.
If the implementation documents that it accepts int main() as an extension, then this doesn't apply for that implementation.
This is an extremely nitpicking point (about which not everyone agrees), and is easily avoided by declaring int main(void) (which you should do anyway; all functions should have prototypes, not old-style declarations/definitions).
In practice, every compiler I've seen accepts int main() without complaint.
To answer the question that was intended:
Once that change is made, the program's behavior is well defined if v==0, and is undefined if v!=0. Yes, the definedness of the program's behavior depends on user input. There's nothing particularly unusual about that.
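For contrast, here is a sketch of the same logic with the pointer aimed at real storage, so that both the zero and nonzero cases are well defined. The helper name store_if_nonzero is made up for the example, and the scanf call is left out so the function can be called with any value directly.

```c
/* Hypothetical helper mirroring the body of main in the question, with
   p pointing at valid storage before it is dereferenced. */
int store_if_nonzero(int v)
{
    int storage = 0;

    if (v != 0) {
        int *p = &storage;  /* p now refers to a live object */
        *p = v;             /* well defined: writes to storage */
    }
    return storage;
}
```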
Let me give an argument for why I think this is still undefined.
First, the responders saying this is "mostly defined" or somesuch, based on their experience with some compilers, are just wrong. A small modification of your example will serve to illustrate:
#include <stdio.h>
int
main()
{
int v;
scanf("%d", &v);
if (v != 0)
{
printf("Hello\n");
int *p;
*p = v; // Oops
}
return v;
}
What does this program do if you provide "1" as input? If your answer is "It prints Hello and then crashes", you are wrong. "Undefined behavior" does not mean the behavior of some specific statement is undefined; it means the behavior of the entire program is undefined. The compiler is allowed to assume that you do not engage in undefined behavior, so in this case, it may assume that v is zero and simply not emit any of the bracketed code at all, including the printf.
If you think this is unlikely, think again. GCC may not perform this analysis exactly, but it does perform very similar ones. My favorite example that actually illustrates the point for real:
int test(int x) { return x+1 > x; }
Try writing a little test program to print out INT_MAX, INT_MAX+1, and test(INT_MAX). (Be sure to enable optimization.) A typical implementation might show INT_MAX to be 2147483647, INT_MAX+1 to be -2147483648, and test(INT_MAX) to be 1.
In fact, GCC compiles this function to return a constant 1. Why? Because integer overflow is undefined behavior, therefore the compiler may assume you are not doing that, therefore x cannot equal INT_MAX, therefore x+1 is greater than x, therefore this function can return 1 unconditionally.
Undefined behavior can and does result in variables that are not equal to themselves, negative numbers that compare greater than positive numbers (see above example), and other bizarre behavior. The smarter the compiler, the more bizarre the behavior.
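A check that stays within defined behavior has to test the operands before doing the arithmetic, rather than inspecting the result afterwards. A minimal sketch, with function names invented for illustration:

```c
#include <limits.h>

/* Returns 1 if x + 1 can be computed without overflow, 0 otherwise. */
int can_increment(int x)
{
    return x < INT_MAX;
}

/* Returns 1 if a + b would overflow int. The comparison is arranged so
   that no intermediate expression can itself overflow. */
int add_would_overflow(int a, int b)
{
    if (b > 0) {
        return a > INT_MAX - b;
    }
    if (b < 0) {
        return a < INT_MIN - b;
    }
    return 0;
}
```

Unlike test(INT_MAX), can_increment(INT_MAX) never evaluates an overflowing expression, so the optimizer has no undefined case to exploit.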
OK, I admit I cannot quote chapter and verse of the standard to answer the exact question you asked. But people who say "Yeah yeah, but in real life dereferencing NULL just gives a seg fault" are more wrong than they can possibly imagine, and they get more wrong with every compiler generation.
And in real life, if the code is dead you should remove it; if it is not dead, you must not invoke undefined behavior. So that is my answer to your question.
If v is 0, your random pointer assignment never gets executed, and the function will return zero, so it is not undefined behaviour.
When you declare variables (especially explicit pointers), a piece of memory is allocated (usually the size of an int). This piece of memory is marked as free to the system, but the old value stored there is not cleared (this depends on the memory allocation implemented by the compiler; it might fill the place with zeroes), so your int *p will hold a random value (junk) which it has to interpret as an address. The result is the place in memory that p points to (p's pointee). When you try to dereference it (i.e. access that piece of memory), it will almost always belong to something else, so trying to modify that memory will result in an access violation raised by the memory manager.
So in this example, any value other than 0 will result in undefined behavior, because no one knows what p will point to at that moment.
I hope this explanation is of any help.
Edit: Ah, sorry, again few answers ahead of me :)
It is simple. If a piece of code doesn't execute, it doesn't have a behavior!!!, whether defined or not.
If input is 0, then the code inside if doesn't run, so it depends on the rest of the program to determine whether the behavior is defined (in this case it is defined).
If input is not 0, you execute code that we all know is a case of undefined behavior.
I would say it makes the whole program undefined.
The key to undefined behavior is that it is undefined. The compiler can do whatever it wants when it sees that statement. In practice, most compilers will handle it as expected, but they still have every right to do whatever they want, including changing parts unrelated to it.
For example, a compiler may choose to add a message "this program may be dangerous" to the program if it detects undefined behavior. This would change the output whether or not v is 0.
Your program is pretty-well defined. If v == 0 then it returns zero. If v != 0 then it splatters over some random point in memory.
p is a pointer, its initial value could be anything, since you don't initialise it. The actual value depends on the operating system (some zero memory before giving it to your process, some don't), your compiler, your hardware and what was in memory before you ran your program.
The pointer assignment is just writing into a random memory location. It might succeed, it might corrupt other data or it might segfault - it depends on all of the above factors.
As far as C goes, it's pretty well defined that uninitialised variables do not have a known value, and your program (though it might compile) will not be correct.
I'm working on a C-brain teaser: Write the standard Hello-World program, without semi-colons.
My best answer so far is:
int main(void)
{
if (printf("Hello World!\n"), exit(0), 0)
{
/* do nothing */
}
}
But I don't understand why I don't get compiler error (Visual Studio):
error C4716: 'main' : must return a value
I've tried other functions with a return-type declared, but missing a return-statement, and get this compiler error.
Note that I've also tried:
int foo(void)
{
if (printf("Hello World!\n"), exit(0), true)
{
/* do nothing */
}
}
int main(void)
{
foo();
}
And don't get a compiler error on foo. If I remove the "exit(0)", I do get the compiler error. Apparently the compiler has knowledge that "exit" is a special function? This seems very odd to me.
As Jens pointed out in a comment, the posted code does not exhibit undefined behavior. The original answer here isn't correct and doesn't even really seem to answer the question anyway (on re-reading everything a few years later).
The question can be summed up as, "why doesn't MSVC issue warning C4716 for main() in the same circumstances it would for other functions"?
Note that diagnostic C4716 is a warning, not an error. As far as the C language is concerned (from a standards point of view anyway), there's never a requirement to diagnose a non-error. But that doesn't really explain why there's a difference; it's just a technicality that may mean you can't complain too much...
The real explanation for why MSVC doesn't issue the warning for main() when it does for other functions can really only be answered by someone on the MSVC team. As far as I can tell, the docs do not explain the difference, but maybe I missed something; so all I can do is speculate:
In C++, the main() function is treated specially in that there's an implicit return 0; just before the closing brace.
I suspect that Microsoft's C compiler provides the same treatment when it's compiling in C mode (if you look at the assembly code, the EAX register is cleared even if there's no return 0;), therefore as far as the compiler is concerned there is no reason to issue warning C4716. Note that Microsoft's C mode is C90 compliant, not C99 compliant. In C90 'running off the end' of main() has undefined behavior. However, always returning 0 meets the low requirements of undefined behavior, so there's no problem.
So even if the program in the question did run off the end of main() (resulting in undefined behavior), there still wouldn't be a warning.
Original, not so good answer:
In ANSI/ISO 90 C, this is undefined behavior, so MS really should produce an error (but they aren't required to by the standard). In C99 the standard permits an implied return at the end of main() - as does C++.
So if this is compiled as C++ or C99, there's no error and it's the same as return 0;. C90 results in undefined behavior (which does not require a diagnostic).
Interestingly (well, maybe not), of the several compilers (VC9, VC6, GCC 3.4.5, Digital Mars, Comeau) I tried this on with my basic, mostly default options set (the environment I pretty much always use for quick-n-dirty testing of code snippets) the only compiler that warns about the missing return statement is VC6 when compiling as a C++ program (VC6 does not complain when compiling for C).
Most of the compilers complain (a warning or error) if the function is not named main. Digital Mars when compiling for C does not and GCC doesn't for C or C++.
If you don't return anything, the program will return 0.
See http://www.research.att.com/~bs/bs_faq2.html#void-main
The compiler may be smart enough to know that exit(0) is being called, which never returns, so a return statement is not needed.
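One way a compiler can know that a function never returns is an explicit annotation. The sketch below uses the C11 noreturn macro from <stdnoreturn.h> on a hypothetical helper named fail. Whether MSVC treats exit this way internally is not something the docs quoted here confirm, so take this only as an illustration of the general mechanism.

```c
#include <stdio.h>
#include <stdlib.h>
#include <stdnoreturn.h>

/* Hypothetical helper: prints a message and terminates the program.
   The noreturn specifier tells the compiler control never comes back. */
noreturn void fail(const char *message)
{
    fprintf(stderr, "%s\n", message);
    exit(EXIT_FAILURE);
}

/* Every path either returns a value or calls a noreturn function, so
   control never reaches the closing brace without a value. */
int digit_value(char c)
{
    if (c >= '0' && c <= '9') {
        return c - '0';
    }
    fail("not a digit");
}
```

A compiler that honors the annotation has no reason to warn about a missing return at the end of digit_value.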
From http://msdn.microsoft.com/en-us/library/k9dcesdd(VS.71).aspx
C++ Language Reference
exit Function
The exit function, declared in the standard include file STDLIB.H, terminates a C++ program.
The value supplied as an argument to exit is returned to the operating system as the program's return code or exit code. By convention, a return code of zero means that the program completed successfully.
Note: You can use the constants EXIT_FAILURE and EXIT_SUCCESS, defined in STDLIB.H, to indicate success or failure of your program.
Issuing a return statement from the main function is equivalent to calling the exit function with the return value as its argument.
Because it's not an error -- it's undefined behavior. See section 6.9.1, paragraph 12 of the C99 standard:
If the } that terminates a function is reached, and the value of the function call is used by the caller, the behavior is undefined.
Thus, the compiler is free to do whatever it wants when seeing your code failing to return -- it can emit an error, a warning, or nothing at all. In the case of GCC, it by default compiles successfully with no warnings or errors. With the -Wall option, it emits a warning.
main() is special in C: it's the only function that's allowed not to return a value. The C standard says that if control reaches the end of main() without a return statement, it implicitly returns 0. This is only true for main(), all other non-void functions must return a value.
Section 5.1.2.2.3:
If the return type of the main function is a type compatible with int, a return from the initial call to the main function is equivalent to calling the exit function with the value returned by the main function as its argument; reaching the } that terminates the main function returns a value of 0. If the return type is not compatible with int, the termination status returned to the host environment is unspecified.
Not returning from main, or more precisely, not reaching the terminating '}' of the main function is perfectly fine for a variety of reasons. Typical cases include looping forever and exiting or aborting before.
In this case, the exit(0) is guaranteed to be executed before the end of main is reached. Why should the compiler warn? You wouldn't expect a warning for these, would you?
int main (void) { for (;;) { /* do something useful */ } }
int main (void) { /* do something */; exit (0); }
I'd even be surprised if
int main (void)
{
if (printf("Hello World!\n"), exit(0), true)
{
/* do nothing */
}
return 0;
}
wouldn't cause warning: unreachable code: return 0 or somesuch.