Trapping floating-point overflow in C - c

I am trying to trap floating-point overflow in C. Here is what I have tried
#define _GNU_SOURCE
#include <fenv.h>
#include <signal.h>
#include <stdio.h>
void catch_overflow (int sig) {
printf ("caught division by zero\n");
signal (sig, catch_overflow);
}
int main(void) {
feenableexcept(FE_DIVBYZERO);
signal (FE_DIVBYZERO, catch_overflow);
float a = 1., b = 0.; float c = a/b; return 0; }
I expected to see "caught division by zero" message, but I only get a core dump message, "Floating point exception (core dumped)". How could I modify the program to get the "caught division by zero" message?
Thanks.

A floating point exception is translated in machine- and OS-dependent way to some signaling approach. But no such way allows mixing of different constant spaces, as FE_xxx and SIGxxx. Using FE_DIVBYZERO for signal(), you really caught SIGILL which is not generated for floating-point errors (because you did not specify OS, I was free to choose any - Linux in my case).
For your program, I have made this exception catching working under Linux with two changes:
Setting signal number to handle to SIGFPE.
Declaring a, b, c as volatile (prevents compiler from calculating c as INF at compile time).
After this, the program fell into eternal cycle printing "caught division by zero" because signal handler for such error restores execution to the same command. You shall carefully consider how to fix such errors without making an erroneous instruction eternal. For example, you can use longjmp to go out of signal handler to an installed exception handler.
And, you still should consult your target OS manuals for how to catch FP errors.
(This has been already described in comments in main parts - but I considered to form an explicit answer.)

Related

How to trap floating-point exceptions on M1 Macs?

The way to trap floating-point exceptions is architecture-dependent. This is code I have tested successfully on an Intel (x86) Mac: it takes the square root of a negative number twice, once before, and once after, enabling floating-point exception trapping. The second time, fpe_signal_handler() is called.
#include <cmath> // for sqrt()
#include <csignal> // for signal()
#include <iostream>
#include <xmmintrin.h> // for _mm_setcsr
void fpe_signal_handler(int /*signal*/) {
std::cerr << "Floating point exception!\n";
exit(1);
}
void enable_floating_point_exceptions() {
_mm_setcsr(_MM_MASK_MASK & ~_MM_MASK_INVALID);
signal(SIGFPE, fpe_signal_handler);
}
int main() {
const double x{-1.0};
std::cout << sqrt(x) << "\n";
enable_floating_point_exceptions();
std::cout << sqrt(x) << "\n";
}
Compiling with the apple-clang compiler provided by Xcode
clang++ -g -std=c++17 -o fpe fpe.cpp
and running gives the following expected output:
nan
Floating point exception!
I would like to write an analogous program that does the same thing as the above program on an M1 (arm64) Mac. I tried the following:
#include <cfenv> // for std::fenv_t
#include <cmath> // for sqrt()
#include <csignal> // for signal()
#include <fenv.h> // for fegetenv(), fesetenv()
#include <iostream>
void fpe_signal_handler(int /*signal*/) {
std::cerr << "Floating point exception!\n";
exit(1);
}
void enable_floating_point_exceptions() {
std::fenv_t env;
fegetenv(&env);
env.__fpcr = env.__fpcr | __fpcr_trap_invalid;
fesetenv(&env);
signal(SIGFPE, fpe_signal_handler);
}
int main() {
const double x{-1.0};
std::cout << sqrt(x) << "\n";
enable_floating_point_exceptions();
std::cout << sqrt(x) << "\n";
}
It almost works: After compiling with the apple-clang compiler provided by Xcode
clang++ -g -std=c++17 -o fpe fpe.cpp
I get the following output:
nan
zsh: illegal hardware instruction ./fpe
I tried adding the -fexceptions flag, but that didn't make a difference. I noticed that the ARM Compiler toolchain "does not support floating-point exception trapping for AArch64 targets," but I'm not sure if this applies to M1 Macs with Apple's toolchain.
Am I correct that the M1 Mac hardware just doesn't support floating-point exception trapping? Or is there a way to modify this program so it traps the second floating-point exception and then calls fpe_signal_handler()?
Synchronously testing for exceptions within the same thread does work fine, using ISO C fetestexcept from <fenv.h> as in the cppreference example. The problem here is getting FP exceptions to actually trap so the OS delivers SIGFPE, instead of just setting sticky flags in the FP environment.
Based on the long discussion below (many thanks to the patience of #Peter Cordes), it seems that with MacOS on Aarch64, unmasking FP exceptions and then generating the corresponding bad FP math results in a SIGILL rather than a SIGFPE. A signal code ILL_ILLTRP can be detected in the handler.
#include <fenv.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
static void
fpe_signal_handler( int sig, siginfo_t *sip, void *scp )
{
int fe_code = sip->si_code;
printf("In signal handler : ");
if (fe_code == ILL_ILLTRP)
printf("Illegal trap detected\n");
else
printf("Code detected : %d\n",fe_code);
abort();
}
void enable_floating_point_exceptions()
{
fenv_t env;
fegetenv(&env);
env.__fpcr = env.__fpcr | __fpcr_trap_invalid;
fesetenv(&env);
struct sigaction act;
act.sa_sigaction = fpe_signal_handler;
sigemptyset (&act.sa_mask);
act.sa_flags = SA_SIGINFO;
sigaction(SIGILL, &act, NULL);
}
void main()
{
const double x = -1;
printf("y = %f\n",sqrt(x));
enable_floating_point_exceptions();
printf("y = %f\n",sqrt(x));
}
This results in :
y = nan
In signal handler : Illegal trap detected
Abort trap: 6
Other floating point exceptions can be detected in a similar manner, e.g unmasking __fpcr_trap_divbyzero, and then generating double x=0; double y=1/x. If the exception is not unmasked then the program terminates normally.
Without a SIGFPE, however, it doesn't seem possible to detect exactly which floating point operation triggered the signal handler. One can imagine unmasking all exceptions (e.g. using FE_ALL_EXCEPT). Then, when a bad math op generates a SIGILL, we don't have enough information in the handler to know what operation sent the signal. Unmasking all exceptions and playing around a bit with fetestexcept in the handler didn't produce very reliable results. This might take some more investigation.
You can decode the nature of the FPU exception in the SIGILL handler by casting the third argument of your sigaction handler to a struct ucontext*, and then decoding scp->uc_mcontext->__es.esr in your handler. This is the value of the ESR_ELx, Exception Syndrome Register (ELx) register. If its top 6 bits (EC) are 0b101100, then the signal was triggered by a trapping AArch64 FPU operation. If in that case bit 23 of that register (TFV) is also 1, then the register's lower 7 bits will match what the lower 7 bits of the fpsr register would have been in case trapping would have been disabled (i.e., the kind of FPU exception triggered by the instruction).
See section D17.2.37 of the ARMv8 architecture manual for more information ("ESR_EL1, Exception Syndrome Register (EL1)").

How to force a crash in C, is dereferencing a null pointer a (fairly) portable way?

I'm writing my own test-runner for my current project. One feature (that's probably quite common with test-runners) is that every testcase is executed in a child process, so the test-runner can properly detect and report a crashing testcase.
I want to also test the test-runner itself, therefore one testcase has to force a crash. I know "crashing" is not covered by the C standard and just might happen as a result of undefined behavior. So this question is more about the behavior of real-world implementations.
My first attempt was to just dereference a null-pointer:
int c = *((int *)0);
This worked in a debug build on GNU/Linux and Windows, but failed to crash in a release build because the unused variable c was optimized out, so I added
printf("%d", c); // to prevent optimizing away the crash
and thought I was settled. However, trying my code with clang instead of gcc revealed a surprise during compilation:
[CC] obj/x86_64-pc-linux-gnu/release/src/test/test/test_s.o
src/test/test/test.c:34:13: warning: indirection of non-volatile null pointer
will be deleted, not trap [-Wnull-dereference]
int c = *((int *)0);
^~~~~~~~~~~
src/test/test/test.c:34:13: note: consider using __builtin_trap() or qualifying
pointer with 'volatile'
1 warning generated.
And indeed, the clang-compiled testcase didn't crash.
So, I followed the advice of the warning and now my testcase looks like this:
PT_TESTMETHOD(test_expected_crash)
{
PT_Test_expectCrash();
// crash intentionally
int *volatile nptr = 0;
int c = *nptr;
printf("%d", c); // to prevent optimizing away the crash
}
This solved my immediate problem, the testcase "works" (aka crashes) with both gcc and clang.
I guess because dereferencing the null pointer is undefined behavior, clang is free to compile my first code into something that doesn't crash. The volatile qualifier removes the ability to be sure at compile time that this really will dereference null.
Now my questions are:
Does this final code guarantee the null dereference actually happens at runtime?
Is dereferencing null indeed a fairly portable way for crashing on most platforms?
I wouldn't rely on that method as being robust if I were you.
Can't you use abort(), which is part of the C standard and is guaranteed to cause an abnormal program termination event?
The answer refering to abort() was great, I really didn't think of that and it's indeed a perfectly portable way of forcing an abnormal program termination.
Trying it with my code, I came across msvcrt (Microsoft's C runtime) implements abort() in a special chatty way, it outputs the following to stderr:
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
That's not so nice, at least it unnecessarily clutters the output of a complete test run. So I had a look at __builtin_trap() that's also referenced in clang's warning. It turns out this gives me exactly what I was looking for:
LLVM code generator translates __builtin_trap() to a trap instruction if it is supported by the target ISA. Otherwise, the builtin is translated into a call to abort.
It's also available in gcc starting with version 4.2.4:
This function causes the program to exit abnormally. GCC implements this function by using a target-dependent mechanism (such as intentionally executing an illegal instruction) or by calling abort.
As this does something similar to a real crash, I prefer it over a simple abort(). For the fallback, it's still an option trying to do your own illegal operation like the null pointer dereference, but just add a call to abort() in case the program somehow makes it there without crashing.
So, all in all, the solution looks like this, testing for a minimum GCC version and using the much more handy __has_builtin() macro provided by clang:
#undef HAVE_BUILTIN_TRAP
#ifdef __GNUC__
# define GCC_VERSION (__GNUC__ * 10000 \
+ __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
# if GCC_VERSION > 40203
# define HAVE_BUILTIN_TRAP
# endif
#else
# ifdef __has_builtin
# if __has_builtin(__builtin_trap)
# define HAVE_BUILTIN_TRAP
# endif
# endif
#endif
#ifdef HAVE_BUILTIN_TRAP
# define crashMe() __builtin_trap()
#else
# include <stdio.h>
# define crashMe() do { \
int *volatile iptr = 0; \
int i = *iptr; \
printf("%d", i); \
abort(); } while (0)
#endif
// [...]
PT_TESTMETHOD(test_expected_crash)
{
PT_Test_expectCrash();
// crash intentionally
crashMe();
}
you can write memory instead of reading it.
*((int *)0) = 0;
No, dereferencing a NULL pointer is not a portable way of crashing a program. It is undefined behavior, which means just that, you have no guarantees what will happen.
As it happen, for the most part under any of the three main OS's used today on desktop computers, that being MacOS, Linux and Windows NT (*) dereferencing a NULL pointer will immediately crash your program.
That said: "The worst possible result of undefined behavior is for it to do what you were expecting."
I purposely put a star beside Windows NT, because under Windows 95/98/ME, I can craft a program that has the following source:
int main()
{
int *pointer = NULL;
int i = *pointer;
return 0;
}
that will run without crashing. Compile it as a TINY mode .COM files under 16 bit DOS, and you'll be just fine.
Ditto running the same source with just about any C compiler under CP/M.
Ditto running that on some embedded systems. I've not tested it on an Arduino, but I would not want to bet either way on the outcome. I do know for certain that were a C compiler available for the 8051 systems I cut my teeth on, that program would run fine on those.
The program below should work. It might cause some collateral damage, though.
#include <string.h>
void crashme( char *str)
{
char *omg;
for(omg=strtok(str, "" ); omg ; omg=strtok(NULL, "") ) {
strcat(omg , "wtf");
}
*omg =0; // always NUL-terminate a NULL string !!!
}
int main(void)
{
char buff[20];
// crashme( "WTF" ); // works!
// crashme( NULL ); // works, too
crashme( buff ); // Maybe a bit too slow ...
return 0;
}

Returning From Catching A Floating Point Exception

So, I am trying to return from a floating point exception, but my code keeps looping instead. I can actually exit the process, but what I want to do is return and redo the calculation that causes the floating point error.
The reason the FPE occurs is because I have a random number generator that generates coefficients for a polynomial. Using some LAPACK functions, I solve for the roots and do some other things. Somewhere in this math intensive chain, a floating point exception occurs. When this happens, what I want to do is increment the random number generator state, and try again until the coefficients are such that the error doesn't materialize, as it usually doesn't, but very rarely does and causes catastrophic results.
So I wrote a simple test program to learn how to work with signals. It is below:
In exceptions.h
#ifndef EXCEPTIONS_H
#define EXCEPTIONS_H
#define _GNU_SOURCE
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <math.h>
#include <errno.h>
#include <float.h>
#include <fenv.h>
void overflow_handler(int);
#endif // EXCEPTIONS_H //
In exceptions.c
#include "exceptions.h"
void overflow_handler(int signal_number)
{
if (feclearexcept(FE_OVERFLOW | FE_UNDERFLOW | FE_DIVBYZERO | FE_INVALID)){
fprintf(stdout, "Nothing Cleared!\n");
}
else{
fprintf(stdout, "All Cleared!\n");
}
return;
}
In main.c
#include "exceptions.h"
int main(void)
{
int failure;
float oops;
//===Enable Exceptions===//
failure = 1;
failure = feenableexcept(FE_OVERFLOW | FE_UNDERFLOW | FE_DIVBYZERO | FE_INVALID);
if (failure){
fprintf(stdout, "FE ENABLE EXCEPTIONS FAILED!\n");
}
//===Create Error Handler===//
signal(SIGFPE, overflow_handler);
//===Raise Exception===//
oops = exp(-708.5);
fprintf(stdout, "Oops: %f\n", oops);
return 0;
}
The Makefile
#===General Variables===#
CC=gcc
CFLAGS=-Wall -Wextra -g3 -Ofast
#===The Rules===#
all: makeAll
makeAll: makeExceptions makeMain
$(CC) $(CFLAGS) exceptions.o main.o -o exceptions -ldl -lm
makeMain: main.c
$(CC) $(CFLAGS) -c main.c -o main.o
makeExceptions: exceptions.c exceptions.h
$(CC) $(CFLAGS) -c exceptions.c -o exceptions.o
.PHONY: clean
clean:
rm -f *~ *.o
Why doesn't this program terminate when I am clearing the exceptions, supposedly successfully? What do I have to do in order to return to the main, and exit?
If I can do this, I can put code in between returning and exiting, and do something after the FPE has been caught. I think that I will set some sort of flag, and then clear all most recent info in the data structures, redo the calculation etc based on whether or not that flag is set. The point is, the real program must not abort nor loop forever, but instead, must handle the exception and keep going.
Help?
"division by zero", overflow/underflow, etc. result in undefined behaviour in the first place. If the system, however, generates a signal for this, the effect of UB is "suspended". The signal handler takes over instead. But if the handler returns, the effect of UB will "resume".
Therefore, the standard disallows returning from such a situation.
Just think: How would the program have to recover from e.g. DIV0? The abstract machine has no idea about FPU registers or status flags, and even if - what result would have to be generated?
C also has no provisions to unroll the stack properly like C++.
Note also, that generating signals for arithmetic exceptions is optional, so there is no guarantee a signal will actually be generated. The handler is mostly meant to notify about the event and possibly clean up external resources.
Behaviour is different for signals which do not origin from undefined behaviour, but just interrupt program execution. This is well defined as the program state is well-defined.
Edit:
If you have to rely on the program to continue under all circumstances, you hae to check all arguments of arithmetic operations before doing the actual operation and/or use safe operations only (re-order, use larger intermediate types, etc.). One exaple for integers might be to use unsigned instead of signed integers, as for those overflow-behavior is well-defined (wrap), so intermediate results overflowing will not make trouble as long as that is corrected afterwards and the wrap is not too much. (Disclaimer: that does not always work, of course).
Update:
While I am still not completely sure, according to comments, the standard might allow, for a hosted environment at least, to use LIA-1 traps and to recover from them (see Annex H. As these are not necessarily precise, I suspect recovery is not possible under all circumstances. Also, math.h might present additional aspects which have to be carefully evaluated.
Finally: I still think there is nothing gained with such approach, but some uncertainty added compared to using safe algorithms. It would be different, if there wer not so much different components involved. For a bare-metal embedded system, the view might be completely different.
I think you're supposed to mess around with the calling stack frame if you want to skip an instruction or break out of exp or whatever. This is high voodoo and bound to be unportable.
The GNU C library lets you use setjmp() outside of a signal handler to which you can longjmp() from inside. This seems like a better way to go. Here is a self-contained modification of your program showing how to do it:
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <setjmp.h>
#include <math.h>
#include <errno.h>
#include <float.h>
#include <fenv.h>
sigjmp_buf oh_snap;
void overflow_handler(int signal_number) {
if (feclearexcept(FE_OVERFLOW | FE_UNDERFLOW | FE_DIVBYZERO | FE_INVALID)){
fprintf(stdout, "Nothing Cleared!\n");
}
else{
fprintf(stdout, "All Cleared!\n");
}
siglongjmp(oh_snap, 1);
return;
}
int main(void) {
int failure;
float oops;
failure = 1;
failure = feenableexcept(FE_OVERFLOW | FE_UNDERFLOW | FE_DIVBYZERO | FE_INVALID);
if (failure){
fprintf(stdout, "FE ENABLE EXCEPTIONS FAILED!\n");
}
signal(SIGFPE, overflow_handler);
if (sigsetjmp(oh_snap, 1)) {
printf("Oh snap!\n");
} else {
oops = exp(-708.5);
fprintf(stdout, "Oops: %f\n", oops);
}
return 0;
}

How to raise a float point exception

guys. I am doing some work around float point operations. The 0.1 is inexact represented by binary float point format. Thus I wrote down this
float i = 0.1f;
and expecting the inexact exception to arise. I turnned on the -fp-trap-all=all option, set fp-mode to be strict and installed SIGFPE signal handler in my code. But nothing happened. Then I tried
float i = 0.1f,j = 0.2f, c;
c = i + j;
still can not catch any exceptions! It drive my crazy.
Sorry to mention that I am using intel c++ compiler on Linux at last.
You have to test for exceptions yourself. The following code works for me:
#include <stdio.h>
#include <fenv.h>
#ifndef FE_INEXACT
# error No FP Exception handling!
#endif
int main()
{
double a = 4.0;
a /= 3.0;
if (fetestexcept(FE_INEXACT) & FE_INEXACT)
{
printf("Exception occurred\n");
}
else
{
printf("No exception.\n");
}
}
If you replace 4.0 by 3.0, you will not get the exception.
You can do something similar with double a = 0.0; a = sin(a);.
Trapping exceptions are only supported conditionally. To check, use the macros described in the documentation:
#define _GNU_SOURCE
#include <fenv.h>
#ifndef FE_NOMASK_ENV
# warning Cannot raise FP exceptions!
#else
# warning Trapping FP exceptions enabled.
feenableexcept(FE_INEXACT);
#endif
According to this answer, inexact exception is only raised if the rounded version of the float isn't the same as the mathematically exact amount. In your case, the rounded response is the same, so no exception raised.

Valgrind: Deliberately cause segfault

This is a mad-hack, but I am trying to deliberately cause a segfault at a particular point in execution, so valgrind will give me a stack trace.
If there is a better way to do this please tell me, but I would still be curious to know how to deliberaly cause a segfault, and why my attempt didn't work.
This is my failed attempt:
long* ptr = (long *)0xF0000000;
ptr = 10;
I thought valgrind should at least pick up on that as a invalid write, even if it's not a segmentation violation. Valgrind says nothing about it.
Any ideas why?
EDIT
Answer accepted, but I still have some up-votes for any suggestions for a more sane way to get a stack trace...
Just call abort(). It's not a segfault but it should generate a core dump.
Are you missing a * as in *ptr = 10? What you have won't compile.
If it does, somehow, that of course won't cause a seg-fault, since you're just assigning a number. Dereferencing might.
Assuming dereferencing null on your OS results in a segfault, the following should do the trick:
inline void seg_fault(void)
{
volatile int *p = reinterpret_cast<volatile int*>(0);
*p = 0x1337D00D;
}
Sorry for mentioning the obvious but why not use gdb with a breakbpoint and then use backtrace?
(gdb) b somewhere
(gdb) r
(gdb) bt
As mentioned in other answers you can just call abort() if you want to abnormally terminate your program entirely, or kill(getpid(), SIGSEGV) if it has to be a segmentation fault. This will generate a core file that you can use with gdb to get a stack trace or debug even if you are not running under valgrind.
Using a valgrind client request you can also have valgrind dump a stack trace with your own custom message and then continue executing. The client request does nothing when the program is not running under valgrind.
#include <valgrind/valgrind.h>
...
VALGRIND_PRINTF_BACKTRACE("Encountered the foobar problem, x=%d, y=%d\n", x, y);
Wouldn't it be better to send sig SEGV (11) to the process to force a core dump?
Are you on x86? If so then there is actually an opcode in the CPU that means "invoke whatever debugger may be attached". This is opcode CC, or more commonly called int 3. Easiest way to trigger it is with inline assembly:
inline void debugger_halt(void)
{
#ifdef MSVC
__asm int 3;
#elif defined(GCC)
asm("int 3");
#else
#pragma error Well, you'll have to figure out how to do inline assembly
in your compiler
#endif
}
MSVC also supports __debugbreak() which is a hardware break in unmanaged code and an MSIL "break" in managed code.

Resources