FORTRAN and C interoperability - can REAL(4)s be too small? - c

We've been having some weird crashes in some Intel FORTRAN code, and I eventually tracked the line down to:
L_F = EXP(-L_B2*L_BETASQ*L_DS)
Where the -L_B2*L_BETASQ*L_DS term evaluated to approximately -230. As it happens, EXP(-230) evaluates to about 1e-100. In all other known cases, L_DS is much smaller, resulting in the smallest (known) return from EXP being about 1e-50, which does not cause an error.
As soon as FORTRAN evaluates the clause EXP(-230), you get:
forrtl: severe (157): Program Exception - access violation
Image PC Routine Line Source
But no other information.
Exception 157 is generally concerned with interoperability, and you cannot debug into EXP in FORTRAN as it cannot find a particular .c file - which presumably means that EXP is implemented in C (which I find surprising).
My hypothesis is that FORTRAN has implemented EXP in C, but the interface cannot translate floats which are smaller than 1e-100 into REAL(4)s. As I had previously believed floats and REAL(4)s to be byte-wise identical, I cannot back this hypothesis up - and I cannot find anything anywhere about it.
Before I close this bug down, can anyone confirm or deny my hypothesis - or provide me with another?
Best regards,
Mike
EDIT: I'm going to mark this question as answered, as High Performance Mark has answered the immediate question.
My hypothesis is unfortunately incorrect - I've tried to trap the problem doing this:
L_ARG = L_B2*L_BETASQ*L_DS
IF (L_ARG .GT. 230.0) THEN
L_F = 0.0
ELSE
L_F = EXP(-L_ARG)
ENDIF
Unfortunately, the exception now (apparently) occurs in the L_ARG .GT. 230.0 clause. This either means that the debugging in Release mode is worse than I thought, or it's some sort of 'stored up' floating point error (see "Floating-point invalid operation" when inputting float to a stringstream).

Fortran hasn't (necessarily) implemented anything in C. The implementation of standard intrinsics is compiler-specific; it is common to find that implementations call libm or one of its relatives. From Intel's (or any other compiler writer's) point of view this makes sense, write one robust and fast implementation of exp in whatever language takes your fancy and call it from Fortran, C, Ada, COBOL, and all the other languages you've ever heard of. It may even be sensible to write it in C. Part of your hypothesis may, therefore, be correct.
However, unless you are explicitly writing C code and Fortran code and making a single binary from it there's not really any interoperability (in the Fortran standard sense) going on, all the dirty details of that are (or should be) hidden from you; the compiler ought to generate correct calls to whatever libraries it uses to implement exp and get the return values whatever they may be including NaNs and similar.
Certainly, the value of exp(-230) is 0.00000000 for a 4-byte real but I see no reason why a Fortran program which uses a library written in C should raise an access violation because it comes across those numbers. I think it is far more likely that you have an error elsewhere in your program, perhaps trying to access an array element outside the bounds of the array, and that your run-time fails to identify it at the right location in the source code. This is not uncommon.
EDIT
I wrote this stuff about interoperability before (re-)reading the question. Now that you've clarified that you are using interoperability features, it may be of interest or use ...
You can certainly not depend on your Fortran's real(4) and your C's float being identical; it is very likely but not certain. Most of the modern Fortran compilers (including Intel's) use kind type parameters which match the number of bytes in their representation, so the code 4 indicates that you are dealing with a 4-byte real which, on an IEEE-754 compliant processor, should be the same as a C float. The Fortran standards do not require any correspondence between those kind type parameters and the number of bytes used to represent a number. It's always worth checking your compiler documentation or doing some tests.
If you are concerned about interoperability you should probably be using Fortran's intrinsic features. For example if you
use :: iso_c_binding
you have available a number of constants including C_FLOAT which can be used like this:
real(C_FLOAT) :: a_fortran_float
If your Fortran compiler supports this then a_fortran_float should match a C float on a companion processor. This last term is left somewhat undefined, in practice compilers from the same stable seem always to be companions, for different stables sometimes yes sometimes no. The Intel Fortran and C and C++ compilers seem to be companions in the required sense. It would not surprise me to learn that the Intel Fortran and MS C++ compilers do not play nicely together.
My vague recollections of C include an uncertainty that float is standardised, in which case you can't be sure, without testing or reading the documentation, that you really do have a 4-byte IEEE single-precision floating-point number on that side of your interoperation either.

Related

Why do C variables maximum and minimum values touch?

I am working on a tutorial for binary numbers. Something I have wondered for a while is why all the integer maximum and minimum values touch. For example for an unsigned byte 255 + 1 = 0 and 0 - 1 = 255. I understand all the binary math that goes into it, but why was the decision made to have them work this way instead of a straight number line that gives an error when the extremes are breached?
Since your example is unsigned, I assume it's OK to limited the scope to unsigned types.
Allowing wrapping is useful. For example, it's what allows you (and the compiler) to always reorder (and constant-fold) a sequence of additions and subtractions. Even something such as x + 3 - 1 could not be optimized to x + 2 if the language requires trapping, because it changes the conditions under which the expression would trap. Wrapping also mixes better with bit manipulation, with the interpretation of an unsigned number as a vector of bits it makes very little sense if there's trapping. That applies especially to shifts, but addition, subtraction and even multiplication also make sense on bitvectors and combine usefully with the usual bitwise operations.
The algebraic structure you get when allowing wrapping, Z/2kZ, is fairly nice (perhaps not as nice as modulo a prime, but that would interact badly with the bitvector interpretation and it doesn't match typical hardware) and well known, so it's not like anything particularly unexpected or weird will happen, it's not like a wrapped result is a "uselessly arbitrary" result.
And of course testing the carry flag (or whatever may be required) after just about every operation has a big direct overhead as well.
Trapping on "unsigned overflow" is both expensive and undesirable, at least if it is the default behaviour.
Why not "give an error when the extremes are breached"?
Error handling is one of the hardest things in software development. When an error happens, there are many possible ways software could be required to do:
Show an annoying message to the user? Like Hey user, you have just tried to add 1 to this variable, which is already too big. Stop that! - there often is no context to show the user, that would be of any help.
Throw an exception? (BTW C has support for that) - that would show a stack trace, if you happened to execute your code in a debugger. Otherwise, it would just crash - not bad (it won't corrupt your files) but not good either (can be exploited as a denial of service attack).
Write it to a log file? - sometimes it's the best thing to do - record the error and move on, so it can be debugged later.
The right thing to do depends on your code. So a generic programming language like C doesn't want to restrict you by providing any mandatory behavior.
Instead, C provides two guidelines:
For unsigned types like unsigned int or uint8_t or (usually) char - it provides silent wraparound, for best performance.
For signed types like int - it provides "undefined behavior", which makes it possible to "choose", in a very limited way, what will happen on overflow
Throw an exception if using -ftrapv in gcc
Silent wraparound if using -fwrapv in gcc
By default (no fancy command-line options) - the compiler may assume it will never happen, which may help it produce optimized code
The idea here is that you (the programmer) should think where checking for overflow is worth doing, and how to recover from overflow (if the language provided a standard error handling mechanism, it would deny you the latter part). This approach has maximum flexibility, (potentially) maximum performance, and (usually) hardest to do - which fits the philosophy of C.

Why is undefined behaviour allowed in C

I have been messing around trying to learn C lately. Coming from Java, it surprised me that you can perform certain operations declared as "undefined".
This just seems extremely unsafe to me. I understand it is the programmer's responsibility not to perform undefined operations, but why is it even allowed to start with? Why does the compiler not catch, for instance, array indices out of bounds, or even dangling pointers? You just end up accessing blocks of memory you never should access, with no (apparent) good reason.
As a comparison, Java makes extra sure you don't do anything stupid, throwing Exceptions around like hot cakes.
Surely there must be a reason why this is allowed? What is it?
ANSWER: To my understanding, the main reason is performance. Also, Java does have undefined behaviours, although not labeled as such.
EDIT: restricted question to C
Undefined behavior is not allowed, it's just not caught by the compiler.
The tradeoff here is between the speed and the safety. Many kinds of undefined behavior could be prevented at the expense of a few additional CPU cycles.
For example, you could prevent UB that happens when you read from memory that has been allocated but not initialized by having the compiled code write zeros into it. This, however, costs you a whole additional write into a memory, which is entirely unnecessary.
Similarly, one could prevent reading/writing past the end of an array by checking its bounds inside [] operator. However, this would cost you a few additional CPU cycles on each array access.
C++ designers decided that it is better to have speed and allow potential UB than to force everyone pay for what they do not need. This approach, however, is incompatible with Java's "write once, run anywhere" requirement, so designers of Java language insisted on fully defined behavior in nearly all situations.
Originally, most forms of Undefined Behavior represented things which some implementations might trap, but other implementations might not. Because there was no way for the authors of the Standard to predict all the things a platform might do in case of a trap (including, literally, the possibility that a system would sound an alarm and lock up until an operator manually cleared the fault), the consequences of traps fell outside the jurisdiction of the C Standard, and thus almost every action for which some platform might conceivably cause a trap is--from the point of view of the Standard--considered "Undefined Behavior".
That should not be taken to imply that the authors of the Standard didn't believe implementations should try to behave sensibly for such things when practical. The authors of the C89 Standard noted, for example, that the majority of current systems of that era would define behavior for:
/* Assume USmall is half the size of "int" */
unsigned mult(USmall x, USmall y) { return x*y; }
which would in all cases, including those where the mathematical product of x and y was between INT_MAX+1 and UINT_MAX, be equivalent to (unsigned)x*y;. I see no reason to believe they wouldn't have expected that trend to continue.
Unfortunately, a new philosophy has become fashionable, based on the revisionist viewpoint that compiler writers only supported useful behaviors in cases not mandated by the Standard because they were too unsophisticated to do anything else. In gcc, for example, using optimization level 2 but no other non-default options, the above "mult" routine will sometimes generate bogus code in cases where the product would be between 0x80000000u and 0xFFFFFFFFu, even when running on platforms where such computations would historically have worked. This is supposedly being done in the name of "optimization"; it would be interesting to know how many of the "optimizations" such techniques end up performing are actually useful and could not have been achieved via safer means.
Historically, Undefined Behavior was a license for a C compiler to expose the behavior of the underlying platform; in cases where the underlying platform's behavior fit the programmer's needs, this allowed the programmer's requirements to be expressed in machine code more efficiently than if everything had to be done in ways defined by the Standard. Lately, however, it has been interpreted as license for compilers to implement behaviors which not only bear no relation to anything in the underlying platform nor to any plausible programmer expectations, but aren't even bound by laws of time and causality.
Java has a run time environment to take care of you. That's why an exception is thrown when going out of bounds - it's something that can't be figured out at compile time.
There is run time bounds checking in C++ when using the at() method for a vector. It's what distinguishes at() from the []operator

Are there well-known "profiles" of the C standard?

I write C code that makes certain assumptions about the implementation, such as:
char is 8 bits.
signed integral types are two's complement.
>> on signed integers sign-extends.
integer division rounds negative quotients towards zero.
double is IEEE-754 doubles and can be type-punned to and from uint64_t with the expected result.
comparisons involving NaN always evaluate to false.
a null pointer is all zero bits.
all data pointers have the same representation, and can be converted to size_t and back again without information loss.
pointer arithmetic on char* is the same as ordinary arithmetic on size_t.
functions pointers can be cast to void* and back again without information loss.
Now, all of these are things that the C standard doesn't guarantee, so strictly speaking my code is non-portable. However, they happen to be true on the architectures and ABIs I'm currently targeting, and after careful consideration I've decided that the risk they will fail to hold on some architecture that I'll need to target in the future is acceptably low compared to the pragmatic benefits I derive from making the assumptions now.
The question is: how do I best document this decision? Many of my assumptions are made by practically everyone (non-octet chars? or sign-magnitude integers? on a future, commercially successful, architecture?). Others are more arguable -- the most risky probably being the one about function pointers. But if I just list everything I assume beyond what the standard gives me, the reader's eyes are just going to glaze over, and he may not notice the ones that actually matter.
So, is there some well-known set of assumptions about being a "somewhat orthodox" architecture that I can incorporate by reference, and then only document explicitly where I go beyond even that? (Effectively such a "profile" would define a new language that is a superset of C, but it might not acknowledge that in so many words -- and it may not be a pragmatically useful way to think of it either).
Clarification: I'm looking for a shorthand way to document my choices, not for a way to test automatically whether a given compiler matches my expectations. The latter is obviously useful too, but does not solve everything. For example, if a business partner contacts us saying, "we're making a device based on Google's new G2015 chip; will your software run on it?" -- then it would be nice to be able to answer "we haven't worked with that arch yet, but it shouldn't be a problem if it has a C compiler that satisfies such-and-such".
Clarify even more since somebody has voted to close as "not constructive": I'm not looking for discussion here, just for pointers to actual, existing, formal documents that can simplify my documentation by being incorporated by reference.
I would introduce a STATIC_ASSERT macro and put all your assumptions in such asserts.
Unfortunately, not only is there a lack of standards for a dialect of C that combines the extensions which have emerged as de facto standards during the 1990s (two's-complement, universally-ranked pointers, etc.) but compilers trends are moving in the opposite direction. Given the following requirements for a function:
* Accept int parameters x,y,z:
* Return 0 if x-y is computable as "int" and is less than Z
* Return 1 if x-y is computable as "int" and is not less than Z
* Return 0 or 1 if x-y is not computable */
The vast majority of compilers in the 1990s would have allowed:
int diffCompare(int x, int y, int z)
{ return (x-y) >= z; }
On some platforms, in cases where the difference between x-y was not computable as int, it would be faster to compute a "wrapped" two's-complement value of x-y and compare that, while on others it would be faster to perform the calculation using a type larger than int and compare that. By the late 1990s, however, nearly every C compiler would implement the above code to use one of whichever one of those approaches would have been more efficient on its hardware platform.
Since 2010, however, compiler writers seem to have taken the attitude that if computations overflow, compilers shouldn't perform the calculations in whatever fashion is normal for their platform and let what happens happens, nor should they recognizably trap (which would break some code, but could prevent certain kinds of errant program behavior), but instead they should overflows as an excuse to negate laws of time and causality. Consequently, even if a programmer would have been perfectly happy with any behavior a 1990s compiler would have produced, the programmer must replace the code with something like:
{ return ((long)x-y) >= z; }
which would greatly reduce efficiency on many platforms, or
{ return x+(INT_MAX+1U)-y >= z+(INT_MAX+1U); }
which requires specifying a bunch of calculations the programmer doesn't actually want in the hopes that the optimizer will omit them (using signed comparison to make them unnecessary), and would reduce efficiency on a number of platforms (especially DSPs) where the form using (long) would have been more efficient.
It would be helpful if there were standard profiles which would allow programmers to avoid the need for nasty horrible kludges like the above using INT_MAX+1U, but if trends continue they will become more and more necessary.
Most compiler documentation includes a section that describes the specific behavior of implementation-dependent features. Can you point to that section of the gcc or msvc docs to describe your assumptions?
You can write a header file "document.h" where you collect all your assumptions.
Then, in every file that you know that non-standard assumptions are made, you can #include such a file.
Perhaps "document.h" would not have real sentences at all, but only commented text and some macros.
// [T] DOCUMENT.H
//
#ifndef DOCUMENT_H
#define DOCUMENT_H
// [S] 1. Basic assumptions.
//
// If this file is included in a compilation unit it means that
// the following assumptions are made:
// [1] A char has 8 bits.
// [#]
#define MY_CHARBITSIZE 8
// [2] IEEE 754 doubles are addopted for type: double.
// ........
// [S] 2. Detailed information
//
#endif
The tags in brackets: [T] [S] [#] [1] [2] stand for:
* [T]: Document Title
* [S]: Section
* [#]: Print the following (non-commented) lines as a code-block.
* [1], [2]: Numbered items of a list.
Now, the idea here is to use the file "document.h" in a different way:
To parse the file in order to convert the comments in "document.h" to some printable document, or some basic HTML.
Thus, the tags [T] [S] [#] etc., are intended to be interpreted by a parser that convert any comment into an HTML line of text (for example), and generate <h1></h1>, <b></b> (or whatever you want), when a tag appears.
If you keep the parser as a simple and small program, this can give you a short hand to handle this kind of documentation.

optimizing with IEEE floating point - guaranteed mathematical identities?

I am having some trouble with IEEE floating point rules preventing compiler optimizations that seem obvious. For example,
char foo(float x) {
if (x == x)
return 1;
else
return 0;
}
cannot be optimized to just return 1 because NaN == NaN is false. Okay, fine, I guess.
However, I want to write such that the optimizer can actually fix stuff up for me. Are there mathematical identities that hold for all floats? For example, I would be willing to write !(x - x) if it meant the compiler could assume that it held all the time (though that also isn't the case).
I see some reference to such identities on the web, for example here, but I haven't found any organized information, including in a light scan of the IEEE 754 standard.
It'd also be fine if I could get the optimizer to assume isnormal(x) without generating additional code (in gcc or clang).
Clearly I'm not actually going to write (x == x) in my source code, but I have a function that's designed for inlining. The function may be declared as foo(float x, float y), but often x is 0, or y is 0, or x and y are both z, etc. The floats represent onscreen geometric coordinates. These are all cases where if I were coding by hand without use of the function I'd never distinguish between 0 and (x - x), I'd just hand-optimize stupid stuff away. So, I really don't care about the IEEE rules in what the compiler does after inlining my function, and I'd just as soon have the compiler ignore them. Rounding differences are also not very important since we're basically doing onscreen drawing.
I don't think -ffast-math is an option for me, because the function appears in a header file, and it is not appropriate that the .c files that use the function compile with -ffast-math.
Another reference that might be of some use for you is a really nice article on floating-point optimization in Game Programming Gems volume 2, by Yossarian King. You can read the article here. It discusses the IEEE format in quite detail, taking into account implementations and architecture, and provides many optimization tricks.
I think that you are always going to struggle to make computer floating-point-number arithmetic behave like mathematical real-number arithmetic, and suggest that you don't for any reason. I suggest that you are making a type error trying to compare the equality of 2 fp numbers. Since fp numbers are, in the overwhelming majority, approximations, you should accept this and use approximate-equality as your test.
Computer integers exist for equality testing of numerical values.
Well, that's what I think, you go ahead and fight the machine (well, all the machines actually) if you wish.
Now, to answer some parts of your question:
-- for every mathematical identity you are familiar with from real-number arithmetic, there are counter examples in the domain of floating-point numbers, whether IEEE or otherwise;
-- 'clever' programming almost always makes it more difficult for a compiler to optimise code than straightforward programming;
-- it seems that you are doing some graphics programming: in the end the coordinates of points in your conceptual space are going to be mapped to pixels on a screen; pixels always have integer coordinates; your translation from conceptual space to screen space defines your approximate-equality function
Regards
Mark
If you can assume that floating-point numbers used in this module will not be Inf/NaN, you can compile it with -ffinite-math-only (in GCC). This may "improve" the codegen for examples like the one you posted.
You could compare for bitwise equality. Although you might get bitten for some values that are equivalent but bitwise different, it will catch all those cases where you have a true equality as you mentioned. And I am not sure the compiler will recognize what you do and remove it when inlining (which I believe is what you are after), but that can easily be checked.
What happened when you tried it the obvious way and profiled it? or examined the generated asm?
If the function is inlined with values known at the call site, the optimizer has this information available. For example: foo(0, y).
You may be surprised at the work you don't have to do, but at the very least profiling or looking at what the compiler actually does with the code will give you more information and help you figure out where to proceed next.
That said, if you know certain things that the optimizer can't figure out itself, you can write multiple versions of the function, and specify the one you want to call. This is something of a hassle, but at least with inline functions they will all be specified together in one header. It's also quite a bit easier than the next step, which is using inline asm to do exactly what you want.

initialize a variable statically (at compile time)

1) I've got many constants in my C algo.
2) my code works both in floating-point and fixed-point.
Right now, these constants are initialized by a function, float2fixed, whereby in floating-point it does nothing, while in fixed-point, it finds their fixed-point representation. For instance, 0.5f stays 0.5f if working in floating-point, whereas it uses the pow() routine and becomes 32768 if working in fixed-point and the fixed-point representation is Qx.16.
That's easy to maintain, but it takes a lot of time actually to compute these constants in fixed-point (pow is a floatin-point function). In C++, I'd use some meta-programming, so the compiler computes these values at compile-time, so there's no hit at run-time. But in C, thats not possible. Or is it? Anybody knows of such a trick? Is any compiler clever enough to do that?
Looking forward to any answers.
A
Rather than using (unsigned)(x*pow(2,16)) to do your fixed point conversion, write it as (unsigned)(0.5f * (1 << 16))
This should be an acceptable as a compile-time constant expression since it involves only builtin operators.
When using fixed-point, can you write a program that takes your floating point values and converts them into correct, constant initializers for the fixed point type, so you effectively add a step to the compilation that generates the fixed point values.
One advantage of this will be that you can then define and declare your constants with const so that they won't change at run-time - whereas with the initialization functions, of course, the values have to be modifiable because they are calculated once.
I mean write a simple program that can scan for formulaic lines that might read:
const double somename = 3.14159;
it would read that and generate:
const fixedpoint_t somename = { ...whatever is needed... };
You design the operation to make it easy to manage for both notations - so maybe your converter always reads the file and sometimes rewrites it.
datafile.c: datafile.constants converter
converter datafile.constants > datafile.c
In plain C, there's not much you can do. You need to do the conversion at some point, and the compiler doesn't give you any access to call interesting user-provided functions at compile time. Theoretically, you could try to coax the preprocessor to do it for you, but that's the quick road to total insanity (i.e. you'd have to implement pow() in macros, which is pretty hideous).
Some options I can think of:
Maintain a persistent cache on disk. At least then it'd only be slow once, though you still have to load it, make sure it's not corrupt, etc.
As mentioned in another comment, use template metaprogramming anyway and compile with a C++ compiler. Most C works just fine (arguably better) with a C++ compiler.
Hmm, I guess that's about all I can think of. Good luck.
Recent versions of GCC ( around 4.3 ) added the ability to use GMP and MPFR to do some compile-time optimisations by evaluating more complex functions that are constant. That approach leaves your code simple and portable, and trust the compiler to do the heavy lifting.
Of course, there are limits to what it can do, and it would be hard to know if it's optimizing a given instance without going and looking at the assembly. But it might be worth checking out. Here's a link to the description in the changelog

Resources