I'm fairly new to Lua. While testing I discovered #INF/#IND. However, I can't find a good reference that explains it.
What are #INF, #IND, and similar (such as negatives) and how do you generate and use them?
#INF is infinity, #IND is NaN (not a number). Give it a test:
print(1/0)
print(0/0)
Output on my Windows machine:
1.#INF
-1.#IND
As there's no standard representation for these in ANSI C, you may get different results. For instance:
inf
-nan
Expanding on @YuHao's already good answer.
Lua does little when converting a number to a string, since it relies heavily on the underlying C library implementation. In fact, Lua's print implementation calls Lua's tostring, which in turn (after a series of other calls) uses the lua_number2str macro, which is defined in terms of C's sprintf. Thus in the end you see whatever representation for infinities and NaNs the C implementation uses (this may vary according to which compiler was used to compile Lua and which C runtime your application is linked to).
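As a rough illustration of why the text differs between platforms, here is a minimal C sketch of what the lua_number2str macro ends up doing (the "%.14g" format is an assumption based on the default LUAI_NUMFFORMAT in Lua 5.1/5.2; your build may use something else):
#include <stdio.h>

int main(void) {
    double zero = 0.0;
    double inf = 1.0 / zero;   /* +infinity, assuming IEEE 754 semantics */
    double nan = 0.0 / zero;   /* a quiet NaN, same assumption */
    char buf[64];

    /* Lua's tostring ends up doing something equivalent to this sprintf;
       whatever text your C runtime produces here ("inf", "1.#INF", "nan",
       "-1.#IND", ...) is exactly what print() shows in Lua. */
    sprintf(buf, "%.14g", inf);
    printf("%s\n", buf);
    sprintf(buf, "%.14g", nan);
    printf("%s\n", buf);
    return 0;
}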
@YuHao has already answered what the OP effectively asked: what +/-1.#INF (+/-inf) and -1.#IND (nan) mean.
What I want to do here is just to add some value to the question/answer by expanding on how to deal with them -- that is, how to check for them (which I just needed to do and learned how to):
inf (+/-1.#INF) is the highest (+) or lowest (-) value that Lua can represent, and the language provides it for you through math.huge. So you can test whether a number is +/-inf:
local function isINF(value)
return value == math.huge or value == -math.huge
end
nan (-1.#IND) is something that cannot be handled numerically, and the result of any operation involving it is also Not-a-Number(*). Long story short: if a number is a NaN, comparing it against itself will (always) be false. The function below implements the simplest way of checking whether a value is a NaN:
local function isNAN(value)
return value ~= value
end
(*): NaN is formally defined in the IEEE 754 standard for floating-point numbers.
Is it possible to use real numbers as iterators and array indices when compiling with gfortran? Here is some example code:
program test
real i
real testarray(5)
testarray = 0.
do i=1,5
write(*,*) testarray(i)
end do
end program
I want to run some code that I did not write. It compiles fine with the intel compiler on windows, but I want to compile and run it in linux with the gfortran compiler. I'm currently getting errors using real numbers as array indices and do loop iterators.
Thanks!
Why would you want to use real numbers as array and loop indices?
If you need to use the real value of the index, do something like:
program test
integer i
real testarray(5)
testarray = 0.
do i=1,5
testarray(i) = REAL(i)
end do
end program
And of course you could go the other direction if you needed to,
integer j
do j = 1, INT(testarray(1))
...
end do
for example. The standard doesn't allow non-integer indices. They don't make sense either -- what is the 1.5 index in your array?
It appears that the real array indexing is an extension that should be possible if you compile with --std=gnu. But support for that may not always be there as it is not part of the standard.
If you don't want to see the warnings, then try --std=legacy. Otherwise "gnu", as already suggested. The gfortran manual states:
As an extension, GNU Fortran allows the use of REAL expressions or
variables as array indices.
and
The default value for std is ‘gnu’, which specifies a superset of the
Fortran 95 standard that includes all of the extensions supported by
GNU Fortran, although warnings will be given for obsolete extensions
not recommended for use in new code. The ‘legacy’ value is equivalent
but without the warnings for obsolete extensions, and may be useful
for old non-standard programs.
Using real variables as loop indices was deleted from the language standard with Fortran 95. Because of the amount of legacy code that uses this, it is likely to remain in compilers for decades.
Another possibility is to implement this as a function or subroutine. The user experience would be similar: tab(x) looks the same whether tab is an array or a function, but a function would allow more control (for example, you can check whether x is within eps of some value x0 for which you have defined a value).
In general the idea seems dangerous due to rounding errors.
If you are working with rational numbers or, say, square roots of integer numbers, then it is again an ideal case for f(x) as a function (with x being, e.g., a derived type that contains a numerator and a denominator).
So my final answer is: write it as a function.
We've been having some weird crashes in some Intel FORTRAN code, and I eventually tracked the line down to:
L_F = EXP(-L_B2*L_BETASQ*L_DS)
Where the -L_B2*L_BETASQ*L_DS term evaluated to approximately -230. As it happens, EXP(-230) evaluates to about 1e-100. In all other known cases, L_DS is much smaller, resulting in the smallest (known) return from EXP being about 1e-50, which does not cause an error.
As soon as FORTRAN evaluates the clause EXP(-230), you get:
forrtl: severe (157): Program Exception - access violation
Image PC Routine Line Source
But no other information.
Exception 157 is generally concerned with interoperability, and you cannot debug into EXP in FORTRAN as it cannot find a particular .c file - which presumably means that EXP is implemented in C (which I find surprising).
My hypothesis is that FORTRAN has implemented EXP in C, but the interface cannot translate floats which are smaller than 1e-100 into REAL(4)s. As I had previously believed floats and REAL(4)s to be byte-wise identical, I cannot back this hypothesis up - and I cannot find anything anywhere about it.
Before I close this bug down, can anyone confirm or deny my hypothesis - or provide me with another?
Best regards,
Mike
EDIT: I'm going to mark this question as answered, as High Performance Mark has answered the immediate question.
My hypothesis is unfortunately incorrect - I've tried to trap the problem doing this:
L_ARG = L_B2*L_BETASQ*L_DS
IF (L_ARG .GT. 230.0) THEN
L_F = 0.0
ELSE
L_F = EXP(-L_ARG)
ENDIF
Unfortunately, the exception now (apparently) occurs in the L_ARG .GT. 230.0 clause. This either means that the debugging in Release mode is worse than I thought, or it's some sort of 'stored up' floating point error (see "Floating-point invalid operation" when inputting float to a stringstream).
Fortran hasn't (necessarily) implemented anything in C. The implementation of standard intrinsics is compiler-specific; it is common to find that implementations call libm or one of its relatives. From Intel's (or any other compiler writer's) point of view this makes sense: write one robust and fast implementation of exp in whatever language takes your fancy and call it from Fortran, C, Ada, COBOL, and all the other languages you've ever heard of. It may even be sensible to write it in C. Part of your hypothesis may, therefore, be correct.
However, unless you are explicitly writing C code and Fortran code and making a single binary from it, there's not really any interoperability (in the Fortran standard sense) going on; all the dirty details of that are (or should be) hidden from you. The compiler ought to generate correct calls to whatever libraries it uses to implement exp and get the return values, whatever they may be, including NaNs and the like.
Certainly, the value of exp(-230) is 0.00000000 for a 4-byte real but I see no reason why a Fortran program which uses a library written in C should raise an access violation because it comes across those numbers. I think it is far more likely that you have an error elsewhere in your program, perhaps trying to access an array element outside the bounds of the array, and that your run-time fails to identify it at the right location in the source code. This is not uncommon.
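For what it's worth, the analogous computation in C just underflows quietly to zero rather than faulting, which supports the idea that the crash originates elsewhere. A minimal sketch, assuming an IEEE 754 binary32 float for the 4-byte real:
#include <math.h>
#include <stdio.h>

int main(void) {
    /* exp(-230) is roughly 1e-100, far below the smallest positive float
       (about 1.4e-45 including denormals), so the single-precision result
       underflows to 0.0f -- no access violation involved. */
    float arg = -230.0f;
    float result = expf(arg);
    printf("expf(%g) = %g\n", (double)arg, (double)result);
    return 0;
}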
EDIT
I wrote this stuff about interoperability before (re-)reading the question. Now that you've clarified that you are using interoperability features, it may be of interest or use ...
You can certainly not depend on your Fortran's real(4) and your C's float being identical; it is very likely but not certain. Most of the modern Fortran compilers (including Intel's) use kind type parameters which match the number of bytes in their representation, so the code 4 indicates that you are dealing with a 4-byte real which, on an IEEE-754 compliant processor, should be the same as a C float. The Fortran standards do not require any correspondence between those kind type parameters and the number of bytes used to represent a number. It's always worth checking your compiler documentation or doing some tests.
If you are concerned about interoperability you should probably be using Fortran's intrinsic features. For example if you
use :: iso_c_binding
you have available a number of constants including C_FLOAT which can be used like this:
real(C_FLOAT) :: a_fortran_float
If your Fortran compiler supports this, then a_fortran_float should match a C float on a companion processor. This last term is left somewhat undefined; in practice, compilers from the same stable seem always to be companions, while for different stables it is sometimes yes, sometimes no. The Intel Fortran and C and C++ compilers seem to be companions in the required sense. It would not surprise me to learn that the Intel Fortran and MS C++ compilers do not play nicely together.
My vague recollections of C include an uncertainty that float is standardised, in which case you can't be sure, without testing or reading the documentation, that you really do have a 4-byte IEEE single-precision floating-point number on that side of your interoperation either.
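If you would rather test than trust the documentation, a small C check along these lines (which macros you inspect is up to you) tells you whether float on the C side really is a 4-byte IEEE single:
#include <float.h>
#include <stdio.h>

int main(void) {
    /* An IEEE 754 binary32 float is 4 bytes wide, with a 24-bit significand
       and an exponent range of [-125, +128]. */
    printf("sizeof(float) = %zu\n", sizeof(float));
    printf("FLT_MANT_DIG  = %d\n", FLT_MANT_DIG);  /* expect 24   */
    printf("FLT_MIN_EXP   = %d\n", FLT_MIN_EXP);   /* expect -125 */
    printf("FLT_MAX_EXP   = %d\n", FLT_MAX_EXP);   /* expect 128  */
#ifdef __STDC_IEC_559__
    printf("__STDC_IEC_559__ defined: the implementation claims IEEE 754 conformance\n");
#endif
    return 0;
}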
I write C code that makes certain assumptions about the implementation, such as:
char is 8 bits.
signed integral types are two's complement.
>> on signed integers sign-extends.
integer division rounds negative quotients towards zero.
double is an IEEE-754 double and can be type-punned to and from uint64_t with the expected result.
comparisons involving NaN always evaluate to false.
a null pointer is all zero bits.
all data pointers have the same representation, and can be converted to size_t and back again without information loss.
pointer arithmetic on char* is the same as ordinary arithmetic on size_t.
function pointers can be cast to void* and back again without information loss.
Now, all of these are things that the C standard doesn't guarantee, so strictly speaking my code is non-portable. However, they happen to be true on the architectures and ABIs I'm currently targeting, and after careful consideration I've decided that the risk they will fail to hold on some architecture that I'll need to target in the future is acceptably low compared to the pragmatic benefits I derive from making the assumptions now.
The question is: how do I best document this decision? Many of my assumptions are made by practically everyone (non-octet chars? or sign-magnitude integers? on a future, commercially successful, architecture?). Others are more arguable -- the most risky probably being the one about function pointers. But if I just list everything I assume beyond what the standard gives me, the reader's eyes are just going to glaze over, and he may not notice the ones that actually matter.
So, is there some well-known set of assumptions about being a "somewhat orthodox" architecture that I can incorporate by reference, and then only document explicitly where I go beyond even that? (Effectively such a "profile" would define a new language that is a superset of C, but it might not acknowledge that in so many words -- and it may not be a pragmatically useful way to think of it either).
Clarification: I'm looking for a shorthand way to document my choices, not for a way to test automatically whether a given compiler matches my expectations. The latter is obviously useful too, but does not solve everything. For example, if a business partner contacts us saying, "we're making a device based on Google's new G2015 chip; will your software run on it?" -- then it would be nice to be able to answer "we haven't worked with that arch yet, but it shouldn't be a problem if it has a C compiler that satisfies such-and-such".
Clarify even more since somebody has voted to close as "not constructive": I'm not looking for discussion here, just for pointers to actual, existing, formal documents that can simplify my documentation by being incorporated by reference.
I would introduce a STATIC_ASSERT macro and put all your assumptions in such asserts.
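A minimal sketch of what that could look like (the header name, the STATIC_ASSERT macro, and the particular checks are mine, not anything standard); it turns several of the assumptions listed above into compile-time errors on any platform where they do not hold:
/* assumptions.h -- hypothetical name: compile-time checks of the
   implementation assumptions this code base relies on. */
#ifndef ASSUMPTIONS_H
#define ASSUMPTIONS_H

#include <limits.h>
#include <stdint.h>

/* C89-compatible static assertion: the array size becomes negative
   (a compile error) when the condition is false. */
#define STATIC_ASSERT(cond, name) \
    typedef char static_assert_##name[(cond) ? 1 : -1]

STATIC_ASSERT(CHAR_BIT == 8, char_is_8_bits);
STATIC_ASSERT((-1 & 3) == 3, twos_complement);   /* fails for ones' complement or sign-magnitude */
STATIC_ASSERT(sizeof(double) == sizeof(uint64_t), double_fits_uint64);
/* Size check only: this cannot prove that casting a function pointer to
   void * and back round-trips, but it catches the obvious mismatch. */
STATIC_ASSERT(sizeof(void (*)(void)) == sizeof(void *), funcptr_fits_voidptr);

#endif /* ASSUMPTIONS_H */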
Unfortunately, not only is there a lack of standards for a dialect of C that combines the extensions which emerged as de facto standards during the 1990s (two's complement, universally-ranked pointers, etc.), but compiler trends are moving in the opposite direction. Given the following requirements for a function:
* Accept int parameters x, y, z
* Return 0 if x-y is computable as "int" and is less than z
* Return 1 if x-y is computable as "int" and is not less than z
* Return 0 or 1 if x-y is not computable
The vast majority of compilers in the 1990s would have allowed:
int diffCompare(int x, int y, int z)
{ return (x-y) >= z; }
On some platforms, in cases where the difference x-y was not computable as int, it would be faster to compute a "wrapped" two's-complement value of x-y and compare that, while on others it would be faster to perform the calculation using a type larger than int and compare that. By the late 1990s, however, nearly every C compiler would implement the above code using whichever of those approaches was more efficient on its hardware platform.
Since 2010, however, compiler writers seem to have taken the attitude that if computations overflow, compilers shouldn't perform the calculations in whatever fashion is normal for their platform and let what happens happen, nor should they recognizably trap (which would break some code, but could prevent certain kinds of errant program behavior), but instead they should treat overflows as an excuse to negate the laws of time and causality. Consequently, even if a programmer would have been perfectly happy with any behavior a 1990s compiler would have produced, the programmer must replace the code with something like:
{ return ((long)x-y) >= z; }
which would greatly reduce efficiency on many platforms, or
{ return x+(INT_MAX+1U)-y >= z+(INT_MAX+1U); }
which requires specifying a bunch of calculations the programmer doesn't actually want in the hopes that the optimizer will omit them (using signed comparison to make them unnecessary), and would reduce efficiency on a number of platforms (especially DSPs) where the form using (long) would have been more efficient.
It would be helpful if there were standard profiles which would allow programmers to avoid the need for nasty horrible kludges like the above using INT_MAX+1U, but if trends continue they will become more and more necessary.
Most compiler documentation includes a section that describes the specific behavior of implementation-dependent features. Can you point to that section of the gcc or msvc docs to describe your assumptions?
You can write a header file "document.h" where you collect all your assumptions.
Then, in every file that you know that non-standard assumptions are made, you can #include such a file.
Perhaps "document.h" would not have real sentences at all, but only commented text and some macros.
// [T] DOCUMENT.H
//
#ifndef DOCUMENT_H
#define DOCUMENT_H
// [S] 1. Basic assumptions.
//
// If this file is included in a compilation unit it means that
// the following assumptions are made:
// [1] A char has 8 bits.
// [#]
#define MY_CHARBITSIZE 8
// [2] IEEE 754 doubles are adopted for type: double.
// ........
// [S] 2. Detailed information
//
#endif
The tags in brackets: [T] [S] [#] [1] [2] stand for:
* [T]: Document Title
* [S]: Section
* [#]: Print the following (non-commented) lines as a code-block.
* [1], [2]: Numbered items of a list.
Now, the idea here is to use the file "document.h" in a different way:
To parse the file in order to convert the comments in "document.h" to some printable document, or some basic HTML.
Thus, the tags [T] [S] [#] etc. are intended to be interpreted by a parser that converts each comment into an HTML line of text (for example), generating <h1></h1>, <b></b> (or whatever you want) when a tag appears.
If you keep the parser as a simple and small program, this gives you a shorthand for handling this kind of documentation.
I am having some trouble with IEEE floating point rules preventing compiler optimizations that seem obvious. For example,
char foo(float x) {
if (x == x)
return 1;
else
return 0;
}
cannot be optimized to just return 1 because NaN == NaN is false. Okay, fine, I guess.
However, I want to write the code such that the optimizer can actually fix stuff up for me. Are there mathematical identities that hold for all floats? For example, I would be willing to write !(x - x) if it meant the compiler could assume that it held all the time (though that also isn't the case).
I see some reference to such identities on the web, for example here, but I haven't found any organized information, including in a light scan of the IEEE 754 standard.
It'd also be fine if I could get the optimizer to assume isnormal(x) without generating additional code (in gcc or clang).
Clearly I'm not actually going to write (x == x) in my source code, but I have a function that's designed for inlining. The function may be declared as foo(float x, float y), but often x is 0, or y is 0, or x and y are both z, etc. The floats represent onscreen geometric coordinates. These are all cases where if I were coding by hand without use of the function I'd never distinguish between 0 and (x - x), I'd just hand-optimize stupid stuff away. So, I really don't care about the IEEE rules in what the compiler does after inlining my function, and I'd just as soon have the compiler ignore them. Rounding differences are also not very important since we're basically doing onscreen drawing.
I don't think -ffast-math is an option for me, because the function appears in a header file, and it is not appropriate that the .c files that use the function compile with -ffast-math.
Another reference that might be of some use for you is a really nice article on floating-point optimization in Game Programming Gems volume 2, by Yossarian King. You can read the article here. It discusses the IEEE format in quite detail, taking into account implementations and architecture, and provides many optimization tricks.
I think that you are always going to struggle to make computer floating-point arithmetic behave like mathematical real-number arithmetic, and suggest that you don't try. I suggest that you are making a type error in trying to compare two fp numbers for equality. Since fp numbers are, in the overwhelming majority, approximations, you should accept this and use approximate equality as your test.
Computer integers exist for equality testing of numerical values.
Well, that's what I think, you go ahead and fight the machine (well, all the machines actually) if you wish.
Now, to answer some parts of your question:
-- for every mathematical identity you are familiar with from real-number arithmetic, there are counter examples in the domain of floating-point numbers, whether IEEE or otherwise;
-- 'clever' programming almost always makes it more difficult for a compiler to optimise code than straightforward programming;
-- it seems that you are doing some graphics programming: in the end the coordinates of points in your conceptual space are going to be mapped to pixels on a screen; pixels always have integer coordinates; your translation from conceptual space to screen space defines your approximate-equality function
Regards
Mark
If you can assume that floating-point numbers used in this module will not be Inf/NaN, you can compile it with -ffinite-math-only (in GCC). This may "improve" the codegen for examples like the one you posted.
You could compare for bitwise equality. Although you might get bitten by some values that are equivalent but bitwise different, it will catch all the cases where you have true equality, as you mentioned. And I am not sure the compiler will recognize what you do and remove it when inlining (which I believe is what you are after), but that can easily be checked.
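A minimal sketch of what that bitwise comparison might look like (assuming 32-bit floats; the function name is just illustrative):
#include <stdint.h>
#include <string.h>

/* Compare two floats by bit pattern. Identical NaNs compare equal here,
   and +0.0f != -0.0f -- exactly the trade-off described above. */
static inline int bitwise_equal(float a, float b) {
    uint32_t ua, ub;
    memcpy(&ua, &a, sizeof ua);   /* well-defined way to read the bits */
    memcpy(&ub, &b, sizeof ub);
    return ua == ub;
}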
What happened when you tried it the obvious way and profiled it? or examined the generated asm?
If the function is inlined with values known at the call site, the optimizer has this information available. For example: foo(0, y).
You may be surprised at the work you don't have to do, but at the very least profiling or looking at what the compiler actually does with the code will give you more information and help you figure out where to proceed next.
That said, if you know certain things that the optimizer can't figure out itself, you can write multiple versions of the function, and specify the one you want to call. This is something of a hassle, but at least with inline functions they will all be specified together in one header. It's also quite a bit easier than the next step, which is using inline asm to do exactly what you want.
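A rough sketch of that multiple-versions idea (the names and bodies are hypothetical, not taken from the question):
#include <math.h>

/* General version: must stay correct for arbitrary coordinates,
   including NaN, infinities, and signed zeros. */
static inline float span(float x, float y) {
    return fabsf(y - x);
}

/* Version the caller selects when it knows x is exactly 0: the
   subtraction, and the IEEE subtleties that come with it, disappear. */
static inline float span_from_zero(float y) {
    return fabsf(y);
}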
I'm writing a little library where you can set a range; start and end points are doubles. The library has some build-in or calculated default values for that range, but once they are set by the range setting function, there is no way to go back to the default value.
Hence what I would like to do is use the NaN value as the indicator to use the default value, but I haven't found any standard definition of NaN, and the gcc manual says that there are platforms that don't support NaN.
My questions are:
Are there any recent platforms that don't use IEEE 754 floating-point numbers? I don't care about obscure embedded devices, because the lib focuses on platforms with a GUI (cairo, to be precise).
And the second question: would you use the NaN value as an argument for such a purpose? I have no problem with defining it somewhere in the header.
NaN is not equal to any number, not even to itself. Hence, using it as an indicator will lead to convoluted code or even bugs. I would not use it in this way.
I would not use a NaN for this purpose - beyond the issue of just which NaN to use (and there are many), it would be better to add a function call API to reset to the defaults.
NaNs are kind of weird to deal with in code, and I certainly wouldn't like a library to use them for purposes they are not made for.
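A minimal sketch of what such a reset API could look like instead of a NaN sentinel (all names here are purely illustrative):
/* Explicit reset instead of a magic NaN argument: the intent is visible
   at the call site and no special floating-point values are involved. */
typedef struct {
    double start;
    double end;
    int    use_defaults;   /* nonzero: ignore start/end and use the built-in range */
} range_t;

void range_set(range_t *r, double start, double end) {
    r->start = start;
    r->end = end;
    r->use_defaults = 0;
}

void range_reset_to_defaults(range_t *r) {
    r->use_defaults = 1;   /* the library falls back to its default/calculated range */
}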
Edit: Another problem that I just thought of is that if a calculation results in NaN, and it is passed as the argument, you will get unintended behavior. For example:
MyFunc(SomeCalculation()); //if SomeCalculation() is assumed to not be NaN,
//this will cause unintended behavior