What is the "char-sequence" argument to NaN-generating functions for?

Aside from the NAN macro, C99 has two ways to generate a NaN value for a floating-point number: the nanf(const char *tagp) function and strtof("NAN(char-sequence)", NULL).
Both of these methods of generating a NaN take an optional string argument (tagp in nanf() and the char-sequence in the strtof method). What exactly does this string argument do? I haven't been able to find any concrete examples of how you'd use it. From cppreference.com we have:
The call nan("string") is equivalent to the call strtod("NAN(string)", (char**)NULL);
The call nan("") is equivalent to the call strtod("NAN()", (char**)NULL);
The call nan(NULL) is equivalent to the call strtod("NAN", (char**)NULL);
And nan(3) says:
These functions return a representation (determined by tagp) of a quiet NaN. [snip]
The argument tagp is used in an unspecified manner. On IEEE 754 systems, there are many representations of NaN, and
tagp selects one.
This doesn't really tell me what I can use for the tagp string or why I'd ever want to use it. Is there a list anywhere of the valid options for this tag string, and what would the reason be to use one over the default nanf(NULL)?

TL;DR: the tagp argument gives you the ability to have different NaN values.
This is from the man page for nan(3), which gives a little more information on tagp.
Mainly:
The nan() functions return a quiet NaN, whose trailing fraction field contains the result of converting tagp to an unsigned integer.
This gives you the ability to have different NaN values.
Specifically from the C99 Rationale doc:
Other applications of NaNs may prove useful. Available parts of NaNs
have been used to encode auxiliary information, for example about the
NaN’s origin. Signaling NaNs might be candidates for filling
uninitialized storage; and their available parts could distinguish
uninitialized floating objects. IEC 60559 signaling NaNs and trap
handlers potentially provide hooks for maintaining diagnostic
information or for implementing special arithmetics.
There is an implementation of it here. Mind you, this may or may not be standard conforming, as noted in comments. However, it should give you an idea of what tagp is used for.
As in the man page, you can see the replacement mentioned above:
nan("1") = nan (7ff8000000000001)
nan("2") = nan (7ff8000000000002)
Full man page here:
NAN(3)                  BSD Library Functions Manual                  NAN(3)
NAME
nan -- generate a quiet NaN
SYNOPSIS
#include <math.h>
double
nan(const char *tagp);
long double
nanl(const char *tagp);
float
nanf(const char *tagp);
DESCRIPTION
The nan() functions return a quiet NaN, whose trailing fraction field contains the result of converting
tagp to an unsigned integer. If tagp is too large to be contained in the trailing fraction field of the
NaN, then the least significant bits of the integer represented by tagp are used.
SPECIAL VALUES
If tagp contains any non-numeric characters, the function returns a NaN whose trailing fraction field is
zero.
If tagp is empty, the function returns a NaN whose trailing fraction field is zero.
STANDARDS
The nan() functions conform to ISO/IEC 9899:2011.
BSD July 01, 2008

This doesn't really tell me what I can use for the tagp string or why I'd ever want to use it.
Some floating-point standards, e.g., IEEE-754, have multiple different NaN values. This nan function specification allows an implementation to select a specific NaN representation based on the string, in an implementation-defined way.

Related

How is NaN saved during run time?

I had a small function where at one point I divided by 0 and created my first NaN. After looking on the internet I found out that NaN means "not a number" and that NaN != NaN.
My questions are:
During run time, how is NaN saved, or how does the controller know that a variable has the NaN value? (I am working with small microcontrollers in C; the mechanism is different in programs running on a PC in C# and other OOP languages.)
Is Inf similar to NaN?
In C, the types of values are determined statically by your source code. For named objects (“variables”), you explicitly declare the types. For constants, the syntax of them (e.g., 3 versus 3.) determines the type. In typical C implementations that compile to machine code on common processors, the processors have different instructions for working with integers and floating-point. The compiler uses integer instructions for integers and floating-point instructions for floating-point values. The floating-point instructions are designed in hardware to work with encodings of floating-point values.
In IEEE-754 binary floating-point, floating-point data is encoded with a sign bit, an exponent field, and a significand field. If the exponent field is all ones and the significand field is not all zeros, the datum represents a NaN. In common modern processors, this is built into the hardware.
Infinity is not largely similar to a NaN. They might both be considered special in that they are not normal numbers and are processed somewhat differently from normal numbers. However, in IEEE-754 arithmetic, infinity is a number and participates in arithmetic. NaN is not a number.

Is there a standard C way to print floating-point values "perfectly" a la Dragon4?

Reading Here be dragons: advances in problems you didn’t even know you had I've noticed that they compare the new algorithm with the one used in glibc's printf:
Grisu3 is about 5 times faster than the algorithm used by printf in GNU libc
But at the same time I've failed to find any format specifier for printf which would automatically find the best number of decimal places to print. All the ones I tried have fixed defaults, like 6 digits after the decimal point for %f, 2 after the point for %g, or 6 after the point for %e.
How do I actually make use of that algorithm implementation in glibc, mentioned in the article? Is there really any such implementation in glibc and is it even discussed in any way by the Standard?
This is the actual article. The blog post is referring to the results in section 7 (in other words, “they” are not comparing anything in the blog post, “they” are regurgitating the information from the actual article, omitting crucial details):
Implementations of Dragon4 or Grisu3 can be found in implementations of modern programming languages that specify this “minimal number of decimal digits” fashion (I recommend you avoid calling it “perfect”). Java uses this type of conversion to decimal in some contexts, as does Ruby. C is not one of the languages that specify “minimal number of decimal digits” conversion to decimal, so there is no reason for a compiler or for a libc to provide an implementation for Dragon4 or Grisu3.
There is no such thing as a "best number of decimal places" because floating-point numbers are not stored as decimal numbers. So you need to define what you mean by "best". If you want to print the numbers without any possible loss of information, C99 gives you the format specifier %a (except for non-normalized floating-point numbers, where the behavior is unspecified).
The defaults from the C11 standard are 6 digits for %f and %e and for %g it is:
Let P equal the precision if nonzero, 6 if the precision is omitted, or 1 if the precision is zero. Then, if a conversion with style E would have an exponent of X:
— if P > X ≥ −4, the conversion is with style f (or F) and precision P − (X + 1).
— otherwise, the conversion is with style e (or E) and precision P − 1.
If you want to use that algorithm, implement your own function for it. Or hope that glibc has implemented it in the past 5 years. Or just rethink whether the performance of printing floating-point numbers is really a problem you have.

Overflow vs Inf

When I enter a number greater than max double in Matlab that is approximately 1.79769e+308, for example 10^309, it returns Inf. For educational purposes, I want to get overflow exception like C compilers that return an overflow error message, not Inf. My questions are:
Is Inf an overflow exception?
If it is, why don't C compilers return Inf?
If not, can I get an overflow exception in Matlab?
Is there any difference between Inf and an overflow exception at all?
Also, I don't want to check for Inf in MATLAB and then throw an exception with the error() function.
1) Floating-points in C/C++
Operations on floating-point numbers can produce results that are not numerical values. Examples:
the result of an operation is a complex number (think sqrt(-1.0))
the result of an operation is undefined (think 1.0 / 0.0)
the result of an operation is too large to be represented
an operation is performed where one of the operands is already NaN or Inf
The philosophy of IEEE754 is to not trap such exceptions by default, but to produce special values (Inf and NaN), and allow computation to continue normally without interrupting the program. It is up to the user to test for such results and treat them separately (like isinf and isnan functions in MATLAB).
There exist two types of NaN values: NaN (Quiet NaN) and sNaN (Signaling NaN). Normally all arithmetic operations of floating-point numbers will produce the quiet type (not the signaling type) when the operation cannot be successfully completed.
There are (platform-dependent) functions to control the floating-point environment and catch FP exceptions:
Win32 API has _control87() to control the FPU flags.
POSIX/Linux systems typically handle FP exceptions by trapping the SIGFPE signal (see feenableexcept).
SunOS/Solaris has its own functions as well (see chapter 4 in Numerical Computation Guide by Sun/Oracle)
C99/C++11 introduced the fenv header with functions that control the floating-point exception flags.
For instance, check out how Python implements the FP exception control module for different platforms: https://hg.python.org/cpython/file/tip/Modules/fpectlmodule.c
2) Integers in C/C++
This is obviously completely different from floating-points, since integer types cannot represent Inf or NaN:
unsigned integers use modular arithmetic (so values wrap-around if the result exceeds the largest integer). This means that the result of an unsigned arithmetic operation is always "mathematically defined" and never overflows. Compare this to MATLAB which uses saturation arithmetic for integers (uint8(200) + uint8(200) will be uint8(255)).
signed integer overflow on the other hand is undefined behavior.
integer division by zero is undefined behavior.
Floating Point
MATLAB implements the IEEE Standard 754 for floating point operations.
This standard has five defined exceptions:
Invalid Operation
Division by Zero
Overflow
Underflow
Inexact
As noted by the GNU C Library, these exceptions are indicated by a status word but do not terminate the program.
Instead, an exception-dependent default value is returned; the value may be an actual number or a special value. Special values in MATLAB are Inf, -Inf, NaN, and -0; these MATLAB symbols are used in place of the official standard's reserved binary representations for readability and usability (a bit of nice syntactic sugar).
Operations on the special values are well-defined and operate in an intuitive way.
With this information in hand, the answers to the questions are:
Inf means that an operation was performed that raised one of the above exceptions (namely, 1, 2, or 3), and Inf was determined to be the default return value.
Depending on how the C program is written, what compiler is being used, and what hardware is present, INFINITY and NaN are special values that can be returned by a C operation. It depends on if and how the IEEE-754 standard was implemented. C99 includes IEEE-754 support as part of the standard (Annex F), but it is ultimately up to the compiler how the implementation works (this can be complicated by aggressive optimizations and options like rounding modes).
A return value of Inf or -Inf indicates that an Overflow exception may have happened, but it could also be an Invalid Operation or Division by Zero. I don't think MATLAB will tell you which it is (though maybe you have access to that information via compiled MEX files, but I'm unfamiliar with those).
See answer 1.
For more fun and in-depth examples, here is a nice PDF.
Integers
Integers do not behave as above in MATLAB.
If an operation on an integer of a specified bit size will exceed the maximum value of that class, it will be set to the maximum value and vice versa for negatives (if signed).
In other words, MATLAB integers do not wrap.
I'm going to repeat an answer by Jan Simon from the "MATLAB Answers" website:
For stopping (in debugger mode) on division-by-zero, use:
warning on MATLAB:divideByZero
dbstop if warning MATLAB:divideByZero
Similarly for stopping on taking the logarithm of zero:
warning on MATLAB:log:LogOfZero
dbstop if warning MATLAB:log:LogOfZero
and for stopping when an operation (a function call or an assignment) returns either NaN or Inf, use:
dbstop if naninf
Unfortunately the first two warnings seem to no longer be supported, although the last option still works for me on R2014a and is in fact documented.

Compilation platform taking FPU rounding mode into account in printing, conversions

EDIT: I had made a mistake during the debugging session that led me to ask this question. The differences I was seeing were in fact in printing a double and in parsing a double (strtod). Stephen's answer still covers my question very well even after this rectification, so I think I will leave the question alone in case it is useful to someone.
Some (most) C compilation platforms I have access to do not take the FPU rounding mode into account when
converting a 64-bit integer to double;
printing a double.
Nothing very exotic here: Mac OS X Leopard, various recent Linuxes and BSD variants, Windows.
On the other hand, Mac OS X Snow Leopard seems to take the rounding mode into account when doing these two things. Of course, having different behaviors annoys me no end.
Here are typical snippets for the two cases:
#if defined(__OpenBSD__) || defined(__NetBSD__)
# include <ieeefp.h>
# define FE_UPWARD FP_RP
# define fesetround(RM) fpsetround(RM)
#else
# include <fenv.h>
#endif
#include <float.h>
#include <math.h>
fesetround(FE_UPWARD);
...
double f;
long long b = 2000000001;
b = b*b;
f = b;
...
printf("%f\n", 0.1);
My questions are:
Is there something non-ugly that I can do to normalize the behavior across all platforms? Some hidden setting to tell the platforms that take rounding mode into account not to or vice versa?
Is one of the behaviors standard?
What am I likely to encounter when the FPU rounding mode is not used? Round towards zero? Round to nearest? Please, tell me that there is only one alternative :)
Regarding 2. I found the place in the standard where it is said that floats converted to integers are always truncated (rounded towards zero) but I couldn't find anything for the integer -> float direction.
If you have not set the rounding mode, it should be the IEEE-754 default mode, which is round-to-nearest.
For conversions from integer to float, the C standard says (§6.3.1.4):
When a value of integer type is converted to a real floating type, if the value being converted can be represented exactly in the new type, it is unchanged. If the value being converted is in the range of values that can be represented but cannot be represented exactly, the result is either the nearest higher or nearest lower representable value, chosen in an implementation-defined manner. If the value being converted is outside the range of values that can be represented, the behavior is undefined.
So both behaviors conform to the C standard.
The C standard says (§F.5) that conversions between IEC60559 floating point formats and character sequences be correctly rounded as per the IEEE-754 standard. For non-IEC60559 formats, this is recommended, but not required. The 1985 IEEE-754 standard says (clause 5.4):
Conversions shall be correctly rounded as specified in Section 4 for operands lying within the ranges specified in Table 3. Otherwise, for rounding to nearest, the error in the converted result shall not exceed by more than 0.47 units in the destination's least significant digit the error that is incurred by the rounding specifications of Section 4, provided that exponent over/underflow does not occur. In the directed rounding modes the error shall have the correct sign and shall not exceed 1.47 units in the last place.
What Section 4 actually says is that the operation shall occur according to the prevailing rounding mode; i.e., if you change the rounding mode, IEEE-754 says that the result of float-to-string conversion should change accordingly. Ditto for integer-to-float conversions.
The 2008 revision of the IEEE-754 standard says (clause 4.3):
The rounding-direction attribute
affects all computational operations
that might be inexact. Inexact numeric
floating-point results always have the
same sign as the unrounded result.
Both conversions are defined to be computational operations in clause 5, so again they should be performed according to the prevailing rounding mode.
I would argue that Snow Leopard has the correct behavior here (assuming that it is correctly rounding the results according to the prevailing rounding mode). If you want to force the old behavior, you can always wrap your printf calls in code that changes the rounding mode, I suppose, though that's clearly not ideal.
Alternatively, you could use the %a format specifier (hexadecimal floating point) on C99-compliant platforms. Since the result of this conversion is always exact, it will never be affected by the prevailing rounding mode. I don't think that the Windows C library supports %a, but you could probably port the BSD or glibc implementation easily enough if you need it.

NaN as a special argument

I'm writing a little library where you can set a range; the start and end points are doubles. The library has some built-in or calculated default values for that range, but once they are set by the range-setting function, there is no way to go back to the default value.
Hence what I'd like to do is use the NaN value as the indicator to use the default value, but I haven't found any standard definition of NaN, and the gcc manual says that there are platforms that don't support NaN.
My questions are:
Are there any recent platforms that don't use IEEE 754 floating-point numbers? I don't care about obscure embedded devices, because the lib focuses on platforms with a GUI, to be accurate cairo.
And the second question would you use the NaN value as an argument for such a purpose? I have no problem with defining it some where in the header.
NaN is not equal to any number, not even to itself. Hence, using it as an indicator will lead to convoluted code or even bugs. I would not use it in this way.
I would not use a NaN for this purpose - beyond the issue of just which NaN to use (and there are many), it would be better to add a function call API to reset to the defaults.
NaNs are kind of weird to deal with in code, and I certainly wouldn't like a library to use them for purposes they are not made for.
Edit: Another problem that I just thought of is that if a calculation results in NaN, and it is passed as the argument, you will get unintended behavior. For example:
MyFunc(SomeCalculation()); //if SomeCalculation() is assumed to not be NaN,
//this will cause unintended behavior
