Lower Bounds For Floating Points - c

Are there any lower bounds for floating point types in C? Like there are lower bounds for integral types (int being at least 16 bits)?

Yes. float.h contains constants such as:
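FLT_EPSILON, DBL_EPSILON, LDBL_EPSILON: the difference between 1 and the least value greater than 1 that is representable in float, double, and long double respectively. This is not the smallest representable non-zero value.
FLT_MAX is the largest finite value representable in float, and FLT_MIN is the smallest positive normalized value (the most negative finite value is -FLT_MAX). Similar DBL_ and LDBL_ macros are available.
FLT_DIG, DBL_DIG, LDBL_DIG are defined as the number of decimal digits of precision, that is, how many decimal digits survive a round trip through the type unchanged.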
You are asking for either the xxx_MIN or the xxx_EPSILON value.
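As a minimal sketch, the macros can be printed and probed directly. The function names below are hypothetical helpers, not part of any library. The sketch relies on FLT_EPSILON being the gap between 1.0f and the next representable float, and on FLT_MIN being the smallest positive normalized float, so the two measure different things.

```c
#include <float.h>
#include <stdio.h>

/* Print a few <float.h> limits for float and double. */
void print_float_limits(void)
{
    printf("FLT_EPSILON = %g, DBL_EPSILON = %g\n", FLT_EPSILON, DBL_EPSILON);
    printf("FLT_MIN     = %g, DBL_MIN     = %g\n", FLT_MIN, DBL_MIN);
    printf("FLT_MAX     = %g, DBL_MAX     = %g\n", FLT_MAX, DBL_MAX);
    printf("FLT_DIG     = %d, DBL_DIG     = %d\n", FLT_DIG, DBL_DIG);
}

/* Returns 1 if adding FLT_EPSILON to 1.0f yields a larger float.
   volatile forces the sum to be stored as a float, even when the
   implementation evaluates expressions in a wider type. */
int epsilon_moves_one(void)
{
    volatile float above = 1.0f + FLT_EPSILON;
    return above > 1.0f;
}

/* Returns 1 if FLT_MIN is positive and smaller than FLT_EPSILON,
   i.e. the two macros describe different properties. */
int min_is_not_epsilon(void)
{
    return FLT_MIN > 0.0f && FLT_MIN < FLT_EPSILON;
}
```

Calling print_float_limits() from main shows the actual values for the implementation in use; they differ between platforms, so no particular output is assumed here.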
Along these lines, here is a question wherein I posted some code which displays the internals of a 64-bit IEEE-754 floating-point number.

To be strict and grounded:
ISO/IEC 9899:TC2 (WG14/N1124, May 6, 2005):
5.2.4.2.2, Characteristics of floating types <float.h>

float.h contains many macros describing various properties of the floating types (including FLT_MIN and DBL_MIN).
The description of the requirements of the limits in float.h is given in the standard (C90 or C99 - 5.2.4.2.2 "Characteristics of floating types").
In particular, according to the standard, the smallest positive normalized value of float, double, and long double (FLT_MIN, DBL_MIN, LDBL_MIN) must be no greater than 1E-37. But an implementation is free to do better than that (and indicate what it does in FLT_MIN and DBL_MIN).
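The minimum limits can be checked against the values an implementation actually reports. This is a sketch only: meets_standard_minimums is a hypothetical name, and the thresholds are the ones listed in 5.2.4.2.2 (1E-37 for the xxx_MIN macros, 1E+37 for xxx_MAX, 6 decimal digits for float and 10 for double and long double).

```c
#include <float.h>

/* Returns 1 if this implementation's <float.h> values satisfy the minimum
   limits listed in 5.2.4.2.2; a conforming implementation should always
   return 1. */
int meets_standard_minimums(void)
{
    return FLT_MIN <= 1E-37 && DBL_MIN <= 1E-37 && LDBL_MIN <= 1E-37
        && FLT_MAX >= 1E+37 && DBL_MAX >= 1E+37 && LDBL_MAX >= 1E+37
        && FLT_DIG >= 6 && DBL_DIG >= 10 && LDBL_DIG >= 10;
}
```

On an IEEE-754 platform the real values are far beyond these bounds (FLT_MIN is about 1.18E-38, DBL_MIN about 2.2E-308), which is the "free to do better" case.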
See this question for information on where to get a copy of the standards documents if you need one:
Where do I find the current C or C++ standard documents?

Maybe this helps: float.h reference (it is C++, I'm not sure if it applies to plain C as well)

This Draft C99 standard (PDF) notes minimum values for floating point type precision in section 5.2.4.2.2.
(Found via Wikipedia on C99.)

A useful reference here is What Every Computer Scientist Should Know About Floating-Point Arithmetic.
The nature of a floating point number — its size, precision, limits — is really defined by the hardware, rather than the programming language. A single-precision float on an x86 is the same in C, C#, Java, and any other practical programming language. (The exception is esoteric programming languages that implement odd widths of floating point number in software.)

Excerpts from the Standard draft (n1401.pdf)
Annex F
(normative)
IEC 60559 floating-point arithmetic
F.1 Introduction
1 ... An implementation that defines __STDC_IEC_559__ shall conform to
the specifications in this annex. ...
F.2 Types
1 The C floating types match the IEC 60559 formats as follows:
-- The float type matches the IEC 60559 single format.
-- The double type matches the IEC 60559 double format.
-- The long double type matches an IEC 60559 extended format ...
Wikipedia has an article about IEC 559 (or rather IEEE 754-1985) you might find interesting.

Related

Does C99 assume that subnormal numbers are supported?

From:
the presence of FP_SUBNORMAL classification macro
the fact that in IEEE 754 support of subnormal numbers is required
I make the conclusion that subnormal numbers are supported in C99. Is this conclusion correct?
Does C99 assume that subnormal numbers are supported?
No. See 5.2.4.2.2. The language defines a model of a floating point number. Then the language defines what a subnormal floating point number is within that model. Then an interface is established for detecting and working with subnormal floating point numbers and for how they are handled in corner cases, that is, when exceptions are raised and when not.
This does not mean that the underlying architecture uses this model to represent floating point numbers. The intention is to write the standard in an abstract way, providing an interface without dictating how it should be implemented. Note 16:
The floating-point model is intended to clarify the description of each floating-point characteristic and does not require the floating-point arithmetic of the implementation to be identical.
If the implementation implements Annex F, then the floating types match the formats described in IEC 60559, so it will have subnormal numbers. This is recommended practice, but optional, detected with a macro - there is no requirement.
the presence of FP_SUBNORMAL classification macro
There may be more FP_[A-Z]* macros provided by implementation for additional "kinds of floating point values".
the fact that in IEEE 754 support of subnormal numbers is required
But C does not require IEEE 754 support.
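Since the standard does not promise subnormals, whether a given implementation has them can only be probed. The following is a sketch under the assumption that fpclassify and FP_SUBNORMAL from <math.h> are available (C99); halved_flt_min_is_subnormal is a hypothetical helper name.

```c
#include <float.h>
#include <math.h>

/* Returns 1 if halving the smallest normalized float gives a value that
   fpclassify reports as subnormal, 0 if the result is flushed to zero or
   classified some other way. */
int halved_flt_min_is_subnormal(void)
{
    volatile float tiny = FLT_MIN;  /* volatile: do the division at run time */
    tiny = tiny / 2.0f;
    return fpclassify(tiny) == FP_SUBNORMAL;
}
```

A result of 0 does not mean the implementation is broken. It may simply run with flush-to-zero enabled, which the C standard permits.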

Is the floating-point literal “.1” the same as “0.1” in C?

In the source text of a C program, do .1 and 0.1 have the same value?
.1 represents one-tenth, the same as 0.1 does. However, due to a lack of strictness in the C standard, .1 and 0.1 do not necessarily convert to the same internal value, per C 2018 6.4.4.2 5. They will be equal in all compilers of reasonable quality. (6.4.4.2 5 says “All floating constants of the same source form shall convert to the same internal format with the same value.” Footnote 77 gives examples of source forms that have the same mathematical values but that do not necessarily convert to the same internal value.)
Floating-point constants in source text are converted to an internal format. Most commonly, a binary-based format is used. Most decimal numerals, including .1, are not exactly representable in binary floating-point. So, when they are converted, the result is rounded (in binary) to a representable value. In typical C implementations, .1 becomes 0.1000000000000000055511151231257827021181583404541015625.
All good compilers will convert .1 and 0.1 to the same value. The reason the C standard is lax about this is that other floating-point literals, involving exponents or many digits, were difficult (in some sense) to convert to binary floating-point with ideal rounding. Historically, there were C implementations that fudged the conversions. The C standard accommodated these implementations by not making strict requirements about handling of floating-point values. (Today, good algorithms are known, and any good compiler ought to convert a floating-point literal to the nearest representable value, with ties to the even low digit, unless the user requests otherwise.)
So, the C standard does not guarantee that .1 and 0.1 have the same value. However, in practice, they will.
Eric's answer is correct if you're just talking about the baseline C standard, which makes basically no guarantees about floating point; 1.0==42.0 is a valid implementation choice. But this is not very helpful.
If you want any reasonable floating point behavior in C, you want an implementation that supports Annex F (the alignment of IEEE floating point semantics with C), an optional part of the standard. You can tell if your implementation supports (or claims to support) Annex F by checking for the predefined macro __STDC_IEC_559__.
Assuming Annex F, the interpretation of floating point literals is not up for grabs, and .1 and 0.1 will necessarily be the same.
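Both points can be checked on a particular compiler. This is a sketch with hypothetical helper names: it compares the two spellings and formats the stored value of .1 so it can be compared with the expansion quoted above. How many digits the C library prints exactly is implementation dependent.

```c
#include <stdio.h>

/* Returns 1 if the compiler converted both spellings to the same double. */
int dot_one_equals_zero_dot_one(void)
{
    return .1 == 0.1;
}

/* Writes the decimal expansion of the stored value of .1 into buf,
   with 55 digits after the decimal point. Returns snprintf's result. */
int format_dot_one(char *buf, size_t size)
{
    return snprintf(buf, size, "%.55f", .1);
}
```

A buffer of 64 bytes is enough for format_dot_one, since the output is "0." followed by 55 digits.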

Guaranteed precision of sqrt function in C/C++

Everyone knows the sqrt function from math.h/cmath in C/C++ - it returns the square root of its argument. Of course, it has to do so with some error, because not every number can be stored precisely. But am I guaranteed that the result has some precision? For example, "it's the best approximation of the square root that can be represented in the floating point type used", or "if you calculate the square of the result, it will be as close to the initial argument as possible using the floating point type given"?
Does C/C++ standard have something about it?
For C99, there are no specific requirements. But most implementations try to support Annex F: IEC 60559 floating-point arithmetic as well as possible. It says:
An implementation that defines __STDC_IEC_559__ shall conform to the specifications in this annex.
And:
The sqrt functions in <math.h> provide the IEC 60559 square root operation.
IEC 60559 (equivalent to IEEE 754) says about basic operations like sqrt:
Except for binary <-> decimal conversion, each of the operations shall be performed as if it first produced an intermediate result correct to infinite precision and with unbounded range, and then coerced this intermediate result to fit in the destination's format.
The final step consists of rounding according to the rounding mode in effect. In the default round-to-nearest mode, the result must always be the closest representable value in the target precision.
This question was already answered here as Chris Dodd noticed in the comments section. In short: it's not guaranteed by C++ standard, but IEEE-754 standard guarantees me that the result will be as close to the 'real result' as possible, i.e. error will be less than or equal to 1/2 unit-in-the-last-place. In particular, if the result can be precisely stored, it should be.

How to check that IEEE 754 single-precision (32-bit) floating-point representation is used?

I want to test the following things on my target board:
Is 'float' implemented with IEEE 754 single-precision (32-bit) floating-point variable?
Is 'double' implemented with IEEE 754 double-precision (64-bit) floating-point variable?
What are the ways in which I can test it with a simple C program?
No simple test exists.
The overwhelming majority of systems today use IEEE-754 formats for floating-point. However, most C implementations do not fully conform to IEEE 754 (which is identical to IEC 60559) and do not set the preprocessor identifier __STDC_IEC_559__. In the absence of this identifier, the only way to determine whether a C implementation conforms to IEEE 754 is one or a combination of:
Read its documentation.
Examine its source code.
Test it (which is, of course, difficult when only exhaustive testing can be conclusive).
In many C implementations and software applications, the deviations from IEEE 754 can be ignored or worked around: You may write code as if IEEE 754 were in use, and much code will largely work. However, there are a variety of things that can trip up an unsuspecting programmer; writing completely correct floating-point code is difficult even when the full specification is obeyed.
Common deviations include:
Intermediate arithmetic is performed with more precision than the nominal type. E.g., expressions that use double values may be calculated with long double precision.
sqrt does not return a correctly rounded value in every case.
Other math library routines return values that may be slightly off (a few ULP) from the correctly rounded results. (In fact, nobody has implemented all the math routines recommended in IEEE 754-2008 with both guaranteed correct rounding and guaranteed bound run time.)
Subnormal numbers (tiny numbers near the edge of the floating-point format) may be converted to zero instead of handled as specified by IEEE 754.
Conversions between decimal numerals (e.g., 3.1415926535897932384626433 in the source code) and binary floating-point formats (e.g., the common double format, IEEE-754 64-bit binary) do not always round correctly, in either conversion direction.
Only round-to-nearest mode is supported; the other rounding modes specified in IEEE 754 are not supported. Or they may be available for simple arithmetic but require using machine-specific assembly language to access. Standard math libraries (cos, log, et cetera) rarely support other rounding modes.
In C99, you can check for __STDC_IEC_559__:
#ifdef __STDC_IEC_559__
/* using IEEE-754 */
#endif
This is because the international floating point standard referenced by C99 is IEC 60559:1989 (IEC 559 and IEEE-754 were previous designations). The mapping from the C language to IEC 60559 is optional, but if it is in use, the implementation defines the macro __STDC_IEC_559__ (Annex F of the C99 standard), so you can rely on that.
Another alternative is to manually check if the values in float.h, such as FLT_MAX, FLT_EPSILON, FLT_MAX_10_EXP, etc, match with the IEEE-754 limits, although theoretically there could be another representation with the same values.
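The manual check against float.h could be sketched as follows. The function names are hypothetical, and the constants are the IEEE-754 binary32 and binary64 parameters expressed in the C model (24 and 53 significand bits, minimum exponents -125 and -1021, maximum exponents 128 and 1024). A match only shows that the format has the same shape. It says nothing about rounding, subnormals, or the math library.

```c
#include <float.h>
#include <limits.h>

/* Returns 1 if float has the size and parameters of IEEE-754 binary32. */
int float_looks_like_binary32(void)
{
    return FLT_RADIX == 2
        && sizeof(float) * CHAR_BIT == 32
        && FLT_MANT_DIG == 24
        && FLT_MIN_EXP == -125
        && FLT_MAX_EXP == 128;
}

/* Returns 1 if double has the size and parameters of IEEE-754 binary64. */
int double_looks_like_binary64(void)
{
    return FLT_RADIX == 2
        && sizeof(double) * CHAR_BIT == 64
        && DBL_MANT_DIG == 53
        && DBL_MIN_EXP == -1021
        && DBL_MAX_EXP == 1024;
}
```

Because every operand is a compile-time constant, the same conditions could also be placed in #if or static assertion checks so that a mismatch stops the build on the target board.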
First of all, you can find the details about the ISO/IEC/IEEE 60559 (or IEEE 754) in Wikipedia:
Floating point standard types
As F. Goncalvez has told you, the macro __STDC_IEC_559__ tells you whether or not your compiler claims to conform to IEEE 754.
In what follows, we
However, you can obtain additional information with the macro FLT_EVAL_METHOD.
The value of this macro means:
0 All operations and constants are evaluated in the range and precision of the type used.
1 The operations of types float and double are evaluated in the range and precision of double, and long double operations are evaluated in the range and precision of long double.
2 The evaluations of all types are done in the precision and range of long double.
-1 Indeterminate
Other negative values: Implementation defined (it depends on your compiler).
For example, if FLT_EVAL_METHOD == 2, and you hold the result of several calculations in a floating point variable x, then all operations and constants are evaluated in the best precision available, that is, long double, and only the final result is rounded to the type that x has.
This behaviour reduces the impact of numerical errors.
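The values listed above can be turned into a small lookup, sketched here with hypothetical function names. FLT_EVAL_METHOD comes from <float.h> (C99). Any value not in the list is reported as implementation-defined.

```c
#include <float.h>

/* Map a FLT_EVAL_METHOD value to a short description. */
const char *eval_method_description(int method)
{
    switch (method) {
    case 0:  return "operations evaluated in the range and precision of their own type";
    case 1:  return "float and double evaluated as double, long double as long double";
    case 2:  return "all operations evaluated as long double";
    case -1: return "indeterminable";
    default: return "implementation-defined";
    }
}

/* Description for the implementation this file is compiled with. */
const char *current_eval_method(void)
{
    return eval_method_description(FLT_EVAL_METHOD);
}
```

Printing current_eval_method() from main reports which mode the compiler uses for the current target and flags; the answer can change with options that select a different floating-point unit.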
In order to know details about the floating point types, you have to watch the constant macros provided by the standard header <float.h>.
For example, see this link:
Characteristics of floating point types
In the sad case that your implementation does not conform to the IEEE 754 standard, you can try looking for details in the standard header <float.h>, if it exists.
Also, you have to read the documentation of your compiler.
For example, the GCC documentation explains what the compiler does with floating point:
Status of C99 features in GCC
No, Standard C18, p. 373 specifies that IEC 60559 is used for float, double...
Why do you think IEEE 754 is used?

How are floating point literals in C interpreted?

In a C program, when you write a floating point literal like 3.14159 is there standard interpretation or is it compiler or architecture dependent? Java is exceedingly clear about how floating point strings are interpreted, but when I read K&R or other C documentation the issue seems swept under the rug.
It is architecture dependent.
That generally means IEEE 754, but not necessarily.
The C standard (ISO 9899:1999) discusses this mainly in section 5.2.4.2.2 'Characteristics of floating types'.
From the C99 standard, section 6.4.4.2 Floating constants, paragraph 3 (emphasis mine):
The significand part is interpreted as a (decimal or hexadecimal) rational number; the
digit sequence in the exponent part is interpreted as a decimal integer. For decimal
floating constants, the exponent indicates the power of 10 by which the significand part is
to be scaled. For hexadecimal floating constants, the exponent indicates the power of 2
by which the significand part is to be scaled. For decimal floating constants, and also for
hexadecimal floating constants when FLT_RADIX is not a power of 2, the result is either
the nearest representable value, or the larger or smaller representable value immediately
adjacent to the nearest representable value, chosen in an implementation-defined manner.
For hexadecimal floating constants when FLT_RADIX is a power of 2, the result is
correctly rounded.
So, you're going to get a constant within one ULP, chosen in an implementation-defined manner. Recall that implementation-defined means that the implementation (in this case, the compiler, which converts constants in the source text) can choose any of the options, but that choice must be documented. So, you can consult your compiler's documentation to find out how the rounding occurs.
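The difference between the two kinds of constant can be illustrated with a sketch that assumes double is IEEE-754 binary64 and that the compiler rounds decimal constants to the nearest representable value. Under those assumptions, 0x1.999999999999ap-4 is the exact bit pattern of the double nearest to one-tenth. The function name is hypothetical.

```c
/* Returns 1 if the hexadecimal constant (exactly representable, so its
   conversion is fully specified when FLT_RADIX is a power of 2) and the
   decimal constant 0.1 (rounded in an implementation-defined manner)
   were converted to the same double. */
int hex_and_decimal_agree(void)
{
    return 0x1.999999999999ap-4 == 0.1;
}
```

Hexadecimal constants are therefore a way to write a specific value without depending on how the compiler rounds decimal text.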
It is not clear whether you mean a floating point literal that is part of the source code (for the compiler to parse into an architecture-dependent binary representation), or text scanned by library functions such as scanf(), atof(), strtof(), strtod() and strtold() (at run time, to convert to an in-memory float, double or long double value).
In the first case, it is part of ISO/IEC 9899:1999 (ISO C99), §6.4.4.2 "Floating constants". It defines both the lexicon and how it should be interpreted.
In the second case, the behavior of the library functions is defined in §7.20.1 "Numeric conversion functions".
I don't have a hard copy of the previous standard (ANSI C, 1989), but I'm pretty sure it also defines very precisely how floating point numbers are parsed and converted.
In the case you want to know if there is a standard to represent these values in binary format, in-memory, the answer is no. The C language is intended to be close to the architecture, and not impose constraints over it. So the in-memory representation is always architecture-dependent. But the C standard defines how arithmetic should be performed over floating point values. It follows IEC 60559 standard. In the ISO C99 standard, it is described in Annex F (normative), "IEC 60559 floating-point arithmetic". The implementation may or may not implement this standard. If it does, it must define the __STDC_IEC_559__ preprocessor name.
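The two cases distinguished above (a constant converted by the compiler and text converted at run time) can be compared directly. This is a sketch with a hypothetical function name. The standard does not require the two conversions to agree, so a result of 0 is possible on some implementations.

```c
#include <stdlib.h>

/* Returns 1 if strtod, converting the text at run time, produces the same
   double as the compiler did for the literal with the same spelling. */
int runtime_matches_literal(void)
{
    const char *text = "3.14159";
    double parsed = strtod(text, NULL);
    return parsed == 3.14159;
}
```

On implementations where both the compiler and the C library round to the nearest representable value, the two results coincide.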

Resources