How to guarantee exact size of double in C?

So, I am aware that the types from the stdint.h header provide standardized-width integer types; however, I am wondering what type or method one uses to guarantee the size of a double or other floating-point type across platforms. Specifically, this concerns packing data into a buffer behind a void*:
#include <stdio.h>
#include <stdlib.h>

void write_double(void* buf, double num)
{
    *(double*)buf = num;
}

double read_double(void* buf)
{
    return *(double*)buf;
}

int main(void) {
    void* buffer = malloc(sizeof(double));
    write_double(buffer, 55);
    printf("The double is %f\n", read_double(buffer));
    return 0;
}
Say, as in the above program, I wrote that void* buffer to a file, or it was read on another system: would there be some standard way to guarantee the size of a floating-point type such as double?

How to guarantee exact size of double in C?
Use _Static_assert()
#include <limits.h>

int main(void) {
    _Static_assert(sizeof (double)*CHAR_BIT == 64, "Unexpected double size");
    return 0;
}
_Static_assert has been available since C11. Otherwise, code could use a run-time assert:
#include <assert.h>
#include <limits.h>

int main(void) {
    assert(sizeof (double)*CHAR_BIT == 64);
    return 0;
}
Although this will ensure the size of a double is 64 bits, it does not ensure adherence to the IEEE 754 double-precision binary floating-point format.
Code could check __STDC_IEC_559__:
"An implementation that defines __STDC_IEC_559__ shall conform to the specifications in this annex" (C11 Annex F, IEC 60559 floating-point arithmetic).
Yet that may be too strict. Many implementations adhere to most of that annex, yet still do not set the macro.
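For example (a minimal sketch, using nothing beyond standard C11), the two checks can be combined so that a translation unit refuses to build unless both hold:
#include <limits.h>

/* Refuse to compile unless the implementation claims IEC 60559 conformance
   and double is 64 bits wide. */
#ifndef __STDC_IEC_559__
#error "IEC 60559 (IEEE 754) floating point not guaranteed by this implementation"
#endif

_Static_assert(sizeof(double) * CHAR_BIT == 64, "Unexpected double size");

int main(void) {
    return 0;
}
As noted above, some otherwise-conforming implementations will trip the #error even though their double really is IEEE 754 binary64.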
would there be some standard way to guarantee size of a floating point type or double?
The best guarantee is to write the FP value as its hexadecimal representation or as an exponential with sufficient decimal digits. See Printf width specifier to maintain precision of floating-point value.
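A minimal sketch of both round-trips (assuming C99's %a conversion and strtod, and a binary64 double for the 17-digit case):
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    double d = 0.1;
    char buf[64];

    /* Hexadecimal FP form is exact and round-trips on the same implementation */
    snprintf(buf, sizeof buf, "%a", d);
    double hex_back = strtod(buf, NULL);

    /* 17 significant decimal digits also round-trip for a binary64 double */
    snprintf(buf, sizeof buf, "%.17g", d);
    double dec_back = strtod(buf, NULL);

    printf("%d %d\n", hex_back == d, dec_back == d);  /* prints "1 1" */
    return 0;
}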

The problem with floating-point types is that the C standard doesn't specify how they should be represented. The use of IEEE 754 is not required.
If you're communicating between a system that uses IEEE 754 and one that doesn't, you won't be able to write on one and read on the other even if the sizes are the same.
You need to serialize the data in a known format. You can either use sprintf to convert it to a text format, or you can do some math to extract the exponent and mantissa and store those.
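For the exponent-and-mantissa route, frexp and ldexp can split and rebuild the value exactly; this is only a sketch for finite values (NaN, infinity, and the actual I/O are left out, and the helper names are made up for the example):
#include <stdio.h>
#include <stdint.h>
#include <math.h>

/* Split a finite double into an integer mantissa and a base-2 exponent. */
void fp_split(double num, int64_t *mantissa, int32_t *exponent) {
    int exp;
    double frac = frexp(num, &exp);        /* num == frac * 2^exp, 0.5 <= |frac| < 1 */
    *mantissa = (int64_t)ldexp(frac, 53);  /* scale the fraction to a 53-bit integer */
    *exponent = exp - 53;
}

/* Rebuild the double from the stored mantissa and exponent. */
double fp_join(int64_t mantissa, int32_t exponent) {
    return ldexp((double)mantissa, exponent);  /* mantissa * 2^exponent */
}

int main(void) {
    int64_t m;
    int32_t e;
    fp_split(3.141592653589793, &m, &e);
    printf("%.17g\n", fp_join(m, e));  /* round-trips exactly: 3.1415926535897931 */
    return 0;
}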

Floating-point values are defined in the IEEE Standard for Floating-Point Arithmetic (IEEE 754) and have standard sizes:
float, in full "single precision floating point number": 32 bits
double, in full "double precision floating point number": 64 bits
The following also exist:
Half-precision floating-point format
Quadruple precision floating-point format
Extended precision floating-point format
This format is reused in the C11 standard, Annex F "IEC 60559 floating-point arithmetic" of ISO/IEC 9899:2011(en).

Why use CHAR_BIT and assert at runtime? We can do this at compile time.
void write_double(void* buf, double num)
{
    char checkdoublesize[(sizeof(double) == 8)?1:-1];  /* negative array size: compile error if double isn't 8 bytes */
    *(double*)buf = num;
}
Your code is still not fully defined, as it doesn't guarantee IEEE format or endianness, but this will catch a bad double size. If your platform is new enough to have htonq, the following allows endianness to work:
void write_double(void* buf, double num)
{
    char checkdoublesize[(sizeof(double) == 8)?1:-1];
    *(int64_t*)buf = htonq(*(volatile int64_t*)&num);
}

double read_double(void* buf)
{
    int64_t n = ntohq(*(int64_t*)buf);
    return *(volatile double*)&n;
}
Where volatile is merely the shortest way to tell the compiler the pointer cast really is defined. Usually it does the right thing anyway but after N levels of inlining maybe it won't anymore.
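If htonq/ntohq aren't available, a memcpy-based variant (still assuming a 64-bit IEEE 754 double) sidesteps the pointer-cast question entirely and fixes the wire byte order by hand; this is a sketch, not a drop-in replacement:
#include <stdint.h>
#include <string.h>

void write_double_be(unsigned char buf[8], double num)
{
    uint64_t bits;
    memcpy(&bits, &num, sizeof bits);  /* well-defined type punning */
    for (int i = 0; i < 8; i++)
        buf[i] = (unsigned char)(bits >> (56 - 8 * i));  /* most significant byte first */
}

double read_double_be(const unsigned char buf[8])
{
    uint64_t bits = 0;
    for (int i = 0; i < 8; i++)
        bits = (bits << 8) | buf[i];
    double num;
    memcpy(&num, &bits, sizeof num);
    return num;
}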

Related

long double in fabs, range and overflow errors

At wiki.sei.cmu.edu, they claim the following code is error-free for out-of-range floating-point errors during assignment; I've narrowed it down to the long double case:
Compliant Solution (Narrowing Conversion)
This compliant solution checks whether the values to be stored can be represented in the new type:
#include <float.h>
#include <math.h>

void func(double d_a, long double big_d) {
    double d_b;
    // ...
    if (big_d != 0.0 &&
        (isnan(big_d) ||
         isgreater(fabs(big_d), DBL_MAX) ||
         isless(fabs(big_d), DBL_MIN))) {
        /* Handle error */
    } else {
        d_b = (double)big_d;
    }
}
Unless I'm missing something, the declaration of fabs according to the C99 and C11 standards is double fabs(double x), which means it takes a double, so this code isn't compliant, and instead long double fabsl(long double x) should be used.
Further, I believe isgreater and isless should be declared as taking a long double as their first parameters (since that's what fabsl returns).
#include <stdio.h>
#include <math.h>

int main(void)
{
    long double ld = 1.12345e506L;
    printf("%lg\n", fabs(ld));   // UB: ld is outside the range of double (~ 1e308)
    printf("%Lg\n", fabsl(ld));  // OK
    return 0;
}
On my machine, this produces the following output:
inf
1.12345e+506
along with a warning (GCC):
warning: conversion from 'long double' to 'double' may change value [-Wfloat-conversion]
printf("%lg\n", fabs(ld));
^~
Am I therefore correct in saying their code results in undefined behavior?
On p. 211 of the C99 standard there's a footnote that reads:
Particularly on systems with wide expression evaluation, a <math.h> function might pass arguments and return values in wider format than the synopsis prototype indicates.
and on some systems long double has the exact same value range, representation, etc. as double, but this doesn't mean the code above is portable.
Now I have a related question here, and I'd just like to ask for confirmation. (I've read through dozens of questions and answers here, but I'm still a little confused, because they often deal with specific examples and specific types, not all of them are sourced, or they're about C++, and I think it would be a waste of time to ask each of these questions as a separate, "formal" question on Stack Overflow.) According to the C99 and C11 standards, there's a difference between overflow, which occurs during an arithmetic operation, and a range error, which occurs when a value is too large to be represented in a given type. I've provided excerpts from the C99 standard that talk about this, and I'd appreciate it if someone could confirm that my interpretation is correct. (I'm aware of the fact that certain implementations define what happens when undefined behavior occurs, e.g. as explained here, but that's not what I'm interested in right now.)
for floating-point types, overflow results in some representation of a "large value" (i.e. as defined by the HUGE_VAL* macro definition as per 7.12.1):
A floating result overflows if the magnitude of the mathematical result is finite but so large that the mathematical result cannot be represented without extraordinary roundoff error in an object of the specified type. If a floating result overflows and default rounding is in effect, or if the mathematical result is an exact infinity (for example log(0.0)), then the function returns the value of the macro HUGE_VAL, HUGE_VALF, or HUGE_VALL according to the return type, with the same sign as the correct value of the function;
On my system, HUGE_VAL* is defined as INFINITY cast to the appropriate floating-point type.
So this is completely legal, the value of HUGE_VAL* being implementation-defined or something like that notwithstanding.
for floating-point types, a range error results in undefined behavior (6.3.1.5):
When a double is demoted to float, a long double is demoted to double or float, or a value being represented in greater precision and range than required by its semantic type (see 6.3.1.8) is explicitly converted to its semantic type [...]. If the value being converted is outside the range of values that can be represented, the behavior is undefined.

Can I round-trip an aligned pointer through an IEEE double?

Can I round trip any 4-byte-aligned pointer through a double? And can I round trip any finite double through a string?
Specifically, on any platform that uses IEEE floating point, which conforms to C11, and on which neither static assertion fails, are the assertions in the following program guaranteed to pass?
#include <stdint.h>
#include <stdio.h>
#include <assert.h>
#include <string.h>
#include <math.h>

int main(void) {
    struct {
        void *dummy;
    } main_struct;
    main_struct.dummy = 0;

    static_assert(_Alignof(main_struct) >= 4,
                  "Dummy struct insufficiently aligned");
    static_assert(sizeof(double) == sizeof(uint64_t) && sizeof(double) == 8,
                  "double and uint64_t must have size 8");

    double x;
    uint64_t ptr = (uint64_t)&main_struct;
    assert((ptr & 3) == 0);
    ptr >>= 2;
    memcpy(&x, &ptr, 8);
    assert(!isnan(x));
    assert(isfinite(x));
    assert(x > 0);

    char buf[1000];
    snprintf(buf, sizeof buf, "Double is %#.20g\n", x);
    double q;
    sscanf(buf, "Double is %lg\n", &q);
    assert(q == x);
    assert(memcmp(&q, &ptr, 8) == 0);
}
Specifically, on any platform that uses IEEE floating point, which conforms to C11, and on which neither static assertion fails, are the assertions in the following program guaranteed to pass?
With only those requirements, then no. Among reasons that preclude it are the following:
You haven't asserted that pointers are 64 bits or less in size.
Nothing says that pointers and doubles use the same kind of endianness in memory. If pointers are big-endian and doubles are little-endian (or middle-endian, or use some other weird in-memory format), then your shifting does not preclude negative, infinite or NaN values.
Pointers are not guaranteed to translate simply into an integral value with lower-order bits guaranteed to be zero just because they point to an aligned value.
These objections may be somewhat pathological on current, practical platforms, but could certainly be true in theory, and nothing in your list of requirements stands against them.
It is, for instance, perfectly possible to imagine an architecture with a separate floating-point coprocessor that uses a different memory format than the main integer CPU. In fact, the Wikipedia article actually states that there are real examples of architectures that do this. As for weird pointer formats, the C FAQ provides some interesting historical examples.

Initializing floating point variable with large literal

#include <stdio.h>

int main(void) {
    double x = 0.12345678901234567890123456789;
    printf("%0.16f\n", x);
    return 0;
}
In the code above I'm initializing x with a literal that has more digits than an IEEE 754 double can represent exactly. On my PC with gcc 4.9.2 it works well. The literal is rounded to the nearest value that fits into a double. I'm wondering what happens behind the scenes (at the compiler level) in this case. Does this behaviour depend on the platform? Is it legal?
When you write double x = 0.1;, the decimal number you have written is rounded to the nearest double. So what happens when you write 0.12345678901234567890123456789 is not fundamentally different.
The behavior is essentially implementation-defined, but most compilers will use the nearest representable double in place of the constant. The C standard specifies that it has to be either the double immediately above or the one immediately below.
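A quick way to see this (a sketch assuming an IEEE 754 binary64 double) is to print the stored value with 17 significant digits next to its lower neighbour:
#include <stdio.h>
#include <math.h>

int main(void) {
    double x = 0.12345678901234567890123456789;
    printf("%.17g\n", x);                /* the double the compiler actually stored */
    printf("%.17g\n", nextafter(x, 0));  /* the adjacent representable double below it */
    return 0;
}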

Implicit conversion of integers to floats in c

I'm having trouble understanding why this code's output is 2147483648:
#include <stdio.h>

int main(void) {
    float f = 2147483638;
    printf("%f", f);
}
I tried to find an explanation using the IEEE 754 standard for float representation, but by my calculations the output should be 2147483520, not 2147483648.
Thanks for help!
That is the way that float works on your system.
Note that the C standard is intentionally flexible as to the representation and sizes of the floating-point types. A float does not have to be an IEEE 754 32-bit floating-point type.
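That said, assuming an IEEE 754 32-bit float (24-bit significand), the result is predictable: between 2^30 and 2^31 consecutive floats are 128 apart, so the only candidates for 2147483638 are 2147483520 and 2147483648, and the latter is closer (10 away versus 118). A small sketch with nextafterf shows both neighbours:
#include <stdio.h>
#include <math.h>

int main(void) {
    float f = 2147483638;                 /* rounds to the nearest representable float */
    printf("%f\n", f);                    /* 2147483648.000000 on IEEE 754 systems */
    printf("%f\n", nextafterf(f, 0.0f));  /* 2147483520.000000, the next float below */
    return 0;
}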

Odd behavior when converting C strings to/from doubles

I'm having trouble understanding C's rules for what precision to assume when printing doubles, or when converting strings to doubles. The following program should illustrate my point:
#include <errno.h>
#include <float.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv) {
    double x, y;
    const char *s = "1e-310";

    /* Should print zero */
    x = DBL_MIN/100.;
    printf("DBL_MIN = %e, x = %e\n", DBL_MIN, x);

    /* Trying to read in a floating point number smaller than DBL_MIN gives an error */
    y = strtod(s, NULL);
    if (errno != 0)
        printf(" Error converting '%s': %s\n", s, strerror(errno));
    printf("y = %e\n", y);

    return 0;
}
The output I get when I compile and run this program (on a Core 2 Duo with gcc 4.5.2) is:
DBL_MIN = 2.225074e-308, x = 2.225074e-310
Error converting '1e-310': Numerical result out of range
y = 1.000000e-310
My questions are:
Why is x printed as a nonzero number? I know compilers sometimes promote doubles to higher precision types for the purposes of computation, but shouldn't printf treat x as a 64-bit double?
If the C library is secretly using extended precision floating point numbers, why does strtod set errno when trying to convert these small numbers? And why does it produce the correct result anyway?
Is this behavior just a bug, a result of my particular hardware and development environment? (Unfortunately I'm not able to test on other platforms at the moment.)
Thanks for any help you can give. I will try to clarify the issue as I get feedback.
Because of the existence of denormal numbers in the IEEE-754 standard. DBL_MIN is the smallest normalised value.
Because the standard says so (C99 7.20.1.3):
If the result underflows (7.12.1), the functions return a value whose magnitude is no greater than the smallest normalized positive number in the return type; whether errno acquires the value ERANGE is implementation-defined.
Returning the "correct" value (i.e. 1e-310) obeys the above constraint.
So not a bug. This is technically platform-dependent, because the C standard(s) place no requirements on the existence or behaviour of denormal numbers (AFAIK).
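To see the subnormal behaviour directly (a sketch assuming C99's fpclassify is available), you can classify DBL_MIN/100:
#include <stdio.h>
#include <float.h>
#include <math.h>

int main(void) {
    double x = DBL_MIN / 100.0;  /* below the smallest normalized positive double */
    printf("x = %e, subnormal: %d\n", x, fpclassify(x) == FP_SUBNORMAL);
    return 0;
}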
Here is what the standard says for strtod underflow (C99, 7.20.1.3p10)
"If the result underflows (7.12.1), the functions return a value whose magnitude is no greater than the smallest normalized positive number in the return type; whether errno acquires the value ERANGE is implementation-defined."
Regarding ERANGE on strtod underflow, here is what glibc says
"When underflow occurs, the underflow exception is raised, and zero (appropriately signed) is returned. errno may be set to ERANGE, but this is not guaranteed."
http://www.gnu.org/savannah-checkouts/gnu/libc/manual/html_node/Math-Error-Reporting.html
(Note that this page is explicitly linked from the glibc strtod page "Parsing of Floats":
http://www.gnu.org/savannah-checkouts/gnu/libc/manual/html_node/Parsing-of-Floats.html)
