Odd behavior when converting C strings to/from doubles - c

I'm having trouble understanding C's rules for what precision to assume when printing doubles, or when converting strings to doubles. The following program should illustrate my point:
#include <errno.h>
#include <float.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char **argv) {
    double x, y;
    const char *s = "1e-310";

    /* Should print zero */
    x = DBL_MIN/100.;
    printf("DBL_MIN = %e, x = %e\n", DBL_MIN, x);

    /* Trying to read in a floating point number smaller than DBL_MIN gives an error */
    y = strtod(s, NULL);
    if (errno != 0)
        printf(" Error converting '%s': %s\n", s, strerror(errno));
    printf("y = %e\n", y);
    return 0;
}
The output I get when I compile and run this program (on a Core 2 Duo with gcc 4.5.2) is:
DBL_MIN = 2.225074e-308, x = 2.225074e-310
Error converting '1e-310': Numerical result out of range
y = 1.000000e-310
My questions are:
Why is x printed as a nonzero number? I know compilers sometimes promote doubles to higher precision types for the purposes of computation, but shouldn't printf treat x as a 64-bit double?
If the C library is secretly using extended precision floating point numbers, why does strtod set errno when trying to convert these small numbers? And why does it produce the correct result anyway?
Is this behavior just a bug, or a result of my particular hardware and development environment? (Unfortunately I'm not able to test on other platforms at the moment.)
Thanks for any help you can give. I will try to clarify the issue as I get feedback.

Because of the existence of denormal numbers in the IEEE-754 standard. DBL_MIN is the smallest normalised value.
Because the standard says so (C99 7.20.1.3):
If
the result underflows (7.12.1), the functions return a value whose magnitude is no greater
than the smallest normalized positive number in the return type; whether errno acquires
the value ERANGE is implementation-defined.
Returning the "correct" value (i.e. 1e-310) obeys the above constraint.
So not a bug. This is technically platform-dependent, because the C standard(s) place no requirements on the existence or behaviour of denormal numbers (AFAIK).
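To see this directly, here is a minimal sketch (assuming C99 <math.h> and IEEE-754 doubles) that classifies DBL_MIN/100. with fpclassify; it should report the value as subnormal rather than zero:
#include <float.h>
#include <math.h>
#include <stdio.h>

int main(void) {
    double x = DBL_MIN / 100.;   /* below the smallest normalised value */
    /* fpclassify distinguishes normal, subnormal (denormal), zero, etc. */
    printf("x = %e, subnormal: %s\n", x,
           fpclassify(x) == FP_SUBNORMAL ? "yes" : "no");
    return 0;
}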

Here is what the standard says for strtod underflow (C99, 7.20.1.3p10)
"If the result underflows (7.12.1), the functions return a value whose magnitude is no greater than the smallest normalized positive number in the return type; whether errno acquires the value ERANGE is implementation-defined."
Regarding ERANGE on strtod underflow, here is what glibc says
"When underflow occurs, the underflow exception is raised, and zero (appropriately signed) is returned. errno may be set to ERANGE, but this is not guaranteed."
http://www.gnu.org/savannah-checkouts/gnu/libc/manual/html_node/Math-Error-Reporting.html
(Note that this page is explicitly linked from the glibc strtod page "Parsing of Floats":
http://www.gnu.org/savannah-checkouts/gnu/libc/manual/html_node/Parsing-of-Floats.html)
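Since ERANGE on underflow is optional, a caller that wants to detect it portably has to clear errno before the call and also check the end pointer; a small sketch of that pattern (note that the question's code never resets errno, so a stale value could be misreported):
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const char *s = "1e-310";
    char *end;
    errno = 0;                          /* clear any stale error first */
    double y = strtod(s, &end);
    if (end == s)
        printf("no conversion performed\n");
    else if (errno == ERANGE)
        printf("out of range (overflow or underflow): %e\n", y);
    else
        printf("converted cleanly: %e\n", y);
    return 0;
}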

Related

How to convert 64 bit hex value to double in c?

I'm using a gps module through which I'm getting the string
"0x3f947ae147ae147b"
which I need to convert to double. The expected value is 0.02.
I referred to the following website for reference:
https://gregstoll.com/~gregstoll/floattohex/
How can I convert this value in C?
3F947AE147AE147B₁₆ is the encoding for an IEEE-754 binary64 (a.k.a. “double precision”) datum with value 0.0200000000000000004163336342344337026588618755340576171875. Supposing your C implementation uses that format for double and has 64-bit integers with the same endianness, you can decode it (not convert it) by copying its bytes into a double and printing them:
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
    char *string = "0x3f947ae147ae147b";

    // Set errno to zero before using strtoull.
    errno = 0;

    char *end;
    unsigned long long t = strtoull(string, &end, 16);

    // Test whether strtoull did not accept all characters.
    if (*end)
    {
        fprintf(stderr,
            "Error, string \"%s\", is not a proper hexadecimal numeral.\n",
            string);
        exit(EXIT_FAILURE);
    }

    // Move the value to a 64-bit unsigned integer.
    uint64_t encoding = t;

    /* Test whether the number is too large, either because strtoull reported
       an error or because it does not fit in a uint64_t.
    */
    if ((t == ULLONG_MAX && errno) || t != encoding)
    {
        fprintf(stderr, "Error, string \"%s\", is bigger than expected.\n",
            string);
        exit(EXIT_FAILURE);
    }

    // Copy the bytes into a double.
    double x;
    memcpy(&x, &encoding, sizeof x);

    printf("%.9999g\n", x);
}
This should output “0.0200000000000000004163336342344337026588618755340576171875”.
If your C implementation does not support this format, you can decode it:
Separate the 64 bits into s, e, f, where s is the leading bit, e is the next 11 bits, and f is the remaining 52 bits.
If e is 2047 and f is zero, report the value is +∞ or −∞, according to whether s is 0 or 1, and stop.
If e is 2047 and f is not zero, report the value is a NaN (Not a Number) and stop.
If e is not zero, add 2^52 to f. If e is zero, change it to one.
The magnitude of the represented value is f·2^−52·2^(e−1023), and its sign is + or − according to whether s is 0 or 1.
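A sketch of those steps in C, assuming the encoding has already been parsed into a uint64_t (the function name decode_binary64 is illustrative):
#include <math.h>
#include <stdint.h>
#include <stdio.h>

/* Decode an IEEE-754 binary64 bit pattern by hand, following the steps above. */
static double decode_binary64(uint64_t bits) {
    uint64_t s = bits >> 63;                  /* sign bit */
    uint64_t e = (bits >> 52) & 0x7FF;        /* 11-bit exponent field */
    uint64_t f = bits & 0xFFFFFFFFFFFFFULL;   /* 52-bit fraction field */
    double sign = s ? -1.0 : 1.0;
    if (e == 2047)
        return f == 0 ? sign * INFINITY : NAN;   /* infinity or NaN */
    if (e != 0)
        f += (uint64_t)1 << 52;   /* normal: restore the implicit leading bit */
    else
        e = 1;                    /* subnormal: exponent acts as 1 */
    return sign * ldexp((double)f, (int)e - 1023 - 52);   /* f·2^−52·2^(e−1023) */
}

int main(void) {
    printf("%.60g\n", decode_binary64(0x3f947ae147ae147bULL));
    return 0;
}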
The usual way to convert a string of digits like "0x3f947ae147ae147b" into an actual integer is with one of the "strto" functions. Since you have 64 bits, and you're not interested in treating them as a signed integer (since you're about to, instead, try to treat them as a double), the appropriate choice is strtoull:
#include <stdint.h>
#include <stdlib.h>

char *str = "0x3f947ae147ae147b";
uint64_t x = strtoull(str, NULL, 16);
Now you have your integer, as you can verify by doing
printf("%llx\n", (unsigned long long)x);
But now the question is, how do you treat those bits as an IEEE-754 double value, instead of an integer? There are at least three ways to do it, in increasing levels of portability.
(1) Use pointers. Take a pointer to your integer value x, change it to a double pointer, then indirect on it, forcing the compiler to (try to) treat the bits of x as if they were a double:
double *dp = (double *)&x;
double d = *dp;
printf("%f\n", d);
This was once a decent and simple way to do it, but it is no longer legal as it runs afoul of the "strict aliasing rule". It might work for you, or it might not. Theoretically this sort of technique can also run into issues with alignment. For these reasons, this technique is not recommended.
(2) Use a union:
union u { uint64_t x; double d; } un;
un.x = strtoull(str, NULL, 16);
printf("%f\n", un.d);
Opinions differ on whether this technique is 100% strictly legal. I believe it's fine in C, but it may not be in C++. I'm not aware of machines where it won't work.
(3) Use memcpy:
#include <string.h>
uint64_t x = strtoull(str, NULL, 16);
double d;
memcpy(&d, &x, 8);
printf("%f\n", d);
This works by, literally, copying the individual bytes of the 64-bit unsigned integer value x into the bytes of the double variable d. This is 100% portable (as long as x and d are the same size). I used to think it was wasteful, due to the extra function call, but these days it's a generally recommended technique, and I'm told that modern compilers are smart enough to recognize what you're trying to do, and emit perfectly efficient code (that is, just as efficient as techniques (1) or (2)).
Now, one other portability concern is that this all assumes that type double on your machine is in fact implemented using the same IEEE-754 double-precision format as your incoming hex string representation. That's actually a very safe assumption these days, although it's not strictly guaranteed by the C standards. If you like to be particularly careful about type correctness, you might add the lines
#include <assert.h>
assert(sizeof(uint64_t) == sizeof(double));
and change the memcpy call (if that's what you end up using) to
memcpy(&d, &x, sizeof(double));
(But note that these last few changes only guard against unexpected system-specific discrepancies in the size of type double, not its representation.)
One further point. Note that one technique which will most definitely not work is the superficially obvious
d = (double)x;
That line would perform an actual conversion of the value 0x3f947ae147ae147b. It won't just reinterpret the bits. If you try it, you'll get an answer like 4581421828931458048.000000. Where did that come from? Well, 0x3f947ae147ae147b in decimal is 4581421828931458171, and the closest value that type double can represent is 4581421828931458048. (Why can't type double represent the integer 4581421828931458171 exactly? Because it's a 62-bit number, and type double has at most 53 bits of precision.)
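To make the difference concrete, this small sketch (reusing the memcpy technique above) prints the arithmetic conversion and the bit reinterpretation side by side:
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    uint64_t x = 0x3f947ae147ae147bULL;
    double converted = (double)x;       /* converts the integer value itself */
    double reinterpreted;
    memcpy(&reinterpreted, &x, sizeof reinterpreted);   /* reuses the bit pattern */
    printf("converted:     %f\n", converted);       /* 4581421828931458048.000000 */
    printf("reinterpreted: %f\n", reinterpreted);   /* 0.020000 */
    return 0;
}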

Using round() function in c

I'm a bit confused about the round() function in C.
First of all, man says:
SYNOPSIS
#include <math.h>
double round(double x);
RETURN VALUE
These functions return the rounded integer value.
If x is integral, +0, -0, NaN, or infinite, x itself is returned.
Is the return value a double/float or an int?
Secondly, I've created a function that first rounds, then casts to int. Later in my code I use it as a means to compare doubles:
int tointn(double in, int n)
{
    int i = 0;
    i = (int)round(in * pow(10, n));
    return i;
}
This function apparently isn't stable throughout my tests. Is there redundancy here? Well... I'm not looking only for an answer, but a better understanding of the subject.
The wording in the man-page is meant to be read literally, that is in its mathematical sense. The wording "x is integral" means that x is an element of Z, not that x has the data type int.
Casting a double to int can be dangerous because the largest integer up to which a double can represent every integral value exactly is 2^53 (assuming an IEEE-754 conforming binary64), and the maximum value an int can hold might be smaller (it is usually 32 bits on 32-bit architectures and also 32 bits on most 64-bit architectures).
If you need only powers of ten you can test it with this little program yourself:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int main(){
    int i;
    for (i = 0; i < 26; i++) {
        printf("%d:\t%.2f\t%d\n", i, pow(10, i), (int)pow(10, i));
    }
    exit(EXIT_SUCCESS);
}
Instead of casting you should use the functions that return a proper integral data type like e.g.: lround(3).
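For instance, the question's tointn helper could be written with lround instead of a cast; this is only a sketch (tolongn is a made-up name), and it still assumes the scaled value fits in a long:
#include <math.h>

/* Compare doubles at n decimal places without truncating through int. */
long tolongn(double in, int n)
{
    return lround(in * pow(10, n));   /* rounds to nearest, halfway away from zero */
}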
Here is an excerpt from the man page:
#include <math.h>
double round(double x);
float roundf(float x);
long double roundl(long double x);
Notice: the returned value is NEVER an integer. However, the fractional part of the returned value is set to 0.
Notice: which of these functions is called determines the type of the returned value.
Here is an excerpt from the man page about which way the rounding will be done:
These functions round x to the nearest integer, but round halfway cases
away from zero (regardless of the current rounding direction, see
fenv(3)), instead of to the nearest even integer like rint(3).
For example, round(0.5) is 1.0, and round(-0.5) is -1.0.
If you want a long integer to be returned then please use lround:
long int tolongint(double in)
{
    return lround(in);
}
For details please see lround, which is available as of the C99 standard.
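A quick usage sketch of the halfway-away-from-zero behaviour described above:
#include <math.h>
#include <stdio.h>

int main(void)
{
    /* lround, like round, rounds halfway cases away from zero. */
    printf("%ld %ld\n", lround(0.5), lround(-0.5));   /* prints: 1 -1 */
    printf("%ld\n", lround(2.5));                     /* prints: 3 */
    return 0;
}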

pow numeric error in c

I'm wondering where the numeric error happens, in what layer.
Let me explain using an example:
int p = pow(5, 3);
printf("%d", p);
I've tested this code on various HW and compilers (VS and GCC) and some of them print out 124, and some 125.
On the same HW (OS) I get different results in different compilers (VS and GCC).
On the different HW(OS) I get different results in the same compiler (cc (GCC) 4.8.1).
AFAIK, pow computes to 124.99999999 and that gets truncated to int, but where does this error happen?
Or, in other words, where does the correction happen (124.99 -> 125)?
Is it a compiler-HW interaction?
//****** edited:
Here's an additional snippet to play with (keep an eye on p=5, p=18, ...):
#include <stdio.h>
#include <math.h>
int main(void) {
    int p;
    for (p = 1; p < 20; p++) {
        printf("\n%d %d %f %f", (int) pow(p, 3), (int) exp(3 * log(p)), pow(p, 3), exp(3 * log(p)));
    }
    return 0;
}
(First note that for an IEEE-754 double precision floating point type, all integers up to 2^53 can be represented exactly. Blaming floating point precision for integral pow inaccuracies is normally incorrect).
pow(x, y) is normally implemented in C as exp(y * log(x)). Hence it can "go off" for even quite small integral cases.
For small integral cases, I normally write the computation long-hand, and for other integral arguments I use a 3rd party library. Although a do-it-yourself solution using a for loop is tempting, there are effective optimisations that can be done for integral powers that such a solution might not exploit.
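One common do-it-yourself shape for integral exponents is exponentiation by squaring; a sketch (the name ipow is illustrative, and there is no overflow checking):
#include <stdint.h>

/* Compute base^exp for a non-negative integer exponent by repeated squaring. */
static int64_t ipow(int64_t base, unsigned exp)
{
    int64_t result = 1;
    while (exp) {
        if (exp & 1)       /* odd exponent: fold in one factor of base */
            result *= base;
        base *= base;      /* square the base */
        exp >>= 1;         /* halve the exponent */
    }
    return result;
}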
As for the observed different results, it could be down to some of the platforms using an 80 bit floating point intermediary. Perhaps some of the computations then are above 125 and others are below that.
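If you do stick with pow, rounding the result rather than truncating it avoids the 124-versus-125 symptom for small cases like this one; a sketch using llround:
#include <math.h>
#include <stdio.h>

int main(void)
{
    /* llround rounds to the nearest integer, so a pow result of
       124.999... comes back as 125 instead of truncating to 124. */
    long long p = llround(pow(5, 3));
    printf("%lld\n", p);   /* 125 */
    return 0;
}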

strtod underflow, return value != 0

Here's my test code:
errno = 0;
d = strtod("1.8011670033376514e-308", NULL);
With this code, I get d == 1.8011670033376514e-308 and errno == ERANGE.
From strtod(3):
If the correct value would cause overflow, plus or minus HUGE_VAL (HUGE_VALF, HUGE_VALL) is returned (according to the sign of the value), and ERANGE is stored in errno. If the correct value would cause underflow, zero is returned and ERANGE is stored in errno.
So, it seems to me that either errno should be zero (no error) or d should be zero (underflow).
Is this a bug, or am I missing something? This happens for many different versions of eglibc and gcc.
In §7.22.1.3 The strtod(), strtof() and strtold() functions, the C11 standard (ISO/IEC 9899:2011) says:
The functions return the converted value, if any. If no conversion could be performed,
zero is returned. If the correct value overflows and default rounding is in effect (7.12.1),
plus or minus HUGE_VAL, HUGE_VALF, or HUGE_VALL is returned (according to the
return type and sign of the value), and the value of the macro ERANGE is stored in
errno. If the result underflows (7.12.1), the functions return a value whose magnitude is
no greater than the smallest normalized positive number in the return type; whether
errno acquires the value ERANGE is implementation-defined.
The standard also notes in §5.2.4.2.2 Characteristics of floating types that IEC 60559 (IEEE 754) floating point numbers have the limit:
DBL_MIN 2.2250738585072014E-308 // decimal constant
Since 1.8011670033376514e-308 is smaller than DBL_MIN, you get a sub-normal number, and ERANGE is quite appropriate (but optional).
On Mac OS X 10.9.4 with GCC 4.9.1, the following program:
#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *end;
    errno = 0;
    double d = strtod("1.8011670033376514e-308", &end);
    if (errno != 0)
    {
        int errnum = errno;
        printf("%d: %s\n", errnum, strerror(errnum));
    }
    printf("%24.16e\n", d);
    unsigned char *p = (unsigned char *)&d;
    const char *pad = "";
    for (size_t i = 0; i < sizeof(double); i++)
    {
        printf("%s0x%.2X", pad, *p++);
        pad = " ";
    }
    putchar('\n');
    return 0;
}
produces the output:
34: Result too large
1.8011670033376514e-308
0x01 0x00 0x00 0x00 0xA8 0xF3 0x0C 0x00
The error message is ironically wrong — the value is too small — but you can't have everything.
The code is behaving according to The Open Group's POSIX specification of strtod():
If the correct value would cause an underflow, a value whose magnitude is no greater than the smallest normalized positive number in the return type shall be returned and errno set to [ERANGE].
I'd say what you're seeing is an error in detail in the Linux manpage.
If strtod() returned a non-zero value (that is not +/- HUGE_VAL), the call has succeeded (according to the man page you quoted).
Referring to the man page for errno.h:
The <errno.h> header file defines the integer variable errno, which
is set by system calls and some library functions in the event of an
error to indicate what went wrong. Its value is significant only
when the return value of the call indicated an error (i.e., -1 from
most system calls; -1 or NULL from most library functions); a
function that succeeds is allowed to change errno.
Thus, you can only check errno for an error if the function actually returns a value indicating that an error has occurred.
A more complete explanation of errno (and an explanation of its relationship to strtod()) can be found on another StackExchange.
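Putting the two rules together, a caller that sees ERANGE can tell overflow from underflow by the returned magnitude; a sketch (assuming HUGE_VAL from <math.h>):
#include <errno.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    errno = 0;
    double d = strtod("1.8011670033376514e-308", NULL);
    if (errno == ERANGE) {
        if (fabs(d) == HUGE_VAL)
            printf("overflow\n");
        else
            printf("underflow, possibly subnormal: %g\n", d);   /* what glibc does for this input */
    } else {
        printf("converted without a range error: %g\n", d);
    }
    return 0;
}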

Does strtol("-2147483648", 0, 0) overflow if LONG_MAX is 2147483647?

Per the specification of strtol:
If the subject sequence has the expected form and the value of base is 0, the sequence of characters starting with the first digit shall be interpreted as an integer constant. If the subject sequence has the expected form and the value of base is between 2 and 36, it shall be used as the base for conversion, ascribing to each letter its value as given above. If the subject sequence begins with a minus-sign, the value resulting from the conversion shall be negated. A pointer to the final string shall be stored in the object pointed to by endptr, provided that endptr is not a null pointer.
The issue at hand is that, prior to the negation, the value is not in the range of long. For example, in C89 (where the integer constant can't take on type long long), writing -2147483648 is possibly an overflow; you have to write (-2147483647-1) or similar.
Since the wording using "integer constant" could be interpreted to apply the C rules for the type of an integer constant, this might be enough to save us from undefined behavior here, but the same issue (without such an easy out) would apply to strtoll.
Finally, note that even if it did overflow, the "right" value should be returned. So this question is really just about whether errno may or must be set in this case.
Although I cannot point to a particular bit of wording in the standard today, when I wrote strtol for 4BSD back in the 1990s I was pretty sure that this should not set errno, and made sure that I would not. Whether this was based on wording in the standard, or personal discussion with someone, I no longer recall.
In order to avoid overflow, this means the calculation has to be done pretty carefully. I did it in unsigned long and included this comment (still in the libc source in the various BSDs):
/*
* Compute the cutoff value between legal numbers and illegal
* numbers. That is the largest legal value, divided by the
* base. An input number that is greater than this value, if
* followed by a legal input character, is too big. One that
* is equal to this value may be valid or not; the limit
* between valid and invalid numbers is then based on the last
* digit. For instance, if the range for longs is
* [-2147483648..2147483647] and the input base is 10,
* cutoff will be set to 214748364 and cutlim to either
* 7 (neg==0) or 8 (neg==1), meaning that if we have accumulated
* a value > 214748364, or equal but the next digit is > 7 (or 8),
* the number is too big, and we will return a range error.
*
* Set 'any' if any `digits' consumed; make it negative to indicate
* overflow.
*/
I was (and still am, to some extent) annoyed by the asymmetry between this action in the C library and the syntax of the language itself, where negative numbers are two separate tokens, - followed by the number, so that writing -2147483648 means -(2147483648), which becomes -(2147483648U), which is of course 2147483648U and hence positive! (Assuming 32-bit int of course; the problematic value varies for other bit sizes.)
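A sketch of that cutoff test for base 10 and type long (names here are illustrative, not the actual BSD source, and two's complement is assumed so that |LONG_MIN| is LONG_MAX + 1):
#include <limits.h>
#include <stdbool.h>

/* Would appending 'digit' to the accumulated magnitude 'acc' exceed the range? */
static bool would_overflow(unsigned long acc, int digit, bool neg)
{
    unsigned long max = neg ? (unsigned long)LONG_MAX + 1UL   /* |LONG_MIN| */
                            : (unsigned long)LONG_MAX;
    unsigned long cutoff = max / 10;   /* 214748364 for 32-bit long */
    int cutlim = (int)(max % 10);      /* 7 for positive input, 8 for negative */
    return acc > cutoff || (acc == cutoff && digit > cutlim);
}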
Based on the comp.std.c thread cited in a comment by ouah (9 years ago), the intent is clearly that it does not overflow. The actual language in the standard is still ambiguous:
If the subject sequence has the expected form and the value of base is zero, the sequence of characters starting with the first digit is interpreted as an integer constant according to the rules of 6.4.4.1. If the subject sequence has the expected form and the value of base is between 2 and 36, it is used as the base for conversion, ascribing to each letter its value as given above. If the subject sequence begins with a minus sign, the value resulting from the conversion is negated (in the return type).
In order to get the right behavior, you have to interpret the phrase "interpreted as an integer constant according to the rules of 6.4.4.1" as yielding an actual integer value, not a value within some C-language integer type, and the final "in the return type" as the negation happening with a typeless integer value as the operand, but a coerced type for the result.
Moreover, the error condition does not actually even define an "overflow" condition, but "correct value outside the range". This part of the text seems to be ignoring the unsigned issue addressed in DR006, since it only deals with the final value, not the pre-negation value:
If the correct value is outside the range of representable values, LONG_MIN, LONG_MAX, LLONG_MIN, LLONG_MAX, ULONG_MAX, or ULLONG_MAX is returned (according to the return type and sign of the value, if any), and the value of the macro ERANGE is stored in errno.
In short, this seems to still be a mess, due to the usual outcome where the committee says "yeah, it's supposed to mean what you think it should mean" and then never updates the ambiguous or outright wrong text in the standard...
On a 32-bit platform, -2147483648 is not an overflow under C89. It's LONG_MIN, and errno == 0.
Quoting directly from the POSIX specification of strtol():
RETURN VALUE
Upon successful completion strtol() returns the converted value, if
any. If no conversion could be performed, 0 is returned and errno may
be set to [EINVAL]. If the correct value is outside the range of
representable values, LONG_MAX or LONG_MIN is returned (according to
the sign of the value), and errno is set to [ERANGE].
This seems to be in line with the following test:
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <limits.h>

int main(int argc, char *argv[]) {
    long val = strtol(argv[1], NULL, 10);
    fprintf(stderr, "long max: %ld, long min: %ld\n", LONG_MAX, LONG_MIN);
    fprintf(stderr, "val: %ld, errno: %d\n", val, errno);
    perror(argv[1]);
    return 0;
}
When compiled on a 32-bit x86 system using:
gcc -std=c89 foo.c -o foo
produces the following outputs:
./foo -2147483648
Output:
long max: 2147483647, long min: -2147483648
val: -2147483648, errno: 0
-2147483648: Success
./foo -2147483649
Output:
long max: 2147483647, long min: -2147483648
val: -2147483648, errno: 34
-2147483649: Numerical result out of range
