Can I round-trip an aligned pointer through an IEEE double? - c

Can I round trip any 4-byte-aligned pointer through a double? And can I round trip any finite double through a string?
Specifically, on any platform that uses IEEE floating point, which conforms to C11, and on which neither static assertion fails, are the assertions in the following program guaranteed to pass?
#include <stdint.h>
#include <stdio.h>
#include <assert.h>
#include <string.h>
#include <math.h>
int main(void) {
    struct dummy {
        void *dummy;
    } main_struct;
    main_struct.dummy = 0;
    static_assert(_Alignof(struct dummy) >= 4,
                  "Dummy struct insufficiently aligned");
    static_assert(sizeof(double) == sizeof(uint64_t) && sizeof(double) == 8,
                  "double and uint64_t must have size 8");
    double x;
    uint64_t ptr = (uint64_t)&main_struct;
    assert((ptr & 3) == 0);
    ptr >>= 2;
    memcpy(&x, &ptr, 8);
    assert(!isnan(x));
    assert(isfinite(x));
    assert(x > 0);
    char buf[1000];
    snprintf(buf, sizeof buf, "Double is %#.20g\n", x);
    double q;
    sscanf(buf, "Double is %lg\n", &q);
    assert(q == x);
    assert(memcmp(&q, &ptr, 8) == 0);
}

Specifically, on any platform that uses IEEE floating point, which conforms to C11, and on which neither static assertion fails, are the assertions in the following program guaranteed to pass?
With only those requirements, no. Among the reasons that preclude it are the following:
You haven't asserted that pointers are 64 bits or less in size.
Nothing says that pointers and doubles use the same kind of endianness in memory. If pointers are big-endian and doubles are little-endian (or middle-endian, or use some other weird in-memory format), then your shifting does not preclude negative, infinite or NaN values.
Pointers are not guaranteed to convert to an integral value whose low-order bits are zero just because they point to an aligned object.
These objections may be somewhat pathological on current, practical platforms, but could certainly be true in theory, and nothing in your list of requirements stands against them.
It is, for instance, perfectly possible to imagine an architecture with a separate floating-point coprocessor that uses a different memory format than the main integer CPU. In fact, the Wikipedia article on endianness states that there are real examples of architectures that do this. As for weird pointer formats, the C FAQ provides some interesting historical examples.
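If you want a quick runtime probe of that mixed-endianness objection, here is a minimal sketch, assuming only that double is IEEE-754 binary64: 1.0 is encoded as 0x3FF0000000000000, so copying that bit pattern through memcpy yields 1.0 exactly when uint64_t and double agree on byte order.
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    // 0x3FF0000000000000 is the IEEE-754 binary64 encoding of 1.0.
    uint64_t bits = 0x3FF0000000000000u;
    double d;
    memcpy(&d, &bits, sizeof d);
    if (d == 1.0)
        puts("uint64_t and double share a byte order here");
    else
        puts("mixed endianness: the shifting argument breaks down");
    return 0;
}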

Related

How to convert 64 bit hex value to double in c?

I'm using a GPS module from which I'm getting the string
"0x3f947ae147ae147b"
which I need to convert to a double. The expected value is 0.02.
I used the following website as a reference:
https://gregstoll.com/~gregstoll/floattohex/
How can I convert the value in C?
3F947AE147AE147B₁₆ is the encoding for an IEEE-754 binary64 (a.k.a. “double precision”) datum with value 0.0200000000000000004163336342344337026588618755340576171875. Supposing your C implementation uses that format for double and has 64-bit integers with the same endianness, you can decode it (not convert it) by copying its bytes into a double and printing the result:
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
    char *string = "0x3f947ae147ae147b";

    // Set errno to zero before using strtoull.
    errno = 0;

    char *end;
    unsigned long long t = strtoull(string, &end, 16);

    // Test whether strtoull did not accept all characters.
    if (*end)
    {
        fprintf(stderr,
            "Error, string \"%s\", is not a proper hexadecimal numeral.\n",
            string);
        exit(EXIT_FAILURE);
    }

    // Move the value to a 64-bit unsigned integer.
    uint64_t encoding = t;

    /* Test whether the number is too large, either because strtoull reported
       an error or because it does not fit in a uint64_t.
    */
    if ((t == ULLONG_MAX && errno) || t != encoding)
    {
        fprintf(stderr, "Error, string \"%s\", is bigger than expected.\n",
            string);
        exit(EXIT_FAILURE);
    }

    // Copy the bytes into a double.
    double x;
    memcpy(&x, &encoding, sizeof x);

    printf("%.9999g\n", x);
}
This should output “0.0200000000000000004163336342344337026588618755340576171875”.
If your C implementation does not support this format, you can decode it:
Separate the 64 bits into s, e, f, where s is the leading bit, e is the next 11 bits, and f is the remaining 52 bits.
If e is 2047 and f is zero, report the value is +∞ or −∞, according to whether s is 0 or 1, and stop.
If e is 2047 and f is not zero, report the value is a NaN (Not a Number) and stop.
If e is not zero, add 2^52 to f. If e is zero, change it to one.
The magnitude of the represented value is f · 2^−52 · 2^(e−1023), and its sign is + or − according to whether s is 0 or 1.
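A minimal sketch of that decoding in C, assuming nothing about the native double format; decode_binary64 is a hypothetical helper name, and 1075 is 52 + 1023, folding the two powers of two together.
#include <stdint.h>
#include <stdio.h>

// Decode an IEEE-754 binary64 bit pattern without using the native
// double type, following the steps described above.
static void decode_binary64(uint64_t n) {
    unsigned s = (unsigned)(n >> 63);            // sign bit
    unsigned e = (unsigned)(n >> 52) & 0x7FF;    // 11 exponent bits
    uint64_t f = n & 0xFFFFFFFFFFFFFu;           // 52 significand bits
    if (e == 2047) {
        if (f == 0)
            printf("%cinfinity\n", s ? '-' : '+');
        else
            printf("NaN\n");
        return;
    }
    if (e != 0)
        f += (uint64_t)1 << 52;  // restore the implicit leading bit
    else
        e = 1;                   // subnormal: no implicit bit
    // Value is (-1)^s * f * 2^(e-1075), since 2^-52 * 2^(e-1023) = 2^(e-1075).
    printf("%c%llu * 2^%d\n", s ? '-' : '+',
           (unsigned long long)f, (int)e - 1075);
}

int main(void) {
    decode_binary64(0x3f947ae147ae147bu);  // prints +5764607523034235 * 2^-58
    return 0;
}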
The usual way to convert a string of digits like "0x3f947ae147ae147b" into an actual integer is with one of the "strto" functions. Since you have 64 bits, and you're not interested in treating them as a signed integer (since you're about to, instead, try to treat them as a double), the appropriate choice is strtoull:
#include <stdint.h>
#include <stdlib.h>

char *str = "0x3f947ae147ae147b";
uint64_t x = strtoull(str, NULL, 16);
Now you have your integer, as you can verify by doing
printf("%llx\n", x);
But now the question is, how do you treat those bits as an IEEE-754 double value, instead of an integer? There are at least three ways to do it, in increasing levels of portability.
(1) Use pointers. Take a pointer to your integer value x, cast it to a double pointer, then indirect on it, forcing the compiler to (try to) treat the bits of x as if they were a double:
double *dp = (double *)&x;
double d = *dp;
printf("%f\n", d);
This was once a decent and simple way to do it, but it is no longer legal as it runs afoul of the "strict aliasing rule". It might work for you, or it might not. Theoretically this sort of technique can also run into issues with alignment. For these reasons, this technique is not recommended.
(2) Use a union:
union u { uint64_t x; double d; } un;
un.x = strtoull(str, NULL, 16);
printf("%f\n", un.d);
Opinions differ on whether this technique is 100% strictly legal. I believe it's fine in C, but it may not be in C++. I'm not aware of machines where it won't work.
(3) Use memcpy:
#include <string.h>
uint64_t x = strtoull(str, NULL, 16);
double d;
memcpy(&d, &x, 8);
printf("%f\n", d);
This works by, literally, copying the individual bytes of the uint64_t value x into the bytes of the double variable d. This is 100% portable (as long as x and d are the same size). I used to think it was wasteful, due to the extra function call, but these days it's a generally recommended technique, and I'm told that modern compilers are smart enough to recognize what you're trying to do, and emit perfectly efficient code (that is, just as efficient as techniques (1) or (2)).
Now, one other portability concern is that this all assumes that type double on your machine is in fact implemented using the same IEEE-754 double-precision format as your incoming hex string representation. That's actually a very safe assumption these days, although it's not strictly guaranteed by the C standards. If you like to be particularly careful about type correctness, you might add the lines
#include <assert.h>
assert(sizeof(uint64_t) == sizeof(double));
and change the memcpy call (if that's what you end up using) to
memcpy(&d, &x, sizeof(double));
(But note that these last few changes only guard against unexpected system-specific discrepancies in the size of type double, not its representation.)
One further point. Note that one technique which will most definitely not work is the superficially obvious
d = (double)x;
That line would perform an actual conversion of the value 0x3f947ae147ae147b. It won't just reinterpret the bits. If you try it, you'll get an answer like 4581421828931458048.000000. Where did that come from? Well, 0x3f947ae147ae147b in decimal is 4581421828931458171, and the closest value that type double can represent is 4581421828931458048. (Why can't type double represent the integer 4581421828931458171 exactly? Because it's a 62-bit number, and type double has at most 53 bits of precision.)

Which "C" implementation(s) do not implement modulo arithmetic for signed integers?

In reference to C11 draft, section 3.4.3 and C11 draft, section H.2.2, I'm looking for "C" implementations that implement behaviour other than modulo arithmetic for signed integers.
Specifically, I am looking for instances where this is the default behaviour, possibly due to the underlying machine architecture.
Here's a code sample and terminal session that illustrates modulo arithmetic behaviour for signed integers:
overflow.c:
#include <stdio.h>
#include <limits.h>
int main(int argc, char *argv[])
{
    int a, b;
    printf ( "INT_MAX = %d\n", INT_MAX );
    if ( argc == 2 && sscanf(argv[1], "%d,%d", &a, &b) == 2 ) {
        int c = a + b;
        printf ( "%d + %d = %d\n", a, b, c );
    }
    return 0;
}
Terminal session:
$ ./overflow 2000000000,2000000000
INT_MAX = 2147483647
2000000000 + 2000000000 = -294967296
Even with a "familiar" compiler like gcc, on a "familiar" platform like x86, signed integer overflow can do something other than the "obvious" twos-complement wraparound behavior.
One amusing (or possibly horrifying) example is the following (see on godbolt):
#include <stdio.h>

int main(void) {
    for (int i = 0; i >= 0; i += 1000000000) {
        printf("%d\n", i);
    }
    printf("done\n");
    return 0;
}
Naively, you would expect this to output
0
1000000000
2000000000
done
And with gcc -O0 you would be right. But with gcc -O2 you get
0
1000000000
2000000000
-1294967296
-294967296
705032704
...
continuing indefinitely. The arithmetic is twos-complement wraparound, all right, but something seems to have gone wrong with the comparison in the loop condition.
In fact, if you look at the assembly output, you'll see that gcc has omitted the comparison entirely, and made the loop unconditionally infinite. It is able to deduce that if there were no overflow, the loop could never terminate, and since signed integer overflow is undefined behavior, it is free to have the loop not terminate in that case either. The simplest and "most efficient" legal code is therefore to never terminate at all, since that avoids an "unnecessary" comparison and conditional jump.
You might consider this either cool or perverse, depending on your point of view.
(For extra credit: look at what icc -O2 does and try to explain it.)
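If you want the naive output without invoking undefined behavior, one option is to detect the overflow explicitly and stop. This is a sketch using __builtin_add_overflow, a GCC/Clang extension rather than standard C:
#include <stdio.h>

int main(void) {
    int i = 0;
    for (;;) {
        printf("%d\n", i);
        int next;
        // __builtin_add_overflow returns nonzero if i + 1000000000 does
        // not fit in an int, so no signed overflow ever actually occurs.
        if (__builtin_add_overflow(i, 1000000000, &next))
            break;
        i = next;
    }
    printf("done\n");
    return 0;
}
This prints 0, 1000000000, 2000000000, done, at any optimization level.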
On many platforms, requiring that a compiler perform precise integer-size truncation would cause many constructs to run less efficiently than would be possible if they were allowed to use looser truncation semantics. For example, given int muldiv(int x, int y) { return x*y/60; }, a compiler that was allowed to use loose integer semantics could replace muldiv(x,240); with x<<2, but one which was required to use precise semantics would need to actually perform the multiplication and division. Such optimizations are useful, and generally won't pose problems if casting operators are used in cases where programs need mod-reduced arithmetic, and compilers process a cast to a particular size as implying truncation to that size.
Even when using unsigned values, the presence of a cast in (uint32_t)(uint32a-uint32b) > uint32c makes the programmer's intention clearer, and would be necessary to ensure that code operates the same on systems with 64-bit int as on those with 32-bit int. So if one wants to test for integer wraparound, even on a compiler that would define the behavior, I would regard (int)(x+someUnsignedChar) < x as superior to x+someUnsignedChar < x, because the cast lets a human reader know the code is deliberately treating values as something other than normal mathematical integers.
The big problem is that some compilers are prone to generate code which behaves nonsensically in case of integer overflow. Even a construct like unsigned mul_mod_65536(unsigned short x, unsigned short y) { return (x*y) & 0xFFFFu; }, which the authors of the Standard expected commonplace implementations to process in a way indistinguishable from unsigned math, will sometimes cause gcc to generate nonsensical code in cases where x would exceed INT_MAX/y.
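The usual workaround, sketched below, is to force the arithmetic into unsigned before the multiply, so the promoted operands can never overflow a signed int:
// Multiplying in unsigned makes the wraparound well-defined; the 1u
// forces x and y to be converted to unsigned int before the multiply.
unsigned mul_mod_65536(unsigned short x, unsigned short y)
{
    return (1u * x * y) & 0xFFFFu;
}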

How to guarantee exact size of double in C?

So, I am aware that the types from the stdint.h header provide standardized-width integer types; however, I am wondering what type or method one uses to guarantee the size of a double or another floating-point type across platforms. Specifically, this would deal with packing data in a void*:
#include <stdio.h>
#include <stdlib.h>
void write_double(void* buf, double num)
{
    *(double*)buf = num;
}

double read_double(void* buf)
{
    return *(double*)buf;
}

int main(void) {
    void* buffer = malloc(sizeof(double));
    write_double(buffer, 55);
    printf("The double is %f\n", read_double(buffer));
    return 0;
}
Say, as in the above program, if I wrote that void* to a file, or if it was used on another system, would there be some standard way to guarantee the size of a floating-point type or double?
How to guarantee exact size of double in C?
Use _Static_assert()
#include <limits.h>
int main(void) {
    _Static_assert(sizeof (double)*CHAR_BIT == 64, "Unexpected double size");
    return 0;
}
_Static_assert has been available since C11. Otherwise, code could use a run-time assert.
#include <assert.h>
#include <limits.h>
int main(void) {
    assert(sizeof (double)*CHAR_BIT == 64);
    return 0;
}
Although this will ensure the size of a double is 64 bits, it does not ensure adherence to the IEEE 754 double-precision binary floating-point format.
Code could use __STDC_IEC_559__
"An implementation that defines __STDC_IEC_559__ shall conform to the specifications in this annex." (C11 Annex F, IEC 60559 floating-point arithmetic)
Yet that may be too strict. Many implementations adhere to most of that standard, yet still do not set the macro.
would there be some standard way to guarantee size of a floating point type or double?
The best guarantee is to write the FP value as its hex representation or as an exponential with sufficient decimal digits. See Printf width specifier to maintain precision of floating-point value.
The problem with floating point type is that the C standard doesn't specify how they should be represented. The use of IEEE 754 is not required.
If you're communicating between a system that uses IEEE 754 and one that doesn't, you won't be able to write on one and read on the other even if the sizes are the same.
You need to serialize the data in a known format. You can either use sprintf to convert it to a text format, or you can do some math to determine the exponent and mantissa and store those.
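As a sketch of the text route: the standard %a conversion (C99) prints an exact hexadecimal representation of a double, and strtod reads it back, so the round trip is lossless on any one implementation; %.17g works similarly in decimal for IEEE-754 doubles.
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    double out = 0.1, in;
    char buf[64];
    // %a prints an exact hexadecimal floating-point representation.
    snprintf(buf, sizeof buf, "%a", out);
    // strtod accepts hexadecimal floating constants (C99).
    in = strtod(buf, NULL);
    printf("%s round-trips: %s\n", buf, in == out ? "yes" : "no");
    return 0;
}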
Floating point values are defined in the IEEE Standard for Floating-Point Arithmetic (IEEE 754) and have standard sizes:
float, in full "single precision floating point number": 32 bits
double, in full "double precision floating point number": 64 bits
The following also exist:
Half-precision floating-point format
Quadruple precision floating-point format
Extended precision floating-point format
This format is reused in the C11 standard, Annex F "IEC 60559 floating-point arithmetic" of ISO/IEC 9899:2011(en).
Why use CHAR_BIT and assert at runtime? We can do this at compile time.
void write_double(void* buf, double num)
{
    // A negative array size is a compile-time error, so this line
    // refuses to compile unless double is exactly 8 bytes.
    char checkdoublesize[(sizeof(double) == 8)?1:-1];
    *(double*)buf = num;
}
Your code is still undefined, as it doesn't guarantee IEEE format or endianness, but it will catch a bad double size. If your platform is new enough to have htonq, this will allow endianness to work:
void write_double(void* buf, double num)
{
    char checkdoublesize[(sizeof(double) == 8)?1:-1];
    *(int64_t*)buf = htonq(*(volatile int64_t*)&num);
}

double read_double(void* buf)
{
    int64_t n = ntohq(*(int64_t*)buf);
    return *(volatile double*)&n;
}
Where volatile is merely the shortest way to tell the compiler that the type-punning pointer cast really is intended to be defined. Usually it does the right thing anyway, but after N levels of inlining maybe it won't anymore.
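Since htonq/ntohq are not standard, here is a sketch of a portable alternative (the _be helper names are mine) that serializes the 64 bits big-endian one byte at a time, so the host's integer endianness never matters; it still assumes both ends use IEEE-754 binary64.
#include <stdint.h>
#include <string.h>

// Store the double's bit pattern big-endian, byte by byte.
void write_double_be(unsigned char *buf, double num)
{
    uint64_t bits;
    memcpy(&bits, &num, sizeof bits);  // well-defined, unlike a pointer cast
    for (int i = 0; i < 8; i++)
        buf[i] = (unsigned char)(bits >> (56 - 8 * i));
}

double read_double_be(const unsigned char *buf)
{
    uint64_t bits = 0;
    for (int i = 0; i < 8; i++)
        bits = (bits << 8) | buf[i];
    double num;
    memcpy(&num, &bits, sizeof num);
    return num;
}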

pow numeric error in c

I'm wondering where the numeric error happens, in which layer.
Let me explain using an example:
int p = pow(5, 3);
printf("%d", p);
I've tested this code on various HW and compilers (VS and GCC) and some of them print out 124, and some 125.
On the same HW (OS) I get different results from different compilers (VS and GCC).
On different HW (OS) I get different results from the same compiler (cc (GCC) 4.8.1).
AFAIK, pow computes to 124.99999999 and that gets truncated to int, but where does this error happen?
Or, in other words, where does the correction happen (124.99 -> 125)?
Is it a compiler-HW interaction?
//****** edited:
Here's an additional snippet to play with (keep an eye on p=5, p=18, ...):
#include <stdio.h>
#include <math.h>
int main(void) {
    int p;
    for (p = 1; p < 20; p++) {
        printf("\n%d %d %f %f", (int) pow(p, 3), (int) exp(3 * log(p)), pow(p, 3), exp(3 * log(p)));
    }
    return 0;
}
(First note that for an IEEE-754 double-precision floating-point type, all integers up to 2^53 can be represented exactly. Blaming floating-point precision for integral pow inaccuracies is normally incorrect.)
pow(x, y) is normally implemented in C as exp(y * log(x)). Hence it can "go off" for even quite small integral cases.
For small integral cases, I normally write the computation long-hand, and for other integral arguments I use a 3rd party library. Although a do-it-yourself solution using a for loop is tempting, there are effective optimisations that can be done for integral powers that such a solution might not exploit.
As for the observed different results, it could be down to some of the platforms using an 80-bit floating-point intermediary. Perhaps some of the computations then land above 125 and others below it.
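For the integral cases, here is a sketch of the long-hand computation via exponentiation by squaring; it is exact whenever the result fits in the type, so ipow(5, 3) is 125, never 124 (ipow is a hypothetical helper, not a standard function):
// Exponentiation by squaring: O(log exp) multiplications, exact as
// long as the result fits in long long (no floating point involved).
long long ipow(long long base, unsigned exp)
{
    long long result = 1;
    while (exp) {
        if (exp & 1)
            result *= base;
        base *= base;
        exp >>= 1;
    }
    return result;
}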

How to get the upper-/lower machine-word of a double according to IEEE 754 (ansi-c)?

I want to use the sqrt implementation of fdlibm.
This implementation defines (according to the endianness) some macros for accessing the lower/upper 32 bits of a double in the following way (here: only the little-endian version):
#define __HI(x) *(1+(int*)&x)
#define __LO(x) *(int*)&x
#define __HIp(x) *(1+(int*)x)
#define __LOp(x) *(int*)x
The readme of fdlibm says the following (slightly shortened):
Each double precision floating-point number must be in IEEE 754
double format, and that each number can be retrieved as two 32-bit
integers through the using of pointer bashing as in the example
below:
Example: let y = 2.0
double fp number y: 2.0
IEEE double format: 0x4000000000000000
Referencing y as two integers:
*(int*)&y,*(1+(int*)&y) = {0x40000000,0x0} (on sparc)
{0x0,0x40000000} (on 386)
Note: Four macros are defined in fdlibm.h to handle this kind of
retrieving:
__HI(x) the high part of a double x
(sign,exponent,the first 21 significant bits)
__LO(x) the least 32 significant bits of x
__HIp(x) same as __HI except that the argument is a pointer
to a double
__LOp(x) same as __LO except that the argument is a pointer
to a double
If the behavior of pointer bashing is undefined, one may hack on the
macro in fdlibm.h.
I want to use this implementation and these macros with the cbmc model checker, which is supposed to conform to ANSI C.
I don't know exactly what's wrong, but the following example shows that these macros aren't working (little-endian was chosen, 32-bit machine word was chosen):
temp=24376533834232348.000000l (0100001101010101101001101001010100000100000000101101110010000111)
high=0 (00000000000000000000000000000000)
low=67296391 (00000100000000101101110010000111)
Both seem to be wrong. High seems to be empty for every value of temp.
Any new ideas for accessing both 32-bit words in ANSI C?
UPDATE: Thanks for all your answers and comments. All of your proposals worked for me. For the moment I decided to use R..'s version and marked it as the favorite answer because it seems to be the most robust in my tool regarding endianness.
Why not use a union?
union {
    double value;
    struct {
        int upper;
        int lower;
    } words;
} converter;

converter.value = 1.2345;
printf("%d", converter.words.upper);
(Note that the behaviour of this code is implementation-dependent and relies on the internal representation and specific data sizes; in particular, which struct member holds the high word and which the low depends on the machine's endianness.)
On top of that, if you make that struct contain bitfields, you can access the individual floating-point parts (sign, exponent and mantissa) separately:
union {
    double value;
    struct {
        int upper;
        int lower;
    } words;
    struct {
        long long mantissa : 52; // not 2C!
        int exponent : 11; // not 2C!
        int sign : 1;
    };
} converter;
Casting pointers like you're doing violates the aliasing rules of the C language (pointers of different types may be assumed by the compiler not to point to the same data, except in certain very restricted cases). A better approach might be:
#define REP(x) ((union { double v; uint64_t r; }){ x }).r
#define HI(x) (uint32_t)(REP(x) >> 32)
#define LO(x) (uint32_t)(REP(x))
Note that this also fixes the endian dependency (assuming the floating-point and integer endianness are the same) and the reserved __-prefix on the macro names.
An even better way might be not breaking it into high/low portions at all, and using the uint64_t representation REP(x) directly.
From a standards perspective, this use of unions is a little bit suspect, but better than the pointer casts. Using a cast to unsigned char * and accessing the data byte by byte would be better in some ways, but worse in that you have to worry about endian considerations, and it is probably a lot slower.
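A possible usage sketch of those macros, assuming IEEE-754 binary64 and matching integer/floating endianness as noted, reproducing the fdlibm readme's example for y = 2.0:
#include <stdint.h>
#include <stdio.h>

#define REP(x) ((union { double v; uint64_t r; }){ x }).r
#define HI(x) (uint32_t)(REP(x) >> 32)
#define LO(x) (uint32_t)(REP(x))

int main(void) {
    double y = 2.0;
    // Expected on an IEEE-754 machine: HI=0x40000000 LO=0x00000000.
    printf("HI=0x%08x LO=0x%08x\n", (unsigned)HI(y), (unsigned)LO(y));
    return 0;
}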
I would suggest taking a look at the disassembly to see exactly why the existing "pointer-bashing" method does not work. In its absence, you might use something more traditional like a binary shift (if you're on a 64-bit system).