I am using very basic code to convert a string into a long and into a double. The CAN library I am using requires a double as an input. I am attempting to send the device ID as a double to another device on the CAN network.
If I use an input string that is 6 bytes long, the long and double values are the same. If I add a 7th byte to the string, the values are slightly different.
I do not think I am hitting a max value limit. This code is run with ceedling for an automated test. The same behaviour is seen when sending this data across my CAN communications. In main.c the issue is not observed.
The test is:
void test_can_hal_get_spn_id(void){
struct dbc_id_info ret;
memset(&ret, NULL_TERMINATOR, sizeof(struct dbc_id_info));
char expected_str[8] = "smg123";
char out_str[8];
memset(&out_str, 0, 8);
uint64_t long_val = 0;
double phys = 0.0;
memcpy(&long_val, expected_str, 8);
phys = long_val;
printf("long %ld \n", long_val);
printf("phys %f \n", phys);
uint64_t temp = (uint64_t)phys;
memcpy(&out_str, &temp, 8);
printf("%s\n", expected_str);
printf("%s\n", out_str);
}
With the input = "smg123"
[test_can_hal.c]
- "long 56290670243187 "
- "phys 56290670243187.000000 "
- "smg123"
- "smg123"
With the input "smg1234"
[test_can_hal.c]
- "long 14692989459197299 "
- "phys 14692989459197300.000000 "
- "smg1234"
- "tmg1234"
Is this error just due to how floats are handled and rounded? Is there a way to test for that? Am I doing something fundamentally wrong?
Representing the char array as a double without the intermediate long solved the issue. For clarity I am using DBCPPP. I am using it in C. I should clarify my CAN library comes from NXP, DBCPPP allows my application to read a DBC file and apply the data scales and factors to my raw CAN data. DBCPPP accepts doubles for all data being encoded and returns doubles for all data being decoded.
The CAN library I am using requires a double as an input.
That sounds surprising, but if so, then why are you involving a long as an intermediary between your string and double?
If I use an input string of that is 6 bytes long the long and double values are the same. If I add a 7th byte to the string the values are slightly different.
double is a floating point data type. To be able to represent values with a wide range of magnitudes, some of its bits are used to represent scale, and the rest to represent significant digits. A typical C implementation uses doubles with 53 bits of significand. It cannot exactly represent numbers with more than 53 significant binary digits. That's enough for 6 bytes, but not enough for 7.
I do not think I am hitting a max value limit.
Not a maximum value limit. A precision limit. A 64-bit long has smaller numeric range but more significant digits than an IEEE-754 double.
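To answer the "is there a way to test for that?" part, here is a minimal sketch of a check for whether a 64-bit integer survives conversion to double unchanged. The function name fits_double_exactly is made up for illustration, and the sketch assumes the usual IEEE-754 binary64 double with a 53-bit significand.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: binary double with a 53-bit significand (IEEE-754 binary64). */
static_assert(FLT_RADIX == 2 && DBL_MANT_DIG == 53,
              "sketch assumes IEEE-754 binary64 doubles");

/* Hypothetical helper: true if v converts to double without rounding. */
bool fits_double_exactly(uint64_t v)
{
    if (v == 0)
        return true;
    /* Trailing zero bits are absorbed by the exponent, so drop them. */
    while ((v & 1u) == 0)
        v >>= 1;
    /* What remains must fit in the 53-bit significand. */
    return v < (UINT64_C(1) << 53);
}
```

With the values from the question, 56290670243187 (the 6-byte string) passes this check, while 14692989459197299 (the 7-byte string) does not.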
So again, what role is the long supposed to be playing in your code? If the objective is to get eight bytes of arbitrary data into a double, then why not go directly there? Example:
char expected_str[8] = "smg1234";
char out_str[8] = {0};
double phys = 0.0;
memcpy(&phys, expected_str, 8);
printf("phys %.14e\n", phys);
memcpy(&out_str, &phys, 8);
printf("%s\n", expected_str);
printf("%s\n", out_str);
Do note, however, that there is some risk when (mis)using a double this way. It is possible for the data you put in to constitute a trap representation (a signaling NaN might be such a representation, for example). Handling such a value might cause a trap, or cause the data to be corrupted, or possibly produce other misbehavior. It is also possible to run into numeric issues similar to the one in your original code.
Possibly your library provides some relevant guarantees in that area. I would certainly hope so if doubles are really its sole data type for communication. Otherwise, you could consider using multiple doubles to convey data payloads larger than 53 bits, each of which you could consider loading via your original technique.
If you have a look at the IEEE-754 Wikipedia page, you'll see that the double precision values have a precision of "[a]pproximately 16 decimal digits". And that's roughly where your problem seems to appear.
Specifically, though it's a 64-bit type, it does not have the necessary encoding to provide 2^64 distinct floating point values. There are many bit patterns that map to the same value.
For example, in the 32-bit single-precision format, NaN is encoded as the exponent field of binary 1111 1111 with non-zero fraction (23 bits) regardless of the sign (one bit). That's 2 * (2^23 - 1) (over 16 million) distinct bit patterns representing NaN. The 64-bit double format follows the same scheme with an 11-bit exponent and a 52-bit fraction.
So, yes, your "due to how floats are handled and rounded" comment is correct.
In terms of fixing it, you'll either have to limit your strings to values that can be represented by doubles exactly, or find a way to send the strings across the CAN bus.
For example (if you can't send strings), two 32-bit integers could represent an 8-character string value with zero chance of information loss.
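A rough sketch of that two-integer idea, for the case where the transport only accepts doubles: each 32-bit half is far below the 53-bit limit, so it converts to double and back exactly. The function names are invented for illustration, and the sketch assumes sender and receiver share the same byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: split 8 raw bytes into two doubles, 32 bits each. */
void bytes_to_doubles(const char in[8], double out[2])
{
    uint32_t halves[2];
    memcpy(halves, in, sizeof halves);
    out[0] = (double)halves[0];   /* exact: 32 bits fit in 53 */
    out[1] = (double)halves[1];
}

/* Hypothetical helper: rebuild the 8 bytes on the receiving side. */
void doubles_to_bytes(const double in[2], char out[8])
{
    uint32_t halves[2];
    for (int i = 0; i < 2; i++) {
        /* Out-of-range values would make the cast undefined. */
        assert(in[i] >= 0.0 && in[i] <= 4294967295.0);
        halves[i] = (uint32_t)in[i];
    }
    memcpy(out, halves, sizeof halves);
}
```

Feeding "smg1234" through bytes_to_doubles and then doubles_to_bytes should give back the same 8 bytes, unlike the single-double version in the question.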
I need to transfer a double value (-47.1235648, for example) using sockets. Since I'll have a lot of platforms, I must convert to network byte order to ensure correct endian of all ends....but this convert doesn't accept double, just integer and short, so I'm 'cutting' my double into two integer to transfer, like this:
double lat = -47.848945;
int a;
int b;
a = (int)lat;
b = (int)(lat+1);
Now, I need to restore this on the other end, but using the minimum computation as possible (I saw some examples using POW, but looks like pow use a lot of resources for this, I'm not sure). Is there any way to join this as simples as possible, like bit manipulating?
Your code makes no sense.
The typical approach is to use memcpy():
const double lat = -47.848945;
uint32_t ints[sizeof lat / sizeof (uint32_t)];
memcpy(ints, &lat, sizeof lat);
Now send the elements of ints, which are just 32-bit unsigned integers.
This of course assumes:
That you know how to send uint32_ts in a safe manner, i.e. byte per byte or using endian-conversion functions.
That all hosts share the same binary double format (typically IEEE-754).
That you somehow can manage the byte order requirements when moving to/from a pair of integers from/to a single double value (see @JohnBollinger's answer).
I interpreted your question to mean all of these assumptions were safe, which might be a bit over the top. I can't delete this answer as long as it's accepted.
It's good that you're considering differences in numeric representation, but your idea for how to deal with this problem just doesn't work reliably.
Let us suppose that every machine involved uses 64-bit IEEE-754 format for its double representation. (That's already a potential point of failure, but in practice you probably don't have to worry about failures there.) You seem to postulate that the byte order for machines' doubles will map in a consistent way onto the byte order for their integers, but that is not a safe assumption. Moreover, even where that assumption holds true, you need exactly the right kind of mapping for your scheme to work, and that is not only not safe to assume, but very plausibly will not be what you actually see.
For the sake of argument, suppose machine A, which features big-endian integers, wants to transfer a double value to machine B, which features little-endian integers. Suppose further that on B, the byte order for its double representation is the exact reverse of the order on A (which, again, is not safe to assume). Thus, if on A, the bytes of that double are in the order
S T U V W X Y Z
then we want them to be in order
Z Y X W V U T S
on B. Your approach is to split the original into a pair (STUV, WXYZ), transfer the pair in a value-preserving manner to get (VUTS, ZYXW), and then put the pair back together to get ... uh oh ...
V U T S Z Y X W
Don't imagine fixing that by first swapping the pair. That doesn't serve your purpose because you must avoid such a swap in the event that the two communicating machines have the same byte order, and you have no way to know from just the 8 bytes whether such a swap is needed. Thus even if we make simplifying assumptions that we know to be unsafe, your strategy is insufficient for the task.
Alternatives include:
transfer your doubles as strings.
transfer your doubles as integer (significand, scale) pairs. The frexp() and ldexp() functions can help with encoding and decoding such representations.
transfer an integer-based fixed-point representation of your doubles (the same as the previous option, but with pre-determined scale that is not transferred)
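As an illustration of the (significand, scale) option, here is a minimal sketch built on frexp() and ldexp(). The struct and function names are hypothetical, and the sketch assumes finite inputs and a significand of at most 63 bits. Each integer field would still need to be sent in an agreed byte order.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical wire format: value == significand * 2^exponent. */
struct wire_double {
    int64_t significand;
    int32_t exponent;
};

struct wire_double encode_double(double x)
{
    assert(isfinite(x));              /* NaN and infinity need separate handling */
    int e;
    double m = frexp(x, &e);          /* x == m * 2^e, with 0.5 <= |m| < 1 */
    struct wire_double w;
    /* Scaling by 2^DBL_MANT_DIG turns the fraction into an exact integer. */
    w.significand = (int64_t)ldexp(m, DBL_MANT_DIG);
    w.exponent = (int32_t)(e - DBL_MANT_DIG);
    return w;
}

double decode_double(struct wire_double w)
{
    return ldexp((double)w.significand, w.exponent);
}
```

On two machines with the same double format, decode_double(encode_double(x)) should return x unchanged. On a receiver with a narrower format, the result is rounded rather than garbled.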
I need to transfer a double value (-47.1235648, for example) using sockets.
If the platforms have potentially different encodings for double, then sending the bit pattern of the double is a problem. If the code needs portability, something more than a "just copy the bits" approach is needed. An alternative is below.
If platforms always have the same double format, just copy the n bits. Example: @Rishikesh Raje
In detail, OP's problem is only loosely defined. On many platforms, a double is a binary64, yet this is not required by C. That double can represent about 2^64 different values exactly. Neither -47.1235648 nor -47.848945 is one of those. So it is possible OP does not have a strong precision concern.
"using the minimum computation as possible" implies minimal code, usually to have minimal time. For speed, any solution should be rated on order of complexity and with code profiling.
A portable method is to send via a string. This approach addresses correctness and best possible precision first and performance second. It removes endian issues, as data is sent via a string, and there is no precision/range loss in sending the data. The receiving side, if using the same double format, will re-form the double exactly. A machine with a different double format still gets a good string representation from which to do the best it can.
// some ample sized buffer
#define N (sizeof(double)*CHAR_BIT)
double x = foo();
char buf[N];
#if FLT_RADIX == 10
// Rare base-10 platforms
// If macro DBL_DECIMAL_DIG not available, use (DBL_DIG+3)
sprintf(buf, "%.*e", DBL_DECIMAL_DIG-1, x);
#else
// print mantissa in hexadecimal notation with power-of-2 exponent
sprintf(buf, "%a", x);
#endif
bar_send_string(buf);
To reconstitute the double
char *s = foo_get_string();
double y;
// %lf decodes strings in decimal (f), exponential (e) or hexadecimal/exponential (a) notation
if (sscanf(s, "%lf", &y) != 1) Handle_Error(s);
else use(y);
A much better idea would be to send the double directly as 8 bytes in network byte order.
You can use a union
typedef union
{
double a;
uint8_t bytes[8];
} DoubleUnionType;
DoubleUnionType DoubleUnion;
//Assign the double by
DoubleUnion.a = -47.848945;
Then you can make a network byte order conversion function
void htonfl(uint8_t *out, uint8_t *in)
{
#if LITTLE_ENDIAN // Use macro name as per architecture
out[0] = in[7];
out[1] = in[6];
out[2] = in[5];
out[3] = in[4];
out[4] = in[3];
out[5] = in[2];
out[6] = in[1];
out[7] = in[0];
#else
memcpy (out, in, 8);
#endif
}
And call this function before transmission and after reception.
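A variant of the same idea that avoids the endianness macro, as a sketch only: copy the double's bits into a uint64_t and emit the bytes with shifts, most significant first. The function names are invented here, and the sketch assumes a 64-bit double whose byte order matches the host's integer byte order, which another answer points out is not strictly guaranteed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == sizeof(uint64_t),
              "sketch assumes a 64-bit double");

/* Hypothetical helper: write x as 8 bytes, most significant byte first. */
void pack_double_be(double x, uint8_t out[8])
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)(bits >> (56 - 8 * i));
}

/* Hypothetical helper: inverse of pack_double_be. */
double unpack_double_be(const uint8_t in[8])
{
    uint64_t bits = 0;
    for (int i = 0; i < 8; i++)
        bits = (bits << 8) | in[i];
    double x;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Because the shifts operate on the integer's value rather than its memory layout, the same source compiles unchanged on little-endian and big-endian hosts.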
I have a counter field from one TCP protocol which keeps track of samples sent out. It is defined as float64. I need to translate this protocol to a different one, in which the counter is defined as uint64.
Can I safely store the float64 value in the uint64 without losing any precision on the whole number part of the data? Assuming the fractional part can be ignored.
EDIT: Sorry if I didn't describe it clearly. There is no code. I'm just looking at two different protocol documentations to see if one can be translated to the other.
Please treat float64 as double, the documentation isn't well written and is pretty old.
Many thanks.
I am assuming you are asking about 64 bit floating point values such as IEEE Binary64 as documented in https://en.wikipedia.org/wiki/Double-precision_floating-point_format .
Converting a double represented as a 64 bit IEEE floating point value to a uint64_t will not lose any precision on the integral part of the value, as long as the value itself is non negative and less than 2^64. But if the value is larger than 2^53, the representation as a double does not allow complete precision, so whatever computation led to the value probably was inaccurate anyway.
Note that the reverse is false, IEEE floats have less precision than 64 bit uint64_t, so close but different large values will convert to the same double values.
Note that a counter implemented as a 64 bit double is intrinsically limited by the precision of the floating point type. Incrementing a value larger than 2^53 by one is likely to have no effect at all. Using a floating point type to implement a packet counter seems a very bad idea. Using a uint64_t counter directly seems a safer bet. You only have to worry about wrap around at 2^64, a condition that you can check for in the unlikely case where you would actually expect to count that far.
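The stalled-counter effect is easy to check. The helper below is a hypothetical illustration, assuming IEEE-754 binary64 doubles and the default round-to-nearest mode.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>

static_assert(FLT_RADIX == 2 && DBL_MANT_DIG == 53,
              "sketch assumes IEEE-754 binary64 doubles");

/* Hypothetical helper: true if adding 1 leaves the counter unchanged. */
bool increment_is_lost(double counter)
{
    /* volatile forces the sum to be rounded to double before comparing. */
    volatile double next = counter + 1.0;
    return next == counter;
}
```

For 9007199254740992.0 (2^53) this returns true, because 2^53 + 1 is not representable and rounds back to 2^53. For 9007199254740991.0 it returns false.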
If you cannot change the format, verify that the floating point value is within range for the conversion and store an appropriate value if it is not:
#include <stdint.h>
...
double v = get_64bit_value();
uint64_t result;
if (v < 0) {
result = 0;
} else
if (v >= (double)UINT64_MAX) {
result = UINT64_MAX;
} else {
result = (uint64_t)v;
}
Yes, precision is lost. Negative numbers cannot be converted properly to uint64 (as this type is unsigned), as well as numbers greater than 2^64-1. In all other cases, the conversion is exact (providing you look at the float64 value as exact and rounding is done correctly).
I have been searching the internet for some time for an answer to this question, but I could not find one. All I found is that I can convert hexadecimal to decimal, either with some serious programming or manually with math.
I am looking to convert. If there is any way to do that then please share. Well I have searched and found IEEE754 which seems not to be working or I am not comprehending it. Can I do it manually through any equation, I think I heard about it? Or a neat C program which may do it.
Please help! Any help would be highly appreciated.
You need to study the IEEE floating point spec.
This would be quite straightforward in Java. You have handy methods like Float.floatToRawIntBits(float x) and Float.intBitsToFloat(int x)
You might be able to do it with a union.
In C it's a bit more hacky. You can abuse a union. Unions in C reuse the same memory for two different variables. A union like
union DoubleLong {
long l;
double d;
} u;
would allow you to treat the same bit of memory as either a long u.l or a double u.d. They are both 8 bytes (on systems where long is 64-bit), so they take the same space. So doing u.d = M_PI; printf("%lx\n", u.l); prints the binary representation of pi 0x400921fb54442d18.
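If long is not 8 bytes on your platform (it is 4 bytes on 64-bit Windows, for instance), a fixed-width integer and memcpy do the same job. This is only a sketch with made-up function names, and it assumes a 64-bit IEEE-754 double.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == sizeof(uint64_t),
              "sketch assumes a 64-bit double");

/* Hypothetical helper: the raw bit pattern of a double. */
uint64_t double_to_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Hypothetical helper: the double that a given bit pattern encodes. */
double bits_to_double(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

Here double_to_bits(3.141592653589793) gives 0x400921fb54442d18, and bits_to_double goes the other way, from a hexadecimal pattern to the floating point value.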
For 16 bytes we need the union to have an array of two 8-byte longs.
#include <stdio.h>
union Data {
long i[2];
long double f;
} u;
int main(int argc, char const *argv[]) {
// Using random IP6 address 2602:306:cecd:7130:5421:a679:6d71:a660
// Store in two separate 8-byte longs
u.i[0] = 0x26020306cecd7130;
u.i[1] = 0x5421a6796d71a660;
// Print out in hexadecimal
printf("%.15La %lx %lx\n", u.f,u.i[0],u.i[1]);
// print out in decimal
printf("%.15Le %ld %ld\n", u.f,u.i[0],u.i[1]);
return 0;
}
One problem is 16 byte hexadecimal floating point numbers might not be defined on your system. float is typically 32 bit - 4 byte, double is 64 bit - 8 byte. There is a long double type but on my mac it's only 80-bit - 10 byte. It might be simpler to convert to two double precision numbers. So on my system only the last 4 hexadecimal digits of the second number are significant.
Not all hexadecimal numbers correspond to valid floating point numbers, a lot of values will correspond to NaNs. If the higher bits are 7FFF or FFFF (or 7FF, FFF for double) that will either give infinity or NaN.
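For the 64-bit double case, that check on the high bits can be sketched as below. The enum and function names are invented for illustration, and the field layout assumed is IEEE-754 binary64 (1 sign bit, 11 exponent bits, 52 fraction bits).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(double) == sizeof(uint64_t),
              "sketch assumes a 64-bit double");

enum bits_class { BITS_FINITE, BITS_INFINITY, BITS_NAN };

/* Hypothetical helper: classify a raw 64-bit pattern as a double. */
enum bits_class classify_double_bits(uint64_t bits)
{
    uint64_t exponent = (bits >> 52) & 0x7FF;              /* 11 bits */
    uint64_t fraction = bits & UINT64_C(0xFFFFFFFFFFFFF);  /* 52 bits */
    if (exponent != 0x7FF)
        return BITS_FINITE;
    return fraction == 0 ? BITS_INFINITY : BITS_NAN;
}

/* Hypothetical helper: same check starting from a double value. */
enum bits_class classify_double(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return classify_double_bits(bits);
}
```

Running arbitrary data through classify_double_bits before reinterpreting it as a double shows whether the pattern lands on infinity or a NaN.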
I've recently enjoyed reading Beej's Guide to Network Programming. In section 7.4 he talks about problems related to sending floats. He offers a simple (and naive) solution where he "packs" floats by converting them to uint32_t's:
uint32_t htonf(float f)
{
uint32_t p;
uint32_t sign;
if (f < 0) { sign = 1; f = -f; }
else { sign = 0; }
p = ((((uint32_t)f)&0x7fff)<<16) | (sign<<31); // whole part and sign
p |= (uint32_t)(((f - (int)f) * 65536.0f))&0xffff; // fraction
return p;
}
float ntohf(uint32_t p)
{
float f = ((p>>16)&0x7fff); // whole part
f += (p&0xffff) / 65536.0f; // fraction
if (((p>>31)&0x1) == 0x1) { f = -f; } // sign bit set
return f;
}
Am I supposed to run the packed floats (that is, the results of htonf) through the standard htons before sending? If no, why not?
Beej doesn't mention this as far as I can tell. The reason I'm asking is that I cannot understand how the receiving machine can reliably reconstruct the uint32_ts that are to be passed to ntohf (the "unpacker") if the data isn't converted to network byte order before being sent.
Yes, you would also have to marshall the data in a defined order; the easiest way would be to use htonl.
But, aside from educational purposes, I'd really suggest staying away from this code. It has a very limited range, and silently corrupts most numbers. Also, it's really unnecessarily complicated for what it does. You might just as well multiply the float by 65536 and cast it to an int to send; cast to a float and divide by 65536.0 to receive. (As noted in a comment, it is even questionable whether the guide's code is educational: I'd say it is educational in the sense that critiquing it and/or comparing it with good code will teach you something: if nothing else, that not everything that glitters on the web is gold.)
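The multiply-and-cast alternative could look roughly like the sketch below. The function names are hypothetical, htonl/ntohl come from the POSIX header arpa/inet.h, and the range is still limited to values strictly between -32768 and 32768.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: float to 16.16 fixed point in network byte order. */
uint32_t pack_fixed_16_16(float f)
{
    /* Outside this range the scaled value does not fit in 32 bits. */
    assert(f > -32768.0f && f < 32768.0f);
    int32_t fixed = (int32_t)(f * 65536.0f);
    return htonl((uint32_t)fixed);
}

/* Hypothetical helper: inverse of pack_fixed_16_16. */
float unpack_fixed_16_16(uint32_t wire)
{
    /* Unsigned-to-signed conversion is implementation-defined in C;
       mainstream two's-complement compilers wrap as intended. */
    int32_t fixed = (int32_t)ntohl(wire);
    return (float)fixed / 65536.0f;
}
```

Values such as 1.5f or -47.25f, whose fraction fits in 16 bits, survive the round trip exactly. Anything finer is truncated.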
Almost all CPUs actually out there these days use IEEE-754 format floats, but I wouldn't use Beej's second solution either because it's unnecessarily slow; the standard library functions frexp and ldexp will reliably convert between a double and the corresponding mantissa and integer binary exponent. Or you can use ilogb* and scalb*, if you prefer that interface. You can find the appropriate bit length for the mantissa on the host machine through the macros FLT_MANT_DIG, DBL_MANT_DIG and LDBL_MANT_DIG (in float.h). [See note 1]
Coding floating point data transfer properly is a good way to start to understand floating point representations, which is definitely worthwhile. But if you just want to transmit floating point numbers over the wire and you don't have some idiosyncratic processor to support, I'd suggest just sending the raw bits of the float or double as a 4-byte or 8-byte integer (in whatever byte order you've selected as standard), and restricting yourself to IEEE-754 32- and 64-bit representations.
Notes:
Implementation hint: frexp returns a mantissa between 0.5 and 1.0, but what you really want is an integer, so you should scale the mantissa by the correct power of 2 and subtract that from the binary exponent returned by frexp. The result is not really precision-dependent as long as you can transmit arbitrary precision integers, so you don't need to distinguish between float, double, or some other binary representation.
Run them through htonl (and vice versa), not htons.
These two functions, htonf and ntohf, are OK as far as they go (i.e., not very far at all), but their names are misleading. They produce a fixed-point 32-bit representation with 31 bits of that split up as: 15 bits of integer, 16 bits of fraction. The remaining bit holds the sign. This value is in the host's internal representation. (You could do the htonl etc right in the functions themselves, to fix this.)
Note that any float whose absolute value reaches or exceeds 32768, or is less than 2^-16 (.0000152587890625), will be wrecked in the process of "network-izing", since those do not fit in a 15.16 format.
(Edit to add: It's better to use a packaged network-izer. Even something as old as the Sun RPC XDR routines will encode floating-point properly.)
I have a buffer with many positive 16bit values (which are stored as doubles) that I would like to quantize to 8bit (0-255 values).
According to Wikipedia the process would be:
Normalize 16 bit values. I.e. find the largest and divide with this.
Use the Q(x) formula with M=8.
So I wonder whether C has a function that can do this quantization, or does anyone know of a C implementation that I could use?
Lots of love,
Louise
Assuming the value d is in the interval [0.0, max]:
unsigned char quantize(double d, double max)
{
return (unsigned char)((d / max) * 255.0);
}
I'm not sure what you mean by "16-bit values;" double precision values are 64-bit on any system using IEEE-754. However, if you have values of another numeric type, the process is effectively the same.
This sounds like waveform audio processing, where your input is 16-bit PCM data, your output is 8 bit PCM data and you are using doubles as an intermediate value.
However, 8 bit PCM wave data is NOT just quantized, the representation is unsigned values in excess 128 notation. (like the way exponents are stored in floating point numbers)
Finding the largest value first would not only be quantizing, but also scaling. So in pseudo code
double dMax = max_of_all_values(); // largest value in the buffer
...
foreach (dValue in array_of_doubles)
{
signed char bValue = (signed char)((dValue / dMax)*127.0);
}
You could round rather than truncating if you want more accuracy, but in audio processing, it's generally better to randomize the truncation order or even shape it by essentially running a filtering algorithm as part of the truncation from doubles to signed chars.
Note: that signed char is NOT correct if the output is 8 bit PCM data, but since the question doesn't specifically request that, I left it out.
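For completeness, if the target really is 8 bit PCM, the excess 128 form could be produced along these lines. This is a sketch with a made-up function name. It assumes the input is a signed sample and that max_abs is the largest absolute value in the buffer.

```c
#include <assert.h>

/* Hypothetical helper: signed sample to unsigned 8-bit, 128 = silence. */
unsigned char to_pcm8(double sample, double max_abs)
{
    assert(max_abs > 0.0);
    double scaled = (sample / max_abs) * 127.0 + 128.0;
    /* Clamp in case the sample exceeds the stated maximum. */
    if (scaled < 0.0)
        scaled = 0.0;
    if (scaled > 255.0)
        scaled = 255.0;
    return (unsigned char)(scaled + 0.5);   /* round to nearest */
}
```

A zero sample maps to 128, full positive scale to 255 and full negative scale to 1.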
Edit: if this is to be used as pixel data, then you want unsigned values. I see James already gave the correct answer for unsigned values when the input is unsigned (dB values from normalized data should be all negative, actually)
It is not clear from your question what the encoding is since "positive 16bit values (which are stored as doubles)" makes no real sense; they are either 16 bit or they are double, they cannot be both.
However assuming that this is 16 bit unsigned data normalised to 1.0 (so the values range from 0.0 <= s <= 1.0), then all you need to do to expand them to 8bit integer values is to multiply each sample by 255.
unsigned char s8 = s * 255 ;
If the range is not 0.0 <= s <= 1.0, but 0.0 <= s <= max then:
unsigned char s8 = s / max * 255 ;
Either way, there is no "quantisation" function other than one you might write yourself; but the necessary transform will no doubt be a simple arithmetic expression (although not so simple if the data is companded perhaps - i.e. μ-law or A-law encoded for example).
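Putting the two steps from the question together (find the largest value, then scale to 0-255), a whole-buffer version might look like the sketch below. The function name is hypothetical, the inputs are assumed to be non-negative, and it rounds to nearest instead of truncating.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: scale n non-negative doubles to the range 0..255. */
void quantize_buffer(const double *in, unsigned char *out, size_t n)
{
    assert(n == 0 || (in != NULL && out != NULL));

    /* Step 1: find the largest value to normalise against. */
    double max = 0.0;
    for (size_t i = 0; i < n; i++)
        if (in[i] > max)
            max = in[i];

    /* Step 2: map [0, max] onto [0, 255]; an all-zero buffer stays zero. */
    for (size_t i = 0; i < n; i++)
        out[i] = (max > 0.0)
                     ? (unsigned char)(in[i] / max * 255.0 + 0.5)
                     : 0;
}
```

For the inputs 0.0, 32767.5 and 65535.0 this produces 0, 128 and 255. If the scale should be the full 16-bit range instead of the buffer's own maximum, replace max with 65535.0.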