Convert 32 bits to a float value - C

I am working on a DSP processor to implement a BFSK frequency hopping mechanism using C on a Linux system. On the receiver side of the program, I get an input set of samples which I demodulate using the Goertzel algorithm to determine whether the received bit was a 0 or a 1.
Right now, I am able to detect the bits individually. But I have to return the data for processing in the form of a float array, so I need to pack every set of 32 received bits into a float value. Currently I am doing something like:
uint32_t i, j = 0, curBit, curBlk = 0, bitsCounter = 0;
uint32_t *outData; // initialized to the address of some pre-defined location in DSP memory
float *output;

for (i = 0; i < num_of_bits; i++) // loop over the number of data bits
{
    // Demodulate the data and set curBit = 0x0000 or 0x8000
    curBlk = curBlk >> 1;
    curBlk = curBlk | curBit;
    bitsCounter += 1;
    if (i != 0 && (bitsCounter % 32) == 0) // 32 bits processed, save the block in the array
    {
        *(outData + j) = curBlk;
        j += 1;
        curBlk = 0;
    }
}
output = (float *)outData;
Now, the values of the output array are just the values of the outData array with zeros after the decimal point.
For example, if outData[i] = 12345 then output[i] = 12345.0000.
But while testing the program I am generating the sample test bits from an array of floats:
float data[] = {123.12456, 45.789, 297.0956};
So after demodulation I expect the output float array to have the same values as the data array.
Is there some other method to convert 32 bits of data to a float? Should I store the received bits in a char array and then convert it to float?

Not sure if I get your point - you sequentially obtain bits, and once you have 32 of them you want to make a float from them?
what about:
union conv32
{
    uint32_t u32; // here_write_bits
    float f32;    // here_read_float
};
Which can be used like this in-line:
float f = ((union conv32){.u32 = my_uint}).f32;
Or with a helper variable:
union conv32 x = {.u32 = my_uint};
float f = x.f32;

You need to copy the bit pattern verbatim without invoking any implicit conversions. The simplest way to do that is to take the address of the data and reinterpret it as a pointer to the intermediate "carrier" type before dereferencing it.
Consider this:
float source_float = 1234.5678f;
uint32_t transport_bits = *((uint32_t *)&source_float);
float destination_float = *((float *)&transport_bits);
The intermediate result in transport_bits is target dependent. In my test on x86 it is 0x449a522b, this being the bit representation of a single precision float on that platform (it depends on the floating point representation and the endianness).
Either way the result in destination_float is identical to source_float having been transported via the uint32_t transport_bits.
However, this does require that the floating point representation of the originator is identical to that of the receiver, which may not be a given. It is all a bit non-portable; the fact that your code does not work suggests that the representation does indeed differ between sender and receiver. Not only must the FP representation be identical, but so must the endianness. If they differ, your code may have to reorder the bytes and/or calculate the floating point equivalent by extracting the exponent and mantissa. Also, you need to be sure that the transmission order of the bits is the same order in which you are reassembling them. There are a number of opportunities to get this wrong.
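Note that dereferencing a type-punned pointer like this formally violates C's strict aliasing rules; memcpy performs the same bit-for-bit transport with fully defined behavior. A minimal sketch of the same round trip, assuming float and uint32_t are both 32 bits:
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    float source_float = 1234.5678f;
    uint32_t transport_bits;
    float destination_float;

    // memcpy copies the raw bit pattern without any implicit conversion
    memcpy(&transport_bits, &source_float, sizeof transport_bits);
    memcpy(&destination_float, &transport_bits, sizeof destination_float);

    printf("0x%08" PRIx32 " -> %f\n", transport_bits, destination_float);
    return 0;
}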

Related

Convert Long To Double, Unexpected Results

I am using very basic code to convert a string into a long and then into a double. The CAN library I am using requires a double as an input. I am attempting to send the device ID as a double to another device on the CAN network.
If I use an input string that is 6 bytes long, the long and double values are the same. If I add a 7th byte to the string, the values are slightly different.
I do not think I am hitting a maximum value limit. This code is run with Ceedling for an automated test. The same behaviour is seen when sending this data across my CAN communications. In main.c the issue is not observed.
The test is:
void test_can_hal_get_spn_id(void) {
    struct dbc_id_info ret;
    memset(&ret, NULL_TERMINATOR, sizeof(struct dbc_id_info));

    char expected_str[8] = "smg123";
    char out_str[8];
    memset(&out_str, 0, 8);

    uint64_t long_val = 0;
    double phys = 0.0;

    memcpy(&long_val, expected_str, 8);
    phys = long_val;

    printf("long %" PRIu64 " \n", long_val); // PRIu64 (from <inttypes.h>) is the correct specifier for uint64_t
    printf("phys %f \n", phys);

    uint64_t temp = (uint64_t)phys;
    memcpy(&out_str, &temp, 8);

    printf("%s\n", expected_str);
    printf("%s\n", out_str);
}
With the input = "smg123"
[test_can_hal.c]
- "long 56290670243187 "
- "phys 56290670243187.000000 "
- "smg123"
- "smg123"
With the input "smg1234"
[test_can_hal.c]
- "long 14692989459197299 "
- "phys 14692989459197300.000000 "
- "smg1234"
- "tmg1234"
Is this error just due to how floats are handled and rounded? Is there a way to test for that? Am I doing something fundamentally wrong?
Representing the char array as a double without the intermediate long solved the issue. For clarity, I am using DBCPPP, in C. I should clarify that my CAN library comes from NXP; DBCPPP allows my application to read a DBC file and apply the scales and factors to my raw CAN data. DBCPPP accepts doubles for all data being encoded and returns doubles for all data being decoded.
The CAN library I am using requires a double as an input.
That sounds surprising, but if so, then why are you involving a long as an intermediary between your string and double?
If I use an input string that is 6 bytes long, the long and double values are the same. If I add a 7th byte to the string, the values are slightly different.
double is a floating point data type. To be able to represent values with a wide range of magnitudes, some of its bits are used to represent scale, and the rest to represent significant digits. A typical C implementation uses doubles with 53 bits of significand. It cannot exactly represent numbers with more than 53 significant binary digits. That's enough for 6 bytes (48 bits), but not enough for 7 (56 bits).
I do not think I am hitting a max value limit.
Not a maximum value limit - a precision limit. A 64-bit long has a smaller numeric range but more significant digits than an IEEE-754 double.
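A quick sketch that makes the precision limit visible: 2^53 + 1 needs 54 significant bits, so converting it to double rounds it away.
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t big = (1ULL << 53) + 1; // 54 significant bits: 9007199254740993
    double d = (double)big;          // rounds to the nearest representable double
    printf("%llu -> %.1f\n", (unsigned long long)big, d); // ...993 -> ...992.0
    return 0;
}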
So again, what role is the long supposed to be playing in your code? If the objective is to get eight bytes of arbitrary data into a double, then why not go directly there? Example:
char expected_str[8] = "smg1234";
char out_str[8] = {0};
double phys = 0.0;
memcpy(&phys, expected_str, 8);
printf("phys %.14e\n", phys);
memcpy(&out_str, &phys, 8);
printf("%s\n", expected_str);
printf("%s\n", out_str);
Do note, however, that there is some risk when (mis)using a double this way. It is possible for the data you put in to constitute a trap representation (a signaling NaN might be such a representation, for example). Handling such a value might cause a trap, or cause the data to be corrupted, or possibly produce other misbehavior. It is also possible to run into numeric issues similar to the one in your original code.
Possibly your library provides some relevant guarantees in that area. I would certainly hope so if doubles are really its sole data type for communication. Otherwise, you could consider using multiple doubles to convey data payloads larger than 53 bits, each of which you could consider loading via your original technique.
If you have a look at the IEEE-754 Wikipedia page, you'll see that the double precision values have a precision of "[a]pproximately 16 decimal digits". And that's roughly where your problem seems to appear.
Specifically, though it's a 64-bit type, it does not have the necessary encoding to provide 2^64 distinct floating point values. There are many bit patterns that map to the same value.
For example, NaN is encoded as an exponent field of binary 1111 1111 with a non-zero fraction (23 bits), regardless of the sign (one bit). That's 2 * (2^23 - 1) (over 16 million) distinct values representing NaN.
So, yes, your "due to how floats are handled and rounded" comment is correct.
In terms of fixing it, you'll either have to limit your strings to values that can be represented by doubles exactly, or find a way to send the strings across the CAN bus.
For example (if you can't send strings), two 32-bit integers could represent an 8-character string value with zero chance of information loss, as sketched below.
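A minimal sketch of that idea, with hypothetical helper names pack8/unpack8: each 32-bit word is built from the bytes with shifts, so the result does not depend on host endianness.
#include <stdint.h>

// Pack an 8-byte string into two 32-bit integers (big-endian within each word).
void pack8(const unsigned char str[8], uint32_t out[2])
{
    for (int w = 0; w < 2; w++) {
        out[w] = ((uint32_t)str[4*w]     << 24) |
                 ((uint32_t)str[4*w + 1] << 16) |
                 ((uint32_t)str[4*w + 2] <<  8) |
                  (uint32_t)str[4*w + 3];
    }
}

// Reverse the packing on the receiving side.
void unpack8(const uint32_t in[2], unsigned char str[8])
{
    for (int w = 0; w < 2; w++) {
        str[4*w]     = (in[w] >> 24) & 0xff;
        str[4*w + 1] = (in[w] >> 16) & 0xff;
        str[4*w + 2] = (in[w] >>  8) & 0xff;
        str[4*w + 3] =  in[w]        & 0xff;
    }
}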

Getting the exponent of a floating point number in C

Sorry if this has already been asked; I've seen other ways of extracting the exponent of a floating point number. However, this is what is given to me:
unsigned f2i(float f)
{
    union {
        unsigned i;
        float f;
    } x;
    x.i = 0;
    x.f = f;
    return x.i;
}
I'm having trouble understanding this union datatype, because shouldn't the return x.i at the end always make f2i return a 0?
Also, what application could this data type even be useful for? For example, say I have a function:
int getexponent(float f){
}
This function is supposed to get the exponent value of the floating point number with bias of 127. I've found many ways to make this possible, however how could I manipulate the f2i function to serve this purpose?
I appreciate any pointers!
Update!!
Wow, years later and this just seems trivial.
For those who may be interested, here is the function!
int getexponent(float f) {
    unsigned f2u(float f); // the same bit-copying helper as f2i above, renamed
    unsigned int ui = (f2u(f) >> 23) & 0xff; // shift down 23 bits and mask with 0xff to isolate the biased exponent
    int bias = 127; // the exponent bias
    if (ui == 0) return 1 - bias;        // special case: zero (and denormals)
    else if (ui == 255) return 11111111; // special case: infinity/NaN
    return ui - bias;
}
I'm having trouble understanding this union datatype
The union data type is a way for a programmer to indicate that some variable can be one of a number of different types. The wording of the C11 standard is something like "a union contains at most one of its members". It is used for things like parameters that may be logically one thing or another. For example, an IP address might be an IPv4 address or an IPv6 address so you might define an address type as follows:
struct IpAddress
{
    bool isIPv6;
    union
    {
        uint8_t v4[4];
        uint8_t v6[16];
    } bytes;
};
And you would use it like this:
struct IpAddress address = /* something */;
if (address.isIPv6)
{
    doSomeV6ThingWith(address.bytes.v6);
}
else
{
    doSomeV4ThingWith(address.bytes.v4);
}
Historically, unions have also been used to get the bits of one type into an object of another type. This is because, in a union, the members all start at the same memory address. If I just do this:
float f = 3.0;
int i = f;
The compiler will insert code to convert a float to an integer, so the exponent will be lost. However, in
union
{
    unsigned int i;
    float f;
} x;

x.f = 3.0;
int i = x.i;
i now contains the exact bits that represent 3.0 in a float. Or at least you hope it does. There's nothing in the C standard that says float and unsigned int have to be the same size. There's also nothing in the C standard that mandates a particular representation for float (well, Annex F says floats conform to IEC 60559, but I don't know if that counts as part of the standard). So the above code is, at best, non-portable.
To get the exponent of a float portably, use the frexpf() function declared in math.h.
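For illustration, a minimal sketch with frexpf. Note that frexpf normalizes the significand into [0.5, 1), so the exponent it reports is one greater than the IEEE-style exponent, which assumes a significand in [1, 2):
#include <math.h>
#include <stdio.h>

int main(void)
{
    int exp;
    float mant = frexpf(3.0f, &exp); // 3.0 == 0.75 * 2^2
    printf("mantissa %f, exponent %d\n", mant, exp); // mantissa 0.750000, exponent 2
    return 0;
}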
how could I manipulate the f2i function to serve this purpose?
Let's make the assumption that a float is stored in IEC 60559 format in 32 bits, which Wikipedia thinks is the same as IEEE 754. Let's also assume that integers are stored in little endian format.
union
{
    uint32_t i;
    float f;
} x;

x.f = someFloat;
uint32_t bits = x.i;
bits now contains the bit pattern of the floating point number. A single precision floating point number looks like this:
SEEEEEEEEMMMMMMMMMMMMMMMMMMMMMMM
^        ^                     ^
bit 31   bit 22                bit 0
Where S is the sign bit, E is an exponent bit, M is a mantissa bit.
So having got your uint32_t, you just need to do some shifting and masking:
uint32_t exponentWithBias = (bits >> 23) & 0xff;
Because it's a union, x.i and x.f have the same address, which allows you to reinterpret one data type as another. In this scenario, the union is first zeroed out by x.i = 0; and then filled with f. Then x.i is returned, which is the integer representation of the float f. If you then shift that value appropriately, you get the exponent of the original f, because of the way a float is laid out in memory.
I'm having trouble understanding this union datatype, because shouldn't the return x.i at the end always make f2i return a 0?
The line x.i = 0; is a bit paranoid and shouldn't be necessary. Given that unsigned int and float are both 32 bits, the union creates a single chunk of 32 bits in memory, which you can access either as a float or as the pure binary representation of that float, which is what the unsigned is for. (It would have been better to use uint32_t.)
This means that the lines x.i = 0; and x.f = f; write to the very same memory area twice.
What you end up with after the function is the pure binary notation of the float. Parsing out the exponent or any other part from there is very much implementation-defined, since it depends on floating point format and endianess. How to represent FLOAT number in memory in C might be helpful.
That union trick is strongly discouraged, as it is strongly architecture dependent and compiler implementation dependent; both things make it almost impossible to determine a correct, portable way to obtain the information you request.
There are portable ways of doing that, and they all involve calculating a logarithm to base ten. Taking the integer part of log10(x) gives you the number you want:
int power10 = (int)log10(x);
log10() is declared in math.h; in terms of the natural logarithm it is simply log(x)/log(10.0) (do not redefine the standard function yourself). This gives you the power of ten; if you divide the original number by ten raised to that power, you get the mantissa.
Be careful, as floating point numbers are normally stored internally in a power of two's basis, which means the exponent that is actually stored is not a power of ten but a power of two.
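A small worked sketch of that decomposition:
#include <math.h>
#include <stdio.h>

int main(void)
{
    double x = 1234.5;
    int power10 = (int)log10(x);            // log10(1234.5) is about 3.09, so 3
    double mantissa = x / pow(10, power10); // 1234.5 / 1000 == 1.2345
    printf("%f = %f * 10^%d\n", x, mantissa, power10);
    return 0;
}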

Join two integers into one double

I need to transfer a double value (-47.1235648, for example) using sockets. Since I'll have a lot of platforms, I must convert to network byte order to ensure correct endianness on all ends... but the conversion functions don't accept double, just integer and short, so I'm 'cutting' my double into two integers to transfer, like this:
double lat = -47.848945;
int a;
int b;
a = (int)lat;
b = (int)(lat+1);
Now, I need to restore this on the other end, using as little computation as possible (I saw some examples using pow, but it looks like pow uses a lot of resources for this; I'm not sure). Is there any way to join these back together as simply as possible, like bit manipulation?
Your code makes no sense.
The typical approach is to use memcpy():
const double lat = -47.848945;
uint32_t ints[sizeof lat / sizeof (uint32_t)];
memcpy(ints, &lat, sizeof lat);
Now send the elements of ints, which are just 32-bit unsigned integers.
This of course assumes:
That you know how to send uint32_ts in a safe manner, i.e. byte per byte or using endian-conversion functions.
That all hosts share the same binary double format (typically IEEE-754).
That you somehow can manage the byte order requirements when moving to/from a pair of integers from/to a single double value (see #JohnBollinger's answer).
I interpreted your question to mean that all of these assumptions were safe, which might be a bit over the top. I can't delete this answer as long as it's accepted.
It's good that you're considering differences in numeric representation, but your idea for how to deal with this problem just doesn't work reliably.
Let us suppose that every machine involved uses 64-bit IEEE-754 format for its double representation. (That's already a potential point of failure, but in practice you probably don't have to worry about failures there.) You seem to postulate that the byte order for machines' doubles will map in a consistent way onto the byte order for their integers, but that is not a safe assumption. Moreover, even where that assumption holds true, you need exactly the right kind of mapping for your scheme to work, and that is not only not safe to assume, but very plausibly will not be what you actually see.
For the sake of argument, suppose machine A, which features big-endian integers, wants to transfer a double value to machine B, which features little-endian integers. Suppose further that on B, the byte order for its double representation is the exact reverse of the order on A (which, again, is not safe to assume). Thus, if on A, the bytes of that double are in the order
S T U V W X Y Z
then we want them to be in order
Z Y X W V U T S
on B. Your approach is to split the original into a pair (STUV, WXYZ), transfer the pair in a value-preserving manner to get (VUTS, ZYXW), and then put the pair back together to get ... uh oh ...
V U T S Z Y X W
Don't imagine fixing that by first swapping the pair. That doesn't serve your purpose, because you must avoid such a swap in the event that the two communicating machines have the same byte order, and you have no way to know from just the 8 bytes whether such a swap is needed. Thus, even if we make simplifying assumptions that we know to be unsafe, your strategy is insufficient for the task.
Alternatives include:
transfer your doubles as strings.
transfer your doubles as integer (significand, scale) pairs. The frexp() and ldexp() functions can help with encoding and decoding such representations (see the sketch after this list).
transfer an integer-based fixed-point representation of your doubles (the same as the previous option, but with pre-determined scale that is not transferred)
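A minimal sketch of the second option, using hypothetical helper names encode_double/decode_double (this assumes binary64 doubles and an int64_t-sized significand, and does not handle infinities or NaNs):
#include <float.h>
#include <math.h>
#include <stdint.h>

// Split a double into an integer significand and a binary exponent,
// both of which can be sent with ordinary integer byte-order handling.
void encode_double(double d, int64_t *significand, int32_t *exponent)
{
    int e;
    double frac = frexp(d, &e);                        // d == frac * 2^e, 0.5 <= |frac| < 1
    *significand = (int64_t)ldexp(frac, DBL_MANT_DIG); // scale the fraction up to an integer
    *exponent = (int32_t)(e - DBL_MANT_DIG);           // compensate for the scaling
}

// Reassemble the double on the receiving side.
double decode_double(int64_t significand, int32_t exponent)
{
    return ldexp((double)significand, exponent);
}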
I need to transfer a double value (-47.1235648, for example) using sockets.
If the platforms have potentially different codings for double, then sending a bit pattern of the double is a problem. If code wants portability, a less than "just copy the bits" approach is needed. An alternative is below.
If platforms always have the same double format, just copy the n bits; for an example, see Rishikesh Raje's answer below.
In detail, OP's problem is only loosely defined. On many platforms a double is a binary64, yet this is not required by C. A double can represent about 2^64 different values exactly. Neither -47.1235648 nor -47.848945 is one of those, so it is possible OP does not have a strong precision concern.
"using the minimum computation as possible" implies minimal code, usually to have minimal time. For speed, any solution should be rated on order of complexity and with code profiling.
A portable method is to send via a string. This approach addresses correctness and best possible precision first, and performance second. It removes endian issues, as data is sent via a string, and there is no precision/range loss in sending the data. The receiving side, if using the same double format, will re-form the double exactly. With a different double format, it has a good string representation from which to do the best it can.
// some ample sized buffer
#define N (sizeof(double) * CHAR_BIT)

double x = foo();
char buf[N];

#if FLT_RADIX == 10
    // Rare base-10 platforms.
    // If the macro DBL_DECIMAL_DIG is not available, use (DBL_DIG + 3).
    sprintf(buf, "%.*e", DBL_DECIMAL_DIG - 1, x);
#else
    // Print the mantissa in hexadecimal notation with a power-of-2 exponent.
    sprintf(buf, "%a", x);
#endif
bar_send_string(buf);
To reconstitute the double
char *s = foo_get_string();
double y;
// %lf decodes strings in decimal (f), exponential (e), or hexadecimal/exponential (a) notation
if (sscanf(s, "%lf", &y) != 1) Handle_Error(s);
else use(y);
A much better idea would be to send the double directly as 8 bytes in network byte order.
You can use a union
typedef union
{
    double a;
    uint8_t bytes[8];
} DoubleUnionType;

DoubleUnionType DoubleUnion;
// Assign the double by
DoubleUnion.a = -47.848945;
Then you can make a network byte order conversion function
void htonfl(uint8_t *out, uint8_t *in)
{
#if LITTLE_ENDIAN // use the macro name appropriate for your architecture
    out[0] = in[7];
    out[1] = in[6];
    out[2] = in[5];
    out[3] = in[4];
    out[4] = in[3];
    out[5] = in[2];
    out[6] = in[1];
    out[7] = in[0];
#else
    memcpy(out, in, 8);
#endif
}
And call this function before transmission and after reception.

Should a custom int representation of a float be run through htons before sending?

I've recently enjoyed reading Beej's Guide to Network Programming. In section 7.4 he talks about problems related to sending floats. He offers a simple (and naive) solution where he "packs" floats by converting them to uint32_t's:
uint32_t htonf(float f)
{
    uint32_t p;
    uint32_t sign;

    if (f < 0) { sign = 1; f = -f; }
    else { sign = 0; }

    p = ((((uint32_t)f) & 0x7fff) << 16) | (sign << 31); // whole part and sign
    p |= (uint32_t)(((f - (int)f) * 65536.0f)) & 0xffff; // fraction

    return p;
}

float ntohf(uint32_t p)
{
    float f = ((p >> 16) & 0x7fff);           // whole part
    f += (p & 0xffff) / 65536.0f;             // fraction
    if (((p >> 31) & 0x1) == 0x1) { f = -f; } // sign bit set
    return f;
}
Am I supposed to run the packed floats (that is, the results of htonf) through the standard htons before sending? If no, why not?
Beej doesn't mention this as far as I can tell. The reason I'm asking is that I cannot understand how the receiving machine can reliably reconstruct the uint32_ts that are to be passed to ntohf (the "unpacker") if the data isn't converted to network byte order before being sent.
Yes, you would also have to marshall the data in a defined order; the easiest way would be to use htonl.
But, aside from educational purposes, I'd really suggest staying away from this code. It has a very limited range, and silently corrupts most numbers. Also, it's really unnecessarily complicated for what it does. You might just as well multiply the float by 65536 and cast it to an int to send, then cast back to a float and divide by 65536.0 to receive, as sketched below. (As noted in a comment, it is even questionable whether the guide's code is educational: I'd say it is educational in the sense that critiquing it and/or comparing it with good code will teach you something; if nothing else, that not everything that glitters on the web is gold.)
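A sketch of that simpler scheme, with hypothetical helper names; it has the same 15.16-style range limits as Beej's version, and assumes POSIX htonl/ntohl:
#include <arpa/inet.h> // htonl/ntohl
#include <stdint.h>

// Sender: scale by 65536 and truncate to a 32-bit fixed-point integer.
uint32_t float_to_wire(float f)
{
    return htonl((uint32_t)(int32_t)(f * 65536.0f));
}

// Receiver: restore host byte order, then divide by the same scale factor.
float wire_to_float(uint32_t wire)
{
    return (float)(int32_t)ntohl(wire) / 65536.0f;
}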
Almost all CPUs actually out there these days use IEEE-754 format floats, but I wouldn't use Beej's second solution either because it's unnecessarily slow; the standard library functions frexp and ldexp will reliably convert between a double and the corresponding mantissa and integer binary exponent. Or you can use ilogb* and scalb*, if you prefer that interface. You can find the appropriate bit length for the mantissa on the host machine through the macros FLT_MANT_DIG, DBL_MANT_DIG and LDBL_MANT_DIG (in float.h). [See note 1]
Coding floating point data transfer properly is a good way to start to understand floating point representations, which is definitely worthwhile. But if you just want to transmit floating point numbers over the wire and you don't have some idiosyncratic processor to support, I'd suggest just sending the raw bits of the float or double as a 4-byte or 8-byte integer (in whatever byte order you've selected as standard), and restricting yourself to IEEE-754 32- and 64-bit representations.
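A sketch of that approach for a 32-bit float, with network byte order as the chosen standard (assumes IEEE-754 floats and POSIX htonl/ntohl; the helper names are made up):
#include <arpa/inet.h> // htonl/ntohl
#include <stdint.h>
#include <string.h>

// Sender: copy the raw IEEE-754 bits out of the float, then fix the byte order.
uint32_t float_bits_to_wire(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return htonl(bits);
}

// Receiver: restore host byte order, then copy the bits back into a float.
float wire_to_float_bits(uint32_t wire)
{
    uint32_t bits = ntohl(wire);
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}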
Notes:
Implementation hint: frexp returns a mantissa between 0.5 and 1.0, but what you really want is an integer, so you should scale the mantissa by the correct power of 2 and subtract that from the binary exponent returned by frexp. The result is not really precision-dependent as long as you can transmit arbitrary precision integers, so you don't need to distinguish between float, double, or some other binary representation.
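Concretely, that hint amounts to something like this sketch (for float; FLT_MANT_DIG comes from float.h):
#include <float.h>
#include <math.h>

// Split f into an integer mantissa and a binary exponent so that
// f == mant * 2^exp.
void float_to_mant_exp(float f, long *mant, int *exp)
{
    float frac = frexpf(f, exp);              // f == frac * 2^exp, 0.5 <= |frac| < 1
    *mant = (long)ldexpf(frac, FLT_MANT_DIG); // scale the fraction up to an integer
    *exp -= FLT_MANT_DIG;                     // compensate in the exponent
}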
Run them through htonl (and vice versa), not htons.
These two functions, htonf and ntohf, are OK as far as they go (i.e., not very far at all), but their names are misleading. They produce a fixed-point 32-bit representation with 31 bits of that split up as: 15 bits of integer, 16 bits of fraction. The remaining bit holds the sign. This value is in the host's internal representation. (You could do the htonl etc right in the functions themselves, to fix this.)
Note that any float whose absolute value reaches or exceeds 32768, or is less than 2^-16 (.0000152587890625), will be wrecked in the process of "network-izing", since those do not fit in a 15.16 format.
(Edit to add: It's better to use a packaged network-izer. Even something as old as the Sun RPC XDR routines will encode floating-point properly.)

Does C have a Quantization function?

I have a buffer with many positive 16-bit values (which are stored as doubles) that I would like to quantize to 8-bit (0-255) values.
According to Wikipedia the process would be:
Normalize the 16-bit values, i.e. find the largest and divide by it.
Use the Q(x) formula with M = 8.
So I wonder: does C have a function that can do this quantization, or does anyone know of a C implementation that I could use?
Lots of love,
Louise
Assuming the value d is in the interval [0.0, max]:
unsigned char quantize(double d, double max)
{
    return (unsigned char)((d / max) * 255.0);
}
I'm not sure what you mean by "16-bit values"; double precision values are 64 bits on any system using IEEE-754. However, if you have values of another numeric type, the process is effectively the same.
This sounds like waveform audio processing, where your input is 16-bit PCM data, your output is 8-bit PCM data, and you are using doubles as an intermediate value.
However, 8-bit PCM wave data is NOT just quantized: the representation is unsigned values in excess-128 notation (like the way exponents are stored in floating point numbers).
Finding the largest value first would be not only quantizing but also scaling. So, in pseudocode:
double dMax = max_of_all_values();
...
foreach (dValue in array_of_doubles)
{
    signed char bValue = (signed char)((dValue / dMax) * 127.0);
}
You could round rather than truncate if you want more accuracy, but in audio processing it's generally better to randomize the truncation error (dither) or even shape it, by essentially running a filtering algorithm as part of the truncation from doubles to signed chars.
Note: signed char is NOT correct if the output is 8-bit PCM data, but since the question doesn't specifically request that, I left it out.
Edit: if this is to be used as pixel data, then you want unsigned values. I see James already gave the correct answer for unsigned values when the input is unsigned (dB values from normalized data should be all negative, actually)
It is not clear from your question what the encoding is, since "positive 16-bit values (which are stored as doubles)" makes no real sense; they are either 16 bit or they are doubles, they cannot be both.
However, assuming that this is 16-bit unsigned data normalised to 1.0 (so the values range over 0.0 <= s <= 1.0), then all you need to do to expand them to 8-bit integer values is to multiply each sample by 255:
unsigned char s8 = s * 255;
If the range is not 0.0 <= s <= 1.0, but 0.0 <= s <= max then:
unsigned char s8 = s / max * 255;
Either way, there is no "quantisation" function other than one you might write yourself; but the necessary transform will no doubt be a simple arithmetic expression (although not so simple if the data is companded, i.e. μ-law or A-law encoded, for example).
