Converting int16 to float in C

How do I convert a 16-bit int to a floating-point number?
I have a signed 16-bit variable which I'm told I need to display with an accuracy of 3 decimal places, so I presume this would involve a conversion to float?
I've tried the code below, which just copies my 16 bits into a float, but this doesn't seem right.
float myFloat = 0;
int16_t myInt = 0x3e00;
memcpy(&myFloat, &myInt, sizeof(int));
I've also read about the half-precision floating-point format but am unsure how to handle this... if I need to.
I'm using GCC.
Update:
The source of the data is a char array [2] which I get from an I2C interface. I then stitch this together into a signed int.
Can anyone help?

I have a signed 16 bit variable which I'm told I need to display with an accuracy of 3 decimal places
If someone told you that a plain integer value can be displayed this way, he/she should start from a C beginners' course.
The only possibility is that the integer value has been scaled (multiplied). For example the value of 12.456 can be stored in the integer if multiplied by 1000. If this is the case:
float flv;
int intv = 12456;
flv = (float)intv / 1000.0f;
You can also print this scaled integer without converting to float (abs needs <stdlib.h>):
printf("%s%d.%03d\n", intv < 0 ? "-": "", abs(intv / 1000), abs(intv % 1000));

Related

Implicit conversion of float to int and possibility of loss of value

I'm learning about data-types in C.
Our course material details as follows
When we assign different variables of different data-types, there is a
possibility of loss of value.
float f = 100.6537;
int i = f;
After execution of the above code, i = 100. So correct me if I'm wrong: assigning a float to an int just chops off the fractional part and assigns only the integral value to the left of the decimal point? And the loss of value here is the removal of the digits after the decimal point?
But when I do,
int i = 100;
float f = i;
I think that there is no loss of value here ?
Not every int can be represented as a float. The float lacks enough "places" to represent all possible values. Remember, sizeof(int)==sizeof(float) on many machines. In IEEE-754 format you only get 24 bits of "value" in a float.
In other words:
int = snnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
float = seeeeeeeennnnnnnnnnnnnnnnnnnnnnn
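Where the e part is the exponent. The float stores 23 fraction bits (plus one implicit leading bit, which is where the 24 bits of "value" come from), while the int has 31 bits to represent the numerical value.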
For anything that fits neatly in a 24 bit number you should be fine, but it's worth testing on your hardware to be sure.

C - how to store IEEE 754 double and single precision

I have to work with IEEE 754 double and single precision numbers.
I have no idea how to work with them correctly.
I've got a buffer of binary data and I want to get the numbers like this:
uint8_t buffer[] = {................};
//data I want are at 8th position, (IEEE745_t is my imaginary format)
IEEE745double_t first8bytes = *(IEEE745double_t*)(buffer + 8);
IEEE745single_t next4bytes = *(IEEE745single_t*)(buffer + 16);
What do I put instead of IEEE745double_t and IEEE745single_t? Is it possible to do it with double and float? And if so, how can I guarantee that they will be 8 and 4 bytes long on every platform?
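First of all, you cannot do the pointer cast hack. There is absolutely no guarantee that your byte buffer is correctly aligned, and reading a uint8_t array through a double or float pointer also violates the strict aliasing rule. Either way, this results in undefined behaviour. Use memcpy instead: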
memcpy(&first8bytes, &buffer[8], 8);
memcpy(&next4bytes, &buffer[16], 4);
What do I put instead of IEEE745double_t and IEEE745single_t? Is it possible to do it with double and float?
Yes, it's possible, if:
double is 8 bytes, float is 4 bytes, and both use the IEEE 754 representation.
The data in the buffer uses the same byte order as your host. If not, you need to copy it to a temporary unsigned integer type first, where you fix the endianness.
And if so, how can I guarantee, that they will be 8 and 4 bytes long on every platform?
Use a static assertion (static_assert from <assert.h> in C11) to detect when they are not. For example:
static_assert(sizeof(float) == 4, "invalid float size");

I don't know how to convert 16 byte hexadecimal to floating point

I have been searching the internet for an answer to this question for a long time, but I could not find one. All I found is that I can convert hexadecimal to decimal, either with some serious programming or manually through math.
I am looking to convert a 16-byte hexadecimal value to floating point. If there is any way to do that, please share. I have searched and found IEEE 754, which either does not seem to work or which I am not comprehending. Can I do it manually with an equation? I think I heard about one. Or is there a neat C program which can do it?
Please help! Any help would be highly appreciated.
You need to study the IEEE floating point spec.
This would be quite straightforward in Java. You have handy methods like Float.floatToRawIntBits(float x) and Float.intBitsToFloat(int x)
You might be able to do it with a union.
In C it's a bit more hacky. You can abuse a union. Unions in C reuse the same memory for two different variables. A union like
union DoubleLong {
long l;
double d;
} u;
would allow you to treat the same bit of memory as either a long u.l or a double u.d. They are both 8 bytes (on a typical 64-bit Unix-like system; long is only 4 bytes on some platforms), so they take the same space. So doing u.d = M_PI; printf("%lx\n", u.l); prints 400921fb54442d18, the bit pattern of pi in hexadecimal.
For 16 bytes we need the union to hold an array of two 8-byte longs.
#include <stdio.h>
union Data {
    unsigned long i[2];
    long double f;
} u;
int main(void) {
    // Using random IPv6 address 2602:306:cecd:7130:5421:a679:6d71:a660
    // Store in two separate 8-byte longs (the group 306 is 0306 when padded)
    u.i[0] = 0x26020306cecd7130;
    u.i[1] = 0x5421a6796d71a660;
    // Print out in hexadecimal
    printf("%.15La %lx %lx\n", u.f, u.i[0], u.i[1]);
    // Print out in decimal
    printf("%.15Le %lu %lu\n", u.f, u.i[0], u.i[1]);
    return 0;
}
One problem is that 16-byte floating-point numbers might not be defined on your system. float is typically 32 bits (4 bytes) and double is 64 bits (8 bytes). There is a long double type, but on my Mac it's only 80 bits (10 bytes). It might be simpler to convert to two double-precision numbers. So on my system only the last 4 hexadecimal digits of the second number are significant.
Not all hexadecimal values correspond to valid floating-point numbers; a lot of values will correspond to NaNs. If the high bits are 7FFF or FFFF (or 7FF, FFF for double), that will give either infinity or NaN.

Fixed point numbers in C without float

In C, is it possible to represent a fixed-point number in binary form so it can be transmitted without the use of floats?
I know how to convert a float or double to the desired fixed point representation but I'm stuck when it shall be done without floating points. The problem is that the system I have to develop on has this limitation.
My idea is to create a struct which holds the full representation and a processable integer and fractional part. And after creating the struct with either only the received binary representation or the integer and fractional values there shall be a function which does the conversion.
Update:
My Question seems not to be precise enough so I'll add some details.
Within my code I have to create and receive numbers in a certain fixed-point representation. As described by the answers below, this is nothing but a pointer to a sequence of bits. My problem is that I have to create this sequence of bits when sending, or interpret it when receiving the information.
This conversion is my problem. Ignoring signedness, it is quite an easy thing to do when you can use a float to convert from (code not tested, but it must work like this):
float sourceValue = 12.223445;
int intPart = 0;
float fractPart = 0.0;
//integer part is easy, just cast it
intPart = (int)sourceValue;
//the fractional part is the rest
fractPart = sourceValue - intPart;
//multiplying the fract part by the precision of the fixed point number (Q9.25)
//gets us the fractional part in the desired representation
uint64_t factor = 1;
factor = factor << 25;
//a new name is needed here: fractPart is already declared as a float above
uint64_t fractBits = fractPart * factor;
The rest can be done by some shifting and the use of logical bit operators.
But how can I do this without a float in the middle, starting with something like this:
int intPart = 12;
int fractPart = 223445;
Is it even possible? As I said, I'm kind of stuck here.
Thanks for your help!
I don't know what you are really up to, but a fixed-point number can be viewed as an integer number with a constant factor applied to it.
For example, if you want to express a number in the interval [0; 1) in 16 bits, you can map it to the range [0; 65536) by simply multiplying it by 65536.
That said, it completely depends on what your integer values look like and how they are intended to be represented. In almost any case, you can apply a multiplication or division to them and be done.
Everything boils down to bits, be it an integer, float, etc. All you need is the memory base address and the size of that certain memory. For example,
float src = 0.5;
float dest;
char bytes[sizeof(src)];
memcpy(bytes, &src, sizeof(src));
memcpy(&dest, bytes, sizeof(dest));
should give you dest equal to src.
Hope this helped.

Does C have a Quantization function?

I have a buffer with many positive 16-bit values (which are stored as doubles) that I would like to quantize to 8 bits (values 0-255).
According to Wikipedia the process would be:
Normalize the 16-bit values, i.e. find the largest and divide by it.
Use the Q(x) formula with M=8.
So I wonder whether C has a function that can do this quantization, or does anyone know of a C implementation that I could use?
Lots of love,
Louise
Assuming the value d is in the interval [0.0, max]:
unsigned char quantize(double d, double max)
{
return (unsigned char)((d / max) * 255.0);
}
I'm not sure what you mean by "16-bit values;" double precision values are 64-bit on any system using IEEE-754. However, if you have values of another numeric type, the process is effectively the same.
This sounds like waveform audio processing, where your input is 16-bit PCM data, your output is 8-bit PCM data, and you are using doubles as an intermediate value.
However, 8-bit PCM wave data is NOT just quantized; the representation is unsigned values in excess-128 notation (like the way exponents are stored in floating-point numbers).
Finding the largest value first would not only be quantizing, but also scaling. So, as a C fragment (dValues, bValues, n and max_of_all_values are placeholder names):
double dMax = max_of_all_values(); // largest value in the input
...
for (size_t i = 0; i < n; ++i)
{
    bValues[i] = (signed char)((dValues[i] / dMax) * 127.0);
}
You could round rather than truncating if you want more accuracy, but in audio processing, it's generally better to randomize the truncation order or even shape it by essentially running a filtering algorithm as part of the truncation from doubles to signed chars.
Note that signed char is NOT correct if the output is 8-bit PCM data, but since the question doesn't specifically request that, I left it out.
Edit: if this is to be used as pixel data, then you want unsigned values. I see James already gave the correct answer for unsigned values when the input is unsigned (dB values from normalized data should be all negative, actually)
It is not clear from your question what the encoding is since "positive 16bit values (which are stored as doubles)" makes no real sense; they are either 16 bit or they are double, they cannot be both.
However, assuming that this is 16-bit unsigned data normalised to 1.0 (so the values range over 0.0 <= s <= 1.0), all you need to do to convert them to 8-bit integer values is to multiply each sample by 255.
unsigned char s8 = s * 255 ;
If the range is not 0.0 <= s <= 1.0, but 0.0 <= s <= max then:
unsigned char s8 = s / max * 255 ;
Either way, there is no "quantisation" function other than one you might write yourself; but the necessary transform will no doubt be a simple arithmetic expression (although not so simple if the data is companded, i.e. μ-law or A-law encoded, for example).
