Does C have a Quantization function? - c

I have a buffer with many positive 16-bit values (which are stored as doubles) that I would like to quantize to 8-bit (0-255) values.
According to Wikipedia the process would be:
1. Normalize the 16-bit values, i.e. find the largest and divide by it.
2. Use the Q(x) formula with M=8.
So I wonder if C has a function that can do this quantization, or does anyone know of a C implementation that I could use?
Lots of love,
Louise

Assuming the value d is in the interval [0.0, max]:
unsigned char quantize(double d, double max)
{
    return (unsigned char)((d / max) * 255.0);
}
I'm not sure what you mean by "16-bit values;" double precision values are 64-bit on any system using IEEE-754. However, if you have values of another numeric type, the process is effectively the same.
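The whole two-step process from the question (normalize by the largest value, then scale to 0-255) fits in a few lines. This is a minimal sketch, assuming non-negative input; the function names `find_max` and `quantize_buffer` are hypothetical. Unlike `quantize` above, it rounds to nearest instead of truncating, which spreads the quantization error evenly.

```c
#include <stddef.h>

/* Hypothetical helper: largest value in a non-empty buffer. */
static double find_max(const double *buf, size_t n)
{
    double max = buf[0];
    for (size_t i = 1; i < n; i++)
        if (buf[i] > max)
            max = buf[i];
    return max;
}

/* Quantize non-negative doubles to 0..255, rounding to nearest. */
void quantize_buffer(const double *in, unsigned char *out, size_t n)
{
    if (n == 0)
        return;
    double max = find_max(in, n);
    for (size_t i = 0; i < n; i++) {
        /* max == 0 means every sample is 0; avoid dividing by zero. */
        double scaled = (max > 0.0) ? in[i] / max * 255.0 : 0.0;
        out[i] = (unsigned char)(scaled + 0.5); /* round half up */
    }
}
```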

This sounds like waveform audio processing, where your input is 16-bit PCM data, your output is 8-bit PCM data, and you are using doubles as an intermediate value.
However, 8-bit PCM wave data is NOT just quantized: the representation is unsigned values in excess-128 notation (like the way exponents are stored in floating-point numbers).
Finding the largest value first would not only be quantizing, but also scaling. So, in pseudocode:
double dMax = max_of_all_values(); // largest absolute value, so every dValue / dMax lies in [-1, 1]
...
foreach (dValue in array_of_doubles)
{
    signed char bValue = (signed char)((dValue / dMax) * 127.0);
}
You could round rather than truncate if you want more accuracy, but in audio processing it's generally better to randomize the truncation (dithering) or even shape it (noise shaping) by essentially running a filtering algorithm as part of the truncation from doubles to signed chars.
Note that signed char is NOT correct if the output is 8-bit PCM data, but since the question doesn't specifically request that, I left it out.
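For the case left out above, genuine 8-bit PCM output, the signed value only needs the excess-128 offset described earlier. This is a minimal sketch, assuming samples already normalized to [-1.0, 1.0] and not NaN; `to_pcm8` is a hypothetical name, and out-of-range input is clamped.

```c
/* Convert a normalized sample in [-1.0, 1.0] to unsigned 8-bit PCM
   (excess-128 notation: 128 is silence). Out-of-range input is clamped. */
unsigned char to_pcm8(double sample)
{
    if (sample > 1.0)
        sample = 1.0;
    else if (sample < -1.0)
        sample = -1.0;

    /* Round half away from zero to a signed value in -127..127. */
    int s = (sample >= 0.0) ? (int)(sample * 127.0 + 0.5)
                            : -(int)(-sample * 127.0 + 0.5);

    return (unsigned char)(s + 128); /* shift into the range 1..255 */
}
```

Rounding here is plain round-to-nearest; dithering or noise shaping would replace that one step.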
Edit: If this is to be used as pixel data, then you want unsigned values. I see James already gave the correct answer for unsigned values when the input is unsigned (dB values from normalized data should all be negative, actually).

It is not clear from your question what the encoding is, since "positive 16bit values (which are stored as doubles)" makes no real sense; they are either 16-bit or they are double, they cannot be both.
However, assuming that this is 16-bit unsigned data normalised to 1.0 (so the values range over 0.0 <= s <= 1.0), all you need to do to convert them to 8-bit integer values is to multiply each sample by 255.
unsigned char s8 = s * 255 ;
If the range is not 0.0 <= s <= 1.0, but 0.0 <= s <= max then:
unsigned char s8 = s / max * 255 ;
Either way, there is no "quantisation" function other than one you might write yourself; the necessary transform will no doubt be a simple arithmetic expression (although perhaps not so simple if the data is companded, i.e. μ-law or A-law encoded, for example).

Related

Convert Long To Double, Unexpected Results

I am using very basic code to convert a string into a long and into a double. The CAN library I am using requires a double as an input. I am attempting to send the device ID as a double to another device on the CAN network.
If I use an input string that is 6 bytes long, the long and double values are the same. If I add a 7th byte to the string, the values are slightly different.
I do not think I am hitting a max value limit. This code is run with ceedling for an automated test. The same behaviour is seen when sending this data across my CAN communications. In main.c the issue is not observed.
The test is:
void test_can_hal_get_spn_id(void){
struct dbc_id_info ret;
memset(&ret, NULL_TERMINATOR, sizeof(struct dbc_id_info));
char expected_str[8] = "smg123";
char out_str[8];
memset(&out_str, 0, 8);
uint64_t long_val = 0;
double phys = 0.0;
memcpy(&long_val, expected_str, 8);
phys = long_val;
printf("long %ld \n", long_val);
printf("phys %f \n", phys);
uint64_t temp = (uint64_t)phys;
memcpy(&out_str, &temp, 8);
printf("%s\n", expected_str);
printf("%s\n", out_str);
}
With the input = "smg123"
[test_can_hal.c]
- "long 56290670243187 "
- "phys 56290670243187.000000 "
- "smg123"
- "smg123"
With the input "smg1234"
[test_can_hal.c]
- "long 14692989459197299 "
- "phys 14692989459197300.000000 "
- "smg1234"
- "tmg1234"
Is this error just due to how floats are handled and rounded? Is there a way to test for that? Am I doing something fundamentally wrong?
Representing the char array as a double without the intermediate long solved the issue. For clarity, I am using DBCPPP from C; my CAN library itself comes from NXP. DBCPPP allows my application to read a DBC file and apply the data scales and factors to my raw CAN data. It accepts doubles for all data being encoded and returns doubles for all data being decoded.
The CAN library I am using requires a double as an input.
That sounds surprising, but if so, then why are you involving a long as an intermediary between your string and double?
If I use an input string that is 6 bytes long, the long and double values are the same. If I add a 7th byte to the string, the values are slightly different.
double is a floating point data type. To be able to represent values with a wide range of magnitudes, some of its bits are used to represent scale, and the rest to represent significant digits. A typical C implementation uses doubles with 53 bits of significand. It cannot exactly represent numbers with more than 53 significant binary digits. That's enough for 6 bytes, but not enough for 7.
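The question also asks whether there is a way to test for this. One option is to convert to double and back and check whether the value survived. This is a minimal sketch; `fits_in_double` is a hypothetical name.

```c
#include <stdint.h>

/* Returns 1 if v survives a uint64_t -> double -> uint64_t round trip. */
int fits_in_double(uint64_t v)
{
    double d = (double)v;
    /* Values near UINT64_MAX round up to 2^64, which does not fit back. */
    if (d >= 18446744073709551616.0) /* 2^64 */
        return 0;
    return (uint64_t)d == v;
}
```

With the numbers printed in the question, the 6-byte value 56290670243187 passes, while the 7-byte value 14692989459197299 (above 2^53 and odd) fails.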
I do not think I am hitting a max value limit.
Not a maximum value limit. A precision limit. A 64-bit long has smaller numeric range but more significant digits than an IEEE-754 double.
So again, what role is the long supposed to be playing in your code? If the objective is to get eight bytes of arbitrary data into a double, then why not go directly there? Example:
char expected_str[8] = "smg1234";
char out_str[8] = {0};
double phys = 0.0;
memcpy(&phys, expected_str, 8);
printf("phys %.14e\n", phys);
memcpy(&out_str, &phys, 8);
printf("%s\n", expected_str);
printf("%s\n", out_str);
Do note, however, that there is some risk when (mis)using a double this way. It is possible for the data you put in to constitute a trap representation (a signaling NaN might be such a representation, for example). Handling such a value might cause a trap, or cause the data to be corrupted, or possibly produce other misbehavior. It is also possible to run into numeric issues similar to the one in your original code.
Possibly your library provides some relevant guarantees in that area. I would certainly hope so if doubles are really its sole data type for communication. Otherwise, you could consider using multiple doubles to convey data payloads larger than 53 bits, each of which you could consider loading via your original technique.
If you have a look at the IEEE-754 Wikipedia page, you'll see that the double precision values have a precision of "[a]pproximately 16 decimal digits". And that's roughly where your problem seems to appear.
Specifically, though it's a 64-bit type, it does not have the necessary encoding to provide 2^64 distinct floating point values. There are many bit patterns that map to the same value.
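For example, in single precision (float), NaN is encoded as an exponent field of binary 1111 1111 with a non-zero fraction (23 bits), regardless of the sign (one bit). That's 2 * (2^23 - 1) (over 16 million) distinct bit patterns representing NaN. A double has an 11-bit exponent and a 52-bit fraction, so it has 2 * (2^52 - 1) NaN bit patterns.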
So, yes, your "due to how floats are handled and rounded" comment is correct.
In terms of fixing it, you'll either have to limit your strings to values that can be represented by doubles exactly, or find a way to send the strings across the CAN bus.
For example (if you can't send strings), two 32-bit integers could represent an 8-character string value with zero chance of information loss.
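A minimal sketch of that idea, assuming an 8-byte buffer and that both ends agree on the packing order; `pack_str8` and `unpack_str8` are hypothetical names. The bytes are shifted in explicitly (first byte most significant), so the result does not depend on the host's endianness.

```c
#include <stdint.h>

/* Pack 8 bytes into two 32-bit words, first byte most significant. */
void pack_str8(const char s[8], uint32_t out[2])
{
    for (int w = 0; w < 2; w++) {
        out[w] = 0;
        for (int b = 0; b < 4; b++)
            out[w] = (out[w] << 8) | (unsigned char)s[w * 4 + b];
    }
}

/* Inverse of pack_str8. */
void unpack_str8(const uint32_t in[2], char s[8])
{
    for (int w = 0; w < 2; w++)
        for (int b = 0; b < 4; b++)
            s[w * 4 + b] = (char)((in[w] >> (24 - 8 * b)) & 0xFF);
}
```

Each 32-bit word also converts to a double and back exactly, since 32 bits is well within the 53-bit significand, so the two words could be sent as two doubles if that is all the library accepts.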

Converting int16 to float in C

How do I convert a 16-bit int to a floating point number?
I have a signed 16-bit variable which I'm told I need to display with an accuracy of 3 decimal places, so I presume this would involve a conversion to float?
I've tried the code below, which just copies my 16 bits into a float, but this doesn't seem right.
float myFloat = 0;
int16_t myInt = 0x3e00;
memcpy(&myFloat, &myInt, sizeof(int));
I've also read about the half-precision floating-point format but am unsure how to handle it... if I need to.
I'm using GCC.
update:
The source of the data is a char array [2] which I get from an I2C interface. I then stitch this together into a signed int.
Can anyone help?
I have a signed 16 bit variable which i'm told i need to display with an accuracy of 3 decimal places
If someone told you that an integer value can be displayed this way, he or she should start with a C beginners' course.
The only possibility is that the integer value has been scaled (multiplied). For example the value of 12.456 can be stored in the integer if multiplied by 1000. If this is the case:
float flv;
int intv = 12456;
flv = (float)intv / 1000.0f;
You can also print this scaled integer without converting to float:
printf("%s%d.%03d\n", intv < 0 ? "-": "", abs(intv / 1000), abs(intv % 1000));
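Applied to the update in the question (two bytes read over I2C), a minimal sketch might look like the following. It assumes the device sends the high byte first and that the raw reading is scaled by 1000; both are assumptions, so check the sensor's datasheet for the real byte order and scale factor. `raw_from_i2c` and `format_milli` are hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Combine two bytes (ASSUMED high byte first) into a signed 16-bit value. */
int16_t raw_from_i2c(const unsigned char buf[2])
{
    uint16_t u = (uint16_t)((buf[0] << 8) | buf[1]);
    /* Out-of-range unsigned-to-signed conversion is implementation-defined;
       GCC defines it as two's complement wraparound. */
    return (int16_t)u;
}

/* Format a value ASSUMED to be scaled by 1000 with three decimals, no floats. */
void format_milli(int v, char *out, size_t len)
{
    snprintf(out, len, "%s%d.%03d", v < 0 ? "-" : "",
             abs(v / 1000), abs(v % 1000));
}
```

For example, the bytes 0x30 0x39 give 12345, formatted as "12.345", and 0xFF 0x38 give -200, formatted as "-0.200".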

Any precision loss when converting float64 to uint64 in C? assuming only the whole number part of the data is meaningful

I have a counter field from one TCP protocol which keeps track of samples sent out. It is defined as float64. I need to translate this protocol to a different one, in which the counter is defined as uint64.
Can I safely store the float64 value in the uint64 without losing any precision on the whole number part of the data? Assuming the fractional part can be ignored.
EDIT: Sorry if I didn't describe it clearly. There is no code. I'm just looking at two different protocol documentations to see if one can be translated to the other.
Please treat float64 as double, the documentation isn't well written and is pretty old.
Many thanks.
I am assuming you are asking about 64 bit floating point values such as IEEE Binary64 as documented in https://en.wikipedia.org/wiki/Double-precision_floating-point_format .
Converting a double represented as a 64-bit IEEE floating point value to a uint64_t will not lose any precision on the integral part of the value, as long as the value itself is non-negative and less than 2^64. But if the value is larger than 2^53, the representation as a double does not allow complete precision, so whatever computation led to the value was probably inaccurate anyway.
Note that the reverse is false, IEEE floats have less precision than 64 bit uint64_t, so close but different large values will convert to the same double values.
Note that a counter implemented as a 64-bit double is intrinsically limited by the precision of the floating point type. Incrementing a value of 2^53 or larger by one cannot produce the exact next integer; often it has no effect at all. Using a floating point type to implement a packet counter seems a very bad idea. Using a uint64_t counter directly seems a safer bet. You only have to worry about wraparound at 2^64, a condition that you can check for in the unlikely case where you would actually expect to count that far.
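The stalled counter is easy to demonstrate. This is a minimal sketch; `count_up` is a hypothetical stand-in for whatever code increments the float64 counter, and it assumes IEEE-754 doubles with the default round-to-nearest-even mode.

```c
/* Hypothetical stand-in: apply n increments to a float64 counter. */
double count_up(double start, unsigned n)
{
    double c = start;
    for (unsigned i = 0; i < n; i++)
        c += 1.0; /* at exactly 2^53, c + 1.0 is a tie and rounds back to c */
    return c;
}
```

Starting at 2^53 (9007199254740992.0), a thousand increments leave the counter unchanged, whereas a uint64_t counter would simply advance by 1000.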
If you cannot change the format, verify that the floating point value is within range for the conversion and store an appropriate value if it is not:
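#include <math.h>
#include <stdint.h>
...
double v = get_64bit_value();
uint64_t result;
if (isnan(v) || v < 0) {
    result = 0;              /* negative or NaN: clamp to 0 */
} else if (v >= (double)UINT64_MAX) {
    result = UINT64_MAX;     /* (double)UINT64_MAX rounds up to 2^64 */
} else {
    result = (uint64_t)v;    /* in range: the fractional part is discarded */
}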
Yes, precision is lost. Negative numbers cannot be converted properly to uint64 (as this type is unsigned), as well as numbers greater than 2^64-1. In all other cases, the conversion is exact (providing you look at the float64 value as exact and rounding is done correctly).

Fixed point numbers in C without float

In C, is it possible to represent a fixed point number in binary form so it can be transmitted without the use of floats?
I know how to convert a float or double to the desired fixed point representation but I'm stuck when it shall be done without floating points. The problem is that the system I have to develop on has this limitation.
My idea is to create a struct which holds the full representation and a processable integer and fractional part. And after creating the struct with either only the received binary representation or the integer and fractional values there shall be a function which does the conversion.
Update:
My Question seems not to be precise enough so I'll add some details.
Within my code I have to create and receive numbers in a certain fixed point representation. As described by the answers below, this is nothing but a pointer to a sequence of bits. My problem is that I have to create this sequence of bits when sending, or interpret it when receiving the information.
This conversion is my problem. Ignoring signedness, it is quite an easy thing to do when you can use a float to convert from (code not tested, but it should work like this):
float sourceValue = 12.223445;
int intPart = 0;
float fractPart = 0.0;
//integer part is easy, just cast it
intPart = (int)sourceValue;
//the fractional part is the rest
fractPart = sourceValue - intPart;
//multiplying the fract part by the precision of the fixed point number (Q9.25)
//gets us the fractional part in the desired representation
uint64_t factor = 1;
factor = factor << 25;
uint64_t fractBits = fractPart * factor;
The rest can be done by some shifting and the use of logical bit operators.
But how can I do this without a float in the middle, starting with something like this:
int intPart = 12;
int fractPart = 223445;
Is it even possible? As I said, I'm kind of stuck here.
Thanks for your help!
I don't know what you are really up to, but a fixed-point number can be viewed as an integer number with a constant factor applied to it.
For example, if you want to express a number in the interval [0; 1) in 16 bits, you can map it to the range [0; 65536) by simply multiplying it by 65536.
That said, it completely depends on what your integer values look like and how they are intended to be represented. In almost any case, you can apply a multiplication or division to them and be done.
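Applied to the question's starting point (intPart = 12, fractPart = 223445 meaning .223445), that multiplication and division can be done in integers only, provided you also know how many decimal digits the fractional part has (6 here). This is a minimal sketch, assuming non-negative values, the Q9.25 layout from the question (25 fractional bits) and round-to-nearest; `to_q9_25` and its `digits` parameter are hypothetical.

```c
#include <stdint.h>

#define FRAC_BITS 25 /* Q9.25: 25 fractional bits, as in the question */

/* Build a Q9.25 bit pattern from an integer part and the decimal digits
   of the fractional part, e.g. (12, 223445, 6) for 12.223445.
   Assumes intPart fits in 9 bits and fractPart < 10^digits. */
uint64_t to_q9_25(uint32_t intPart, uint32_t fractPart, unsigned digits)
{
    uint64_t scale = 1;
    for (unsigned i = 0; i < digits; i++)
        scale *= 10; /* 10^digits */

    /* fractPart / 10^digits * 2^25, rounded to nearest, integers only */
    uint64_t frac = (((uint64_t)fractPart << FRAC_BITS) + scale / 2) / scale;

    /* '+' rather than '|' so a rounding carry propagates correctly */
    return ((uint64_t)intPart << FRAC_BITS) + frac;
}
```

Going the other way works the same: the integer part is `q >> 25`, and the decimal digits are `((q & ((1 << 25) - 1)) * 10^digits + (1 << 24)) >> 25`.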
Everything boils down to bits, be it an integer, float, etc. All you need is the memory base address and the size of that memory. For example,
float src = 0.5;
float dest;
char bytes[sizeof(src)];
memcpy(bytes, &src, sizeof(src));
memcpy(&dest, bytes, sizeof(dest));
should give you dest equal to src.
Hope this helped.

Convert a 32 bits to float value

I am working on a DSP processor to implement a BFSK frequency hopping mechanism using C on a Linux system. On the receiver part of the program, I am getting an input of a set of samples which I demodulate using the Goertzel algorithm to determine whether the received bit was a 0 or 1.
Right now, I am able to detect the bits individually. But I have to return the data for processing in the form of a float array, so I need to pack every set of 32 received bits into a float value. At the moment I am doing something like:
uint32_t i,j,curBit,curBlk;
uint32_t *outData; //this is initialized to the address of some pre-defined location in DSP memory
float *output;
for(i=0; i<num_os_bits; i++) //Loop for number of data bits
{
//Demodulate the data and set curBit=0x0000 or 0x8000
curBlk=curBlk>>1;
curBlk=curBlk|curBit;
bitsCounter+=1;
if(i!=0 && (bitsCounter%32)==0) //32-bits processed, save the data in array
{
*(outData+j)=curBlk;
j+=1;
curBlk=0;
}
}
output=(float *)outData;
Now, the values of the output array are just the values of the outData array with 0s after the decimal point.
For example, if outData[i] = 12345, then output[i] = 12345.0000.
But while testing the program I am generating the sample test data of bits using an array of float
float data[] ={123.12456,45.789,297.0956};
So after the demodulation I am expecting the float array output to have a same values as data array.
Is there some other method to convert 32 bits of data to a float? Should I store the received bits in a char array and then convert it to float?
Not sure if I get your point: you obtain bits sequentially, and once you have 32 bits you want to make a float from them?
What about:
union conv32
{
uint32_t u32; // here_write_bits
float f32; // here_read_float
};
Which can be used like this in-line:
float f = ((union conv32){.u32 = my_uint}).f32;
Or with a helper variable:
union conv32 x = {.u32 = my_uint};
float f = x.f32;
You need to copy the bit pattern verbatim without invoking any implicit conversions. The simplest way to do that is to take the address of the data and reinterpret it as a pointer to the intermediate "carrier" type before dereferencing it.
Consider this:
float source_float = 1234.5678f ;
uint32_t transport_bits = *((uint32_t*)&source_float);
float destination_float = *((float*)&transport_bits);
The intermediate result in transport_bits is target dependent; in my test on x86 it is 0x449a522b, this being the bit representation of a single precision float on that platform (floating point representation and endianness dependent).
Either way the result in destination_float is identical to source_float having been transported via the uint32_t transport_bits.
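Strictly speaking, dereferencing a `uint32_t *` that actually points at a `float` breaks C's effective-type (strict aliasing) rules, so an optimizing compiler is allowed to miscompile it. `memcpy` copies the same bit pattern with well-defined behavior, and compilers typically reduce it to a single move. This is a minimal sketch; `float_to_bits` and `bits_to_float` are hypothetical names, and it assumes a 32-bit IEEE-754 `float`, which the C11 `_Static_assert` partly checks.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Copy a float's bit pattern into a uint32_t without any conversion. */
uint32_t float_to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Reinterpret 32 received bits as a float. */
float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}
```

On an IEEE-754 target, `float_to_bits(1234.5678f)` yields the same 0x449a522b quoted above.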
However, this does require that the floating point representation of the originator is identical to that of the receiver, which may not be a given. It is all a bit non-portable. The fact that your code does not work suggests perhaps that the representation does indeed differ between sender and receiver. Not only must the FP representation be identical, but also the endianness. If they differ, your code may have to reorder the bits, calculate the floating point equivalent by extracting the exponent and mantissa, or both. Also, you need to be sure that the transmission order of the bits is the same order in which you are reassembling them. There are a number of opportunities to get this wrong.
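If the only difference between the two ends turns out to be byte order, the received word can be swapped before it is reinterpreted as a float. This is a minimal sketch using plain shifts; `swap32` is a hypothetical name (GCC and Clang also provide `__builtin_bswap32` for the same job).

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit word, e.g. 0x11223344 -> 0x44332211. */
uint32_t swap32(uint32_t v)
{
    return (v >> 24)
         | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u)
         | (v << 24);
}
```

A differing bit order within each byte, or a different floating point format altogether, would still need separate handling.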
