I have a radar device. Various data come out of the device, such as detected object speed, acceleration, and distance, but the values are encoded as 10-bit and 13-bit floats. How do I print and store these 10-bit and 13-bit float values? A float is 32 bits; I tried storing the raw data directly in a float variable, but that gives wrong values.
For the 10-bit case, the input consists of:
Bit 15: sign
Bits 10-14: biased exponent (actual exponent = (bits 10-14) - 15)
Bits 0-9: significand (1.(bits 0-9))
Something like this should work, although note I haven't tested it:

int halfFloatToInt16(unsigned int input, int shift)
{
    int exp, output;

    // Load significand: (1.(bits 0-9))
    output = (int)(0x03FF & input);
    output += 0x0400;                              // add the implicit leading 1

    // Apply sign (bit 15)
    output = (0x8000 & input) ? (-output) : output;

    // Extract exponent (bits 10-14)
    // Adjustment -25 = -15 for the exponent bias and -10 for the significand width
    exp = (int)(0x001F & (input >> 10));
    exp -= 25;

    // Apply shift to achieve the desired fixed-point precision
    exp += shift;

    // Shift output (note: for a non-positive exp, the shift count must be negated)
    if (exp > 0)
    {
        return (output << exp);
    }
    else
    {
        return (output >> -exp);
    }
}
Here the input is the 16-bit floating-point value, as specified above, cast to unsigned int. The shift parameter sets the fixed-point scaling of the output: the function returns the value times two to the power specified by shift. For example, if the expected maximum output value is 1, you would use a shift of 14, so 16384 represents one. If the maximum expected value of the output is 20000, you make the shift zero. Choosing the shift this way optimizes the overall precision of the output.
A similar question has been asked at Need fastest way to convert 2's complement to decimal in C, but I couldn't use it to get my answer, so posting this...
I have 32-bit data coming from an audio sensor in the following format:-
The data format is I2S, 24-bit, 2's complement, MSB first. The data precision is 18 bits; unused bits are zeros.
Without any audio input, I am able to read the following data from the sensor:-
0xFA578000
0xFA8AC000
0xFA85C000
0xFA828000
0xFA800000
0xFA7E4000
0xFA7D0000
0xFA7BC000
and so on...
I need to use these data samples to calculate their RMS value, then further use this RMS value to calculate the decibels (20 * log(rms)).
Here is my code with comments:-
//I have 32 bits, with data in the most significant 24 bits.
inputVal &= 0xFFFFFF00;  //Mask the least significant 8 bits.
inputVal = inputVal >> 8;  //Data is shifted into the least significant 24 bits; bit 24 is the sign bit.
inputVal &= 0x00FFFFC0;  //Mask the least significant 6 bits, since data precision is 18 bits.

//So, I have 24-bit data with the 6 lsb bits masked. Bit 24 is the sign bit.
//Converting from 2's complement.
const int negative = (inputVal & (1 << 23)) != 0;
int nativeInt;

if (negative)
    nativeInt = inputVal | ~((1 << 24) - 1);
else
    nativeInt = inputVal;

return (nativeInt * nativeInt); //Returning the squared value to calculate RMS
After this, I take the average of sum of squared values and calculate its root to get the RMS value.
My questions are:
Am I doing the data bit-manipulations correctly?
Is it necessary to convert the data samples from 2's complement to integer to calculate their RMS values?
Part 2
Continuing further with @Johnny Johansson's answer:
It looks like all your sample values are close to -6800, so I assume that is an offset that you need to account for.
To normalize the sample set, I have calculated the mean value of the sample set and subtracted it from each value in the sample set.
Then, I found the maximum and minimum values from the sample set and calculated the peak-to-peak value.
// I have the sample set, get the mean
float meanval = 0;
for (int i = 0; i < actualNumberOfSamples; i++)
{
    meanval += samples[i];
}
meanval /= actualNumberOfSamples;
printf("Average is: %f\n", meanval);

// subtract it from all samples to get a 'normalized' output
for (int i = 0; i < actualNumberOfSamples; i++)
{
    samples[i] -= meanval;
}

// find the 'peak to peak' max
float minsample = 100000;
float maxsample = -100000;
float peakToPeakMax = 0.0;
for (int i = 0; i < actualNumberOfSamples; i++)
{
    minsample = fmin(minsample, samples[i]);
    maxsample = fmax(maxsample, samples[i]);
}
peakToPeakMax = (maxsample - minsample);
printf("The peak-to-peak maximum value is: %f\n", peakToPeakMax);
(This does not include the RMS part, which comes after you have correct signed integer values)
Now, I calculate the rms value by dividing the peak-to-peak value by square-root of 2.
Then, 20 * log10(rms) gives me the corresponding decibel value.
rmsValue = peak2peakValue / sqrt2;
DB_Val = 20 * log10(rmsValue);
Does the above code take care of the "offset" that you mentioned?
I am yet to find a test plan to verify the calculated decibels, but have I mathematically calculated the decibel value correctly?
The 2's complement part seems like it should work, but it is unnecessarily complicated, since regular integers are represented using 2's complement (unless you are on some very exotic hardware). You could simply do this instead:

signed int signedInputVal = (signed int)inputVal;
signedInputVal >>= 14;

This will give you a value in the range -(2^17) to (2^17 - 1).
I am new at programming and trying to understand the following program.
This program gets the minimum number of bits needed to store an integer as a number.
#include <stdio.h>

/* function declaration
 * name      : countBit
 * Desc      : to get the bits needed to store an int number
 * Parameter : int
 * return    : int
 */
int countBit(int);

int main()
{
    int num;
    printf("Enter an integer number :");
    scanf("%d", &num);
    printf("Total number of bits required = %d\n", countBit(num));
    return 0;
}

int countBit(int n)
{
    int count = 0, i;
    if (n == 0) return 0;
    for (i = 0; i < 32; i++)
    {
        if ((1 << i) & n)
            count = i;
    }
    return ++count;
}
Can you please explain how the if( (1 << i) & n) condition works?
To begin you should read up on Bitwise Operators.
for (i = 0; i < 32; i++)
{
    // Check if the bit at position i is set to 1
    if ((1 << i) & n)
        count = i;
}
In plain English, this finds the highest position among all "set" bits.
This program gets the minimum number of bits needed to store an integer as a number.
Getting the position of the highest "set" bit tells us how many bits we need to store that number. If we used fewer bits, we would be reducing our maximum representable number to below our desired integer.
"<<" and "&" are bitwise operators, that manipulate a given (usually unsigned integer) variable's bits. You can read more about such operators here. In your case,
1<<i
is the number whose binary representation is 1 followed by i zeroes (and preceded only by zeroes as well). Overall, the check
(1<<i)&n
evaluates to true if the i-th bit of the variable n is 1, and false otherwise, and therefore the loop finds out what is the leftmost bit which is 1 in the given number.
It's very simple if you understand bitwise operators.
Shift operator: << is the left shift operator, which shifts a value by the designated number of bits. In C, x << 1 will shift x by 1 bit.
Let's just consider 8-bit values for now, and let's say x is 100 in decimal, 0x64 in the hexadecimal numbering system; the binary representation of the same would be 0110 0100.
Using the shift operator, let's shift this value by 1 bit. So

(0) 1 1 0 0 1 0 0
 ^
 Discarded

becomes

1 1 0 0 1 0 0 (0)
             ^
             Padded

as the leading (left extreme) bit is discarded and the last (right extreme) bit is padded with a 0.
The number becomes 0xC8, which is 200 in the decimal numbering system, double the previous value!
The same goes for the >> operator; try it yourself if you haven't. The result should be half the value, except when you try it on 0x01 :-)
As a side note, when you grow up and start looking at the way the shell/console is used by developers, you'll understand that > has a different purpose there.
The & operator: Firstly, && and & are different. The first one is a logical operator and the latter is a bitwise operator.
Let's pick a number again, 100.
In a logical AND operation, the end result is always true or false. For example, 0x64 && 0x64 results in a true condition, as does any pair of nonzero operands; the result is false whenever either operand is zero.
The bitwise AND operation, on the other hand, compares the operands bit by bit: each bit of the result is 1 only if both corresponding input bits are 1. This is what lets you ask: is the ith bit of 0x64 set? If yes, the expression is nonzero (true), else it is zero (false).
The if statement:
if( (1 << i) & n)
is doing just that. For every iteration of the loop, it left-shifts 1 by i bits, then checks whether the ith bit of n is set, resulting in true if set and false otherwise.
Programmers usually use a macro for this, which makes it more readable.

#define CHECK_BIT(value, position) ((value) & (1 << (position)))
I am trying to write a C program that initializes a variable of type float to 1000.565300 and extracts the relevant fields of the IEEE 754 representation. I should extract the sign bit (bit 31), the exponent field (bits 30 to 23) and the significand field (bits 22 to 0), using bit masking and shift operations. My program should keep the extracted fields in 32-bit unsigned integers and print their values in hexadecimal and binary formats. I do not know how to do the bit masking.
Well, one easy way to do all this is:
Interpret the float's bits as an unsigned integer: uint32_t num = *(uint32_t*)&value;
It means: treat the address of value as the address of a 32-bit unsigned integer, then take the value stored at that address. (Strictly speaking this violates the aliasing rules; memcpy(&num, &value, sizeof num) is the portable alternative.)
Sign: int sign = (~(~0u >> 1) & num) ? -1 : 1; // checks if the first bit of the float is 1 or 0; if it's 1, it's a negative number
Exponent part: uint32_t exp = (num & 0x7F800000) >> 23;
Mantissa: uint32_t mant = num & 0x007FFFFF;
If you don't know the masks:
0x7F800000 : 0 11111111 00000000000000000000000
0x007FFFFF : 0 00000000 11111111111111111111111
As for printing bits, you can use this function:

void printbits(uint32_t num)
{
    for (uint32_t m = ~(~0u >> 1); m; m >>= 1) // initial m = 100..0, then 0100..0, and so on
        putchar((num & m) ? '1' : '0');
    putchar('\n');
}
I am having trouble writing an algorithm for a 1-byte / 8-bit checksum.
Obviously, with 8 bits, for sums over a decimal value of 255 the most significant bits have to wrap around. I think I am doing it correctly.
Here is the code (buf and length are assumed to be defined elsewhere)...

#include <stdio.h>

int main(void)
{
    int check_sum = 0;        //checksum
    int lcheck_sum = 0;       //left checksum bits
    int rcheck_sum = 0;       //right checksum bits
    short int mask = 0x00FF;  //16-bit mask

    //Create the frame - sequence number (S) and checksum 1 byte
    int c;

    //calculate the checksum
    for (c = 0; c < length; c++)
    {
        check_sum = (int)buf[c] + check_sum;
        printf("\n Check Sum %d ", check_sum); //debug
    }
    printf("\nfinal Check Sum %d", check_sum); //debug

    //Take checksum and make it an 8-bit checksum
    if (check_sum > 255) //if greater than 8 bits then encode bits
    {
        lcheck_sum = check_sum;
        lcheck_sum >>= 8; //shift 8 bits to the right
        rcheck_sum = check_sum & mask;
        check_sum = lcheck_sum + rcheck_sum;
    }

    //Take the complement
    check_sum = ~check_sum;

    //Truncate - get rid of the upper bits and keep the 8 LSBs
    check_sum = check_sum & mask;
    printf("\nTruncated and complemented final Check Sum %d\n", check_sum);
    return 0;
}
Short answer: you are not doing it correctly, even if the algorithm would be as your code implies (which is unlikely).
Standard warning: Do not use int if your variable might wrap (undefined behaviour) or you want to right-shift potentially negative values (implementation defined). OTOH, for unsigned types, wrapping and shifting behaviour is well defined by the standard.
Further note: Use stdint.h types if you need a specific bit size! The built-in standard types (including char) are not guaranteed to have a specific size.
Normally an 8-bit checksum of an 8-bit buffer is calculated as follows:

#include <stdint.h>
#include <stddef.h>

uint8_t chksum8(const unsigned char *buff, size_t len)
{
    unsigned int sum; // nothing gained in using smaller types!

    for (sum = 0; len != 0; len--)
        sum += *(buff++); // parentheses not required!
    return (uint8_t)sum;
}
It is not clear what you are doing with all the typecasts and shifts; uint8_t being guaranteed to be the smallest (unsigned) type, the upper bits are guaranteed to be "cut off" by the conversion.
Just compare this and your code and you should be able to see if your code will work.
Also note that there is not the single checksum algorithm. I did not invert the result in my code, nor did I fold upper and lower bytes as you did (the latter is pretty uncommon, as it does not add much more protection).
So, you have to verify which algorithm to use. If it really requires folding the two bytes of a 16-bit result, change sum to uint16_t and fold the bytes as follows:
uint16_t sum;
...
// replace the return with:
while (sum > 0xFFU)
    sum = (sum & 0xFFU) + ((sum >> 8) & 0xFFU);
return sum;
This cares about any overflow from adding the two bytes of sum (the loop could also be unrolled, as the overflow can only occur once).
Sometimes, CRC algorithms are called "checksums", but they are actually a very different beast (mathematically, they are the remainder of a binary polynomial division) and require much more processing (either at run-time, or to generate a lookup table). OTOH, CRCs provide much better detection of data corruption, though no protection against deliberate manipulation.
I'm working on a small project where I need float multiplication with 16-bit floats (half precision). Unfortunately, I'm facing some problems with the algorithm:
Example Output
1 * 5 = 5
2 * 5 = 10
3 * 5 = 14.5
4 * 5 = 20
5 * 5 = 24.5
100 * 4 = 100
100 * 5 = 482
The Source Code

const int bits = 16;
const int exponent_length = 5;
const int fraction_length = 10;
const int bias = pow(2, exponent_length - 1) - 1;
const int exponent_mask = ((1 << 5) - 1) << fraction_length;
const int fraction_mask = (1 << fraction_length) - 1;
const int hidden_bit = (1 << 10); // Was 1 << 11 before update 1

int float_mul(int f1, int f2) {
    int res_exp = 0;
    int res_frac = 0;
    int result = 0;

    int exp1 = (f1 & exponent_mask) >> fraction_length;
    int exp2 = (f2 & exponent_mask) >> fraction_length;
    int frac1 = (f1 & fraction_mask) | hidden_bit;
    int frac2 = (f2 & fraction_mask) | hidden_bit;

    // Add exponents
    res_exp = exp1 + exp2 - bias; // Remove double bias

    // Multiply significands
    res_frac = frac1 * frac2; // 11 bit * 11 bit → 22 bit!

    // Shift the 22-bit product right to fit into 10 bits
    // (highest_bit_pos() is a helper defined elsewhere)
    if (highest_bit_pos(res_frac) == 21) {
        res_frac >>= 11;
        res_exp += 1;
    } else {
        res_frac >>= 10;
    }
    res_frac &= ~hidden_bit; // Remove hidden bit

    // Construct float
    return (res_exp << (bits - exponent_length - 1)) | res_frac;
}
By the way: I'm storing the floats in ints, because I'll try to port this code to some kind of Assembler w/o float point operations later.
The Question
Why does the code work for some values only? Did I forget some normalization or similar? Or does it work only by accident?
Disclaimer: I'm not a CompSci student, it's a leisure project ;)
Update #1
Thanks to the comment by Eric Postpischil I noticed one problem with the code: the hidden_bit flag was off by one (it should be 1 << 10). With that change, I don't get decimal places any more, but some calculations are still off (e.g. 3•3=20). I assume it's the res_frac shift as described in the answers.
Update #2
The second problem with the code was indeed the res_frac shifting. After update #1 I got wrong results whenever frac1 * frac2 produced a 22-bit result. I've updated the code above with the corrected shift statement. Thanks to all for every comment and answer! :)
From a cursory look:
No attempt is made to determine the location of the high bit in the product. Two 11-bit numbers, each with its high bit set, may produce a 21- or 22-bit number. (Example with two-bit numbers: 10₂•10₂ is 100₂, three bits, but 11₂•11₂ is 1001₂, four bits.)
The result is truncated instead of rounded.
Signs are ignored.
Subnormal numbers are not handled, on input or output.
11 is hardcoded as a shift amount in one place. This is likely incorrect; the correct amount will depend on how the significand is handled for normalization and rounding.
In decoding, the exponent field is shifted right by fraction_length. In encoding, it is shifted left by bits - exponent_length - 1. To avoid bugs, the same expression should be used in both places.
From a more detailed look by chux:
res_frac = frac1 * frac2 fails if int is less than 23 bits (22 for the product and one for the sign).
This is more a suggestion for how to make it easier to get your code right, rather than analysis of what is wrong with the existing code.
There are a number of steps that are common to some or all of the floating point arithmetic operations. I suggest extracting each into a function that can be written with focus on one issue, and tested separately. Then when you come to write e.g. multiplication, you only have to deal with the specifics of that operation.
All the operations will be easier working with a structure that has the actual signed exponent, and the full significand in a wider unsigned integer field. If you were dealing with signed numbers, it would also have a boolean for the sign bit.
Here are some sample operations that could be separate functions, at least until you get it working:
unpack: Take a 16 bit float and extract the exponent and significand into a struct.
pack: Undo unpack - deal with dropping the hidden bit, applying the bias to the exponent, and combining the fields into a float.
normalize: Shift the significand and adjust the exponent to bring the most significant 1-bit to a specified bit position.
round: Apply your rounding rules to drop low significance bits. If you want to do IEEE 754 style round-to-nearest, you need a guard digit that is the most significant bit that will be dropped, and an additional bit indicating if there are any one bits of lower significance than the guard bit.
One problem is that you are truncating instead of rounding:
res_frac >>= 11; // Shift 22bit int right to fit into 10 bit
You should compute res_frac & 0x7ff first, the part of the 22-bit result that your algorithm is about to discard, and compare it to 0x400. If it is below, truncate. If it is above, round away from zero. If it is equal to 0x400, round to the even alternative.