Getting the distance b/w float and its neighbor - c

This is my function that takes in the bits of a float and returns the distance between the next float and the given float. In it, I have assumed the float is 32 bits. My process was to extract the mantissa and exponent, increment the mantissa, increment the exponent if the mantissa overflows, reconstruct the value, and subtract the original value from it.
I have a feeling I may be overcomplicating things, and the function doesn't seem to be using the bitwise operators correctly, even though I have used a similar algorithm before. What is going wrong here? It seems straightforward enough.
unsigned int get_distance(unsigned int bitnumber)
{
unsigned int mantissa = 0xff;
for (int i = 0; i < 24; i++) {
if (((1 << i) & bitnumber) != 0) mantissa = mantissa | 1 << i;
}
mantissa = mantissa++; // increment the mantissa
unsigned int exponent = 0xff;
for (int i = 24; i < 31; i++) {
if (((1 << i) & floatbits) != 0) exponent = exponent | 1 << i;
}
if (mantissa != mantissa) exponent++; // if it overflows, increment the exponent too.
// create complete bit pattern
unsigned int final = 0xff;
for (int i = 0; i < 32; i++) {
if (i < 24) {
if (((1 << i) & mantissa) != 0) final = final | 1 << i;
}
if (i >= 24 && i < 31) {
if (((1 << i) & exponent) != 0) final = final | 1 << i;
}
if (i == 32) {
if (((1 << i) & bitnumber) != 0) final = final | 1 << i;
}
}
// get difference b/w original float and new float
unsigned int result = final - bitnumber;
return result;
}

First off, you are unnecessarily complicating the extraction of the various fields (and, you're doing so incorrectly). A much easier way to do this is:
unsigned int exponent = (bitnumber >> 23) & ((1 << 8) - 1);
unsigned int mantissa = bitnumber & ((1 << 23) - 1);
Basically, you shift bitnumber right until the LSB of your desired field is bit 0, and then bitwise-and out just the bits you want. The expression ((1 << N) - 1) is a mask consisting of N 1s in the lowest bits.
Second, the line mantissa = mantissa++; is actually undefined behavior, since the post-increment and assignment will both try to assign a value to mantissa. Instead, you just want mantissa++.
Next, the check for overflow will never pass, as mantissa != mantissa will always be false. Instead, you want to check if bit 23 or higher is set, which can be done with mantissa >= (1 << 23). I'd also recommend grouping this statement with the aforementioned increment, so you have
mantissa++;
if (mantissa >= (1 << 23)) {
    exponent++;
    mantissa = 0; // the carry went into the exponent, so the mantissa field wraps to zero
}
Finally, reconstructing the new value is as simple as:
unsigned int final = (exponent << 23) | mantissa;
Note, however, that if exponent is too large, this will result in a different float than you expect (namely, a negative one). This reconstruction also drops the sign bit, so it only handles non-negative inputs. There are a few other boundary cases involving infinity/NaN and denormalized (subnormal) numbers, but more than likely you won't run into them.
To actually use this, you'll need to do some pointer conversion, like this:
float value = 1.0f; // for example
unsigned int *up = (unsigned int *)&value; // NOTE: this might trigger undefined behavior
unsigned int next = next_value(*up); // next_value is a better name for your function than get_distance
float *fp = (float *)&next; // again, this may be UB
float difference = *fp - value; // this is what you want
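A minimal sketch of an alternative that sidesteps the pointer casts, assuming a 32-bit IEEE 754 float and a positive, finite input below FLT_MAX. For such values the bit patterns are ordered like the floats themselves, so adding 1 to the whole pattern yields the next float, and a mantissa overflow carries into the exponent on its own. The name distance_to_next is made up for this example, and memcpy is swapped in for the casts to avoid the aliasing problem.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the question): gap between x and the next
   float above it. Assumes a 32-bit IEEE 754 float and a positive, finite x
   below FLT_MAX. */
static float distance_to_next(float x)
{
    uint32_t bits;
    float next;

    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&bits, &x, sizeof bits);     /* read the bit pattern without aliasing UB */
    bits += 1;                          /* a mantissa carry ripples into the exponent */
    memcpy(&next, &bits, sizeof next);  /* turn the new pattern back into a float */
    return next - x;
}
```

Under those assumptions, distance_to_next(1.0f) should equal FLT_EPSILON from <float.h>. Negative inputs, infinities and NaN would need separate handling.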

Related

IEEE 754 Bit manipulation Rounding Error

Without using casts or functionality of libraries, I must cast an integer to a float with bit manipulation. Below is the code I am currently working on. It is based off of code that I found in Cast Integer to Float using Bit Manipulation breaks on some integers in C. The problem that I have run into involves the rounding standards in IEEE 754. More specifically, my code rounds towards 0, but it should round towards even numbers. What changes do I need to make?
unsigned inttofloat(int x) {
int bias = 127;
int man;
int exp = bias + 31; //8-bit exp
int count = 0;
int tmin = 1 << 31;
int manpattern = 0x7FFFFF;
int sign = 0;
if (x == 0){
return 0;
}
else if (x == tmin){
return 0xcf << 24;
}
if (x < 0) {
sign = tmin;
x = ~x + 1; // negate x (two's complement) so that we work with its absolute value from here on.
}
while((x & tmin) == 0){
exp--;
x <<= 1;
count++;
}
exp <<= 23;
man = (x >> 8) & manpattern;
return (sign | exp | man);
}
To round to nearest, ties to even, replace (x >> 8) with:
unsigned u = x; // avoid any potential signed shifting issues
unsigned least_significant_bit = (u >> 8) & 1;
unsigned round_bit = (u >> 7) & 1; // Most significant bit shifted out
unsigned sticky_bit_flag = !!(u & 0x7F); // All other bits shifted out
// OP's shifted answer.
u = (u >> 8);
// round away if more than half-way or
// if at half-way and number is odd
u += (round_bit & sticky_bit_flag) | (round_bit & least_significant_bit);
Simplifying this is left to the OP.
Note that u += 1 may carry all the way through the mantissa and require an exponent increase.
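For reference, here is one way the rounding step and the exponent carry could fit together in a complete conversion. This is only a sketch under assumptions: the name int_to_float_bits and the fixed-width types are mine, not the OP's, and it assumes IEEE 754 single precision. The one trick worth pointing out is that the rounded significand is added to the shifted exponent rather than OR-ed in, so a carry out of the mantissa bumps the exponent by itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name. Assumes IEEE 754 single precision; returns the bits. */
static uint32_t int_to_float_bits(int32_t x)
{
    uint32_t sign = 0;
    uint32_t u = (uint32_t)x;
    uint32_t exp = 127 + 31;              /* exponent field if bit 31 is the leading 1 */
    uint32_t lsb, round_bit, sticky, man;

    if (x == 0)
        return 0;
    if (x < 0) {
        sign = 0x80000000u;
        u = 0u - u;                       /* unsigned negation, also fine for INT32_MIN */
    }
    while ((u & 0x80000000u) == 0) {      /* normalize: move the leading 1 into bit 31 */
        u <<= 1;
        exp--;
    }
    lsb = (u >> 8) & 1;                   /* lowest bit that is kept */
    round_bit = (u >> 7) & 1;             /* highest bit that is shifted out */
    sticky = (u & 0x7F) != 0;             /* any lower bit shifted out */
    man = (u >> 8) + (round_bit & (sticky | lsb));
    assert(man >= 0x800000u && man <= 0x1000000u);
    /* man still holds the leading 1 in bit 23. Subtracting it and ADDING
       (not OR-ing) lets a rounding carry increase the exponent. */
    return sign | ((exp << 23) + (man - 0x800000u));
}
```

With this sketch, 0x3f800021 maps to 0x4e7e0001 and INT32_MAX rounds up to 0x4f000000 (2^31).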

How to quadruple an unsigned number using bit-wise and logic operator in C

Goal:
4x ( 4.400000095 ) = 17.60000038
Legal ops: Any integer/unsigned operations incl. ||, &&. also if, while
Max ops: 30
Return bit-level equivalent of expression x + x + x + x for
floating point argument f.
My code:
unsigned 4x(unsigned uf) {
unsigned expn = (uf >> 23) & 0xFF;
unsigned sign = uf & 0x80000000;
unsigned frac = uf & 0x007FFFFF;
if (expn == 255 || (expn == 0 && frac == 0))
return uf;
if (expn) {
expn << 2;
} else if (frac == 0x7FFFFF) {
frac >> 2;
expn << 2;
} else {
frac <<= 2;
}
return (sign) | (expn << 23) | (frac);
}
As you can guess, my code does not work. Instead of quadrupling the input, the input is doubled. I don't know why, since the fraction and exponent are always being right/left shifted by 2 instead of 1. I'm working with single-precision floating point values on 32-bit machines.
Note that
expn << 2;
does not modify expn. You probably want
expn <<= 2;
Ditto for
frac >> 2;
expn << 2;
However, as @chux pointed out, you only need to add 2 to the exponent, not multiply the exponent by 4.
Some untested code - leave that for OP. (GTG)
The tricky bit is dealing with sub-normal numbers that when *4 become normal. Also watch for large values that overflow to infinity. If you want to ignore sub-normals, just expn += 2 and check for overflow.
Another approach would be expn += 2 for normal numbers. For sub-normals, shift with frac <<= 2 and handle the cases that become normal.
Code is about 30 ops.
#include <stdint.h>
float x4(float x) {
// Use union to access the bits. Leap-of-faith here (float is 32 bits, endian)
union {
float f;
uint32_t u32;
} u;
u.f = x;
uint32_t expn = (u.u32 >> 23) & 0xFF;
uint32_t sign = u.u32 & 0x80000000;
uint32_t frac = u.u32 & 0x007FFFFF;
// Nan Inf
if (expn == 255) return u.f;
if (expn == 0) {
expn++; // Bring sub-normal into normal expo range
} else {
frac += 0x800000; // restore implied bit
}
// *4
frac <<= 2;
// normalize - this usually iterates twice, less for sub-normals
while (frac > 0xFFFFFF) {
expn++;
frac >>= 1; // 1's will not be shifted out as 2 LSB are 0 so no later rounding
}
// overflow to inf
if (expn >= 255) {
expn = 255;
frac = 0;
} else if (frac & 0x800000) {
frac ^= 0x800000; // clear implied bit
} else {
// still sub-normal
expn--; // should now be 0
}
u.u32 = sign | (expn << 23) | frac;
return u.f;
}
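The x4 above takes and returns a float, while the puzzle wants unsigned in and unsigned out. A loop-free sketch of the same idea on raw bits might look like the following. The name times4_bits is hypothetical, IEEE 754 single precision is assumed, and I have not counted whether it fits the 30-op limit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name; operates on the raw bits of an IEEE 754 single. */
static uint32_t times4_bits(uint32_t uf)
{
    uint32_t sign = uf & 0x80000000u;
    uint32_t expn = (uf >> 23) & 0xFF;
    uint32_t frac = uf & 0x007FFFFFu;

    if (expn == 255)                   /* NaN or infinity: return unchanged */
        return uf;

    if (expn == 0) {                   /* zero or sub-normal */
        frac <<= 2;                    /* at most 0x1FFFFFC */
        if (frac & 0x1000000u) {       /* leading 1 reached bit 24: exponent field 2 */
            expn = 2;
            frac >>= 1;                /* the dropped bit is 0, so nothing is lost */
        } else if (frac & 0x800000u) { /* leading 1 reached bit 23: exponent field 1 */
            expn = 1;
        }
        frac &= 0x007FFFFFu;           /* drop the now-implied leading 1, if any */
    } else {
        expn += 2;                     /* normal number: *4 is exponent + 2 */
        if (expn >= 255) {             /* overflow to infinity */
            expn = 255;
            frac = 0;
        }
    }
    assert(frac <= 0x007FFFFFu);
    return sign | (expn << 23) | frac;
}
```

For example, 0x408CCCCD becomes 0x418CCCCD (exponent field 129 to 131, fraction untouched), and the sub-normal 0x00400000 becomes the normal 0x01000000.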

Calculate parity bit from a string in C

I am trying to calculate the parity bit in a string using the following code. I first calculate a parityByte for the string and then calculate
a parityBit for that byte.
From what I have gathered, these functions should do the trick, but right now I'm not so sure. The program in which I use them fails, and I would like to know if it's because of these or if I should look some other place.
char calculateParity(char *payload, int size){
char r = 0;
int i;
for(i = 0; i < size; i++){
r ^= payload[i];
}
return calcParityBit(r);
}
char calcParityBit(char x){
x ^= x >> 8;
x ^= x >> 4;
x ^= x >> 2;
x ^= x >> 1;
return x & 1;
}
With help from Bit Twiddling Hacks
char calcParityBit (unsigned char v)
{
return (0x6996u >> ((v ^ (v >> 4)) & 0xf)) & 1;
}
This is 5 operations versus 7 (after taking @squeamish ossifrage's good advice).
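In case the constant looks like magic: 0x6996 works as a 16-entry lookup table in which bit n is the parity of the 4-bit value n. The byte is first folded to 4 bits with v ^ (v >> 4), and that nibble then indexes the table. A small sketch that rebuilds the table to confirm this (all names here are made up for the example):

```c
#include <assert.h>

/* Hypothetical helper: rebuilds the 16-entry table one bit at a time.
   Bit n of the result is the parity of the 4-bit value n. */
static unsigned build_nibble_parity_table(void)
{
    unsigned table = 0;
    for (unsigned n = 0; n < 16; n++) {
        unsigned parity = (n ^ (n >> 1) ^ (n >> 2) ^ (n >> 3)) & 1u;
        table |= parity << n;
    }
    return table;
}

/* Same expression as calcParityBit above, under a hypothetical name. */
static int parity8(unsigned char v)
{
    return (0x6996u >> ((v ^ (v >> 4)) & 0xf)) & 1;
}

/* Sanity check: the constant is exactly that table. */
static void check_parity_table(void)
{
    assert(build_nibble_parity_table() == 0x6996u);
    assert(parity8(0x07) == 1);   /* three bits set: odd */
    assert(parity8(0xFF) == 0);   /* eight bits set: even */
}
```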
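You must remember:
1) 'x >> a' roughly corresponds to for(int i = 0; i < a; i++) x/=2; but if you use the operator '>>' on a SIGNED type, the sign bit is usually copied into the vacated high bits, so a value whose first bit is 1 (a negative value) keeps 1s at the top. The C standard leaves this case implementation-defined;
2) the operands of '>>' and '<<' undergo integer promotion, so for a char operand the result is an int, not a char;
(Error example: unsigned char y = (x << 2) >> 2; does not reset (to 0) the two highest bits, because the intermediate result is an int and still holds them)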
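To make the promotion point in that error example concrete, here is a small sketch. The helper names are hypothetical and an 8-bit char is assumed. The only difference between the two functions is a cast that truncates the intermediate result back to 8 bits before shifting right.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helpers; both assume an 8-bit char. */
static unsigned char clear_top2_wrong(unsigned char x)
{
    assert(CHAR_BIT == 8);
    return (x << 2) >> 2;                  /* shifts happen in int: x comes back unchanged */
}

static unsigned char clear_top2_right(unsigned char x)
{
    assert(CHAR_BIT == 8);
    return (unsigned char)(x << 2) >> 2;   /* truncate to 8 bits first, then shift back */
}
```

With x = 0xFF the first version returns 0xFF again, while the second returns 0x3F.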
As r3mainer comments: use unsigned char for the calculation. As char may be signed, the right shifting may replicate the sign bit.
Further, code typically runs best with a return value of int versus char. I recommend using a return value of int or even simply bool.
#include <limits.h> // for CHAR_BIT
// Find parity of the low `width` bits (width must be a power of two, at most the width of an unsigned)
int calcEvenParityBit(unsigned par, unsigned width) {
while (width > 1) {
par ^= par >> (width/2);
width -= width/2;
}
// Only return Least Significant Bit
return par % 2;
}
int calculateEvenParity(char *payload, int size) {
unsigned char r = 0;
int i;
for(i = 0; i < size; i++) {
r ^= payload[i];
}
return calcEvenParityBit(r, CHAR_BIT);
}
Invert the result for odd parity.
Your function:
char calcParityBit(char x){
x ^= x >> 8;
x ^= x >> 4;
x ^= x >> 2;
x ^= x >> 1;
return x & 1;
}
calculates parity for only three bits of your byte. To calculate parity of the entire 8 bits number, you can do something like this:
char calcParityBit(char x){
return ( (x>>7) ^
(x>>6) ^
(x>>5) ^
(x>>4) ^
(x>>3) ^
(x>>2) ^
(x>>1) ^
(x) ) & 1;
}
Because you keep only the least significant bit, it is irrelevant for this solution (which is derived from yours) that your argument is signed and that the right shift may fill the vacated bits with '1' when the most significant bit was '1'.
That said, it is good practice not to use a signed type when the sign serves no purpose and you treat the number as unsigned.

C Bitwise Logic Operations Puzzle

So I need to write a method to fulfill this puzzle:
float_f2i
Return bit-level equivalent of expression (int) f for floating point argument f.
Argument is passed as unsigned int, but it is to be interpreted as the bit-level
representation of a single-precision floating point value.
Anything out of range (including NaN and infinity) should return 0x80000000u.
So what I took out of this is that I'm given a number in hex and I have to write code to put it in integer format. A test case given to us was;
Argument [0x00800000], returns [0x0] because 0x00800000 is 1.1754....E-38, small enough to be returned as zero (So I am assuming)
What I have so far is:
int float_f2i(unsigned uf) {
unsigned sign = uf & (0x80000000);
unsigned exp = uf >> 23 & 0xff;
unsigned frac = uf & 0x7fffff;
//takes care of NaN and inifinity
if (exp == 255) {return 0x80000000u;}
if ((exp > 0) && (exp < 255)) //normalized
{
if ((sign >> 28) == 0x0) //if sign bit is 0
{
return (0x1); //NEEDS MORE HERE
}
else if ((sign >> 28) == 0x8) //if sign bit is 1
{
return (0x8); //NEEDS MORE HERE
}
}
else if (exp == 0)//denormalized
{
return 0; // rounds to zero anyway
}
}
I know that for this to work I have to add on the exponential part to the return statements (1.frac^(exp-127)) but I have no idea how to code that in. Shifting to the left multiplies by two but for negative exponents of 2, I would need to shift right but the >> operator does that arithmetically. Do I need to create a dynamic mask to kill off the 1 bits created by arithmetic shifting?
EDIT: Got an answer and I was going the whole wrong direction, future reference if anyone has to do this:
int float_f2i(unsigned uf) {
int exponent = (uf >> 23) & 0xff;
int exp = exponent - 127;
int frac = uf & 0x7fffff;
if(exponent == 0xff) // NaN or infinity
return 0x80000000u;
if(!exponent)
return 0;
if(exp < 0)
return 0;
if(exp > 30)
return 0x80000000u;
frac = frac | 0x800000;
if (exp >= 23)
frac = frac << (exp - 23);
else
frac = frac >> (23 - exp);
if((uf >> 31) & 1)
return ~frac + 1;
return frac;
}
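C has only one right-shift operator, >>, and it performs an arithmetic shift only when the left operand has a signed type (strictly speaking, the result of right-shifting a negative value is implementation-defined).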
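#include <stdio.h>
int main(void)
{
    int x = -2;
    printf("%d\n", x >> 1); // signed operand: arithmetic shift on most compilers
    printf("%u\n", ((unsigned int)x) >> 1); // unsigned operand: logical shift, top bit becomes 0
    return 0;
}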
So you could cast to unsigned if you need to ensure non-arithmetic shift.

How to manually (bitwise) perform (float)x?

Now, here is the function header of the function I'm supposed to implement:
/*
* float_from_int - Return bit-level equivalent of expression (float) x
* Result is returned as unsigned int, but
* it is to be interpreted as the bit-level representation of a
* single-precision floating point values.
* Legal ops: Any integer/unsigned operations incl. ||, &&. also if, while
* Max ops: 30
* Rating: 4
*/
unsigned float_from_int(int x) {
...
}
We aren't allowed to do float operations, or any kind of casting.
Now I tried to implement the first algorithm given at this site: http://locklessinc.com/articles/i2f/
Here's my code:
unsigned float_from_int(int x) {
// grab sign bit
int xIsNegative = 0;
int absValOfX = x;
if(x < 0){
xIsNegative = 1;
absValOfX = -x;
}
// zero case
if(x == 0){
return 0;
}
if(x == 0x80000000){ //Updated to add this
return 0xcf000000;
}
//int shiftsNeeded = 0;
/*while(){
shiftsNeeded++;
}*/
unsigned I2F_MAX_BITS = 15;
unsigned I2F_MAX_INPUT = ((1 << I2F_MAX_BITS) - 1);
unsigned I2F_SHIFT = (24 - I2F_MAX_BITS);
unsigned result, i, exponent, fraction;
if ((absValOfX & I2F_MAX_INPUT) == 0)
result = 0;
else {
exponent = 126 + I2F_MAX_BITS;
fraction = (absValOfX & I2F_MAX_INPUT) << I2F_SHIFT;
i = 0;
while(i < I2F_MAX_BITS) {
if (fraction & 0x800000)
break;
else {
fraction = fraction << 1;
exponent = exponent - 1;
}
i++;
}
result = (xIsNegative << 31) | exponent << 23 | (fraction & 0x7fffff);
}
return result;
}
But it didn't work (see test error below):
ERROR: Test float_from_int(8388608[0x800000]) failed...
...Gives 0[0x0]. Should be 1258291200[0x4b000000]
I don't know where to go from here. How should I go about parsing the float from this int?
EDIT #1:
You might be able to see from my code that I also started working on this algorithm (see this site):
I assumed 10-bit, 2’s complement, integers since the mantissa is only
9 bits, but the process generalizes to more bits.
Save the sign bit of the input and take the absolute value of the input.
Shift the input left until the high order bit is set and count the number of shifts required. This forms the floating mantissa.
Form the floating exponent by subtracting the number of shifts from step 2 from the constant 137 or (0h89-(#of shifts)).
Assemble the float from the sign, mantissa, and exponent.
But, that doesn't seem right. How could I convert 0x80000000? Doesn't make sense.
EDIT #2:
I think it's because I say max bits is 15... hmmm...
EDIT #3: Screw that old algorithm, I'm starting over:
unsigned float_from_int(int x) {
// grab sign bit
int xIsNegative = 0;
int absValOfX = x;
if(x < 0){
xIsNegative = 1;
absValOfX = -x;
}
// zero case
if(x == 0){
return 0;
}
if (x == 0x80000000){
return 0xcf000000;
}
int shiftsNeeded = 0;
int counter = 0;
while(((absValOfX >> counter) & 1) != 1 && shiftsNeeded < 32){
counter++;
shiftsNeeded++;
}
unsigned exponent = shiftsNeeded + 127;
unsigned result = (xIsNegative << 31) | (exponent << 23);
return result;
Here's the error I get on this one (I think I got past the last error):
ERROR: Test float_from_int(-2139095040[0x80800000]) failed...
...Gives -889192448[0xcb000000]. Should be -822149120[0xceff0000]
May be helpful to know that:
absValOfX = 7f800000
(using printf)
EDIT #4: Ah, I'm finding the exponent wrong, need to count from the left, then subtract from 32 I believe.
EDIT #5: I started over, now trying to deal with weird rounding problems...
if (x == 0){
return 0; // 0 is a special case because it has no 1 bits
}
if (x >= 0x80000000 && x <= 0x80000040){
return 0xcf000000;
}
// Save the sign bit of the input and take the absolute value of the input.
unsigned signBit = 0;
unsigned absX = (unsigned)x;
if (x < 0)
{
signBit = 0x80000000u;
absX = (unsigned)-x;
}
// Shift the input left until the high order bit is set to form the mantissa.
// Form the floating exponent by subtracting the number of shifts from 158.
unsigned exponent = 158;
while ((absX & 0x80000000) == 0)
{
exponent--;
absX <<= 1;
}
unsigned negativeRoundUp = (absX >> 7) & 1 & (absX >> 8);
// compute mantissa
unsigned mantissa = (absX >> 8) + ((negativeRoundUp) || (!signBit & (absX >> 7) & (exponent < 156)));
printf("absX = %x, absX >> 8 = %x, exponent = %i, mantissa = %x\n", absX, (absX >> 8), exponent, mantissa);
// Assemble the float from the sign, mantissa, and exponent.
return signBit | ((exponent << 23) + (signBit & negativeRoundUp)) | ( (mantissa) & 0x7fffff);
-
absX = fe000084, absX >> 8 = fe0000, exponent = 156, mantissa = fe0000
ERROR: Test float_from_int(1065353249[0x3f800021]) failed...
...Gives 1316880384[0x4e7e0000]. Should be 1316880385[0x4e7e0001]
EDIT #6
Did it again, still, the rounding doesn't work properly. I've tried to hack together some rounding, but it just won't work...
unsigned float_from_int(int x) {
/*
If N is negative, negate it in two's complement. Set the high bit (2^31) of the result.
If N < 2^23, left shift it (multiply by 2) until it is greater or equal to.
If N ≥ 2^24, right shift it (unsigned divide by 2) until it is less.
Bitwise AND with ~2^23 (one's complement).
If it was less, subtract the number of left shifts from 150 (127+23).
If it was more, add the number of right shifts to 150.
This new number is the exponent. Left shift it by 23 and add it to the number from step 3.
*/
printf("---------------\n");
//printf("x = %i (%x), -x = %i, (%x)\n", x, x, -x, -x);
if(x == 0){
return 0;
}
if(x == 0x80000000){
return 0xcf000000;
}
// If N is negative, negate it in two's complement. Set the high bit of the result
unsigned signBit = 0;
if (x < 0){
signBit = 0x80000000;
x = -x;
}
printf("abs val of x = %i (%x)\n", x, x);
int roundTowardsZero = 0;
int lastDigitLeaving = 0;
int shiftAmount = 0;
int originalAbsX = x;
// If N < 2^23, left shift it (multiply it by 2) until it is great or equal to.
if(x < (8388608)){
while(x < (8388608)){
//printf(" minus shift and x = %i", x );
x = x << 1;
shiftAmount--;
}
} // If N >= 2^24, right shfit it (unsigned divide by 2) until it is less.
else if(x >= (16777215)){
while(x >= (16777215)){
/*if(x & 1){
roundTowardsZero = 1;
printf("zzz Got here ---");
}*/
lastDigitLeaving = (x >> 1) & 1;
//printf(" plus shift and x = %i", x);
x = x >> 1;
shiftAmount++;
}
//Round towards zero
x = (x + (lastDigitLeaving && (!(originalAbsX > 16777216) || signBit)));
printf("x = %i\n", x);
//shiftAmount = shiftAmount + roundTowardsZero;
}
printf("roundTowardsZero = %i, shiftAmount = %i (%x)\n", roundTowardsZero, shiftAmount, shiftAmount);
// Bitwise AND with 0x7fffff
x = x & 0x7fffff;
unsigned exponent = 150 + shiftAmount;
unsigned rightPlaceExponent = exponent << 23;
printf("exponent = %i, rightPlaceExponent = %x\n", exponent, rightPlaceExponent);
unsigned result = signBit | rightPlaceExponent | x;
return result;
The problem is that the lowest int is -2147483648, but the highest is 2147483647, so there is no absolute value of -2147483648. While you could work around it, I would just make a special case for that one bit pattern (like you do for 0):
if (x == 0)
return 0;
if (x == -2147483648)
return 0xcf000000;
The other problem is that you copied an algorithm that only works for numbers from 0 to 32767. Further down in the article they explain how to expand it to all ints, but it uses operations that you're likely not allowed to use.
I would recommend writing it from scratch based on the algorithm mentioned in your edit. Here's a version in C# that rounds towards 0:
uint float_from_int(int x)
{
if (x == 0)
return 0; // 0 is a special case because it has no 1 bits
// Save the sign bit of the input and take the absolute value of the input.
uint signBit = 0;
uint absX = (uint)x;
if (x < 0)
{
signBit = 0x80000000u;
absX = (uint)-x;
}
// Shift the input left until the high order bit is set to form the mantissa.
// Form the floating exponent by subtracting the number of shifts from 158.
uint exponent = 158;
while ((absX & 0x80000000) == 0)
{
exponent--;
absX <<= 1;
}
// compute mantissa
uint mantissa = absX >> 8;
// Assemble the float from the sign, mantissa, and exponent.
return signBit | (exponent << 23) | (mantissa & 0x7fffff);
}
The basic formulation of the algorithm is to determine the sign, exponent and mantissa bits, then pack the result into an integer. Breaking it down this way makes it easy to clearly separate the tasks in code and makes solving the problem (and testing your algorithm) much easier.
The sign bit is the easiest, and getting rid of it makes finding the exponent easier. You can distinguish four cases: 0, 0x80000000, [-0x7fffffff, -1], and [1, 0x7fffffff]. The first two are special cases, and you can trivially get the sign bit in the last two cases (and the absolute value of the input). If you're going to cast to unsigned, you can get away with not special-casing 0x80000000 as I mentioned in a comment.
Next up, find the exponent -- there's an easy (and costly) looping way, and a trickier but faster way to do this. My absolute favourite page for this is Sean Anderson's bit hacks page. One of the algorithms shows a very quick loop-less way to find the log2 of an integer in only seven operations.
Once you know the exponent, then finding the mantissa is easy. You just drop the leading one bit, then shift the result either left or right depending on the exponent's value.
If you use the fast log2 algorithm, you can probably end up with an algorithm which uses no more than 20 operations.
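As a rough illustration of the "find the exponent, then the mantissa" steps without a shift loop, here is a sketch that locates the highest set bit with a fixed five-step binary search. It is not necessarily the exact routine from the bit hacks page, it truncates (rounds toward zero) rather than rounding to nearest, and the function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: index of the highest set bit of a nonzero value,
   found by halving the search range five times. */
static unsigned floor_log2_u32(uint32_t v)
{
    unsigned r = 0;
    assert(v != 0);
    if (v & 0xFFFF0000u) { v >>= 16; r |= 16; }
    if (v & 0xFF00u)     { v >>= 8;  r |= 8;  }
    if (v & 0xF0u)       { v >>= 4;  r |= 4;  }
    if (v & 0xCu)        { v >>= 2;  r |= 2;  }
    if (v & 0x2u)        {           r |= 1;  }
    return r;
}

/* Hypothetical helper: packs sign, exponent and truncated mantissa.
   absX must be nonzero; signBit is 0 or 0x80000000. */
static uint32_t float_bits_truncating(uint32_t absX, uint32_t signBit)
{
    unsigned e = floor_log2_u32(absX);
    /* Line the leading 1 up with bit 23, shifting right or left as needed. */
    uint32_t mantissa = (e > 23) ? absX >> (e - 23) : absX << (23 - e);
    /* Masking drops the leading 1, which the format stores implicitly. */
    return signBit | ((uint32_t)(e + 127) << 23) | (mantissa & 0x7FFFFFu);
}
```

For instance, float_bits_truncating(0x800000, 0) gives 0x4b000000, the value the failing test in the question expects.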
Dealing with 0x80000000 is pretty easy:
int xIsNegative = 0;
unsigned int absValOfX = x;
if (x < 0)
{
xIsNegative = 1;
absValOfX = -(unsigned int)x;
}
It gets rid of special casing -2147483648 since that value is representable as an unsigned value, and absValOfX should always be positive.
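A tiny sketch to show why this works: the conversion to unsigned is well-defined for every int, and negating an unsigned value wraps around instead of overflowing. The helper name is hypothetical, and a two's complement int is assumed.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: absolute value of x as an unsigned int. */
static unsigned int abs_as_unsigned(int x)
{
    unsigned int u = (unsigned int)x;   /* well-defined: conversion is modulo 2^N */
    assert(INT_MIN + INT_MAX == -1);    /* two's complement assumption */
    return (x < 0) ? 0u - u : u;        /* unsigned negation cannot overflow */
}
```

For INT_MIN this returns (unsigned)INT_MAX + 1, which is 0x80000000 with a 32-bit int, so the normalization loop can treat it like any other magnitude.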

Resources