How can binary multiplication of floats be represented in C?


The binary multiplication algorithm with ints can be represented as follows:
unsigned int multiply(unsigned int multiplier, unsigned int multiplicand) {
unsigned int product = 0;
while (multiplier != 0) {
if ((multiplier & 1) != 0) {
product = product + multiplicand;
}
multiplier = multiplier >> 1;
multiplicand = multiplicand << 1;
}
return product;
}
This function multiplies two unsigned ints without using the "*" operator.
However, it does not work with floats, because a float in the IEEE 754 single-precision binary floating-point format is composed of three parts: a sign bit, an exponent, and a mantissa.
These parts can be isolated as follows:
#include <stdio.h>
typedef union {
    float f;
    struct {
        unsigned int mantissa : 23;
        unsigned int exponent : 8;
        unsigned int sign : 1;
    } parts;
} float_cast;
int main(void) {
    float_cast d1;
    d1.f = 0.15625;
    printf("sign = %x\n",d1.parts.sign);
    printf("exponent = %x\n",d1.parts.exponent);
    printf("mantissa = %x\n",d1.parts.mantissa);
    return 0;
}
With the parts separated as ints, I can manipulate the bits. But how do I write a function that multiplies the parts of two floats?
Thanks in advance

Multiply the mantissas as integers.
Add the exponents
Xor the signs.
There are some details.
The mantissas should both be normalized, meaning that either the high-order bit (the implicit leading 1) is 1 or the mantissa is 0. For full compliance you need to deal with denormals and the other special cases (infinities, NaNs, zeros), and you may need to normalize, denormalize, or overflow (set to infinity) the result.
The product of the mantissas is in the range [1, 4), assuming both values were in [1, 2). If the product is 2 or greater, a fixup is necessary: increment the exponent of the result by 1 and shift the product mantissa right by one.
Exponents are normally stored with an offset. Suppose the stored value of the exponent is e + m, where e is the real exponent and m is the constant offset (127 for single precision). The sum of the two stored exponents then contains m twice, so m needs to be subtracted once to get the stored exponent of the product.
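The three steps and the fixups above can be sketched as follows. This is only a minimal illustration, not a full IEEE 754 implementation: the name mul_bits is made up for this example, it assumes a 32-bit binary32 float, it handles only normal, finite, nonzero inputs whose product stays in the normal range, and it truncates the result instead of rounding it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: multiplies two normal, finite floats using only
 * integer operations on their binary32 fields. Zeros, subnormals,
 * infinities, NaNs and exponent overflow/underflow are NOT handled,
 * and the result is truncated rather than rounded. */
float mul_bits(float a, float b) {
    uint32_t ia, ib, ir;
    float r;
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&ia, &a, sizeof ia);
    memcpy(&ib, &b, sizeof ib);

    /* Xor the signs. */
    uint32_t sign = (ia ^ ib) & 0x80000000u;

    /* Add the stored exponents and subtract the bias once. */
    int32_t exp = (int32_t)((ia >> 23) & 0xFF)
                + (int32_t)((ib >> 23) & 0xFF) - 127;

    /* Restore the implicit leading 1: 24-bit significands in [1, 2). */
    uint64_t ma = (ia & 0x7FFFFFu) | 0x800000u;
    uint64_t mb = (ib & 0x7FFFFFu) | 0x800000u;

    /* Shift-and-add multiplication, as in the integer version. */
    uint64_t prod = 0;
    while (ma != 0) {
        if (ma & 1)
            prod += mb;
        ma >>= 1;
        mb <<= 1;
    }

    /* prod is in [2^46, 2^48). Bit 47 set means the product is >= 2. */
    if (prod & (UINT64_C(1) << 47)) {
        prod >>= 1;
        exp += 1;
    }

    /* Keep the 23 bits below the leading 1 (truncation). */
    uint32_t mant = (uint32_t)(prod >> 23) & 0x7FFFFFu;

    ir = sign | ((uint32_t)exp << 23) | mant;
    memcpy(&r, &ir, sizeof r);
    return r;
}
```

For inputs whose product is exactly representable, such as mul_bits(1.5f, 1.5f), the result matches the built-in multiplication; in other cases the last bit may differ because of the truncation.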

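Here is my solution. It assumes both inputs are normal, finite, nonzero floats, and it truncates instead of rounding, so the last bit can differ from a * b:
#include <stdio.h>
float multiplyfloat(float multiplier, float multiplicand) {
    typedef union {
        float f;
        unsigned int i;
        struct {
            unsigned int mantissa : 23;
            unsigned int exponent : 8;
            unsigned int sign : 1;
        } parts;
        struct {
            unsigned int mantissa : 23;
            unsigned int b23 : 1;
            unsigned int b31_24 : 8;
        } parts2;
    } float_cast;
    float_cast product, f1, f2, m1, m2;
    unsigned int carry;
    product.f = 0.f;
    f1.f = multiplier;
    f2.f = multiplicand;
    m1 = f1;
    m2 = f2;
    /* Build 24-bit significands: set the implicit leading 1 (bit 23)
       and clear the sign and exponent bits. */
    m1.parts2.b23 = m2.parts2.b23 = 1;
    m1.parts2.b31_24 = m2.parts2.b31_24 = 0;
    /* Shift-and-add, consuming the multiplier from its most significant
       bit down. Bits 0..23 must all be tested, so loop on i & 0xFFFFFF
       rather than on the 23-bit mantissa field alone. */
    while (m1.i & 0xFFFFFF) {
        if (m1.parts2.b23) {
            product.i += m2.i;
        }
        m2.i >>= 1;
        m1.i <<= 1;
    }
    /* The sum lies in [1, 4) with the ones place at bit 23. If bit 24 is
       set the value is >= 2: shift right once and bump the exponent. */
    carry = product.parts2.b31_24 != 0;
    if (carry) {
        product.i >>= 1;
    }
    /* Bit 23 is now 1, so the exponent field already holds 1; subtracting
       128 instead of the bias 127 compensates for that. */
    product.parts.exponent += f1.parts.exponent + f2.parts.exponent - 128 + carry;
    product.parts.sign = f1.parts.sign != f2.parts.sign;
    return product.f;
}
int main(void) {
    float a = 134.337368f;
    float b = 151.23f;
    float res = multiplyfloat(a, b);
    printf("result = %f\n", res);
    printf("compare = %f\n", a * b);
    return 0;
}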
Any questions, just comment below. Thanks

Related

Converting floating point to binary

#include <stdio.h>
void printBinary(int n, int i) {
int k;
for(k = i-1;k>=0;k--){
if((n>>k)&1)
printf("1");
else
printf("0");
}
}
typedef union {
float f;
struct {
unsigned int mantissa : 23; //4
unsigned int exponent : 8; //3
unsigned int sign : 1;
}raw;
}myfloat;
void printIEEE(myfloat var){
printf("%d | ", var.raw.sign);
printBinary(var.raw.exponent,8); //3
printf(" | ");
printBinary(var.raw.mantissa, 23); //4
printf("\n");
}
int main(){
myfloat var;
var.f = -4.25;
printf("IEEE 754 representation of %f is : \n",var.f);
printIEEE(var);
return 0;
}
I found this code on the Internet. I get wrong output when I make changes to it. For example, I want to use a 3-bit exponent and a 4-bit mantissa (the lines marked //3 and //4), but when I change those numbers the output becomes 0 000 0000.

You could try a shortcut that shares the bit pattern through a union:
#include <stdio.h>
#include <stdint.h>
union ieee754 {
uint32_t i;
float f;
};
void printBinary(uint32_t n, int i) {
/* use an unsigned 1: shifting a signed int into bit 31 is undefined */
uint32_t mask = (uint32_t)1 << (i - 1);
do putchar(n & mask ? '1' : '0');
while (mask >>= 1);
}
int main()
{
union ieee754 var;
var.f = -4.25;
printf("IEEE 754 representation of %f is:\n", var.f);
printBinary(var.i, 32);
printf("\n");
return 0;
}
Output:
IEEE 754 representation of -4.250000 is:
11000000100010000000000000000000
Interpretation (or verification) of the bit pattern:
11000000100010000000000000000000
sign bit ... 1
exponent ... 10000001 (= 129)
fraction ... 00010000000000000000000 (= 1/16)
decimal value = (-1)^1 * (1 + 1/16) * 2^(129 - 127)
= -4.250000
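The hand calculation above can be mirrored in code. The following is a rough sketch under the assumption of a normal (not subnormal, infinite, or NaN) binary32 pattern; decode_binary32 is a name invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: rebuilds the value of a normal binary32 number
 * from its three fields, following
 * (-1)^sign * (1 + fraction / 2^23) * 2^(exponent - 127). */
float decode_binary32(uint32_t bits) {
    uint32_t sign     = bits >> 31;
    int      exponent = (int)((bits >> 23) & 0xFF);
    uint32_t fraction = bits & 0x7FFFFFu;

    /* Only normal numbers are supported in this sketch. */
    assert(exponent != 0 && exponent != 255);

    /* 1 + fraction / 2^23; every step here is exact in float. */
    float magnitude = 1.0f + (float)fraction / 8388608.0f;

    /* Scale by 2^(exponent - 127) with repeated doubling or halving. */
    int e = exponent - 127;
    while (e > 0) { magnitude *= 2.0f; e--; }
    while (e < 0) { magnitude *= 0.5f; e++; }

    return sign ? -magnitude : magnitude;
}
```

Calling decode_binary32(0xC0880000u), which is the bit pattern printed above written in hexadecimal, gives back -4.25f.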

Bit representation for floats?

I know how to convert a float into its binary representation using % 2 and / 2, but is there a shortcut or cleaner way of doing this? Is what I am doing even considered representing a float bitwise? I am supposed to be using bitwise comparison between two float numbers, but I'm not sure if that means using bitwise operations.
For example to obtain the binary representation for a number I'd store the resultant of a number like 10 % 2 into an array until the number reached 0 within a while loop and if the array were to be printed backwards it would represent the number in binary.
array[] = num % 2;
num = num / 2;
What I did was use the method above for two float numbers, loaded them up with their own individual arrays, and compared them both through their arrays.
I have them set up in IEEE floating point format within their arrays as well.
EDIT: I have to compare two numbers of type float by using bitwise comparison and operations to see if one number is greater, less than, or if they are equal with the floats represented in biased exponent notation. The specifics are that it tests whether a floating point number number1 is less than, equal to or greater than another floating point number number2, by simply comparing their floating point representations bitwise by using bitwise comparisons from left to right, stopping as soon as the first differing bit is encountered.

No, it won't. Dividing a float by 2 will result in half of the number like this:
#include <stdio.h>
int main(void)
{
float x = 5.0f;
float y = x / 2;
printf("%f\n", y);
}
Result:
2.500000
See? It has nothing to do with bits.
Binary representation of floating numbers consists of mantissa, exponent and a sign bit, which means that unlike for normal integers, the tricks you've mentioned won't apply here. You can learn more about this by reading an article on Wikipedia on IEEE floating points.
To make sure two floats have exactly the same bit configurations, you could compare their content using memcmp which compares things byte-by-byte, with no additional casts/arithmetic/whatever:
#include <stdio.h>
#include <string.h>
int main(void)
{
float x = 5.0f;
float y = 4.99999999999999f; //gets rounded up to 5.0f
float z = 4.9f;
printf("%d\n", memcmp(&x, &y, sizeof(float)) == 0);
printf("%d\n", memcmp(&x, &z, sizeof(float)) == 0);
}
...will print 1 and 0 respectively. You can also inspect the individual bits this way, e.g. by operating on ((unsigned char *)&x)[i].

This compares two IEEE 32-bit floats bit by bit, returning -1, 0, or 1, and also indicating the bit at which they differ. They can be compared as sign-and-magnitude numbers. The function float_comp below first compares them bit by bit as uint32_t, then negates the result if they differ in the sign bit (bit 31) or if both are negative, since a larger magnitude then means a smaller value. NaNs are not handled, and -0 compares as less than +0.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
static int float_comp(float f1, float f2, int *bit)
{
    uint32_t a, b;
    int comp = 0;
    /* memcpy avoids the strict-aliasing problem of casting float * to uint32_t * */
    memcpy(&a, &f1, sizeof a);
    memcpy(&b, &f2, sizeof b);
    for (*bit = 31; *bit >= 0; (*bit)--) {
        if ((a & (UINT32_C(1) << *bit))
            && !(b & (UINT32_C(1) << *bit))) {
            comp = 1;
            break;
        }
        if (!(a & (UINT32_C(1) << *bit))
            && (b & (UINT32_C(1) << *bit))) {
            comp = -1;
            break;
        }
    }
    if (*bit == 31 || (a >> 31))
        comp = -comp; /* sign and magnitude conversion */
    return comp;
}
int main(int argc, char **argv)
{
float f1, f2;
int comp, bit;
if (argc != 3) {
fprintf(stderr, "usage: %s: float1 float2\n", argv[0]);
return 2;
}
f1 = strtof(argv[1], NULL);
f2 = strtof(argv[2], NULL);
comp = float_comp(f1, f2, &bit);
if (comp == 0)
printf("%.8g = %.8g\n", f1, f2);
else if (comp < 0)
printf("%.8g < %.8g (differ at bit %d)\n", f1, f2, bit);
else
printf("%.8g > %.8g (differ at bit %d)\n", f1, f2, bit);
return 0;
}
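As an alternative to walking the bits one at a time, the sign-and-magnitude pattern can be remapped so that a plain unsigned comparison gives the numeric order. This is a sketch under the same 32-bit IEEE assumption; order_key and float_comp_key are names made up for this example, NaNs are not handled, and -0 again orders just below +0.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: maps binary32 bits to an unsigned key whose
 * ordinary integer order matches the numeric order of the floats. */
static uint32_t order_key(float f) {
    uint32_t u;
    assert(sizeof f == sizeof u);
    memcpy(&u, &f, sizeof u);
    /* Negative: flip every bit so larger magnitudes sort lower.
     * Non-negative: set the top bit so they sort above all negatives. */
    return (u & 0x80000000u) ? ~u : (u | 0x80000000u);
}

/* Returns -1, 0, or 1 by comparing the remapped keys. */
int float_comp_key(float f1, float f2) {
    uint32_t a = order_key(f1);
    uint32_t b = order_key(f2);
    return (a > b) - (a < b);
}
```

This version does not report the first differing bit, so it only fits when the ordering itself is all that is needed.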

Doing what you said will not give you the bits of floating point representation. Instead use union to convert between float and integer representations and print bits as usual:
#include <stdio.h>
#include <stdint.h>
typedef union {
uint32_t i;
float f;
} float_conv_t;
void
int_to_bin_print(uint32_t number)
{
char binaryNumber[33];
int i;
for (i = 31; i >= 0; --i)
{
binaryNumber[i] = (number & 1) ? '1' : '0';
number >>= 1;
}
binaryNumber[32] = '\0';
fprintf(stdout, "Number %s\n", binaryNumber);
}
int main(void) {
float_conv_t f;
f.f = 10.34;
int_to_bin_print(f.i);
f.f = -10.34;
int_to_bin_print(f.i);
f.f = 0.1;
int_to_bin_print(f.i);
f.f = 0.2;
int_to_bin_print(f.i);
return 0;
}
Output:
Number 01000001001001010111000010100100
Number 11000001001001010111000010100100
Number 00111101110011001100110011001101
Number 00111110010011001100110011001101
The question says: "My goal is to compare two floating point numbers by comparing their floating point representations bitwise."
Then you can compare raw memory using memcmp (declared in <string.h>):
float f1 = 0.1;
float f2 = 0.2;
if (memcmp(&f1, &f2, sizeof(float)) == 0) {
    /* the two floats have identical bit patterns */
}
SYNOPSIS
#include <string.h>
int memcmp(const void *s1, const void *s2, size_t n);
DESCRIPTION
The memcmp() function compares the first n bytes (each interpreted as unsigned char) of the memory areas s1 and s2.
RETURN VALUE
The memcmp() function returns an integer less than, equal to, or greater than zero if the first n bytes of s1 is found, respectively, to be less than, to match, or be greater than the first n bytes of s2.
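One thing to keep in mind is that identical bytes and numeric equality are not quite the same test. A small sketch, using the made-up helper names same_bits and check_signed_zero and assuming IEEE 754 floats:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 when two floats have identical bytes. */
int same_bits(float a, float b) {
    return memcmp(&a, &b, sizeof(float)) == 0;
}

/* Positive and negative zero compare equal with ==, but their sign bits
 * differ, so a byte-wise comparison reports them as different. */
void check_signed_zero(void) {
    assert(0.0f == -0.0f);
    assert(!same_bits(0.0f, -0.0f));
}
```

The sign of memcmp's result also depends on byte order, so it is best used here only as an equality test, not to decide which float is larger.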

Getting the exponent from a floating point in C

I'm writing a function that will get the exponent of a floating point number (IEEE 754 standard) but for some reason when I use the right shift bitwise operator on the number it returns 0
Here is the function
int get_exp (int x)
{
return ( ((x >> 21) & 255) -127 );
}
I'm passing it 7.23 so the output should be 2, for some reason the (x >> 21) part returns 0 when it should actually be returning 129. The 255 is the mask I'm using to and (&) with the exponent part of the floating point number.

I'm guessing you're doing some kind of casting hocus-pocus to pass floating point as ints? I would use float frexpf (float x, int* exp); as defined in <math.h>.
#include <math.h>
int get_exp(float x)
{
int exp;
frexpf(x, &exp);
return exp;
}
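It's guaranteed to work regardless of the sizes of the floating point types. Note that frexpf normalizes the fraction to [0.5, 1), so the exponent it reports is one greater than the IEEE 754 unbiased exponent: for 7.23 it gives 3, not 2.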
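If the value wanted is the exponent as IEEE 754 stores it (with the significand in [1, 2)), the frexpf result can be adjusted by one. A minimal sketch, where ieee_exp is a name chosen for this example and the input is assumed to be nonzero and finite:

```c
#include <assert.h>
#include <math.h>

/* Hypothetical wrapper: unbiased IEEE 754 exponent of a nonzero,
 * finite float, derived from frexpf. */
int ieee_exp(float x) {
    int e;
    assert(x != 0.0f);
    frexpf(x, &e);   /* x == m * 2^e with 0.5 <= |m| < 1 */
    return e - 1;    /* IEEE 754 uses 1 <= |m| < 2, one power of two lower */
}
```

With this adjustment ieee_exp(7.23f) returns 2, since 7.23 = 1.8075 * 2^2.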
If you want to roll it yourself, you can adapt this code.
#define EXPONENT_BIAS (-127)
int get_exp(float f)
{
int i;
union {
// Set here, then use s or c to extract
float f;
// This may or may not work for you
struct {
unsigned int sign: 1;
unsigned int exponent: 8;
unsigned int mantissa: 23;
} s;
// For debugging purposes
unsigned char c[sizeof(float)];
} u;
// Assign, you might need to reverse the bytes!
u.f = f;
// You'll probably need this to figure out the field widths
for (i = 0; i < sizeof(float); i++)
fprintf(stderr, "%02x%s", u.c[i], (i + 1 < sizeof(float))? " ": "\n");
// Just return the exponent
return (int)u.s.exponent + EXPONENT_BIAS;
}
This will bite you if sizeof(float) != 4, or if you switch endian-ness.

The main issues are passing an int rather than a float and shifting by 21 instead of 23, as @dbush points out.
The IEEE 754 standard (binary32) has a number of corner cases: infinity, NaN, and subnormals including zero. So additional code is needed to cope with them.
Assuming proper endian:
#include <assert.h>
#include <limits.h>
#include <stdint.h>
int get_exp(float x) {
    assert(sizeof x == sizeof(uint32_t));
    union {
        float x;
        uint32_t u32;
    } u = { x };
    #define EXPOSHIFT 23
    #define EXPOMASK 255
    #define EXPOBIAS 127
    if (x == 0.0) return 0;
    int expo = (int) (u.u32 >> EXPOSHIFT) & EXPOMASK;
    if (expo == EXPOMASK) return INT_MAX; // x is infinity or NaN
    // subnormal: scale up into the normal range, then correct the result
    if (expo == 0) return get_exp(x * (1L << EXPOSHIFT)) - EXPOSHIFT;
    return expo - EXPOBIAS;
}

Working under the assumption that a float is 32 bit and is laid out as specified here, you have three issues:
Your function needs to accept a float.
You need to point a uint32_t to the address of the float so it sees the same bytes, then perform actions against the dereferenced pointer.
The exponent starts at the 24th (23 if you start from 0) bit, not the 22nd (21 if you start with 0), so you have to shift by 23.
#include <stdio.h>
#include <stdint.h>
int get_exp (float x)
{
uint32_t *i = (uint32_t *)&x;
return ( ((*i >> 23) & 255) -127 );
}
int main()
{
printf("exp=%d\n",get_exp(7.23));
}
Result:
exp=2

Should performance not be an issue, simply iterate:
int expof(float f) {
    int expo = 0;
    if (f < 0.0f) f = -f;
    /* Zero and infinity are unchanged by halving and would loop forever. */
    if (f * 0.5f == f) return 0;
    while (f < 0.5f) {
        f *= 2.0f;
        expo--;
    }
    while (f >= 1.0f) {
        f *= 0.5f;
        expo++;
    }
    return expo;
}
This does not depend on any particular float implementation, other than that the exponent fits in an int. It uses no external functions.
For nonzero finite f, the result is the same as from int expo; frexpf(f, &expo); return expo;

The parameter list shows
int x
and you pass a floating point number. Try to substitute with
float x

Round down float using bit operations in C

I am trying to round down a float using bit operations in C.
I start by converting the float to an unsigned int.
I think my strategy should be to get the exponent, and then zero out the bits after that, but I'm not sure how to code that. This is what I have so far:
float roundDown(float f);
unsigned int notRounded = *(unsigned int *)&f;
unsigned int copy = notRounded;
int exponent = (copy >> 23) & 0xff;
int fractional = 127 + 23 - exponent;
if(fractional > 0){
//not sure how to zero out the bits.
//Also don't know how to deal with the signed part.

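Since it's just for fun, and I'm not sure what the constraints are, here's a variant that also handles negative numbers. Both functions rely on round-to-nearest-even float arithmetic near 2^23, so an input that is already an odd integer lands exactly on a tie and comes out one too low (for example, myRoundDown_1(3.0f) yields 2.0f):
#include <stdint.h>
float myRoundDown_1 (float v) { //positive numbers only; see the caveat above
    return ((v-0.5f)+(1<<23)) - (1<<23);
}
float myRoundDown_2 (float v) { //also accepts negative numbers
    static union {
        uint32_t i; //unsigned long may be 64 bits wide, so use a fixed-width type
        float f;
    } myfloat;
    uint32_t n;
    myfloat.f = v;
    n = myfloat.i & 0x80000000;
    myfloat.i &= 0x7fffffff;
    myfloat.f = myRoundDown_1(myfloat.f+(n>>31));
    myfloat.i |= n;
    return myfloat.f;
}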

float roundDown(float f); should be float roundDown(float f) {.
unsigned int notRounded = *(unsigned int *)&f; is incompatible with modern compiler optimizations. Look up “strict aliasing”.
Here is a working function to round down to the nearest power of two:
#include <stdio.h>
#include <assert.h>
#include <string.h>
float roundDown(float f) {
unsigned int notRounded;
assert(sizeof(int) == sizeof(float));
memcpy(&notRounded, &f, sizeof(int));
// zero out the significand (mantissa):
unsigned int rounded = notRounded & 0xFF800000;
float r;
memcpy(&r, &rounded, sizeof(int));
return r;
}
int main()
{
printf("%f %f\n", 1.33, roundDown(1.33));
printf("%f %f\n", 3.0, roundDown(3.0));
}
This should produce:
1.330000 1.000000
3.000000 2.000000
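If the goal is to round down to an integer rather than to a power of two, the strategy from the question (find the exponent, then clear the bits below the binary point) can be sketched like this. It is an illustration only: floor_bits is a made-up name, and a 32-bit binary32 float is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: rounds toward negative infinity by clearing
 * the fraction bits of a binary32 value. */
float floor_bits(float f) {
    uint32_t bits;
    assert(sizeof f == sizeof bits);
    memcpy(&bits, &f, sizeof bits);

    int e = (int)((bits >> 23) & 0xFF) - 127;   /* unbiased exponent */
    uint32_t negative = bits >> 31;

    if (e >= 23)                 /* no fraction bits left; also inf and NaN */
        return f;
    if (e < 0) {                 /* magnitude below 1 */
        if ((bits << 1) == 0)    /* +0 or -0 stays as it is */
            return f;
        return negative ? -1.0f : 0.0f;
    }

    uint32_t frac_mask = 0x7FFFFFu >> e;        /* bits that weigh less than 1 */
    if ((bits & frac_mask) == 0)                /* already an integer */
        return f;
    if (negative)
        bits += frac_mask;       /* push the magnitude up to the next integer;
                                    a carry into the exponent field is fine */
    bits &= ~frac_mask;          /* drop the fraction */

    memcpy(&f, &bits, sizeof f);
    return f;
}
```

The sign needs no separate handling beyond the negative branch: for negative inputs the magnitude is rounded up, which is the same as rounding the value down.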

Encoding int value as an IEEE-754 float (binary32)

Given the 32 bits that represent an IEEE 754 floating-point number, how can the number be converted to an integer, using integer or bit operations on the representation (rather than using a machine instruction or compiler operation to convert)?
I have the following function but it fails in some cases:
Input: int x (contains 32 bit single precision number in IEEE 754 format)
if(x == 0) return x;
unsigned int signBit = 0;
unsigned int absX = (unsigned int)x;
if (x < 0)
{
signBit = 0x80000000u;
absX = (unsigned int)-x;
}
unsigned int exponent = 158;
while ((absX & 0x80000000) == 0)
{
exponent--;
absX <<= 1;
}
unsigned int mantissa = absX >> 8;
unsigned int result = signBit | (exponent << 23) | (mantissa & 0x7fffff);
printf("\nfor x: %x, result: %x",x,result);
return result;

C has the "union" to handle this type of view of data:
typedef union {
int i;
float f;
} u;
u u1;
u1.f = 45.6789;
/* now u1.i refers to the int version of the float */
printf("%d",u1.i);

&x gives the address of x, so it has type float*.
(int*)&x casts that pointer to a pointer to int, i.e. to an int*.
*(int*)&x dereferences that pointer into an int value. It won't do what you expect on machines where int and float have different sizes.
There could also be endianness issues.
This solution was used in the fast inverse square root algorithm.

// With the proviso that your compiler implementation uses
// the same number of bytes for an int as for a float:
// example float
float f = 1.234f;
// get address of float, cast as pointer to int, dereference
int i = *((int *)&f);
// get address of int, cast as pointer to float, dereference
float g = *((float *)&i);
printf("%f %f %08x\n",f,g,i);

float x = 43.133;
int y;
assert (sizeof x == sizeof y);
memcpy (&y, &x, sizeof x);
...

You can cast the float using a reference. A cast like this should never generate any code.
C++
float f = 1.0f;
int i = (int &)f;
printf("Float %f is 0x%08x\n", f, i);
Output:
Float 1.000000 is 0x3f800000
If you want c++ style cast use a reinterpret_cast, like this.
int i = reinterpret_cast<int &>(f);
This does not work with expressions; you have to store the value in a variable first.
int i_times_two;
float f_times_two = f * 2.0f;
i_times_two = (int &)f_times_two;
i_times_two = (int &)(f * 2.0f);
main.cpp:25:13: error: C-style cast from rvalue to reference type 'int &'

You cannot (meaningfully) convert a floating point number into an 'integer' (signed int or int) in this way.
It may end up having the integer type, but it's actually just an index into the encoding space of IEEE754, not a meaningful value in itself.
You might argue that an unsigned int serves dual purpose as a bit pattern and an integer value, but int does not.
Also there are platform issues with bit manipulation of signed ints.
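For the direction the question's code actually takes, encoding an integer value as binary32 bits, the loop in the question truncates the low bits and negates x directly, which overflows for the most negative int. A possible sketch that rounds to nearest, ties to even, and computes the magnitude in unsigned arithmetic is shown below. The names int_to_binary32 and matches_cast are invented for this example, and the cross-check assumes the compiler's own int-to-float conversion uses the default rounding mode.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the binary32 bit pattern for x. */
uint32_t int_to_binary32(int32_t x) {
    if (x == 0)
        return 0;
    uint32_t sign = (x < 0) ? 0x80000000u : 0u;
    /* Unsigned negation is well defined even for INT32_MIN. */
    uint32_t mag = (x < 0) ? 0u - (uint32_t)x : (uint32_t)x;

    uint32_t exponent = 158;               /* 127 + 31: leading 1 at bit 31 */
    while ((mag & 0x80000000u) == 0) {     /* normalize */
        exponent--;
        mag <<= 1;
    }

    uint32_t mantissa = mag >> 8;          /* 24 bits, leading 1 included */
    uint32_t rest = mag & 0xFFu;           /* the 8 bits being dropped */
    if (rest > 0x80u || (rest == 0x80u && (mantissa & 1u)))
        mantissa++;                        /* round to nearest, ties to even */
    if (mantissa == 0x1000000u) {          /* rounding carried out of 24 bits */
        mantissa >>= 1;
        exponent++;
    }
    return sign | (exponent << 23) | (mantissa & 0x7FFFFFu);
}

/* Cross-check against the compiler's own conversion. */
int matches_cast(int32_t x) {
    float f = (float)x;
    uint32_t expected;
    assert(sizeof f == sizeof expected);
    memcpy(&expected, &f, sizeof expected);
    return int_to_binary32(x) == expected;
}
```

Values such as 16777217 (2^24 + 1), which do not fit in 24 significant bits, are where rounding and truncation start to disagree.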

Multiply the float by a scale factor of your choice. In this case I multiplied by 100,000, because five digits after the decimal point matter in my application.
Convert the scaled value to bytes, then join the bytes again and divide by 100,000.
double angleX, angleY;
angleX = 3.2342;
angleY = 1.34256;
printf("%f, %f", (double)angleX, (double)angleY);
int remain, j;
int TxData[8];
j=0;
remain=0;
unsigned long data = angleX*100000;
printf("\ndata : %lu\n", data);
while(data>=256)
{
remain= data%256;
data = data/256;
TxData[j]= remain;
printf("\ntxData %d : %d", j, TxData[j]);
j++;
}
TxData[j] = data;
printf("\ntxData %d : %d", j, TxData[j]);
int i=0;
long int angleSon=0;
for(i=0;i<=j;i++)
{
angleSon += pow(256,i)*TxData[i];
printf("\nangleSon : %li", angleSon);
}
