Bit representation for floats?

I know how to convert a float into its binary representation using % 2 and / 2, but is there a shortcut or cleaner way of doing this? Is what I am doing even considered representing a float bitwise? I am supposed to be using bitwise comparison between two float numbers, but I'm not sure whether that means using bitwise operations.
For example, to obtain the binary representation of a number like 10, I store the result of num % 2 into an array inside a while loop until the number reaches 0; printed backwards, the array represents the number in binary:
while (num != 0) {
array[i++] = num % 2;
num = num / 2;
}
What I did was use the method above for two float numbers, loaded them up with their own individual arrays, and compared them both through their arrays.
I have them set up in IEEE floating point format within their arrays as well.
EDIT: I have to compare two numbers of type float using bitwise comparisons and operations to determine whether one number is greater than, less than, or equal to the other, with the floats represented in biased-exponent notation. Specifically, the program tests whether a floating-point number number1 is less than, equal to, or greater than another floating-point number number2 by comparing their floating-point representations bit by bit from left to right, stopping as soon as the first differing bit is encountered.
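For reference, here is a minimal sketch (not from the question or the answers) of reading a float's actual bits, assuming float is a 32-bit IEEE 754 binary32 type. Copying the bytes into a uint32_t with memcpy reinterprets the bits, whereas % 2 and / 2 operate on the numeric value. The function name float_to_bits is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store the 32 bits of f into bits[],
   most significant bit (the sign) first.
   Assumes float is 32-bit IEEE 754 binary32. */
void float_to_bits(float f, int bits[32])
{
    uint32_t u;
    int i;

    memcpy(&u, &f, sizeof u);            /* reinterpret the bytes, no value conversion */
    for (i = 0; i < 32; i++)
        bits[i] = (int)((u >> (31 - i)) & 1u);  /* bits[0] = bit 31, bits[31] = bit 0 */
}
```

Comparing two such arrays from index 0 upward and stopping at the first difference is the left-to-right scan the assignment describes; the sign bit still needs special handling, because floats are stored as sign and magnitude.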

No, it won't. Dividing a float by 2 simply halves its value, like this:
#include <stdio.h>
int main(void)
{
float x = 5.0f;
float y = x / 2;
printf("%f\n", y);
}
Result:
2.500000
See? It has nothing to do with the bits.
The binary representation of a floating-point number consists of a sign bit, an exponent and a mantissa, which means that, unlike with ordinary integers, the tricks you've mentioned won't apply here. You can learn more about this in the Wikipedia article on IEEE 754 floating point.
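To make the three fields concrete, here is a small sketch that splits a float into them, assuming float is 32-bit IEEE 754 binary32. The struct and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical struct holding the IEEE 754 binary32 fields of a float. */
struct float_fields {
    unsigned sign;      /* bit 31 */
    unsigned exponent;  /* bits 30..23, stored with a bias of 127 */
    uint32_t mantissa;  /* bits 22..0, without the implicit leading 1 */
};

/* Hypothetical helper: copy the bits out and mask each field.
   Assumes float is 32-bit IEEE 754 binary32. */
struct float_fields split_float(float f)
{
    struct float_fields r;
    uint32_t u;

    memcpy(&u, &f, sizeof u);
    r.sign = (unsigned)(u >> 31);
    r.exponent = (unsigned)((u >> 23) & 0xFFu);
    r.mantissa = u & 0x7FFFFFu;
    return r;
}
```

For 5.0f = 1.25 * 2^2 this gives sign 0, exponent field 129 (2 + 127) and mantissa 0x200000 (0.25 * 2^23).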
To make sure two floats have exactly the same bit configuration, you could compare their contents using memcmp, which compares them byte by byte, with no additional casts or arithmetic:
#include <stdio.h>
#include <string.h>
int main(void)
{
float x = 5.0f;
float y = 4.99999999999999f; //gets rounded up to 5.0f
float z = 4.9f;
printf("%d\n", memcmp(&x, &y, sizeof(float)) == 0);
printf("%d\n", memcmp(&x, &z, sizeof(float)) == 0);
}
...will print 1 and 0 respectively. You can also inspect the individual bits this way, e.g. by operating on the bytes of x through *(unsigned char *)&x.

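This compares two IEEE 32-bit floats bit by bit, returning -1, 0, or 1, and also indicating the bit at which they differ. They can be compared as sign-and-magnitude numbers: the function float_comp below compares them bit by bit as uint32_t, then negates the result if they differ in the sign bit (bit 31) or if both are negative, because for negative numbers a larger magnitude means a smaller value. The bits are copied with memcpy, which avoids the strict-aliasing problem of reading a float through a uint32_t pointer. Note that +0 and -0 have different bit patterns, so they are reported as unequal, and NaNs are not handled.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
static int float_comp(float f1, float f2, int *bit)
{
uint32_t a, b;
int comp = 0;
memcpy(&a, &f1, sizeof a); /* reinterpret the bits without aliasing issues */
memcpy(&b, &f2, sizeof b);
for (*bit = 31; *bit >= 0; (*bit)--) {
uint32_t abit = a & (UINT32_C(1) << *bit);
uint32_t bbit = b & (UINT32_C(1) << *bit);
if (abit && !bbit) {
comp = 1;
break;
}
if (!abit && bbit) {
comp = -1;
break;
}
}
/* sign-and-magnitude correction: a differing sign bit, or two negative
numbers, reverses the unsigned order */
if (*bit == 31 || (a >> 31))
comp = -comp;
return comp;
}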
int main(int argc, char **argv)
{
float f1, f2;
int comp, bit;
if (argc != 3) {
fprintf(stderr, "usage: %s: float1 float2\n", argv[0]);
return 2;
}
f1 = strtof(argv[1], NULL);
f2 = strtof(argv[2], NULL);
comp = float_comp(f1, f2, &bit);
if (comp == 0)
printf("%.8g = %.8g\n", f1, f2);
else if (comp < 0)
printf("%.8g < %.8g (differ at bit %d)\n", f1, f2, bit);
else
printf("%.8g > %.8g (differ at bit %d)\n", f1, f2, bit);
return 0;
}
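A commonly used alternative to the bit-by-bit loop (not part of the answer above) is to map each bit pattern to an unsigned key whose ordinary integer order matches the float order: set the sign bit of non-negative values and invert all bits of negative ones. This is a sketch assuming 32-bit IEEE 754 floats and no NaNs; it orders -0.0f just below +0.0f, and the names float_key and float_key_comp are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: an unsigned key that sorts like the float value.
   Assumes 32-bit IEEE 754 binary32 floats; NaNs are not handled. */
static uint32_t float_key(float f)
{
    uint32_t u;

    memcpy(&u, &f, sizeof u);
    if (u & UINT32_C(0x80000000))
        return ~u;                      /* negative: larger magnitude -> smaller key */
    return u | UINT32_C(0x80000000);    /* non-negative: place above all negatives */
}

/* Returns -1, 0 or 1 like float_comp, but without the bit loop. */
int float_key_comp(float f1, float f2)
{
    uint32_t a = float_key(f1), b = float_key(f2);

    return (a > b) - (a < b);
}
```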

Doing what you describe will not give you the bits of the floating-point representation. Instead, use a union to reinterpret the float's bits as an integer and print them as usual:
#include <stdio.h>
#include <stdint.h>
typedef union {
uint32_t i;
float f;
} float_conv_t;
void
int_to_bin_print(uint32_t number)
{
char binaryNumber[33];
int i;
for (i = 31; i >= 0; --i)
{
binaryNumber[i] = (number & 1) ? '1' : '0';
number >>= 1;
}
binaryNumber[32] = '\0';
fprintf(stdout, "Number %s\n", binaryNumber);
}
int main(void) {
float_conv_t f;
f.f = 10.34;
int_to_bin_print(f.i);
f.f = -10.34;
int_to_bin_print(f.i);
f.f = 0.1;
int_to_bin_print(f.i);
f.f = 0.2;
int_to_bin_print(f.i);
return 0;
}
Output:
Number 01000001001001010111000010100100
Number 11000001001001010111000010100100
Number 00111101110011001100110011001101
Number 00111110010011001100110011001101
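My goal is to compare two floating point numbers by comparing their floating point representations bitwise.
Then you can compare the raw memory using memcmp. Note that this only tells you whether the bit patterns are identical: memcmp compares bytes as unsigned char starting at the lowest address, so its less/greater result does not follow the numeric order of the floats in general (byte order and the sign bit both get in the way).
float f1 = 0.1f;
float f2 = 0.2f;
if (memcmp(&f1, &f2, sizeof(float)) == 0) {
/* equal: the bit patterns are identical */
}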
SYNOPSIS
#include <string.h>
int memcmp(const void *s1, const void *s2, size_t n);
DESCRIPTION
The memcmp() function compares the first n bytes (each interpreted as unsigned char) of the memory areas s1 and s2.
RETURN VALUE
The memcmp() function returns an integer less than, equal to, or greater than zero if the first n bytes of s1 is found, respectively, to be less than, to match, or be greater than the first n bytes of s2.
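Bit-pattern equality (what memcmp checks) is not quite the same as ==. A small sketch, assuming IEEE 754 floats: +0.0f and -0.0f compare equal with == but have different bits, and a NaN is never == to itself even though memcmp finds its bytes identical. The helper name same_bits is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: 1 if a and b have identical bit patterns, else 0. */
int same_bits(float a, float b)
{
    return memcmp(&a, &b, sizeof a) == 0;
}
```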

Related

Why does left-shifting an integer by 24-bit yield the wrong result?

I tried left-shifting a 32-bit integer by 24:
#include <stdio.h>
char *int_to_bin(int num) {
int i = 0;
static char bin[64];
while (num != 0) {
bin[i] = num % 2 + 48;
num /= 2;
i++;
}
bin[i] = '\0';
return (bin);
}
int main() {
int number = 255;
printf("number: %s\n", int_to_bin(number));
printf("shifted number: %s\n", int_to_bin(number << 24));
return 0;
}
OUTPUT:
number: 11111111
shifted number: 000000000000000000000000/
And if I left-shift by 23 bits, it yields this result:
0000000000000000000000011111111
Why is that, and what's the matter with the '/' at the end of the wrong result?

Two things:
If number has the value 255 then number << 24 has the numerical value 4278190080, which overflows a 32-bit signed integer whose largest possible value is 2147483647. Signed integer overflow is undefined behavior in C, so the result could be anything at all.
What probably happens in this case is that the result of the shift is negative. When num is negative then num % 2 may take the value -1, so you store character 47 in the string, which is /.
Bit shifting math is usually better to do with unsigned types, where overflow is well-defined (it wraps around and bits just shift off the left and vanish) and num % 2 can only be 0 or 1. (Or write num & 1 instead.)
Your int_to_bin routine puts the least-significant bits at the beginning of the string (on the left), so the result is backwards from the way people usually write numbers (with the least-significant bits on the right). You may want to rewrite it.
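Putting these suggestions together, here is a sketch that shifts an unsigned value (so the shift is well defined) and writes the most significant bit first. It uses uint32_t so the width is fixed at 32 bits; to_bin32 is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical helper: write the 32 bits of v into out[], most significant
   bit first, followed by a terminating '\0'. out must hold 33 chars. */
char *to_bin32(uint32_t v, char out[33])
{
    int i;

    for (i = 0; i < 32; i++)
        out[i] = (char)('0' + ((v >> (31 - i)) & 1u));  /* & 1 instead of % 2 */
    out[32] = '\0';
    return out;
}
```

With char buf[33], to_bin32(UINT32_C(255) << 24, buf) produces eight 1s followed by 24 0s.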

The shift works fine in practice; you simply print the bits in the wrong direction. To keep the shift well defined, use an unsigned type (255 << 24 does not fit in a 32-bit int):
#include <limits.h>
#include <stdio.h>
char *int_to_bin(char *buff, unsigned num)
{
unsigned mask = 1U << (CHAR_BIT * sizeof(num) - 1);
char *wrk = buff;
for(; mask; mask >>= 1)
{
*wrk++ = '0' + !!(num & mask);
}
*wrk = 0;
return buff;
}
int main()
{
char buff[CHAR_BIT * sizeof(unsigned) + 1];
unsigned number = 255;
printf("number: %s\n", int_to_bin(buff, number));
printf("shifted number: %s\n", int_to_bin(buff, number << 24));
return 0;
}
Left-shifting a signed integer is only defined while the result is representable, so 255 << 24 on a 32-bit int is undefined behavior; that is why the example above uses unsigned. Right-shifting a negative signed value is implementation-defined: many systems use an arithmetic shift, and the result is not the same as with a logical shift:
https://godbolt.org/z/e7f3shxd4

You are storing the digits backwards.
You are using a signed 32-bit int: 255 << 23 still fits (2139095040), but 255 << 24 needs more than 31 value bits, so that shift overflows. Use a wider unsigned type such as unsigned long long int.
A signed int also produces bad characters: 1 << 31 overflows a 32-bit int (on typical machines you get a negative value), and num % 2 of a negative number can be -1, which puts a '/' into your string.
Finally, using unsigned long long int and storing the digits in the correct order produces the correct string.
You should try to rewrite the code on your own before looking at this improved version:
#include<stdio.h>
#include<stdlib.h>
char *int_to_bin( unsigned long long int num) {
int i = 0;
static char bin[65];
while (i != 64) {
bin[63-i] = num % 2 + 48;
num /= 2;
i++;
}
bin[64] = '\0';
return (bin);
}
int main() {
unsigned long long int number = 255;
printf("number 1: %s\n", int_to_bin(number));
printf("number 2: %s\n", int_to_bin(number << 24));
return 0;
}

Rotation of binary number in C

I have an issue I can't solve. The code below should get a number and a number of rotations from the user, and calculate the number after the rotations. For a negative number of rotations the code should rotate the number right, and for a positive number of rotations it should rotate the number left.
For example: for the input x=1010111011111011
my_rotate(x, -3) will return 0111010111011111
my_rotate(x, 3) will return 0111011111011101
Here is the code I wrote so far:
#include <stdio.h>
unsigned short my_rotate(unsigned short, char);
int main()
{
unsigned short num, res;
char rotations;
printf("\nPlease enter a number and number of rotations\n");
scanf("%hu %d", &num, &rotations);
res = my_rotate(num, rotations);
return 0;
}
unsigned short my_rotate(unsigned short a, char b)
{
unsigned short bitsNum = sizeof(unsigned short) * 8;
unsigned short temp1, temp2, result;
if(b == 0)
return a;
else if(b < 0)
{
temp1 = a << (bitsNum + b);
temp2 = a >> (-b);
result = temp1 + temp2;
}
else /* b > 0 */
{
temp1 = (a >> (bitsNum - (unsigned short)b));
temp2 = (a << (unsigned short)b);
result = temp1 + temp2;
}
return result;
}
I always get 0 as a result and I don't know why. What's wrong with my code?

In main:
unsigned short num, res;
char rotations;
printf("\nPlease enter a number and number of rotations\n");
scanf("%hu %d", &num, &rotations);
The last argument of scanf must be a pointer to an int (the format is %d), but you pass the address of a char, so the behavior is undefined. Use an int for rotations with the %d format.
In my_rotate, b is a char and you test if(b < 0); the result depends on whether char is signed or not. Declare b as signed char if you expect it to be signed.
If rotations is an int and b a signed char :
44795 (1010111011111011) and -3 produce 30175 being 111010111011111
44795 (1010111011111011) and 3 produce 30685 being 111011111011101
as you expected.
Note that on my system an unsigned short is 16 bits; of course the result is different if unsigned short has a different number of bits.
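With the input fixed, the rotation itself can be written as one expression. This is a sketch assuming unsigned short is 16 bits (as in the answer above), with positive counts rotating left and negative counts rotating right as in the question's examples; the count is reduced modulo 16 first, and rotate16 is a hypothetical name.

```c
/* Hypothetical helper: rotate a 16-bit value; b > 0 rotates left,
   b < 0 rotates right. Assumes unsigned short is exactly 16 bits. */
unsigned short rotate16(unsigned short a, int b)
{
    const int width = 16;

    b %= width;            /* now in -15..15 */
    if (b < 0)
        b += width;        /* rotating right by k == rotating left by width - k */
    if (b == 0)
        return a;          /* nothing to rotate */
    /* a is promoted to int; assuming a 32-bit int, the largest intermediate
       (65535 << 15) still fits, and the cast keeps the low 16 bits */
    return (unsigned short)((a << b) | (a >> (width - b)));
}
```

rotate16(44795, -3) returns 30175 and rotate16(44795, 3) returns 30685, matching the values above.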

@bruno explained the input problem well.
A rotation count may exceed +/- bitsNum, so a good first step is to limit the rotation count.
unsigned short my_rotate(unsigned short a, int b) {
unsigned short bitsNum = sizeof(unsigned short) * 8;
//add
b %= bitsNum;
....
Highly portable code would not use bitsNum as defined here, since it is derived from the size of unsigned short (and assumes 8 bits per char), and an unsigned short could have padding bits. Admittedly, this concerns only rare machines. Code should derive the bit width from USHRT_MAX instead.
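One way to derive the width from USHRT_MAX, as a sketch: count how many times the maximum value can be shifted right before it reaches zero. The function name ushrt_width is hypothetical; on common platforms it returns 16.

```c
#include <limits.h>

/* Hypothetical helper: number of value bits in unsigned short, derived
   from USHRT_MAX rather than sizeof, so padding bits are not counted. */
int ushrt_width(void)
{
    unsigned long max = USHRT_MAX;   /* all value bits set */
    int width = 0;

    while (max != 0) {
        max >>= 1;
        width++;
    }
    return width;
}
```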

Program to count the number of bits set in c

I have tried to count the number of bits set in an integer value in C.
But for some values it shows the correct count of set bits and for others it does not.
Here is the program code:
int main()
{
int a=512,i=0,j=1,count=0,k=0;
for(i=0;i<31;i++)
{
if(k=a&j)
{
count++;
j=j<<1;
}
}
printf("the total bit set countis %d",count);
}
For the value 512 the program reports a set-bit count of zero, and for 511 it reports 9.
Please help me to correct the program.

Stanford University has a page of different ways to implement common bit-twiddling operations. They list 5 different algorithms to count the bits set, all with C examples.
https://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive
Their simplest implementation:
unsigned int v; // count the number of bits set in v
unsigned int c; // c accumulates the total bits set in v
for (c = 0; v; v >>= 1)
{
c += v & 1;
}
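The same page also lists Brian Kernighan's method, which loops once per set bit rather than once per bit: v &= v - 1 clears the lowest set bit. A sketch as a self-contained function (the name is hypothetical):

```c
/* Count set bits by repeatedly clearing the lowest set bit. */
unsigned count_bits_kernighan(unsigned v)
{
    unsigned c;

    for (c = 0; v; c++)
        v &= v - 1;   /* e.g. 12 (1100) & 11 (1011) == 8 (1000) */
    return c;
}
```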

If you're using the gcc or clang compiler, you can use the builtin function __builtin_popcount:
unsigned int user_input = 100;
int count = __builtin_popcount(user_input); // count == 3
When I don't need cross-platform code, I use this function since it's highly optimised.

Generally you would count bits in an unsigned integer, because you're usually checking for bits set in a register or a mask, for example. Signed integers are normally represented in two's complement, and I can't think why you'd want to count set bits in a signed integer (I'd be interested to hear why if you definitely do want this).
Note that in C, left-shifting a negative signed integer is undefined behaviour, and right-shifting one is implementation-defined. From the C standard, section 6.5.7:
... The result of E1 << E2 is E1 left-shifted E2 bit positions; ... If E1
has a signed type and nonnegative value, and E1 << E2 is representable
in the result type, then that is the resulting value; otherwise, the
behavior is undefined.
The result of E1 >> E2 is E1 right-shifted E2
bit positions. ... If E1 has a signed type and a negative value, the
resulting value is implementation-defined ...
If you want to count 1's in an arbitrary sized unsigned integer you could use this example:
#include <stdio.h>
int main(void) {
unsigned int value = 1234;
unsigned int ones = 0;
while(value > 0) {
ones += value & 0x1;
value >>= 1;
}
printf("#Ones = %u", ones);
}
Using this example value could be unsigned char, unsigned long, whatever unsigned integer type...
Note: Do not shift signed values or floats/doubles.

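You can use the division / and modulo % operators to check which bits are set in a non-negative integer (for a negative value, a % 2 can be -1, so its bits would not be counted correctly):
#include <stdio.h>
int main()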
{
int a = 512, count = 0;
while(a != 0)
{
if(a % 2 == 1)
{
count++;
}
a /= 2;
}
printf("The total bit set is %d", count);
}

You have a couple of mistakes:
for(i=0;i<32;i++) // <<< this should be 32, not 31
{
if(k=a&j)
{
count++;
}
j=j<<1; // <<< this needs to be outside the if block
}
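Note that instead of using a hard-coded value of 32 for the number of bits in an int, it would be better to do it like this (CHAR_BIT comes from <limits.h>):
for(i=0;i<sizeof(int)*CHAR_BIT;i++)
This way the code will still work if an int is, e.g., 16 or 64 bits. Also declare j as unsigned: shifting a signed 1 into the sign bit overflows, which is undefined behaviour.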
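With these fixes, the loop from the question could look like the following sketch, written as a function; count_set_bits is a hypothetical name.

```c
#include <limits.h>

/* Test each bit position of a with a moving unsigned mask. */
int count_set_bits(int a)
{
    unsigned int j = 1;                 /* unsigned: shifting into bit 31 is well defined */
    unsigned int ua = (unsigned int)a;  /* conversion to unsigned is defined modulo 2^N */
    int count = 0;
    unsigned i;

    for (i = 0; i < sizeof(int) * CHAR_BIT; i++) {
        if (ua & j)
            count++;
        j <<= 1;                        /* advance the mask on every iteration */
    }
    return count;
}
```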

Although this is not strictly C, you can use GCC inline assembly to execute the x86 POPCNT instruction (the CPU must support it):
// GCC syntax (AT&T operand order: source first, then destination)
static unsigned popcount_asm(unsigned a)
{
unsigned count;
__asm__("popcnt %1, %0"
: "=r" (count)
: "r" (a));
return count;
}
According to this benchmark, calling __builtin_popcount as in idok's answer is just as fast as the above code and they both are much faster than any other C implementation. You can also check the linked repo for other solutions as well.

You are checking the value of a&j , and if a&j is 0, then you do nothing else but try again.
Your j-bitshift needs to be outside the if-then.

#include <stdio.h>
/* prints the number of 1 bits in n */
void countSetBits(unsigned int n){
unsigned int count = 0;
while (n){
count += n & 1;
n >>= 1;
}
printf("\n\t Number of 1's in the binary number is : %u", count);
}
/* prints n's binary digits as a decimal-looking number; works for
0 <= n < 2^20, since the result must fit in an unsigned long long */
void dec_bin(int n){
unsigned long long binary = 0, place = 1;
while (n != 0){
binary += (unsigned long long)(n % 2) * place;
n = n / 2;
place = place * 10;
}
printf("\n\t The converted Binary Equivalent is : %llu", binary);
}
int main(void){
int i = 0;
printf("\n\t Enter the Decimal Number: ");
if (scanf("%d", &i) != 1)
return 1;
dec_bin(i);
countSetBits(i);
return 0;
}

Getting the exponent from a floating point in C

I'm writing a function that gets the exponent of a floating-point number (IEEE 754 standard), but for some reason when I use the right-shift operator on the number it returns 0.
Here is the function
int get_exp (int x)
{
return ( ((x >> 21) & 255) -127 );
}
I'm passing it 7.23, so the output should be 2, but for some reason the (x >> 21) part returns 0 when it should be returning 129. The 255 is the mask I'm using to AND (&) with the exponent part of the floating-point number.

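I'm guessing you're doing some kind of casting hocus-pocus to pass floating-point values as ints? I would use float frexpf(float x, int *exp); as declared in <math.h>. Note that frexpf normalizes the mantissa to [0.5, 1), so its exponent is one more than the IEEE 754 exponent (3 rather than 2 for 7.23):
#include <math.h>
int get_exp(float x)
{
int exp;
frexpf(x, &exp);
return exp - 1; /* convert from frexpf's [0.5, 1) convention to IEEE's [1, 2) */
}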
It's guaranteed to work regardless of the sizes of the floating point types.
If you want to roll it yourself, you can adapt this code.
#define EXPONENT_BIAS (-127)
int get_exp(float f)
{
int i;
union {
// Set here, then use s or c to extract
float f;
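// This may or may not work for you: bit-field order is
// implementation-defined. This order (lowest bits first) matches
// GCC and Clang on little-endian targets such as x86.
struct {
unsigned int mantissa: 23;
unsigned int exponent: 8;
unsigned int sign: 1;
} s;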
// For debugging purposes
unsigned char c[sizeof(float)];
} u;
// Assign, you might need to reverse the bytes!
u.f = f;
// You'll probably need this to figure out the field widths
for (i = 0; i < sizeof(float); i++)
fprintf(stderr, "%02x%s", u.c[i], (i + 1 < sizeof(float))? " ": "\n");
// Just return the exponent
return (int)u.s.exponent + EXPONENT_BIAS;
}
This will bite you if sizeof(float) != 4, or if you switch endianness.

The main issues are passing an int rather than a float, and shifting by 21 instead of 23 (as @dbush pointed out).
IEEE 754 binary32 has a number of corner cases: infinity, NaN, subnormals, and zero, so additional code is needed to cope with them.
Assuming float and uint32_t have the same endianness:
#include <assert.h>
#include <limits.h>
#include <stdint.h>
int get_exp(float x) {
assert(sizeof x == sizeof(uint32_t));
union {
float x;
uint32_t u32;
} u = { x };
#define EXPOSHIFT 23
#define EXPOMASK 255
#define EXPOBIAS 127
if (x == 0.0) return 0;
int expo = (int) (u.u32 >> EXPOSHIFT) & EXPOMASK;
if (expo == EXPOMASK) return INT_MAX; // x is infinity or NaN
if (expo == 0) return get_exp(x * (1L << EXPOSHIFT)) - EXPOSHIFT;
return expo - EXPOBIAS;
}

Working under the assumption that a float is 32 bit and is laid out as specified here, you have three issues:
Your function needs to accept a float.
You need to copy the float's bytes into a uint32_t so it sees the same bits, then operate on that integer. (memcpy does this safely; pointing a uint32_t * at the float, as is often done, violates C's strict-aliasing rule.)
The exponent starts at the 24th (23 if you start from 0) bit, not the 22nd (21 if you start with 0), so you have to shift by 23.
#include <stdio.h>
#include <stdint.h>
#include <string.h>
int get_exp (float x)
{
uint32_t i;
memcpy(&i, &x, sizeof i);
return ( ((i >> 23) & 255) -127 );
}
int main()
{
printf("exp=%d\n",get_exp(7.23));
}
Result:
exp=2

Should performance not be an issue, simply iterate:
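int expof(float f) {
int expo = 0;
if (f == 0.0f || !(f - f == 0.0f)) return 0; /* zero and infinity would loop forever below; NaN is excluded too */
if (f < 0.0) f = -f;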
while (f < 0.5f) {
f *= 2.0f;
expo--;
}
while (f >= 1.0f) {
f *= 0.5f;
expo++;
}
return expo;
}
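This does not depend on any particular float implementation, other than that the exponent fits in an int. It uses no external functions, as requested in the comments.
For finite f it gives the same result as int expo; frexpf(f, &expo); return expo; that is, one more than the IEEE 754 exponent (3 for 7.23).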

The parameter list shows int x, but you pass a floating-point number. Try substituting float x.

Divide by power of 2 resulting in float

I need to compute a 16-bit unsigned integer divided by a power of 2, with the result as a 32-bit float (standard IEEE format). This is on an embedded system and the routine is used repeatedly, so I am looking for something better than (float)x/(float)(1<<n). In addition, the C compiler is pretty limited (no math lib, bit fields, reinterpret_cast, etc.).

If you don't mind some bit twiddling then the obvious way to go is to convert the integer to float and then subtract n from the exponent bits to achieve the division by 2^n:
y = (float)x; // convert to float
uint32_t yi;
memcpy(&yi, &y, sizeof yi); // get float value as bits (memcpy from <string.h> avoids aliasing problems)
uint32_t exponent = yi & 0x7f800000u; // extract exponent bits 30..23
exponent -= (uint32_t)n << 23; // subtract n from exponent
yi = (yi & ~0x7f800000u) | exponent; // insert modified exponent back into bits 30..23
memcpy(&y, &yi, sizeof y); // copy bits back to float
Note that this fails for x = 0, so you should check x > 0 before conversion; n must also be small enough that the exponent stays in the normal range.
Total cost is one int-float conversion plus a handful of integer bitwise/arithmetic operations. If you use a union you can avoid having separate int/float representations and just work directly on the float.
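A union-based sketch of the same trick, assuming float is IEEE 754 binary32 and that n is small enough for the result to stay a normal number (n <= 126 is safe for any x >= 1). Since the sign bit of a converted uint16_t is 0, subtracting n << 23 from the whole word lowers only the exponent field. The name div_pow2 is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: x / 2^n by adjusting the exponent bits directly.
   Assumes IEEE 754 binary32 floats and n <= 126 so the result stays normal. */
float div_pow2(uint16_t x, unsigned n)
{
    union {
        float f;
        uint32_t u;
    } v;

    if (x == 0)
        return 0.0f;            /* zero has no exponent to adjust */
    v.f = (float)x;             /* exact: every uint16_t fits in 24 significand bits */
    v.u -= (uint32_t)n << 23;   /* sign bit is 0, so this only lowers the exponent field */
    return v.f;
}
```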

Use ldexpf(x, -n). This function is defined by the C standard to do exactly what you want, return x * 2^-n, so any decent compiler will provide good code for this. (This requires either part of a math library or a compiler that optimizes this to inline code.)
If n is known at compile time, you can also consider x * (1.f/(1<<n)). A good compiler will compute (1.f/(1<<n)) at compile time, so the executable code will be two operations: Convert x to float and multiply by a constant. That might be faster than the code generated for ldexpf(x, -n) if the compiler does not optimize ldexpf(x, -n) as well as it might.

A quick and easy solution is to precompute a table of float values of 2^-n for n >= 0 (what's the upper limit for n, around 31?) and then multiply x by the nth element of the table.
This may not be the fastest if your code emulates floating point multiplication because the CPU doesn't support it directly.
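The table approach as a sketch: the names are hypothetical, and n is assumed to be at most 31. Each entry is exact, because halving a power of two is exact in binary floating point, so (float)x * pow2_neg[n] gives the same result as (float)x / (float)(1u << n).

```c
#include <stdint.h>

#define MAX_SHIFT 31  /* assumption: n never exceeds 31 */

static float pow2_neg[MAX_SHIFT + 1];  /* pow2_neg[n] == 2^-n */

/* Fill the table once at start-up; halving a power of two is exact. */
void init_pow2_table(void)
{
    int n;

    pow2_neg[0] = 1.0f;
    for (n = 1; n <= MAX_SHIFT; n++)
        pow2_neg[n] = pow2_neg[n - 1] * 0.5f;
}

/* x / 2^n as one conversion plus one multiplication. */
float div_by_table(uint16_t x, unsigned n)
{
    return (float)x * pow2_neg[n];
}
```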
You may, however, do it quicker using integer math only.
Example (assuming IEEE-754 32-bit floats):
#include <limits.h>
#include <string.h>
#include <stdio.h>
#define C_ASSERT(expr) extern char CAssertExtern[(expr)?1:-1]
C_ASSERT(CHAR_BIT == 8);
C_ASSERT(sizeof(float) == 4);
C_ASSERT(sizeof(int) == 4);
float div(int x, unsigned n)
{
float res;
unsigned e = 0;
unsigned sign = x < 0;
unsigned m = sign ? -x : x;
if (m)
{
while (m >= (1u << 24))
m >>= 1, e++;
while (m < (1u << 23))
m <<= 1, e--;
e += 0x7F + 23;
e -= n; // divide by 1<<n
m ^= 1u << 23; // reset the implicit 1
m |= (e & 0xFF) << 23; // mix in the exponent
m |= sign << 31; // mix in the sign
}
memcpy(&res, &m, sizeof m);
return res;
}
void Print4Bytes(unsigned char buf[4])
{
printf("%02X%02X%02X%02X ", buf[3], buf[2], buf[1], buf[0]);
}
int main(void)
{
int x = 0x35AA53;
int n;
for (n = 0; n < 31; n++)
{
float v1 = (float)x/(1u << n);
float v2 = div(x, n);
Print4Bytes((void*)&v1);
printf("%c= ", "!="[memcmp(&v1, &v2, sizeof v1) == 0]);
Print4Bytes((void*)&v2);
printf("%14.6f %14.6f\n", v1, v2);
}
return 0;
}
Output (ideone):
4A56A94C == 4A56A94C 3517011.000000 3517011.000000
49D6A94C == 49D6A94C 1758505.500000 1758505.500000
4956A94C == 4956A94C 879252.750000 879252.750000
48D6A94C == 48D6A94C 439626.375000 439626.375000
4856A94C == 4856A94C 219813.187500 219813.187500
47D6A94C == 47D6A94C 109906.593750 109906.593750
4756A94C == 4756A94C 54953.296875 54953.296875
46D6A94C == 46D6A94C 27476.648438 27476.648438
4656A94C == 4656A94C 13738.324219 13738.324219
45D6A94C == 45D6A94C 6869.162109 6869.162109
4556A94C == 4556A94C 3434.581055 3434.581055
44D6A94C == 44D6A94C 1717.290527 1717.290527
4456A94C == 4456A94C 858.645264 858.645264
43D6A94C == 43D6A94C 429.322632 429.322632
4356A94C == 4356A94C 214.661316 214.661316
42D6A94C == 42D6A94C 107.330658 107.330658
4256A94C == 4256A94C 53.665329 53.665329
41D6A94C == 41D6A94C 26.832664 26.832664
4156A94C == 4156A94C 13.416332 13.416332
40D6A94C == 40D6A94C 6.708166 6.708166
4056A94C == 4056A94C 3.354083 3.354083
3FD6A94C == 3FD6A94C 1.677042 1.677042
3F56A94C == 3F56A94C 0.838521 0.838521
3ED6A94C == 3ED6A94C 0.419260 0.419260
3E56A94C == 3E56A94C 0.209630 0.209630
3DD6A94C == 3DD6A94C 0.104815 0.104815
3D56A94C == 3D56A94C 0.052408 0.052408
3CD6A94C == 3CD6A94C 0.026204 0.026204
3C56A94C == 3C56A94C 0.013102 0.013102
3BD6A94C == 3BD6A94C 0.006551 0.006551
3B56A94C == 3B56A94C 0.003275 0.003275
