Converting floating point to unsigned int while preserving order - c

I have found a lot of answers on SO focusing on converting float to int.
I am manipulating only positive floating point values.
One simple method I have been using is this:
unsigned int float2ui(float arg0) {
float f = arg0;
unsigned int r = *(unsigned int*)&f;
return r;
}
The above code works well yet it fails to preserve the numeric order.
By order I mean this:
float f1 ...;
float f2 ...;
assert( ( (f1 >= f2) && (float2ui(f1) >= float2ui(f2)) ) ||
( (f1 < f2) && (float2ui(f1) < float2ui(f2)) ));
I have tried to use unions with the same results.
Any idea?
I use Homebrew gcc 5.3.0.

The code you're using, as written, has undefined behavior. If you want to access the representation of floats semi-portably (implementation-defined, well-defined assuming IEEE 754 and that float and integer endianness match), you should do:
#include <stdint.h> /* uint32_t */
#include <string.h> /* memcpy */
uint32_t float2ui(float f){
uint32_t r;
memcpy(&r, &f, sizeof r);
return r;
}
For non-negative values, this mapping between floating point values and representation is order-preserving. If you think you're seeing it fail to preserve order, we'll need to see exactly what values you think are a counterexample.
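To check the claim on your own values, here is a minimal sketch, assuming float is a 32-bit IEEE 754 value with the same endianness as uint32_t. The names float_bits and order_preserved are made up for this illustration and are not from the question.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the raw bit pattern of f.
   Assumes float is IEEE 754 binary32 and matches uint32_t endianness. */
uint32_t float_bits(float f)
{
    uint32_t r;
    _Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");
    memcpy(&r, &f, sizeof r);
    return r;
}

/* Returns 1 when the bit patterns of a and b compare the same way a and b do.
   Only meaningful for inputs with a clear sign bit that are not NaN. */
int order_preserved(float a, float b)
{
    assert(a == a && b == b);                 /* reject NaN */
    assert((float_bits(a) >> 31) == 0);       /* reject negatives and -0.0f */
    assert((float_bits(b) >> 31) == 0);
    return (a < b) == (float_bits(a) < float_bits(b))
        && (a == b) == (float_bits(a) == float_bits(b));
}
```

Calling order_preserved(f1, f2) on the pair you believe is a counterexample should show quickly whether the mapping or the test is at fault.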

If f1 and f2 are floating points, and f1 <= f2, and (int)f1 and (int)f2 are valid conversions, then (int)f1 <= (int)f2.
In other words, a truncation to an integral type never swaps an order round.
You could replace float2ui with simply (int)arg0, having checked the float is in the bounds of an int.
Note that the behaviour of float to int and float to unsigned is undefined if the truncated float value is out of the range for the type.
Your current code - somehow interpreting the float memory as int memory - has undefined behaviour. Even type-punning through a union will give you implementation-defined results; note in particular that sizeof(int) isn't necessarily the same as sizeof(float).
If you are using an IEEE754 single-precision float, a 32 bit 2's complement int with no trap representation, a positive value for conversion, consistent endianness, and some allowances for the various patterns represented by NaN and +-Inf, then the transformation effected by a type pun is order preserving.
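The bounds check mentioned above could look something like the following sketch. The function name float_to_int_or and the fallback parameter are invented for this example, and it assumes int has at most 53 value bits so that the double bounds are exact.

```c
#include <limits.h>

/* Hypothetical helper (not from the answer): returns (int)f when the truncated
   value fits in an int, and fallback otherwise (out of range or NaN).
   Assumes int has at most 53 value bits, so the double bounds below are exact. */
int float_to_int_or(float f, int fallback)
{
    double d = f;                               /* float to double is exact */
    if (d > (double)INT_MIN - 1.0 && d < (double)INT_MAX + 1.0)
        return (int)f;                          /* truncates toward zero */
    return fallback;                            /* NaN also lands here */
}
```

Because truncation never swaps two values, f1 <= f2 implies float_to_int_or(f1, x) <= float_to_int_or(f2, x) whenever both are in range, although distinct floats such as 1.2 and 1.9 collapse to the same int.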

Extracting the bits from a float using a union should work. There is some debate about whether the C standard actually supports this. But whatever the standard says, gcc seems to support it, and I would expect too much existing code to depend on it for compilers to remove support.
There are some things you must be aware of when putting a float in an int and keeping order.
Funny values like NaN do not have any order to keep.
Floats are stored as sign and magnitude, while ints are two's complement (assuming a sane architecture). So for negative values, you must flip all the bits except the sign bit.
If float and int do not have the same endianness on your architecture, you must also convert the endianness.
Here is my implementation, tested with gcc (Gentoo 6.4.0-r1 p1.3) 6.4.0 on x64
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
union ff_t
{
float f;
unsigned char a[4];
int i;
};
int same_endianess = 0;
void
swap_endianess(union ff_t *ff)
{
if (!same_endianess)
{
unsigned char tmp;
tmp = ff->a[0];
ff->a[0] = ff->a[3];
ff->a[3] = tmp;
tmp = ff->a[1];
ff->a[1] = ff->a[2];
ff->a[2] = tmp;
}
}
void
test_endianess()
{
union ff_t ff = { .f = 1 };
if (ff.i == 0x3f800000)
same_endianess = 1;
else if (ff.i == 0x803f)
same_endianess = 0;
else
{
fprintf(stderr, "Architecture has some weird endianess");
exit(1);
}
}
float
random_float()
{
float f = random();
f -= RAND_MAX/2;
return f;
}
int
f2i(float f)
{
union ff_t ff = { .f = f };
swap_endianess(&ff);
if (ff.i >= 0)
return ff.i;
/* flip every bit except the sign bit */
return ff.i ^ 0x7fffffff;
}
float
i2f(int i)
{
union ff_t ff;
if (i >= 0)
ff.i = i;
else
ff.i = i ^ 0x7fffffff;
swap_endianess(&ff);
return ff.f;
}
int
main()
{
/* Test if floats and ints uses the same endianess */
test_endianess();
for (int n = 0; n < 10000; n++)
{
float f1 = random_float();
int i1 = f2i(f1);
float f2 = random_float();
int i2 = f2i(f2);
printf("\n");
printf("0x%08x, %f\n", i1, f1);
printf("0x%08x, %f\n", i2, f2);
assert ( f1 == i2f(i1));
assert ( f2 == i2f(i2));
assert ( (f1 <= f2) == (i1 <= i2));
}
}

Related

How would I produce an integer from a float in the sense of removing the decimal point, despite floating-point precision errors?

In C, how can I produce, for example 314159 from 3.14159 or 11 from 1.1 floats? I may not use #include at all, and I am not allowed to use library functions. It must be completely cross platform, and fit in a single function.
I tried this:
while (Number-(int)Number) {
Number *= 10;
}
and this:
Number *= 10e6;
and floating-point precision errors get in my way. How can I do this? How can I accurately transform all digits in a float into an integer?
In response to a comment, they are a float argument to a function:
char *FloatToString(char *Dest, float Number, register unsigned char Base) {
if (Base < 2 || Base > 36 || !Dest) {
return (char *)0;
}
char *const RDest = Dest;
if (Number < 0) {
Number = -Number;
*Dest = '-';
Dest++;
}
register unsigned char WholeDigits = 1;
for (register unsigned int T = (int)Number/Base; T; T /= Base) {
WholeDigits++;
}
Dest[WholeDigits] = '.';
// I need to now effectively "delete" the decimal point to further process it. Don't answer how to convert a float to a string, answer the title.
return RDest;
}
The essential problem you have is that floating point numbers can't represent your example numbers, so your input is always going to be slightly different. So if you accurately produce output, it will be different from what you expect as the input numbers are different from what you think they are.
If you don't have to worry about very large numbers, you can do this most easily by converting to a long:
v = v - (long)v; // remove the integer part
int frac = (int)(v * 100000);
will give you the 5 digits after the decimal point. The problem with this is that it gives undefined behavior if the initial value is too large to be converted to a long. You might also want to round differently (converting to int truncates towards zero): if you want the closest value rather than the leading 5 digits of the fraction, you can use (int)(v * 100000 + (v > 0 ? 0.5 : -0.5))
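Put together, the rounding variant might be sketched like this. The name frac_digits5 and the range guard are assumptions made for the example; the guard is deliberately conservative so that the cast to long stays defined even where long is only 32 bits wide.

```c
#include <assert.h>

/* Hypothetical helper: returns the first five digits after the decimal point
   of v, rounded to nearest, with the sign of v. */
long frac_digits5(double v)
{
    double frac;
    /* Guard the cast below: (long)v is undefined when v is out of range. */
    assert(v > -2.0e9 && v < 2.0e9);
    frac = v - (double)(long)v;                 /* remove the integer part */
    /* Add half a unit in the direction of the sign, then truncate. */
    return (long)(frac * 100000.0 + (frac >= 0 ? 0.5 : -0.5));
}
```

Note that a fraction such as 0.999999 rounds up to 100000, so a caller that needs exactly five digits has to handle that carry separately.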
New version :
#include <stdio.h>
int main()
{
double x;
int i;
char s[10];
x = 9999.12504;
x = (x-(int)x);
sprintf(s,"%0.5g\n",x);
sscanf((s+2),"%d",&i);
printf("%d",i);
return 0;
}
Old version
#include <stdio.h>
int main()
{
float x;
int i;
x = -3.14159;
x = (x-(int)x);
if (x>=0)
i = 100000*x;
else
i = -100000*x;
printf("%d",i);
return 0;
}
#include <stdio.h>
#include <stdint.h>
#include <limits.h>
int main(void) {
double t = 0.12;
unsigned long x = 0;
t = (t<0)? -t : t; // To handle negative numbers.
for(t = t-(int)t; x < ULONG_MAX/10; t = 10*t-(int)(10*t))
{
x = 10*x+(int)(10*t);
}
printf("%lu\n", x);
return 0;
}
Output:
11999999999999999644
I feel like you should use modulo to get the decimal portion, convert it to a string, count the number of characters, and use that to multiply your remainder before casting it to an int.

IEEE 754 to decimal in C language

I'm looking for the best way to transform a float number to its decimal representation in C. I'll try to give you an example: the user enters a number in IEEE754 format (1 1111111 10101...) and the program has to return the decimal representation (ex. 25.6)
I've tried with masks, and bitwise operations, but I haven't got any logical result.
I believe the following is performing the operation you describe:
I use the int as an intermediate representation because it has the same number of bits as the float (on my machine), and it allowed easy conversion from the binary string.
#include <stdio.h>
union {
int i;
float f;
} myunion;
int binstr2int(char *s)
{
int rc;
for (rc = 0; '\0' != *s; s++) {
if ('1' == *s) {
rc = (rc * 2) + 1;
} else if ('0' == *s) {
rc *= 2;
}
}
return rc;
}
int main(void) {
// the input binary string (4 bytes)
char * input = "11000000110110011001100110011010";
float *output;
// convert to int, sizeof(int) == sizeof(float) == 4
int converted = binstr2int(input);
// strat 1: point memory of float at the int
output = (float*)&converted; // cast to suppress warning
printf("%f\n", *output); // -6.8
// strat 2: use a union to share memory
myunion.i = converted;
printf("%f\n", myunion.f); // -6.8
return 0;
}
As @DanielKamilKozar points out, the correct type for that int is uint32_t. However, that would require including <stdint.h>.
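For comparison, here is a rough sketch of the same conversion using uint32_t and memcpy instead of a pointer cast or union. The helper names binstr_to_u32 and u32_to_float are invented for this example, and it assumes an IEEE 754 binary32 float whose endianness matches the integer's.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: parses a string of '0'/'1' characters into a 32-bit
   pattern. Any other character (for example a space) is skipped. */
uint32_t binstr_to_u32(const char *s)
{
    uint32_t bits = 0;
    int digits = 0;
    for (; *s != '\0'; s++) {
        if (*s == '0' || *s == '1') {
            bits = (bits << 1) | (uint32_t)(*s - '0');
            digits++;
        }
    }
    assert(digits <= 32);   /* more digits would not fit in 32 bits */
    return bits;
}

/* Reinterprets a 32-bit pattern as a float without violating aliasing rules. */
float u32_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

With the input string from the answer, u32_to_float(binstr_to_u32(input)) should again give -6.8 when printed with %f.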

Two's complement and loss of information in C

I want to take the two's complement of a float value.
unsigned long Temperature ;
Temperature = (~(unsigned long)(564.48))+1;
But the problem is that the cast loses information, 564 instead of 564.48.
Can I do the two's complement without a loss of information?
That is a very weird thing to do; floating-point numbers are not stored as 2s complement, so it doesn't make a lot of sense.
Anyway, you can perhaps use the good old union trick:
union {
float real;
unsigned long integer;
} tmp = { 564.48 };
tmp.integer = ~tmp.integer + 1;
printf("I got %f\n", tmp.real);
When I tried it (on ideone) it printed:
I got -0.007412
Note that this relies on unspecified behavior, so it's possible it might break if your compiler does not implement the access in the most straightforward manner. This is distinct from undefined behavior (which would make the code invalid), but still not optimal. Someone did tell me that newer standards make it clearer, but I've not found an exact reference so ... consider yourself warned.
You can't use ~ over floats (it must be an integer type):
#include <stdio.h>
void print_binary(size_t const size, void const * const ptr)
{
unsigned char *b = (unsigned char *) ptr;
unsigned char byte;
int i, j;
for (i = size - 1; i >= 0; i--) {
for (j = 7; j >= 0; j--) {
byte = b[i] & (1 << j);
byte >>= j;
printf("%u", byte);
}
}
printf("\n");
}
int main(void)
{
float f = 564.48f;
char *p = (char *)&f;
size_t i;
print_binary(sizeof(f), &f);
for (i = 0; i < sizeof(float); i++) {
p[i] = ~p[i];
}
print_binary(sizeof(f), &f);
f += 1.f;
return 0;
}
Output:
01000100000011010001111010111000
10111011111100101110000101000111
Of course, print_binary is only there to test the result and can be removed. As pointed out by @barakmanos, print_binary assumes little endian; the rest of the code is not affected by endianness:
#include <stdio.h>
int main(void)
{
float f = 564.48f;
char *p = (char *)&f;
size_t i;
for (i = 0; i < sizeof(float); i++) {
p[i] = ~p[i];
}
f += 1.f;
return 0;
}
Casting a floating-point value to an integer value changes the "bit contents" of that value.
In order to perform two's complement on the "bit contents" of a floating-point value:
float f = 564.48f;
unsigned long Temperature = ~*(unsigned long*)&f+1;
Make sure that sizeof(long) == sizeof(float), or use double instead of float.
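One way to sidestep the size requirement is a fixed-width type. The following is only a sketch, assuming a 32-bit IEEE 754 float; the name negate_float_bits is made up for this example, and memcpy replaces the pointer cast to stay clear of aliasing problems.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the two's complement of the bit pattern of f.
   Assumes float is a 32-bit IEEE 754 value. */
uint32_t negate_float_bits(float f)
{
    uint32_t bits;
    _Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");
    memcpy(&bits, &f, sizeof bits);   /* copy the representation, no conversion */
    return (uint32_t)(~bits + 1u);    /* invert all bits, then add one */
}
```

For 564.48f the bit pattern is 0x440D1EB8, so the result is 0xBBF2E148, which is the pattern that prints as roughly -0.007412 when read back as a float.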

How would you count the number of bits set in a floating point number?

How do you count the number of bits set in a floating point number using C functions?
#include <stdio.h> /* for printf() */
#include <limits.h> /* for CHAR_BIT */
int main(void) {
/* union method */
{
/* a union can only be initialized for the first option in the union */
union { float f; char cs[sizeof(float)]; } const focs = { 1.0 };
int j,k;
int count = 0;
for (j = 0; j < sizeof(float); j++)
{
char const byte = focs.cs[j];
for (k = 0; k < CHAR_BIT; k++)
{
if ((1 << k) & byte)
{
count++;
}
}
}
printf("count(%2.1f) = %d\n", focs.f, count);
}
/* cast method */
{
float const f = 2.5;
int j,k;
int count = 0;
for (j = 0; j < sizeof(float); j++)
{
char const byte = ((char *)&f)[j];
for (k = 0; k < CHAR_BIT; k++)
{
if ((1 << k) & byte)
{
count++;
}
}
}
printf("count(%2.1f) = %d\n", f, count);
}
return 0;
}
If you want to work on the actual bitwise representation of a floating point number, you should do something like this:
float f; /* whatever your float is */
int i = *(int *)&f;
What this does is take the address of f with the address-of operator, &. This address is of type float *, a pointer to a float. Then it recasts it with (int *), which says "pretend this pointer doesn't point to a float anymore, but now it points to an int". Note that it doesn't change the value at f at all. Then the last * (or first, since we read right-to-left) dereferences this pointer, which is a pointer to an int, and therefore returns an int, a.k.a. the integer with the same bitwise representation as the float.
To do the opposite (convert an int i back to a float f), do the opposite:
f = *(float *)&i;
Unless I am mistaken, this operation is undefined by the C standard, but will probably work on most computers and compilers. It is undefined because reading a float object through an int pointer breaks the strict aliasing rule; on top of that, the actual floating-point representation of numbers is implementation-dependent, and can be left to the CPU or the compiler, so the value of i is hard to predict portably after this operation (same goes for the value of f in the reverse operation). It is famously used in John Carmack's inverse square root function for the same nefarious purpose.
Anyway, if you're doing this in real code, you should probably stop and think twice about what you're trying to do and why you're using floats to do it. However, if you're just doing this out of curiosity, or you have thought about these and are sure of your design and methods, go for it.
I'm led to believe that you already know how to count the number of bits set in a regular integer, as this is a much easier task. If you don't know, your compiler (or the C language, I don't even know) may have a function to count bits, or you could use something from the wonderful Bit-Twiddling Hacks website, which has ways to do things like this with bitwise operations (which should be pretty fast).
A nice function for counting set bits in an integer mentioned by the first answer:
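int NumberOfSetBits(int n)
{
/* work on an unsigned copy (assumed 32 bits) to avoid signed shift and overflow issues */
unsigned int i = (unsigned int)n;
i = i - ((i >> 1) & 0x55555555u);
i = (i & 0x33333333u) + ((i >> 2) & 0x33333333u);
return (int)((((i + (i >> 4)) & 0x0F0F0F0Fu) * 0x01010101u) >> 24);
}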
To use it on your float you would do something like this:
//...
float f;
//...
int numBitsOfF = NumberOfSetBits(*(int*) &f);
You mean the bits set in the IEEE-754 single precision representation of a number? If so, cast it to int (both float and int are 32bit wide) and do a regular bit count: SO question #109023.
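As a rough illustration of that idea, the sketch below copies the float into a uint32_t with memcpy rather than casting a pointer, then counts bits. The name count_float_bits is invented for this example, and a 32-bit IEEE 754 float is assumed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: counts the bits set in the representation of f.
   Assumes float is a 32-bit IEEE 754 value. */
int count_float_bits(float f)
{
    uint32_t bits;
    int count = 0;
    memcpy(&bits, &f, sizeof bits);        /* copy the representation */
    for (; bits != 0; bits &= bits - 1)    /* each round clears the lowest set bit */
        count++;
    return count;
}
```

For example, 1.0f is stored as 0x3F800000, which has seven bits set, and 2.5f is stored as 0x40200000, which has two.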
The following function will find the number of bits set in a 32-bit number. Just reinterpret your float as an integer with a pointer cast and call this function:
float f=3.14f;
count_bits(*(int *)&f);
int count_bits(int v)
{
// count the number of bits set in v
int c; // c accumulates the total bits set in v
int b=v;
for (c = 0; v; c++)
{
v &= v - 1; // clear the least significant bit set
}
//printf("No of bits in %d is %d\n",b,c);
return c;
}

What is the fastest way to test if a double number is integer (in modern intel X86 processors)

Our server application does a lot of integer tests in a hot code path, currently we use the following function:
inline int IsInteger(double n)
{
return n-floor(n) < 1e-8;
}
This function is very hot in our workload, so I want it to be as fast as possible. I also want to eliminate the "floor" library call if I can. Any suggestions?
Here are a couple of answers:
#include <stdint.h>
#include <stdio.h>
#include <math.h>
int IsInteger1(double n)
{
union
{
uint64_t i;
double d;
} u;
u.d = n;
int exponent = ((u.i >> 52) & 0x7FF) - 1023;
uint64_t mantissa = (u.i & 0x000FFFFFFFFFFFFFllu);
return n == 0.0 ||
exponent >= 52 ||
(exponent >= 0 && (mantissa << (12 + exponent)) == 0);
}
int IsInteger2(double n)
{
return n - (double)(int)n == 0.0;
}
int IsInteger3(double n)
{
return n - floor(n) == 0.0;
}
And a test harness:
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
int IsInteger1(double);
int IsInteger2(double);
int IsInteger3(double);
#define TIMEIT(expr, N) \
gettimeofday(&start, NULL); \
for(i = 0; i < N; i++) \
{ \
expr; \
} \
gettimeofday(&end, NULL); \
printf("%s: %f\n", #expr, (end.tv_sec - start.tv_sec) + 0.000001 * (end.tv_usec - start.tv_usec))
int main(int argc, char **argv)
{
const int N = 100000000;
struct timeval start, end;
int i;
double d = strtod(argv[1], NULL);
printf("d=%lf %d %d %d\n", d, IsInteger1(d), IsInteger2(d), IsInteger3(d));
TIMEIT((void)0, N);
TIMEIT(IsInteger1(d), N);
TIMEIT(IsInteger2(d), N);
TIMEIT(IsInteger3(d), N);
return 0;
}
Compile as:
gcc isinteger.c -O3 -c -o isinteger.o
gcc main.c isinteger.o -o isinteger
My results, on an Intel Core Duo:
$ ./isinteger 12345
d=12345.000000 1 1 1
(void)0: 0.357215
IsInteger1(d): 2.017716
IsInteger2(d): 1.158590
IsInteger3(d): 2.746216
Conclusion: the bit twiddling isn't as fast as I might have guessed. The extra branches are probably what kills it, even though it avoids floating-point operations. FPUs are fast enough these days that doing a double-to-int conversion or a floor really isn't that slow.
A while back I ran a bunch of timings on the most efficient way to convert between floats and integers, and wrote them up. I also timed techniques for rounding.
The short story for you is: converting from a float to an int, or using union hacks, is unlikely to be an improvement due to a CPU hazard called a load-hit-store -- unless the floats are coming from RAM and not a register.
Because it is an intrinsic, abs(floor(x)-eps) is probably the fastest solution. But because this is all very sensitive to the particular architecture of your CPU -- depending on very sensitive things like pipeline depth and store forwarding -- you'll need to time a variety of solutions to find one that is really optimal.
If doubles on your machine are IEEE-754 compliant, this union describes the double's layout.
#include <stdbool.h>
#include <stdint.h>
typedef union
{
double d;
struct
{
// Bit-field order is implementation-defined; this order (low bits first)
// matches GCC and Clang on little-endian x86.
uint64_t mantissa :52;
uint64_t exponent :11;
uint64_t sign :1;
};
} double_breakdown;
This will tell you if the double represents an integer.
Disclaimer 1: I'm saying integer, and not int, as a double can represent numbers that are integers but whose magnitudes are too great to store in an int.
Disclaimer 2: Doubles will hold the closest possible value that they can to any real number. So this can only possibly return whether the represented digits form an integer. Extremely large doubles, for example, are always integers because they don't have enough significant digits to represent any fractional value.
bool is_integer( double d )
{
const int exponent_offset = 1023;
const int mantissa_bits = 52;
double_breakdown db = { .d = d };
int exponent = (int)db.exponent;
uint64_t mantissa = db.mantissa;
// See if exponent is too large to hold a fractional value.
if ( exponent >= exponent_offset + mantissa_bits )
return true; // d can't represent non-integers (Inf and NaN also land here)
// See if exponent is too small to hold a magnitude of at least 1.0.
if ( exponent < exponent_offset )
return d == 0.0; // only zero is an integer in this range
// Return whether any mantissa bits below the binary point are set.
return ( ( mantissa << ( exponent - exponent_offset ) ) & 0xFFFFFFFFFFFFFull ) == 0;
}
If you really want to get hackish, see the IEEE 754 spec. Floating point numbers are implemented as a significand and an exponent. I'm not sure exactly how to do it, but you could probably do something like:
union {
float f;
unsigned int i;
}
This would get you bitwise access to both the significand and exponent. Then you could bit-twiddle your way around.
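To make the bit-twiddling a little more concrete, here is a sketch that splits a float into its three fields. The struct and function names are invented for this example, a 32-bit IEEE 754 float is assumed, and memcpy is used in place of the union so the code does not depend on union type-punning rules.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three fields of an IEEE 754 binary32 value. */
struct float_fields {
    unsigned sign;          /* 1 bit */
    unsigned exponent;      /* 8 bits, biased by 127 */
    uint32_t significand;   /* 23 stored bits, without the implicit leading 1 */
};

/* Hypothetical helper: extracts the fields of f with shifts and masks. */
struct float_fields split_float(float f)
{
    uint32_t bits;
    struct float_fields out;
    memcpy(&bits, &f, sizeof bits);
    out.sign = bits >> 31;
    out.exponent = (bits >> 23) & 0xFFu;
    out.significand = bits & 0x7FFFFFu;
    return out;
}
```

The same approach should carry over to double by using uint64_t, a 52-bit significand mask, and an 11-bit exponent biased by 1023.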
Another alternative:
inline int IsInteger(double n)
{
double dummy;
return modf(n, &dummy) == 0.0;
}

Resources