casting signed to double different result than casting to float then double - c

As part of an assignment, I am checking whether the expression (double)(float)x == (double)x always returns 1 (x is a signed int).
It works for every value except INT_MAX. Why is that? When I print the values, they show the same value, even for INT_MAX.
x = INT_MAX ;
printf("Signed X: %d\n",x);
float fx1 = (float)x;
double dx1 = (double)x;
double dfx = (double)(float)x;
printf("(double) x: %g\n",dx1);
printf("(float) x: %f \n",fx1);
printf("(double)(float)x: %g\n",dfx);
if((double) (float) x == (double) x){
printf("RESULT:%d\n", ((double)(float) x == (double) x));
}
EDIT: the entire program:
#include<stdio.h>
#include<stdlib.h>
#include<limits.h>
int main(int argc, char *argv[]){
// value under test
int x = INT_MAX ;
printf("Signed X: %d\n",x);
float fx1 = (float)x;
double dx1 = (double)x;
double dfx = (double)(float)x;
printf("(double) x: %g\n",dx1);
printf("(float) x: %f \n",fx1);
printf("(double)(float)x: %g\n",dfx);
if((double) (float) x == (double) x){
printf("RESULT:%d\n", ((double)(float) x == (double) x));
}
return 0;
}//end of main function

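int and float most likely both use 32 bits. A float (IEEE-754 binary32) spends 1 bit on the sign and 8 bits on the exponent, which leaves a 24-bit significand (23 stored bits plus an implicit leading 1). That is not enough for all 31 value bits of a large int such as INT_MAX = 2^31 - 1, so (float)INT_MAX rounds to 2^31 = 2147483648 and precision is lost. A double has a 53-bit significand and holds every 32-bit int exactly. The comparison therefore fails not only for INT_MAX but for many ints above 2^24 (the first positive one is 16777217). The printed values look the same only because %g shows six significant digits, so both doubles print as 2.14748e+09.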

Related

Compare 2 floats by their bitwise representation in C

I had this question on my exam and couldn't really solve it; I would appreciate some help.
Fill in the blanks only; the function must return true if and only if x < y.
Assume x and y cannot be NaN (but can be +-inf). No casting is allowed; use only ux, uy, sx, sy.
bool func(float x, float y) {
unsigned* uxp = ______________ ;
unsigned* uyp = ______________ ;
unsigned ux = *uxp;
unsigned uy = *uyp;
unsigned sx = (ux>>31);
unsigned sy = (uy>>31);
return ___________________________;
}
Presumably the assignment assumes float uses IEEE-754 binary32 and unsigned is 32 bits.
It is not proper to alias float objects with an unsigned type, although some C implementations support it. Instead, you can create a compound literal union, initialize its float member with the float value, and access its unsigned member. (This is supported by the C standard but not by C++.)
After that, it is simply a matter of dividing the comparison into cases depending on the sign bits:
#include <stdbool.h>
bool func(float x, float y) {
unsigned* uxp = & (union { float f; unsigned u; }) {x} .u;
unsigned* uyp = & (union { float f; unsigned u; }) {y} .u;
unsigned ux = *uxp;
unsigned uy = *uyp;
unsigned sx = (ux>>31);
unsigned sy = (uy>>31);
return
sx && sy ? uy < ux : // Negative values are in "reverse" order.
sx && !sy ? (uy | ux) & 0x7fffffffu : // Negative x is always less than positive y except for x = -0 and y = +0.
!sx && sy ? 0 : // Positive x is never less than negative y.
ux < uy ; // Positive values are in "normal" order.
}
#include <stdio.h>
int main(void)
{
// Print expected values and function values for comparison.
printf("1, %d\n", func(+3, +4));
printf("1, %d\n", func(-3, +4));
printf("0, %d\n", func(+3, -4));
printf("0, %d\n", func(-3, -4));
printf("0, %d\n", func(+4, +3));
printf("1, %d\n", func(-4, +3));
printf("0, %d\n", func(+4, -3));
printf("1, %d\n", func(-4, -3));
}
Sample output:
1, 1
1, 1
0, 0
0, 0
0, 0
1, 1
0, 0
1, 1

Separating float integral and fractional

I am trying to separate a float into its integral and fractional parts. My method works fine for some values, but not when I encounter a value that has a longer decimal representation.
fractalValue = modf(value, &z);
l_integral = z;
l_fractal = fractalValue * 1000.0;
For example, when I have value = 13.1800, this works fine.
But when value = 2.24213798933e-36 the program fails.
In this instance, modf returns 2.24213798933e-36 (stored in fractalValue) and stores 0 in z.
2.24213798933e-36 is 2.24213798933 × 10^-36: a decimal point followed by 35 zeros and then 224213798933. The integral portion of the value is 0, so modf returns the entire value as the fractional portion, and multiplying that by 1000.0 gives about 2.24e-33, which is still effectively 0 (and exactly 0 once it is converted to an integer type).
You are getting entirely expected results. A non-zero leading digit doesn't mean the integer portion of the value isn't 0; it just means the value is written in normalized form, with a single non-zero digit to the left of the decimal point. 1.23e-3 is the normalized form of 0.00123, and your code would return 0 and 1.23 for it.
If you want the parts of the number as it appears in exponent form, so that 2.435343454e36 gives 2 and 0.435343454, you need to "normalize" the number first (remove the e part). Then you can apply your formulas to get the parts:
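#include <math.h>
#include <stdio.h>

double remove_e(double val)
{
    int sign = val < 0 ? -1 : 1;
    if (val != 0.0)
    {
        val *= sign;                       // work with |val|
        double logten = log10(val);
        val /= pow(10, floor(logten));     // scale into [1, 10); ceil(logten) - 1 gave 10 for exact powers of ten
    }
    return sign * val;
}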
int main(void)
{
double x = 2.34546465;
printf("%e -> %f\n", x, remove_e(x));
x = 0.0;
printf("%e -> %f\n", x, remove_e(x));
x = 2.34546465e-36;
printf("%e -> %f\n", x, remove_e(x));
x = 2.34546465e36;
printf("%e -> %f\n", x, remove_e(x));
x = -0.0;
printf("%e -> %f\n", x, remove_e(x));
x = -2.34546465e-36;
printf("%e -> %f\n", x, remove_e(x));
x = -2.34546465e36;
printf("%e -> %f\n", x, remove_e(x));
}
https://godbolt.org/z/MevdfP

Problems with rounding a float number

I have the following source code but the result is not rounding to 2 decimal places.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
int main(int argc, char *argv[])
{
float x1=0;
float x2 = 0;
float result=0;
x1 = 8961.650391;
result = x1 * 100 + 0.5;
result = (float)floor(result);
printf("Result = <%f>\n", result);
result = result/100;
x2 = result;
printf("x2 = <%f>\n", x2);
return 0;
}
Please help to resolve the problem.
Result = <896165.000000>
x2 = <8961.650391>
How can I obtain x3 = 8961.650000?
Use "%0.2f" instead of "%f"; it prints the value rounded to 2 decimal places:
x2= roundf(result * 100) / 100;
printf("x2 = <%0.2f>\n", x2);
A float can typically represent about 2^32 different numbers exactly. After all, it is typically encoded using 32 bits.
8961.65 is not one of them. The closest float to 8961.65 is 8961.650390625f. The code below shows the previous and subsequent floats.
To print a float to the nearest 0.01, use "%.2f", as @pritesh agrawal rightly suggested.
I recommend rounding with rint() or round().
#include <math.h>
#include <stdio.h>

int main(void) {
float x = 8961.650391f;
float x100 = rint(x * 100.0);
float result = x100 / 100.0f;
printf("%f %.2f\n", nextafterf(x, 0), nextafterf(x, 0));
printf("%f %.2f\n", x, x);
printf("%f %.2f\n", nextafterf(x, x * 2), nextafterf(x, x * 2));
printf("%f %.2f\n", x100, x100);
printf("%f %.2f\n", result, result);
return 0;
}
Output
8961.649414 8961.65
8961.650391 8961.65
8961.651367 8961.65
896165.000000 896165.00
8961.650391 8961.65
How can I obtain x3 = 8961.650000?
x3 cannot hold exactly 8961.650000, because that value is not representable as a float. To print a value rounded to 2 decimal places followed by 4 zeros, the following can be used, but it is a bit of chicanery:
printf("%.2f0000\n", 8961.650390625f);
// output 8961.650000

Manually implementing a rounding function in C

I have written a C program (part of my project) to round a float value to a precision specified by the user. The function looks like this:
float round_offf (float num, int precision)
What I do is convert the float number into a string and then process it.
Is there a way to keep the number as a float and do the same thing?
E.g. num = 4.445, prec = 1, result = 4.4
Of course there is. Very simple:
#include <math.h>

float custom_round(float num, int prec)
{
    double scale = pow(10, prec);
    double scaled = round(num * scale);   // round half away from zero; no int overflow
    return (float)(scaled / scale);
}
Edit: it seems to me that you want this because you think you can't have dynamic precision in a format string. Apparently, you can:
int precision = 3;
double pie = 3.14159265358979323846; // I'm hungry, I need a double pie
printf("Pi equals %.*f\n", precision, pie);
This prints 3.142.
Yes:
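#include <math.h>

// Rounds half away from zero; assumes num * 10^(precision + 1) fits in an int.
float round_offf(float num, int precision)
{
    int result;
    int power;
    power = (int)pow(10, precision + 1);
    result = (int)(num * power);     // keep one extra digit, truncated
    if (result % 10 >= 5)            // round up on 5 or more
        result += 10;
    else if (result % 10 <= -5)      // mirror for negative numbers
        result -= 10;
    result /= 10;                    // drop the extra digit
    return (float)result / (float)(power / 10);
}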

Function to scale float values to (0-100) in C

I am trying to convert a float variable into an integer between 0 and 100. The float is always positive. The integer should reflect the size of the float relative to the maximum value of a 32-bit float, e.g. 0.0 converts to 0, 3.402823466E+38 converts to 100, and anything else falls in between.
Here is what I have so far but I keep getting -1 as the output for any non-zero input.
int convFloat(float x){
int y;
y = (int) (x/3.4e38) * 100;
return y;
}
What am I doing wrong here?
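A cast binds more tightly than *, so this converts the quotient to int first, truncating it to 0 whenever it is below 1, and only then multiplies by 100: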
y = (int) (x/3.4e38) * 100;
// ^--------------^
// cast (x/3.4e38)to int
Should be:
y = (int) ((x/3.4e38) * 100);
// ^----------------------^
// cast ((x/3.4e38) * 100)to int
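Alternatively, for a logarithmic rather than linear scale, map the 8-bit biased exponent field (bits 23 to 30 of an IEEE-754 binary32 float; uint32_t comes from <stdint.h>) from 0..255 to 0..100:
((union { float f; uint32_t u; }){ val }.u>>23&255)*100/255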
