Unsure how to tell if an overflow is possible. I am given this sample code:
char x;
float f, g;
// some values get assigned to x and f here...
g = f + x;
Can someone please explain?
A float at its upper limit (binary exponent of 127) has only 24 significant bits (23 of them stored), so it cannot register a difference as small as the largest possible char (127, which needs 7 bits). Adding x therefore leaves f unchanged and cannot cause an overflow; roughly 127-7=120 bits of precision would be needed for the addition to have any effect.
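A quick way to check this claim is sketched below. It assumes IEEE-754 single precision and the default round-to-nearest mode; the helper names are made up for illustration.

```c
#include <float.h>

/* Hypothetical helper: returns 1 when f + x rounds back to f, i.e. the
   char is too small to change a float of that magnitude. */
int char_is_absorbed(float f, char x)
{
    volatile float g = f + x;   /* storing to a float drops any extra precision */
    return g == f;
}

/* Hypothetical helper: returns 1 if adding x to the largest finite float
   produces something larger than FLT_MAX (i.e. infinity). */
int overflows_at_flt_max(char x)
{
    volatile float g = FLT_MAX + x;
    return g > FLT_MAX;
}
```

Under those assumptions char_is_absorbed(FLT_MAX, 127) returns 1 and overflows_at_flt_max(127) returns 0, while char_is_absorbed(1.0f, 1) returns 0 because small floats do have room for the change.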
I want to convert negative float values to unsigned int values. Is it possible?
For example:
float x = -10000.0;
int y;
y = x;
When we assign x value to y, can the negative value be stored in an integer?
If not, how can we store the negative values into integer variables?
can the negative (float f) value be stored in an integer?
Yes, with limitations.
With a signed integer type like int16_t i, i = f is well defined for
-32768.999... to 32767.999...
With an unsigned integer type like uint16_t u, u = f is well defined for
-0.999... to 65535.999...
The result is a truncated value (fraction thrown away). All other float values result in undefined behavior.
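As a rough sketch of the unsigned case, the range above can be checked before converting. The function name float_to_u16 is hypothetical; both limits used here (-1.0 and 65536.0) are exactly representable as float.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: converts f to uint16_t only when the truncated
   value fits, i.e. -1.0 < f < 65536.0. NaN fails both comparisons. */
bool float_to_u16(float f, uint16_t *out)
{
    if (!(f > -1.0f && f < 65536.0f)) {
        return false;       /* out of range or NaN: the conversion would be undefined */
    }
    *out = (uint16_t)f;     /* the fraction is discarded (truncation toward zero) */
    return true;
}
```

For example, float_to_u16(-0.5f, &u) succeeds and stores 0, while -1.0f and 65536.0f are both rejected.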
If not, how can we store the negative values into integer variables?
Best to use wide signed integer types and test for range limitations.
In any case, the fraction of the float is lost. A -0.5f can be stored in an unsigned, yet the value becomes 0u.
The below performs some simple tests to ensure y is in range.
#include <limits.h>
#include <stdio.h>
float x = ...;
int y = 0;
if (x >= INT_MAX + 1u) puts("Too pos");
else if (x <= INT_MIN - 1.0) puts("Too neg");
else y = (int) x;
Note the tests above are illustrative as they lack high portability.
Example: INT_MIN - 1.0 is inexact in select situations.
To cope, with the common 2's complement int, the test is better reformed as below. With 2's complement, INT_MIN is a (negated) power of 2 and usually within the range of float, which makes the subtraction exact near the negative threshold.
// if (x <= INT_MIN - 1.0)
if (x - INT_MIN <= - 1.0f)
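Putting the pieces together, a checked conversion might look like the sketch below. It assumes a 2's complement int whose INT_MIN is exactly representable as float; the function name and the (INT_MAX / 2 + 1) * 2.0f expression for an exact "INT_MAX + 1" are choices made for this example, not part of the code above.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores the truncated value of x in *out and returns
   true only when that value fits in an int. NaN is rejected as well. */
bool float_to_int_checked(float x, int *out)
{
    /* INT_MAX / 2 + 1 is a power of 2, so doubling it as float is exact. */
    const float int_max_plus_1 = (INT_MAX / 2 + 1) * 2.0f;

    if (!(x < int_max_plus_1)) {
        return false;               /* too positive, or NaN */
    }
    if (!(x - INT_MIN > -1.0f)) {
        return false;               /* too negative */
    }
    *out = (int)x;                  /* fraction is thrown away */
    return true;
}
```

With x = -10000.0f this stores -10000, so a negative float fits in a signed int as long as it passes both tests.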
Another alternative is to explore a union. Leave that for others to explain its possibilities and limitations.
union {
float f;
unsigned u;
} x;
float x = 10000.0;
int a;
a = (int)(x+0.5);
I understand there are several topics same as mine, but I still don't really get it, so I'm expecting someone could explain this in a more simple but explicit way for me instead of pasting other topics' links, thanks.
Here's a sample code:
int a = 960;
int b = 16;
float c = a*0.001;
float d = a*0.001 + b;
double e = a*0.001 + b;
printf("%f\n%f\n%lf", c, d, e);
which outputs:
0.960000
16.959999
16.960000
My two questions are:
Why does adding an integer to a float end up as the second output, but changing float to double solves the problem as the third output?
Why does the third output have the same number of digits with the first and second output after the decimal point since it should be a more precise value?
They print the same number of decimal places because 6 is the default precision for %f. You can change that as in the edited example below, where the syntax is %.*f. The precision can be written as a literal number, or as a * whose value is supplied as an extra int argument (the second printf).
#include <stdio.h>
int main(void) {
int a = 960;
int b = 16;
float c = a*0.001;
float d = a*0.001 + b;
double e = a*0.001 + b;
printf("%.9f\n", c);
printf("%.*f\n", 9, d);
printf("%.16f\n", e);
}
Program output:
0.959999979
16.959999084
16.9600000000000009
The extra decimal places now show that none of the results is exact. One reason is that 0.001 cannot be exactly encoded as a floating point value. There are other reasons too, which have been extensively covered.
One easy way to understand why: a float can encode only about 2^32 different values, yet there are infinitely many real numbers within the range of float, so only about 2^32 of them can be represented exactly. The fraction 1/1000 is a recurring value in binary (just as 1/3 is in decimal).
I think the calculation a*0.001 will be done in double precision in both cases, then some precision is lost when you store it as a float.
You can choose how many decimal digits are printed by printf by writing e.g. "%.10lf" (to get 10 digits) instead of just "%lf".
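To see how much is lost when the double result is stored in a float, here is a small sketch (the function name is made up, and it assumes float is IEEE-754 single precision):

```c
/* Hypothetical helper: returns the error introduced by storing the
   double value v in a float and reading it back. */
double float_storage_error(double v)
{
    volatile float stored = (float)v;   /* rounds v to the nearest float */
    return (double)stored - v;
}
```

For a*0.001 + b with a = 960 and b = 16 the error is slightly below zero, a little under one millionth in magnitude, which matches the 16.959999084 versus 16.9600000000000009 outputs shown earlier. A value such as 0.5, which is exact in binary, gives an error of 0.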
I want to split a float number into two separate parts, a real part and a non-real part.
For example: if x = 45.678, then my function has to give real = 45 and non_real = 678. I have tried the following logic.
split ( float x, unsigned int *real, unsigned int *non_real)
{
*real = x;
*non_real = ((int)(x*N_DECIMAL_POINTS_PRECISION)%N_DECIMAL_POINTS_PRECISION);
printf ("Real = %d , Non_Real = %d\n", *real, *non_real);
}
where N_DECIMAL_POINTS_PRECISION = 10000. It would give decimal part till 4 digits, not after.
It works only for specific set of decimal point precision. The code is not generic, it has to work for all floating numbers also like 9.565784 and 45.6875322 and so on. So if anyone could help me on this, it would be really helpful.
Thanks in advance.
Use floor() to find the integer part, and then subtract the integer part from the original value to find the fractional part.
Note: The problem you're most likely having is that some numbers are too large for the integer part to fit in the range of an int.
--Added--
If and only if you are able to assume that an unsigned int is larger than the floating point representation's significand (e.g. 32-bit unsigned int and IEEE standard single-precision floating point with only 23 fractional bits, where "32 > 23" is true); then a number that is too large for an unsigned int can't have any fractional bits. This leads to a solution like:
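if (x >= UINT_MAX) {
    /* too large for unsigned int, so x has no fractional bits
       (UINT_MAX itself rounds up when converted to float) */
    integer_part = x;
    fractional_part = 0;
} else {
    integer_part = (unsigned int)x;   /* (int)x would overflow above INT_MAX */
    fractional_part = x - integer_part;
}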
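The same idea can be sketched as a complete function. This version assumes IEEE-754 single precision (24-bit significand), truncates toward zero instead of calling floor(), and returns both parts as float; the name split_float and its signature are hypothetical.

```c
/* Hypothetical helper: splits x into an integer part (truncated toward
   zero) and a fractional part with the same sign as x. */
void split_float(float x, float *integer_part, float *fractional_part)
{
    const float no_fraction_limit = 8388608.0f;   /* 2^23 */

    if (x != x || x >= no_fraction_limit || x <= -no_fraction_limit) {
        /* NaN, infinity, or a magnitude with no fractional bits left */
        *integer_part = x;
        *fractional_part = 0.0f;
    } else {
        *integer_part = (float)(long)x;   /* |x| < 2^23 always fits in a long */
        *fractional_part = x - *integer_part;
    }
}
```

Note that for 45.678f the fractional part comes out only approximately equal to 0.678, because 45.678 itself is not exactly representable as a float.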
I cannot figure out how to convert the value of a referenced float pointer when it is referenced from an integer casted into a float pointer. I'm sorry if I'm wording this incorrectly. Here is an example of what I mean:
#include <stdio.h>
main() {
int i;
float *f;
i = 1092616192;
f = (float *)&i;
printf("i is %d and f is %f\n", i, *f);
}
the output for f is 10. How did I get that result?
Normally, the value of 1092616192 in hexadecimal is 0x41200000.
In floating-point, that will give you:
sign = positive (0b)
exponent = 130, 2^3 (10000010b)
significand = 2097152, 1.25 (01000000000000000000000b)
2^3*1.25
= 8 *1.25
= 10
To explain: the exponent part uses an offset encoding, so you have to subtract 127 from it to get the real value. 130 - 127 = 3. And since this is a binary encoding, we use 2 as the base. 2 ^ 3 = 8.
As for the significand part, you start with an invisible 'whole' value of 1. The uppermost (leftmost) bit is worth half of that, 0.5. The next bit is worth half of 0.5, 0.25. Because only the 0.25 bit and the default '1' bit are set, the significand represents 1 + 0.25 = 1.25.
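The decoding steps above can be sketched in code. This is only an illustration for normal IEEE-754 single-precision values (no zero, subnormal, infinity or NaN handling), and the function name is made up.

```c
#include <stdint.h>

/* Hypothetical helper: decodes a normal IEEE-754 binary32 bit pattern
   by hand, following sign / exponent / significand. */
double decode_binary32(uint32_t bits)
{
    int sign = (int)((bits >> 31) & 1u);
    int exponent = (int)((bits >> 23) & 0xFFu) - 127;           /* remove the offset */
    double significand = 1.0 + (bits & 0x7FFFFFu) / 8388608.0;  /* implicit 1 + fraction / 2^23 */
    double scale = 1.0;

    for (int i = 0; i < exponent; i++) scale *= 2.0;   /* 2^exponent, positive case */
    for (int i = 0; i > exponent; i--) scale /= 2.0;   /* 2^exponent, negative case */

    return sign ? -(significand * scale) : significand * scale;
}
```

decode_binary32(0x41200000) works through exponent 3 and significand 1.25 and returns 10.0, as in the hand calculation.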
What you are trying to do is called type-punning. It should be done via a union, or using memcpy() and is only meaningful on an architecture where sizeof(int) == sizeof(float) without padding bits. The result is highly dependent on the architecture: byte ordering and floating point representation will affect the reinterpreted value. The presence of padding bits would invoke undefined behavior as the representation of float 15.0 could be a trap value for type int.
Here is how you get the number corresponding to 15.0:
#include <stdio.h>
int main(void) {
union {
float f;
int i;
unsigned int u;
} u;
u.f = 15;
printf("re-interpreting the bits of float %.1f as int gives %d (%#x in hex)\n",
u.f, u.i, u.u);
return 0;
}
output on an Intel PC:
re-interpreting the bits of float 15.0 as int gives 1097859072 (0x41700000 in hex)
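The memcpy() route mentioned above could look like the following sketch. It assumes float is 32 bits wide and uses uint32_t instead of int to sidestep sign questions; the function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the object representation of f as a
   32-bit unsigned integer, without pointer casts. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    _Static_assert(sizeof bits == sizeof f, "float must be 32 bits wide");
    memcpy(&bits, &f, sizeof bits);   /* copy the bytes, do not convert the value */
    return bits;
}
```

On an IEEE-754 machine float_bits(15.0f) gives 0x41700000, the same pattern the union printed, and float_bits(10.0f) gives 0x41200000.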
You are trying to predict the consequence of an undefined activity. It depends on a lot of random things, and on the hardware and OS you are using.
Basically, what you are doing is throwing a glass against the wall and getting a certain shard. Now you are asking how to get a differently formed shard. Well, you need to throw the glass differently against the wall...
In C programming, I have found a weird problem that runs counter to my intuition. When I declare an integer as INT_MAX (2147483647, defined in limits.h) and implicitly convert it to a float value, it works fine, i.e., the float value is the same as the maximum integer. Then, when I convert the float back to an integer, something interesting happens: the new integer becomes the minimum integer (-2147483648).
The source code looks as below:
int a = INT_MAX;
float b = a; // b is correct
int a_new = b; // a_new becomes INT_MIN
I am not sure what happens when the float number b is converted to the integer a_new. So, is there any reasonable solution to find the maximum value which can be switched forth and back between integer and float type?
PS: The value of INT_MAX - 100 works fine, but this is just an arbitrary workaround.
This answer assumes that float is an IEEE-754 single precision float encoded as 32-bits, and that an int is 32-bits. See this Wikipedia article for more information about IEEE-754.
Floating point numbers only have 24-bits of precision, compared with 32-bits for an int. Therefore int values from 0 to 16777215 have an exact representation as floating point numbers, but numbers greater than 16777215 do not necessarily have exact representations as floats. The following code demonstrates this fact (on systems that use IEEE-754).
for ( int a = 16777210; a < 16777224; a++ )
{
float b = a;
int c = b;
printf( "a=%d c=%d b=0x%08x\n", a, c, *((int*)&b) );
}
The expected output is
a=16777210 c=16777210 b=0x4b7ffffa
a=16777211 c=16777211 b=0x4b7ffffb
a=16777212 c=16777212 b=0x4b7ffffc
a=16777213 c=16777213 b=0x4b7ffffd
a=16777214 c=16777214 b=0x4b7ffffe
a=16777215 c=16777215 b=0x4b7fffff
a=16777216 c=16777216 b=0x4b800000
a=16777217 c=16777216 b=0x4b800000
a=16777218 c=16777218 b=0x4b800001
a=16777219 c=16777220 b=0x4b800002
a=16777220 c=16777220 b=0x4b800002
a=16777221 c=16777220 b=0x4b800002
a=16777222 c=16777222 b=0x4b800003
a=16777223 c=16777224 b=0x4b800004
Of interest here is that the float value 0x4b800002 is used to represent the three int values 16777219, 16777220, and 16777221, and thus converting 16777219 to a float and back to an int does not preserve the exact value of the int.
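A way to test exactness without converting back to int (and without the pointer cast) is sketched below. It assumes a 32-bit int and an IEEE-754 double, which can hold every such int exactly; the function name is made up.

```c
/* Hypothetical helper: returns 1 when a has an exact float representation. */
int is_exact_as_float(int a)
{
    volatile float b = (float)a;       /* may round to a neighbouring float */
    return (double)b == (double)a;     /* compare in double, where both are exact */
}
```

Under those assumptions 16777216 and 16777220 are reported as exact, while 16777217 and 16777219 are not.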
The two floating point values that are closest to INT_MAX are 2147483520 and 2147483648, which can be demonstrated with this code
for ( int a = 2147483520; a < 2147483647; a++ )
{
float b = a;
int c = b;
printf( "a=%d c=%d b=0x%08x\n", a, c, *((int*)&b) );
}
The interesting parts of the output are
a=2147483520 c=2147483520 b=0x4effffff
a=2147483521 c=2147483520 b=0x4effffff
...
a=2147483582 c=2147483520 b=0x4effffff
a=2147483583 c=2147483520 b=0x4effffff
a=2147483584 c=-2147483648 b=0x4f000000
a=2147483585 c=-2147483648 b=0x4f000000
...
a=2147483645 c=-2147483648 b=0x4f000000
a=2147483646 c=-2147483648 b=0x4f000000
Note that all 32-bit int values from 2147483584 to 2147483647 will be rounded up to a float value of 2147483648. The largest int value that will round down is 2147483583, which is the same as (INT_MAX - 64) on a 32-bit system.
One might therefore conclude that int values up to (INT_MAX - 64) will convert from int to float and back to int without overflowing, although values above 16777216 may not come back unchanged. But that is only true on systems where the size of an int is 32-bits, and a float is encoded per IEEE-754.
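Rather than hard-coding a limit, the conversion back can be guarded, as in this sketch. The function name is hypothetical, and the (INT_MAX / 2 + 1) * 2.0f expression is just one way to write INT_MAX + 1 exactly as a float when INT_MAX + 1 is a power of 2.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: converts a to float and back, refusing the
   conversion back when the float has been rounded past INT_MAX. */
bool int_via_float(int a, int *back)
{
    volatile float b = (float)a;
    const float int_max_plus_1 = (INT_MAX / 2 + 1) * 2.0f;

    if (b >= int_max_plus_1) {
        return false;      /* e.g. INT_MAX rounds up to 2147483648.0f */
    }
    *back = (int)b;        /* in range, though not necessarily equal to a */
    return true;
}
```

With a 32-bit int and IEEE-754 float, INT_MAX - 64 is accepted and comes back as 2147483520, while INT_MAX - 63 and INT_MAX are rejected.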