Implicit conversion of integers to floats in C

I'm having trouble understanding why this code's output is 2147483648:
#include <stdio.h>
int main (void){
float f = 2147483638;
printf("%f",f);
}
I tried to find an explanation using the IEEE 754 standard for float representation, but by my calculations the output should be 2147483520, not 2147483648.
Thanks for help!

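That is the way that float works on your system. Assuming an IEEE 754 32-bit float, the significand has 24 bits, so between 2^30 and 2^31 the representable values are 128 apart. 2147483638 lies between 2147483520 (2^31 - 128) and 2147483648 (2^31): it is 118 away from the former and only 10 away from the latter, and typical implementations round to the nearest value, which is 2147483648. Your figure of 2147483520 is what truncation would give.
Note that the C standard is intentionally flexible as to the types and sizes of the floating point types. A float does not have to be an IEEE 754 32-bit floating point type.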

Related

GNU MP low precision while using mpf_pow function

While writing this answer, I used the mpf_pow function to calculate 12.3 ^ 123, and the result is different from the one given by WolframAlpha (which by the way also uses GMP).
I converted the code to plain C to simplify:
#include <stdio.h>
#include <gmp.h>
int main (void) {
mpf_t a, c;
unsigned long int b = 123UL;
mpf_set_default_prec(100000);
mpf_inits(a, c, NULL);
mpf_set_d(a, 12.3);
mpf_pow_ui(c, a, b);
gmp_printf("c = %.50Ff\n", c);
return 0;
}
Which results in
114374367934618002778643226182707594198913258409535335775583252201365538178632825702225459029661601216944929436371688246107986574246790.32099077871758646985223686110515186972735931183764
While WolframAlpha returns
1.14374367934617190099880295228066276746218078451850229775887975052369504785666896446606568365201542169649974727730628842345343196581134895919942820874449837212099476648958359023796078549041949007807220625356526926729664064846685758382803707100766740220839267 × 10^134
which starts to disagree with mpf_pow at the 15th digit.
Am I doing something wrong in the code, is this a limitation of GMP, or is WolframAlpha giving an incorrect result?
You are doing something different from what Wolfram is doing (obviously). Your code is not wrong, per se, but it is not doing what you probably think it is doing. Compare the output of this variation:
#include <stdio.h>
#include <gmp.h>
int main (void) {
mpf_t a, c;
unsigned long int b = 123UL;
mpf_set_default_prec(100000);
mpf_inits(a, c, NULL);
mpf_set_d(a, 12.3);
mpf_pow_ui(c, a, b);
gmp_printf("c = %.50Ff\n", c);
putchar('\n');
mpf_t a1, c1;
mpf_inits(a1, c1, NULL);
mpf_set_str(a1, "12.3", 10);
mpf_pow_ui(c1, a1, b);
gmp_printf("c' = %.50Ff\n", c1);
return 0;
}
...
c = 114374367934618002778643226182707594198913258409535335775583252201365538178632825702225459029661601216944929436371688246107986574246790.32099077871758646985223686110515186972735931183764
c' = 114374367934617190099880295228066276746218078451850229775887975052369504785666896446606568365201542169649974727730628842345343196581134.89591994282087444983721209947664895835902379607855
The difference between the two output values arises because my C implementation and yours represent values of type double in binary floating point, and 12.3 is not exactly representable in binary floating point (see Is floating point math broken?). C provides the closest approximation available, which, assuming 64-bit IEEE 754 representation, matches to about 15 decimal digits of precision. When you initialize a GMP variable with such a value, you get an exact GMP representation of the actual double value, which is only an approximation to 12.3 decimal.
But GMP can represent 12.3 (decimal) to whatever precision you choose.* You chose a very high precision, so when you use a decimal string to initialize your MP-float variable you get a much closer approximation than when you used a double. Naturally, performing the same operation on those different values produces different results. The GMP result in the latter case appears to agree with the Wolfram result to the full precision in which it is expressed.
Note also that in a general sense, one can also use decimal floating-point, in software or (if you are so equipped) in hardware. The value 12.3 (decimal) can be represented exactly in such a format, but that's not what GMP uses.
* Or indeed, GMP can represent 12.3 exactly as an MP rational, though that's not what the code above does.
This gives a result similar to WolframAlpha's:
from decimal import Decimal
from decimal import getcontext
getcontext().prec = 200
print(Decimal('12.3') ** 123)
So you must be doing something wrong in your GMP configuration.

Facing 'invalid operands to binary expression ('float' and 'float')' while using C

While doing CS50 problem set 1 (Cash), I ran into the following error when compiling my code. I have cast the values to int, so why does it still happen? Thanks a lot for the help.
"invalid operands to binary expression ('float' and 'float')"
#include <stdio.h>
#include <cs50.h>
#include <math.h>
int main(){
float owe_in_dollars;
float owe_in_cent;
int coin_count = 0;
do
{
owe_in_dollars = get_float("Change: ");
}while(owe_in_dollars<0);
owe_in_cent = (int)(owe_in_dollars*100);
if (owe_in_cent%(int)25 > 0){
coin_count++;
}
printf("%i", coin_count);
}
There are several issues with this code, but I think the particular problem which produces the compiler error is
if (owe_in_cent%(int)25 > 0){
owe_in_cent is a float. There is no reason for it to be floating point, since you have assigned an integer value to it. But you declared it float, so that's what it is. 25 is an int, so there's no point in casting it to an int, but with or without the cast, it will be converted to a float in order to do arithmetic with owe_in_cent, because all arithmetic operators require that their operands be of the same type. Search for "usual arithmetic conversions" for details, but the bottom line is that these automatic conversions are always integer → floating point, never floating point → integer.
Then the problem shows up, because the % operator requires its operands to be integers, not floating point. There is a math function, fmod, which computes a floating point modulus, but you really want integer arithmetic, so your best bet is to make owe_in_cent an int rather than a float.
And actually, you really should get into the habit of using double for floating point values. float is very imprecise and, other than in video chips and embedded processors, there's no point in using so inexact a representation. It saves you nothing.
Finally, remember two important facts about floating point:
It cannot precisely represent fractions whose denominators are not powers of two. In other words, 5.25 has an exact representation, because .25 is one-quarter, which is a power of two, but 5.26 cannot be exactly represented and will end up being a number either slightly greater than or slightly less than 5.26. When you multiply that number by 100, you will end up with something which is slightly more or slightly less than 526.
Casting a floating point number to an integer just drops the fractional part, no matter how close to 1.0 it is. So, for example, (int)525.9997 is 525, not 526. You should be able to see the problem that could produce.
There is a library function called round which rounds a floating point number to the closest integer, which is probably what you wanted.

How to guarantee exact size of double in C?

So, I am aware that types from the stdint.h header provide standardized-width integer types. However, I am wondering what type or method one uses to guarantee the size of a double or other floating point type across platforms. Specifically, this would deal with packing data in a void*
#include <stdio.h>
#include <stdlib.h>
void write_double(void* buf, double num)
{
*(double*)buf = num;
}
double read_double(void* buf)
{
return *(double*)buf;
}
int main(void) {
void* buffer = malloc(sizeof(double));
write_double(buffer, 55);
printf("The double is %f\n", read_double(buffer));
return 0;
}
Say like in the above program, if I wrote that void* to a file or if it was used on another system, would there be some standard way to guarantee size of a floating point type or double?
Use _Static_assert()
#include <limits.h>
int main(void) {
_Static_assert(sizeof (double)*CHAR_BIT == 64, "Unexpected double size");
return 0;
}
_Static_assert is available since C11. Otherwise, code could use a run-time assert.
#include <assert.h>
#include <limits.h>
int main(void) {
assert(sizeof (double)*CHAR_BIT == 64);
return 0;
}
Although this will ensure the size of a double is 64 bits, it does not ensure adherence to the IEEE 754 double-precision binary floating-point format.
Code could use __STDC_IEC_559__
"An implementation that defines __STDC_IEC_559__ shall conform to the specifications in this annex." (C11 Annex F, IEC 60559 floating-point arithmetic)
Yet that may be too strict. Many implementations adhere to most of that standard, yet still do not set the macro.
would there be some standard way to guarantee size of a floating point type or double?
The best guarantee is to write the FP value as its hex representation or as an exponential with sufficient decimal digits. See Printf width specifier to maintain precision of floating-point value
The problem with floating point type is that the C standard doesn't specify how they should be represented. The use of IEEE 754 is not required.
If you're communicating between a system that uses IEEE 754 and one that doesn't, you won't be able to write on one and read on the other even if the sizes are the same.
You need to serialize the data in a known format. You can either use sprintf to convert it to a text format, or you can do some math to determine the exponent and mantissa (for example with frexp) and store those.
Floating point values are defined in the IEEE Standard for Floating-Point Arithmetic (IEEE 754) and have standard sizes:
float, in full "single precision floating point number": 32 bits
double, in full "double precision floating point number": 64 bits
The following also exist:
Half-precision floating-point format
Quadruple precision floating-point format
Extended precision floating-point format
This format is reused in the C11 standard, Annex F "IEC 60559 floating-point arithmetic" of ISO/IEC 9899:2011(en).
Why use CHAR_BIT and assert at runtime? We can do this at compile time.
void write_double(void* buf, double num)
{
char checkdoublesize[(sizeof(double) == 8)?1:-1];
*(double*)buf = num;
}
Your code is still not portable, as it doesn't guarantee IEEE 754 or endianness, but it will catch a bad double size. If your platform is new enough to provide htonq, this will allow endianness to work:
void write_double(void* buf, double num)
{
char checkdoublesize[(sizeof(double) == 8)?1:-1];
*(int64_t*)buf = htonq(*(volatile int64_t*)&num);
}
double read_double(void* buf)
{
int64_t n = ntohq(*(int64_t*)buf);
return *(volatile double*)&n;
}
Where volatile is merely the shortest way to tell the compiler the pointer cast really is defined. Usually it does the right thing anyway but after N levels of inlining maybe it won't anymore.

Values double have in C

Which numbers can a double type contain? (in C language)
I was trying to find the numbers that double can contain in c.
I know that a float can contain numbers between -10^38
In C, a double is usually an IEEE double. The standard does not require this, but these days it would be unusual for it to be something else. Here's some info on double precision formats, particularly IEEE: double-precision floating point formats

Is float better than double sometimes?

I was solving this problem on spoj http://www.spoj.com/problems/ATOMS/. I had to give the integral part of log(m / n) / log(k) as output. I had taken m, n, k as long long. When I was calculating it using long doubles, I was getting a wrong answer, but when I used float, it got accepted.
printf("%lld\n", (long long)(log(m / (long double)n) / log(k)));
This was giving a wrong answer but this:
printf("%lld\n", (long long)((float)log(m / (float)n) / (float)log(k)));
got accepted. So are there situations when float is better than double with respect to precision?
A float is never more accurate than a double since the former must be a subset of the latter, by the C standard:
6.2.5/6: "The set of values of the type float is a subset of the set of values of the type double; the set of values of the type double is a subset of the set of values of the type long double."
Note that the standard does not insist on a particular floating point representation although IEEE754 is particularly common.
It might be better in some cases in terms of calculation time/space performance. One example that is just on the table in front of me - an ARM Cortex-M4F based microcontroller, having a hardware Floating Point Unit (FPU), capable of working with single-precision arithmetic, but not with double precision, which is giving an incredible boost to floating point calculations.
Try this simple code:
#include<stdio.h>
int main(void)
{
float i=3.3;
if(i==3.3)
printf("Equal\n");
else
printf("Not Equal\n");
return 0;
}
Now try the same with double as a datatype of i.
double will always give you more precision than a float.
With double, you encode the number using 64 bits, while you're using only 32 bits with float.
Edit: As Jens mentioned it may not be the case. double will give more precision only if the compiler is using IEEE-754. That's the case of GCC, Clang and MSVC. I haven't yet encountered a compiler which didn't use 32 bits for floats and 64 bits for doubles though...

Resources