Assign double constant to float variable without warning in C?

In C, a floating-point constant has type double by default, so 3.1415 is a double unless an 'f' or 'F' suffix marks it as a float.
I assumed that const float pi = 3.1415 would cause a warning, but it does not.
When I try the following under gcc with -Wall:
float f = 3.1415926;
double d = 3.1415926;
printf("f: %f\n", f);
printf("d: %f\n", d);
f = 3.1415926f;
printf("f: %f\n", f);
int i = 3.1415926;
printf("i: %d\n", i);
the result is:
f: 3.141593
d: 3.141593
f: 3.141593
i: 3
The results (including the double variable) obviously lose precision, yet the code compiles without any warning.
So what did the compiler do here? Or did I misunderstand something?

-Wall does not enable warnings about loss of precision, truncation of values, etc. because these warnings are annoying noise and "fixing" them requires cluttering correct code with heaps of ugly casts. If you want warnings of this nature you need to enable them explicitly.
Also, your use of printf has nothing to do with the precision of the actual variables, just the precision printf is printing at, which defaults to 6 places after the decimal point.
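As a rough illustration of both points (a sketch only; the helper names narrowing_changed_value and format_fixed are made up for this example), the functions below check whether storing the constant in a float changed its value, and format a value with a caller-chosen number of decimal places:

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if storing the double constant in a
   float changed its value, 0 otherwise. */
int narrowing_changed_value(void)
{
    float f = 3.1415926;    /* double constant converted to float */
    double d = 3.1415926;   /* stored without conversion */
    return (double)f != d;
}

/* Hypothetical helper: formats v with 'digits' places after the decimal
   point into buf and returns the length reported by snprintf. */
int format_fixed(char *buf, size_t size, double v, int digits)
{
    return snprintf(buf, size, "%.*f", digits, v);
}
```

With digits set to 6 this matches what a plain %f prints; passing a larger value such as 10 should make the difference between the float and the double visible.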

%f can be used with float and double. If you want more precision use
printf("f: %.16f",d);
And this is what's going on under the hood:
float f = 3.1415926; // The double 3.1415926 is converted to float (typically rounded to the nearest float)
double d = 3.1415926;
printf("f: %f\n", f);
printf("d: %f\n", d);
f = 3.1415926f; // Float is specified
printf("f: %f\n", f);
int i = 3.1415926; // Truncation from double to int
printf("i: %d\n", i);

If you want to get warnings for this, I believe that -Wconversion flags them in mainline gcc-4.3 and later.
If you happen to use OS X, -Wshorten-64-to-32 has been flagging them in Apple's GCC since gcc-4.0.1. I believe that clang matches the mainline gcc behavior, however.
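A small sketch of what this looks like in practice, assuming a GCC new enough to accept -Wconversion as described above (the function names are hypothetical, and the exact warning text varies by compiler version):

```c
/* Compile with, for example: gcc -Wall -Wconversion -c narrowing.c */

/* Implicit narrowing: with -Wconversion enabled, the compiler is
   expected to warn about the initialization below. */
float implicit_narrowing(double d)
{
    float f = d;
    return f;
}

/* Explicit narrowing: the cast states the intent, so no conversion
   warning is expected here. */
float explicit_narrowing(double d)
{
    return (float)d;
}
```

Both functions compute the same value; the only difference is whether the narrowing is spelled out in the source.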

Related

Float inputs for which sinf and sin return different results?

I'm trying to understand something about sin and sinf from math.h.
I understand that their types differ: the former takes and returns doubles, and the latter takes and returns floats.
However, GCC still compiles my code if I call sin with float arguments:
#include <stdio.h>
#include <math.h>
#define PI 3.14159265
int main ()
{
float x, result;
x = 135.0f / 180.0f * (float) PI; /* 135 / 180 is integer division and would yield 0 */
result = sin (x);
printf ("The sin of (x=%f) is %f\n", x, result);
return 0;
}
By default, all compiles just fine (even with -Wall, -std=c99 and -Wpedantic; I need to work with C99). GCC won't complain about me passing floats to sin. If I enable -Wconversion then GCC tells me:
warning: conversion to ‘float’ from ‘double’ may alter its value [-Wfloat-conversion]
result = sin (x);
^~~
So my question is: is there a float input for which using sin, like above, and (implicitly) casting the result back to float, will result in a value that is different from that obtained using sinf?

This program finds three examples on my machine:
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
int main()
{
int i;
float f;
for(i = 0; i < 10000; i++) {
f = (float)rand() / RAND_MAX;
float f1 = sinf(f);
float f2 = sin(f);
if(f1 != f2) printf("jackpot: %.8f %.8f %.8f\n", f, f1, f2);
}
}
I got:
jackpot: 0.98704159 0.83439910 0.83439904
jackpot: 0.78605396 0.70757037 0.70757031
jackpot: 0.78636044 0.70778692 0.70778686

This will find all the float input values in the range 0.0 to 2 * M_PI where (float)sin(input) != sinf(input):
#include <stdio.h>
#include <math.h>
#include <float.h>
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif
int main(void)
{
for (float in = 0.0; in < 2 * M_PI; in = nextafterf(in, FLT_MAX)) {
float sin_result = (float)sin(in);
float sinf_result = sinf(in);
if (sin_result != sinf_result) {
printf("sin(%.*g) = %.*g, sinf(%.*g) = %.*g\n",
FLT_DECIMAL_DIG, in, FLT_DECIMAL_DIG, sin_result,
FLT_DECIMAL_DIG, in, FLT_DECIMAL_DIG, sinf_result);
}
}
return 0;
}
There are 1020963 such inputs on my amd64 Linux system with glibc 2.32.

float precision is approximately 6 significant figures decimal, while double is good for about 15. (It is approximate because they are binary floating point values not decimal floating point).
As such for example: a double value 1.23456789 will become 1.23456xxx as a float where xxx are unlikely to be 789 in this case.
Clearly not all (in fact very few) double values are exactly representable by float, so will change value when down-converted.
So for:
double a = 1.23456789 ;
float b = a ;
printf( "double: %.10f\n", a ) ;
printf( "float: %.10f\n", b ) ;
The result in my test was:
double: 1.2345678900
float: 1.2345678806
As you can see the float in fact retained 9 significant figures in this case, but it is by no means guaranteed for all possible values.
In your test you have limited the number of instances of mismatch because of the limited and finite range of rand() and also because f itself is float. Consider:
#include <math.h>
#include <stdio.h>
int main(void)
{
unsigned mismatch_count = 0 ;
unsigned iterations = 0 ;
for( double f = 0; f < 6.28318530718; f += 0.000001)
{
float f1 = sinf(f);
float f2 = sin(f);
iterations++ ;
if(f1 != f2)
{
mismatch_count++ ;
}
}
printf("%f%%\n", (double)mismatch_count/iterations* 100.0);
return 0;
}
In my test about 55% of comparisons mismatched. Changing f to float, the mismatches reduced to 1.3%.
So in your test, you see few mismatches because of the constraints of your method of generating f and its type. In the general case the issue is much more obvious.
In some cases you might see no mismatches - an implementation may simply implement sinf() using sin() with explicit casts. The compiler warning is for the general case of implicitly casting a double to a float without reference to any operations performed prior to the conversion.

However, GCC still compiles my code if I call sin with float arguments:
Yes, this is because the arguments are implicitly converted to double (because sin() takes a double), and the result is converted back to float (because sin() returns a double) on entering and leaving the sin() function. See below why it is better to use sinf() in this case, instead of having only one function.
You have included math.h which has prototypes for both function calls:
double sin(double);
float sinf(float);
So the compiler knows that calling sin() requires a conversion from float to double; it emits that conversion before the call, and also emits a conversion from double to float for the result of sin().
If you had not included <math.h> and had ignored the compiler warning about calling sin() with no prototype, the compiler would still have converted the float to double first (because that is how it must proceed when the argument types are unspecified) and passed the double to the function. The function would then be assumed to return an int, which provokes serious undefined behaviour.
If you had used sinf() (with the proper prototype) and passed a float, no conversion would be compiled: the float is passed as such, and the returned value is assigned to a float variable, also with no conversion. Avoiding both conversions makes for the fastest code.
If you had used sinf() with no prototype and passed a float, the float would be converted to a double and passed as such to sinf(), resulting in undefined behaviour. If sinf() somehow returned properly, an int result (which, this being UB, might or might not have anything to do with the calculation) would be converted to float (should this be possible) and assigned to the result variable.
In the case mentioned above, where you are operating on floats, it is better to use sinf(): it takes less time to execute (it needs fewer iterations, as less precision is required) and the two conversions (from float to double and back from double to float) do not have to be compiled into the binary.
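The same float-to-double promotion applies to arguments passed through a variable argument list, which gives a way to observe it without relying on undefined behaviour. The sketch below uses a hypothetical helper, first_vararg, that reads its first variadic argument as a double:

```c
#include <stdarg.h>

/* Hypothetical helper: returns the first variadic argument, read as a
   double. A float passed through "..." undergoes the default argument
   promotions, so it arrives as a double. */
double first_vararg(int count, ...)
{
    va_list ap;
    double value;

    va_start(ap, count);
    value = va_arg(ap, double);   /* va_arg(ap, float) would be invalid */
    va_end(ap);
    return value;
}
```

Calling first_vararg(1, 1.5f) passes a float at the call site, yet the function reads it back correctly as the double 1.5.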

There are some systems where computations on float are an order of magnitude faster than computations on double. The primary purpose of sinf is to allow trigonometric calculations to be performed efficiently on such systems in cases where the lower precision of float would be adequate to satisfy application needs. Converting a float value to double, calling sin, and converting the result to float would always yield a value that either matched that of sinf or was more accurate(*), and on some implementations that would in fact be the most efficient way of implementing sinf. On some other systems, however, such an approach would be more than an order of magnitude slower than using a purpose-designed function to evaluate the sine of a float.
(*) Note that for arguments outside the range +/- π/2, the most mathematically accurate way of computing sin(x) for an exact specified value of x might not be the most accurate way of computing what the calling code wants to know. If an application computes sinf(angle * (2.0f * 3.14159265f)), when angle is 0.5, having the function return (double)3.1415926535897932385-(float)3.14159265f may be more "mathematically accurate" than having it return sin(angle-(2.0f*3.14159265f)), but the latter would more accurately represent the sine of the angle the code was actually interested in.

C math not respecting declared constants

Stackoverflow,
I'm trying to write a (very) simple program that will be used to show how machine precision and flops affect functions around their root. My code is as follows:
#include <stdio.h>
#include <math.h>
int main(){
const float x = 2.2;
float sum = 0.0;
sum = pow(x,9) - 18*pow(x,8) + 144*pow(x,7) - 672*pow(x,6) + 2016*pow(x,5) -
4032*pow(x,4) + 5376*pow(x,3) - 4608*pow(x,2) + 2304*x - 512;
printf("sum = %d", sum);
printf("\n----------\n");
printf("x = %d", x);
return 0;
}
But I keep getting that sum is equal to zero. At first I thought that maybe my machine wasn't respecting the level of precision, but after printing x I discovered that the value of x changes each time I run the program and is always huge (abs(x) > 1e6).
I have it declared as a constant, so I'm even more confused as to what's going on...
FYI I'm compiling with gcc -lm

printf("sum = %d", sum);
sum is a float, not an int. You should use %f instead of %d. Same here:
printf("x = %d", x);
Reading about printf() format specifiers may be a good idea.
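A minimal sketch of the corrected formatting, writing into a buffer instead of stdout so the result can be inspected (format_float is a hypothetical name, not something from the question):

```c
#include <stdio.h>

/* Hypothetical helper: formats a float with %f. The float is promoted
   to double when passed through the variable argument list, so %f is
   the matching specifier; %d would expect an int instead. */
int format_float(char *buf, size_t size, float value)
{
    return snprintf(buf, size, "%f", value);
}
```

The same specifier applies to both printf calls in the question, since sum and x are both floats.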

gcc: printf and long double leads to wrong output. [C - Type conversion messes up]

I'm fairly new to C. I try to write functions for a Vector, but there must be something wrong.
Here's the code:
/* Defines maths for particles. */
#include <math.h>
#include <stdio.h>
/* The vector struct. */
typedef struct {
long double x, y, z;
} Vector;
Vector Vector_InitDoubleXYZ(double x, double y, double z) {
Vector v;
v.x = (long double) x;
v.y = (long double) y;
v.z = (long double) z;
return v;
}
Vector Vector_InitDoubleAll(double all) {
Vector v;
v.x = v.y = v.z = (long double) all;
return v;
}
Vector Vector_InitLongDXYZ(long double x, long double y, long double z) {
Vector v;
v.x = x;
v.y = y;
v.z = z;
return v;
}
Vector Vector_InitLongDAll(long double all) {
Vector v;
v.x = v.y = v.z = all;
return v;
}
Vector Vector_AddVector(Vector *v1, Vector *v2) {
Vector v3;
v3.x = v1->x + v2->x;
v3.y = v1->y + v2->y;
v3.z = v1->z + v2->z;
return v3;
}
Vector Vector_AddDouble(Vector *v1, double other) {
Vector v2;
v2.x = v1->x + other;
v2.y = v1->y + other;
v2.z = v1->z + other;
return v2;
}
Vector Vector_AddLongD(Vector *v1, long double other) {
Vector v2;
v2.x = v1->x + other;
v2.y = v1->y + other;
v2.z = v1->z + other;
return v2;
}
void Vector_Print(Vector *v) {
printf("X: %Lf, Y: %Lf, Z: %Lf\n", v->x, v->y, v->z); //Before edit: used %ld
}
double Vector_Length(Vector *v) {
return pow(pow(v->x, 2) + pow(v->y, 2) + pow(v->z, 2), 0.5);
}
int main() {
Vector v = Vector_InitDoubleXYZ(2.0, 1.0, 7.0); //Before edit: (2.0d, 1.0d, 7.0d);
Vector_Print(&v);
}
I'm using gcc to compile. Running vector.exe in the commandline gives me the following output:
X: 0, Y: -2147483648, Z: 9650176
and I do not understand why this is happening.
I appreciate any hints (even about my coding style or whatever could have been done better in the code).
Thank you,
Update: Using the MSVC Compiler works just fine, it seems to be an issue of gcc. Do you know why this happens ?

The problem (after fixing the various problems of using integer specifiers for floating point formatting) is that you're mixing GCC types with an MSVC runtime that doesn't understand them.
First off, MinGW is a GCC compiler, but it uses an MSVC runtime for the bulk of its runtime support. What this means for the printf() family of functions is that only the format specifiers that msvcrt.dll supports and only the types that msvcrt.dll supports will work. But GCC doesn't know anything about this, so it'll pass its own types and, of course, the format specifiers are whatever you pass in the format string (though GCC might issue warnings that don't really apply to the msvcrt.dll situation). See Strange "unsigned long long int" behaviour for some examples based on 64-bit ints (I think that newer versions of msvcrt.dll may have fixed some or all of the 64-bit int issues though).
The other part of this problem you're running into is that long double in GCC is a different type than long double in MSVC. GCC uses a 96-bit or 128-bit type for long double on x86 or x64 targets (see http://gcc.gnu.org/onlinedocs/gcc/i386-and-x86_002d64-Options.html). However, MSVC uses a 64-bit type - basically long double is exactly the same as double for msvcrt.dll (http://msdn.microsoft.com/en-us/library/9cx8xs15.aspx):
Previous 16-bit versions of Microsoft C/C++ and Microsoft Visual C++ supported the long double, 80-bit precision data type. In Win32 programming, however, the long double data type maps to the double, 64-bit precision data type. The Microsoft run-time library provides long double versions of the math functions only for backward compatibility. The long double function prototypes are identical to the prototypes for their double counterparts, except that the long double data type replaces the double data type. The long double versions of these functions should not be used in new code.
So what this boils down to is that the GCC/MinGW long double type will simply not be compatible with the formatted I/O in msvcrt.dll. Either switch to using double with MinGW, or if you need to use long double you'll have to cast the values to (double) for formatted I/O or come up with your own formatting routines.
Another option might be to use the GCC compiler under Cygwin, which I think will avoid relying on msvcrt.dll for I/O formatting (at the cost of relying on the Cygwin environment).
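The cast-to-double workaround suggested above could look roughly like this. It is a sketch under the assumption that double precision is enough for display purposes; vector_format is a hypothetical name, and the Vector struct is repeated from the question so the example is self-contained:

```c
#include <stdio.h>

/* The vector struct from the question. */
typedef struct {
    long double x, y, z;
} Vector;

/* Hypothetical helper: formats the vector by casting each component to
   double, so only the plain %f specifier is needed and no long double
   ever reaches the runtime's formatting code. */
int vector_format(char *buf, size_t size, const Vector *v)
{
    return snprintf(buf, size, "X: %f, Y: %f, Z: %f",
                    (double)v->x, (double)v->y, (double)v->z);
}
```

The calculations can still be carried out in long double; only the values handed to the formatting function are narrowed.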

Your format string doesn't match your type. %ld is a format specifier for a long int, not a long double. Use %Lg.

Should be:
void Vector_Print(Vector *v) {
printf("X: %Lf, Y: %Lf, Z: %Lf\n", v->x, v->y, v->z);
}
With f (or g or e or G or E) for floating point type and with uppercase L for long double.
That's the standard C and C++ specifier for long double. Using lowercase l might work in some implementations, but why make your program less portable than necessary?
Note: With scanf functions family, specifiers are slightly different. %f (and others) means float, %lf means double and %Lf means long double.
With printf there is no separate specifier for float: float arguments are automatically promoted to double, so %f handles both.
Note 2: Apparently mingw has some incompatibilities caused by using MSVC runtime that cause problems with %Lf. It's strange because normally MSVC allows both %lf and %Lf.
Mingw has alternative implementation of stdio. You can enable it by #defining __USE_MINGW_ANSI_STDIO to 1 (see). I can't promise it'll help. I don't know mingw too well.
Update: Michael Burr's answer explains why %Lf conversion fails when using MSVC runtime with MinGW. If you still need to use MinGW, try switching to their own implementation. It uses real long double type.

Have you enabled warnings in the compiler? If the compiler gives a warning, it may provide a hint about what is wrong.

Try something like this:
Vector v;
Vector_InitDoubleXYZ(&v, 2.0, 1.0, 7.0);
where that function is defined as:
void Vector_InitDoubleXYZ(Vector *v, double x, double y, double z) {
long double t;
t = x;
v->x = t;
t=y;
v->y = t;
t=z;
v->z = t;
}

What's the use of suffix `f` on float value

I am wondering what the difference is between these two variables in C:
float price = 3.00;
and
float price = 3.00f;
What is the use of suffix f in this case?

3.00 is interpreted as a double, as opposed to 3.00f which is seen by the compiler as a float.
The f suffix simply tells the compiler which is a float and which is a double.
See MSDN (C++)

In addition to what has already been said, keeping track of 1.0 versus 1.0f is more important than many people realize. If you write code like this:
float x;
...
float y = x * 2.0;
Then x will be promoted to a double, because 2.0 is a double. The compiler is not allowed to optimize that promotion away or it would violate the C standard. The calculation takes place with double precision, and the result is then implicitly converted (rounded) back to a float. This means that the calculation will be slower (though more accurate) than it would have been if you had written 2.0f or 2.
Had you written 2, the constant would be of int type, which would be promoted to a float, and the calculation would have been done with "float precision". A good compiler would warn you about this promotion.
Read more about the "usual arithmetic conversion" rules here:
http://msdn.microsoft.com/en-us/library/3t4w2bkb%28v=vs.80%29.aspx
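One way to see which type the usual arithmetic conversions pick is to apply sizeof to the expression, since sizeof reports the size of the expression's type without evaluating it. A minimal sketch (the function names are made up for illustration):

```c
#include <stddef.h>

/* float * double constant: the expression has type double */
size_t size_with_double_constant(void)
{
    float x = 1.5f;
    return sizeof(x * 2.0);
}

/* float * float constant: the expression stays float */
size_t size_with_float_constant(void)
{
    float x = 1.5f;
    return sizeof(x * 2.0f);
}

/* float * int constant: the int is converted to float */
size_t size_with_int_constant(void)
{
    float x = 1.5f;
    return sizeof(x * 2);
}
```

The first function returns sizeof(double), while the other two return sizeof(float).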

Because by default unsuffixed floating-point literals are doubles, and rounding means that even small literals can take on different values when rounded to float and double. This can be observed in the following example:
float f=0.67;
if(f == 0.67)
printf("yes");
else
printf("no");
This will output no, because 0.67 has a different value when rounded to float than it does when rounded to double. On the other hand:
float f=0.67;
if(f == 0.67f)
printf("yes");
else
printf("no");
outputs yes.
The suffix can be specified using either upper or lowercase letters.
Try this also:
printf(" %zu %zu\n", sizeof(.67f), sizeof(.67));

3.00 is a double, 3.00f is a float.

Adding few more combination of comparisons between float and double data types.
#include <iostream>
int main()
{
// The double constant 3.14 is converted to float by
// rounding its bit representation
float a = 3.14;
// Problem: float 'a' is promoted to double for the comparison, and the
// promoted value is not equal to the double constant 3.14.
if(a == 3.14)
std::cout<<"a: Equal"<<std::endl;
else
std::cout<<"a: Not Equal"<<std::endl;
float b = 3.14f; // No type conversion
if(b == 3.14) // Problem: Float to Double conversion
std::cout<<"b: Equal"<<std::endl;
else
std::cout<<"b: Not Equal"<<std::endl;
float c = 3.14; // Double to Float conversion (OK even though is not a good practice )
if(c == 3.14f) // No type conversion
std::cout<<"c: Equal"<<std::endl; // OK
else
std::cout<<"c: Not Equal"<<std::endl;
float d = 3.14f;
if(d == 3.14f)
std::cout<<"d: Equal"<<std::endl; // OK
else
std::cout<<"d: Not Equal"<<std::endl;
return 0;
}
Output:
a: Not Equal
b: Not Equal
c: Equal
d: Equal

That's because the default type of a floating point numeric literal (the characters 3.00) is double, not float. To make the literal a float you have to add the suffix f (or F).

Often the difference isn't important, as the compiler will convert the double constant into a float anyway. However, consider this:
template<class T> T min(T a, T b)
{
return (a < b) ? a : b;
}
float x = min(3.0f, 2.0f); // will compile
x = min(3.0f, 2); // compiler cannot deduce T type
x = min(3.0f, 2.0); // compiler cannot deduce T type

How do I force 0.0/0.0 to return zero instead of NaN in MIPSPro C compiler?

As the question states, I am using the MIPSPro C compiler, and I have an operation that will return NaN for some data sets where both the numerator and denominator are zero. How do I keep this from happening?

On SGI systems with the MIPSPro compiler, you can set the handling of various floating point exceptions with great precision using the facilities in sigfpe.h. As it happens, the division of zero by zero is one such case:
#include <stdio.h>
#include <sigfpe.h>
int main (void) {
float x = 0.0f;
(void) printf("default %f / %f = %f\n", x, x, (x / x));
invalidop_results_[_ZERO_DIV_ZERO] = _ZERO;
handle_sigfpes(_ON, _EN_INVALID, 0, 0, 0);
(void) printf("handled %f / %f = %f\n", x, x, (x / x));
return 0;
}
In use:
arkku@seven:~/test$ cc -version
MIPSpro Compilers: Version 7.3.1.3m
arkku@seven:~/test$ cc -o sigfpe sigfpe.c -lfpe
arkku@seven:~/test$ ./sigfpe
default 0.000000 / 0.000000 = nan0x7ffffe00
handled 0.000000 / 0.000000 = 0.000000
As you can see, setting the _ZERO_DIV_ZERO result changes the outcome of the same division. Likewise you can handle regular division by zero (e.g. if you don't want infinity as the result).
Of course, none of this is standard; it would be more portable to check for NaN after each division and even better to check for zeros before. C99 offers some control over the floating point environment in fenv.h, but I don't think anything suitable for this is available. In any case my old MIPSPro doesn't support C99.
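A portable version of the "check for NaN after the division" idea might look like the sketch below. The name div_or and its fallback parameter are assumptions made for this example; the test relies on the IEEE 754 property that NaN is the only value that compares unequal to itself (isnan() from math.h is an alternative where C99 is available):

```c
/* Hypothetical helper: performs the division and substitutes 'fallback'
   when the result is NaN, as happens for 0.0 / 0.0. */
double div_or(double num, double denom, double fallback)
{
    double q = num / denom;

    /* NaN is the only floating-point value for which q != q holds. */
    return (q != q) ? fallback : q;
}
```

Passing 0.0 as the fallback gives the behaviour asked for in the question, while ordinary divisions are returned unchanged.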

Use an if clause? Also I'm curious why you'd want to ignore this mathematical impossibility. You sure your input isn't wrong/meaningless in this case?

If you don't mind introducing a small error, you can add a small value to the denominator, assuming you are doing floating point arithmetic. float.h apparently has some small values defined:
DBL_MIN is the smallest positive normalized double
DBL_EPSILON is the gap between 1.0 and the next larger double, so 1.0+DBL_EPSILON != 1.0
So I would try
#include <float.h>
#define EPS DBL_MIN
double divModified(double num, double denom) {
return num / (denom + EPS);
}
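To get a feel for how the two constants behave, here is a small sketch. divModified is repeated from above with DBL_MIN written in directly, and changes_one is a hypothetical helper added for illustration:

```c
#include <float.h>

/* The function from above, repeated so the sketch is self-contained. */
double divModified(double num, double denom)
{
    return num / (denom + DBL_MIN);
}

/* Hypothetical helper: returns 1 if adding eps to 1.0 produces a
   different double, 0 otherwise. */
int changes_one(double eps)
{
    volatile double sum = 1.0 + eps;   /* volatile forces rounding to double */
    return sum != 1.0;
}
```

changes_one(DBL_EPSILON) yields 1 and changes_one(DBL_MIN) yields 0, so for denominators of ordinary magnitude the added DBL_MIN is rounded away and the quotient is unchanged. Note also that a denominator of exactly -DBL_MIN would still produce a division by zero with this approach.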

IEEE 754 (the spec for floating point) says that 0.0/0.0 is not a number, i.e. NaN. If you want it to be anything else, by far the best approach is to detect when the operands are both zero in an if clause and return the value that you'd rather give. Perhaps like this:
#define WonkyDiv(a,b) ((a)==0.0&&(b)==0.0 ? 0.0 : (a)/(b))
float wonkyResult = WonkyDiv(numerator, denominator);
