Size of a float variable and compilation - c

I'm struggling to understand the behavior of gcc here. The size of a float is 4 bytes on my architecture, but I can still store an 8-byte value in a float, and my compiler says nothing about it.
For example, I have:
#include <stdio.h>
int main(int argc, char** argv) {
    float someFloatNumb = 0xFFFFFFFFFFFF;
    printf("%i\n", sizeof(someFloatNumb));
    printf("%f\n", someFloatNumb);
    printf("%i\n", sizeof(281474976710656));
    return 0;
}
I expected the compiler to insult me, or at least to display a warning of some sort, because I shouldn't be able to do something like that; at least I think it's kind of twisted wizardry.
The program simply runs:
4
281474976710656.000000
8
So, if I print the size of someFloatNumb, I get 4 bytes, which is expected. But the assigned value isn't, as seen in the output above.
So I have a few questions:
Does sizeof(variable) simply get the variable's type and return sizeof(type), which in this case would explain the result?
Does/Can gcc grow the capacity of a type (managing multiple variables behind the scenes to allow that sort of thing)?

1)
Does sizeof(variable) simply get the variable's type and return sizeof(type), which in this case would explain the result?
Except for variable-length arrays, sizeof doesn't evaluate its operand. So yes, all it cares about is the type. Hence sizeof(someFloatNumb) is 4, which is equivalent to sizeof(float). This explains printf("%i\n", sizeof(someFloatNumb));.
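A quick sketch of the unevaluated-operand point (the 4 in the comment assumes a platform with 4-byte int):
#include <stdio.h>

int main(void)
{
    int n = 0;
    /* sizeof only inspects the type of n++; the expression is never evaluated */
    printf("%zu\n", sizeof(n++)); /* prints 4 on a platform with 4-byte int */
    printf("%d\n", n);            /* still 0: n was never incremented */
    return 0;
}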
2)
[..] But I can still store an 8-byte value in a float, and my compiler says nothing about it.
Does/Can gcc grow the capacity of a type (managing multiple variables behind the scenes to allow that sort of thing)?
No. The capacity doesn't grow. You've simply misunderstood how floats are represented/stored. sizeof(float) being 4 doesn't mean it can't store values greater than 2^32 (assuming 1 byte == 8 bits). See Floating point representation.
What the maximum value of a float can represent is defined by the constant FLT_MAX (see <float.h>). sizeof(someFloatNumb) simply yields how many bytes the object (someFloatNumb) takes up in memory which isn't necessarily equal to the range of values it can represent.
This explains why printf("%f\n", someFloatNumb); prints the value as expected (and there's no automatic "capacity growth").
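To see the distinction between storage size and representable range on your own machine, a minimal sketch (the values in the comments assume IEEE 754 single precision):
#include <float.h>
#include <stdio.h>

int main(void)
{
    printf("sizeof(float) = %zu\n", sizeof(float)); /* 4 on most platforms */
    printf("FLT_MAX = %e\n", FLT_MAX);              /* about 3.402823e+38 */
    return 0;
}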
3)
printf("%i\n", sizeof(281474976710656));
This is slightly more involved. As said before in (1), sizeof only cares about the type here. But the type of 281474976710656 is not necessarily int.
The C standard defines the type of integer constants according to the smallest type that can represent the value. See https://stackoverflow.com/a/42115024/1275169 for an explanation.
On my system 281474976710656 can't be represented in an int, so it's stored in a long int, which is likely to be the case on your system as well. So what you see is essentially equivalent to sizeof(long).
There's no portable way to determine the type of integer constants. But since you are using gcc, you could use a little trick with typeof:
typeof(281474976710656) x;
printf("%s", x); /* deliberately using '%s' to generate warning from gcc. */
generates:
warning: format ‘%s’ expects argument of type ‘char *’, but argument 2
has type ‘long int’ [-Wformat=]
printf("%s", x);
P.S.: sizeof yields a size_t, for which the correct format specifier is %zu. So that's what you should be using in your 1st and 3rd printf statements.
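For reference, here is the question's program with those two specifiers corrected (the output in the comments assumes the same platform as the question):
#include <stdio.h>

int main(void)
{
    float someFloatNumb = 0xFFFFFFFFFFFF;
    printf("%zu\n", sizeof(someFloatNumb));   /* 4 */
    printf("%f\n", someFloatNumb);            /* 281474976710656.000000 */
    printf("%zu\n", sizeof(281474976710656)); /* 8 */
    return 0;
}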

This doesn't store "8 bytes" of data; that value is an integer constant which gets converted to a float for the assignment:
float someFloatNumb = 0xFFFFFFFFFFFF; // 6 bytes (48 bits) of data
Since float can represent large values, this isn't a big deal, but you will lose a lot of precision if you're only using 32-bit floats. Notice there's a slight but important difference here:
float value = 281474976710656.000000; // what the float ends up holding
long long value = 281474976710655;    // the value you asked for (too big for an int)
This is because a float becomes an approximation when it runs out of precision.
Capacities don't "grow" for standard C types. You'll have to use a "bignum" library for that.
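As an illustration of the bignum route, here is a minimal sketch using GMP, one common such library (compile with -lgmp; the literal is just an arbitrary oversized example):
#include <gmp.h>
#include <stdio.h>

int main(void)
{
    mpz_t n;
    /* arbitrary-precision integer initialized from a decimal string */
    mpz_init_set_str(n, "281474976710655281474976710655", 10);
    gmp_printf("%Zd\n", n); /* prints the full value, no rounding */
    mpz_clear(n);
    return 0;
}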

But I can still store an 8-byte value in a float, and my compiler says nothing about it.
That's not what's happening.
float someFloatNumb = 0xFFFFFFFFFFFF;
0xFFFFFFFFFFFF is an integer constant. Its value, expressed in decimal, is 281474976710655, and its type is probably either long or long long. (Incidentally, that value can be stored in 48 bits, but most systems don't have a 48-bit integer type, so it will probably be stored in 64 bits, of which the high-order 16 bits will be zero.)
When you use an expression of one numeric type to initialize an object of a different numeric type, the value is converted. This conversion doesn't depend on the size of the source expression, only on its numeric value. For an integer-to-float conversion, the result is the closest representation to the integer value. There may be some loss of precision (and in this case, there is). Some compilers may have options to warn about loss of precision, but the conversion is perfectly valid so you probably won't get a warning by default.
Here's a small program to illustrate what's going on:
#include <stdio.h>
int main(void) {
    long long ll = 0xFFFFFFFFFFFF;
    float f = 0xFFFFFFFFFFFF;
    printf("ll = %lld\n", ll);
    printf("f = %f\n", f);
}
The output on my system is:
ll = 281474976710655
f = 281474976710656.000000
As you can see, the conversion has lost some precision. 281474976710656 is an exact power of two, and floating-point types generally can represent those exactly. There's a very small difference between the two values because you chose an integer value that's very close to one that can be represented exactly. If I change the value:
#include <stdio.h>
int main(void) {
    long long ll = 0xEEEEEEEEEEEE;
    float f = 0xEEEEEEEEEEEE;
    printf("ll = %lld\n", ll);
    printf("f = %f\n", f);
}
the apparent loss of precision is much larger:
ll = 262709978263278
f = 262709979381760.000000
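If you want to see where 262709979381760 comes from, here's a sketch that performs the rounding to a 24-bit significand by hand (assuming IEEE 754 single precision; link with -lm):
#include <math.h>
#include <stdio.h>

int main(void)
{
    long long ll = 0xEEEEEEEEEEEE;
    int exp;
    /* split ll as scaled * 2^exp, with scaled in [0.5, 1) */
    double scaled = frexp((double)ll, &exp);
    /* keep 24 significant bits, as a float's significand does */
    double by_hand = ldexp(round(ldexp(scaled, 24)), exp - 24);
    printf("by hand: %.1f\n", by_hand);   /* 262709979381760.0 */
    printf("float:   %.1f\n", (float)ll); /* same value */
    return 0;
}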

0xFFFFFFFFFFFF == 281474976710655
If you initialize a float with that value, it will end up being
0xFFFFFFFFFFFF + 1 == 0x1000000000000 == 281474976710656 == 1<<48
That fits easily in a 4-byte float: simple mantissa, small exponent.
It does however NOT store the correct value (one lower), because that IS hard to store in a float.
Note that the "+ 1" does not imply incrementation. It ends up one higher because the representation can only get as close as off-by-one to the attempted value. You may think of it as "rounding up to the next power of 2, multiplied by whatever the mantissa can store". The mantissa, by the way, is usually interpreted as a fraction between 0 and 1.
Getting closer would indeed require the 48 bits of your initialisation in the mantissa, plus whatever number of bits is used to store the exponent, and maybe a few more for other details.
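A quick way to confirm this on an IEEE 754 system, using a C99 hexadecimal floating constant (0x1p48f is exactly 2^48):
#include <stdio.h>

int main(void)
{
    float f = 0xFFFFFFFFFFFF;     /* 2^48 - 1, rounds up on conversion */
    printf("%.1f\n", f);          /* 281474976710656.0 */
    printf("%d\n", f == 0x1p48f); /* 1: the stored value is exactly 2^48 */
    return 0;
}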

Look at the value printed... 0xFFFFFFFFFFFF is an odd value, but the value printed in your example is even. You are feeding the float variable an integer value that is converted to float. The conversion loses precision, as expected for the value used, which doesn't fit in the 23 bits reserved for the target variable's mantissa. So you finally get an approximation, which is the value 0x1000000000000 (the next value up, the closest representable one to the value you used, as posted by @Yunnosch in his answer).

Related

Sizeof with different specificators [duplicate]

I want to know why sizeof doesn't work with different types of format specifiers.
I know that sizeof is usually used with the %zu format specifier, but I want to know, for my own knowledge, what happens behind the scenes and why it prints nan when I use it with %f, or a long number when used with %lf.
int a = 0;
long long d = 1000000000000;
int s = a + d;
printf("%d\n", sizeof(a + d)); // prints normal size of expression
printf("%lf\n", sizeof(s)); // prints big number
printf("%f", sizeof(d)); // prints nan
sizeof evaluates to a value of type size_t. The proper specifier for size_t in C99 is %zu. You can use %u on systems where size_t and unsigned int are the same type or at least have the same size and representation. On 64-bit systems, size_t values have 64 bits and therefore are larger than 32-bit ints. On 64-bit linux and OS/X, this type is defined as unsigned long and on 64-bit Windows as unsigned long long, hence using %lu or %llu on these systems is fine too.
Passing a size_t for an incompatible conversion specification has undefined behavior:
the program could crash (and it probably will if you use %s)
the program could display the expected value (as it might for %d)
the program could produce weird output such as nan for %f or something else...
The reason for this is integers and floating point values are passed in different ways to printf and they have a different representation. Passing an integer where printf expects a double will let printf retrieve the floating point value from registers or memory locations that have random contents. In your case, the floating point register just happens to contain a nan value, but it might contain a different value elsewhere in the program or at a later time, nothing can be expected, the behavior is undefined.
Some legacy systems do not support %zu, notably C runtimes by Microsoft. On these systems, you can use %u or %lu and use a cast to convert the size_t to an unsigned or an unsigned long:
int a = 0;
long long d = 1000000000000;
int s = a + d;
printf("%u\n", (unsigned)sizeof(a + d)); // should print 8
printf("%lu\n", (unsigned long)sizeof(s)); // should print 4
printf("%llu\n", (unsigned long long)sizeof(d)); // prints 4 or 8 depending on the system
I want to know for my own knowledge what happens behind the scenes and why it prints nan when I use it with %f or a long number when used with %lf
Several reasons.
First of all, printf doesn't know the types of the additional arguments you actually pass to it. It's relying on the format string to tell it the number and types of additional arguments to expect. If you pass a size_t as an additional argument, but tell printf to expect a float, then printf will interpret the bit pattern of the additional argument as a float, not a size_t. Integer and floating point types have radically different representations, so you'll get values you don't expect (including NaN).
Secondly, different types have different sizes. If you pass a 16-bit short as an argument, but tell printf to expect a 64-bit double with %f, then printf is going to look at the extra bytes immediately following that argument. It's not guaranteed that size_t and double have the same sizes, so printf may either be ignoring part of the actual value, or using bytes from memory that isn't part of the value.
Finally, it depends on how arguments are being passed. Some architectures use registers to pass arguments (at least for the first few arguments) rather than the stack, and different registers are used for floats vs. integers, so if you pass an integer and tell it to expect a double with %f, printf may look in the wrong place altogether and print something completely random.
printf is not smart. It relies on you to use the correct conversion specifier for the type of the argument you want to pass.

value of variable in c language with 100 digits

So I'm new to C, and I have just learned about data types. What confuses me is that the value range of a double, for example, is from 2.3E-308 to 1.7E+308;
mathematically, a number of 100 digits ∈ [2.3E-308, 1.7E+308].
Writing this simple program
#include <stdio.h>
int main()
{
    double c = 5416751717547457918597197587615765157415671579185765176547645735175197857989185791857948797847984848;
    printf("%le", c);
    return 0;
}
The result is 7.531214e+18; changing %le to %lf, the result is 7531214226330737664.000000,
which doesn't equal c.
So what is the problem?
This long number is actually a numeric literal of type long long. But since this type cannot contain such a long number, it is truncated modulo (LLONG_MAX + 1), resulting in 7531214226330737360.
Demo.
Edit:
@JohnBollinger: ... and then converted to double, with a resulting loss of a few (binary) digits of precision.
@rici: Demo2 - here the constant is of type double because of the added decimal point
It might seem that, if we can store a number of up to 10 to the power 308, we are storing 308 digits or so but, in floating point arithmetic, that isn't the case. Floating point numbers are not stored as huge strings of digits.
Broadly, a floating-point number is stored as a mantissa -- typically a number between zero and one -- and an exponent -- the power to which some fixed base is raised. The different kinds of floating point number (float, double, long double) each have a different number of bits allocated to the mantissa and exponent. These bit counts, particularly in the mantissa, control the precision with which a number can be represented.
A double on most platforms gives 16-17 decimal digits of precision, regardless of the magnitude (power of ten). It's possible to use libraries that will do arithmetic to any degree of precision required, although such features are not built into C.
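You can query these limits on your own platform via <float.h>; a small sketch (the comments assume IEEE 754 double precision):
#include <float.h>
#include <stdio.h>

int main(void)
{
    printf("DBL_MANT_DIG = %d\n", DBL_MANT_DIG); /* 53 significand bits */
    printf("DBL_DIG = %d\n", DBL_DIG);           /* 15 decimal digits */
    printf("DBL_MAX = %e\n", DBL_MAX);           /* about 1.797693e+308 */
    return 0;
}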
An additional complication is that, in your example, the number you assign to c is not actually defined to be a floating point number at all. Lacking any indication that it should be so represented, the compiler will treat it as an integer and, as it's too large to fit even the largest integer type on most platforms, it gets truncated down to integer range.
You should get a proper compiler, or enable warnings on it. A recent GCC, with just its default settings, will output the following warning:
% gcc float.c
float.c: In function ‘main’:
float.c:4:12: warning: integer constant is too large for its type
double c = 5416751717547457918597197587615765157415671579185765176547645735175197857989185791857948797847984848;
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Notice that it says integer, i.e. a whole number, not floating point. In C, a constant of that form denotes an integer. Unless suffixed with U, it is additionally a signed integer, of the first type in which its value fits. However, neither standard C nor common implementations have a type that is big enough to fit this value. So what happens is (C11 6.4.4.1p6, http://port70.net/~nsz/c/c11/n1570.html#6.4.4.1p6):
If an integer constant cannot be represented by any type in its list and has no extended integer type, then the integer constant has no type.
Use of such an untyped integer constant in arithmetic leads to undefined behaviour; that is, the whole execution of the program is now meaningless. You should have read the warnings.
The "fix" would have been to add a . after the number!
#include <stdio.h>
int main(void)
{
    double c = 54167517175474579185971975876157651574156715791\
85765176547645735175197857989185791857948797847984848.;
    printf("%le\n", c);
}
And running it:
% ./a.out
5.416752e+99
Notice that even then, a double is precise to only ~15 significant decimal digits on average.

Rand() Conversion Warning

I have :
r = ((float)(rand()/(float)(RAND_MAX)) * BOUND);
this also gives the same warning:
r = ((rand()/(float)(RAND_MAX)) * BOUND);
And the warning:
conversion to ‘float’ from ‘int’ may alter its value
Any possible fixes?
Both RAND_MAX and the return value of rand() itself are of type int.
A 32-bit int can have a maximum value (INT_MAX) of up to ten digits.
A float -- commonly implemented as 32-bit FP -- has about 6 digits of precision.
This means that there is not enough precision to distinguish e.g. 2,147,483,647 from 2,147,483,646. This is why the compiler generates that warning: your conversion is not safe for all cases.
The quick fix is to use double, which is commonly implemented as 64-bit FP, having about 15 digits of precision. That's enough to hold a 32-bit int (but not enough for a 64-bit one).
All in all, rand() is a terrible function, and the standard allows it to be quite bad on the "randomness" as well. If you need good random numbers, you shouldn't rely on the standard library, but use a dedicated third-party solution (which usually includes functions that return floating point right away).
You get this warning because float variables do not have enough precision to hold every possible value an int can take. That means the cast can produce a wrong result if the integer you're casting from is too large. For example, an integer like 1 000 000 451 would be cast to 1.000000E9, making it wrong by 451. Depending on the value of RAND_MAX on your platform, this may or may not actually be a problem in your specific case.
It would be safer to use double instead, where you should not have this issue.
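A minimal sketch of the double-based fix both answers suggest (BOUND is assumed to be a macro from the asker's surrounding code; the definition below is just a placeholder):
#include <stdlib.h>

#define BOUND 10.0 /* assumed; the asker's real definition isn't shown */

double scaled_rand(void)
{
    /* double's 53-bit significand holds any 32-bit int exactly,
       so converting rand() and RAND_MAX loses nothing */
    return (double)rand() / RAND_MAX * BOUND;
}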

Ternary operator in C

Why is this program giving unexpected numbers (e.g. 2040866504, -786655336)?
#include <stdio.h>
int main()
{
    int test = 0;
    float fvalue = 3.111f;
    printf("%d", test? fvalue : 0);
    return 0;
}
Why is it printing unexpected numbers instead of 0? Is it supposed to do an implicit typecast? This program is for learning purposes, nothing serious.
Most likely, your platform passes floating point values in a floating point register and integer values in a different register (or on the stack). You told printf to look for an integer, so it's looking in the register integers are passed in (or on the stack). But you passed it a float, so the zero was placed in the floating point register that printf never looked at.
The ternary operator follows language rules to decide the type of its result. It can't sometimes be an integer and sometimes be a float. Those could be different sizes, stored in different places, and so on, which would make it impossible to generate sane code to handle both possible result types.
This is a guess. Perhaps something completely different is happening. Undefined behavior is undefined for a reason. These kinds of things can be impossible to predict and very difficult to understand without lots of experience and knowledge of platform and compiler details. Never let someone convince you that UB is okay or safe because it seems to work on their system.
Because you are using %d for printing a float value. Use %f. Using %d to print a float value invokes undefined behavior.
EDIT:
Regarding OP's comments;
Why is it printing random numbers instead of 0?
When you compile this code, compiler should give you a warning:
[Warning] format '%d' expects argument of type 'int', but argument 2 has type 'double' [-Wformat]
This warning is self-explanatory: this line of code invokes undefined behavior. The conversion specification %d tells printf to convert an int value from binary to a string of decimal digits, while %f does the same for a floating point value. When passing fvalue, the compiler knows that it is of float type, but on the other hand it sees that printf expects an argument of type int. In such cases, sometimes it does what you expect, sometimes it does what I expect. Sometimes it does what nobody expects (nice comment by David Schwartz).
See the test cases 1 and 2. It is working fine with %f.
should it supposed to do implicit typecast?
No.
Although the existing upvoted answers are correct, I think they are far too technical and ignore the logic a beginner programmer might have:
Let's look at the statement causing confusion in some heads:
printf("%d", test? fvalue : 0);
^ ^ ^ ^
| | | |
| | | - the value we expect, an integral constant, hooray!
| | - a float value, this won't be printed as the test doesn't evaluate to true
| - an integral value of 0, will evaluate to false
- We will print an integer!
What the compiler sees is a bit different. It agrees on the value of test meaning false. It agrees on fvalue being a float and 0 an integer. However, it learned that the two possible outcomes of the ternary operator must be of the same type! int and float aren't. In this case, "float wins": 0 becomes 0.0f!
Now printf isn't type safe. This means you can falsely say "print me an integer" and pass a float without the compiler refusing to compile. Exactly that happened. No matter what the value of test is, the compiler deduced that the result will be of type float. Hence, your code is equivalent to:
float x = 0.0f;
printf("%d", x);
At this point, you experience undefined behaviour. A float simply isn't the integral type that %d expects.
The observed behaviour is dependent on the compiler and machine you're using. You may see dancing elephants, although most terminals don't support that afaik.
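For completeness, two possible fixes sketched from the asker's program: either match the specifier to the (promoted) float result, or force the result back to int:
#include <stdio.h>

int main(void)
{
    int test = 0;
    float fvalue = 3.111f;
    printf("%f\n", test ? fvalue : 0);      /* 0.000000: %f matches the float result */
    printf("%d\n", test ? (int)fvalue : 0); /* 0: both branches are int now */
    return 0;
}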
When we have the expression E1 ? E2 : E3, there are four types involved. Expressions E1, E2 and E3 each have a type (and the types of E2 and E3 can be different). Furthermore, the whole expression E1 ? E2 : E3 has a type.
If E2 and E3 have the same type, then this is easy: the overall expression has that type. We can express this in a meta-notation like this:
(T1 ? T2 : T2) -> T2
"The type of a ternary expression whose alterantives are both of the same type T2
is just T2."
If they don't have the same type, things get somewhat interesting, and the situation is quite similar to E2 and E3 being involved together in an arithmetic operation. For instance if you add together an int and float, the int operand is converted to float. That is what is happening in your program. The type situation is:
(int ? float : int) -> float
the test fails, and so the int value 0 is converted to the float value 0.0.
This float value is not compatible with the %d conversion specifier of printf, which requires an int.
More precisely, the float value undergoes one more conversion: when a float is passed as one of the trailing arguments to a variadic function, it is converted to double.
So in fact the double value 0.0 is being passed to printf where it expects int.
In any case, it is undefined behavior: it is nonportable code for which the ISO standard definition of the C language doesn't offer a meaning.
From here on, we can apply platform-specific reasoning about why we don't just see 0. Suppose int is a 32-bit, four-byte type, and double is the common 64-bit, 8-byte IEEE 754 representation, and that all-bits-zero is used for 0.0. So then, why isn't a 32-bit portion of this all-bits-zero treated by printf as the int value 0?
Quite possibly, the 64 bit double argument value forces 8 byte alignment when it is put onto the stack, possibly moving the stack pointer by four bytes. And then printf pulls out garbage from those four bytes, rather than zero bits from the double value.

Different rounding between assignment and printf

I have a program with two variables of type int.
int num;
int other_num;
/* change the values of num and other_num with conditional increments */
printf ("result = %f\n", (float) num / other_num);
float r = (float) num / other_num;
printf ("result = %f\n", r);
The value written in the first printf is different from the value written by the second printf (by 0.000001, when printed with 6 decimal places).
Before the division, the values are:
num = 10201
other_num = 2282
I've printed the resulting numbers to 15 decimal places. Those numbers diverge in the 7th decimal place, which explains the difference in the 6th one.
Here are the numbers with 15 decimal places:
4.470201577563540
4.470201492309570
I'm aware of floating point rounding issues, but I was expecting the calculated result to be the same when performed in the assignment and in the printf argument.
Why is this expectation incorrect?
Thanks.
Probably because FLT_EVAL_METHOD is something other than 0 on your system.
In the first case, the expression (float) num / other_num has nominal type float, but is possibly evaluated at higher precision (if you're on x86, probably long double). It's then converted to double for passing to printf.
In the second case, you assign the result to a variable of type float, which forces dropping of excess precision. The float is then promoted to double when passed to printf.
Of course without actual numbers, this is all just guesswork. If you want a more definitive answer, provide complete details on your problem.
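C99 and later expose this via FLT_EVAL_METHOD in <float.h>; a quick check you could run on your system:
#include <float.h>
#include <stdio.h>

int main(void)
{
    /* 0: evaluate in the nominal type; 1: evaluate float operations as
       double; 2: evaluate everything as long double (classic x87 behavior);
       -1: indeterminable */
    printf("FLT_EVAL_METHOD = %d\n", FLT_EVAL_METHOD);
    return 0;
}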
The point is the actual location of the result of the expressions during the execution of the program. C values can live in memory (which includes caches) or just in registers, if the compiler decides that this kind of optimization is possible in the specific case.
In the first printf, the expression result is kept in a register, as the value is only used within the same C statement, so the compiler decides (correctly) that it would be useless to store it anywhere less volatile; as a result, the value is kept as a double or long double, depending on the architecture.
In the second case, the compiler did not perform such an optimization: the value is stored in a variable on the stack, which is memory, not a register; the value is therefore rounded to the 24-bit significand of a float (23 stored bits).
More examples are provided by streflop and its documentation.
