Using 0b representation to print (printf) float - c

I'm trying to express a fractional number in binary and then have it print out as a float. I've done the fixed point to floating point conversion.
The number in decimal: -342.265625
fixed point: -101010110.010001
32-bit float: 11000011101010110010001000000000
64-bit float (double): 1100000001110101011001000100000000000000000000000000000000000000
*I've double checked with an IEEE 754 Converter
*I'm also aware that printf promotes floats to doubles when printing them, but declaring the variable as a double should work? I thought...?
Code:
#include <stdio.h>
int main()
{
float floaty = 0b11000011101010110010001000000000;
double doubley = 0b1100000001110101011001000100000000000000000000000000000000000000;
printf("Float: %f\n", floaty);
printf("Double: %lf\n", doubley);
}
Output:
Float: 3282772480.000000
Double: 13868100853597995008.000000
The compiler is gcc and the standard is c99

From gcc's documentation:
The type of these constants follows the same rules as for octal or
hexadecimal integer constants, so suffixes like ‘L’ or ‘UL’ can be
applied.
So the binary constants you assign to float and double are integer constants. Assigning them converts their integer values to float and double; their bits are not copied into the representation of the variables.
In other words, this:
float floaty = 0b11000011101010110010001000000000;
double doubley = 0b1100000001110101011001000100000000000000000000000000000000000000;
is equivalent to:
float floaty = 3282772480;
double doubley = 13868100853597995008ULL;

The problem is that the compiler is trying to help you out. Your literals (0b1...), which are a non-standard extension and are better written in hexadecimal (0x...), are treated as integer literals. The compiler then converts those integer values to the types of the variables you assign them to. As a result, you get very large values equal to the integer values of your literals.
To set the bit pattern of a variable directly, use a union (or memcpy). Casting between float * and integer pointer types is also seen, but it violates C's strict aliasing rules and is not portable. This code works:
#include <stdint.h>
#include <stdio.h>
union floatint {
float f;
uint32_t i;
};
union doubleint {
double d;
uint64_t i;
};
int main(void)
{
union floatint floaty;   // in C, the "union" keyword is required here
union doubleint doubley;
floaty.i = 0xC3AB2200;
doubley.i = 0xC075644000000000;
printf("Float: %f\n", floaty.f); // implementation-defined, in your case IEEE 754
printf("Double: %lf\n", doubley.d); // ditto
return 0;
}
Note that this is exactly what a union is for: two (or more) members that share the same storage but are interpreted as different types.

You can use the binary constants with some more work.
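We have to assume that floating point numbers are represented using IEEE 754 and that float and uint32_t have the same size and byte order (true on common platforms):
#include <stdint.h>
#include <stdio.h>
#include <string.h>
uint32_t value = 0b11000011101010110010001000000000;
float f;
memcpy( &f , &value , sizeof( f ) );
printf( "%f\n" , f );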

Related

Passing float value from C program to assembler level program using only integer registers?

For my class we are writing a simple asm program (with C and AT&T x86-64) that prints all the bits of an integer or float. I have the integer part working fine. For the float part my professor has instructed us to pass the float value only using integer registers. Not too sure why we're not allowed to use float registers. Regardless, does anyone have ideas on how to go about this?
my professor has instructed us to pass the float value only using integer registers.
A simple approach is to copy the float into an integer using memcpy():
#include <assert.h>
#include <stdint.h>
#include <string.h>
float f = ...;
assert(sizeof f == sizeof(uint32_t));
uint32_t u;
memcpy(&u, &f, sizeof u);
foo(u);
Another is to use a union, perhaps via a compound literal:
#include <assert.h>
#include <stdint.h>
void foo (uint32_t);
int main() {
float f = 1.5f; // example value; f must be initialized before it is read
assert(sizeof f == sizeof(uint32_t));
// v----------- compound literal -----------v
foo((union { float f; uint32_t u; }) { .f = f}.u);
// ^------ union object ------- ^
}
Both require that the integer type and the float have the same size.
Another issue is ensuring that both use the same byte order (endianness), although the byte orders of float and integer types very commonly match.

arithmetic operations using float binary "0b"

I'm trying to understand, I'm a beginner.
I want to do arithmetic operations with float numbers in binary.
I was using http://www.binaryconvert.com/result_float.html to do the conversion
But it prints:
1069547520.000000
1069547520.000000
2139095040.000000
What is going on?
I was hoping for this:
00111111110000000000000000000000
00111111110000000000000000000000
01000000010000000000000000000000
Is %f in printf() wrong too?
#include <stdio.h>
int main()
{
float a = 0b00111111110000000000000000000000; /* 1.5 */
float b = 0b00111111110000000000000000000000; /* 1.5 */
float c;
c = a + b; /* 3.0 !? */
printf("%f\n", a);
printf("%f\n", b);
printf("%f\n", c);
return 0;
}
The binary constant 0b00111111110000000000000000000000 is a GCC extension, and it has type int with the value 1069547520. Assigning it to a float converts that value, giving the float closest to 1069547520 (here exactly 1069547520.0).
There is no way of having floating point constants in binary in C; but hex is possible. If there were, then 1.5 would be expressed in binary simply as something like
0b1.1f
i.e. its numeric value in binary is 1.1.
C17 (C99, C11) does have support for hexadecimal floating point constants; you can use
0x1.8p0f
for 1.5f; p0 is a binary exponent, meaning the hexadecimal mantissa 0x1.8 is multiplied by 2^0.
If you really want to fiddle with the IEEE 754 binary format, you need to use a union or memcpy. For example:
#include <stdio.h>
#include <string.h>
#include <stdint.h>
int main(void) {
float a;
uint32_t a_v = 0b00111111110000000000000000000000;
memcpy(&a, &a_v, sizeof(float));
printf("%f\n", a);
// prints 1.500000 on linux x86-64
}
Your binary literals are integer literals. Also, printf's %f prints floating point values in decimal; it never shows their binary representation.

Why do I get this output with a C union?

#include<stdio.h>
union U{
struct{
int x;
int y;
};
float xy;
};
int main(){
union U u;
u.x = 99;
printf("xy %f\n",u.xy); //output " 0 "
return 0;
}
I have figured out that it has something to do with how float is stored and read internally. Can someone explain it to me exactly?
Converting comments into an answer.
Printing with %f is not very useful; you should consider %g or %e. With %f, if the value is very small, it will be printed as 0.000000 even when it is not zero. (For example, any value smaller than 0.0000005 will be printed as 0.000000.) You need to read about IEEE 754 at Wikipedia, for example, to find out about how such values are represented.
For example, on a Mac running macOS Sierra 10.12.5 using GCC 7.2.0, printing with:
printf("xy %22.16g\n", u.xy);
produces:
xy 1.387285479681569e-43
The range of normal numbers in 4-byte float is normally 10⁺³⁸ to 10⁻³⁸, so a value 1.387…E-43 from a float is a subnormal value (though well within range of 8-byte double values). Remember that float values passed to printf() are promoted to a double automatically because of 'default argument promotions' — printf() never actually receives a float value.
The way float is represented affects the result. See How to represent FLOAT number in memory in C and https://softwareengineering.stackexchange.com/questions/215065/can-anyone-explain-representation-of-float-in-memory to see how a float is represented in memory. You also did not initialize the structure member y, so it can hold any value; with a 4-byte int and a 4-byte float, however, xy overlaps only x, so y does not affect what is printed. In your case, the value is very small and you are not printing enough digits to see it. As suggested in a comment here, if I use the statement below in Codeblocks 16.1 on Windows (which ships the MinGW compiler), I get a value other than 0.000000:
printf("xy %.80f\n",u.xy);
gives me
xy 0.00000000000000000000000000000000000000000013872854796815689002144922874570169700
Yes, you're right, it has to do with the binary representation of a float value, which is defined in the standard document IEEE 754. See this great article by Steve Hollasch for an easy explanation. A float is a 32-bit value, and so is an int. So in your union U, xy falls exactly on the x member of the embedded struct, so when you set x to 99, the bits 1100011 (binary representation of 99) will be reinterpreted in xy as the mantissa of a float. As others have pointed out, this is a very small value, which may be printed as 0, depending on the printf format specifier.
Guessing from the naming of your union members (x, y, xy), I think you wanted to declare something different, e.g.:
union U
{
struct
{
short x;
short y;
};
float xy;
};
Or:
union U
{
struct
{
int x;
int y;
};
double xy;
};
In those declarations (assuming a 2-byte short, a 4-byte int, a 4-byte float and an 8-byte double), both the x and y members are mapped onto the xy member.
The reason is that the value is very close to zero, so the 6 digits after the decimal point that %f prints by default aren't enough to display anything.
Try:
union { int i; float f; } u = {.i= 99};
printf("f %g\n", u.f);

Pointers in C programming with double precision

OK, so this is what I must do, but I can't make it work:
a) Use float instead of integers, and assign 0.3 as the starting value of "u".
b) Use double precision instead of integers. Assign 0.3x10^45 as the starting value of "u".
c) Use characters instead of integers. Assign 'C' as the starting value of "u".
#include <stdio.h>
int main(void)
{
int u = 3;
int v;
int *pu;
int *pv;
pu = &u;
v = *pu;
pv = &v;
printf("\nu=%d &u=%X pu=%X *pu=%d", u, &u, pu, *pu);
printf("\n\nv=%d &v=%X pv=%X *pv=%d", v, &v, pv, *pv);
}
I'll be really grateful if anyone could modify my code to do the things above. Thanks
This question is testing a few things. First, do you know your types? You are expected to know that a single precision floating-point number is declared with float, a double precision number with double, and a character with char.
Second you are expected to know how to assign a literal value to those different types. For the float literal you are probably expected to use 0.3f, since without that suffix it would be double precision by default (although in this context it isn't going to make any difference). For the double, you are expected to know how to use scientific notation (the literal value should be 0.3e45). The character literal I would hope is fairly obvious to you.
Finally, you are expected to know the various conversion characters used in printf format strings. Single and double precision numbers use the same conversion characters (a float argument is promoted to double), and you have a choice of %e, %f or %g, depending on your requirements. I tend to use %g as a good general purpose choice, but my guess is they are expecting you to use %e for the double (because that forces scientific notation) and possibly %f for the float; it depends on what you have been taught. For a character you use %c.
Also, note that you should only be replacing the %d conversions in the format strings. The %X conversions output a hexadecimal representation of the pointers (&u and pu). A pointer isn't going to change into a floating point value or a character just because the type it points to has changed, so those conversions stay the same. (Strictly speaking, %X expects an unsigned int; the portable way to print a pointer is %p with a (void *) cast.)

C : how is double number (e.g. 123.45) stored in a float variable or double variable or long double variable?

#include <stdio.h>
int main () {
float a = 123.0;
unsigned char ch, *cp;
ch = 0xff;
cp = (unsigned char *) &a;
printf ("%02x", ch&(*(cp+3)));
printf ("%02x", ch&(*(cp+2)));
printf ("%02x", ch&(*(cp+1)));
printf ("%02x\n", ch&(*(cp+0)));
/* I have written this program to see the binary representation, but I cannot understand the output. What is the binary representation? */
return 0;
}
See Wikipedia: http://en.wikipedia.org/wiki/Single_precision_floating-point_format, which describes single precision floating point (a typical C float, but depends on the compiler) as a 1-bit sign, 8-bit biased exponent, and 24-bit mantissa (23-bits stored).
For your example:
123.0 = 42f60000hex = 0 10000101 11101100000000000000000bin
1-bit sign = 0 (indicating positive number)
8-bit biased exponent = 10000101bin = 133dec; 133dec - 127dec = 6dec
23-bit mantissa = 11101100000000000000000bin, i.e. 1.111011bin (note the implied leading 1)
Converting: 1.111011bin × 2^6 = 1111011.0bin = 123.0dec
Regarding your question (how is a double number, e.g. 123.45, stored in a float, double or long double variable?): if you store a double value (like the literal 123.0) in a float variable, the compiler converts it implicitly (in C++ terms, as if by static_cast<float>) so that it becomes a valid float value.
So, apart from possible compiler warnings, the following
int main () {
float foo = 123.0;
}
is basically the same as
int main () {
float foo = static_cast<float>(123.0);
}
If you want to explicitly state a float-literal, use the f or F postfix:
int main () {
float foo = 123.0f; // alternatively: 123.f, 123.F
}
Edit: from the C++ standard
Just looked up the grammar for floating-literals, for the curious:
floating-literal:
    fractional-constant exponent-part_opt floating-suffix_opt
    digit-sequence exponent-part floating-suffix_opt
fractional-constant:
    digit-sequence_opt . digit-sequence
    digit-sequence .
exponent-part:
    e sign_opt digit-sequence
    E sign_opt digit-sequence
Here are some examples of floating-point literals that require no conversion (but possibly rounding):
float a = 1.f,
b = 1.0f,
c = .0f,
d = 1e1f,
e = 1.e1f,
f = 1e-1f,
g = 1e+1f,
h = 1E+1F;
If conversion is needed, e.g.
float a = 1., // double
b = 1.L,// long double
c = 1; // integer
The following applies:
4.8 Floating point conversions [conv.double]
An rvalue of floating point type can be converted to an rvalue of another floating point type. If the source value can be exactly represented in the destination type, the result of the conversion is that exact representation. If the source value is between two adjacent destination values, the result of the conversion is an implementation-defined choice of either of those values. Otherwise, the behavior is undefined.
4.9 Floating-integral conversions [conv.fpint]
An rvalue of an integer type or of an enumeration type can be converted to an rvalue of a floating point type. The result is exact if possible. Otherwise, it is an implementation-defined choice of either the next lower or higher representable value.
So in summary, if you put a literal of type double or long double (or some integer) into a float, the compiler implicitly converts that value. If the value can be represented exactly, the result is exact; if it lies between two representable values, the implementation chooses one of them; and if it exceeds the representable range, you enter the world of undefined behaviour *.
* The Dreaded Realm of Undefined Behaviour, where an evil compiler writer may find it funny to beep out an infernal sound through your loudspeakers and make you bleed through the ears, and still be sanctioned by the standard (not necessarily by local laws, though).
To understand how the binary layout of floating point variables works, I recommend reading the Wikipedia article on the corresponding standard, IEEE 754.
In a nutshell, all floating point numbers (float, double, long double, and implementations of half) consist of a sign bit, a mantissa (significand) and an exponent that together represent the number.
