Error on casting unsigned int to float - c

For the following program:

#include <stdio.h>

int main()
{
    unsigned int a = 10;
    unsigned int b = 20;
    unsigned int c = 30;
    float d = -((a*b)*(c/3));
    printf("d = %f\n", d);
    return 0;
}
It is very strange that the output is

d = 4294965248.000000

When I change the magic number 3 in the expression for d to 3.0, I get the correct result:

d = -2000.000000

If I change the type of a, b, and c to int, I also get the correct result. I guess this error is caused by the conversion from unsigned int to float, but I do not know the details of how the strange result is produced.

I think you don't realize that the negation is applied while the value is still an unsigned int, before the assignment to float. If you run the code below, you will most likely get 4294965296:
#include <stdio.h>

int main()
{
    unsigned int a = 10;
    unsigned int b = 20;
    unsigned int c = 30;
    printf("%u", -((a*b)*(c/3)));
    return 0;
}
The -2000 on the right of your equals sign is set up as a signed integer (probably 32 bits in size) and has the hexadecimal value 0xFFFFF830. The compiler generates code to move this signed integer into your unsigned integer, which is also a 32-bit entity. The compiler assumes you only have a positive value on the right of the equals sign, so it simply moves all 32 bits. The unsigned variable now has the value 0xFFFFF830, which is 4294965296 if interpreted as a positive number. A printf format of %d would say the 32 bits are to be interpreted as a signed integer, giving -2000; with %u it prints as 4294965296.
#include <stdio.h>

int main()
{
    float d = 4294965296;
    printf("d = %f\n\n", d);
    return 0;
}
When you convert 4294965296 to float, the number is too long to fit into the fraction (significand) part, so some precision is lost. Because of that loss, you get 4294965248.000000, as I did.
The IEEE-754 floating-point standard is a standard for representing
and manipulating floating-point quantities that is followed by all
modern computer systems.
bit  31   30........23   22.....................0
      S   EEEEEEEE       MMMMMMMMMMMMMMMMMMMMMMM
The bit numbers count from the least-significant bit. Bit 31 is the sign (0 for positive, 1 for negative). The following 8 bits are the exponent in excess-127 binary notation; this means that the binary pattern 01111111 = 127 represents an exponent of 0, 10000000 = 128 represents 1, 01111110 = 126 represents -1, and so forth. The mantissa occupies the remaining 23 bits, with its leading 1 stripped off as described above. (Source)
As you can see, when converting 4294965296 to float, the low 8 bits (00110000) do not fit into the 24-bit significand and are rounded away:

1111 1111 1111 1111 1111 1000 0011 0000 <-- 4294965296 (0xFFFFF830)
1111 1111 1111 1111 1111 1000 0000 0000 <-- 4294965248 (0xFFFFF800)
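To see how coarse floats are at this magnitude, here is a minimal sketch (assuming IEEE-754 binary32) that prints the rounded value and the next representable float, 256 away:

#include <stdio.h>
#include <math.h>

int main(void)
{
    float f = 4294965296u;                      /* rounds to the nearest float */
    printf("f    = %f\n", f);                   /* 4294965248.000000 */
    printf("next = %f\n", nextafterf(f, 5e9f)); /* 256 higher */
    return 0;
}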

This is because you use - on an unsigned int. Unary minus on an unsigned value does not make it negative; it computes the value modulo 2^32 (in two's-complement terms: invert the bits and add one). Let's print some unsigned integers:

printf("Positive: %u\n", 2000);
printf("Negative: %u\n", -2000);
// Output:
// Positive: 2000
// Negative: 4294965296

Let's print the hex values:

printf("Positive: %x\n", 2000);
printf("Negative: %x\n", -2000);
// Output:
// Positive: 7d0
// Negative: fffff830

As you can see, 0xFFFFF830 is 2^32 - 2000. So the problem comes from using - on an unsigned int, not from casting unsigned int to float.
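As a quick check, here is a small sketch showing that negating an unsigned value is the same as inverting the bits and adding one (reduction modulo 2^32), not a plain inversion:

#include <stdio.h>

int main(void)
{
    unsigned int x = 2000u;
    printf("-x     = %u\n", -x);     /* 4294965296 = 2^32 - 2000 */
    printf("~x + 1 = %u\n", ~x + 1); /* same value */
    printf("~x     = %u\n", ~x);     /* 4294965295: plain inversion is one less */
    return 0;
}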

As others have said, the issue is that you are negating an unsigned number. Most of the solutions already given have you do some form of casting to float so that the arithmetic is done on floating-point types. An alternative is to cast the result of your arithmetic to int and then negate, so the arithmetic operations are done on integral types. This may or may not be preferable, depending on your actual use case:
#include <stdio.h>

int main(void)
{
    unsigned int a = 10;
    unsigned int b = 20;
    unsigned int c = 30;
    float d = -(int)((a*b)*(c/3));
    printf("d = %f\n", d);
    return 0;
}

Your whole calculation will be done unsigned, so it is the same as

float d = -(2000u);

-2000 in unsigned int (assuming a 32-bit int) is 4294965296; this gets written into your float d. But as a float cannot hold this exact number, it gets saved as 4294965248.
As a rule of thumb, you can say that a float has a precision of about 7 significant decimal digits.
What is calculated is 2^32 - 2000, and then floating-point precision does the rest.
If you instead use 3.0, this changes the types in your calculation as follows:
float d = -((a*b)*(c/3.0));
float d = -((unsigned*unsigned)*(unsigned/double));
float d = -((unsigned)*(double));
float d = -(double);
leaving you with the correct negative value.
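To make those implicit conversions visible, here is a minimal sketch with hypothetical intermediate variables spelling out each step:

#include <stdio.h>

int main(void)
{
    unsigned int a = 10, b = 20, c = 30;
    double step = c / 3.0;        /* unsigned / double -> double: 10.0 */
    double prod = (a * b) * step; /* unsigned * double -> double: 2000.0 */
    float d = -prod;              /* negate as double, then convert to float */
    printf("d = %f\n", d);        /* d = -2000.000000 */
    return 0;
}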

You need to cast the ints to floats; change

float d = -((a*b)*(c/3));

to

float d = -(((float)a*(float)b)*((float)c/3.0));

-((a*b)*(c/3)); is performed entirely in unsigned integer arithmetic, including the unary negation. Unary negation is well-defined for an unsigned type: mathematically, the result is reduced modulo 2^N, where N is the number of bits in unsigned int. When you assign that large number to the float, you encounter some loss of precision; due to the number's binary magnitude, the result is the nearest float to the unsigned value, a multiple of 256 (the spacing of floats in this range).
If you change 3 to 3.0, then c / 3.0 has type double, and the result of a * b is therefore converted to double before the multiplication. This double is then assigned to a float, with the precision loss already observed.

Related

%d for unsigned integer

I accidentally used "%d" to print an unsigned integer using an online compiler. I thought errors would pop up, but my program ran successfully. It's good that my code works, but I just don't understand why.

#include <stdio.h>

int main() {
    unsigned int x = 1;
    printf("%d", x);
    return 0;
}
The value of the "unsigned integer" was small enough that the MSB (most significant bit) was not set. If it were, printf() would have treated the value as a "negative signed integer" value.
#include <stdio.h>
#include <stdint.h>

int main() {
    uint32_t x = 0x5;
    uint32_t y = 0xC0000000;
    printf("%d %u %d\n", x, y, y);
    return 0;
}

5 3221225472 -1073741824
You can see the difference.
Modern compilers that check printf() format specifiers against the data types of the following arguments may or may not report this type mismatch with a warning; whether your online compiler does so is something you may want to look into.
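For instance, gcc and clang flag the mismatch when format checking is enabled (it is part of -Wall); the exact wording varies by compiler and version, but it looks roughly like this:

$ cc -Wall test.c
test.c: warning: format '%d' expects argument of type 'int', but argument 2 has type 'unsigned int' [-Wformat=]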
Referring to the printf() manual:
A character that specifies the type of conversion to be applied. The
conversion specifiers and their meanings are:
d, i
The int argument is
converted to signed decimal notation. The precision, if any, gives the
minimum number of digits that must appear; if the converted value
requires fewer digits, it is padded on the left with zeros. The
default precision is 1. When 0 is printed with an explicit precision
0, the output is empty.
This means that even if the argument is passed in its unsigned representation, in practice its bit pattern will be interpreted and printed as a signed int. See the following code example:
#include <stdio.h>

int main(){
    signed int x1 = -2147483648;
    unsigned int x2 = -2147483648;
    unsigned long long x3 = -2147483648;
    printf("signed int x1 = %d\n", x1);
    printf("unsigned int x2 = %d\n", x2);
    printf("unsigned long long x3 = %d\n", x3);
}

and this is the output:

signed int x1 = -2147483648
unsigned int x2 = -2147483648
unsigned long long x3 = -2147483648
So no matter what the type of the printed variable is, as long as you specify %d as the format specifier, the argument's bits will be interpreted as a signed int and printed accordingly.
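As a quick illustration, here is a sketch applying the same reinterpretation to the large value from the first question (strictly speaking the mismatch is undefined behavior, but this is the typical two's-complement result):

#include <stdio.h>

int main(void)
{
    unsigned int u = 4294965296u; /* 2^32 - 2000 */
    printf("%u\n", u);            /* 4294965296 */
    printf("%d\n", u);            /* -2000: same bits read as signed */
    return 0;
}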
In the case of unsigned char, for example:

#include <stdio.h>

int main(){
    unsigned char s = -10;
    printf("s = %d", s);
}
the output is:

s = 246

because the binary representation of unsigned char s = -10 is:

1111 0110

where the MSB is 1. But when the value is promoted to int (as happens to all char-sized arguments passed to printf), the new representation is:

0000 0000 0000 0000 0000 0000 1111 0110

so the MSB no longer holds the 1 bit that would mark the number as negative.
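The contrast with signed char makes the promotion visible; a minimal sketch:

#include <stdio.h>

int main(void)
{
    unsigned char u = -10;   /* stored as 246 */
    signed char s = -10;     /* stored as -10 */
    printf("%d %d\n", u, s); /* 246 -10: promotion to int preserves each value */
    return 0;
}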

what does float cf = *(float *)&ci; in C do?

I'm trying to find out exactly what this program prints.
#include <stdio.h>

int main() {
    float bf = -62.140625;
    int bi = *(int *)&bf;
    int ci = bi + (1<<23);
    float cf = *(float *)&ci;
    printf("%X\n", bi);
    printf("%f\n", cf);
}
This prints out:

C2789000
-124.281250

But what happens, line by line? I do not understand. Thanks in advance.
It is a convoluted way of doubling a 32-bit floating-point number by adding one to its exponent. Moreover, it is incorrect due to violating the strict aliasing rule by accessing an object of type float via type int.
The exponent is located at bits 23 to 30. Adding 1<<23 increments the exponent by one, which works like multiplying the original number by 2.
If we rewrite this program to remove the pointer punning:

#include <stdio.h>
#include <string.h>

int main() {
    float bf = -62.140625;
    int bi;
    memcpy(&bi, &bf, sizeof(bi));
    for (int i = 0; i < 32; i += 8)
        printf("%02x ", ((unsigned)bi >> i) & 0xff);
    bi += (1 << 23);
    memcpy(&bf, &bi, sizeof(bi));
    printf("%f\n", bf);
}
Float numbers have the format:

sign (1 bit) | biased exponent (8 bits) | significand (23 bits)

-62.140625 is -1.94189453125 × 2^5, so its (unbiased) exponent is 5. The line

bi += (1<<23);

adds one to the exponent field, so the resulting float number is -62.140625 * 2^1, which is equal to -124.281250. If you change that line to

bi += (1<<24);

it adds two to the exponent field (1<<24 is 2 in units of the exponent field), so the resulting float number is -62.140625 * 2^2, which is equal to -248.562500.
float bf = -62.140625;
This creates a float object named bf and initializes it to −62.140625.
int bi = *(int *)&bf;
&bf takes the address of bf, which produces a pointer to a float. (int *) says to convert this to a pointer to an int. Then * says to access the pointed-to memory, as if it were an int.
The C standard does not define the behavior of this access, but many C implementations support it, sometimes requiring a command-line switch to enable support for it.
A float value is normally encoded in some way. −62.140625 is not an integer, so it cannot be stored as a binary numeral that represents an integer. It is encoded. Reinterpreting the bytes of memory as an int using * (int *) &bf is an attempt to get the bits into an int so they can be manipulated directly, instead of through floating-point operations.
int ci = bi+(1<<23);
The format most commonly used for the float type is IEEE-754 binary32, also called “single precision.” In this format, bit 31 is a sign bit, bits 30-23 encode an exponent and/or some other information, and bits 22-0 encode most of a significand (or, in the case of a NaN, other information). (The significand is the fraction part of a floating-point representation. A floating-point format represents a number as ±F•b^e, where b is a fixed base, F is a number with a fixed precision in a certain range, and e is an exponent in a certain range. F is the significand.)
1<<23 is 1 shifted 23 bits, so it is 1 in the exponent field, bits 30-23.
If the exponent field contains 1 to 253, then adding 1 to it increases the encoded exponent by 1. (The codes 0 and 255 have special meanings in the exponent field. 254 is a normal value, but adding 1 to it produces the special code 255, so it will not increase the exponent in a normal way.)
Since the base b of a binary floating-point format is 2, increasing the exponent by 1 multiplies the number represented by 2: ±F•b^e becomes ±F•b^(e+1).
float cf = *(float *)&ci;
This is the opposite of the previous reinterpretation: It says to reinterpret the bytes of the int as a float.
printf("%X\n",bi);
This says to print bi using a hexadecimal format. This is technically wrong; the %X format should be used with an unsigned int, not an int, but most C implementations let it pass.
printf("%f\n",cf);
This prints the new float value.
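For comparison, the portable way to scale by a power of two, without any bit reinterpretation, is ldexpf; a minimal sketch:

#include <stdio.h>
#include <math.h>

int main(void)
{
    float bf = -62.140625f;
    printf("%f\n", ldexpf(bf, 1)); /* -124.281250 */
    printf("%f\n", ldexpf(bf, 2)); /* -248.562500 */
    return 0;
}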

How to create a negative binary number using signed/unsigned in C?

My thoughts: if one declares an int it basically gets an unsigned int. So if I need a negative value I have to explicitly create a signed int.
I tried
int a = 0b10001010;
printf("%d", a); // I get 138, what I expected
signed int b = 0b10001010; // here I expect -10, but I also get 138
printf("%d", b); // also tried %u
So am I wrong that a signed integer in binary is a negative value?
How can I create a negative value in binary format?
Edit: Even if I use 16/32/64 bits I get the same result; unsigned/signed doesn't seem to make a difference without manually shifting the bits.
If numbers are represented as two's complement you just need to have the sign bit set to ensure that the number is negative. That's the MSB. If an int is 32 bits, then 0b11111111111111111111111111111111 is -1, and 0b10000000000000000000000000000000 is INT_MIN.
To adjust for the sized types (int8_t, int16_t, int64_t), just change the number of bits. The sign bit is still the MSB.
Keep in mind that, depending on your target, int could be 2 or 4 bytes. This means that int a = 0b10001010 has nowhere near enough bits to set the sign bit.
If your int is 4 bytes, you need 0b10000000 00000000 00000000 00000000 (spaces added for clarity).
For example on a 32-bit target:
int b = 0b11111111111111111111111111111110;
printf("%d\n", b); // prints -2
because int a = 0b10001010 has only 8 bits, where you need 16 or 32. Try this:

int a = 0b10000000000000000000000000000101;

That should create a negative number if your machine has 32-bit ints. If that does not work, try:

int a = 0b1000000000000101;
There are other ways to produce negative numbers:

int a = (0b1 << 31) + 0b101;

or, if you have a 16-bit system:

int a = (0b1 << 15) + 0b101;

or this one works for both 32 and 16 bits:

int a = ~0b0 * 0b101;

or this is another one that works on both if you want to get -5:

int a = ~0b101 + 1;

since 0b101 is 5 in binary, ~0b101 gives -6, so to get -5 you add 1.
EDIT:
Since I now see that you are confused about what signed and unsigned numbers are, I will try to explain it as simply as possible.
So when you have:
int a = 5;
is the same as:
signed int a = 5;
and both of them would be positive. Now it would be the same as:
unsigned int a = 5;
because 5 is positive number.
On the other hand if you have:
int a = -5;
this would be the same as
signed int a = -5;
but it would not be the same as following:
unsigned int a = -5;
the first two would be -5; the third one is not the same. In fact, it would be the same as if you had entered 4294967291, because -5 and 4294967291 have the same 32-bit binary form; the unsigned keyword means the compiler stores the value the same way but treats it as positive.
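A short sketch of that last point:

#include <stdio.h>

int main(void)
{
    signed int s = -5;
    unsigned int u = -5;     /* converted modulo 2^32 */
    printf("%d %u\n", s, u); /* -5 4294967291 */
    return 0;
}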
How to create a negative binary number using signed/unsigned in C?
Simply negate a positive constant. Attempting to do so with many 1's, as in ...1110110, assumes a bit width for int. Better to be portable:
#include <stdio.h>

int main(void) {
    #define NEGATIVE_BINARY_NUMBER (-0b1010)
    printf("%d\n", NEGATIVE_BINARY_NUMBER);
}
Output
-10

Getting the mantissa (of a float) of either an unsigned int or a float (C)

So, I am trying to write a function which prints a given float number (n) in its (mantissa * 2^exponent) format. I was able to get the sign and the exponent, but not the mantissa (whichever the number is, the mantissa is always equal to 0.000000). What I have is:
unsigned int num = *(unsigned*)&n;
unsigned int m = num & 0x007fffff;
mantissa = *(float*)&m;
Any ideas of what the problem might be?
The C library includes a function that does this exact task, frexp:
int expon;
float mant = frexpf(n, &expon);
printf("%g = %g * 2^%d\n", n, mant, expon);
Another way to do it is with log2f and exp2f:
if (n == 0) {
    mant = 0;
    expon = 0;
} else {
    expon = floorf(log2f(fabsf(n)));
    mant = n * exp2f(-expon);
}
These two techniques are likely to give different results for the same input. For instance, on my computer the frexpf technique describes 4 as 0.5 × 2^3, but the log2f technique describes 4 as 1 × 2^2. Both are correct, mathematically speaking. Also, frexp will give you the exact bits of the mantissa, whereas log2f and exp2f will probably round off the last bit or two.
You should know that *(unsigned *)&n and *(float *)&m violate the rule against "type punning" and have undefined behavior. If you want to get the integer with the same bit representation as a float, or vice versa, use a union:
union { uint32_t i; float f; } u;
u.f = n;
num = u.i;
(Note: This use of unions is well-defined in C since roughly 2003, but, due to the C++ committee's long-standing habit of not paying sufficient attention to changes going into C, it is not officially well-defined in C++.)
You should also know IEEE floating-point numbers use "biased" exponents. When you initialize a float variable's mantissa field but leave its exponent field at zero, that gives you the representation of a number with a large negative exponent: in other words, a number so small that printf("%f", n) will print it as zero. Whenever printf("%f", variable) prints zero, change %f to %g or %a and rerun the program before assuming that variable actually is zero.
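For example, a tiny but nonzero value looks like zero under %f; a minimal sketch:

#include <stdio.h>

int main(void)
{
    float tiny = 1e-30f;
    printf("%f\n", tiny); /* 0.000000 - looks like zero */
    printf("%g\n", tiny); /* 1e-30 */
    printf("%a\n", tiny); /* the exact bits, in hexadecimal floating-point form */
    return 0;
}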
You are stripping off the bits of the exponent, leaving 0. An exponent of 0 is special, it means the number is denormalized and is quite small, at the very bottom of the range of representable numbers. I think you'd find if you looked closely that your result isn't quite exactly zero, just so small that you have trouble telling the difference.
To get a reasonable number for the mantissa, you need to put an appropriate exponent back in. If you want a mantissa in the range of 1.0 to 2.0, you need an exponent of 0, but adding the bias means you really need an exponent of 127.
unsigned int m = (num & 0x007fffff) | (127 << 23);
mantissa = *(float*)&m;
If you'd rather have a fully integer mantissa you need an exponent of 23, biased it becomes 150.
unsigned int m = (num & 0x007fffff) | ((23+127) << 23);
mantissa = *(float*)&m;
In addition to zwol's remarks: if you want to do it yourself, you have to acquire some knowledge about the innards of an IEEE-754 float. Once you have done so, you can write something like:
#include <stdlib.h>
#include <stdio.h>
#include <math.h> // for testing only

typedef union {
    float value;
    unsigned int bits; // assuming 32-bit ints (better: uint32_t)
} ieee_754_float;

// clang -g3 -O3 -W -Wall -Wextra -Wpedantic -Weverything -std=c11 -o testthewest testthewest.c -lm
int main(int argc, char **argv)
{
    unsigned int m, num;
    int exp; // the exponent can be negative
    float n, mantissa;
    ieee_754_float uf;

    // neither checks nor balances included!
    if (argc == 2) {
        n = atof(argv[1]);
    } else {
        exit(EXIT_FAILURE);
    }

    uf.value = n;
    num = uf.bits;
    m = num & 0x807fffff;    // keep sign bit and mantissa, drop the exponent
    num = num & 0x7fffffff;  // full number without sign bit
    exp = (num >> 23) - 126; // extract exponent field and remove the bias (126 here,
                             // matching a mantissa normalized into [0.5, 1))
    m |= 0x3f000000;         // set the exponent field to 126, i.e. normalize the
                             // mantissa into [0.5, 1)
    uf.bits = m;
    mantissa = uf.value;

    printf("n = %g, mantissa = %g, exp = %d, check %g\n", n, mantissa, exp,
           mantissa * powf(2, exp));
    exit(EXIT_SUCCESS);
}
Note: the code above is of the quick & dirty(tm) species and is not meant for production. It also lacks handling for subnormal (denormal) numbers, a thing you must include. Hint: multiply the mantissa by a large power of two (e.g., 2^25 or in that ballpark) and adjust the exponent accordingly (for the example value above, subtract 25).

How does C convert signed hex number to negative numbers?

So recently, I met this problem:
Given: int x; unsigned int y; x = 0xAB78; y = 0xAB78; write a program to display the decimal values of x and y on the screen.
And here is the program I wrote (I am on a 64-bit Windows 7 machine):
#include <stdio.h>
#include <math.h>

int main()
{
    short int x;
    unsigned short int y;
    x = 0xAB78;
    y = 0xAB78;
    printf("The decimal values of x and y are: %d & %hu.\n", x, y);
    return 0;
}
The output I got is:

x = -21640 and y = 43896

I am OK with the unsigned hex number, since 0xAB78 = 10*16^3 + 11*16^2 + 7*16^1 + 8*16^0 = 43896.
However, for the signed hex number, should it be -1*16^3 + 11*16^2 + 7*16^1 + 8*16^0 = -1160? Why is it -21640 then?
Thank you for your time!
The range of signed short int is [-32768, 32767]. The value 0xAB78 (43896) won't fit into it, so it wraps around into -32768 + (43896 - 32768), ending up as -21640.
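A minimal sketch of that wrap-around (the conversion result is implementation-defined in general, but this is what two's-complement machines give):

#include <stdio.h>

int main(void)
{
    short x = 0xAB78;  /* 43896 does not fit; becomes 43896 - 65536 */
    printf("%d\n", x); /* -21640 */
    return 0;
}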
If the constant does not fit in your signed integer type, the result is implementation-defined.
C allows ones' complement, sign-and-magnitude, and two's complement for signed numbers. Also, short has at least 15 value bits plus a sign bit and may contain non-value padding bits. Trap representations are also possible.
That gives quite a lot of possible results.
For POSIX and Windows machines, 16-bit two's complement is mandatory for short, so just reduce the constant modulo 2^16; if the result is >= 2^15, subtract 2^16.
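Expressed in code, that reduction rule looks like this (a sketch, assuming int is wider than 16 bits):

#include <stdio.h>

int main(void)
{
    unsigned v = 0xAB78;                                 /* 43896 */
    int x = (v >= 0x8000u) ? (int)v - 0x10000 : (int)v;  /* subtract 2^16 if >= 2^15 */
    printf("%d\n", x);                                   /* -21640 */
    return 0;
}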
