Struggling with union output - C

I've been trying to figure out this code for about an hour and still have no luck.
#include <stdio.h>
#include <stdlib.h>

int f(float f)
{
    union un { float f; int i; } u = {f};
    return (u.i & 0x7F800000) >> 23;
}

int main()
{
    printf("%d\n", f(1));
    return 0;
}
I don't understand how this works. I've tried f(1), f(2), f(3), f(4) and of course got different results. I've also read a lot about unions. What I have noticed is that when I delete 0x7F800000 from the return statement, the results stay the same. I want to know how u.i is generated; obviously it is not some random garbage, but it is also not the one (1) from the function argument. What is going on here, how does it work?

This really amounts to an understanding of how floating point numbers are represented in memory. (see IEEE 754).
In short, a 32-bit floating point number has the following structure:
bit 31 is the sign bit for the overall number
bits 30 - 23 are the exponent for the number, biased by 127
bits 22 - 0 represent the fractional part of the number. This is normalized such that the digit before the decimal (actually binary) point is one.
With regard to the union, recall that a union is a block of computer memory that can hold one of its member types at a time, so the declaration:
union un
{
    float f;
    int i;
};
creates a 32-bit block of memory that can hold either a floating point number or an integer at any given time. When we call the function with a floating point parameter, the bit pattern of that number is written to the memory occupied by the union. When we then access the union through the i member, that bit pattern is treated as an integer.
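A quick way to convince yourself of this is to print u.i in hexadecimal and compare it with the layout described in the next paragraph. A small sketch (it assumes, as the rest of this answer does, that int is 32 bits and float is IEEE 754):

#include <stdio.h>

int main(void)
{
    union un { float f; int i; } u = { 7.0f };

    /* same 32 bits, viewed two ways */
    printf("as float: %f\n", u.f);                /* 7.000000   */
    printf("as int  : 0x%08X\n", (unsigned)u.i);  /* 0x40E00000 */
    return 0;
}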
Thus, the general layout of a 32-bit floating point number is seee eeee efff ffff ffff ffff ffff ffff, where s represents the sign bit, e the exponent bits and f the fraction bits. OK, kind of gibberish, hopefully an example might help.
To convert 7 into IEEE floating point, first convert 7 into binary (I've split the 32-bit number into 4-bit nibbles):
7 = 0000 0000 0000 0000 0000 0000 0000 0111
Now we need to normalize this, i.e. express it as 1.something times a power of two:
1.11 x 2^2
Here we need to remember that each power of two moves the binary point one place (analogous to dealing with powers of 10).
From this, we now can generate the bit pattern
the overall sign of the number is positive, so the overall sign bit is 0.
the exponent is 2, but we bias the exponent by 127. This means that an exponent of -127 would be stored as 0, while an exponent of +128 would be stored as 255. Thus our exponent field is 2 + 127 = 129, or 1000 0001.
Finally, the fraction field would be 1100 0000 0000 0000 0000 000. Notice we have dropped the leading 1 because it is always assumed to be there.
Putting this all together, we have as the bit pattern:
7 = 0100 0000 1110 0000 0000 0000 0000 0000
Now, the last little bit here is the bitwise AND with 0x7F800000, which written out in binary is 0111 1111 1000 0000 0000 0000 0000 0000. If we compare this to the general layout of an IEEE floating point number, we see that what the mask selects is the exponent bits, which are then shifted right by 23 bits.
So your program is just printing out the biased exponent of a floating point number. As an example,
#include <stdio.h>
#include <stdlib.h>

int f(float f)
{
    union un { float f; int i; } u = {f};
    return (u.i & 0x7F800000) >> 23;
}

int main()
{
    printf("%d\n", f(7));
    return 0;
}
gives an output of 129 as we would expect.
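As an aside, the same bits can be read without a union by copying the float's object representation into an integer with memcpy; this is the approach usually suggested when people want to avoid debates about type punning through unions. A sketch, again assuming a 32-bit IEEE 754 float (the function name biased_exponent is just mine):

#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* Return the biased exponent field of an IEEE 754 single-precision float. */
static int biased_exponent(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* copy the raw bit pattern */
    return (int)((bits & 0x7F800000u) >> 23);
}

int main(void)
{
    printf("%d\n", biased_exponent(7.0f));  /* 129, as above */
    return 0;
}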

Related

How does the compiler treat printing an unsigned int as a signed int?

I'm trying to figure out the output of the following code:
#include <stdio.h>

int main(void)
{
    unsigned int a = 10;
    a = ~a;
    printf("%d\n", a);
    return 0;
}
a will be 00001010 to begin with, and after the NOT operation it will transform
into 11110101.
What exactly happens when one tries to print a as a signed integer that makes
the printed result -11?
I thought I would end up seeing -5 maybe (according to the binary representation), but not -11.
I'll be glad to get a clarification on the matter.
2's complement notation is used to store negative numbers.
The number 10 is 0000 0000 0000 0000 0000 0000 0000 1010 in 4-byte (32-bit) binary.
a = ~a makes the content of a 1111 1111 1111 1111 1111 1111 1111 0101.
When this bit pattern is treated as a signed int, the 1 in the most
significant bit marks the number as negative.
To recover the magnitude, take the two's complement of the remaining bits:
111 1111 1111 1111 1111 1111 1111 0101 becomes
000 0000 0000 0000 0000 0000 0000 1011, which is 11.
Interpreted as a decimal integer, the value is therefore -11.
When you write a = ~a; you invert each and every bit in a, which is also called the ones' complement.
The representation of a negative number is implementation-defined, meaning that different architectures can have different representations for -10 or -11.
Assuming a 32-bit architecture on a common processor that uses two's complement to represent negative numbers, -1 will be represented as FFFFFFFF (hexadecimal), i.e. all 32 bits set to 1.
~a will be represented as FFFFFFF5, or in binary 1...1 0101, which is the representation of -11.
Note: the first part is always the same and is not implementation-defined; ~a is FFFFFFF5 on any 32-bit architecture. It is only the second part (-11 == FFFFFFF5) that is implementation-defined. By the way, it would be -10 on an architecture that used ones' complement to represent negative numbers.
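A quick sketch that prints the same bit pattern three ways may make this concrete (it assumes a 32-bit int; note that, as in the question, the %d line passes an unsigned value where a signed one is expected, which is formally not guaranteed to behave this way):

#include <stdio.h>

int main(void)
{
    unsigned int a = 10;
    a = ~a;
    printf("hex      : %X\n", a);  /* FFFFFFF5 */
    printf("unsigned : %u\n", a);  /* 4294967285 */
    printf("signed   : %d\n", a);  /* -11 on the usual two's-complement implementations */
    return 0;
}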

Right bit-shift giving wrong result, can someone explain

I'm right-shifting -109 by 5 bits, and I expect -3, because
-109 = -1101101 (binary)
shift right by 5 bits
-1101101 >>5 = -11 (binary) = -3
But, I am getting -4 instead.
Could someone explain what's wrong?
Code I used:
int16_t a = -109;
int16_t b = a >> 5;
printf("%d %d\n", a,b);
I used GCC on Linux and Clang on OS X; same result.
The thing is that you are not considering the representation of negative numbers correctly. With right shifting, the kind of shift (arithmetic or logical) depends on the type of the value being shifted. If you cast your value to an unsigned 16-bit value, you get a logical shift instead:
int16_t b = ((uint16_t)a) >> 5;
You are using -109 (16 bits) in your example. 109 in bits is:
00000000 01101101
If you take the two's complement of 109 you get:
11111111 10010011
Then, you are right-shifting the number 11111111 10010011 by 5:
int16_t a = -109;
int16_t b = a >> 5;             // arithmetic shifting
int16_t c = ((uint16_t)a) >> 5; // logical shifting
printf("%d %d %d\n", a, b, c);
Will yield:
-109 -4 2044
The result of right-shifting a negative value is implementation-defined behavior; the C99 draft standard, section 6.5.7 Bitwise shift operators, paragraph 5, says:
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type
or if E1 has a signed type and a nonnegative value, the value of the result is the integral
part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the
resulting value is implementation-defined.
If we look at the gcc documentation on C implementation-defined behavior, under the Integers section it says:
The results of some bitwise operations on signed integers (C90 6.3, C99 and C11 6.5).
Bitwise operators act on the representation of the value including both the sign and value bits, where the sign bit is considered immediately above the highest-value value bit. Signed ‘>>’ acts on negative numbers by sign extension.
That makes it pretty clear what is happening: with signed integers, the leftmost (most significant) bit is the sign bit, and negative numbers are subject to sign extension.
So 1000 ... 0000 (32 bits) is the most negative number you can represent with 32 bits.
Because of this, when you shift a negative number right, sign extension happens: the sign bit is copied into the positions that open up on the left. In simple terms, for a number like -109 this is what happens.
Before shifting you have (16 bits):
1111 1111 1001 0011
Then you shift 5 bits right (after the pipe are the discarded bits):
XXXX X111 1111 1100 | 1 0011
The X's are the new positions that appear in the bit representation; due to sign extension they are filled with 1's, which gives you:
1111 1111 1111 1100 | 1 0011
So by shifting: -109 >> 5, you get -4 (1111 .... 1100) and not -3.
Confirming the results with the two's complement:
+3 = 0... 0000 0011
-3 = ~(0... 0000 0011) + 1 = 1... 1111 1100 + 1 = 1... 1111 1101
+4 = 0... 0000 0100
-4 = ~(0... 0000 0100) + 1 = 1... 1111 1011 + 1 = 1... 1111 1100
Note: the two's complement is obtained by taking the ones' complement (inverting the bits of the positive number) and then adding 1, which is exactly what is done above.
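To watch the sign extension happen, a small sketch like this prints the 16-bit patterns before and after the shift (the uint16_t cast and %04hX are only there for display; whether >> on a negative value shifts arithmetically is implementation-defined, as quoted above, but GCC and Clang both do):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int16_t a = -109;
    int16_t b = a >> 5;  /* arithmetic shift on GCC/Clang */

    printf("a = %4d  bits = %04hX\n", a, (uint16_t)a);  /* a = -109  bits = FF93 */
    printf("b = %4d  bits = %04hX\n", b, (uint16_t)b);  /* b =   -4  bits = FFFC */
    return 0;
}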
Pablo's answer is essentially correct, but there are two small bits (no pun intended!) that may help you see what's going on.
C (like pretty much every other language you will meet) uses what's called two's complement, which is simply a particular way of representing negative numbers (it's used to avoid the problems that come up with other ways of handling negative numbers in binary with a fixed number of digits). The process for negating a positive number in two's complement (a positive number looks just like any other binary number, except that its leftmost bit must be 0; that bit is basically the sign place-holder) is reasonably simple computationally:
Take your number
00000000 01101101 (it has 0s padding it to the left because it's 16 bits; if it were a long, it'd be padded with more zeros, etc.)
Flip the bits
11111111 10010010
Add one.
11111111 10010011.
This is the two's complement number that Pablo was referring to. It's how C holds -109, bitwise.
When you logically shift it to the right by five bits you would APPEAR to get
00000111 11111100.
This number is most definitely not -4. (It doesn't have a 1 in the first bit, so it's not negative, and it's way too large to be 4 in magnitude.) Why is C giving you negative 4 then?
The reason is basically that the ISO C standard doesn't specify how a given compiler needs to treat right-shifting of negative numbers. GCC does what's called sign extension: the idea is basically to pad the left bits with 1s (if the initial number was negative before shifting), or 0s (if it was positive).
So instead of the 5 zeros that happened in the above bit-shift, you instead get:
11111111 11111100. That number is in fact negative 4! (Which is what you were consistently getting as a result.)
To see that that is in fact -4, you can just convert it back to a positive number using the two's complement method again:
00000000 00000011 (bits flipped)
00000000 00000100 (add one).
That's four alright, so your original number (11111111 11111100) was -4.
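If it helps, that final check can also be done in code: put the pattern 11111111 11111100 into a 16-bit variable, then flip the bits and add one to read off the magnitude (a sketch; converting 0xFFFC to int16_t is itself implementation-defined, but gives -4 on the usual two's-complement targets):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int16_t r = (int16_t)0xFFFC;        /* 11111111 11111100 */
    printf("%d\n", r);                  /* -4 */
    printf("%d\n", (int16_t)(~r + 1));  /* 4: bits flipped, one added */
    return 0;
}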

Fraction to right of radix - Floating point conversion

When converting a number from base 10 to binary using the floating point bit model, what determines how many zeros you "zero pad" the fraction to the right of the radix?
Take for example -44.375
It was a question on a test in my systems programming course, and below is the answer the prof provided the class with. I posted this because most comments below seem to dispute what my professor states in the answer, which is causing some confusion.
Answer: 1 1000 0100 0110 0011 0000 0000 0000 000
-- sign bit: 1
-- fixed point (magnitude): 44.375 = 2^5 + 2^3 + 2^2 + 2^-2 + 2^-3
= 101100.011
= 1.01100011 * 2^5
-- exponent: 5 + 127 = 132 = 1000 0100
-- fraction: 0110 0011 0000 0000 0000 000
Marking:
-- 1 mark for correct sign bit
-- 2 marks for correct fixed point representation
-- 2 marks for correct exponent (in binary)
-- 2 marks for correct fraction (padded with zeros)
Unless the float is very small, there is no left "zero pad" of the fraction.
The sample here is -1.63 (a hexadecimal significand) times 2 to the power of 5 (decimal).
The exponent is adjusted until the leading digit is 1.
printf("%a\n", -44.375);
// -0x1.63p+5
[Edit]
Your prof wants to see "2 marks for correct fraction (padded with zeros)", i.e. the fraction padded out to the full number of fraction bits in a float, so the significand in your example is
1.0110 0011 0000 0000 0000 000
The leading 1 is not stored explicitly in a typical float.
OP "what determines how many zeros you "zero pad" the fraction to the right of the radix?
A: IEEE 754 binary32 (a popular float implementation) has a 24 bit significand. A lead bit (usually 1) and a 23-bit fraction. Thus your "right" zero padding goes out to fill 23 places.
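You can check the professor's bit pattern directly by looking at how -44.375 is actually stored, using the same union trick as in the first question on this page (a sketch, assuming IEEE 754 binary32 and a 32-bit unsigned int):

#include <stdio.h>

int main(void)
{
    union { float f; unsigned int u; } v = { -44.375f };

    /* sign | exponent | fraction = 1 | 1000 0100 | 0110 0011 0000 0000 0000 000 */
    printf("raw bits : 0x%08X\n", v.u);             /* 0xC2318000 */
    printf("sign     : %u\n", v.u >> 31);           /* 1 */
    printf("exponent : %u\n", (v.u >> 23) & 0xFF);  /* 132 */
    printf("fraction : 0x%06X\n", v.u & 0x7FFFFF);  /* 0x318000 */
    return 0;
}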
To determine the significand of an IEEE-754 32-bit binary floating-point value:
Figure out where the leading (most significant) 1 bit is. That is the starting point. Calculate 23 more bits. If there is anything left over, round it into last of the 24 bits (carrying as necessary).
Exception: If the leading bit is less than 2^-126, use the 2^-126 bit as the starting point, even though it is zero.
That gives the mathematical significand. To get the bits for the significand field, remove the first bit. (And, if the exception was used, set the encoded exponent to zero instead of the normal value.)
Another exception: If the leading bit, after rounding, is 2^128 or greater, the conversion overflows. Set the result to infinity.
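A tiny sketch of that 24-bit limit and the rounding step: FLT_MANT_DIG reports the significand size, and 2^24 + 1 needs 25 bits, so it gets rounded to the nearest representable value (this assumes float is IEEE 754 binary32):

#include <stdio.h>
#include <float.h>

int main(void)
{
    printf("significand bits: %d\n", FLT_MANT_DIG);  /* 24 for binary32 */

    float f = 16777217.0f;   /* 2^24 + 1: one bit too many for the significand */
    printf("%.1f\n", f);     /* 16777216.0 after rounding */
    return 0;
}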

Converting IEEE 754 Float to MIL-STD-1750A Float

I am trying to convert a IEEE 754 32 bit single precision floating point value (standard c float variable) to an unsigned long variable in the format of MIL-STD-1750A. I have included the specification for both IEEE 754 and MIL-STD-1750A at the bottom of the post. Right now, I am having issues in my code with converting the exponent. I also see issues with converting the mantissa, but I haven't gotten to fixing those yet. I am using the examples listed in Table 3 in the link above to confirm if my program is converting properly. Some of those examples do not make sense to me.
How can these two examples have the same exponent?
.5 x 2^0 (0100 0000 0000 0000 0000 0000 0000 0000)
-1 x 2^0 (1000 0000 0000 0000 0000 0000 0000 0000)
.5 x 2^0 has one decimal place, and -1 has no decimal places, so the value for .5 x 2^0 should be
.5 x 2^0 (0100 0000 0000 0000 0000 0000 0000 0010)
right? (0010 instead of 0001, because 1750A uses plus 1 bias)
How can the last example use all 32 bits and the first bit be 1, indicating a negative value?
0.7500001x2^4 (1001 1111 1111 1111 1111 1111 0000 0100)
I can see that a value with a 127 exponent should be 7F (0111 1111) but what about a value with a negative 127 exponent? Would it be 81 (1000 0001)? If so, is it because that is the two's complement +1 of 127?
Thank you
1) How can these two examples have the same exponent?
As I understand it, the sign and mantissa effectively define a 2's-complement value in the range [-1.0,1.0).
Of course, this leads to redundant representations (0.125 * 2^1 = 0.25 * 2^0, etc.). So a canonical normalized representation is chosen, by disallowing mantissa values in the range [-0.5, 0.5).
So in your two examples, both -1.0 and 0.5 fall into the "allowed" mantissa range, so they both share the same exponent value.
2) How can the last example use all 32 bits and the first bit be 1, indicating a negative value?
That doesn't look right to me; how did you obtain that representation?
3) What about a value with a negative 127 exponent? Would it be 81 (1000 0001)?
I believe so.
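To make point 1 concrete, here is a small decoding sketch going the other way (1750A bits to value). It assumes the single-precision layout implied by the question's Table 3 examples: a 24-bit two's-complement mantissa fraction in the upper bits and an 8-bit two's-complement exponent in the low byte; the function name decode_1750a is just mine:

#include <stdio.h>
#include <stdint.h>
#include <math.h>

/* Decode a 32-bit MIL-STD-1750A float (sketch, see assumptions above). */
static double decode_1750a(uint32_t w)
{
    int32_t mant = (int32_t)(w >> 8);   /* 24-bit mantissa field            */
    if (mant & 0x800000)
        mant -= 0x1000000;              /* sign-extend the two's complement */

    int expo = (int)(w & 0xFF);         /* 8-bit exponent field             */
    if (expo & 0x80)
        expo -= 0x100;                  /* sign-extend the two's complement */

    return ldexp((double)mant, expo - 23);  /* (mant / 2^23) * 2^expo       */
}

int main(void)
{
    printf("%g\n", decode_1750a(0x40000000u));  /*  0.5 x 2^0 ->  0.5 */
    printf("%g\n", decode_1750a(0x80000000u));  /* -1   x 2^0 -> -1   */
    return 0;
}

Both patterns decode with exponent 0, which is exactly why .5 and -1 share it.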
Remember the fraction is a "signed fraction"; the signed values are stored in two's complement format. So, for the -1 x 2^0 pattern, think of the zeros as ones: the number can be written as -0.111111111111111111111 (base 2) x 2^0, which is close to -1 (the magnitude converges to 1.0 as you add bits).
On the last example, there is a negative sign in the original document (-0.7500001x2^4)

What is the difference between unsigned int and signed int in C?

Consider these definitions:
int x=5;
int y=-5;
unsigned int z=5;
How are they stored in memory? Can anybody explain the bit representation of these in memory?
Can int x=5 and int y=-5 have the same bit representation in memory?
ISO C states what the differences are.
The int data type is signed and has a minimum range of at least -32767 through 32767 inclusive. The actual values are given in limits.h as INT_MIN and INT_MAX respectively.
An unsigned int has a minimum range of 0 through 65535 inclusive, with the actual maximum value being UINT_MAX from that same header file.
Beyond that, the standard does not mandate two's complement notation for encoding the values; that's just one of the possibilities. The three allowed encodings would represent 5 and -5 as follows (using 16-bit data types):
   | two's complement    | ones' complement    | sign/magnitude      |
---+---------------------+---------------------+---------------------+
 5 | 0000 0000 0000 0101 | 0000 0000 0000 0101 | 0000 0000 0000 0101 |
-5 | 1111 1111 1111 1011 | 1111 1111 1111 1010 | 1000 0000 0000 0101 |
---+---------------------+---------------------+---------------------+
In two's complement, you get a negative of a number by inverting all bits then adding 1.
In ones' complement, you get a negative of a number by inverting all bits.
In sign/magnitude, the top bit is the sign so you just invert that to get the negative.
Note that positive values have the same encoding for all representations, only the negative values are different.
Note further that, for unsigned values, you do not need to use one of the bits for a sign. That means you get more range on the positive side (at the cost of no negative encodings, of course).
And no, 5 and -5 cannot have the same encoding regardless of which representation you use. Otherwise, there'd be no way to tell the difference.
As an aside, there are currently moves underway, in both C and C++ standards, to nominate two's complement as the only encoding for negative integers.
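If you want to see what your own machine does, a quick sketch like this prints the 16-bit patterns; on the usual two's-complement implementations it prints 0005 and FFFB, matching the first column of the table above:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int16_t pos =  5;
    int16_t neg = -5;

    printf("%04hX\n", (uint16_t)pos);  /* 0005 */
    printf("%04hX\n", (uint16_t)neg);  /* FFFB: 65531, the two's-complement pattern */
    return 0;
}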
Because it's all just memory, in the end all numerical values are stored in binary.
A 32-bit unsigned integer can hold any value from all bits 0 to all bits 1.
For a 32-bit signed integer, one of its bits (the most significant) is a flag that marks the value as positive or negative.
The C standard specifies that unsigned numbers are stored in binary (with optional padding bits). Signed numbers can be stored in one of three formats: sign and magnitude, two's complement, or ones' complement. Interestingly, that rules out certain other representations like excess-n or base -2.
However, most machines and compilers store signed numbers in two's complement.
int is normally 16 or 32 bits. The standard says that int should be whatever is most efficient for the underlying processor; as long as it is >= short and <= long, it is allowed by the standard.
On some machines and OSs, history has caused int not to be the best size for the current iteration of hardware, however.
Here is a very nice link which explains the storage of signed and unsigned int in C:
http://answers.yahoo.com/question/index?qid=20090516032239AAzcX1O
Taken from the above article:
"A process called two's complement is used to transform positive numbers into negative numbers. The side effect of this is that the most significant bit is used to tell the computer if the number is positive or negative. If the most significant bit is a 1, then the number is negative. If it's 0, the number is positive."
Assuming int is a 16-bit integer (which depends on the C implementation; most are 32 bits nowadays), the bit representations differ as follows:
5 = 0000000000000101
-5 = 1111111111111011
If the binary pattern 1111111111111011 were interpreted as an unsigned int, it would be decimal 65531.
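A one-line check of that last claim (a sketch; reading the pattern back as int16_t is implementation-defined, but gives -5 on the usual two's-complement machines):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint16_t bits = 0xFFFB;           /* 1111 1111 1111 1011 */
    printf("%u\n", (unsigned)bits);   /* 65531 when read as unsigned */
    printf("%d\n", (int16_t)bits);    /* -5 on two's-complement machines */
    return 0;
}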
