Representation of -1 due to bit overflow? [closed] - c

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 years ago.
Hey, I was trying to figure out why -1 << 4 (left shift) is FFF0. After reading around the web I learned that negative numbers have a sign bit of 1, and since -1 cannot carry an extra sign bit (that would require 33 bits, which isn't possible), -1 is represented as 1111 1111 1111 1111 1111 1111 1111 1111.
For instance:
#include<stdio.h>
void main()
{
printf("%x",-1<<4);
}
In this example we know that –
The internal representation of -1 is all 1's: 1111 1111 1111 1111 1111 1111 1111 1111 on a 32-bit compiler.
When we bitwise-shift a negative number left by 4 bits, the least significant 4 bits are filled with 0's.
The format specifier %x prints the given integer value in hexadecimal format.
After shifting 1111 1111 1111 1111 1111 1111 1111 0000 = FFFFFFF0 will be printed.
Source for the above
http://www.c4learn.com/c-programming/c-bitwise-shift-negative-number/

First, according to the C standard, the result of a left-shift on a signed variable with a negative value is undefined. So from a strict language-lawyer perspective, the answer to the question "why does -1 << 4 result in XYZ" is "because the standard does not specify what the result should be."
What your particular compiler is really doing, though, is left-shifting the two's-complement representation of -1 as if that representation were an unsigned value. Since the 32-bit two's-complement representation of -1 is 0xFFFFFFFF (or 11111111 11111111 11111111 11111111 in binary), the result of shifting left 4 bits is 0xFFFFFFF0 or 11111111 11111111 11111111 11110000. This is the result that gets stored back in the (signed) variable, and this value is the two's-complement representation of -16. If you were to print the result as an integer (%d) you'd get -16.
This is what most real-world compilers will do, but do not rely on it, because the C standard does not require it.

First things first: the tutorial uses void main. The comp.lang.c frequently asked question 11.15 should be of interest when assessing the quality of the tutorial:
Q: The book I've been using, C Programing for the Compleat Idiot, always uses void main().
A: Perhaps its author counts himself among the target audience. Many books unaccountably use void main() in examples, and assert that it's correct. They're wrong, or they're assuming that everyone writes code for systems where it happens to work.
That said, the rest of the example is ill-advised. The C standard does not define the behaviour of signed left shift. However, a compiler implementation is allowed to define behaviour for those cases that the standard leaves purposefully open. For example GCC does define that
all signed integers have two's-complement format
<< is well-defined on negative signed numbers and >> works as if by sign extension.
Hence, -1 << 4 on GCC is guaranteed to result in -16; the bit representations of these numbers, given a 32-bit int, are 1111 1111 1111 1111 1111 1111 1111 1111 and 1111 1111 1111 1111 1111 1111 1111 0000 respectively.
Now, there is another undefined behaviour here: %x expects an argument that is an unsigned int, however you're passing in a signed int, with a value that is not representable in an unsigned int. However, the behaviour on GCC / with common libc's most probably is that the bytes of the signed integer are interpreted as an unsigned integer, 1111 1111 1111 1111 1111 1111 1111 0000 in binary, which in hex is FFFFFFF0.
However, a portable C program should really never
assume two's complement representation - when the representation is of importance, use unsigned int or even uint32_t
assume that the << or >> on negative numbers have a certain behaviour
use %x with signed numbers
write void main.
A portable (C99, C11, C17) program for the same use case, with defined behaviour, would be
#include <stdio.h>
#include <inttypes.h>
int main(void)
{
printf("%" PRIx32, (uint32_t)-1 << 4);
}

Related

How does the compiler treat printing an unsigned int as a signed int?

I'm trying to figure out why the following code:
{
unsigned int a = 10;
a = ~a;
printf("%d\n", a);
}
a will be 00001010 to begin with, and after the NOT operation it will turn into 11110101.
What exactly happens when one tries to print a as a signed integer that makes the printed result -11?
I thought I would maybe end up seeing -5 (going by the binary representation), but not -11.
I'll be glad to get a clarification on the matter.
2's complement notation is used to store negative numbers.
The number 10 is 0000 0000 0000 0000 0000 0000 0000 1010 in 4-byte binary.
a = ~a makes the content of a 1111 1111 1111 1111 1111 1111 1111 0101.
When this number is treated as a signed int, the most significant bit is taken as the sign and the rest as the magnitude.
The 1 in the MSB makes the number negative, so the 2's complement operation is performed on the remaining bits to recover the magnitude.
Thus 111 1111 1111 1111 1111 1111 1111 0101 becomes
000 0000 0000 0000 0000 0000 0000 1011, i.e. 11.
This, when interpreted as a decimal integer, becomes -11.
When you write a = ~a; you invert each and every bit in a, which is also called the ones' complement.
The representation of a negative number is implementation-dependent, meaning that different architectures could have different representations for -10 or -11.
Assuming a 32-bit architecture on a common processor that uses two's complement to represent negative numbers, -1 will be represented as FFFFFFFF (hexadecimal), i.e. all 32 bits set to 1.
~a will be represented as FFFFFF5 in hexadecimal, or 1...10101 in binary, which is the representation of -11.
Note: the first part is always the same and is not implementation-dependent; ~a is FFFFFFF5 on any 32-bit architecture. It is only the second part (-11 == FFFFFFF5) that is implementation-dependent. By the way, it would be -10 on an architecture that used ones' complement to represent negative numbers.

Effect of adding 0xff on a bitwise operation

I am taking a C final in a few hours, and I am going over past exams trying to make sure I understand problems I previously missed. I had the below question and I simply left it blank as I didn't know the answer and I moved on, and looking at it now I am not sure of what the answer would be... the question is;
signed short int c = 0xff00;
unsigned short int d, e;
c = c + '\xff';
d = c;
e = d >> 2;
printf("%4x, %4x, %4x\n",c,d,e);
We were asked to show what values would be printed. It is the addition of '\xff' which is throwing me off. I have solved similar problems in binary, but this hex representation is confusing me.
Could anyone explain to me what would happen here?
'\xff' is all 1's in binary, i.e. -1 as a signed value (assuming plain char is signed).
So initially c = 0xff00
c = c + '\xff'
In binary is
c = 1111 1111 0000 0000 + 1111 1111 1111 1111
Which yields signed short int
c = 1111 1110 1111 1111 (0xfeff)
c and d will be equal due to the assignment, but e is d right-shifted by two bits:
e = 0011 1111 1011 1111 (0x3fbf)
I took the liberty to test this. In the code I added short int f assigned the value of c - 1.
unsigned short int c = 0xff00, f;
unsigned short int d, e;
f = c-1;
c = c + '\xff';
d = c;
e = (d >> 2);
printf("%4x, %4x, %4x, %4x\n",c,d,e,f);
And I get the same result for both c and f: f = c - 1 does not overflow, and neither does c + '\xff'.
feff, feff, 3fbf, feff
As noted by Zan Lynx, I was using unsigned short int in my sample code but the original post uses signed short int. With a signed short int the first value printed will have 4 extra f's (fffffeff).
0xff00 means the binary string 1111 1111 0000 0000.
'\xff' is a character with numeric code of 0xff and thus simply 1111 1111.
signed short int c = 0xff00;
initializes c with an out-of-range value (0xff00 = 65280 in decimal, which does not fit in a 16-bit signed short). The result of the conversion is implementation-defined; on a two's-complement machine c becomes -256.
The first addition adds the 16-bit number, stored in c:
1111 1111 0000 0000
Plus the number that is coded as the value of the ASCII character enclosed between quotes. In C you can specify a character by a hexadecimal code prefixed with \x, like this: '\xNN', where NN is a two-hex-digit number whose value is the character code itself. So '\xFF' is a somewhat unusual way to say 0xFF.
The addition is to be performed using a signed short (16 bits, signed) plus a char (8 bits, signed). For it, the compiler promotes that 8-bit value to a 16-bit value, preserving the original sign by doing a sign-extension conversion.
So before the addition, '\xFF' is decoded as the 8-bit signed number 0xFF (1111 1111), which in turn is promoted to the 16-bit number 1111 1111 1111 1111 (the sign must be preserved).
The final addition is
1111 1111 0000 0000
1111 1111 1111 1111
-------------------
1111 1110 1111 1111
Which is the hexadecimal number 0xFEFF. That is the new value in variable c.
Then there is d = c;. d is an unsigned short: it has the same size as a signed short, but the sign is not considered here; the MSB is just another bit. As both variables have the same size, the value in d is exactly the same bit pattern we had in c. That is:
d = 1111 1110 1111 1111
The difference is that any arithmetic or logical operation on this number won't take the sign into account. This means, for example, that conversions that change the size of the number won't extend the sign.
e = d >> 2;
e gets the value of d shifted two bits to the right. The >> operator behaves differently depending on whether the left operand is signed or not. If it is signed, the shifting typically preserves the sign (bits entering the number from the left have the same value as the sign bit had before the shifting). If it is unsigned, zeroes enter from the left.
d is unsigned, so the value e gets is the result of shifting d two bits to the right, entering zeroes from the left:
e = 0011 1111 1011 1111
Which is 0x3FBF.
Finally, values printed are c,d,e:
0xFEFF, 0xFEFF, 0x3FBF
But you may see 0xFFFFFEFF as the first printed number. This is because %x expects an int, not a short. The 4 in "%4x" means: "use at least 4 digits to print the number, but if more digits are needed, use as many as needed". To pass 0xFEFF to printf it must be promoted to a (32-bit) int, and as it is signed, this is done with sign extension. So 0xFEFF becomes 0xFFFFFEFF, which needs 8 digits to be printed, so 8 are used.
The second and third %4x print unsigned values (d and e). These values are promoted to 32-bit ints, but this time without sign extension. So the second value is promoted to 0x0000FEFF and the third one to 0x00003FBF. These numbers need only 4 digits to be printed, so you see only 4 digits for each (try changing the last two %4x to %2x and you will see that the numbers are still printed with 4 digits).

Why does my program print fffffff0?

I am trying to understand why my program
#include<stdio.h>
void main()
{
printf("%x",-1<<4);
}
prints fffffff0.
What is this program doing, and what does the << operator do?
The << operator is the left shift operator; the result of a << b is a shifted to the left by b bits.
The problem with your code is that you are left-shifting a negative integer, and this results in undefined behavior (although your compiler may place some guarantees on this operation); also, %x is used to print in hexadecimal an unsigned integer, and you are feeding to it a signed integer - again undefined behavior.
As for why you are seeing what you are seeing: on 2's complement architectures -1 is represented as "all ones"; so, on a computer with 32-bit int you'll have:
11111111111111111111111111111111 = -1 (if interpreted as a signed integer)
now, if you shift this to the left of 4 positions, you get:
11111111111111111111111111110000
The %x specifier makes printf interpret this value as an unsigned integer, which, in hexadecimal notation, is 0xfffffff0. This is easy to see, since 4 binary digits equal a single hexadecimal digit; each 1111 group in binary becomes an f in hex, and the final 0000 in binary is the last 0 in hex.
Again, all this behavior hereby explained is just the way your specific compiler works, as far as the C standard is concerned this is all UB. This is very intentional: historically different platforms had different ways to represent negative numbers, and the shift instructions of various processors have different subtleties, so the "defined behavior" we get for shift operators is more or less the "safe subset" common to most "normal" architectures.
It means take bit representation of -1 and shift it to the left 4 times
This means take
11111111 11111111 11111111 11111111 = ffffffff
And shift:
11111111 11111111 11111111 11110000 = fffffff0
The "%x" format specifier means print it out in hexadecimal notation.
It's the left shift binary operator. You're left shifting -1 by 4 bits:
-1 == 1111 1111 1111 1111 1111 1111 1111 1111(2) == FFFFFFFF
-1 << 4 == 1111 1111 1111 1111 1111 1111 1111 0000(2) == FFFFFFF0
"%x" means your integer will be displayed in hexadecimal value.
-1 << 4 means that the binary value of "-1" will be shifted of 4 bits
"<<" is left shift operator. -1<<4 means left shifting -1 by 4. Since -1 is 0xffffffff, you will get 0xfffffff0
More can be found in wiki http://en.wikipedia.org/wiki/Bitwise_operation

Is it valid on a machine which is not 16-bit?

I was asked in an interview: is this a valid declaration on a machine which is not 16-bit?
Below is the declaration,
unsigned int zero = 0;
unsigned int compzero = 0xFFFF;
They are both valid declarations, yes, inasmuch as there's no syntax error.
However, if your intent is to get the complement of 0 (all bits inverted), you should use:
unsigned int zero = 0;
unsigned int compzero = ~zero;
With (for example) a 32-bit unsigned int, 0xffff and ~0 are respectively:
0000 0000 0000 0000 1111 1111 1111 1111
1111 1111 1111 1111 1111 1111 1111 1111
Yes, the declaration is valid. Think about it this way: a hex literal is no different than a decimal literal. However, if they wanted compzero to actually be the complement of zero, then 0xFFFF might not be it; that depends on the width of unsigned int (and, for signed types, on which negative-number representation is in use: 1's complement, 2's complement or sign/magnitude).

Representation of negative numbers in C?

How does C represent negative integers?
Is it by two's complement representation or by using the MSB (most significant bit)?
-1 in hexadecimal is ffffffff.
So please clarify this for me.
ISO C (C99 section 6.2.6.2/2 in this case but it carries forward to later iterations of the standard(a)) states that an implementation must choose one of three different representations for integral data types, two's complement, ones' complement or sign/magnitude (although it's incredibly likely that the two's complement implementations far outweigh the others).
In all those representations, positive numbers are identical, the only difference being the negative numbers.
To get the negative representation for a positive number, you:
invert all bits then add one for two's complement.
invert all bits for ones' complement.
invert just the sign bit for sign/magnitude.
You can see this in the table below:
number | two's complement | ones' complement | sign/magnitude
=======|=====================|=====================|====================
5 | 0000 0000 0000 0101 | 0000 0000 0000 0101 | 0000 0000 0000 0101
-5 | 1111 1111 1111 1011 | 1111 1111 1111 1010 | 1000 0000 0000 0101
Keep in mind that ISO doesn't mandate that all bits are used in the representation. They introduce the concept of a sign bit, value bits and padding bits. Now I've never actually seen an implementation with padding bits but, from the C99 rationale document, they have this explanation:
Suppose a machine uses a pair of 16-bit shorts (each with its own sign bit) to make up a 32-bit int and the sign bit of the lower short is ignored when used in this 32-bit int. Then, as a 32-bit signed int, there is a padding bit (in the middle of the 32 bits) that is ignored in determining the value of the 32-bit signed int. But, if this 32-bit item is treated as a 32-bit unsigned int, then that padding bit is visible to the user’s program. The C committee was told that there is a machine that works this way, and that is one reason that padding bits were added to C99.
I believe the machine they may have been referring to was the Datacraft 6024 (and its successors from Harris Corp). In those machines, a 24-bit word was used for the signed integer but, if you wanted the wider type, it strung two of them together as a 47-bit value with the sign bit of one of the words ignored:
+---------+-----------+--------+-----------+
| sign(1) | value(23) | pad(1) | value(23) |
+---------+-----------+--------+-----------+
\____________________/ \___________________/
upper word lower word
(a) Interestingly, given the scarcity of modern implementations that actually use the other two methods, there's been a push to have two's complement accepted as the one true method. This has gone quite a long way in the C++ standard (WG21 is the workgroup responsible for this) and is now apparently being considered for C as well (by WG14).
C allows sign/magnitude, one's complement and two's complement representations of signed integers. Most typical hardware uses two's complement for integers and sign/magnitude for floating point (and yet another possibility -- a "bias" representation for the floating point exponent).
-1 in hexadecimal is ffffffff. So please clarify me in this regard.
In two's complement (by far the most commonly used representation), each bit except the most significant bit (MSB), from right to left (in increasing order of magnitude), has the value 2^n, where n increases from zero by one. The MSB has the value -2^n.
So for example in an 8-bit two's-complement integer, the MSB has the place value -2^7 (-128), so the binary number 1111 1111 is equal to -128 + 0111 1111 (binary) = -128 + 127 = -1
One useful feature of two's complement is that a processor's ALU only requires an adder block to perform subtraction, by forming the two's complement of the right-hand operand. For example, 10 - 6 is equivalent to 10 + (-6); in 8-bit binary (for simplicity of explanation) this looks like:
0000 1010
+1111 1010
---------
[1]0000 0100 = 4 (decimal)
Where the [1] is the discarded carry bit. Another example; 10 - 11 == 10 + (-11):
0000 1010
+1111 0101
---------
1111 1111 = -1 (decimal)
Another feature of two's complement is that it has a single value representing zero, whereas sign-magnitude and one's complement each have two; +0 and -0.
For integral types it's usually two's complement (implementation specific). For floating point, there's a sign bit.
