When I do a left shift of a hex constant, I get -1 as output with the following code:
unsigned int i,j=0;
i= (0xffffffff << (32-j));
printf("%d",i);
Similarly, when I changed the shift amount to the constant 32, the output is 0, but I get a warning from the compiler: (left shift count >= width of type)
unsigned int i,j=32;
i= (0xffffffff << (32));
printf("%d",i);
I was expecting the same result in both cases (i.e., 0), but I'm confused why case #1 prints the wrong output, while in case #2 the result is correct but the compiler warns!
The result is the same on 32-bit and 64-bit x86 machines.
Can someone explain the results above?
It's undefined behavior to left-shift by 32 or more on a 32-bit integer. That's what the warning is about.
C11 6.5.7 Bitwise shift operators
The integer promotions are performed on each of the operands. The type of the result is
that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
Shifting a 32-bit variable by 32 yields undefined behavior.
Here is the assembly generated by the VS-2013 compiler:
int n = 0;
mov dword ptr [n],0
int a = 0xFFFFFFFF << 32;
mov dword ptr [a],0
int b = 0xFFFFFFFF << (32-n);
mov ecx,20h
sub ecx,dword ptr [n]
or eax,0FFFFFFFFh
shl eax,cl
mov dword ptr [b],eax
As you can see, what happens de-facto is:
When you shift by a constant value of 32, the compiler simply sets the result to 0
When you shift by a variable (such as 32-n with n==0), the compiler uses shl
The actual result of shl depends on the implementation of this operation in the underlying architecture. On your processor, it probably takes the 2nd operand modulo 32, hence the 1st operand is shifted by 0.
Again, the description above is not dictated by the standard, so it really depends on the compiler in use.
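For illustration, here is a small sketch (the variable names are made up) contrasting the undefined shift with a well-defined alternative; the value printed for the undefined case is simply whatever this platform happens to produce:
#include <stdio.h>

int main(void)
{
    unsigned int j = 0;

    /* Undefined behavior: the effective shift count is 32, which is not less
       than the width of unsigned int. On x86, shl masks the count to 5 bits,
       so this often prints 0xffffffff, but nothing is guaranteed. */
    unsigned int ub = 0xffffffffu << (32 - j);
    printf("0x%x\n", ub);

    /* A well-defined way to get "the top j bits set" for j in 0..32 is to
       special-case the full-width shift instead of relying on the hardware. */
    unsigned int top = (j == 0) ? 0u : 0xffffffffu << (32 - j);
    printf("0x%x\n", top);

    return 0;
}
The special case makes the j == 0 result 0, which is what the question expected, without ever shifting by the full type width.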
Related
I was playing with bit-fields and got stuck on something strange:
#include <stdio.h>
struct lol {
    int a;
    int b:1,
        c:1,
        d:1,
        e:1;
    char f;
};
int main(void) {
    struct lol l = {0};
    l.a = 123;
    l.c = 1; // -1 ???
    l.f = 'A';
    printf("%d %d %d %d %d %c\n", l.a, l.b, l.c, l.d, l.e, l.f);
    return 0;
}
The output is:
123 0 -1 0 0 A
Somehow the value of l.c is -1. What is the reason?
Sorry if obvious.
Use unsigned bitfields if you don't want sign-extension.
What you're getting is your 1 bit being interpreted as the sign bit in a two's complement representation. In two's complement, the sign-bit is the highest bit and it's interpreted as -(2^(width_of_the_number-1)), in your case -(2^(1-1)) == -(2^0) == -1. Normally all other bits offset this (because they're interpreted as positive) but a 1-bit number doesn't and can't have other bits, so you just get -1.
Take for example 0b10000000 as an int8_t in two's complement.
(For the record, 0b10000000 == 0x80 && 0x80 == (1<<7)). It's the highest bit so it's interpreted as -(2^7) (==-128)
and there's no positive bits to offset it, so you get printf("%d\n", (int8_t)0x80); /*-128*/
Now if you set all bits on, you get -1, because -128 + (128-1) == -1. This (all bits on == -1) holds true for any width interpreted in two's complement, even for width 1, where you get -1 + (1-1) == -1.
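A minimal, self-contained version of those printf examples (assuming a two's complement platform, which int8_t itself requires, though the out-of-range conversions below are strictly implementation-defined before C23):
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int8_t sign_bit_only = (int8_t)0x80;  /* only the sign bit set: -(2^7) = -128 */
    int8_t all_bits_set  = (int8_t)0xFF;  /* -128 + 127 = -1                      */

    printf("%d\n", sign_bit_only);  /* -128 */
    printf("%d\n", all_bits_set);   /* -1   */
    return 0;
}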
When such a signed integer gets extended into a wider width, it undergoes so called sign extension.
Sign extension means that the highest bit gets copied into all the newly added higher bits.
If the highest bit is 0, then it's trivial to see that sign extension doesn't change the value (take for example 0x01 extended into 0x00000001).
When the highest bit is 1 as in (int8_t)0xff (all 8 bits 1), then sign extension copies the sign bit into all the new bits: ((int32_t)(int8_t)0xff == (int32_t)0xffffffff). ((int32_t)(int8_t)0x80 == (int32_t)0xffffff80) might be a better example as it more clearly shows the 1 bits are added at the high end (try _Static_assert-ing either of these).
This doesn't change the value either as long as you assume two's complement, because if you start at:
-(2^n) (value of sign bit) + X (all the positive bits) //^ means exponentiation here
and add one more 1-bit to the highest position, then you get:
-(2^(n+1)) + 2^(n) + X
which is
2*-(2^(n)) + 2^(n) + X == -(2^n) + X //same as original
//inductively, you can add any number of 1 bits
Sign extension normally happens when you width-extend a native signed integer into a native wider width (signed or unsigned), either with casts or implicit conversions.
For the native widths, platforms usually have an instruction for it.
Example:
int32_t signExtend8(int8_t X) { return X; }
Example's disassembly on x86_64:
signExtend8:
movsx eax, dil //the sx stands for Sign-eXtending
ret
If you want to make it work for nonstandard widths, you can usually utilize the fact that signed right-shifts normally copy the sign bit across the shifted range (what signed right-shifts actually do is implementation-defined),
and so you can unsigned-left-shift into the sign bit and then shift back to get sign extension artificially for a non-native width such as 2:
#include <stdint.h>
#define MC_signExtendIn32(X,Width) ((int32_t)((uint32_t)(X)<<(32-(Width)))>>(32-(Width)))
_Static_assert( MC_signExtendIn32(3,2 /*width 2*/)==-1,"");
int32_t signExtend2(int8_t X) { return MC_signExtendIn32(X,2); }
Disassembly (x86_64):
signExtend2:
mov eax, edi
sal eax, 30
sar eax, 30
ret
Signed bitfields essentially make the compiler generate (hidden) macros like the above for you:
struct bits2 { int bits2:2; };
int32_t signExtend2_via_bitfield(struct bits2 X) { return X.bits2; }
Disassembly (x86_64) on clang:
signExtend2_via_bitfield: # #signExtend2_via_bitfield
mov eax, edi
shl eax, 30
sar eax, 30
ret
Example code on godbolt: https://godbolt.org/z/qxd5o8 .
Bit-fields are very poorly standardized and they are generally not guaranteed to behave predictably. The standard just vaguely states (6.7.2.1/10):
A bit-field is interpreted as having a signed or unsigned integer type consisting of the
specified number of bits.125)
Where the informative note 125) says:
125) As specified in 6.7.2 above, if the actual type specifier used is int or a typedef-name defined as int,
then it is implementation-defined whether the bit-field is signed or unsigned.
So we can't know if int b:1 gives a signed type or an unsigned type; it's up to the compiler. Your compiler apparently decided that it would be a great idea to have signed bits. So it treats your 1-bit field as a two's complement 1-bit number, where binary 1 is decimal -1 and binary 0 is 0.
Furthermore, we can't know where b in your code ends up in memory; it could be anywhere, and it also depends on endianness. What we do know is that you save absolutely no memory by using a bit-field here, since at least 16 bits for an int will get allocated anyway.
General good advice:
Never use bit-fields for any purpose.
Use the bit-wise operators << >> | & ^ ~ and named bit-masks instead, for 100% portable and well-defined code (see the sketch below).
Use the stdint.h types, or at least unsigned ones, whenever dealing with raw binary.
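For illustration, here is a hedged sketch of the struct from the question rewritten with a plain flags byte and named masks; the FLAG_* names and struct lol2 are made up for this example, not taken from the original code.
#include <stdio.h>
#include <stdint.h>

/* Illustrative replacement for the b/c/d/e bit-fields above: one flags
   byte plus named masks. The names are invented for this sketch. */
#define FLAG_B (1u << 0)
#define FLAG_C (1u << 1)
#define FLAG_D (1u << 2)
#define FLAG_E (1u << 3)

struct lol2 {
    int a;
    uint8_t flags;   /* holds FLAG_B..FLAG_E */
    char f;
};

int main(void)
{
    struct lol2 l = {0};
    l.a = 123;
    l.flags |= FLAG_C;                   /* set the "c" flag */
    l.f = 'A';
    printf("%d %d %d %d %d %c\n",
           l.a,
           (l.flags & FLAG_B) != 0,      /* 0 */
           (l.flags & FLAG_C) != 0,      /* 1, never -1 */
           (l.flags & FLAG_D) != 0,      /* 0 */
           (l.flags & FLAG_E) != 0,      /* 0 */
           l.f);
    return 0;
}
Each flag test yields a plain 0 or 1, so there is no sign extension to worry about.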
You are using a signed type, and since the representation of 1 in a 1-bit field sets its only bit, which is the sign bit, the value is interpreted as negative, so you get -1. As other comments suggest, use the unsigned keyword to remove the possibility of representing negative integers.
I just checked the C++ standard. It seems the following code should NOT be undefined behavior:
unsigned int val = 0x0FFFFFFF;
unsigned int res = val >> 34; // res should be 0 by C++ standard,
// but GCC gives warning and res is 67108863
And from the standard:
The value of E1 >> E2 is E1 right-shifted E2 bit positions. If E1
has an unsigned type or if E1 has a signed type and a non-negative
value, the value of the result is the integral part of the quotient of
E1/2^E2. If E1 has a signed type and a negative value, the resulting
value is implementation-defined.
According to the standard, since 34 is NOT a negative number, the variable res will be 0.
GCC gives the following warning for the code snippet, and res is 67108863:
warning: right shift count >= width of type
I also checked the assembly code emitted by GCC. It just uses SHRL, and according to the Intel instruction documentation for SHRL, res is not zero.
So does that mean GCC doesn't implement the standard behavior on Intel platform?
The draft C++ standard, in section 5.8 Shift operators, paragraph 1, says (emphasis mine):
The type of the result is that of the promoted left operand. The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand.
So if unsigned int is 32 bits or less, then this is undefined, which is exactly the warning that gcc is giving you.
To explain exactly what happens:
The compiler will load 34 into a register, and then your constant in another register, and perform a right shift operation with those two registers. The x86 processor performs a "shiftcount % bits" on the shift value, meaning that you get a right-shift by 2.
And since 0x0FFFFFFF (268435455 decimal) divided by 4 = 67108863, that's the result you see.
If you had a different processor, for example a PowerPC (I think), it may well give you zero.
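As a small sanity check of that explanation (a sketch, not something the standard guarantees), the masked count and the observed result can be reproduced with well-defined operations:
#include <stdio.h>

int main(void)
{
    unsigned int val = 0x0FFFFFFF;

    /* What the x86 shift instruction effectively does with a count of 34:
       it keeps only the low 5 bits of the count, and 34 & 31 == 2. */
    unsigned int masked_count  = 34u & 31u;            /* 2 */
    unsigned int hardware_like = val >> masked_count;  /* 0x0FFFFFFF / 4 = 67108863 */

    printf("%u %u\n", masked_count, hardware_like);
    return 0;
}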
There is a difference between 1U << 32 and first assigning 32 to a variable (for example, n) and then shifting by n. When I tried printf("%u\n", 1U << 32), compilers optimized the result to 0. But when I try this code,
#include <stdio.h>
int main() {
    int n = 32;
    printf("%u\n", 1U << n);
}
Compiling and executing the code above prints 1.
According to C/C++ standard,
The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1×2^E2, reduced modulo one more than the maximum value representable in the result type. Otherwise, if E1 has a signed type and non-negative value, and E1×2^E2 is representable in the corresponding unsigned type of the result type, then that value, converted to the result type, is the resulting value; otherwise, the behavior is undefined.
In my opinion, when E1 is unsigned and E1 << E2 overflows, the result will be E1 * 2^E2 mod (UINT_MAX + 1), so the result of 1U << n should be 0.
But compiling with GCC 4.4 to 7.2 or Clang 3.1 to 5.0 on x86 and ARM gave the result of 1. I checked the assembly code and found both produced the following assembly,
movl $20,-0x4(%rbp)
mov -0x4(%rbp),%eax
mov $0x1,%edx
mov %eax,%ecx
shl %cl,%edx
and
orr w8, wzr, #0x1
orr w9, wzr, #0x20
stur w9, [x29, #-4]
ldur w9, [x29, #-4]
lsl w1, w8, w9
Then I checked the instructions shl and lsl. On c9x.me I found the following description of shl:
The destination operand can be a register or a memory location. The count operand can be an immediate value or register CL. The count is masked to 5 bits, which limits the count range to 0 to 31.
And the ARM assembler guide says that lsl's allowed shift range is 0-31.
This means that at least the shl instruction works as documented, which is why I suspected the compiler implementations were wrong; but it seems impossible for such a bug to exist so widely and for so long. Can anyone explain this to me?
Thanks.
The operands shall be of integral or unscoped enumeration type and integral promotions are performed. The type of the result is that of the promoted left operand. The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand.
C++11, §5.8 ¶1 (emphasis added)
The integer promotions are performed on each of the operands. The type of the result is that of the promoted left operand. If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined.
C99, §6.5.7 ¶3 (emphasis added)
It's extremely likely that these rules exist exactly because the shift instructions on existing platforms have this kind of limitation; without such a rule, every shift by an unknown count would have to be compiled into something more complex than a single platform shift instruction, while shifts are expected to be extremely fast operations.
Noting: "The destination operand can be a register or a memory location. The count operand can be an immediate value or register CL. The count is masked to 5 bits, which limits the count range to 0 to 31."
The value 32 is 0x00000020.
Limited to 5 bits it becomes 0x00000000.
1 << 0 results in 1.
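For illustration only (the variable names are made up), the well-defined expression below mirrors what the masked hardware shift ends up doing; the original 1U << n with n == 32 remains undefined behavior.
#include <stdio.h>

int main(void)
{
    unsigned int n = 32;

    /* 1U << n with n == 32 is undefined behavior. The observed result of 1
       matches what a count masked to 5 bits would give, which can be written
       in a well-defined way as: */
    unsigned int masked = 1u << (n & 31u);   /* n & 31 == 0, so this is 1 */
    printf("%u\n", masked);
    return 0;
}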
I learned about 2's complement and unsigned and signed int, so I decided to test my knowledge. As far as I know, a negative number is stored in 2's complement form so that addition and subtraction do not need different algorithms and the circuitry stays simple.
Now if I write
int main()
{
    int a = -1;
    unsigned int b = -1;
    printf("%d %u \n %d %u", a, a, b, b);
}
The output comes out to be -1 4294967295 -1 4294967295. Now, I looked at the bit pattern and realized that -1 in 2's complement is 11111111 11111111 11111111 11111111, so when I interpret it using %d it gives -1, but when I interpret it using %u it is treated as a positive number and gives 4294967295. The assembly of the code is:
.LC0:
.string "%d %u \n %d %u"
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-4], -1
mov DWORD PTR [rbp-8], -1
mov esi, DWORD PTR [rbp-8]
mov ecx, DWORD PTR [rbp-8]
mov edx, DWORD PTR [rbp-4]
mov eax, DWORD PTR [rbp-4]
mov r8d, esi
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov eax, 0
leave
ret
Now here -1 is moved into the register both times, for the unsigned and the signed variable. What I want to know is: if reinterpretation is all that matters, why do we have two types, signed and unsigned? Is it only the printf format specifier (%d or %u) that matters?
Further, what really happens when I assign a negative number to an unsigned integer? (I learned that the initializer converts this value from int to unsigned int.) But in the assembly code I did not see any such conversion. So what really happens?
And how does the machine know when it has to apply 2's complement and when not? Does it see the negative sign and then perform the 2's complement?
I have read almost every question and answer you could think this one might be a duplicate of, but I could not find a satisfactory solution.
Both signed and unsigned variables are just pieces of memory; it is the operations applied to them that determine how they behave.
It doesn't make any difference when adding or subtracting, because with two's complement the operations are exactly the same.
It matters when we compare two numbers: -1 is lower than 0, while 4294967295 obviously isn't.
About conversion: for the same size it simply takes the variable's content and moves it over, so 4294967295 becomes -1. For a bigger size it is first sign-extended and then the content is moved.
How does the machine know? From the instruction we use. Machines either have different instructions for comparing signed and unsigned values, or they provide different flags for it (x86 has Carry for unsigned overflow and Overflow for signed overflow).
Additionally, note that C is relaxed about how signed numbers are stored; they don't have to be two's complement. But nowadays all common architectures store signed numbers this way.
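A minimal sketch of those points, assuming 32-bit int: the conversion to unsigned is well-defined (modulo UINT_MAX + 1), and the comparison result depends on which type is used.
#include <stdio.h>

int main(void)
{
    int s = -1;
    unsigned int u = (unsigned int)s;   /* well-defined: -1 + (UINT_MAX + 1) = 4294967295 */

    printf("%d %u\n", s, u);            /* -1 4294967295 */
    printf("%d\n", s < 0);              /* 1: the signed value is below zero */
    printf("%d\n", u < 0u);             /* 0: an unsigned value is never below zero */
    return 0;
}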
There are a few differences between signed and unsigned types:
The behaviors of the operators <, <=, >, >=, /, %, and >> are all different when dealing with signed and unsigned numbers.
Compilers are not required to behave predictably if any computation on a signed value exceeds the range of its type. Even when using operators which would behave identically with signed and unsigned values in all defined cases, some compilers will behave in "interesting" fashion. For example, a compiler given x+1 > y could replace it with x>=y if x is signed, but not if x is unsigned.
As a more interesting example, on a system where "short" is 16 bits and "int" is 32 bits, a compiler given the function:
unsigned mul(unsigned short x, unsigned short y) { return x*y; }
might assume that no situation could ever arise where the product would exceed 2147483647. For example, if it saw the function invoked as unsigned x = mul(y,65535); and y was an unsigned short, it may omit code elsewhere that would only be relevant if y were greater than 32768.
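A common idiom for avoiding that hazard (not from the answer above, just a conventional workaround) is to force the multiplication into unsigned arithmetic before it happens; the function name is made up for this sketch.
/* Sketch of the usual workaround, assuming the same 16-bit short / 32-bit int
   platform: cast one operand so the usual arithmetic conversions make the
   multiplication unsigned, which wraps instead of overflowing with UB. */
unsigned mul_wrapping(unsigned short x, unsigned short y)
{
    return (unsigned)x * y;
}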
It seems you may have missed two facts: firstly, 0101 = 5 in both signed and unsigned interpretations; secondly, you assigned a negative number to an unsigned int, something your compiler may be smart enough to notice and warn about.
An unsigned int can't represent values below 0, but strictly speaking this is not an error in C: the negative value is converted by adding UINT_MAX + 1, which is why -1 shows up as 4294967295.
You can understand it better by assigning a negative value to a larger-sized unsigned integer. The compiler generates assembly code that does sign extension when transferring a small negative value to a larger-sized unsigned integer.
See this blog post for an assembly-level explanation.
Choice of signed integer representation is left to the platform. The representation applies to both negative and non-negative values: for example, if 1101₂ (-5) is the two's complement of 0101₂ (5), then 0101₂ (5) is also the two's complement of 1101₂ (-5).
The platform may or may not provide separate instructions for operations on signed and unsigned integers. For example, x86 provides different multiplication and division instructions for signed (idiv and imul) and unsigned (div and mul) integers, but uses the same addition (add) and subtraction (sub) instructions for both.
Similarly, x86 provides a single comparison (cmp) instruction for both signed and unsigned integers.
Arithmetic and comparison operations will set one or more status register flags (carry, overflow, zero, etc.). These can be used differently when dealing with words that are supposed to represent signed vs. unsigned values.
As far as printf is concerned, you're absolutely correct that the conversion specifier determines whether the bit pattern 0xFFFFFFFF is displayed as -1 or 4294967295, although remember that if the type of the argument does not match what the conversion specifier expects, the behavior is undefined. Using %u to display a negative signed int may or may not give you the expected equivalent unsigned value.
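As a small sketch of that last point, the unsigned value below is produced by an explicit cast so each conversion specifier receives a matching argument type; the 0xFFFFFFFF pattern assumes a 32-bit two's complement int.
#include <stdio.h>

int main(void)
{
    int a = -1;   /* bit pattern 0xFFFFFFFF on a 32-bit two's complement int */

    printf("%d\n", a);                 /* -1 */
    printf("%u\n", (unsigned int)a);   /* 4294967295: explicit, well-defined conversion */
    return 0;
}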
I have the following code in C:
int l;
short s;
l = 0xdeadbeef;
s = l;
Assuming int is 32 bits and short is 16 bits, when performing s = l, s will be promoted to 32 bits, and after the assignment only the lower 16 bits will be kept in s. My question is: when s is promoted to 32 bits, will the additional 16 bits be set to 0x0 or 0xf?
Source : http://www.phrack.com/issues.html?issue=60&id=10
Actually s is not promoted at all. Since s is signed and the value of l is too large to fit in s, assigning l to s in this case is implementation-defined behavior.
6.3.1.3-3
Otherwise, the new type is signed and the value cannot be represented
in it; either the result is implementation-defined or an
implementation-defined signal is raised.
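As a hedged illustration of what that implementation-defined conversion typically does on common two's complement platforms (not guaranteed by the standard), the low 16 bits are kept and reinterpreted:
#include <stdio.h>

int main(void)
{
    int l = 0xdeadbeef;   /* itself implementation-defined: the constant doesn't fit in a 32-bit int */
    short s = (short)l;   /* implementation-defined: typically keeps the low 16 bits, 0xbeef */

    printf("%hd\n", s);                            /* commonly -16657 (0xbeef as a signed 16-bit value) */
    printf("%#x\n", (unsigned)(unsigned short)s);  /* commonly 0xbeef */
    return 0;
}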
Assembly has instructions for moving a whole register or part of it (MOV EAX, 0, MOV AX, 0, MOV AL, 0 for 32 bits, 16 bits, and 8 bits respectively). As short is a 16-bit integer, the MOV AX, 0 form would be used, although that depends on the compiler implementation.
I assume you're going to promote s to some wider type.
What happens then depends on whether the source type is signed or unsigned. If the source is signed, sign extension is performed: the new higher bits are filled with copies of the sign bit, so they become 0 or 1 depending on the sign of the promoted value. If the source is unsigned, the new higher bits are filled with zeros.
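A minimal sketch of that point, assuming 16-bit short and 32-bit int: widening a negative signed value copies the sign bit into the new high bits, while converting through an unsigned type first fills them with zeros.
#include <stdio.h>

int main(void)
{
    short s = -16657;                                 /* bit pattern 0xbeef in 16 bits */

    int sign_extended = s;                            /* high 16 bits filled with the sign bit */
    unsigned int from_unsigned = (unsigned short)s;   /* value 0xbeef, high bits zero */

    printf("%#x\n", (unsigned int)sign_extended);     /* 0xffffbeef */
    printf("%#x\n", from_unsigned);                   /* 0xbeef */
    return 0;
}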