Quick question on left shifts in assembly using the "sall" instruction.
From what I understand, "sall rightop, leftop" translates to "leftop = leftop << rightop", so taking an integer and shifting its bits 4 places to the left results in a multiplication by 2^4.
But what happens when the integer is unsigned, 32 bits, and is something like:
1111 1111 1111 1111 1111 0000 0010 0010
Would a left shift in this case become 1111 1111 1111 1111 0000 0010 0010 0000 ?
Obviously this is not a multiplication by 2^4.
Thanks!!
It is a multiplication by 2^4, modulo 2^32:
n = (n * 2^4) % (2^32)
You can detect the bits that got "shifted out" by performing a shift right followed by masking, in this case
dropped = (n >> (32-4)) & ((1<<4)-1)
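For reference, a minimal C sketch of the same arithmetic (an illustration only, assuming a 32-bit unsigned int and using the bit pattern from the question):
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t n = 0xFFFFF022u;   // 1111 1111 1111 1111 1111 0000 0010 0010

    uint32_t shifted = n << 4;                              // same as (n * 16) mod 2^32
    uint32_t dropped = (n >> (32 - 4)) & ((1u << 4) - 1);   // the 4 bits that were shifted out

    printf("%08x %08x %x\n", shifted, (uint32_t)(n * 16u), dropped);   // fff02220 fff02220 f
    return 0;
}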
Left shifts (SAL, SHL) simply lose the bits on the left. The bits on the right get filled with 0. If any of the lost bits is 1, you have an overflow and the wrong result in terms of multiplication. You use S*L for both non-negative and negative values.
The regular right shift (SHR) works in exactly the same manner, but the direction is reversed: the bits on the left get filled with 0 and you lose the bits on the right. The result is effectively rounded/truncated towards 0. You use SHR only for non-negative values, because it does not preserve the sign of the value (0 gets written into it).
The arithmetic shift right (SAR) is slightly different. The most significant bit (=leftmost bit, sign bit) doesn't get filled with 0. It gets filled with its previous value, thus preserving the sign of the value. Another notable difference is that if the value is negative, the lost bits on the right result in rounding towards minus infinity instead of 0.
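A quick way to see the SHR/SAR difference from C (a sketch; it assumes a compiler like GCC or clang, where >> on a negative signed value is compiled as an arithmetic shift):
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t u = 0xFFFFF022u;   // the bit pattern from the question
    int32_t  s = (int32_t)u;    // same bits, read as a negative signed value

    printf("%08x\n", u >> 4);               // 0fffff02: SHR-style logical shift, zeros come in from the left
    printf("%08x\n", (uint32_t)(s >> 4));   // ffffff02: SAR-style arithmetic shift, the sign bit comes in
    return 0;
}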
Related
I'm right-shifting -109 by 5 bits, and I expect -3, because
-109 = -1101101 (binary)
shift right by 5 bits
-1101101 >>5 = -11 (binary) = -3
But, I am getting -4 instead.
Could someone explain what's wrong?
Code I used:
int16_t a = -109;
int16_t b = a >> 5;
printf("%d %d\n", a,b);
I used GCC on Linux and Clang on OS X; same result.
The thing is that you are not considering how negative numbers are represented. With right shifting, the kind of shift (arithmetic or logical) depends on the type of the value being shifted. If you cast your value to an unsigned 16-bit value first, you get a logical shift instead:
int16_t b = ((uint16_t)a) >> 5;
You are using -109 (16 bits) in your example. 109 in bits is:
00000000 01101101
If you take 109's two's complement you get:
11111111 10010011
Then, you are right shifting by 5 the number 11111111 10010011:
int16_t a = -109;
int16_t b = a >> 5;              // arithmetic shifting
int16_t c = ((uint16_t)a) >> 5;  // logical shifting
printf("%d %d %d\n", a, b, c);
Will yield:
-109 -4 2044
The result of right shifting a negative value is implementation defined behavior, from the C99 draft standard section 6.5.7 Bitwise shift operators paragraph 5 which says (emphasis mine going forward):
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type
or if E1 has a signed type and a nonnegative value, the value of the result is the integral
part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the
resulting value is implementation-defined.
If we look at the gcc C Implementation-Defined Behavior documentation, under the Integers section, it says:
The results of some bitwise operations on signed integers (C90 6.3, C99 and C11 6.5).
Bitwise operators act on the representation of the value including both the sign and value bits, where the sign bit is considered immediately above the highest-value value bit. Signed ‘>>’ acts on negative numbers by sign extension.
That makes it pretty clear what's happening: in this representation of signed integers, the leftmost (most significant) bit is the sign bit, and right-shifting a negative value sign-extends it.
So 1000 ... 0000 is the most negative number you can represent with 32 bits.
Because of this, when you have a negative number and you shift right, sign extension happens: the most significant bit is copied into the new positions on the left. In simple terms, for a number like -109 this is what happens:
Before shifting you have (16 bits):
1111 1111 1001 0011
Then you shift 5 bits right (after the pipe are the discarded bits):
XXXX X111 1111 1100 | 1 0011
The X's are the new positions that appear in your integer's bit representation; due to the sign extension, they are filled with 1's, which gives you:
1111 1111 1111 1100 | 1 0011
So by shifting: -109 >> 5, you get -4 (1111 .... 1100) and not -3.
Confirming the results with two's complement:
+3 = 0... 0000 0011
-3 = ~(0... 0000 0011) + 1 = 1... 1111 1100 + 1 = 1... 1111 1101
+4 = 0... 0000 0100
-4 = ~(0... 0000 0100) + 1 = 1... 1111 1011 + 1 = 1... 1111 1100
Note: Remember that the two's complement of a positive number is obtained by negating its bits and then adding 1; the one's complement is just the bit negation, without the +1.
Pablo's answer is essentially correct, but there are two small bits (no pun intended!) that may help you see what's going on.
C (like pretty much every other language) uses what's called two's complement, which is simply a particular way of representing negative numbers (it's used to avoid the problems that come up with other ways of handling negative numbers in binary with a fixed number of digits). The process for negating a positive number in two's complement (a positive number looks just like any other binary number, except that its leftmost bit must be 0; that bit is basically the sign placeholder) is reasonably simple computationally:
Take your number
00000000 01101101 (It has 0s padding it on the left because it's 16 bits. If it were a long, it'd be padded with more zeros, etc.)
Flip the bits
11111111 10010010
Add one.
11111111 10010011.
This is the two's complement number that Pablo was referring to. It's how C holds -109, bitwise.
When you logically shift it to the right by five bits you would APPEAR to get
00000111 11111100.
This number is most definitely not -4. (It doesn't have a 1 in the first bit, so it's not negative, and it's way too large to be 4 in magnitude.) Why is C giving you negative 4 then?
The reason is basically that the ISO C standard doesn't specify how a given compiler needs to treat right-shifting of negative numbers. GCC does what's called sign extension: the idea is to fill the incoming left bits with 1s (if the number was negative before shifting), or 0s (if it was positive before shifting).
So instead of the 5 zeros from the bit-shift above, you get:
11111111 11111100. That number is in fact negative 4! (Which is what you were consistently getting as a result.)
To see that that is in fact -4, you can just convert it back to a positive number using the two's complement method again:
00000000 00000011 (bits flipped)
00000000 00000100 (add one).
That's four alright, so your original number (11111111 11111100) was -4.
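If you want to check this yourself, here is a small sketch (it assumes a two's-complement machine and GCC/clang's sign-extending >>, matching the explanation above):
#include <stdio.h>
#include <stdint.h>

int main(void) {
    int16_t a = -109;

    // Two's complement of 109 by hand: flip the bits, then add one.
    uint16_t manual = (uint16_t)(~(uint16_t)109 + 1);
    printf("%04x %04x\n", (uint16_t)a, manual);   // ff93 ff93

    // The arithmetic shift pads the left with 1s, so the result is -4, not -3.
    printf("%d\n", a >> 5);
    return 0;
}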
My head is starting to hurt... I've been looking at this way too long.
I'm trying to mask the most significant nibble of an int, regardless of the int bit length and the endianness of the machine. Let's say x = 8425 = 0010 0000 1110 1001 = 0x20E9. I know that to get the least significant nibble, 9, I just need to do something like x & 0xF to get back 9. But how about the most significant nibble, 2?
I apologize if my logic from here on out falls apart, my brain is completely fried, but here I go:
My book tells me that the bit length w of the data type int can be computed with w = sizeof(int)<<3. If I knew that the machine were big-endian, I could do 0xF << w-4 to have 1111 for the most significant nibble and 0000 for the rest, i.e. 1111 0000 0000 0000. If I knew that the machine were little-endian, I could do 0xF >> w-8 to have 0000 0000 0000 1111. Fortunately, this works even though we are told to assume that right shifts are done arithmetically just because 0xF always gives me the first bit of 0000. But this is not a proper solution. We are not allowed to test for endianness and then proceed from there, so what do I do?
Bit shifting operators operate at a level of abstraction above endianness. "Left" shifts always shift towards the most significant bit, and "right" shifts always shift towards the least significant bit.
You should be able to right shift by the (number of bits) - 4 regardless of endianness.
Since you already know how to compute the number of bits, it should suffice to just subtract 4 and shift by that number, and then (for safety), mask with 0xF.
See this question for discussion about endianness.
Q: But how about the most significant nibble, 2?
A: (x >> (sizeof(int)*8 - 4)) & 0xF
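A minimal sketch of that approach, using the value from the question and computing the width the way the question's book does:
#include <stdio.h>

int main(void) {
    unsigned int x = 0x20E9;                      // 8425
    unsigned int w = sizeof(unsigned int) << 3;   // bit width, e.g. 32

    unsigned int high = (x >> (w - 4)) & 0xF;     // most significant nibble of the whole int
    unsigned int low  = x & 0xF;                  // least significant nibble

    // Prints "0 9" when int is 32 bits, because 8425 is 0x000020E9;
    // on a hypothetical 16-bit int it would print "2 9".
    printf("%x %x\n", high, low);
    return 0;
}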
I am taking a C final in a few hours, and I am going over past exams trying to make sure I understand problems I previously missed. I had the question below; I left it blank because I didn't know the answer, and looking at it now I am still not sure what the answer would be. The question is:
signed short int c = 0xff00;
unsigned short int d, e;
c = c + '\xff';
d = c;
e = d >> 2;
printf("%4x, %4x, %4x\n",c,d,e);
We were asked to show what values would be printed. It is the addition of '\xff' which is throwing me off. I have solved similar problems in binary, but this hex representation is confusing me.
Could anyone explain to me what would happen here?
'\xff' is a character with all 1s in binary; as a signed char it has the value -1.
So initially c = 0xff00
c = c + '\xff'
In binary is
c = 1111 1111 0000 0000 + 1111 1111 1111 1111
Which yields the signed short int
c = 1111 1110 1111 1111 (0xfeff)
c and d will be equal due to the assignment, but e is d right-shifted by two bits:
e = 0011 1111 1011 1111 (0x3fbf)
I took the liberty of testing this. In the code I added an unsigned short int f, assigned the value of c - 1 before c is modified.
unsigned short int c = 0xff00, f;
unsigned short int d, e;
f = c-1;
c = c + '\xff';
d = c;
e = (d >> 2);
printf("%4x, %4x, %4x, %4x\n",c,d,e,f);
And I get the same result for both c and f: adding '\xff' is the same as subtracting 1. Neither f = c - 1 nor c + '\xff' overflows here.
feff, feff, 3fbf, feff
As noted by Zan Lynx, I was using unsigned short int in my sample code but the original post uses signed short int. With signed short int, the values of c and f print with four extra f's (fffffeff), because the negative shorts are sign-extended when promoted to int for printf.
0xff00 means the binary string 1111 1111 0000 0000.
'\xff' is a character with numeric code of 0xff and thus simply 1111 1111.
signed short int c = 0xff00;
is initializing c with an out-of-range value (0xff00 = 65280 in decimal, which does not fit in a 16-bit signed short). The result of that conversion is implementation-defined; on typical two's-complement machines c ends up as -256, keeping the bit pattern 0xff00.
The first addition adds the 16-bit number, stored in c:
1111 1111 0000 0000
Plus the number coded as the value of the character constant enclosed between ' '. In C you can specify a character by its hexadecimal code, prefixed with \x, like this: '\xNN', where NN is a two-digit hex number; the character's code is the value of NN itself. So '\xFF' is a somewhat unusual way to say 0xFF.
The addition is to be performed using a signed short (16 bits, signed) plus a char (8 bits, signed). For it, the compiler promotes that 8-bit value to a 16-bit value, preserving the original sign by doing a sign-extension conversion.
So before the addition, '\xFF' is decoded as the 8-bit signed number 0xFF (1111 1111), which in turn is promoted to the 16-bit number 1111 1111 1111 1111 (the sign must be preserved)
The final addition is
1111 1111 0000 0000
1111 1111 1111 1111
-------------------
1111 1110 1111 1111
Which is the hexadecimal number 0xFEFF. That is the new value in variable c.
Then, there is d = c;. d is unsigned short: it has the same size as a signed short, but the sign is not considered here; the MSB is just another bit. As both variables have the same size, the bit pattern in d is exactly the same as the one we had in c. That is:
d = 1111 1110 1111 1111
The difference is that any arithmetic or logical operation on this number won't take the sign into account. This means, for example, that conversions that change the size of the number won't extend the sign.
e = d >> 2;
e gets the value of d shifted two bits to the right. The >> operator behaves differently depending on whether the left operand is signed or not. If it is signed, the shift preserves the sign (the bits entering the number from the left get the same value as the sign bit the number had before the shift). If it is not, zeroes enter from the left.
d is unsigned, so the value e gets is the result of shifting d two bits to the right, entering zeroes from the left:
e = 0011 1111 1011 1111
Which is 0x3FBF.
Finally, values printed are c,d,e:
0xFEFF, 0xFEFF, 0x3FBF
But you may see 0xFFFFFEFF as the first printed number. This is because %x expects an int, not a short. The 4 in "%4x" means: "use at least 4 digits to print the number, but if the amount of digits needed is more, use as much as needed". To print 0xFEFF as an int (32-bit int actually), it must be promoted again, and as it's signed, this is done with sign-extension. So 0xFEFF becomes 0xFFFFFEFF, which needs 8 digits to be printed, so it does.
The second and third %4x print unsigned values (d and e). These values are promoted to 32-bit ints, but this time, unsigned. So the second value is promoted to 0x0000FEFF and the third one, to 0x00003FBF. These two numbers need only 4 digits to be printed, not 8, so you see only 4 digits for each number (try changing the last two %4x to %2x and you will see that the numbers are still printed with 4 digits)
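To see the promotion at work, here is a small test sketch (it assumes char is signed, as on typical x86 GCC/clang setups; %hx tells printf to print the argument as an unsigned short, so the sign extension to int is not visible):
#include <stdio.h>

int main(void) {
    signed short int c = 0xff00;   // implementation-defined conversion; typically -256
    unsigned short int d, e;

    c = c + '\xff';
    d = c;
    e = d >> 2;

    printf("%4x, %4x, %4x\n", c, d, e);     // fffffeff, feff, 3fbf
    printf("%4hx, %4hx, %4hx\n", c, d, e);  // feff, feff, 3fbf
    return 0;
}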
We've been given an assignment to make some modifications to Linux kernel code and recompile it. I'm having a hard time figuring out what this line of code does:
p->time_slice = (current->time_slice + 1) >> 1;
To be more exact, why is there ">> 1" at the end?
">>" means to shift the value bitwise to the right. "x >> y" is the same as dividing by 2^y and truncating the result. Truncating the result means rounding down in almost all cases, however with negative numbers there may exist alternate implementations. Please see comments if you think this is happening to you.
That's a bitwise shift operator. Treating a value as an array of bits, it shifts everything one bit to the right (towards the least significant bit). This is the equivalent of dividing by 2, rounded down, for positive numbers. Shifting is used as a quick way to divide by a power of 2; if you shift by 1 (>> 1), you are dividing by 2, if you shift by 2 (>> 2), you are dividing by 4, and so on.
For example, here are a couple of examples of how this would work, if you were using 4 bit integers:
6 >> 1
0110 -> 0011
3
7 >> 1
0111 -> 0011
3
6 >> 2
0110 -> 0001
1
For negative numbers, it is a bit more complicated. The C standard does not specify the format of negative numbers. On most modern machines, they are stored in two's complement; that is, to represent a negative number, you take the positive representation, invert every bit, and add 1. The most significant bit is then taken to indicate the sign bit. If you right shift a negative number, there are two possible interpretations; one in which you always shift a 0 into the most significant bit, one in which you shift in a matching value to what was already there, known as "sign extension."
-2 >> 1
1110 -> 0111   (shifting in 0)
7
1110 -> 1111   (sign extension)
-1
The C standard does not specify which of these interpretations an implementation must use. GCC does the more expected one, sign extension, which is equivalent to dividing by two and rounding down, just like the positive case. Note that rounding down means "towards negative infinity", not "towards zero" as you might assume.
-3 >> 1
1101 -> 1110
-2
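A tiny sketch of that rounding difference (again assuming GCC's sign-extending >> for negative values):
#include <stdio.h>

int main(void) {
    printf("%d\n", -3 >> 1);   // -2: the arithmetic shift rounds toward negative infinity
    printf("%d\n", -3 / 2);    // -1: integer division rounds toward zero
    printf("%d\n",  6 >> 1);   //  3: for non-negative values shifting and dividing agree
    return 0;
}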
Assume I have the variable x initialized to 425. In binary, that is 110101001.
Bitshifting it to the right by 2 as follows: int a = x >> 2;, the answer is 106. In binary that is 1101010. This makes sense as the two right-most bits are dropped and two zeros are added on the left side.
Bitshifting it to the left by 2 as follows: int a = x << 2;, the answer is: 1700. In binary this is 11010100100. I don't understand how this works. Why are the two left most bits preserved? How can I drop them?
Thank you,
This is because int is probably 32 bits on your system. (Assuming x is of type int.)
So your 425, is actually:
0000 0000 0000 0000 0000 0001 1010 1001
When left-shifted by 2, you get:
0000 0000 0000 0000 0000 0110 1010 0100
Nothing gets shifted off until you go all the way past bit 31. (Strictly speaking, overflow of a signed integer is undefined behavior in C/C++.)
To drop the bits that are shifted off, you need to bitwise AND against a mask that's the original length of your number:
int a = (425 << 2) & 0x1ff; // 0x1ff is for 9 bits as the original length of the number.
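With x = 425, that gives (425 << 2) & 0x1ff = 1700 & 511 = 164, i.e. 0 1010 0100: the two bits that moved past the 9-bit boundary are gone and the result stays within the original 9 bits.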
First off, don't shift signed integers. The bitwise operations are only universally unambiguous for unsigned integral types.
Second, why shift if you can use * 4 and / 4?
Third, you only drop bits on the left when you exceed the size of the type. If you want to "truncate on the left" mathematically, perform a modulo operation:
(x * 4) % 256
The bitwise equivalent is AND with a bit pattern: (x << 2) & 0xFF
(That is, the fundamental unsigned integral types in C are always implicitly "modulo 2^n", where n is the number of bits of the type.)
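For instance, a tiny sketch showing the two forms agree (0xE9 is just an arbitrary 8-bit example value):
#include <stdio.h>

int main(void) {
    unsigned int x = 0xE9;                                // 233
    printf("%u %u\n", (x * 4) % 256, (x << 2) & 0xFF);    // both print 164
    return 0;
}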
Why would you expect them to be dropped? Your int (probably) consumes 4 bytes. You're shifting them into a space that it rightfully occupies.
The entire 4-byte space is used during evaluation. You'd need to shift entirely out of that space to "drop" them.