Bitwise operation and masks - c

I am having a problem understanding how this piece of code works. I understand that when x is a positive number, only (x & ~mask) has a value; but I cannot figure out what this piece of code is doing when x is a negative number.
e.g. If x is 1100(-4), and mask would be 0001, while ~mask is 1110.
The result of ((~x & mask) + (x & ~mask)) is 0001 + 1100 = 1011(-3), I tried hard but cannot figure out what this piece of code is doing, any suggestion is helpful.
/*
* fitsBits - return 1 if x can be represented as an
* n-bit, two's complement integer.
* 1 <= n <= 32
* Examples: fitsBits(5,3) = 0, fitsBits(-4,3) = 1
* Legal ops: ! ~ & ^ | + << >>
* Max ops: 15
* Rating: 2
*/
int fitsBits(int x, int n) {
/* mask the sign bit against ~x and vice versa to get highest bit in x. Shift by n-1, and not. */
int mask = x >> 31;
return !(((~x & mask) + (x & ~mask)) >> (n + ~0));
}

Note: this is pointless and only worth doing as an academic exercise.
The code makes the following assumptions (which are not guaranteed by the C standard):
int is 32-bit (1 sign bit followed by 31 value bits)
int is represented using 2's complement
Right-shifting a negative number does arithmetic shift, i.e. fill sign bit with 1
With these assumptions in place, x >> 31 will generate all-bits-0 for positive or zero numbers, and all-bits-1 for negative numbers.
So the effect of (~x & mask) + (x & ~mask) is the same as (x < 0) ? ~x : x .
Since we assumed 2's complement, ~x for negative numbers is -(x+1).
The effect is that a non-negative x remains unchanged, and a negative x is mapped onto the range [0, INT_MAX]. In 2's complement there are exactly as many negative numbers as non-negative numbers, so this works.
Finally, we right-shift by n + ~0. In 2's complement, ~0 is -1, so this is n - 1. If, for example, we shift right by 4 bits and all the set bits fall off the end, then the number is representable with 1 sign bit and 4 value bits. So the result of this shift tells us whether the number fits or not.
Putting all of that together, it is an arcane way of writing:
int fitsBits(int x, int n) {
    if ( x < 0 )
        x = -(x+1);   // same as ~x in 2's complement
    // now x is non-negative
    x >>= n - 1;      // aka. x /= pow(2, n-1)
    return x == 0;    // 1 if it fits, 0 if it doesn't
}
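As a rough sanity check of the explanation above, the bit-trick version can be compared against a direct range check. This is only a sketch: fitsBitsReference is a hypothetical helper name that is not in the original, and the code assumes a 32-bit two's complement int with arithmetic right shift, as listed in the assumptions above.

```c
#include <stdint.h>

/* The bit-trick version from the question. Assumes a 32-bit two's
   complement int and arithmetic right shift of negative values. */
int fitsBits(int x, int n) {
    int mask = x >> 31;                      /* 0 or all ones */
    return !(((~x & mask) + (x & ~mask)) >> (n + ~0));
}

/* Hypothetical reference: x fits in n bits exactly when
   -2^(n-1) <= x <= 2^(n-1) - 1. Uses 64-bit arithmetic so that
   n == 32 does not overflow. */
int fitsBitsReference(int x, int n) {
    int64_t lo = -((int64_t)1 << (n - 1));
    int64_t hi = ((int64_t)1 << (n - 1)) - 1;
    return x >= lo && x <= hi;
}
```

Both functions should agree for every x and every n from 1 to 32; for instance both give 0 for (5, 3) and 1 for (-4, 3).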

Here is a stab at it, unfortunately it is hard to summarize bitwise logic easily. The general idea is to try to right shift x and see if it becomes 0 as !0 returns 1. If right shifting a positive number n-1 times results in 0, then that means n bits are enough to represent it.
The reason for what I call a and b below is due to negative numbers being allowed one extra value of representation by convention. An integer can represent some number of values, that number of values is an even number, one of the numbers required to represent is 0, and so what is left is an odd number of values to be distributed among negative and positive numbers. Negative numbers get to have that one extra value (by convention) which is where the abs(x)-1 comes into play.
Let me know if you have questions:
#include <stdio.h>

int fitsBits(int x, int n) {
int mask = x >> 31;
/* -------------------------------------------------
// A: Bitwise operator logic to get 0 or abs(x)-1
------------------------------------------------- */
// mask == 0x0 when x is positive, therefore a == 0
// mask == 0xffffffff when x is negative, therefore a == ~x
int a = (~x & mask);
printf("a = 0x%x\n", a);
/* -----------------------------------------------
// B: Bitwise operator logic to get abs(x) or 0
----------------------------------------------- */
// ~mask == 0xffffffff when x is positive, therefore b == x
// ~mask == 0x0 when x is negative, therefore b == 0
int b = (x & ~mask);
printf("b = 0x%x\n", b);
/* ----------------------------------------
// C: A + B is either abs(x) or abs(x)-1
---------------------------------------- */
// c is either:
// x if x is a positive number
// ~x if x is a negative number, which is the same as abs(x)-1
int c = (a + b);
printf("c = %d\n", c);
/* -------------------------------------------
// D: A ridiculous way to subtract 1 from n
------------------------------------------- */
// ~0 == 0xffffffff == -1
// n + (-1) == n-1
int d = (n + ~0);
printf("d = %d\n", d);
/* ----------------------------------------------------
// E: Either abs(x) or abs(x)-1 is shifted n-1 times
---------------------------------------------------- */
int e = (c >> d);
printf("e = %d\n", e);
// If e was right shifted into 0 then you know the number would have fit within n bits
return !e;
}

You should be performing those operations with unsigned int instead of int.
Some operations like >> will perform an arithmetic shift instead of logical shift when dealing with signed numbers and you will have this sort of unexpected outcome.
A right arithmetic shift of a binary number by 1. The empty position in the most significant bit is filled with a copy of the original MSB instead of zero. -- from Wikipedia
With unsigned int though this is what happens:
In a logical shift, zeros are shifted in to replace the discarded bits. Therefore the logical and arithmetic left-shifts are exactly the same.
However, as the logical right-shift inserts value 0 bits into the most significant bit, instead of copying the sign bit, it is ideal for unsigned binary numbers, while the arithmetic right-shift is ideal for signed two's complement binary numbers. -- from Wikipedia
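To see the difference between the two kinds of shift concretely, here is a small sketch. The function names are made up for illustration, and the signed result relies on the compiler performing an arithmetic shift for negative values, which is implementation-defined in C (though it is what common compilers do).

```c
#include <stdint.h>

/* Right shift of a signed value: implementation-defined for negative x.
   Most compilers copy the sign bit in (arithmetic shift). */
int32_t shift_signed(int32_t x, int n) {
    return x >> n;
}

/* Right shift of the same bit pattern viewed as unsigned: always a
   logical shift, zeros are shifted in from the left. */
uint32_t shift_unsigned(int32_t x, int n) {
    return (uint32_t)x >> n;
}
```

With -8 (bit pattern 0xFFFFFFF8) shifted by 1, the signed version typically gives -4 (0xFFFFFFFC), while the unsigned version gives 0x7FFFFFFC.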

Related

What do these 3 lines of C code do by themselves, and what do they do altogether?

int mystery(int x) {
    int mask = x >> 31;
    return (x ^ mask)
           + ~mask + 1L;
}
I believe the first line creates a mask from x, such that it is all 1s if the most significant bit is 1, and all 0s if the MSB is 0.
The second line XORs the mask with the original x, which flips all the bits if the mask is 1s, and does nothing if the mask is all 0s.
Then the third line adds the complement of the mask, and also adds 1L... this is where I don't understand.
So my question is, what does the 3rd line do specifically, particularly the 1L?
And what does the entire function do to x?
This is returning the absolute value of a number without branching on a two's complement machine. However, there is one important exception: if x is originally INT_MIN, this will return INT_MIN.
Let's take a number -3 as an example and step through this.
int mask = x >> 31; defines a variable called mask that is -1 (all bits set) if x was originally negative (not portable!) and 0 otherwise. With x as -3, mask is -1.
(x ^ mask) is the value of x with all bits flipped if x was originally negative. The result of this expression is 2 if x was originally -3.
+ ~mask + 1L; adds the result of the above to 1 if x was negative or 0 otherwise and returns it. The result would be 3 if x was originally -3.
To explain this step further, let's consider when mask is -1. The ~ will flip all the bits to 0. After that, adding 1 causes this to, well, add 1 to the result.
Considering when mask is 0, the ~ flips all the bits to 1 (which is -1) and then adds 1, so the result would be no change.
Let's also step through the INT_MIN scenario:
mask is -1.
flip all bits of INT_MIN is INT_MAX.
adding 1 to INT_MAX is INT_MIN (and undefined behavior!!!).
Now let's see what happens with a positive x. Using 5 as an example:
mask is 0.
x is left unchanged.
x is again left unchanged.
Unless you're using a compiler that defines all of this behavior, this won't work. I highly suggest against using this.
The function returns the absolute value of a number.
(x ^ mask) will flip the bits of the number if mask is all 1's (the number is initially negative).
~mask + 1L; will evaluate to either -1 + 1 or 0 + 1 depending on if mask is all 0s or 1s respectively.
Putting it all together, this means that when the number is negative, we flip the bits and add 1. When the number is positive, we don't flip the bits and add 0 (keep it the same). This converts a negative number to a positive number because of how computers store negative numbers. You can read further about this by looking up "two's complement".
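The "flip the bits and add 1" rule can be tried in isolation. The sketch below is an illustration, not part of the original function: negate_twos_complement is a made-up name, and it works on uint32_t so that the wrap-around is well defined in C.

```c
#include <stdint.h>

/* Two's complement negation: invert every bit, then add one.
   Done on an unsigned type so the arithmetic wraps modulo 2^32. */
uint32_t negate_twos_complement(uint32_t x) {
    return ~x + 1u;
}
```

For example, the pattern for 3 becomes 0xFFFFFFFD (the pattern for -3), 0 stays 0, and 0x80000000 (the pattern for INT_MIN on a 32-bit machine) maps back to itself, which is why the absolute-value trick cannot handle that one input.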
Assuming x is a 32-bit integer with the sign in the 32nd (leftmost) bit,
int mask = x >> 31;
"mask" holds now either -1 (if x was negative) or 0. If it is -1, its binary representation is 0xffffffff (all 1's).
return (x ^ mask)
So x^mask is x XORed with either all 0s (remaining unchanged) or all 1's, including sign, which will invert its sign and map it to the positive value one less than the original. So 42 will remain 42, but -42 will become 41.
+ ~mask
not-mask will be 0 if x is negative, or all 1's, i.e. 0xffffffff (hence -1) if x was positive. Adding it, 42 will yield 41, and -42 (which was transformed to 41) will remain 41. Note that now in both cases we have the same number.
+ 1L;
Now, 42 (which was changed to 41) will become back 42, while -42 (which was 41) will become 42. The L is superfluous, as 1 alone will be taken as an int.
All in all, the function will return a value if it's positive, its opposite (i.e. its absolute value) if not.
NOTE: this function will obviously fail if its argument is the minimum (maximally negative) number, as that number has no positive representation in the same space as integers. So, mystery(-2147483647) will yield 2147483647 as expected, but mystery(-2147483648) will yield -2147483648.
Chances are very good that you'll fare better by using abs() instead of your mystery function, which could be an abs implementation for some architectures, while abs tends to be the best implementation for each architecture it's compiled for.
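If the branch-free trick is wanted anyway, one way to sidestep the INT_MIN problem is to return the result as an unsigned value. This is a sketch under assumptions, not the code from the question: abs_unsigned is a hypothetical name, and it assumes unsigned int has no padding bits.

```c
#include <limits.h>

/* Branch-free absolute value that returns unsigned, so the magnitude of
   INT_MIN is representable. All arithmetic is done on unsigned values,
   which wrap instead of overflowing. */
unsigned int abs_unsigned(int x) {
    unsigned int ux = (unsigned int)x;
    /* 0 for non-negative x, all ones for negative x */
    unsigned int mask = 0u - (ux >> (sizeof(unsigned int) * CHAR_BIT - 1));
    /* flip the bits and add 1 only when mask is all ones */
    return (ux ^ mask) - mask;
}
```

On a machine with 32-bit int, abs_unsigned(-3) is 3 and abs_unsigned(INT_MIN) is 2147483648.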
I believe it's supposed to return the absolute value of its argument, but it has a bug: if the argument passed in is 0x80000000 then it returns its argument. Test code:
#include <stdio.h>
int mystery(int x) {
printf("x=%d (0x%X)\n", x, x);
int mask = x >> 31;
printf("mask=%d (0x%X)\n", mask, mask);
printf("(x ^ mask)=0x%X\n", x ^ mask);
printf("~mask=0x%X\n", ~mask);
int retval = (x ^ mask) + ~mask + 1L;
printf("retval=%d (0x%X)\n\n", retval, retval);
return retval;
}
int main(void)
{
mystery(1);
mystery(0);
mystery(-1);
mystery(-656);
mystery(0x80000000);
return 0;
}
All I've done is break up the calculation and print the intermediate results. For the calls shown this produces:
x=1 (0x1)
mask=0 (0x0)
(x ^ mask)=0x1
~mask=0xFFFFFFFF
retval=1 (0x1)
x=0 (0x0)
mask=0 (0x0)
(x ^ mask)=0x0
~mask=0xFFFFFFFF
retval=0 (0x0)
x=-1 (0xFFFFFFFF)
mask=-1 (0xFFFFFFFF)
(x ^ mask)=0x0
~mask=0x0
retval=1 (0x1)
x=-656 (0xFFFFFD70)
mask=-1 (0xFFFFFFFF)
(x ^ mask)=0x28F
~mask=0x0
retval=656 (0x290)
x=-2147483648 (0x80000000)
mask=-1 (0xFFFFFFFF)
(x ^ mask)=0x7FFFFFFF
~mask=0x0
retval=-2147483648 (0x80000000)

simulate jg instruction(datalab's isGreater)

I am doing CSAPP's datalab, the isGreater function.
Here's the description
isGreater - if x > y then return 1, else return 0
Example: isGreater(4,5) = 0, isGreater(5,4) = 1
Legal ops: ! ~ & ^ | + << >>
Max ops: 24
Rating: 3
x and y are both int type.
So I decided to simulate the jg instruction to implement it. Here's my code:
int isGreater(int x, int y)
{
int yComplement = ~y + 1;
int minusResult = x + yComplement; // 0xffffffff
int SF = (minusResult >> 31) & 0x1; // 1
int ZF = !minusResult; // 0
int xSign = (x >> 31) & 0x1; // 0
int ySign = (yComplement >> 31) & 0x1; // 1
int OF = !(xSign ^ ySign) & (xSign ^ SF); // 0
return !(OF ^ SF) & !ZF;
}
The jg instruction needs SF == OF and ZF == 0.
But it can't pass a special case, that is, x = 0x7fffffff(INT_MAX), y = 0x80000000(INT_MIN).
I deduce it like this:
x + yComplement = 0xffffffff, so SF = 1, ZF = 0, since xSign != ySign, the OF is set to 0.
So, what's wrong with my code, is my OF setting operation wrong?
You're detecting overflow in the addition x + yComplement, rather than in the overall subtraction.
-INT_MIN itself overflows in 2's complement; INT_MIN == -INT_MIN. This is the 2's complement anomaly (see footnote 1).
You should be getting false-positive overflow detection for any negative number (other than INT_MIN) minus INT_MIN. The resulting addition will have signed overflow, e.g. -10 + INT_MIN overflows.
http://teaching.idallen.com/dat2343/10f/notes/040_overflow.txt has a table of input/output signs for addition and subtraction. The cases that overflow are where the input signs are opposite but the result sign matches y.
SUBTRACTION SIGN BITS (for num1 - num2 = sum)
         num1sign  num2sign  sumsign
        -----------------------------
            0         0         0
            0         0         1
            0         1         0
 *OVER*     0         1         1    (subtracting a negative is the same as adding a positive)
 *OVER*     1         0         0    (subtracting a positive is the same as adding a negative)
            1         0         1
            1         1         0
            1         1         1
You could use this directly with the original x and y, and only use yComplement as part of getting the minusResult. Adjust your logic to match this truth table.
Or you could use int ySign = (~y) >> 31; and leave the rest of your code unmodified. (Use a tmp to hold ~y so you only do the operation once, for this and yComplement). The one's complement inverse (~) does not suffer from the 2's complement anomaly.
Footnote 1: sign/magnitude and one's complement have two redundant ways to represent 0, instead of a value with no inverse.
Fun fact: if you make an integer absolute-value function, you should consider the result unsigned to avoid this problem. int can't represent the absolute value of INT_MIN.
Efficiency improvements:
If you use unsigned int, you don't need & 1 after a shift because logical shifts don't sign-extend. (And as a bonus, it would avoid C signed-overflow undefined behaviour in +: http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html).
Then (if you used uint32_t, or sizeof(unsigned) * CHAR_BIT - 1 instead of 31) you'd have a safe and portable implementation of 2's complement comparison. (Signed shift semantics for negative numbers are implementation-defined in C.) I think you're using C as a sort of pseudo-code for bit operations, and aren't interested in actually writing a portable implementation, and that's fine. The way you're doing things will work on normal compilers on normal CPUs.
Or you can use & 0x80000000 to leave the high bits in place (but then you'd have to left shift your ! result).
It's just the lab's restriction, you can't use unsigned or any constant larger than 0xff(255)
Ok, so you don't have access to logical right shift. Still, you need at most one &1. It's ok to work with numbers where all you care about is the low bit, but where the rest hold garbage.
You eventually do & !ZF, which is either &0 or &1. Thus, any high garbage in OF is wiped away.
You can also delay the >> 31 until after XORing together two numbers.
This is a fun problem that I want to optimize myself:
// untested, 13 operations
int isGreater_optimized(int x, int y)
{
int not_y = ~y;
int minus_y = not_y + 1;
int sum = x + minus_y;
int x_vs_y = x ^ y; // high bit = 1 if they were opposite signs: OF is possible
int x_vs_sum = x ^ sum; // high bit = 1 if they were opposite signs: OF is possible
int OF = (x_vs_y & x_vs_sum) >> 31; // high bits hold garbage
int SF = sum >> 31;
int non_zero = !!sum; // 0 or 1
return (~(OF ^ SF)) & non_zero; // high garbage is nuked by `& non_zero` (0 or 1)
}
Note the use of ~ instead of ! to invert a value that has high garbage.
It looks like there's still some redundancy in calculating OF separately from SF, but actually the XORing of sum twice doesn't cancel out. x ^ sum is an input for &, and we XOR with sum after that.
We can delay the shifts even later, though, and I found some more optimizations by avoiding an extra inversion. This is 11 operations.
// replace 31 with sizeof(int) * CHAR_BIT - 1 if you want. #include <limits.h>
// or use int32_t
int isGreater_optimized2(int x, int y)
{
int not_y = ~y;
int minus_y = not_y + 1;
int sum = x + minus_y;
int SF = sum; // value in the high bit, rest are garbage
int x_vs_y = x ^ y; // high bit = 1 if they were opposite signs: OF is possible
int x_vs_sum = x ^ sum; // high bit = 1 if they were opposite signs: OF is possible
int OF = x_vs_y & x_vs_sum; // low bits hold garbage
int less = (OF ^ SF);
int ZF = !sum; // 0 or 1
int le = (less >> 31) | ZF; // less OR equal: the shift discards the low garbage, leaving 0 or -1
return !le; // jg == jnle
}
I wondered if any compilers might see through this manual compare and optimize it into cmp edi, esi/ setg al, but no such luck :/ I guess that's not a pattern that they look for, because code that could have been written as x > y tends to be written that way :P
But anyway, here's the x86 asm output from gcc and clang on the Godbolt compiler explorer.
Assuming two's complement, INT_MIN's absolute value isn't representable as an int. So, yComplement == y (ie. still negative), and ySign is 1 instead of the desired 0.
You could instead calculate the sign of y like this (changing as little as possible in your code) :
int ySign = !((y >> 31) & 0x1);
For a more detailed analysis, and a more optimal alternative, check Peter Cordes' answer.

Divide a signed integer by a power of 2

I'm working on a way to divide a signed integer by a power of 2 using only binary operators (<< >> + ^ ~ & | !), and the result has to be rounded toward 0. I came across this question, also on Stack Overflow, about the problem; however, I cannot understand why it works. Here's the solution:
int divideByPowerOf2(int x, int n)
{
return (x + ((x >> 31) & ((1 << n) + ~0))) >> n;
}
I understand the x >> 31 part (only add the next part if x is negative, because if it's positive x will be automatically round toward 0). But what's bothering me is the (1 << n) + ~0 part. How can it work?
Assuming 2-complement, just bit-shifting the dividend is equivalent to a certain kind of division: not the conventional division where we round the dividend to next multiple of divisor toward zero. But another kind where we round the dividend toward negative infinity. I rediscovered that in Smalltalk, see http://smallissimo.blogspot.fr/2015/03/is-bitshift-equivalent-to-division-in.html.
For example, let's divide -126 by 8. traditionally, we would write
-126 = -15 * 8 - 6
But if we round toward negative infinity, we get a positive remainder and write it:
-126 = -16 * 8 + 2
The bit-shifting is performing the second operation, in term of bit patterns (assuming 8 bits long int for the sake of being short):
1000|0010 >> 3 = 1111|0000
1000|0010 = 1111|0000 * 0000|1000 + 0000|0010
So what if we want the traditional division with quotient rounded toward zero and remainder of same sign as dividend? Simple, we just have to add 1 to the quotient - if and only if the dividend is negative and the division is inexact.
You saw that x>>31 corresponds to first condition, dividend is negative, assuming int has 32 bits.
The second term corresponds to the second condition, if division is inexact.
See how -1, -2, -4, ... are encoded in two's complement: 1111|1111, 1111|1110, 1111|1100. So the negation of the nth power of two has n trailing zeros.
When the dividend has n trailing zeros and we divide by 2^n, then no need to add 1 to final quotient. In any other case, we need to add 1.
What ((1 << n) + ~0) is doing is creating a mask with n trailing ones.
The n last bits don't really matter, because we are going to shift to the right and just throw them away. So, if the division is exact, the n trailing bits of dividend are zero, and we just add n 1s that will be skipped. On the contrary, if the division is inexact, then one or more of the n trailing bits of the dividend is 1, and we are sure to cause a carry to the n+1 bit position: that's how we add 1 to the quotient (we add 2^n to the dividend). Does that explain it a bit more?
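To make the carry argument concrete, here is the same expression split into named steps. This is a sketch under the usual assumptions (32-bit two's complement int, arithmetic right shift), and it additionally assumes 0 <= n <= 30 so that 1 << n itself does not overflow; the intermediate names low_ones and bias are made up for illustration.

```c
/* Same computation as the one-liner from the question, step by step. */
int divideByPowerOf2(int x, int n) {
    int low_ones = (1 << n) + ~0;       /* 2^n - 1: a mask of n trailing ones */
    int bias = (x >> 31) & low_ones;    /* 2^n - 1 if x is negative, else 0 */
    /* If any of the low n bits of a negative x is set, adding the bias
       carries into bit n, which adds 1 to the shifted quotient. */
    return (x + bias) >> n;
}
```

With the example from above, divideByPowerOf2(-126, 3) adds 7 to get -119, and -119 >> 3 is -15, matching -126 / 8 in C. For the exact case, -128 + 7 = -121 and -121 >> 3 is still -16.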
This is "write-only code": instead of trying to understand the code, try to create it by yourself.
For example, let's divide a number by 8 (shift right by 3).
If the number is negative, the normal right-shift rounds in the wrong direction. Let's "fix" it by adding a number:
int divideBy8(int x)
{
if (x >= 0)
return x >> 3;
else
return (x + whatever) >> 3;
}
Here you can come up with a mathematical formula for whatever, or do some trial and error. Anyway, here whatever = 7:
int divideBy8(int x)
{
if (x >= 0)
return x >> 3;
else
return (x + 7) >> 3;
}
How to unify the two cases? You need to make an expression that looks like this:
(x + stuff) >> 3
where stuff is 7 for negative x, and 0 for positive x. The trick here is using x >> 31, which is a 32-bit number whose bits are equal to the sign-bit of x: all 0 or all 1. So stuff is
(x >> 31) & 7
Combining all these, and replacing 8 and 7 by the more general power of 2, you get the code you asked about.
Note: in the description above, I assume that int represents a 32-bit hardware register, and hardware uses two's complement representation to do right shift.
OP's reference is C# code, and there are many subtle differences that make it bad code in C, as this post is tagged.
int is not necessarily 32 bits, so using a magic number like 31 does not make for a robust solution.
In particular, (1 << n) + ~0 results in undefined behavior when n causes a bit to be shifted into the sign position. Not good coding.
Restricting code to only using the "binary" operators << >> + ^ ~ & | ! encourages a coder to assume things about int that are neither portable nor guaranteed by the C spec. So OP's posted code does not "work" in general, although it may work in many common implementations.
OP's code fails when int is not 2's complement, does not use the range [-2147483648 .. 2147483647], or when 1 << n behaves in a way that is not as expected.
// weak code
int divideByPowerOf2(int x, int n) {
return (x + ((x >> 31) & ((1 << n) + ~0))) >> n;
}
A simple alternative, assuming long long exceeds the range of int follows. I doubt this meets some corner of OP's goals, but OP's given goals encourages non-robust coding.
int divideByPowerOf2(int x, int n) {
long long ill = x;
if (x < 0) ill = -ill;
while (n--) ill >>= 1;
if (x < 0) ill = -ill;
return (int) ill;
}

how do I set up an if statement

How would I go about setting up an if statement using only the following bitwise operations:
!
~
&
^
|
+
<<
>>
As an example: "if x is negative then add 15"
This would mean that if the input was x = 0xF0000000, then we would produce 0xF000000F. If the input was x = 0x00000004, we would produce 0x00000004.
You can add 15 to a number if it is negative, and 0 otherwise as follows. Shifting a negative number right will shift in 1's if the value is negative, and 0's otherwise. Shifting by 31 will fill the int with either 1's or 0's. ANDing by 0xF will set the summand to 15 if it is negative, and 0 otherwise, resulting in no change to x.
x += (x >> 31) & 0xF;
If you're worried about the implementation-dependent behavior of shifting a signed number to the right, you can do the same thing with the following code; however, you are still depending on a two's complement representation of the number. The shift results in 0 or 1, and the multiplication scales the number to the appropriate value.
x += (((unsigned)x >> 31) * 0xF);
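The same mask idea generalizes from "add 15 if negative" to a general two-way choice. The sketch below is an illustration with made-up function names, and it assumes a 32-bit two's complement int with arithmetic right shift, like the first snippet above.

```c
/* "if x is negative then add 15", as in the question. */
int add15_if_negative(int x) {
    return x + ((x >> 31) & 0xF);
}

/* General form of a branch-free if/else on the sign of x:
   returns a when x is negative, b otherwise. The mask is either all
   ones (keeps a, clears b) or all zeros (clears a, keeps b). */
int select_if_negative(int x, int a, int b) {
    int mask = x >> 31;
    return (mask & a) | (~mask & b);
}
```

For the inputs from the question, the bit pattern 0xF0000000 becomes 0xF000000F and 0x00000004 stays 0x00000004.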

How to represent negation using bitwise operators, C

Suppose you have 2 numbers:
int x = 1;
int y = 2;
Using bitwise operators, how can i represent x-y?
When comparing the bits of two numbers A and B there are three possibilities. The following assumes unsigned numbers.
A == B : All of the bits are the same
A > B: The most significant bit that differs between the two numbers is set in A and not in B
A < B: The most significant bit that differs between the two numbers is set in B and not in A
Code might look like the following
#include <stdint.h>

int getDifType(uint32_t A, uint32_t B)
{
    uint32_t bitMask;
    // From MSB to LSB
    for (bitMask = 0x80000000; 0 != bitMask; bitMask >>= 1)
    {
        // Parentheses are required: != binds tighter than &
        if ((A & bitMask) != (B & bitMask))
            return (A & bitMask) ? 1 : -1; // positive if A > B, negative if A < B
    }
    // No difference found
    return 0;
}
You need to read about two's complement arithmetic. Addition, subtraction, negation, sign testing, and everything else are all done by the hardware using bitwise operations, so you can definitely do it in your C program. The wikipedia link above should teach you everything you need to know to solve your problem.
Your first step will be to implement addition using only bitwise operators. After that, everything should be easy. Start small- what do you have to do to implement 00 + 00, 01 + 01, etc? Go from there.
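A minimal sketch of that first step, and of x - y built on top of it, might look like this. The function names are made up, the loop is one common way to do it rather than the only one, and the code works on uint32_t so that wrap-around is well defined.

```c
#include <stdint.h>

/* Addition from bitwise operations only: XOR adds each bit position
   without carries, AND followed by a left shift produces the carries,
   and the loop repeats until no carry is left (at most 32 rounds). */
uint32_t add_bitwise(uint32_t a, uint32_t b) {
    while (b != 0) {
        uint32_t carry = (a & b) << 1;
        a ^= b;
        b = carry;
    }
    return a;
}

/* Subtraction via two's complement: x - y == x + (~y + 1). */
uint32_t sub_bitwise(uint32_t x, uint32_t y) {
    return add_bitwise(x, add_bitwise(~y, 1u));
}
```

With x = 1 and y = 2, sub_bitwise returns 0xFFFFFFFF, which is the 32-bit two's complement pattern for -1.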
You need to start checking from the most significant end to find if a number is greater or not. This logic will work only for non-negative integers.
int x,y;
//get x & y
unsigned int mask=1; // make the mask 000..0001
mask=mask<<(8*sizeof(int)-1); // make the mask 1000..000
while(mask!=0)
{
    // parentheses are needed because > binds tighter than &
    if((x & mask) > (y & mask))
    {printf("x greater");break;}
    else if((y & mask) > (x & mask))
    {printf("y greater");break;}
    mask=mask>>1; // shift 1 in mask to the right
}
Compare the bits from left to right, looking for the leftmost bits that differ. Assuming a machine that is two's complement, the topmost bit determines the sign and will have a flipped comparison sense versus the other bits. This should work on any two's complement machine:
int compare(int x, int y) {
unsigned int mask = ~0U - (~0U >> 1); // select left-most bit
if (x & mask && ~y & mask)
return -1; // x < 0 and y >= 0, therefore y > x
else if (~x & mask && y & mask)
return 1; // x >= 0 and y < 0, therefore x > y
for (; mask; mask >>= 1) {
if (x & mask && ~y & mask)
return 1;
else if (~x & mask && y & mask)
return -1;
}
return 0;
}
[Note that this technically isn't portable. C makes no guarantees that signed arithmetic will be two's complement. But you'll be hard pressed to find a C implementation on a modern machine that behaves differently.]
To see why this works, consider first comparing two unsigned numbers, 13d = 1101b and 11d = 1011b. (I'm assuming a 4-bit wordsize for brevity.) The leftmost differing bit is the second from the left, which the former has set, while the other does not. The former number is therefore the larger. It should be fairly clear that this principle holds for all unsigned numbers.
Now, consider two's complement numbers. You negate a number by complementing the bits and adding one. Thus, -1d = 1111b, -2d = 1110b, -3d = 1101b, -4d = 1100b, etc. You can see that two negative numbers can be compared as though they were unsigned. Likewise, two non-negative numbers can also be compared as though unsigned. Only when the signs differ do we have to consider them -- but if they differ, the comparison is trivial!
