How can I divide a signed integer using only binary operators? - c

I can only use ! ~ & ^ | + << >>
I am writing this in C.
I am trying to divide a number x by 2^n.
So I thought that shifting x >> n would work, but it does not for odd negative integers. It originally looked like this:
int dl18(int x, int n) {
    return (x >> n);
}
But if x = -9 and n = 1, the output should be -4, yet it is -5.
If x = -9 and n = 0, the output is correct (-9).
Thanks in advance.
So I figured out that doing this makes it work for everything except when n = 0 and x is a negative number:
return (~(x >> 31) & (x >> n)) | ((x >> 31) & ((x >> n) + 1));

Assuming two's complement representation of signed integers and arithmetic shift behaviour of >> operator, the answer could be:
int dl18(int x, int n) {
    if (x < 0) {
        x += (1 << n) - 1;
    }
    return x >> n;
}
The addition is necessary because >> rounds negative numbers towards negative infinity. By adding 2^n - 1 first, the result is truncated towards zero instead, just as the / operator does. For example, with x = -9 and n = 1: -9 + 1 = -8, and -8 >> 1 = -4, which matches -9 / 2.
Due to your requirements, assuming that int has 4 bytes (and to be extra pedantic CHAR_BIT = 8), the expression may be rewritten (obfuscated) as:
(x + ((x >> 31) & ((1 << n) + ~0))) >> n
The idea of x >> 31 is to replicate the sign bit, so the mask becomes either all ones (i.e. 0xFFFFFFFF) or all zeros, which is then used to either preserve or eliminate the ((1 << n) - 1) term of the addition. The parentheses around the & expression are necessary, because addition has higher precedence than bitwise AND.
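Wrapped into a function, a minimal sketch (same assumptions as above: 32-bit int, two's complement, arithmetic right shift; only the allowed operators) might look like this:

/* Sketch only: branch-free division by 2^n using the mask idea above. */
int dl18(int x, int n) {
    int bias = (x >> 31) & ((1 << n) + ~0);  /* 2^n - 1 if x < 0, else 0 */
    return (x + bias) >> n;
}

With this, dl18(-9, 1) yields -4 and dl18(-9, 0) yields -9, matching the / operator.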
This algorithm is also used by the GCC compiler. For instance:
int dl18_4(int x) { return x / 4; }
translates with -O1 into:
dl18_4:
        lea     eax, [rdi+3]    ; eax = rdi + 3
        test    edi, edi        ; set sign flag if edi < 0
        cmovns  eax, edi        ; eax = edi if SF = 0
        sar     eax, 2          ; eax = eax >> 2
        ret
Note that shifting by a negative number invokes undefined behavior, so it may be safer to declare the second parameter as unsigned int.

Here is a solution that avoids bit-shifting negative values. It assumes two's complement representation, but it does not use the unary minus operator.
A bitmask is used to set neg to a non-zero value if x is negative, or to zero if x is non-negative. A trick suggested by @Grzegorz Szpetkowski is used to avoid subtracting 1: adding ~0 instead. If x is negative, x is replaced by its magnitude. To avoid unary minus here, a trick suggested by @chux is used: for a negative value in two's complement, the corresponding positive value equals the bitwise negation of the negative representation plus 1.
This magnitude of x can be bit-shifted without encountering implementation-defined behavior. After performing the division, the result is converted back to a negative value if the original value was negative, by performing the same transformation as before.
#include <stdio.h>
#include <limits.h>

int divide_2n(int x, unsigned n);

int main(void)
{
    printf("-7 / 4 = %d\n", divide_2n(-7, 2));
    printf("27 / 8 = %d\n", divide_2n(27, 3));
    printf("-27 / 8 = %d\n", divide_2n(-27, 3));
    printf("-9 / 2 = %d\n", divide_2n(-9, 1));
    printf("-9 / 1 = %d\n", divide_2n(-9, 0));
    return 0;
}

int divide_2n(int x, unsigned n)
{
    unsigned n_bits = CHAR_BIT * sizeof(int);
    unsigned neg = x & (1U << (n_bits + ~0));

    if (neg) {
        x = ~(unsigned)x + 1;
    }
    x = (unsigned)x >> n;
    if (neg) {
        x = ~x + 1;
    }
    return x;
}
-7 / 4 = -1
27 / 8 = 3
-27 / 8 = -3
-9 / 2 = -4
-9 / 1 = -9

Related

Efficient modulo-255 computation

I am trying to find the most efficient way to compute modulo 255 of an 32-bit unsigned integer. My primary focus is to find an algorithm that works well across x86 and ARM platforms with an eye towards applicability beyond that. To first order, I am trying to avoid memory operations (which could be expensive), so I am looking for bit-twiddly approaches while avoiding tables. I am also trying to avoid potentially expensive operations such as branches and multiplies, and minimize the number of operations and registers used.
The ISO-C99 code below captures the eight variants I tried so far. It includes a framework for exhaustive test. I bolted onto this some crude execution time measurement which seems to work well enough to get a first performance impression. On the few platforms I tried (all with fast integer multiplies) the variants WARREN_MUL_SHR_2, WARREN_MUL_SHR_1, and DIGIT_SUM_CARRY_OUT_1 seem to be the most performant. My experiments show that the x86, ARM, PowerPC and MIPS compilers I tried at Compiler Explorer all make very good use of platform-specific features such as three-input LEA, byte-expansion instructions, multiply-accumulate, and instruction predication.
The variant NAIVE_USING_DIV uses an integer division, a back-multiply with the divisor, followed by subtraction. This is the baseline case. Modern compilers know how to efficiently implement the unsigned integer division by 255 (via multiplication) and will use a discrete replacement for the back-multiply where appropriate. To compute modulo (base − 1), one can sum base digits, then fold the result. For example, 3334 mod 9: sum 3+3+3+4 = 13, fold 1+3 = 4. If the result after folding is base − 1, we need to generate 0 instead. DIGIT_SUM_THEN_FOLD uses this method.
A. Cockburn, "Efficient implementation of the OSI transport protocol checksum algorithm using 8/16-bit arithmetic", ACM SIGCOMM Computer Communication Review, Vol. 17, No. 3, July/Aug. 1987, pp. 13-20
showed a different way of adding digits modulo base − 1 efficiently in the context of a checksum computation modulo 255. Compute a byte-wise sum of the digits, and after each addition, add any carry-out from the addition as well. So this would be an ADD a, b, ADC a, 0 sequence. Writing out the addition chain for this using base 256 digits, it becomes clear that the computation is basically a multiply with 0x0101 ... 0101. The result will be in the most significant digit position, except that one needs to capture the carry-out from the addition in that position separately. This method only works when a base digit comprises 2^k bits. Here we have k = 3. I tried three different ways of remapping a result of base − 1 to 0, resulting in variants DIGIT_SUM_CARRY_OUT_1, DIGIT_SUM_CARRY_OUT_2, DIGIT_SUM_CARRY_OUT_3.
An intriguing approach to computing modulo-63 efficiently was demonstrated by Joe Keane in the newsgroup comp.lang.c on 1995/07/09. While thread participant Peter L. Montgomery proved the algorithm correct, unfortunately Mr. Keane did not respond to requests to explain its derivation. This algorithm is also reproduced in H. Warren's Hacker's Delight 2nd ed. I was able to extend it, in purely mechanical fashion, to modulo-127 and modulo-255. This is the (appropriately named) KEANE_MAGIC variant. Update: Since I originally posted this question, I have worked out that Keane's approach is basically a clever fixed-point implementation of the following: return (uint32_t)(fmod (x * 256.0 / 255.0 + 0.5, 256.0) * (255.0 / 256.0));. This makes it a close relative of the next variant.
Henry S. Warren, Hacker's Delight 2nd ed., p. 272 shows a "multiply-shift-right" algorithm, presumably devised by the author themself, that is based on the mathematical property that n mod (2^k − 1) = floor(2^k / (2^k − 1) * n) mod 2^k. Fixed-point computation is used to multiply with the factor 2^k / (2^k − 1). I constructed two variants of this that differ in how they handle the mapping of a preliminary result of base − 1 to 0. These are variants WARREN_MUL_SHR_1 and WARREN_MUL_SHR_2.
Are there algorithms for modulo-255 computation that are even more efficient than the three top contenders I have identified so far, in particular for platforms with slow integer multiplies? An efficient modification of Keane's multiplication-free algorithm for the summing of four base 256 digits would seem to be of particular interest in this context.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#define NAIVE_USING_DIV       (1)
#define DIGIT_SUM_THEN_FOLD   (2)
#define DIGIT_SUM_CARRY_OUT_1 (3)
#define DIGIT_SUM_CARRY_OUT_2 (4)
#define DIGIT_SUM_CARRY_OUT_3 (5)
#define KEANE_MAGIC           (6)  // Joe Keane, comp.lang.c, 1995/07/09
#define WARREN_MUL_SHR_1      (7)  // Hacker's Delight, 2nd ed., p. 272
#define WARREN_MUL_SHR_2      (8)  // Hacker's Delight, 2nd ed., p. 272

#define VARIANT (WARREN_MUL_SHR_2)

uint32_t mod255 (uint32_t x)
{
#if VARIANT == NAIVE_USING_DIV
    return x - 255 * (x / 255);
#elif VARIANT == DIGIT_SUM_THEN_FOLD
    x = (x & 0xffff) + (x >> 16);
    x = (x & 0xff) + (x >> 8);
    x = (x & 0xff) + (x >> 8) + 1;
    x = (x & 0xff) + (x >> 8) - 1;
    return x;
#elif VARIANT == DIGIT_SUM_CARRY_OUT_1
    uint32_t t;
    t = 0x01010101 * x;
    t = (t >> 24) + (t < x);
    if (t == 255) t = 0;
    return t;
#elif VARIANT == DIGIT_SUM_CARRY_OUT_2
    uint32_t t;
    t = 0x01010101 * x;
    t = (t >> 24) + (t < x) + 1;
    t = (t & 0xff) + (t >> 8) - 1;
    return t;
#elif VARIANT == DIGIT_SUM_CARRY_OUT_3
    uint32_t t;
    t = 0x01010101 * x;
    t = (t >> 24) + (t < x);
    t = t & ((t - 255) >> 8);
    return t;
#elif VARIANT == KEANE_MAGIC
    x = (((x >> 16) + x) >> 14) + (x << 2);
    x = ((x >> 8) + x + 2) & 0x3ff;
    x = (x - (x >> 8)) >> 2;
    return x;
#elif VARIANT == WARREN_MUL_SHR_1
    x = (0x01010101 * x + (x >> 8)) >> 24;
    x = x & ((x - 255) >> 8);
    return x;
#elif VARIANT == WARREN_MUL_SHR_2
    x = (0x01010101 * x + (x >> 8)) >> 24;
    if (x == 255) x = 0;
    return x;
#else
#error unknown VARIANT
#endif
}

uint32_t ref_mod255 (uint32_t x)
{
    volatile uint32_t t = x;
    t = t % 255;
    return t;
}

// timing with microsecond resolution
#if defined(_WIN32)
#if !defined(WIN32_LEAN_AND_MEAN)
#define WIN32_LEAN_AND_MEAN
#endif
#include <windows.h>
double second (void)
{
    LARGE_INTEGER t;
    static double oofreq;
    static int checkedForHighResTimer;
    static BOOL hasHighResTimer;
    if (!checkedForHighResTimer) {
        hasHighResTimer = QueryPerformanceFrequency (&t);
        oofreq = 1.0 / (double)t.QuadPart;
        checkedForHighResTimer = 1;
    }
    if (hasHighResTimer) {
        QueryPerformanceCounter (&t);
        return (double)t.QuadPart * oofreq;
    } else {
        return (double)GetTickCount() * 1.0e-3;
    }
}
#elif defined(__linux__) || defined(__APPLE__)
#include <stddef.h>
#include <sys/time.h>
double second (void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec * 1.0e-6;
}
#else
#error unsupported platform
#endif

int main (void)
{
    double start, stop;
    uint32_t res, ref, x = 0;
    printf ("Testing VARIANT = %d\n", VARIANT);
    start = second();
    do {
        res = mod255 (x);
        ref = ref_mod255 (x);
        if (res != ref) {
            printf ("error # %08x: res=%08x ref=%08x\n", x, res, ref);
            return EXIT_FAILURE;
        }
        x++;
    } while (x);
    stop = second();
    printf ("test passed\n");
    printf ("elapsed = %.6f seconds\n", stop - start);
    return EXIT_SUCCESS;
}
For arbitrary unsigned integers, x and n, evaluating the modulo expression x % n involves (conceptually, at least), three operations: division, multiplication and subtraction:
quotient = x / n;
product = quotient * n;
modulus = x - product;
However, when n is a power of 2 (n = 2^p), the modulo can be determined much more rapidly, simply by masking out all but the lower p bits.
On most CPUs, addition, subtraction and bit-masking are very 'cheap' (rapid) operations, multiplication is more 'expensive' and division is very expensive – but note that most optimizing compilers will convert division by a compile-time constant into a multiplication (by a different constant) and a bit-shift (vide infra).
Thus, if we can convert our modulo 255 into a modulo 256, without too much overhead, we can likely speed up the process. We can do just this by noting that x % n is equivalent to (x + x / n) % (n + 1)†. Thus, our conceptual operations are now: division, addition and masking.
In the specific case of masking the lower 8 bits, x86/x64-based CPUs (and others?) will likely be able to perform a further optimization, as they can access 8-bit versions of (most) registers.
Here's what the clang-cl compiler generates for a naïve modulo 255 function (argument passed in ecx and returned in eax):
unsigned Naive255(unsigned x)
{
    return x % 255;
}

        mov     edx, ecx
        mov     eax, 2155905153 ;
        imul    rax, rdx        ; Replacing the IDIV with IMUL and SHR
        shr     rax, 39         ;
        mov     edx, eax
        shl     edx, 8
        sub     eax, edx
        add     eax, ecx
And here's the (clearly faster) code generated using the 'trick' described above:
unsigned Trick255(unsigned x)
{
    return (x + x / 255) & 0xFF;
}

        mov     eax, ecx
        mov     edx, 2155905153
        imul    rdx, rax
        shr     rdx, 39
        add     edx, ecx
        movzx   eax, dl         ; Faster than an explicit AND mask?
Testing this code on a Windows-10 (64-bit) platform (Intel® Core™ i7-8550U CPU) shows that it significantly (but not hugely) out-performs the other algorithms presented in the question.
† The answer given by David Eisenstat explains how/why this equivalence is valid.
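If you want to convince yourself of that equivalence empirically before reading the explanation, a quick brute-force check (my own sketch, not part of the answer above) over all 32-bit values is enough:

#include <stdint.h>
#include <stdio.h>

/* Sketch: exhaustively verify x % 255 == (x + x / 255) & 0xFF for uint32_t.
   The wraparound of the addition does not matter because 2^32 is a multiple of 256. */
int main(void)
{
    uint32_t x = 0;
    do {
        if ((x % 255) != ((x + x / 255) & 0xFF)) {
            printf("mismatch at %08x\n", (unsigned)x);
            return 1;
        }
    } while (++x != 0);
    printf("identity holds for all 32-bit values\n");
    return 0;
}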
Here’s my sense of how the fastest answers work. I don’t know yet whether Keane can be improved or easily generalized.
Given an integer x ≥ 0, let q = ⌊x/255⌋ (in C, q = x / 255;) and r = x − 255 q (in C, r = x % 255;) so that q ≥ 0 and 0 ≤ r < 255 are integers and x = 255 q + r.
Adrian Mole’s method
This method evaluates (x + ⌊x/255⌋) mod 2^8 (in C, (x + x / 255) & 0xff), which equals (255q + r + q) mod 2^8 = (2^8 q + r) mod 2^8 = r.
Henry S. Warren’s method
Note that x + ⌊x/255⌋ = ⌊x + x/255⌋ = ⌊(2^8/255) x⌋, where the first step follows from x being an integer. This method uses the multiplier (2^0 + 2^−8 + 2^−16 + 2^−24 + 2^−32) instead of 2^8/255, which is the sum of the infinite series 2^0 + 2^−8 + 2^−16 + 2^−24 + 2^−32 + …. Since the approximation is slightly under, this method must detect the residue 2^8 − 1 = 255.
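To see that last claim concretely, here is a small brute-force sketch of mine (not part of the analysis) that runs Warren's multiply-shift-right without the final fix-up and checks that the only disagreements with x % 255 are raw results of 255 where the true residue is 0:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t x = 0;
    do {
        uint32_t raw = (0x01010101u * x + (x >> 8)) >> 24;  /* no residue-255 fix-up */
        uint32_t ref = x % 255;
        if (raw != ref && !(raw == 255 && ref == 0)) {
            printf("unexpected mismatch at %08x: raw=%u ref=%u\n",
                   (unsigned)x, (unsigned)raw, (unsigned)ref);
            return 1;
        }
    } while (++x != 0);
    printf("only the 255-for-0 case needs fixing up\n");
    return 0;
}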
Joe Keane’s method
The intuition for this method is to compute y = (2^8/255) x mod 2^8, which equals (2^8/255) (255q + r) mod 2^8 = (2^8 q + (2^8/255) r) mod 2^8 = (2^8/255) r, and return y − y/2^8, which equals r.
Since these formulas don't use the fact that ⌊(2^8/255) r⌋ = r, Keane can switch from 2^8 to 2^10 for two guard bits. Ideally, these would always be zero, but due to fixed-point truncation and an approximation for 2^10/255, they're not. Keane adds 2 to switch from truncation to rounding, which also avoids the special case in Warren.
This method sort of uses the multiplier 2^2 (2^0 + 2^−8 + 2^−16 + 2^−24 + 2^−32 + 2^−40) = 2^2 (2^0 + 2^−16 + 2^−32) (2^0 + 2^−8). The C statement x = (((x >> 16) + x) >> 14) + (x << 2); computes x′ = ⌊2^2 (2^0 + 2^−16 + 2^−32) x⌋ mod 2^32. Then ((x >> 8) + x) & 0x3ff is x′′ = ⌊(2^0 + 2^−8) x′⌋ mod 2^10.
I don't have time right now to do the error analysis formally. Informally, the error interval of the first computation has width < 1; the second, width < 2 + 2^−8; the third, width < ((2 − 2^−8) + 1)/2^2 < 1, which allows correct rounding.
Regarding improvements, the 2^−40 term of the approximation seems not necessary (?), but we might as well have it unless we can drop the 2^−32 term. Dropping 2^−32 pushes the approximation quality out of spec.
Guess you're probably not looking for solutions that require fast 64-bit multiplication, but for the record:
return (x * 0x101010101010102ULL) >> 56;
This method (improved slightly since the previous edit) mashes up Warren and Keane. On my laptop, it’s faster than Keane but not as fast as a 64-bit multiply and shift. It avoids multiplication but benefits from a single rotate instruction. Unlike the original version, it’s probably OK on RISC-V.
Like Warren, this method approximates ⌊(256/255) x mod 256⌋ in 8.24 fixed point. Mod 256, each byte b contributes a term (256/255) b, which is approximately b.bbb base 256. The original version of this method just sums all four byte rotations. (I'll get to the revised version in a moment.) This sum always underestimates the real value, but by less than 4 units in the last place. By adding 4/2^24 before truncating, we guarantee the right answer as in Keane.
The revised version saves work by relaxing the approximation quality. We write (256/255) x = (257/256) (65536/65535) x, evaluate (65536/65535) x in 16.16 fixed point (i.e., add x to its 16-bit rotation), and then multiply by 257/256 and mod by 256 into 8.24 fixed point. The first multiplication has error less than 2 units in the last place of 16.16, and the second is exact (!). The sum underestimates by less than (2/2^16) (257/256), so a constant term of 514/2^24 suffices to fix the truncation. It's also possible to use a greater value in case a different immediate operand is more efficient.
uint32_t mod255(uint32_t x) {
    x += (x << 16) | (x >> 16);
    return ((x << 8) + x + 514) >> 24;
}
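A quick way to gain confidence in this version is to drop it into the same kind of exhaustive test the question uses; a stripped-down driver (mine) could be:

#include <stdint.h>
#include <stdio.h>

uint32_t mod255_rot(uint32_t x) {   /* the function above, renamed for the test */
    x += (x << 16) | (x >> 16);
    return ((x << 8) + x + 514) >> 24;
}

int main(void)
{
    uint32_t x = 0;
    do {
        if (mod255_rot(x) != x % 255) {
            printf("error at %08x\n", (unsigned)x);
            return 1;
        }
    } while (++x != 0);
    printf("test passed\n");
    return 0;
}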
If we were to have a builtin, intrinsic, or method that is optimised to a single addc instruction, one could use 32-bit arithmetic in the following way:
uint32_t carry = 0;
// sum up top and bottom 16 bits while generating carry out
x = __builtin_addc(x, x<<16, carry, &carry);
x &= 0xffff0000;
// store the previous carry to bit 0 while adding
// bits 16:23 over bits 24:31, and producing one more carry
x = __builtin_addc(x, x << 8, carry, &carry);
x = __builtin_addc(x, x >> 24, carry, &carry);
x &= 0x0000ffff; // actually 0x1ff is enough
// final correction for 0<=x<=257, i.e. min(x,x-255)
x = x < x-255 ? x : x - 255;
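For reference, here is the snippet above wrapped into a compilable sketch (this assumes Clang, which provides __builtin_addc; the little spot check at the end is mine):

#include <stdint.h>
#include <stdio.h>

static uint32_t mod255_addc(uint32_t x)
{
    unsigned carry = 0;
    // sum up top and bottom 16 bits while generating carry out
    x = __builtin_addc(x, x << 16, carry, &carry);
    x &= 0xffff0000;
    // store the previous carry to bit 0 while adding
    // bits 16:23 over bits 24:31, and producing one more carry
    x = __builtin_addc(x, x << 8, carry, &carry);
    x = __builtin_addc(x, x >> 24, carry, &carry);
    x &= 0x0000ffff; // actually 0x1ff is enough
    // final correction for 0<=x<=257, i.e. min(x,x-255)
    return x < x - 255 ? x : x - 255;
}

int main(void)
{
    for (uint32_t x = 0; x < 1000000; ++x) {   // spot check against %
        if (mod255_addc(x) != x % 255) {
            printf("mismatch at %u\n", (unsigned)x);
            return 1;
        }
    }
    printf("spot check passed\n");
    return 0;
}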
In Arm64 at least, the regular add instruction can take the form add r0, r1, r2, LSL 16; masking with an immediate, or clearing consecutive bits, is a single instruction: bfi r0, wzr, #start_bit, #length.
For parallel calculation one can't use widening multiplication efficiently. Instead one can divide and conquer while calculating carries: starting with 16 uint32_t elements interpreted as 16+16 uint16_t elements, then moving to uint8_t arithmetic, one can calculate one result in slightly less than one instruction.
auto a0 = vld2q_u16(ptr);     // split input to top16+bot16 bits
auto a1 = vld2q_u16(ptr + 8); // load more inputs
auto b0 = vaddq_u16(a0.val[0], a0.val[1]);
auto b1 = vaddq_u16(a1.val[0], a1.val[1]);
auto c0 = vcltq_u16(b0, a0.val[1]); // 8 carries
auto c1 = vcltq_u16(b1, a1.val[1]); // 8 more carries
b0 = vsubq_u16(b0, c0);
b1 = vsubq_u16(b1, c1);
auto d = vuzpq_u8(b0, b1);
auto result = vaddq_u8(d.val[0], d.val[1]);
auto carry = vcltq_u8(result, d.val[1]);
result = vsubq_u8(result, carry);
auto is_255 = vceqq_u8(result, vdupq_n_u8(255));
result = vbicq_u8(result, is_255);

C checking for overflow during subtraction

I've been trying to determine whether there is overflow when subtracting two numbers of 32 bits. The rules I was given are:
Can only use: ! ~ & ^ | + << >>
Max uses: 20
Example: subCheck(0x80000000,0x80000000) = 1,
         subCheck(0x80000000,0x70000000) = 0
No conditionals, loops, additional functions, or casting
So far I have
int dif = x - y; // dif is x - y
int sX = x >> 31; // get the sign of x
int sY = y >> 31; // get the sign of y
int sDif = dif >> 31; // get the sign of the difference
return (((!!sX) & (!!sY)) | (!sY)); // if the sign of x and the sign of y
// are the same, no overflow. If y is
// 0, no overflow.
I realize now I cannot use subtraction in the actual function (-), so my entire function is useless anyways. How can I use a different method than subtraction and determine whether there is overflow using only bitwise operations?
Thank you all for your help! Here is what I came up with to solve my issue:
int ny = 1 + ~y; // -y
int dif = x + ny; // dif is x - y
int sX = x >> 31; // get the sign of x
int sY = y >> 31; // get the sign of -y
int sDif = dif >> 31; // get the sign of the difference
return (!(sX ^ sY) | !(sDif ^ sX));
Every case I tried it with worked. I changed around what @HackerBoss suggested by getting the sign of y rather than of ny and then reversing the two checks in the return statement. That way, if the signs are the same, or if the sign of the result and the sign of x are the same, it returns true.
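For reference, a small driver (my sketch) that checks the finished expression against the two examples from the assignment; note that passing 0x80000000 to an int parameter and the wrapping of x + ny rely on two's complement behaviour, as the lab assumes:

#include <stdio.h>

int subCheck(int x, int y)
{
    int ny = 1 + ~y;        // -y
    int dif = x + ny;       // dif is x - y (wraps on overflow, as the lab assumes)
    int sX = x >> 31;       // sign of x
    int sY = y >> 31;       // sign of y
    int sDif = dif >> 31;   // sign of the difference
    return (!(sX ^ sY) | !(sDif ^ sX));
}

int main(void)
{
    printf("%d\n", subCheck(0x80000000, 0x80000000)); // expected 1
    printf("%d\n", subCheck(0x80000000, 0x70000000)); // expected 0
    return 0;
}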
Please buy and read Hacker's Delight for this stuff. It's a very good book.
int overflow_subtraction(int a, int b, int overflow)
{
    unsigned int sum = (unsigned int)a - (unsigned int)b; // wraparound subtraction
    int ssum = (int)sum;
    // Hacker's Delight: section Overflow Detection, subsection Signed Add/Subtract
    // Let sum = a -% b == a - b - carry == wraparound subtraction.
    // Overflow in a-b-carry occurs, iff a and b have opposite signs
    // and the sign of a-b-carry is opposite of a (or equivalently same as b).
    // Faster routine: res = (a ^ b) & (sum ^ a)
    // Slower routine: res = (sum ^ a) & ~(sum ^ b)
    // Overflow occurred, iff (res < 0)
    if (((a ^ b) & (ssum ^ a)) < 0)
        panic();
    return ssum;
}
To avoid undefined behavior, I will assume that integers are represented in two's complement, as inferred from your calculation of sX, sY, and sDif. I will also assume that sizeof(int) is 4. It would probably be better to use int32_t if you are working only with 32-bit integers, since the size of int can vary by platform.
Since you are allowed to use addition, you can think of subtraction as addition of the negation of a number. A number stored in two's complement may be negated by flipping all of the bits and adding one. This gives the following modified code:
int ny = 1 + ~y; // -y
int dif = x + ny; // dif is x - y
int sX = x >> 31; // get the sign of x
int sNY = ny >> 31; // get the sign of -y
int sDif = dif >> 31; // get the sign of the difference
return ((sX ^ sNY) | (~sDif ^ sX)); // if the sign of x and the sign of y
// are the same, no overflow. If the
// sign of dif is the same as the signs
// of x and -y, no overflow.

simulate jg instruction(datalab's isGreater)

I am doing CSAPP's datalab, the isGreater function.
Here's the description
isGreater - if x > y then return 1, else return 0
Example: isGreater(4,5) = 0, isGreater(5,4) = 1
Legal ops: ! ~ & ^ | + << >>
Max ops: 24
Rating: 3
x and y are both int type.
So I considered simulating the jg instruction to implement it. Here's my code:
int isGreater(int x, int y)
{
    int yComplement = ~y + 1;
    int minusResult = x + yComplement;          // 0xffffffff
    int SF = (minusResult >> 31) & 0x1;         // 1
    int ZF = !minusResult;                      // 0
    int xSign = (x >> 31) & 0x1;                // 0
    int ySign = (yComplement >> 31) & 0x1;      // 1
    int OF = !(xSign ^ ySign) & (xSign ^ SF);   // 0
    return !(OF ^ SF) & !ZF;
}
The jg instruction need SF == OF and ZF == 0.
But it can't pass a special case, that is, x = 0x7fffffff(INT_MAX), y = 0x80000000(INT_MIN).
I deduce it like this:
x + yComplement = 0xffffffff, so SF = 1, ZF = 0, since xSign != ySign, the OF is set to 0.
So, what's wrong with my code, is my OF setting operation wrong?
You're detecting overflow in the addition x + yComplement, rather than in the overall subtraction.
-INT_MIN itself overflows in 2's complement; INT_MIN == -INT_MIN. This is the 2's complement anomaly¹.
You should be getting false-positive overflow detection for any negative number (other than INT_MIN) minus INT_MIN. The resulting addition will have signed overflow. e.g. -10 + INT_MIN overflows.
http://teaching.idallen.com/dat2343/10f/notes/040_overflow.txt has a table of input/output signs for addition and subtraction. The cases that overflow are where the input signs are opposite but the result sign matches y.
SUBTRACTION SIGN BITS (for num1 - num2 = sum)
         num1sign num2sign sumsign
        ---------------------------
            0        0       0
            0        0       1
            0        1       0
*OVER*      0        1       1     (subtracting a negative is the same as adding a positive)
*OVER*      1        0       0     (subtracting a positive is the same as adding a negative)
            1        0       1
            1        1       0
            1        1       1
You could use this directly with the original x and y, and only use yComplement as part of getting the minusResult. Adjust your logic to match this truth table.
Or you could use int ySign = (~y) >> 31; and leave the rest of your code unmodified. (Use a tmp to hold ~y so you only do the operation once, for this and yComplement). The one's complement inverse (~) does not suffer from the 2's complement anomaly.
Footnote 1: sign/magnitude and one's complement have two redundant ways to represent 0, instead of a value with no inverse.
Fun fact: if you make an integer absolute-value function, you should consider the result unsigned to avoid this problem. int can't represent the absolute value of INT_MIN.
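That idea looks something like this (a sketch outside the lab's operator restrictions, assuming two's complement):

/* Magnitude as unsigned: well defined even for INT_MIN,
   because 0u - (unsigned)x wraps instead of overflowing. */
unsigned int abs_u(int x)
{
    return (x < 0) ? 0u - (unsigned int)x : (unsigned int)x;
}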
Efficiency improvements:
If you use unsigned int, you don't need & 1 after a shift because logical shifts don't sign-extend. (And as a bonus, it would avoid C signed-overflow undefined behaviour in +: http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html).
Then (if you used uint32_t, or sizeof(unsigned) * CHAR_BIT instead of 31) you'd have a safe and portable implementation of 2's complement comparison. (signed shift semantics for negative numbers are implementation-defined in C.) I think you're using C as a sort of pseudo-code for bit operations, and aren't interested in actually writing a portable implementation, and that's fine. The way you're doing things will work on normal compilers on normal CPUs.
Or you can use & 0x80000000 to leave the high bits in place (but then you'd have to left shift your ! result).
It's just the lab's restriction, you can't use unsigned or any constant larger than 0xff(255)
Ok, so you don't have access to logical right shift. Still, you need at most one &1. It's ok to work with numbers where all you care about is the low bit, but where the rest hold garbage.
You eventually do & !ZF, which is either &0 or &1. Thus, any high garbage in OF is wiped away.
You can also delay the >> 31 until after XORing together two numbers.
This is a fun problem that I want to optimize myself:
// untested, 13 operations
int isGreater_optimized(int x, int y)
{
    int not_y = ~y;
    int minus_y = not_y + 1;
    int sum = x + minus_y;

    int x_vs_y = x ^ y;     // high bit = 1 if they were opposite signs: OF is possible
    int x_vs_sum = x ^ sum; // high bit = 1 if they were opposite signs: OF is possible

    int OF = (x_vs_y & x_vs_sum) >> 31;  // high bits hold garbage
    int SF = sum >> 31;
    int non_zero = !!sum;                // 0 or 1
    return (~(OF ^ SF)) & non_zero;      // high garbage is nuked by `& 1`
}
Note the use of ~ instead of ! to invert a value that has high garbage.
It looks like there's still some redundancy in calculating OF separately from SF, but actually the XORing of sum twice doesn't cancel out. x ^ sum is an input for &, and we XOR with sum after that.
We can delay the shifts even later, though, and I found some more optimizations by avoiding an extra inversion. This is 11 operations
// replace 31 with sizeof(int) * CHAR_BIT if you want.  #include <limits.h>
// or use int32_t
int isGreater_optimized2(int x, int y)
{
    int not_y = ~y;
    int minus_y = not_y + 1;
    int sum = x + minus_y;

    int SF = sum;           // value in the high bit, rest are garbage
    int x_vs_y = x ^ y;     // high bit = 1 if they were opposite signs: OF is possible
    int x_vs_sum = x ^ sum; // high bit = 1 if they were opposite signs: OF is possible
    int OF = x_vs_y & x_vs_sum;   // low bits hold garbage

    int less = (OF ^ SF);
    int ZF = !sum;                // 0 or 1
    int le = (less >> 31) | ZF;   // le: (SF != OF) or ZF
    return !le;                   // jg == jnle
}
I wondered if any compilers might see through this manual compare and optimize it into cmp edi, esi/ setg al, but no such luck :/ I guess that's not a pattern that they look for, because code that could have been written as x > y tends to be written that way :P
But anyway, here's the x86 asm output from gcc and clang on the Godbolt compiler explorer.
Assuming two's complement, INT_MIN's absolute value isn't representable as an int. So, yComplement == y (ie. still negative), and ySign is 1 instead of the desired 0.
You could instead calculate the sign of y like this (changing as little as possible in your code) :
int ySign = !((y >> 31) & 0x1);
For a more detailed analysis, and a more optimal alternative, check Peter Cordes' answer.
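Putting that one-line change into the question's function, a quick sketch (mine, assuming 32-bit int and arithmetic right shift) with the interesting cases:

#include <limits.h>
#include <stdio.h>

int isGreater_patched(int x, int y)
{
    int yComplement = ~y + 1;
    int minusResult = x + yComplement;
    int SF = (minusResult >> 31) & 0x1;
    int ZF = !minusResult;
    int xSign = (x >> 31) & 0x1;
    int ySign = !((y >> 31) & 0x1);   // sign of -y, derived from y to dodge the INT_MIN anomaly
    int OF = !(xSign ^ ySign) & (xSign ^ SF);
    return !(OF ^ SF) & !ZF;
}

int main(void)
{
    printf("%d\n", isGreater_patched(4, 5));             // expected 0
    printf("%d\n", isGreater_patched(5, 4));             // expected 1
    printf("%d\n", isGreater_patched(INT_MAX, INT_MIN)); // expected 1
    printf("%d\n", isGreater_patched(INT_MIN, INT_MAX)); // expected 0
    return 0;
}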

Bitwise operation and masks

I am having problems understanding how this piece of code works. I understand that when x is a positive number only (x & ~mask) has a value, but I cannot figure out what this piece of code is doing when x is a negative number.
For example, if x is 1100 (-4), then mask would be 0001, while ~mask is 1110.
The result of ((~x & mask) + (x & ~mask)) is 0001 + 1100 = 1101 (-3). I tried hard but cannot figure out what this piece of code is doing; any suggestion is helpful.
/*
 * fitsBits - return 1 if x can be represented as an
 *   n-bit, two's complement integer.
 *   1 <= n <= 32
 *   Examples: fitsBits(5,3) = 0, fitsBits(-4,3) = 1
 *   Legal ops: ! ~ & ^ | + << >>
 *   Max ops: 15
 *   Rating: 2
 */
int fitsBits(int x, int n) {
    /* mask the sign bit against ~x and vice versa to get highest bit in x. Shift by n-1, and not. */
    int mask = x >> 31;
    return !(((~x & mask) + (x & ~mask)) >> (n + ~0));
}
Note: this is pointless and only worth doing as an academic exercise.
The code makes the following assumptions (which are not guaranteed by the C standard):
int is 32-bit (1 sign bit followed by 31 value bits)
int is represented using 2's complement
Right-shifting a negative number does arithmetic shift, i.e. fill sign bit with 1
With these assumptions in place, x >> 31 will generate all-bits-0 for positive or zero numbers, and all-bits-1 for negative numbers.
So the effect of (~x & mask) + (x & ~mask) is the same as (x < 0) ? ~x : x .
Since we assumed 2's complement, ~x for negative numbers is -(x+1).
The effect of this is that if x is positive it remains unchanged, and if x is negative it's mapped onto the range [0, INT_MAX]. In 2's complement there are exactly as many negative numbers as non-negative numbers, so this works.
Finally, we right-shift by n + ~0. In 2's complement, ~0 is -1, so this is n - 1. If we shift right by 4 bits, for example, and all the bits are shifted off the end, it means that the number is representable with 1 sign bit and 4 value bits. So this shift tells us whether the number fits or not.
Putting all of that together, it is an arcane way of writing:
int x;

if ( x < 0 )
    x = -(x+1);

// now x is non-negative

x >>= n - 1;     // aka. x /= pow(2, n-1)

if ( x == 0 )
    return it_fits;
else
    return it_doesnt_fit;
Here is a stab at it; unfortunately, bitwise logic is hard to summarize concisely. The general idea is to right-shift x and see if it becomes 0, since !0 returns 1. If right-shifting a positive number n-1 times results in 0, then n bits are enough to represent it.
The reason for what I call a and b below is that negative numbers are allowed one extra representable value by convention. An integer type can represent an even number of values; one of them must be 0, so an odd number of values is left to distribute among negative and positive numbers. Negative numbers get that one extra value (by convention), which is where the abs(x)-1 comes into play.
Let me know if you have questions:
#include <stdio.h>

int fitsBits(int x, int n) {
    int mask = x >> 31;

    /* -------------------------------------------------
    // A: Bitwise operator logic to get 0 or abs(x)-1
    ------------------------------------------------- */
    // mask == 0x0 when x is positive, therefore a == 0
    // mask == 0xffffffff when x is negative, therefore a == ~x
    int a = (~x & mask);
    printf("a = 0x%x\n", a);

    /* -----------------------------------------------
    // B: Bitwise operator logic to get abs(x) or 0
    ----------------------------------------------- */
    // ~mask == 0xffffffff when x is positive, therefore b == x
    // ~mask == 0x0 when x is negative, therefore b == 0
    int b = (x & ~mask);
    printf("b = 0x%x\n", b);

    /* ----------------------------------------
    // C: A + B is either abs(x) or abs(x)-1
    ---------------------------------------- */
    // c is either:
    //   x if x is a positive number
    //   ~x if x is a negative number, which is the same as abs(x)-1
    int c = (a + b);
    printf("c = %d\n", c);

    /* -------------------------------------------
    // D: A ridiculous way to subtract 1 from n
    ------------------------------------------- */
    // ~0 == 0xffffffff == -1
    // n + (-1) == n-1
    int d = (n + ~0);
    printf("d = %d\n", d);

    /* ----------------------------------------------------
    // E: Either abs(x) or abs(x)-1 is shifted n-1 times
    ---------------------------------------------------- */
    int e = (c >> d);
    printf("e = %d\n", e);

    // If e was right shifted into 0 then you know the number would have fit within n bits
    return !e;
}
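A tiny driver (mine) for the examples given in the problem description; it will also print the intermediate a-e values from the function above:

int main(void)
{
    printf("fitsBits(5, 3)  = %d\n", fitsBits(5, 3));   // expected 0
    printf("fitsBits(-4, 3) = %d\n", fitsBits(-4, 3));  // expected 1
    return 0;
}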
You should be performing those operations with unsigned int instead of int.
Some operations like >> will perform an arithmetic shift instead of logical shift when dealing with signed numbers and you will have this sort of unexpected outcome.
A right arithmetic shift of a binary number by 1. The empty position in the most significant bit is filled with a copy of the original MSB instead of zero. -- from Wikipedia
With unsigned int though this is what happens:
In a logical shift, zeros are shifted in to replace the discarded bits. Therefore the logical and arithmetic left-shifts are exactly the same.
However, as the logical right-shift inserts value 0 bits into the most significant bit, instead of copying the sign bit, it is ideal for unsigned binary numbers, while the arithmetic right-shift is ideal for signed two's complement binary numbers. -- from Wikipedia
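A small demonstration of the difference (the arithmetic behaviour for signed int is what most compilers do, but it is implementation-defined in C):

#include <stdio.h>

int main(void)
{
    int s = -8;                    // bit pattern 0xFFFFFFF8 on 32-bit two's complement
    unsigned int u = 0xFFFFFFF8u;  // same bit pattern, unsigned
    printf("%d\n", s >> 1);        // typically -4: the sign bit is copied in (arithmetic shift)
    printf("%u\n", u >> 1);        // 2147483644: zero is shifted in (logical shift)
    return 0;
}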

Moving a "nibble" to the left using C

I've been working on this puzzle for a while. I'm trying to figure out how to rotate 4 bits in a number (x) around to the left (with wrapping) by n, where 0 <= n <= 31. The code will look like:
int moveNib(int x, int n) {
    // ... some code here
}
The trick is that I can only use these operators:
~ & ^ | + << >>
and of them only a combination of 25. I also can not use If statements, loops, function calls. And I may only use type int.
An example would be moveNib(0x87654321,1) = 0x76543218.
My attempt: I have figured out how to use a mask to store the the bits and all but I can't figure out how to move by an arbitrary number. Any help would be appreciated thank you!
How about:
uint32_t moveNib(uint32_t x, int n) { return x<<(n<<2) | x>>((8-n)<<2); }
It uses <<2 to convert from nibbles to bits, and then shifts the bits by that much. To handle wraparound, we OR with a copy of the number which has been shifted by the opposite amount in the opposite direction. For example, with x=0x87654321 and n=1, the left part is shifted 4 bits to the left and becomes 0x76543210, and the right part is shifted 28 bits to the right and becomes 0x00000008; ORed together, the result is 0x76543218, as requested.
Edit: If - really isn't allowed, then this will get the same result (assuming an architecture with two's complement integers) without using it:
uint32_t moveNib(uint32_t x, int n) { return x<<(n<<2) | x>>((9+~n)<<2); }
Edit2: OK. Since you aren't allowed to use anything but int, how about this, then?
int moveNib(int x, int n) { return (x&0xffffffff)<<(n<<2) | (x&0xffffffff)>>((9+~n)<<2); }
The logic is the same as before, but we force the calculation to use unsigned integers by ANDing with 0xffffffff. All this assumes 32 bit integers, though. Is there anything else I have missed now?
Edit3: Here's one more version, which should be a bit more portable:
int moveNib(int x, int n) { return ((x|0u)<<((n&7)<<2) | (x|0u)>>((9+~(n&7))<<2))&0xffffffff; }
It caps n as suggested by chux, and uses |0u to convert to unsigned in order to avoid the sign bit duplication you get with signed integers. This works because (from the standard):
Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type.
Since int and 0u have the same rank, but 0u is unsigned, then the result is unsigned, even though ORing with 0 otherwise would be a null operation.
It then truncates the result to the range of a 32-bit int, so that the function will still work if ints have more bits than this (though the rotation will still be performed on the lowest 32 bits in that case; a 64-bit version would replace 7 by 15, 9 by 17, and truncate using 0xffffffffffffffff).
This solution uses 12 operators (11 if you skip the truncation, 10 if you store n&7 in a variable).
To see what happens in detail here, let's go through it for the example you gave: x=0x87654321, n=1. x|0u results in a the unsigned number 0x87654321u. (n&7)<<2=4, so we will shift 4 bits to the left, while ((9+~(n&7))<<2=28, so we will shift 28 bits to the right. So putting this together, we will compute 0x87654321u<<4 | 0x87654321u >> 28. For 32-bit integers, this is 0x76543210|0x8=0x76543218. But for 64-bit integers it is 0x876543210|0x8=0x876543218, so in that case we need to truncate to 32 bits, which is what the final &0xffffffff does. If the integers are shorter than 32 bits, then this won't work, but your example in the question had 32 bits, so I assume the integer types are at least that long.
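To see it run, a short driver (mine, assuming 32-bit int and two's complement wraparound when the 0x87654321 constant is passed) for the Edit3 version above:

#include <stdio.h>

int main(void)
{
    printf("%#x\n", (unsigned)moveNib(0x87654321, 1)); // expected 0x76543218
    printf("%#x\n", (unsigned)moveNib(0x87654321, 2)); // expected 0x65432187
    return 0;
}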
As a small side-note: If you allow one operator which is not on the list, the sizeof operator, then we can make a version that works with all the bits of a longer int automatically. Inspired by Aki, we get (using 16 operators (remember, sizeof is an operator in C)):
int moveNib(int x, int n) {
    int nbit = (n&((sizeof(int)<<1)+~0u))<<2;
    return (x|0u)<<nbit | (x|0u)>>((sizeof(int)<<3)+1u+~nbit);
}
Without the additional restrictions, the typical rotate_left operation (by 0 < n < 32) is trivial.
uint32_t X = (x << 4*n) | (x >> 4*(8-n));
Since we are talking about rotations, n < 0 is not a problem. Rotating right by 1 is the same as rotating left by 7 units. I.e. nn = n & 7; and we are through.
int nn = (n & 7) << 2; // Remove the multiplication
uint32_t X = (x << nn) | (x >> (32-nn));
When nn == 0, x would be shifted by 32, which is undefined. This can be replaced simply with x >> 0, i.e. no rotation at all. (x << 0) | (x >> 0) == x.
Replacing the subtraction with addition: a - b = a + (~b+1) and simplifying:
int nn = (n & 7) << 2;
int mm = (33 + ~nn) & 31;
uint32_t X = (x << nn) | (x >> mm); // when nn=0, also mm=0
Now the only problem is in shifting a signed int x right, which would duplicate the sign bit. That should be cured by a mask: (1 << nn) - 1
int nn = (n & 7) << 2;
int mm = (33 + ~nn) & 31;
int result = (x << nn) | ((x >> mm) & ((1 << nn) + ~0));
At this point we have used just 12 of the allowed operations -- next we can start to dig into the problem of sizeof(int)...
int nn = (n & (sizeof(int)-1)) << 2; // etc.
