Explain this snippet which finds the maximum of two integers without using if-else or any other comparison operator? - c

Find the maximum of two numbers. You should not use if-else or any other comparison operator. I found this question on an online bulletin board, so I thought I should ask on StackOverflow.
EXAMPLE
Input: 5, 10
Output: 10
I found this solution; can someone help me understand these lines of code?
int getMax(int a, int b) {
int c = a - b;
int k = (c >> 31) & 0x1;
int max = a - k * c;
return max;
}

int getMax(int a, int b) {
int c = a - b;
int k = (c >> 31) & 0x1;
int max = a - k * c;
return max;
}
Let's dissect this. This first line appears to be straightforward - it stores the difference of a and b. This value is negative if a < b and is nonnegative otherwise. But there's actually a bug here - if the difference of the numbers a and b is so big that it can't fit into an integer, this will lead to undefined behavior - oops! So let's assume that doesn't happen here.
In the next line, which is
int k = (c >> 31) & 0x1;
the idea is to check if the value of c is negative. In virtually all modern computers, numbers are stored in a format called two's complement in which the highest bit of the number is 0 if the number is nonnegative and 1 if the number is negative. Moreover, most ints are 32 bits. (c >> 31) shifts the number down 31 bits, leaving the highest bit of the number in the spot for the lowest bit. The next step of taking this number and ANDing it with 1 (whose binary representation is 0 everywhere except the last bit) erases all the higher bits and just gives you the lowest bit. Since the lowest bit of c >> 31 is the highest bit of c, this reads the highest bit of c as either 0 or 1. Since the highest bit is 1 iff c is negative, this is a way of checking whether c is negative (1) or nonnegative (0). Combining this reasoning with the above, k is 1 if a < b and is 0 otherwise.
The final step is to do this:
int max = a - k * c;
If a < b, then k == 1 and k * c = c = a - b, and so
a - k * c = a - (a - b) = a - a + b = b
Which is the correct max, since a < b. Otherwise, if a >= b, then k == 0 and
a - k * c = a - 0 = a
Which is also the correct max.
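The overflow caveat above can be sidestepped by taking the difference in a wider type. The following is only a sketch under stated assumptions: the name get_max_wide is made up for illustration, the inputs are fixed at 32 bits, and the 64-bit difference is converted to unsigned before shifting so that the shift is well defined.

```c
#include <stdint.h>

/* Hypothetical variant of getMax: the difference is computed in 64 bits,
   so it cannot overflow for any pair of 32-bit inputs. */
int32_t get_max_wide(int32_t a, int32_t b) {
    int64_t c = (int64_t)a - (int64_t)b;        /* exact difference */
    int64_t k = (int64_t)((uint64_t)c >> 63);   /* 1 if c is negative, else 0 */
    return (int32_t)(a - k * c);                /* a when k == 0, b when k == 1 */
}
```

Shifting the unsigned copy extracts the sign bit directly, so the extra "& 0x1" from the original is not needed here.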

Here we go: (a + b) / 2 + |a - b| / 2

Use bitwise hacks
r = x ^ ((x ^ y) & -(x < y)); // max(x, y)
If you know that INT_MIN <= x - y <= INT_MAX, then you can use the following, which is faster because (x - y) only needs to be evaluated once.
r = x - ((x - y) & ((x - y) >> (sizeof(int) * CHAR_BIT - 1))); // max(x, y)
Source : Bit Twiddling Hacks by Sean Eron Anderson

(sqrt( a*a + b*b - 2*a*b ) + a + b) / 2
This is based on the same technique as mike.dld's solution, but it is less "obvious" here what I am doing. An "abs" operation looks like you are comparing the sign of something, but here I am taking advantage of the fact that sqrt() always returns the nonnegative square root: I square (a-b), written out in full, take the square root again, add a+b and divide by 2.
You will see it always works: eg the user's example of 10 and 5 you get sqrt(100 + 25 - 100) = 5 then add 10 and 5 gives you 20 and divide by 2 gives you 10.
If we use 9 and 11 as our numbers we would get (sqrt(121 + 81 - 198) + 11 + 9)/2 = (sqrt(4) + 20) / 2 = 22/2 = 11

The simplest answer is below.
#include <math.h>
int Max(int x, int y)
{
/* fabs, not abs: abs takes an int and would truncate the .5 away */
return (int)(((double)x + y) / 2.0 + fabs(((double)x - y) / 2.0));
}
int Min(int x, int y)
{
return (int)(((double)x + y) / 2.0 - fabs(((double)x - y) / 2.0));
}

int max(int i, int j) {
int m = ((i-j) >> 31);
return (m & j) + ((~m) & i);
}
This solution avoids multiplication.
m will either be 0x00000000 or 0xffffffff

Using the shifting idea to extract the sign as posted by others, here's another way:
max (a, b) = new[] { a, b } [((a - b) >> 31) & 1]
This pushes the two numbers into an array with the maximum number given by the array-element whose index is sign bit of the difference between the two numbers.
Do note that:
The difference (a - b) may overflow.
If the numbers are unsigned and the >> operator refers to a logical right-shift, the & 1 is unnecessary.

Here's how I think I'd do the job. It's not as readable as you might like, but when you start with "how do I do X without using the obvious way of doing X", you have to kind of expect that.
In theory, this gives up some portability too, but you'd have to find a pretty unusual system to see a problem.
#include <limits.h> /* CHAR_BIT */
#define BITS (CHAR_BIT * sizeof(int) - 1)
int findmax(int a, int b) {
int rets[] = {a, b};
return rets[(unsigned)(a-b)>>BITS]; /* C-style cast; unsigned(a-b) is C++ only */
}
This does have some advantages over the one shown in the question. First of all, it calculates the correct size of shift, instead of being hard-coded for 32-bit ints. Second, with most compilers we can expect all the multiplication to happen at compile time, so all that's left at run time is trivial bit manipulation (subtract and shift) followed by a load and return. In short, this is almost certain to be pretty fast, even on the smallest microcontroller, where the original used multiplication that had to happen at run-time, so while it's probably pretty fast on a desktop machine, it'll often be quite a bit slower on a small microcontroller.

Here's what those lines are doing:
c is a-b. if c is negative, a<b.
k is the 32nd bit of c, which is the sign bit of c (assuming 32-bit integers; on a platform with 64-bit integers this code will not work). It's shifted 31 bits to the right to remove the rightmost 31 bits, leaving the sign bit in the rightmost place, and then ANDed with 1 to remove all the bits to the left (which will be filled with 1s if c is negative). So k will be 1 if c is negative and 0 otherwise.
Then max = a - k * c. If k is 0, this means a>=b, so max is a - 0 * c = a. If k is 1, this means that a<b and then a - 1 * c = a - (a - b) = a - a + b = b.
Overall, it's just using the sign bit of the difference to avoid using greater than or less than operations. It's honestly a little silly to say that this code doesn't use a comparison. c is the result of comparing a and b. The code just doesn't use a comparison operator. You could do a similar thing in many assembly codes by just subtracting the numbers and then jumping based on the values set in the status register.
I should also add that all of these solutions are assuming that the two numbers are integers. If they are floats, doubles, or something more complicated (BigInts, Rational numbers, etc.) then you really have to use a comparison operator. Bit-tricks will not generally do for those.

getMax() Function Without Any Logical Operation-
int getMax(int a, int b){
return (a+b+((a-b)>>sizeof(int)*8-1|1)*(a-b))/2;
}
Explanation:
Lets smash the 'max' into pieces,
max
= ( max + max ) / 2
= ( max + (min+differenceOfMaxMin) ) / 2
= ( max + min + differenceOfMaxMin ) / 2
= ( max + min + | max - min | ) / 2
So the function should look like this-
getMax(a, b)
= ( a + b + absolute(a - b) ) / 2
Now,
absolute(x)
= x [if 'x' is positive] or -x [if 'x' is negative]
= x * ( 1 [if 'x' is positive] or -1 [if 'x' is negative] )
In a positive integer the first bit (sign bit) is 0; in a negative one it is 1. By shifting bits to the right (>>) the first bit can be captured.
During right shift the empty space is filled by the sign bit (in C this is implementation-defined for negative values, though it is what most compilers do). So 01110001 >> 2 = 00011100, while 10110001 >> 2 = 11101100.
As a result, shifting an 8-bit number by 7 bits produces 11111111 for a negative number and 00000000 for a nonnegative one.
Now, if an OR operation is performed with 00000001 (= 1), the negative case yields 11111111 (= -1) and the nonnegative case yields 00000001 (= 1).
So,
absolute(x)
= x * ( 1 [if 'x' is positive] or -1 [if 'x' is negative] )
= x * ( ( x >> (numberOfBitsInInteger-1) ) | 1 )
= x * ( ( x >> ((numberOfBytesInInteger*bitsInOneByte) - 1) ) | 1 )
= x * ( ( x >> ((sizeOf(int)*8) - 1) ) | 1 )
Finally,
getMax(a, b)
= ( a + b + absolute(a - b) ) / 2
= ( a + b + ((a-b) * ( ( (a-b) >> ((sizeOf(int)*8) - 1) ) | 1 )) ) / 2
Another way-
int getMax(int a, int b){
int i[] = {a, b};
return i[( (i[0]-i[1]) >> (sizeof(int)*8 - 1) ) & 1 ];
}

static int mymax(int a, int b)
{
int[] arr;
arr = new int[3];
arr[0] = b;
arr[1] = a;
arr[2] = a;
return arr[Math.Sign(a - b) + 1];
}
If b > a then (a-b) will be negative, sign will return -1, by adding 1 we get index 0 which is b, if b=a then a-b will be 0, +1 will give 1 index so it does not matter if we are returning a or b, when a > b then a-b will be positive and sign will return 1, adding 1 we get index 2 where a is stored.

#include<stdio.h>
#include<stdlib.h> /* abs */
int main(void)
{
int num1,num2,diff;
printf("Enter number 1 : ");
scanf("%d",&num1);
printf("Enter number 2 : ");
scanf("%d",&num2);
diff=num1-num2;
num1=abs(diff);
num2=num1+diff; /* 0 when diff is negative, 2*diff otherwise */
if(num1==num2)
printf("Both number are equal\n");
else if(num2==0)
printf("Num2 > Num1\n");
else
printf("Num1 > Num2\n");
return 0;
}

The code which I am providing is for finding maximum between two numbers, the numbers can be of any data type(integer, floating). If the input numbers are equal then the function returns the number.
double findmax(double a, double b)
{
//find the difference of the two numbers
double diff=a-b;
double temp_diff=diff;
int int_diff=temp_diff;
/*
For the floating point numbers the difference contains decimal
values (for example 0.0009, 2.63 etc.) if the left side of '.' contains 0 then we need
to get a non-zero number on the left side of '.'
*/
while ( (!(int_diff|0)) && ((temp_diff-int_diff)||(0.0)) )
{
temp_diff = temp_diff * 10;
int_diff = temp_diff;
}
/*
shift the sign bit of variable 'int_diff' to the LSB position and find if it is
1(difference is -ve) or 0(difference is +ve) , then multiply it with the difference of
the two numbers (variable 'diff') then subtract it with the variable a.
*/
return a- (diff * ( int_diff >> (sizeof(int) * 8 - 1 ) & 1 ));
}
Description
The first thing the function takes the arguments as double and has return type as double. The reason for this is that to create a single function which can find maximum for all types. When integer type numbers are provided or one is an integer and other is the floating point then also due to implicit conversion the function can be used to find the max for integers also.
The basic logic is simple, let's say we have two numbers a & b if a-b>0(i.e. the difference is positive) then a is maximum else if a-b==0 then both are equal and if a-b<0(i.e. diff is -ve) b is maximum.
The sign bit is saved as the Most Significant Bit (MSB) in memory. If the MSB is 1 the number is negative, and vice versa. To check if the MSB is 1 or 0 we shift the MSB to the LSB position and bitwise & with 1; if the result is 1 then the number is -ve, else the number is +ve. This result is obtained by the statement:
int_diff >> (sizeof(int) * 8 - 1 ) & 1
Here to get the sign bit from the MSB to LSB we right shift it to k-1 bits(where k is the number of bits needed to save an integer number in the memory which depends on the type of system). Here k= sizeof(int) * 8 as sizeof() gives the number of bytes needed to save an integer to get no. of bits, we multiply it with 8. After the right shift, we apply the bitwise & with 1 to get the result.
Now after obtaining the result(let us assume it as r) as 1(for -ve diff) and 0(for +ve diff) we multiply the result with the difference of the two numbers, the logic is given as follows:
if a>b then a-b>0 i.e., is +ve so the result is 0(i.e., r=0). So a-(a-b)*r => a-(a-b)*0, which gives 'a' as the maximum.
if a < b then a-b<0 i.e., is -ve so the result is 1(i.e., r=1). So a-(a-b)*r => a-(a-b)*1 => a-a+b =>b , which gives 'b' as the maximum.
Now there are two remaining points 1. the use of while loop and 2. why I have used the variable 'int_diff' as an integer. To answer these properly we have to understand some points:
Floating type values cannot be used as an operand for the bitwise operators.
Due to above reason, we need to get the value in an integer value to get the sign of difference by using bitwise operators. These two points describe the need of variable 'int_diff' as integer type.
Now let's say we find the difference in variable 'diff' now there are 3 possibilities for the values of 'diff' irrespective of the sign of these values. (a). |diff|>=1 , (b). 0<|diff|<1 , (c). |diff|==0.
When we assign a double value to integer variable the decimal part is lost.
For case(a) the value of 'int_diff' >0 (i.e.,1,2,...). For other two cases int_diff=0.
The condition (temp_diff-int_diff)||0.0 checks if diff==0 so both numbers are equal.
If diff!=0 then we check if int_diff|0 is true i.e., case(b) is true
In the while loop, we try to get the value of int_diff as non-zero so that the value of int_diff also gets the sign of diff.

Here are a couple of bit-twiddling methods to get the max of two integral values:
Method 1
int max1(int a, int b) {
static const size_t SIGN_BIT_SHIFT = sizeof(a) * 8 - 1;
int mask = (a - b) >> SIGN_BIT_SHIFT;
return (a & ~mask) | (b & mask);
}
Explanation:
(a - b) >> SIGN_BIT_SHIFT - If a >= b then a - b is nonnegative, thus the sign bit is 0, and the mask is 0x00..00. Otherwise, a < b so a - b is negative, the sign bit is 1 and after shifting, we get a mask of 0xFF..FF
(a & ~mask) - If the mask is 0xFF..FF, then ~mask is 0x00..00 and then this value is 0. Otherwise, ~mask is 0xFF..FF and the value is a
(b & mask) - If the mask is 0xFF..FF, then this value is b. Otherwise, mask is 0x00..00 and the value is 0.
Finally:
If a >= b then a - b is nonnegative, we get max = a | 0 = a
If a < b then a - b is negative, we get max = 0 | b = b
Method 2
int max2(int a, int b) {
static const size_t SIGN_BIT_SHIFT = sizeof(a) * 8 - 1;
int mask = (a - b) >> SIGN_BIT_SHIFT;
return a ^ ((a ^ b) & mask);
}
Explanation:
Mask explanation is the same as for Method 1. If a >= b the mask is 0x00..00, otherwise the mask is 0xFF..FF
If the mask is 0x00..00, then (a ^ b) & mask is 0x00..00
If the mask is 0xFF..FF, then (a ^ b) & mask is a ^ b
Finally:
If a >= b, we get a ^ 0x00..00 = a
If a < b, we get a ^ a ^ b = b

//In C# you can use math library to perform min or max function
using System;
class NumberComparator
{
static void Main()
{
Console.Write(" write the first number to compare: ");
double first_Number = double.Parse(Console.ReadLine());
Console.Write(" write the second number to compare: ");
double second_Number = double.Parse(Console.ReadLine());
double compare_Numbers = Math.Max(first_Number, second_Number);
Console.Write("{0} is greater",compare_Numbers);
}
}

No logical operators, no libs (JS)
function (x, y) {
let z = (x - y) ** 2;
z = z ** .5;
return (x + y + z) / 2
}

The logic described in a problem can be explained as if 1st number is smaller then 0 will be subtracted else difference will be subtracted from 1st number to get 2nd number.
I found one more mathematical solution which I think is bit simpler to understand this concept.
Considering a and b as given numbers
c=|a/b|+1;
d=(c-1)/b;
smallest number= a - d*(a-b);
Again, the idea is to find k, which is either 0 or 1, and multiply it by the difference of the two numbers. Finally, this product is subtracted from the 1st number to yield the smaller of the two numbers.
P.S. this solution will fail in case 2nd number is zero

There is one way
public static int Min(int a, int b)
{
int dif = (int)(((uint)(a - b)) >> 31);
return a * dif + b * (1 - dif);
}
and one
return (a>=b)?b:a;

int a=151;
int b=121;
int k=Math.abs(a-b);
int j= a+b;
double k1=(double)(k);
double j1= (double) (j);
double c=Math.ceil(k1/2) + Math.floor(j1/2);
int c1= (int) (c);
System.out.println(" Max value = " + c1);

Guess we can just multiply the numbers with their bitwise comparisons eg:
int max=(a>b)*a+(a<=b)*b;

Related

how to find if M is actually an output of 2power(2n) + 1 in C program

I have a tricky requirement in a project asking me to write a function which returns 1 (0 otherwise) if the given integer is representable as 2^(2n)+1, where n is any non-negative integer.
int find_pow_2n_1(int M);
For example: return 1 when M=5, since 5 is the output when n=1 -> 2^(1*2)+1.
I am trying to evaluate the equation but it results in log function, not able to find any kind of hint while browsing in google as well .
Solution
int find_pow_2n_1(int M)
{
return 1 < M && !(M-1 & M-2) && M % 3;
}
Explanation
First, we discard values less than two, as we know the first matching number is two.
Then M-1 & M-2 tests whether there is more than one bit set in M-1:
M-1 cannot have zero bits set, since M is greater than one, so M-1 is not zero.
If M-1 has one bit set, then that bit is zero in M-2 and all lower bits are set, so M-1 and M-2 have no set bits in common, so M-1 & M-2 is zero.
If M-1 has more than one bit set, then M-2 has the lowest set bit cleared, but higher set bits remain set. So M-1 and M-2 have set bits in common, so M-1 & M-2 is non-zero.
So, if the test !(M-1 & M-2) passes, we know M-1 is a power of two. So M is one more than a power of two.
Our remaining concern is whether that is an even power of two. We can see that when M is an even power of two plus one, its remainder modulo three is two, whereas when M is an odd power of two plus one, its remainder modulo three is zero:
Remainder of 2^0+1 = 2 modulo 3 is 2.
Remainder of 2^1+1 = 3 modulo 3 is 0.
Remainder of 2^2+1 = 5 modulo 3 is 2.
Remainder of 2^3+1 = 9 modulo 3 is 0.
Remainder of 2^4+1 = 17 modulo 3 is 2.
Remainder of 2^5+1 = 33 modulo 3 is 0.
…
Therefore, M % 3, which tests whether the remainder of M modulo three is non-zero, tests whether M-1 is an even power of two.
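One way to gain confidence in the reasoning above is to compare the one-line test against a plain enumeration of the even powers of two. The sketch below does that; find_pow_2n_1_loop is a hypothetical helper name introduced only for this cross-check, and the one-liner is repeated so the block stands alone.

```c
/* Reference version (hypothetical helper): walk 2^0, 2^2, 2^4, ...
   until p + 1 reaches or passes M, then compare. */
int find_pow_2n_1_loop(int M) {
    long long p = 1;              /* p = 2^(2n), starting at n = 0 */
    while (p + 1 < M)
        p *= 4;                   /* next even power of two */
    return p + 1 == M;
}

/* The one-line test from the answer, repeated for comparison. */
int find_pow_2n_1(int M) {
    return 1 < M && !((M - 1) & (M - 2)) && M % 3;
}
```

Both functions should agree on every int; the loop version is slower but easier to check by eye.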
There are only a few numbers with that property: make a table lookup array :-)
$ bc
for(n=0;n<33;n++)2^(2*n)+1
2
5
17
65
257
1025
4097
16385
65537
262145
1048577
4194305
16777217
67108865
268435457
1073741825
4294967297
17179869185
68719476737
274877906945
1099511627777
4398046511105
17592186044417
70368744177665
281474976710657
1125899906842625
4503599627370497
18014398509481985
72057594037927937
288230376151711745
1152921504606846977
4611686018427387905
18446744073709551617
Last number above is 2^64 + 1, probably will not fit an int in your implementation.
All proposed solutions are way too complicated or bad in performance. Try the simpler one:
#include <stdio.h>
static int is_power_of_2(unsigned long n)
{
return (n != 0 && ((n & (n - 1)) == 0));
}
static int is_power_of_2n(unsigned long n)
{
return is_power_of_2(n) && (__builtin_ffsl(n) & 1);
}
int main(void)
{
int x;
for (x = -3; x < 20; x++)
printf("Is %d = 2^2n + 1? %s\n", x, is_power_of_2n(x - 1) ? "Yes" : "no");
return 0;
}
Implementing __builtin_ffsl(), if you are using ancient compiler, I leave it as a homework (it can be done without tables or divisions).
Example: https://wandbox.org/permlink/gMrzZqhuP4onF8ku
While commenting on @Lundin's comment I realized that you may read a very nice set of bit twiddling hacks from Stanford University.
UPDATE. As @grenix noticed, the initial question was about the direct check; it may be done with the above code by introducing an additional wrapper, so nothing basically changes:
...
static int is_power_of_2n_plus_1(unsigned long n)
{
return is_power_of_2n(n - 1);
}
int main(void)
{
int x;
for (x = -3; x < 20; x++)
printf("Is %d = 2^2n + 1? %s\n", x, is_power_of_2n_plus_1(x) ? "Yes" : "no");
return 0;
}
Here I am leaving you a pseudocode (or a code that I haven't tested) which I think could help you think of the way to handle your problem :)
#include <math.h>
int find_pow_2n_1(int M) {
if (M < 2) return 0; // 2^(2n) + 1 is never below 2
M--; // M should be 2^(2n) now
double val = log2(M); // gives us 2n
val /= 2; // now we have n
int n = (int)round(val); // nearest integer candidate for n
return (1LL << (2 * n)) == M; // exact check, so rounding error cannot cause a false match
}

How to efficiently verify whether pow(a, b) % b == a in C (without overflow)

I'd like to verify whether
pow(a, b) % b == a
is true in C, with 2 ≤ b ≤ 32768 (2^15) and 2 ≤ a ≤ b with a and b being integers.
However, directly computing pow(a, b) % b with b being a large number, this will quickly cause C to overflow. What would be a trick/efficient way of verifying whether this condition holds?
This question is based on finding a witness for Fermat's little theorem, which states that if this condition is false, b is not prime.
Also, I am limited in the time it may take; it can't be too slow (near or over 2 seconds). The biggest Carmichael number in this range, that is, a number b that's not prime but still satisfies pow(a, b) % b == a for every a (with b <= 32768), is 29341. Thus the method for checking pow(a, b) % b == a with 2 <= a <= 29341 shouldn't be too slow.
You can use the Exponentiation by squaring method.
The idea is the following:
Decompose b in binary form and decompose the product
Notice that we always use %b which is below 32768, so the result will always fit in a 32 bit number.
So the C code is:
/*
* this function computes (num ** pow) % mod
*/
int pow_mod(int num, int pow, int mod)
{
int res = 1;
while (pow>0)
{
if (pow & 1)
{
res = (res*num) % mod;
}
pow /= 2;
num = (num*num)%mod;
}
return res;
}
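As a usage sketch, the function can be wrapped in a small check for the condition from the question. The wrapper name fermat_holds is hypothetical, and the version of pow_mod repeated here reduces num before the loop and compares against a % b, which is an assumption made so that a == b does not need special handling. It relies on mod being at most 32768, so every product stays below 2^30.

```c
/* (num ** pow) % mod by repeated squaring; assumes 1 <= mod <= 32768
   so that every intermediate product fits in a 32-bit int. */
static int pow_mod(int num, int pow, int mod) {
    int res = 1 % mod;
    num %= mod;
    while (pow > 0) {
        if (pow & 1)
            res = (res * num) % mod;   /* this bit of the exponent is set */
        num = (num * num) % mod;       /* square for the next bit */
        pow /= 2;
    }
    return res;
}

/* Hypothetical wrapper: 1 if a^b is congruent to a modulo b, else 0. */
int fermat_holds(int a, int b) {
    return pow_mod(a, b, b) == a % b;
}
```

With this, a composite b shows up as soon as some a gives fermat_holds(a, b) == 0, while Carmichael numbers such as 561 pass for every a.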
You are doing modular arithmetic in Z/bZ.
Note that, in a quotient ring, the n-th power of the class of an element is the class of the n-th power of the element, so we have the following result:
(a^b) mod b = ((((a mod b) * a) mod b) * a) mod b [...] (b times)
So, you do not need a big integer library.
You can simply write a C program using the following algorithm (pseudo-code):
declare your variables a and b as integers.
use a temporary variable temp that is initialized with a.
do a loop with b - 1 steps (temp already holds the first power), and compute (temp * a) mod b at each step, to get the new temp value.
compare the result with a.
With this formula, temp always stays below b, so it never exceeds 32768 and the product temp * a stays below 2^30; a plain int is enough to store temp.
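The steps listed above could look roughly like this in C. It is a minimal sketch: the function name pow_mod_naive is made up for illustration, and it assumes 2 <= a <= b <= 32768 as in the question so that temp * a fits in an int.

```c
/* Computes (a ** b) % b with b - 1 modular multiplications.
   Assumes 2 <= a <= b <= 32768 (hypothetical helper name). */
int pow_mod_naive(int a, int b) {
    int temp = a % b;                 /* a^1 mod b */
    for (int i = 1; i < b; i++)
        temp = (temp * a) % b;        /* a^(i+1) mod b */
    return temp;
}
```

This takes time linear in b, which is fine for a single check at this size but much slower than repeated squaring when the check runs for many values of a.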

bitwise division by multiples of 2

I found many posts about bitwise division and I understand most bitwise usage, but I can't work out one specific division. I want to divide a given number (let's say 100) by all the multiples of 2 possible (ATTENTION: I don't want to divide by powers of 2 but by multiples of 2!)
For example: 100/2, 100/4, 100/6, 100/8, 100/10...100/100
Also I know that because of using unsigned int the answers will be rounded, for example 100/52=1, but it doesn't really matter, because I can either skip those answers or print them, no problem. My concern is mostly how I can divide by 6 or 10, etc. (multiples of 2). There is no need for it to be done in C, because I can manage to transform any code you give me from Java to C.
Following the math shown for the accepted solution to the division by 3 question, you can derive a recurrence for the division algorithm:
To compute (int)(X / Y)
Let k be such that 2^k ≥ Y and 2^(k-1) < Y
(note, 2^k = (1 << k))
Let d = 2^k - Y
Then, if A = (int)(X / 2^k) and B = X % 2^k,
X = (1 << k) * A + B
= (1 << k) * A - Y * A + Y * A + B
= d * A + Y * A + B
= Y * A + (d * A + B)
Thus,
X/Y = A + (d * A + B)/Y
In otherwords,
If S(X, Y) := X/Y, then S(X, Y) := A + S(d * A + B, Y).
This recurrence can be implemented with a simple loop. The stopping condition for the loop is when the numerator falls below 2^k. The function divu implements the recurrence, using only bitwise operators and using unsigned types. Helper functions for the math operations are left unimplemented, but shouldn't be too hard (the linked answer provides a full add implementation already). The rs() function is for "right-shift", which does sign extension on the unsigned input. The function div is the actual API for int, and checks for divide by zero and negative y before delegating to divu. negate does 2's complement negation.
static unsigned divu (unsigned x, unsigned y) {
unsigned k = 0;
unsigned pow2 = 0;
unsigned mask = 0;
unsigned diff = 0;
unsigned sum = 0;
while ((1 << k) < y) k = add(k, 1);
pow2 = (1 << k);
mask = sub(pow2, 1);
diff = sub(pow2, y);
while (x >= pow2) {
sum = add(sum, rs(x, k));
x = add(mul(diff, rs(x, k)), (x & mask));
}
if (x >= y) sum = add(sum, 1);
return sum;
}
int div (int x, int y) {
assert(y);
if (y > 0) return divu(x, y);
return negate(divu(x, negate(y)));
}
This implementation depends on signed int using 2's complement. For maximal portability, div should convert negative arguments to 2's complement before calling divu. Then, it should convert the result from divu back from 2's complement to the native signed representation.
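To see the recurrence run without first writing the bitwise helpers, here is a sketch in which ordinary +, - and * stand in for add, sub, mul and rs. The name divu_sketch is hypothetical, and the code assumes 0 < y <= 2^31 with a 32-bit unsigned so that the shift stays in range.

```c
/* Recurrence S(X, Y) = A + S(d*A + B, Y), with plain arithmetic standing
   in for the unimplemented bitwise helpers. Assumes 0 < y <= 2^31. */
unsigned divu_sketch(unsigned x, unsigned y) {
    unsigned k = 0, sum = 0;
    while ((1u << k) < y)
        k++;                          /* smallest k with 2^k >= y */
    unsigned pow2 = 1u << k;
    unsigned d = pow2 - y;            /* d = 2^k - Y */
    while (x >= pow2) {
        unsigned a = x >> k;          /* A = X / 2^k */
        unsigned b = x & (pow2 - 1);  /* B = X % 2^k */
        sum += a;
        x = d * a + b;                /* continue with the smaller numerator */
    }
    if (x >= y)
        sum++;                        /* at most one more Y fits, since x < 2^k < 2Y */
    return sum;
}
```

For example, 100 / 6 goes through the numerators 100, 28, 10 and 4 while the running sum reaches 16.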
The following code works for positive numbers. When the dividend or the divisor or both are negative, have flags to change the sign of the answer appropriately.
int divi(long long m, long long n)
{
if(m==0 || n==0 || m<n)
return 0;
long long a,b;
int f=0;
a=n;b=1;
while(a<=m)
{
b = b<<1;
a = a<<1;
f=1;
}
if(f)
{
b = b>>1;
a = a>>1;
}
b = b + divi(m-a,n);
return b;
}
Use the operator / for integer division as much as you can.
For instance, when you want to divide 100 by 6 or 10 you should write 100/6 or 100/10.
When you mention bitwise division, do you (1) mean an implementation of operator / or (2) are you referring to division by a power of two?
For (1) a processor should have an integer division unit. If not the compiler should provide a good implementation.
For (2) you can use 100>>2 instead of 100/4. If the numerator is known at compile time then a good compiler should automatically use the shift instruction.

Take the average of two signed numbers in C

Let us say we have x and y and both are signed integers in C, how do we find the most accurate mean value between the two?
I would prefer a solution that does not take advantage of any machine/compiler/toolchain specific workings.
The best I have come up with is:(a / 2) + (b / 2) + !!(a % 2) * !!(b %2) Is there a solution that is more accurate? Faster? Simpler?
What if we know if one is larger than the other a priori?
Thanks.
D
Editor's Note: Please note that the OP expects answers that are not subject to integer overflow when input values are close to the maximum absolute bounds of the C int type. This was not stated in the original question, but is important when giving an answer.
After accept answer (4 yr)
I would expect the function int average_int(int a, int b) to:
1. Work over the entire range of [INT_MIN..INT_MAX] for all combinations of a and b.
2. Have the same result as (a+b)/2, as if using wider math.
When int2x exists, @Santiago Alessandri's approach works well.
int avgSS(int a, int b) {
return (int) ( ((int2x) a + b) / 2);
}
Otherwise a variation on @AProgrammer:
Note: wider math is not needed.
int avgC(int a, int b) {
if ((a < 0) == (b < 0)) { // a,b same sign
return a/2 + b/2 + (a%2 + b%2)/2;
}
return (a+b)/2;
}
A solution with more tests, but without %
All below solutions "worked" to within 1 of (a+b)/2 when overflow did not occur, but I was hoping to find one that matched (a+b)/2 for all int.
@Santiago Alessandri's solution works as long as the range of int is narrower than the range of long long - which is usually the case.
((long long)a + (long long)b) / 2
@AProgrammer's solution, the accepted answer, fails about 1/4 of the time to match (a+b)/2. Example inputs like a == 1, b == -2
a/2 + b/2 + (a%2 + b%2)/2
@Guy Sirton's solution fails about 1/8 of the time to match (a+b)/2. Example inputs like a == 1, b == 0
int sgeq = ((a<0)==(b<0));
int avg = ((!sgeq)*(a+b)+sgeq*(b-a))/2 + sgeq*a;
@R..'s solution fails about 1/4 of the time to match (a+b)/2. Example inputs like a == 1, b == 1
return (a-(a|b)+b)/2+(a|b)/2;
@MatthewD's now deleted solution fails about 5/6 of the time to match (a+b)/2. Example inputs like a == 1, b == -2
unsigned diff;
signed mean;
if (a > b) {
diff = a - b;
mean = b + (diff >> 1);
} else {
diff = b - a;
mean = a + (diff >> 1);
}
If (a^b)<=0 you can just use (a+b)/2 without fear of overflow.
Otherwise, try (a-(a|b)+b)/2+(a|b)/2. -(a|b) is at least as large in magnitude as both a and b and has the opposite sign, so this avoids the overflow.
I did this quickly off the top of my head so there might be some stupid errors. Note that there are no machine-specific hacks here. All behavior is completely determined by the C standard and the fact that it requires twos-complement, ones-complement, or sign-magnitude representation of signed values and specifies that the bitwise operators work on the bit-by-bit representation. Nope, the relative magnitude of a|b depends on the representation...
Edit: You could also use a+(b-a)/2 when they have the same sign. Note that this will give a bias towards a. You can reverse it and get a bias towards b. My solution above, on the other hand, gives bias towards zero if I'm not mistaken.
Another try: One standard approach is (a&b)+(a^b)/2. In twos complement it works regardless of the signs, but I believe it also works in ones complement or sign-magnitude if a and b have the same sign. Care to check it?
Edit: version fixed by @chux - Reinstate Monica:
if ((a < 0) == (b < 0)) { // a,b same sign
return a/2 + b/2 + (a%2 + b%2)/2;
} else {
return (a+b)/2;
}
Original answer (I'd have deleted it if it hadn't been accepted).
a/2 + b/2 + (a%2 + b%2)/2
Seems the simplest one fitting the bill of no assumption on implementation characteristics (it has a dependency on C99 which specifying the result of / as "truncated toward 0" while it was implementation dependent for C90).
It has the advantage of having no test (and thus no costly jumps) and all divisions/remainder are by 2 so the use of bit twiddling techniques by the compiler is possible.
For unsigned integers the average is the floor of (x+y)/2. But the same fails for signed integers. This formula fails for integers whose sum is an odd -ve number as their floor is one less than their average.
You can read up more at Hacker's Delight in section 2.5
The code to calculate average of 2 signed integers without overflow is
int t = (a & b) + ((a ^ b) >> 1);
unsigned t_u = (unsigned)t;
int avg = t + ( (t_u >> 31 ) & (a ^ b) );
I have checked its correctness using the Z3 SMT solver.
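Wrapped as a function, the same idea might look like the sketch below. The name avg_round_to_zero is hypothetical; the code assumes two's complement and an arithmetic right shift for negative values (implementation-defined in C, though common), and it derives the shift width from CHAR_BIT instead of hard-coding 31.

```c
#include <limits.h>

/* Average of two ints, rounded toward zero, without forming a + b.
   Assumes two's complement and arithmetic >> on negative values. */
int avg_round_to_zero(int a, int b) {
    int t = (a & b) + ((a ^ b) >> 1);                   /* floor((a + b) / 2) */
    unsigned sign = (unsigned)t >> (sizeof(int) * CHAR_BIT - 1);  /* 1 if t < 0 */
    return t + (int)(sign & (unsigned)(a ^ b));         /* add 1 if t < 0 and a + b is odd */
}
```

The first line computes the floor of the mean; the correction term turns that into truncation toward zero, which is what (a+b)/2 would give in wider arithmetic.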
Just a few observations that may help:
"Most accurate" isn't necessarily unique with integers. E.g. for 1 and 4, 2 and 3 are an equally "most accurate" answer. Mathematically (not C integers):
(a+b)/2 = a+(b-a)/2 = b+(a-b)/2
Let's try breaking this down:
If sign(a)!=sign(b) then a+b will not overflow. This case can be determined by comparing the most significant bit in a two's complement representation.
If sign(a)==sign(b) then if a is greater than b, (a-b) will not overflow. Otherwise (b-a) will not overflow. EDIT: Actually neither will overflow.
What are you trying to optimize exactly? Different processor architectures may have different optimal solutions. For example, in your code replacing the multiplication with an AND may improve performance. Also in a two's complement architecture you can simply (a & b & 1).
I'm just going to throw some code out, not looking too fast but perhaps someone can use and improve:
int sgeq = ((a<0)==(b<0));
int avg = ((!sgeq)*(a+b)+sgeq*(b-a))/2 + sgeq*a;
I would do this, convert both to long long(64 bit signed integers) add them up, this won't overflow and then divide the result by 2:
((long long)a + (long long)b) / 2
If you want the decimal part, store it as a double.
It is important to note that the result will fit in a 32 bit integer.
If you are using the highest-rank integer, then you can use:
((double)a + (double)b) / 2
This answer fits to any number of integers:
int[] array = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
decimal avg = 0;
for (int i = 0; i < array.Length; i++){
avg = (array[i] - avg) / (i+1) + avg;
}
expects avg == 5.0 for this test

Bitwise C programming

Hey I have been having trouble with a C program. The program I have to write simulates the operation of a VAX computer. I have to take in 2 variables x and y to generate z.
within that there are two functions, the first
Sets Z to 1 where each bit position of y = 1
2nd sets z to 0 where each bit position of y = 1
I'm not asking for someone to do this for me, I just need an explanation on how this is carried out as I have a bare bones of the two functions that I need. I was thinking of something like this but I don't know if it's right at all.
#include<stdio.h>
int main()
{
int x1 = 1010;
int y1 = 0101;
bis(x1, y1);
bic(x1, y1);
}
/* BIT SET function that sets the result to 1 wherever y = 1 */
int bis (int x, int y)
{
int z = x & y;
int result = ?;
printf("BIT SET: \n\n", result);
return result;
}
/* BIT CLEAR function that sets result to 0 wherever y = 1 */
int bic(int x, int y)
{
int z = x & y;
int result = ?;
printf("BIT CLEAR:\n\n ", result);
return result;
}
Apologies for the poor naming conventions. Am I anyway on the right track for this program?
Let's look at bitset() first. I won't post C code, but we can solve this on paper as a start.
Say you have your integers with the following bit patterns: x = 1011 and y = 0101. (I'm changing your example numbers. And, incidentally, this is not how you would define two integers having these bit patterns, but right now we're focusing on the logic.)
If I am understanding correctly, when you call bitset(x, y), you want the answer, Z, to be 1111.
x = 1011
y = 0101
^ ^-------- Because these two bits have the value 1, then your answer also
has to set them to 1 while leaving the other bits in x alone.
Well, which bitwise operation will accomplish this? You have AND (&), OR (|), XOR (^), and COMPLEMENT (~).
In this case, you are ORing the two values. Looking at the following truth table:
x 1 0 1 1
y 0 1 0 1
-----------------
(x OR y) 1 1 1 1
Each bit in the last row is given by ORing that column in x and y. So (1 OR 0) = 1, (0 OR 1) = 1, (1 OR 0) = 1, (1 OR 1) = 1
So now you can write a C function bitset(x, y), ORs x and y, and returns the result as Z.
What bitwise operator - and you can do it in multiple steps with multiple operators - would you use to clear the bits?
x 1 0 1 1
y 0 1 0 1
-------------------------------------------
(SOME OPERATIONS INVOLVING x and y) 1 0 1 0
What would those logical operators (from the list above) be? Think about the "and" and "complement" operators.
Good luck on your hw!
Bonus: A quick primer on expressing integers in C.
int x = 1337 creates an integer and gives it the value 1337. If you said x = 01337, x WILL NOT have the value "1337" like you might expect. By placing the 0 in front of the number, you're telling C that that number is in octal (base 8). The digits "1337", interpreted in base 8, is equivalent to decimal (base 10) 735. If you said x = 0x1337 then you are expressing the number in base 16, as a hexadecimal, equivalent to 4919 in base 10.
Nope... what you have there will and together two integers. One of which is 1010 (base10), and the other of which is 101 (base 8 - octal -> 65 base 10).
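First you'll want to declare your constants as binary (by prefixing them with 0b). Note that the 0b prefix is a compiler extension in older C and only became standard in C23; hexadecimal constants (0xA for 1010, 0x5 for 0101) work everywhere.
Second, you'll want to output them (for your instructor or TA) as a binary representation. Check out this question for more ideas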
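To make the point about constants concrete, here is a small sketch of the bit-set half only, using hexadecimal constants so that it does not depend on the 0b prefix. The enum names X1 and Y1 are illustrative, and bic is left out on purpose since the answer above leaves it as an exercise.

```c
/* Bit patterns written in hex: 0xA is 1010 and 0x5 is 0101 in binary. */
enum { X1 = 0xA, Y1 = 0x5 };

/* BIT SET: the result has a 1 wherever x or y has a 1. */
int bis(int x, int y) {
    return x | y;
}
```

With these values, bis(X1, Y1) is 0xF, i.e. 1111 in binary, and the example from the answer, bis(0xB, 0x5), gives the same result.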

Resources