How to add two numbers without using the + operator in C using bit manipulation

I recently came across this interview question and I'm not good at bit manipulation. Can you explain what the function 'f' does? I'm not sure what this recursive function computes.
unsigned int f(unsigned int a, unsigned int b)
{
    return a ? f((a & b) << 1, a ^ b) : b;
}
I tried pasting the code into Visual Studio to test the logic, but the compiler throws the error message "cannot implicitly convert type 'uint' to 'bool'". Is the condition (a ?) in the return missing something? I'm sure the interview question was exactly as written above.

Well, a few people in the comments have already mentioned that this just adds two numbers. I'm not sure of a better way to figure that out than to just try some inputs and note the results.
Ex:
f(5,1) --> returns f(2,4) --> returns f(0,6) --> returns 6
1.) 5&1 = 1 bit shifted = 2: 5^1 = 4
2.) 2&4 = 0 bit shifted = 0: 2^4 = 6
3.) a = 0 so return b of 6
f(4,3) --> returns f(0,7) --> returns 7
1.) 4&3 = 0 bit shifted = 0: 4^3 = 7
2.) a = 0 so return b of 7
After showing a few examples of the output, I suppose you could postulate that f returns the two inputs added together.

The prospective employer is either very thorough, or enjoys cruel & unusual punishment.
For the answer to the question, a review of the freely available Chapter 2 of Hacker's Delight is worth its weight in gold. The addition in the function is taken from HAKMEM memo Item 23 and substitutes a left shift in place of multiplying by 2. The original HAKMEM memo proposes:
(A AND B) + (A OR B) = A + B = (A XOR B) + 2 (A AND B)
or re-written in C:
x + y = (x ^ y) + 2 * (x & y)
The creative employer then uses (x & y) << 1 in place of 2 * (x & y) and recursion to compute the sum of a and b until a == 0, at which time b is returned.
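For reference, here is a minimal iterative sketch of the same idea, under the same unsigned-int assumption as the original: fold the carry bits back in until there are none left.
#include <stdio.h>

/* Iterative equivalent of f: b accumulates the carry-less sum (a ^ b),
   while a holds the carries ((a & b) << 1) still to be folded in. */
unsigned int add_no_plus(unsigned int a, unsigned int b)
{
    while (a != 0) {
        unsigned int carry = (a & b) << 1; /* bits that carry into the next position */
        b = a ^ b;                         /* sum without the carries */
        a = carry;
    }
    return b;
}

int main(void)
{
    printf("%u\n", add_no_plus(5, 1)); /* 6 */
    printf("%u\n", add_no_plus(4, 3)); /* 7 */
    return 0;
}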
Glad it wasn't my interview.

Related

FIFO implementation in C

I am analysing an Internet guide where I found code like this. Can somebody explain the usage of the ~ and & operators to me?
Thanks in advance
uint8_t tx_fifo_put(tx_dataType data)
{
/*Check if FIFO is full*/
if((tx_put_itr - tx_get_itr) & ~(TXFIFOSIZE-1))
{
/*FIFO full - return TXFAIL*/
return (TXFAIL);
}
/*Put data into fifo*/
TX_FIFO[tx_put_itr & (TXFIFOSIZE - 1)] = data;
/*Increment iterator*/
tx_put_itr++;
return(TXSUCCESS);
}
What the code does is an obfuscated way of replacing more human-readable code.
As a commenter wrote before me, TX_FIFO[tx_put_itr & (TXFIFOSIZE - 1)] = data; wraps the index around the buffer. Also, as mentioned in the comments, the code is meant to have a size that is a power of two.
I do not know why it is done this way; to me TX_FIFO[tx_put_itr % TXFIFOSIZE] = data does the same thing but is more readable. Also, one expects predicate checks to come before data access. At least that is my nature.
The (w - r) & ~size part is a way to check for (1) w < r and (2), as an edge case, w being equal to FIFOSIZE while r is zero. Semantically it should mean: "if the write pointer points to the boundary, and the read pointer points to the start of the buffer, we suggest that, for our data structure, the next write could be an overflow."
Let us see some code, numbers and their binary representation.
let s = 8 - 1, in binary is 00000111 and negated is 11111000.
let w = 0, let r = 1.
now in binary w = 00000000, r = 00000001.
w - r = 11111111; bitwise AND that with ~(8 - 1) and you get some value other than zero.
Continuing the logic for the w < r case, any negative integer will set some of those high bits. So this definitely gives true for the OP's if code.
Now the w == r case contributes no bits to the boolean test.
And last case,
let s = 8,
let w = 8
let r = 0
w - r = 00001000
~(8 - 1) = 11111000
(w - r) &~ 7 = 00001000
All other cases where w > r give zero.
Update
To my great grief, #UkropUkraine has deleted all comments and his answer. There was some discussion there about the fact that one can supposedly use (w - r) >= mask in place of (w - r) & mask.
Here I present code, and an explanation, showing that it is not an optimization, or just syntax, or whatever came to mind for the person who wrote the OP's code. It is intentional code. And it fails its purpose: to run as a FIFO or circular queue, or whatever that part of the code was meant to do.
First, take an example of usage, the part where the Ukrop user had difficulties. The w pointer can be less than the r pointer, and the result of w - r will be negative.
The common usage is to add a byte to the buffer and wrap the write pointer as soon as it reaches the end. Imagine a situation where the w pointer has already wrapped.
#include <stdio.h>
int main(void)
{
    unsigned char w = 0, r = 1;
    int result;
    /* w - r promotes both operands to int, so the subtraction yields -1 */
    result = (w - r) & 0xffffffff;
    printf("%d\n", result);
    return 0;
}
-1
I do not know what the common boolean result type is on microcontrollers. For a common x86 C machine, it is int. So I expect if((w - r) & ~size) to be evaluated as an int, and the result is negative. You cannot just rewrite the above with >=, >, or == as was stated in the comments and the other answer here.
More than that, the code fails its own semantics. It is meant to be a FIFO, or something, I do not know. But in the above situation, the read pointer still has some sensible data to read. And reading can be done, because the write pointer, even though it has wrapped, does not yet overwrite the read portion of the buffer. But the code returns BUFFULL.
I thought about read/write going in different directions, but that does not change anything. The code the OP gave fails to do what one would expect.
Maybe I am missing some insight here, as the Ukrop user and the OP point out that they know the code's semantics. The OP just did not get the ~ and & usage. Well, then this is the answer: the & ~ combination is used to test for a negative value and for the edge cases.
The two operators:
& is a bitwise and operator
~ is a bitwise complement operator
Now for the posted code it's important to notice that TXFIFOSIZE must have a value which is a power of 2, i.e. values like 2, 4, 8, 16, 32, ...
When that is true, the code:
TX_FIFO[tx_put_itr & (TXFIFOSIZE - 1)] = data;
is equivalent to:
TX_FIFO[tx_put_itr % TXFIFOSIZE] = data;
Notice that tx_put_itr is being incremented in such a way that it will take values higher than TXFIFOSIZE. So in order to get a valid array index the code must find the remainder of tx_put_itr with respect to TXFIFOSIZE.
So how does this work? Why are the above lines equivalent?
Let's take a value as example.
Assume TXFIFOSIZE is 8 (2 to the power of 3)
So TXFIFOSIZE-1 is 7
7 is bitwise 00....00111
And when you do:
SOME_NUMBER & 00....00111
You keep the 3 least significant bits of SOME_NUMBER
And that is exactly the remainder when dividing by 8.
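A minimal sketch (assuming TXFIFOSIZE is 8) showing that masking with TXFIFOSIZE - 1 gives the same index as % TXFIFOSIZE for a free-running counter:
#include <stdio.h>

#define TXFIFOSIZE 8u  /* must be a power of two for the mask to work */

int main(void)
{
    for (unsigned i = 0; i < 20; ++i) {
        /* both columns print the same index, wrapping at 8 */
        printf("%2u -> mask: %u  mod: %u\n",
               i, i & (TXFIFOSIZE - 1), i % TXFIFOSIZE);
    }
    return 0;
}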
So let's look at
if((tx_put_itr - tx_get_itr) & ~(TXFIFOSIZE-1))
It is equivalent to
if((tx_put_itr - tx_get_itr) >= TXFIFOSIZE)
So it checks for "FIFO full"
Again using an example it works like this:
Assume TXFIFOSIZE is 8 (2 to the power of 3)
So TXFIFOSIZE-1 is 7
7 is bitwise 00....00111
~7 is bitwise 11....11000
And when you do:
SOME_NUMBER & 11....11000
You clear the 3 least significant bits of SOME_NUMBER and keep the rest unchanged
So if the result is non-zero it means that the difference between
tx_put_itr and tx_get_itr is 8 (or more).
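Here is a small stand-alone sketch of that full check (assuming 8-bit free-running iterators and a TXFIFOSIZE of 8; the helper name is hypothetical): the used count is put - get thanks to unsigned wrap-around, and the FIFO is full exactly when that count has a bit set outside TXFIFOSIZE - 1.
#include <stdint.h>
#include <stdio.h>

#define TXFIFOSIZE 8u  /* power of two */

/* Non-zero when TXFIFOSIZE (or more) entries are in use. */
static int fifo_is_full(uint8_t put, uint8_t get)
{
    return ((uint8_t)(put - get) & ~(TXFIFOSIZE - 1)) != 0;
}

int main(void)
{
    printf("%d\n", fifo_is_full(7, 0));   /* 7 used              -> 0 */
    printf("%d\n", fifo_is_full(8, 0));   /* 8 used              -> 1 */
    printf("%d\n", fifo_is_full(4, 253)); /* put wrapped, 7 used -> 0 */
    return 0;
}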

Big integer addition code

I am trying to implement big integer addition in CUDA using the following code
__global__ void add(unsigned *A, unsigned *B, unsigned *C /*output*/, int radix){
    int id = blockIdx.x * blockDim.x + threadIdx.x;
    A[id] = A[id] + B[id];
    C[id] = A[id] / radix;
    __syncthreads();
    A[id] = A[id] % radix + ((id > 0) ? C[id - 1] : 0);
    __syncthreads();
    C[id] = A[id];
}
but it does not work properly, and I also don't know how to handle the extra carry bit. Thanks
TL;DR: build a carry-lookahead adder where each individual adder adds modulo radix, instead of modulo 2
Additions need incoming carries
The problem in your model is that you have a rippling carry. See ripple-carry adders.
If you were in an FPGA that wouldn't be a problem because they have dedicated logic to do that fast (carry chains, they're cool). But alas, you're on a GPU !
That is, for a given id, you only know the input carry (thus whether you are going to sum A[id]+B[id] or A[id]+B[id]+1) when all the sums with smaller id values have been computed. As a matter of fact, initially, you only know the first carry.
A[3]+B[3] + ?   A[2]+B[2] + ?   A[1]+B[1] + ?   A[0]+B[0] + 0
      |               |               |               |
      v               v               v               v
    C[3]            C[2]            C[1]            C[0]
Characterize the carry output
And each sum also has a carry output, which isn't on the drawing. So you have to think of the addition in this larger scheme as a function with 3 inputs and 2 outputs : (C, c_out) = add(A, B, c_in)
In order to not wait O(n) for the sum to complete (where n is the number of items your sum is cut into), you can precompute all the possible results at each id. That isn't such a huge load of work, since A and B don't change, only the carries. So you have 2 possible outputs : (c_out0, C) = add(A, B, 0) and (c_out1, C') = add(A, B, 1).
Now with all these results, we need to basically implement a carry lookahead unit.
For that, we need to figure out two functions of each sum's carry output, P and G:
P a.k.a. all of the following definitions
Propagate
"if a carry comes in, then a carry will go out of this sum"
c_out1 && !c_out0
A + B == radix-1
G a.k.a. all of the following definitions
Generate
"whatever carry comes in, a carry will go out of this sum"
c_out1 && c_out0
c_out0
A + B >= radix
So in other terms, c_out = G or (P and c_in). So now we have a start of an algorithm that can tell us easily for each id the carry output as a function of its carry input directly :
1. At each id, compute C[id] = A[id]+B[id]+0
2. Get G[id] = C[id] > radix-1
3. Get P[id] = C[id] == radix-1
Logarithmic tree
Now we can finish in O(log(n)). Tree-like things are nasty on GPUs, but this is still shorter than waiting. Indeed, from 2 additions next to each other, we can get a group G and a group P:
For id and id+1 :
4. step = 2
5. if id % step == 0, do steps 6 through 10; otherwise, do nothing
6. group_P = P[id] and P[id+step/2]
7. group_G = (P[id+step/2] and G[id]) or G[id+step/2]
8. c_in[id+step/2] = G[id] or (P[id] and c_in[id])
9. step = step * 2
10. if step < n, go to 5
At the end (after repeating steps 5-10 for every level of your tree, with fewer ids every time), everything will be expressed in terms of Ps and Gs which you computed, and c_in[0] which is 0. On the wikipedia page there are formulas for grouping by 4 instead of 2, which will get you an answer in O(log_4(n)) instead of O(log_2(n)).
Hence the end of the algorithm :
At each id, get c_in[id]
return (C[id]+c_in[id]) % radix
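Here is a minimal CPU-side sketch in plain C (not CUDA; the digit values and the sequential carry loop are illustrative only) of the propagate/generate bookkeeping the steps above describe. On the GPU the final loop would be replaced by the log-depth tree.
#include <stdio.h>

#define N 4

int main(void)
{
    unsigned radix = 10;
    unsigned A[N] = {9, 9, 2, 1};  /* least significant digit first: 1299 */
    unsigned B[N] = {1, 0, 7, 3};  /* 3701 */
    unsigned C[N], P[N], G[N], c_in = 0;

    for (int i = 0; i < N; ++i) {
        C[i] = A[i] + B[i];          /* sum without incoming carry */
        P[i] = (C[i] == radix - 1);  /* propagate: a carry in would carry out */
        G[i] = (C[i] >= radix);      /* generate: a carry goes out regardless */
    }
    for (int i = 0; i < N; ++i) {    /* sequential stand-in for the tree */
        unsigned c_out = G[i] | (P[i] & c_in);  /* c_out = G or (P and c_in) */
        C[i] = (C[i] + c_in) % radix;
        c_in = c_out;
    }
    for (int i = N - 1; i >= 0; --i)
        printf("%u", C[i]);          /* prints 5000 (1299 + 3701) */
    printf("\n");
    return 0;
}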
Take advantage of hardware
What we really did in this last part was mimic the circuitry of a carry-lookahead adder with logic. However, we already have adders in the hardware that do similar things (by definition).
Let us replace our definitions of P and G based on radix by those based on 2, like the logic inside our hardware, mimicking a sum of 2 bits a and b at each stage: P = a ^ b (xor) and G = a & b (bitwise and). In other words, a = P | G and b = G. So if we create an intP integer and an intG integer, where each bit is respectively the P and G we computed from each id's sum (limiting us to 64 sums), then the addition (intP | intG) + intG has the exact same carry propagation as our elaborate logical scheme.
The reduction to form these integers will still be a logarithmic operation I guess, but that was to be expected.
The interesting part is that each bit of the sum is a function of its carry input. Indeed, every bit of the sum is eventually a function of 3 bits: (a+b+c_in) % 2.
If at that bit P == 1, then a + b == 1, thus (a+b+c_in) % 2 == !c_in
Otherwise, a+b is either 0 or 2, and (a+b+c_in) % 2 == c_in
Thus we can trivially form the integer (or rather bit-array) int_cin = ((P|G)+G) ^ P with ^ being xor.
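A tiny sketch of that ((P|G)+G) ^ P trick on plain unsigned ints (binary shown in the comments): bit i of int_cin ends up being the carry coming into position i, so xoring it with P reconstructs the sum.
#include <stdio.h>

int main(void)
{
    unsigned a = 0x6, b = 0x3;            /* 0110 + 0011 = 1001 (6 + 3 = 9) */
    unsigned P = a ^ b;                   /* propagate bits: 0101 */
    unsigned G = a & b;                   /* generate bits:  0010 */
    unsigned int_cin = ((P | G) + G) ^ P; /* carry into each bit: 1100 */

    printf("c_in bits = 0x%X\n", int_cin);   /* 0xC */
    printf("sum       = %u\n", P ^ int_cin); /* 9   */
    return 0;
}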
Thus we have an alternate ending to our algorithm, replacing steps 4 and later :
4. at each id, shift P and G by id: P = P << id and G = G << id
5. do an OR-reduction to get intG and intP, which are the OR of all the P and G for id 0..63
6. Compute (once) int_cin = ((intP|intG)+intG) ^ intP
7. at each id, get c_in = int_cin & (1 << id) ? 1 : 0;
8. return (C[id]+c_in) % radix
PS : Also, watch out for integer overflow in your arrays, if radix is big. If it isn't then the whole thing doesn't really make sense I guess...
PPS: in the alternate ending, if you have more than 64 items, characterize them by their P and G as if the radix were 2^64, re-run the same steps at a higher level (reduction, get c_in), and then go back to the lower level and apply step 7 with P, G, and the carry in from the higher level.

explicit MOD in C? [duplicate]

This question already has answers here:
How to code a modulo (%) operator in C/C++/Obj-C that handles negative numbers
(16 answers)
Closed 9 years ago.
OK, so I know and understand the difference between MOD and REM. I am also aware that C's % operation is a REM operation. I wanted to know, and could not find online, if there is some C library or function for an explicit MOD.
Specifically, I'd like (-1)%4 == 3 to be true. In C, (-1)%4 == -1, since it is a remainder. Preferably I'd like to avoid using absolute values, and even better would be to use some built-in function that I can't seem to find.
Any advice will be much appreciated!
The best option I can think of is to compute:
((-1 % 4) + 4 ) % 4
Here you may replace -1 with any value and you will get MOD not REM.
The most common way to do what you expect is:
((a % b) + b ) % b
It works because (a % b) is a number in the open interval (-b, b), so (a % b) + b is positive (in (0, 2*b)), and adding b does not change the value mod b.
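A minimal sketch of that double-mod trick wrapped in a helper (assuming b > 0):
#include <stdio.h>

static int mod(int a, int b)
{
    return ((a % b) + b) % b;  /* maps the remainder into [0, b) */
}

int main(void)
{
    printf("%d\n", mod(-1, 4)); /* 3 */
    printf("%d\n", mod(-5, 4)); /* 3 */
    printf("%d\n", mod(7, 4));  /* 3 */
    return 0;
}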
Just do,
int mod(int a, int b)
{
    int res = a % b;
    return (res < 0 ? (res + b) : res);
}
Whenever res is negative after the % operation, b is added to it to get the true modulus of a and b.

Bitwise C programming

Hey, I have been having trouble with a C program. The program I have to write simulates the operation of a VAX computer. I have to take in 2 variables x and y to generate z.
Within that there are two functions: the first
sets z to 1 at each bit position where y = 1;
the second sets z to 0 at each bit position where y = 1.
I'm not asking for someone to do this for me; I just need an explanation of how this is carried out, as I have the bare bones of the two functions I need. I was thinking of something like this, but I don't know if it's right at all.
#include<stdio.h>
int main()
{
int x1 = 1010;
int y1 = 0101;
bis(x1, y1);
bic(x1, y1);
}
/* BIT SET function that sets the result to 1 wherever y = 1 */
int bis (int x, int y)
{
int z = x & y;
int result = ?;
printf("BIT SET: \n\n", result);
return result;
}
/* BIT CLEAR function that sets result to 0 wherever y = 1 */
int bic(int x, int y)
{
int z = x & y;
int result = ?;
printf("BIT CLEAR:\n\n ", result);
return result;
}
Apologies for the poor naming conventions. Am I at all on the right track with this program?
Let's look at bitset() first. I won't post C code, but we can solve this on paper as a start.
Say you have your integers with the following bit patterns: x = 1011 and y = 0101. (I'm changing your example numbers. And, incidentally, this is not how you would define two integers having these bit patterns, but right now we're focusing on the logic.)
If I am understanding correctly, when you call bitset(x, y), you want the answer, Z, to be 1111.
x = 1011
y = 0101
^ ^-------- Because these two bits have the value 1, then your answer also
has to set them to 1 while leaving the other bits in x alone.
Well, which bitwise operation will accomplish this? You have AND (&), OR (|), XOR (^), and COMPLEMENT (~).
In this case, you are ORing the two values. Looking at the following truth table:
x 1 0 1 1
y 0 1 0 1
-----------------
(x OR y) 1 1 1 1
Each bit in the last row is given by ORing that column in x and y. So (1 OR 0) = 1, (0 OR 1) = 1, (1 OR 0) = 1, (1 OR 1) = 1
So now you can write a C function bitset(x, y) that ORs x and y and returns the result as Z.
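As a minimal sketch of that value-returning bit-set (the hex literals below stand in for the intended bit patterns, since 1011 written as-is would be decimal):
#include <stdio.h>

/* BIS: every bit that is 1 in y becomes 1 in the result. */
static unsigned bis(unsigned x, unsigned y)
{
    return x | y;
}

int main(void)
{
    printf("%X\n", bis(0xB /* 1011 */, 0x5 /* 0101 */)); /* prints F (1111) */
    return 0;
}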
What bitwise operator - and you can do it in multiple steps with multiple operators - would you use to clear the bits?
x 1 0 1 1
y 0 1 0 1
-------------------------------------------
(SOME OPERATIONS INVOLVING x and y) 1 0 1 0
What would those logical operators (from the list above) be? Think about the "and" and "complement" operators.
Good luck on your hw!
Bonus: A quick primer on expressing integers in C.
int x = 1337 creates an integer and gives it the value 1337. If you said x = 01337, x WILL NOT have the value "1337" like you might expect. By placing the 0 in front of the number, you're telling C that that number is in octal (base 8). The digits "1337", interpreted in base 8, is equivalent to decimal (base 10) 735. If you said x = 0x1337 then you are expressing the number in base 16, as a hexadecimal, equivalent to 4919 in base 10.
Nope... what you have there will AND together two integers, one of which is 1010 (base 10) and the other of which is 0101 (base 8, i.e. octal, which is 65 in base 10).
First you'll want to declare your constants as binary (by prefixing them with 0b; note this is a compiler extension in older C standards, standardized only in C23).
Second, you'll want to out put them (for your instructor or TA) as a binary representation. Check out this question for more ideas

Explain this snippet which finds the maximum of two integers without using if-else or any other comparison operator?

Find the maximum of two numbers. You should not use if-else or any other comparison operator. I found this question on an online bulletin board, so I thought I should ask on Stack Overflow.
EXAMPLE
Input: 5, 10
Output: 10
I found this solution; can someone help me understand these lines of code?
int getMax(int a, int b) {
int c = a - b;
int k = (c >> 31) & 0x1;
int max = a - k * c;
return max;
}
int getMax(int a, int b) {
int c = a - b;
int k = (c >> 31) & 0x1;
int max = a - k * c;
return max;
}
Let's dissect this. This first line appears to be straightforward - it stores the difference of a and b. This value is negative if a < b and is nonnegative otherwise. But there's actually a bug here - if the difference of the numbers a and b is so big that it can't fit into an integer, this will lead to undefined behavior - oops! So let's assume that doesn't happen here.
In the next line, which is
int k = (c >> 31) & 0x1;
the idea is to check if the value of c is negative. In virtually all modern computers, numbers are stored in a format called two's complement, in which the highest bit of the number is 0 if the number is nonnegative and 1 if the number is negative. Moreover, most ints are 32 bits. (c >> 31) shifts the number down 31 bits, leaving the highest bit of the number in the spot for the lowest bit. The next step of taking this number and ANDing it with 1 (whose binary representation is 0 everywhere except the last bit) erases all the higher bits and just gives you the lowest bit. Since the lowest bit of c >> 31 is the highest bit of c, this reads the highest bit of c as either 0 or 1. Since the highest bit is 1 iff c is negative, this is a way of checking whether c is negative (1) or nonnegative (0). Combining this reasoning with the above, k is 1 if a < b and is 0 otherwise.
The final step is to do this:
int max = a - k * c;
If a < b, then k == 1 and k * c = c = a - b, and so
a - k * c = a - (a - b) = a - a + b = b
Which is the correct max, since a < b. Otherwise, if a >= b, then k == 0 and
a - k * c = a - 0 = a
Which is also the correct max.
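Putting the dissected function through a few inputs (keeping the overflow caveat above in mind, and assuming the usual arithmetic right shift of negative ints):
#include <stdio.h>

static int getMax(int a, int b)
{
    int c = a - b;
    int k = (c >> 31) & 0x1;  /* 1 if a < b, 0 otherwise (32-bit int assumed) */
    return a - k * c;
}

int main(void)
{
    printf("%d\n", getMax(5, 10));  /* 10 */
    printf("%d\n", getMax(10, 5));  /* 10 */
    printf("%d\n", getMax(-3, -7)); /* -3 */
    return 0;
}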
Here we go: (a + b) / 2 + |a - b| / 2
Use bitwise hacks
r = x ^ ((x ^ y) & -(x < y)); // max(x, y)
If you know that INT_MIN <= x - y <= INT_MAX, then you can use the following, which is faster because (x - y) only needs to be evaluated once.
r = x - ((x - y) & ((x - y) >> (sizeof(int) * CHAR_BIT - 1))); // max(x, y)
Source : Bit Twiddling Hacks by Sean Eron Anderson
(sqrt( a*a + b*b - 2*a*b ) + a + b) / 2
This is based on the same technique as mike.dld's solution, but it is less "obvious" here what I am doing. An "abs" operation looks like you are comparing the sign of something, but here I am taking advantage of the fact that sqrt() always returns the positive square root. So I am squaring (a-b), writing it out in full, then square-rooting it again, adding a+b, and dividing by 2.
You will see it always works: e.g. with the user's example of 10 and 5 you get sqrt(100 + 25 - 100) = 5; then adding 10 and 5 gives you 20, and dividing by 2 gives you 10.
If we use 9 and 11 as our numbers we would get (sqrt(121 + 81 - 198) + 11 + 9)/2 = (sqrt(4) + 20) / 2 = 22/2 = 11
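A short sketch of that sqrt-based idea (note it relies on floating point, so it is only exact while the values are small enough not to lose precision):
#include <math.h>
#include <stdio.h>

static double max_sqrt(double a, double b)
{
    /* sqrt((a-b)^2) is |a-b|, so this is (a + b + |a-b|) / 2 */
    return (sqrt(a * a + b * b - 2 * a * b) + a + b) / 2;
}

int main(void)
{
    printf("%g\n", max_sqrt(10, 5)); /* 10 */
    printf("%g\n", max_sqrt(9, 11)); /* 11 */
    return 0;
}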
The simplest answer is below.
#include <math.h>
int Max(int x, int y)
{
return (float)(x + y) / 2.0 + fabs((float)(x - y) / 2);
}
int Min(int x, int y)
{
return (float)(x + y) / 2.0 - fabs((float)(x - y) / 2);
}
int max(int i, int j) {
int m = ((i-j) >> 31);
return (m & j) + ((~m) & i);
}
This solution avoids multiplication.
m will either be 0x00000000 or 0xffffffff
Using the shifting idea to extract the sign as posted by others, here's another way:
max (a, b) = new[] { a, b } [((a - b) >> 31) & 1]
This pushes the two numbers into an array, with the maximum given by the array element whose index is the sign bit of the difference between the two numbers.
Do note that:
The difference (a - b) may overflow.
If the numbers are unsigned and the >> operator refers to a logical right-shift, the & 1 is unnecessary.
Here's how I think I'd do the job. It's not as readable as you might like, but when you start with "how do I do X without using the obvious way of doing X", you have to kind of expect that.
In theory, this gives up some portability too, but you'd have to find a pretty unusual system to see a problem.
#include <limits.h>

#define BITS (CHAR_BIT * sizeof(int) - 1)

int findmax(int a, int b) {
    int rets[] = {a, b};
    return rets[(unsigned)(a - b) >> BITS];
}
This does have some advantages over the one shown in the question. First of all, it calculates the correct size of shift, instead of being hard-coded for 32-bit ints. Second, with most compilers we can expect the multiplication to happen at compile time, so all that's left at run time is trivial bit manipulation (a subtract and a shift) followed by a load and return. In short, this is almost certain to be pretty fast, even on the smallest microcontroller. The original used a multiplication that had to happen at run time; while that is probably still pretty fast on a desktop machine, it will often be quite a bit slower on a small microcontroller.
Here's what those lines are doing:
c is a-b. if c is negative, a<b.
k is the 32nd bit of c, which is the sign bit of c (assuming 32-bit integers; if run on a platform with 64-bit ints, this code will not work). c is shifted 31 bits to the right to discard the rightmost 31 bits, leaving the sign bit in the rightmost place, and the result is then ANDed with 1 to remove all the bits to the left (which will be filled with 1s if c is negative). So k will be 1 if c is negative and 0 if c is nonnegative.
Then max = a - k * c. If k is 0, this means a >= b, so max is a - 0 * c = a. If k is 1, this means that a < b, and then a - 1 * c = a - (a - b) = a - a + b = b.
In the overall, it's just using the sign bit of the difference to avoid using greater than or less than operations. It's honestly a little silly to say that this code doesn't use a comparison. c is the result of comparing a and b. The code just doesn't use a comparison operator. You could do a similar thing in many assembly codes by just subtracting the numbers and then jumping based on the values set in the status register.
I should also add that all of these solutions are assuming that the two numbers are integers. If they are floats, doubles, or something more complicated (BigInts, Rational numbers, etc.) then you really have to use a comparison operator. Bit-tricks will not generally do for those.
getMax() Function Without Any Logical Operation-
int getMax(int a, int b){
return (a+b+((a-b)>>sizeof(int)*8-1|1)*(a-b))/2;
}
Explanation:
Let's break 'max' into pieces:
max
= ( max + max ) / 2
= ( max + (min+differenceOfMaxMin) ) / 2
= ( max + min + differenceOfMaxMin ) / 2
= ( max + min + | max - min | ) / 2
So the function should look like this-
getMax(a, b)
= ( a + b + absolute(a - b) ) / 2
Now,
absolute(x)
= x [if 'x' is positive] or -x [if 'x' is negative]
= x * ( 1 [if 'x' is positive] or -1 [if 'x' is negative] )
In a positive integer the first bit (the sign bit) is 0; in a negative one it is 1. By shifting bits to the right (>>), the first bit can be captured.
During a right shift the empty space is filled with the sign bit. So 01110001 >> 2 = 00011100, while 10110001 >> 2 = 11101100.
As a result, for an 8-bit number, shifting right by 7 bits produces 11111111 for a negative number and 00000000 for a positive one.
Now, if an OR operation is performed with 00000001 (= 1), the negative case yields 11111111 (= -1) and the positive case yields 00000001 (= 1).
So,
absolute(x)
= x * ( 1 [if 'x' is positive] or -1 [if 'x' is negative] )
= x * ( ( x >> (numberOfBitsInInteger-1) ) | 1 )
= x * ( ( x >> ((numberOfBytesInInteger*bitsInOneByte) - 1) ) | 1 )
= x * ( ( x >> ((sizeOf(int)*8) - 1) ) | 1 )
Finally,
getMax(a, b)
= ( a + b + absolute(a - b) ) / 2
= ( a + b + ((a-b) * ( ( (a-b) >> ((sizeOf(int)*8) - 1) ) | 1 )) ) / 2
Another way-
int getMax(int a, int b){
int i[] = {a, b};
return i[( (i[0]-i[1]) >> (sizeof(int)*8 - 1) ) & 1 ];
}
static int mymax(int a, int b)
{
int[] arr;
arr = new int[3];
arr[0] = b;
arr[1] = a;
arr[2] = a;
return arr[Math.Sign(a - b) + 1];
}
If b > a then (a-b) will be negative and Math.Sign will return -1; adding 1 gives index 0, which holds b. If b == a then a-b will be 0 and +1 gives index 1, so it does not matter whether we return a or b. When a > b then a-b will be positive and Math.Sign will return 1; adding 1 gives index 2, where a is stored.
#include <stdio.h>
#include <stdlib.h>
int main()
{
    int num1, num2, diff;
    printf("Enter number 1 : ");
    scanf("%d", &num1);
    printf("Enter number 2 : ");
    scanf("%d", &num2);
    diff = num1 - num2;
    num1 = abs(diff);
    num2 = num1 + diff;
    if (num1 == num2)
        printf("Both numbers are equal\n");
    else if (num2 == 0)
        printf("Num2 > Num1\n");
    else
        printf("Num1 > Num2\n");
}
The code I am providing finds the maximum of two numbers; the numbers can be of any data type (integer or floating point). If the input numbers are equal, the function returns that number.
double findmax(double a, double b)
{
//find the difference of the two numbers
double diff=a-b;
double temp_diff=diff;
int int_diff=temp_diff;
/*
For the floating point numbers the difference contains decimal
values (for example 0.0009, 2.63 etc.) if the left side of '.' contains 0 then we need
to get a non-zero number on the left side of '.'
*/
while ( (!(int_diff|0)) && ((temp_diff-int_diff)||(0.0)) )
{
temp_diff = temp_diff * 10;
int_diff = temp_diff;
}
/*
shift the sign bit of variable 'int_diff' to the LSB position and find if it is
1(difference is -ve) or 0(difference is +ve) , then multiply it with the difference of
the two numbers (variable 'diff') then subtract it with the variable a.
*/
return a- (diff * ( int_diff >> (sizeof(int) * 8 - 1 ) & 1 ));
}
Description
First, the function takes its arguments as double and has return type double. The reason is to create a single function which can find the maximum for all types. When integer arguments are provided, or one is an integer and the other is floating point, implicit conversion means the function can be used to find the max for integers as well.
The basic logic is simple: say we have two numbers a and b. If a-b > 0 (i.e. the difference is positive) then a is the maximum; if a-b == 0 then both are equal; and if a-b < 0 (i.e. the difference is negative) then b is the maximum.
The sign bit is stored as the Most Significant Bit (MSB) in memory: if the number is negative the MSB is 1, and vice versa. To check whether the MSB is 1 or 0, we shift the MSB to the LSB position and bitwise-AND with 1; if the result is 1 the number is negative, otherwise it is positive. This result is obtained by the statement:
int_diff >> (sizeof(int) * 8 - 1 ) & 1
Here, to move the sign bit from the MSB to the LSB, we right-shift by k-1 bits (where k is the number of bits needed to store an integer in memory, which depends on the system). Here k = sizeof(int) * 8, since sizeof() gives the number of bytes needed to store an integer; to get the number of bits, we multiply by 8. After the right shift, we apply the bitwise AND with 1 to get the result.
Now, after obtaining the result (call it r), which is 1 for a negative diff and 0 for a positive diff, we multiply it by the difference of the two numbers. The logic is as follows:
If a > b then a-b > 0, i.e. positive, so r = 0. Then a-(a-b)*r => a-(a-b)*0 = a, which gives 'a' as the maximum.
If a < b then a-b < 0, i.e. negative, so r = 1. Then a-(a-b)*r => a-(a-b)*1 => a-a+b => b, which gives 'b' as the maximum.
Now there are two remaining points: 1. the use of the while loop, and 2. why I have used the variable 'int_diff' as an integer. To answer these properly we have to understand some points:
Floating-point values cannot be used as operands of the bitwise operators.
For that reason, we need to get the value into an integer variable so the sign of the difference can be extracted using bitwise operators. These two points explain why the variable 'int_diff' is an integer.
Now let's say we store the difference in the variable 'diff'. There are 3 possibilities for the value of 'diff', irrespective of sign: (a) |diff| >= 1, (b) 0 < |diff| < 1, (c) |diff| == 0.
When we assign a double value to an integer variable, the decimal part is lost.
For case (a) the value of 'int_diff' is non-zero (i.e. 1, 2, ...). For the other two cases int_diff == 0.
The condition (temp_diff-int_diff)||(0.0) checks whether diff == 0, i.e. whether both numbers are equal.
If diff != 0 then we check whether int_diff is still zero, i.e. whether case (b) holds.
In the while loop, we keep multiplying by 10 until int_diff becomes non-zero, so that int_diff also carries the sign of diff.
Here are a couple of bit-twiddling methods to get the max of two integral values:
Method 1
int max1(int a, int b) {
static const size_t SIGN_BIT_SHIFT = sizeof(a) * 8 - 1;
int mask = (a - b) >> SIGN_BIT_SHIFT;
return (a & ~mask) | (b & mask);
}
Explanation:
(a - b) >> SIGN_BIT_SHIFT - If a >= b then a - b is nonnegative, thus the sign bit is 0, and the mask is 0x00..00. Otherwise, a < b so a - b is negative, the sign bit is 1, and after shifting we get a mask of 0xFF..FF
(a & ~mask) - If the mask is 0xFF..FF, then ~mask is 0x00..00 and then this value is 0. Otherwise, ~mask is 0xFF..FF and the value is a
(b & mask) - If the mask is 0xFF..FF, then this value is b. Otherwise, mask is 0x00..00 and the value is 0.
Finally:
If a >= b then a - b is positive, we get max = a | 0 = a
If a < b then a - b is negative, we get max = 0 | b = b
Method 2
int max2(int a, int b) {
static const size_t SIGN_BIT_SHIFT = sizeof(a) * 8 - 1;
int mask = (a - b) >> SIGN_BIT_SHIFT;
return a ^ ((a ^ b) & mask);
}
Explanation:
Mask explanation is the same as for Method 1. If a > b the mask is 0x00..00, otherwise the mask is 0xFF..FF
If the mask is 0x00..00, then (a ^ b) & mask is 0x00..00
If the mask is 0xFF..FF, then (a ^ b) & mask is a ^ b
Finally:
If a >= b, we get a ^ 0x00..00 = a
If a < b, we get a ^ a ^ b = b
//In C# you can use math library to perform min or max function
using System;
class NumberComparator
{
static void Main()
{
Console.Write(" write the first number to compare: ");
double first_Number = double.Parse(Console.ReadLine());
Console.Write(" write the second number to compare: ");
double second_Number = double.Parse(Console.ReadLine());
double compare_Numbers = Math.Max(first_Number, second_Number);
Console.Write("{0} is greater",compare_Numbers);
}
}
No logical operators, no libs (JS)
function (x, y) {
let z = (x - y) ** 2;
z = z ** .5;
return (x + y + z) / 2
}
The logic described in the problem can be explained as: if the 1st number is smaller, then 0 will be subtracted; otherwise the difference will be subtracted from the 1st number to get the 2nd number.
I found one more mathematical solution which I think is bit simpler to understand this concept.
Considering a and b as given numbers
c=|a/b|+1;
d=(c-1)/b;
smallest number= a - d*(a-b);
Again, the idea is to find k, which is either 0 or 1, and multiply it by the difference of the two numbers. Finally this product is subtracted from the 1st number to yield the smaller of the two numbers.
P.S. This solution will fail if the 2nd number is zero.
There is one way
public static int Min(int a, int b)
{
int dif = (int)(((uint)(a - b)) >> 31);
return a * dif + b * (1 - dif);
}
and another one:
return (a>=b)?b:a;
int a=151;
int b=121;
int k=Math.abs(a-b);
int j= a+b;
double k1=(double)(k);
double j1= (double) (j);
double c=Math.ceil(k1/2) + Math.floor(j1/2);
int c1= (int) (c);
System.out.println(" Max value = " + c1);
Guess we can just multiply the numbers by their boolean comparisons, e.g.:
int max=(a>b)*a+(a<=b)*b;
