Clarification of a bitwise statement

Clarification of a bitwise statement - c

What does the following condition if(a & (a-1)/2) mean in C?
#include <stdio.h>
int main()
{
int a;
scanf("%d", &a);
if(a & (a-1)/2)
{
printf("Yes\n");
}
else{
printf("No\n");
}
return 0;
}
I didn't get the meaning of / operator here. what does the division operator mean in the condition ?

The division operator / simply divides the left hand side by the right hand side. In this case the right hand side is 2, so it's shifting a-1 right by one bit (an arithmetic shift, which sign-extends the integer).
The whole expression therefore calculates - for a>=0 - whether a and ((a-1)>>1) both have any of the same bits set (in which case & will produce a non-zero value). For a<0 see the comment at the end.
Here's some test code:
#include <stdio.h>
#include <stdlib.h>
int
main (int argc, char **argv)
{
int i;
for (i = 0; i < 256; i++)
{
printf ("%3d %s\n", i, (i & (i - 1) / 2) ? "yes" : "no");
}
exit (0);
}
which gives
0 no
1 no
2 no
3 yes
4 no
5 no
6 yes
7 yes
8 no
9 no
10 no
11 yes
12 yes
13 yes
14 yes
15 yes
16 no
17 no
18 no
19 yes
20 no
21 no
22 yes
23 yes
24 yes
25 yes
26 yes
27 yes
28 yes
29 yes
30 yes
31 yes
32 no
33 no
34 no
... abbreviated for clarity ...
This detects numbers that either:
have two or more adjacent binary 1s or;
are negative, but not MININT.
Here's how:
If a>=0, then (a-1)/2 == a/2 (as rounding is always towards zero). So this is equivalent (for a>=0) to a & (a>>1) which will be one if the bit to the right of any bit in a is set, i.e. if there are two or more adjacent ones.
If a<0 and a!=MININT, (and thus have 1 as its MSB), then (a-1) must also be negative and less than -1, so (a-1)/2 must also be negative (and thus have 1 as its MSB), in which case a & ((a-1)/2)) is non-zero because its MSB must also be 1.
If a==MININT, then a-1 is MAXINT, so (a-1)/2 has no bits that are both 1.

For non-negative values, division by 2 is equivalent to shifting it 1 bit to the right. For negative values it is not equivalent, but I suspect that the authors of the code did not intend it to work with negative values.
(Note that this non-equivalence has nothing to do with exotic non-2's-complement architectures or arithmetic sifts. In C and C++ language signed division is required to round towards 0, which makes it impossible to implement signed division by a simple shift for negative values on 2's complement architectures. A post-shift correction is required.)
Now, as for what the whole a & (a - 1) / 2 expression is doing... Let's restrict consideration for non-negative values only.
Subtracting 1 is equivalent to inverting all trailing 0 bits in the number and then inverting the last (least significant) 1 bit. Division by 2 is equivalent to shift right by 1 bit.
In the context of & operation, the whole thing is equivalent to: kill the last 1 bit in a, shift the whole thing to the right 1 bit and & it with the original value.
I don't see the point of killing the last 1 bit in this case. It looks like for positive values the whole thing is equivalent to a & (a / 2) (i.e. to a & (a >> 1))), which simply detects if there are two adjacent 1 bits in the original a.

It substracts 1 from a then divides it by 2 and after that it makes a bit-and operation.

In the condition / operator simply does division. The only thing that matters is whether the result in condition is 0 (false) or not 0 (true).
As fot what the program actually does, it looks binary math related so you should ask someone who knows more about that.

In the expression a & (a - 1) / 2 there is a bitwise and operator, and a division operator. It is highly unusual that these are used together. Therefore, I don't know what the precedence is (does it divide a-1 by two and does a bitwise and with a, or does it do a bitwise and of a and a-1, then divide the result by two). I actually don't care what the precedence is, because I don't know what precedence the programmer writing the code assumed it would be.
If you see code like this, you need to find out what it does, you need to find out what it was intended to do, if both are the same you add parentheses to make the intent clear, and if they are not the same you have found a bug and figure out how it affects the program and what to do about it.

Related

How does printf work in this program to convert a decimal into binary?

I came across this program to convert decimals numbers into their binary equivalent in C. I do not understand how the printf statement works in this program.
int main()
{
int N;
scanf("%d", &N); // Enter decimal equivalent here
for( int j = floor(log2(N)); j >= 0; j-- ){
printf("%d", (N >> j) & 1);
}
}

Let's take an example to get through this problem. Suppose you enter N=65. Its binary representation is - 1000001. When your given code goes through it, j will start at floor(log2(65)), which is 6. So, the given loop will run 7 times, which means 7 numbers will be printed out (which fits the fact that 65's binary representation has 7 digits).
Inside the loop - The number is shifted by j bits each time to the right. When 1000001 is shifted to the right by 6 bits, it becomes 0000001. If shifted by 5, it is 0000010, and so on. It goes down to a shift by 0 bits which is the original number. When each of these shifted numbers are &ed with 1, only the least significant bit (the right most bit) remains. And this digit can either be a 0 or a 1.
If you would have noticed each right shift divides the number by 2. So when 1000001 is shifted by 1 to make 0100000, it is the binary representation of 32, which indeed is 65/2 in C. After all, this is the way someone manually calculates the binary representation of a number. Each division by 2 gives you a digit (starting from the end) of the representation, and that digit is either a 0 or a 1. The & helps in getting the 0 or 1.
In the end, 65 becomes 1000001.

What it is doing is:
Finding the largest number j such that 2^j <= N
Starting at the jth bit (counting from the right) and moving to the right ...
chopping off all of the bits to the right of the current chosen bit
chopping off all of the bits to the left of current chosen bit
printing the value of the single remaining bit

The code actually has undefined behavior because you did not include <stdio.h>, nor <math.h>. Without a proper declaration of floor() and log2(), the compiler infers the prototype from the calling context and gets int log2(int) and int floor(int), which is incompatible with the actual definition in the C library.
With the proper includes, log2(N) gives the logarithm of N in base 2. Its integral part is the power of 2 <= N, hence 1 less than the number of bits in N.
Note however that this method does not work for negative values of N and only works by coincidence for 0 as the conversion of NaN to int gives 0, hence a single binary digit.

Is this how the + operator is implemented in C?

When understanding how primitive operators such as +, -, * and / are implemented in C, I found the following snippet from an interesting answer.
// replaces the + operator
int add(int x, int y) {
while(x) {
int t = (x & y) <<1;
y ^= x;
x = t;
}
return y;
}
It seems that this function demonstrates how + actually works in the background. However, it's too confusing for me to understand it. I believed that such operations are done using assembly directives generated by the compiler for a long time!
Is the + operator implemented as the code posted on MOST implementations? Does this take advantage of two's complement or other implementation-dependent features?

To be pedantic, the C specification does not specify how addition is implemented.
But to be realistic, the + operator on integer types smaller than or equal to the word size of your CPU get translated directly into an addition instruction for the CPU, and larger integer types get translated into multiple addition instructions with some extra bits to handle overflow.
The CPU internally uses logic circuits to implement the addition, and does not use loops, bitshifts, or anything that has a close resemblance to how C works.

When you add two bits, following is the result: (truth table)
a | b | sum (a^b) | carry bit (a&b) (goes to next)
--+---+-----------+--------------------------------
0 | 0 | 0 | 0
0 | 1 | 1 | 0
1 | 0 | 1 | 0
1 | 1 | 0 | 1
So if you do bitwise xor, you can get the sum without carry.
And if you do bitwise and you can get the carry bits.
Extending this observation for multibit numbers a and b
a+b = sum_without_carry(a, b) + carry_bits(a, b) shifted by 1 bit left
= a^b + ((a&b) << 1)
Once b is 0:
a+0 = a
So algorithm boils down to:
Add(a, b)
if b == 0
return a;
else
carry_bits = a & b;
sum_bits = a ^ b;
return Add(sum_bits, carry_bits << 1);
If you get rid of recursion and convert it to a loop
Add(a, b)
while(b != 0) {
carry_bits = a & b;
sum_bits = a ^ b;
a = sum_bits;
b = carrry_bits << 1; // In next loop, add carry bits to a
}
return a;
With above algorithm in mind explanation from code should be simpler:
int t = (x & y) << 1;
Carry bits. Carry bit is 1 if 1 bit to the right in both operands is 1.
y ^= x; // x is used now
Addition without carry (Carry bits ignored)
x = t;
Reuse x to set it to carry
while(x)
Repeat while there are more carry bits
A recursive implementation (easier to understand) would be:
int add(int x, int y) {
return (y == 0) ? x : add(x ^ y, (x&y) << 1);
}
Seems that this function demonstrates how + actually works in the
background
No. Usually (almost always) integer addition translates to machine instruction add. This just demonstrate an alternate implementation using bitwise xor and and.

Seems that this function demonstrates how + actually works in the background
No. This is translated to the native add machine instruction, which is actually using the hardware adder, in the ALU.
If you're wondering how does the computer add, here is a basic adder.
Everything in the computer is done using logic gates, which are mostly made of transistors. The full adder has half-adders in it.
For a basic tutorial on logic gates, and adders, see this. The video is extremely helpful, though long.
In that video, a basic half-adder is shown. If you want a brief description, this is it:
The half adder add's two bits given. The possible combinations are:
Add 0 and 0 = 0
Add 1 and 0 = 1
Add 1 and 1 = 10 (binary)
So now how does the half adder work? Well, it is made up of three logic gates, the and, xor and the nand. The nand gives a positive current if both the inputs are negative, so that means this solves the case of 0 and 0. The xor gives a positive output one of the input is positive, and the other negative, so that means that it solves the problem of 1 and 0. The and gives a positive output only if both the inputs are positive, so that solves the problem of 1 and 1. So basically, we have now got our half-adder. But we still can only add bits.
Now we make our full-adder. A full adder consists of calling the half-adder again and again. Now this has a carry. When we add 1 and 1, we get a carry 1. So what the full-adder does is, it takes the carry from the half-adder, stores it, and passes it as another argument to the half-adder.
If you're confused how can you pass the carry, you basically first add the bits using the half-adder, and then add the sum and the carry. So now you've added the carry, with the two bits. So you do this again and again, till the bits you have to add are over, and then you get your result.
Surprised? This is how it actually happens. It looks like a long process, but the computer does it in fractions of a nanosecond, or to be more specific, in half a clock cycle. Sometimes it is performed even in a single clock cycle. Basically, the computer has the ALU (a major part of the CPU), memory, buses, etc..
If you want to learn computer hardware, from logic gates, memory and the ALU, and simulate a computer, you can see this course, from which I learnt all this: Build a Modern Computer from First Principles
It's free if you do not want an e-certificate. The part two of the course is coming up in spring this year

C uses an abstract machine to describe what C code does. So how it works is not specified. There are C "compilers" that actually compile C into a scripting language, for example.
But, in most C implementations, + between two integers smaller than the machine integer size will be translated into an assembly instruction (after many steps). The assembly instruction will be translated into machine code and embedded within your executable. Assembly is a language "one step removed" from machine code, intended to be easier to read than a bunch of packed binary.
That machine code (after many steps) is then interpreted by the target hardware platform, where it is interpreted by the instruction decoder on the CPU. This instruction decoder takes the instruction, and translates it into signals to send along "control lines". These signals route data from registers and memory through the CPU, where the values are added together often in an arithmetic logic unit.
The arithmetic logic unit might have separate adders and multipliers, or might mix them together.
The arithmetic logic unit has a bunch of transistors that perform the addition operation, then produce the output. Said output is routed via the signals generated from the instruction decoder, and stored in memory or registers.
The layout of said transistors in both the arithmetic logic unit and instruction decoder (as well as parts I have glossed over) is etched into the chip at the plant. The etching pattern is often produced by compiling a hardware description language, which takes an abstraction of what is connected to what and how they operate and generates transistors and interconnect lines.
The hardware description language can contain shifts and loops that don't describe things happening in time (like one after another) but rather in space -- it describes the connections between different parts of hardware. Said code may look very vaguely like the code you posted above.
The above glosses over many parts and layers and contains inaccuracies. This is both from my own incompetence (I have written both hardware and compilers, but am an expert in neither) and because full details would take a career or two, and not a SO post.
Here is a SO post about an 8-bit adder. Here is a non-SO post, where you'll note some of the adders just use operator+ in the HDL! (The HDL itself understands + and generates the lower level adder code for you).

Almost any modern processor that can run compiled C code will have builtin support for integer addition. The code you posted is a clever way to perform integer addition without executing an integer add opcode, but it is not how integer addition is normally performed. In fact, the function linkage probably uses some form of integer addition to adjust the stack pointer.
The code you posted relies on the observation that when adding x and y, you can decompose it into the bits they have in common and the bits that are unique to one of x or y.
The expression x & y (bitwise AND) gives the bits common to x and y. The expression x ^ y (bitwise exclusive OR) gives the bits that are unique to one of x or y.
The sum x + y can be rewritten as the sum of two times the bits they have in common (since both x and y contribute those bits) plus the bits that are unique to x or y.
(x & y) << 1 is twice the bits they have in common (the left shift by 1 effectively multiplies by two).
x ^ y is the bits that are unique to one of x or y.
So if we replace x by the first value and y by the second, the sum should be unchanged. You can think of the first value as the carries of the bitwise additions, and the second as the low-order bit of the bitwise additions.
This process continues until x is zero, at which point y holds the sum.

The code that you found tries to explain how very primitive computer hardware might implement an "add" instruction. I say "might" because I can guarantee that this method isn't used by any CPU, and I'll explain why.
In normal life, you use decimal numbers and you have learned how to add them: To add two numbers, you add the lowest two digits. If the result is less than 10, you write down the result and proceed to the next digit position. If the result is 10 or more, you write down the result minus 10, proceed to the next digit, buy you remember to add 1 more. For example: 23 + 37, you add 3+7 = 10, you write down 0 and remember to add 1 more for the next position. At the 10s position, you add (2+3) + 1 = 6 and write that down. Result is 60.
You can do the exact same thing with binary numbers. The difference is that the only digits are 0 and 1, so the only possible sums are 0, 1, 2. For a 32 bit number, you would handle one digit position after the other. And that is how really primitive computer hardware would do it.
This code works differently. You know the sum of two binary digits is 2 if both digits are 1. So if both digits are 1 then you would add 1 more at the next binary position and write down 0. That's what the calculation of t does: It finds all places where both binary digits are 1 (that's the &) and moves them to the next digit position (<< 1). Then it does the addition: 0+0 = 0, 0+1 = 1, 1+0 = 1, 1+1 is 2, but we write down 0. That's what the excludive or operator does.
But all the 1's that you had to handle in the next digit position haven't been handled. They still need to be added. That's why the code does a loop: In the next iteration, all the extra 1's are added.
Why does no processor do it that way? Because it's a loop, and processors don't like loops, and it is slow. It's slow, because in the worst case, 32 iterations are needed: If you add 1 to the number 0xffffffff (32 1-bits), then the first iteration clears bit 0 of y and sets x to 2. The second iteration clears bit 1 of y and sets x to 4. And so on. It takes 32 iterations to get the result. However, each iteration has to process all bits of x and y, which takes a lot of hardware.
A primitive processor would do things just as quick in the way you do decimal arithmetic, from the lowest position to the highest. It also takes 32 steps, but each step processes only two bits plus one value from the previous bit position, so it is much easier to implement. And even in a primitive computer, one can afford to do this without having to implement loops.
A modern, fast and complex CPU will use a "conditional sum adder". Especially if the number of bits is high, for example a 64 bit adder, it saves a lot of time.
A 64 bit adder consists of two parts: First, a 32 bit adder for the lowest 32 bit. That 32 bit adder produces a sum, and a "carry" (an indicator that a 1 must be added to the next bit position). Second, two 32 bit adders for the higher 32 bits: One adds x + y, the other adds x + y + 1. All three adders work in parallel. Then when the first adder has produced its carry, the CPU just picks which one of the two results x + y or x + y + 1 is the correct one, and you have the complete result. So a 64 bit adder only takes a tiny bit longer than a 32 bit adder, not twice as long.
The 32 bit adder parts are again implemented as conditional sum adders, using multiple 16 bit adders, and the 16 bit adders are conditional sum adders, and so on.

My question is: Is the + operator implemented as the code posted on MOST implementations?
Let's answer the actual question. All operators are implemented by the compiler as some internal data structure that eventually gets translated into code after some transformations. You can't say what code will be generated by a single addition because almost no real world compiler generates code for individual statements.
The compiler is free to generate any code as long as it behaves as if the actual operations were performed according to the standard. But what actually happens can be something completely different.
A simple example:
static int
foo(int a, int b)
{
return a + b;
}
[...]
int a = foo(1, 17);
int b = foo(x, x);
some_other_function(a, b);
There's no need to generate any addition instructions here. It's perfectly legal for the compiler to translate this into:
some_other_function(18, x * 2);
Or maybe the compiler notices that you call the function foo a few times in a row and that it is a simple arithmetic and it will generate vector instructions for it. Or that the result of the addition is used for array indexing later and the lea instruction will be used.
You simply can't talk about how an operator is implemented because it is almost never used alone.

In case a breakdown of the code helps anyone else, take the example x=2, y=6:
x isn't zero, so commence adding to y:
while(2) {
x & y = 2 because
x: 0 0 1 0 //2
y: 0 1 1 0 //6
x&y: 0 0 1 0 //2
2 <<1 = 4 because << 1 shifts all bits to the left:
x&y: 0 0 1 0 //2
(x&y) <<1: 0 1 0 0 //4
In summary, stash that result, 4, in t with
int t = (x & y) <<1;
Now apply the bitwise XOR y^=x:
x: 0 0 1 0 //2
y: 0 1 1 0 //6
y^=x: 0 1 0 0 //4
So x=2, y=4. Finally, sum t+y by resetting x=t and going back to the beginning of the while loop:
x = t;
When t=0 (or, at the beginning of the loop, x=0), finish with
return y;

Just out of interest, on the Atmega328P processor, with the avr-g++ compiler, the following code implements adding one by subtracting -1 :
volatile char x;
int main ()
{
x = x + 1;
}
Generated code:
00000090 <main>:
volatile char x;
int main ()
{
x = x + 1;
90: 80 91 00 01 lds r24, 0x0100
94: 8f 5f subi r24, 0xFF ; 255
96: 80 93 00 01 sts 0x0100, r24
}
9a: 80 e0 ldi r24, 0x00 ; 0
9c: 90 e0 ldi r25, 0x00 ; 0
9e: 08 95 ret
Notice in particular that the add is done by the subi instruction (subtract constant from register) where 0xFF is effectively -1 in this case.
Also of interest is that this particular processor does not have a addi instruction, which implies that the designers thought that doing a subtract of the complement would be adequately handled by the compiler-writers.
Does this take advantage of two's complement or other implementation-dependent features?
It would probably be fair to say that compiler-writers would attempt to implement the wanted effect (adding one number to another) in the most efficient way possible for that particularly architecture. If that requires subtracting the complement, so be it.

How to use bit manipulation to find 5th bit, and to return number of 1 bits in an integer

Assume Z is an unsigned integer. Using ~, <<, >>, &, | , +, and - provide statements which return the desired result.
I am allowed to introduce new binary values if needed.
I have these problems:
1.Extract the 5th bit from the left Z.
For this I was thinking about doing something like
x x x x x x x x
& 0 0 0 0 1 0 0 0
___________________
0 0 0 0 1 0 0 0
Does this make sense for extracting the fifth bit? I am not totally sure how I would make this work by using just Z when I do not know its values. (I am relatively new to all of this). Would this type of idea work though?
2.Return the number of 1 bits in Z
Here I kind of have no idea how to work this out. What I really need to know is how to work on just Z with the operators, but I m not sure exactly how to.
Like I said I am new to this, so any help is appreciated.

Problem 1
You’re right on the money. I’d do an & and a >> so that you get either a nice 0 or 1.
result = (z & 0x08) >> 3;
However, this may not be strictly necessary. For example, if you’re trying to check whether the bit is set as part of an if conditional, you can exploit C’s definition of anything nonzero as true.
if (z & 0x08)
do_stuff();
Problem 2
There are a whole variety of ways to do this. According to that page, the following methodology dates from 1960, though it wasn’t published in C until 1988.
for (result = 0; z; result++)
z &= z - 1;
Exactly why this works might not be obvious at first, but if you work through a few examples, you’ll quickly see why it does.
It’s worth noting that this operation – determining the number of 1 bits in a number – is sufficiently important to have a name (population count or Hamming weight) and, on recent Intel and AMD processors, a dedicated instruction. If you’re using GCC, you can use the __builtin_popcount intrinsic.

Problem 1 looks right, except you should finish it by shifting the result right by 4 to get that bit after the mask.
To implement the mask, you need to know what integer is represented by a single 5th bit. That number is incidentally 2^5 = 32. So you can just AND z with 32 and shift it right by 4.
Problem 2:
int answer = 0;
while (z != 0){ //stop when there are no more 1 bits in z
//the following masks the lowest bit in z and adds it into answer
//if z ends with a 0, nothing is added, otherwise 1 is added
answer += (z & 1);
//this shifts z right by 1 to get the next higher bit
z >>= 1;
}
return answer;

To find out the value of the fifth bit, you don't care about the bottom bits so you can get rid of them:
unsigned int answer = z >> 4;
The fifth bit becomes the bottom bit, so you can strip it off with a bitwise-AND:
answer = answer & 1;
To find the number of 1-bits in a number you can apply stakSmashr's solution. You could optimise this further if you know you need to count the number of bits in a lot of integers - precompute the number of bits in every possible 8-bit number and store it in a table. There will only be 256 entries in the table so it won't use much memory. Then, you can loop over your data one byte at a time and find the answer from the table. This lookup will be quicker than looping again over each bit.

What is '^' operator used in C other than to check if two numbers are equal?

What are the purposes of ^ operator used in C other than to check if two numbers are equal? Also, why is it used for equality in stead of == in the first place?

The ^ operator is the bitwise XOR operator. Although I have never seen it's use for checking equaltity.
x ^ y will evaluate to 0 exatly when x == y.

The XOR operator is used in cryptography (en- and decrypting text using a pseudo-random bit stream), random number generators (like the Mersenne Twister) and in inline-swap and other bit twiddling hacks:
int a = ...;
int b = ...;
// swap a and b
a ^= b;
b ^= a;
a ^= b;
(useful if you don't have space for another variable like on CPUs with few registers).

^ is the Bitwise XOR.
A bitwise operation operates on one or more bit patterns or binary numerals at the level of their individual bits. It is a fast, primitive action directly supported by the processor, and is used to manipulate values for comparisons and calculations. (source: Bitwise Operation)
The XOR Operator has two operands and it returns 1 if only one of the operands is set to 1.
So a Bitwise XOR Operation of two numbers is the resulting of these bit by bit operations.
For exemple:
00000110 // A = 6
00001010 // B = 10
00001100 // A ^ B = 12

^ is a bit-wise XOR operator in C. It can be used in bits toggling and to swap two numbers;
x^=y, y^=x, x^=y;
and can be used to find max of two numbers;
int max(int x, int y)
{
return x ^ ((x ^ y) & -(x < y));
}

It can be used to selectively flip bits. (e.g. to toggle the value of bit #3 in an integer, you can say x = x ^ (1<<3) or, more compactly, x = x^0x08 or even x^=8. (although now that I look at it, the last form looks like some sort of obscene emoticon and should probably be avoided. :)
It should never be used in a test for equality (in C), except in tricky code meant to test undergrads' understanding of the ^ operator. (In assembly, there may be speed advantages on some architectures.)

It it's the exclusive or operator. It will do bitwise exclusive or on the two arguments. If the numbers are equal, this will result in 0, while if they're not equal, the bits that differed between the two arguments will be set.
You generally wouldn't use it inserted of ==, you would use it only when you need to know which bits are different.

Two real usage examples from an embedded system I worked on:
In a status message generating function, where one of the words was supposed to be a passthrough of an external device's status word. There was an disconnect between the device behavior and the message spec - one thought bit0 meant 'error' while the other thought it meant 'OK'.
statuswords[3] = devicestatus ^ 1; //invert B0
The 16-bit target processor was terribly slow to branch, so in an inner loop if (sign(A)!=sign(B) B=0; was coded as:
B*=~(A^B)>>15;
which took 4 cycles rather than 8, and does the same thing: sets B to 0 iff the sign bits are different.

in many general cases we might use '^' as a replacement for'==' but that doesn't exactly give the result for being equal or not.Instead - it checks the given variables bit by bit and sets a result for each bit individually and finally displays a result summed up with the resulting bits as a bulk.

Question about round_up macro

#define ROUND_UP(N, S) ((((N) + (S) - 1) / (S)) * (S))
With the above macro, could someone please help me on understanding the "(s)-1" part, why's that?
and also macros like:
#define PAGE_ROUND_DOWN(x) (((ULONG_PTR)(x)) & (~(PAGE_SIZE-1)))
#define PAGE_ROUND_UP(x) ( (((ULONG_PTR)(x)) + PAGE_SIZE-1) & (~(PAGE_SIZE-1)) )
I know the "(~(PAGE_SIZE-1)))" part will zero out the last five bits, but other than that I'm clueless, especially the role '&' operator plays.
Thanks,

The ROUND_UP macro is relying on integer division to get the job done. It will only work if both parameters are integers. I'm assuming that N is the number to be rounded and S is the interval on which it should be rounded. That is, ROUND_UP(12, 5) should return 15, since 15 is the first interval of 5 larger than 12.
Imagine we were rounding down instead of up. In that case, the macro would simply be:
#define ROUND_DOWN(N,S) ((N / S) * S)
ROUND_DOWN(12,5) would return 10, because (12/5) in integer division is 2, and 2*5 is 10. But we're not doing ROUND_DOWN, we're doing ROUND_UP. So before we do the integer division, we want to add as much as we can without losing accuracy. If we added S, it would work in almost every case; ROUND_UP(11,5) would become (((11+5) / 5) * 5), and since 16/5 in integer division is 3, we'd get 15.
The problem comes when we pass a number that's already rounded to the multiple specified. ROUND_UP(10, 5) would return 15, and that's wrong. So instead of adding S, we add S-1. This guarantees that we'll never push something up to the next "bucket" unnecessarily.
The PAGE_ macros have to do with binary math. We'll pretend we're dealing with 8-bit values for simplicity's sake. Let's assume that PAGE_SIZE is 0b00100000. PAGE_SIZE-1 is thus 0b00011111. ~(PAGE_SIZE-1) is then 0b11100000.
A binary & will line up two binary numbers and leave a 1 anywhere that both numbers had a 1. Thus, if x was 0b01100111, the operation would go like this:
0b01100111 (x)
& 0b11100000 (~(PAGE_SIZE-1))
------------
0b01100000
You'll note that the operation really only zeroed-out the last 5 bits. That's all. But that was exactly that operation needed to round down to the nearest interval of PAGE_SIZE. Note that this only worked because PAGE_SIZE was exactly a power of 2. It's a bit like saying that for any arbitrary decimal number, you can round down to the nearest 100 simply by zeroing-out the last two digits. It works perfectly, and is really easy to do, but wouldn't work at all if you were trying to round to the nearest multiple of 76.
PAGE_ROUND_UP does the same thing, but it adds as much as it can to the page before cutting it off. It's kinda like how I can round up to the nearest multiple of 100 by adding 99 to any number and then zeroing-out the last two digits. (We add PAGE_SIZE-1 for the same reason we added S-1 above.)
Good luck with your virtual memory!

Using integer arithmetic, dividing always rounds down. To fix that, you add the largest possible number that won't affect the result if the original number was evenly divisible. For the number S, that largest possible number is S-1.
Rounding to a power of 2 is special, because you can do it with bit operations. A multiple of 2 will aways have a zero in the bottom bit, a multiple of 4 will always have zero in the bottom two bits, etc. The binary representation of a power of 2 is a single bit followed by a bunch of zeros; subtracting 1 will clear that bit, and set all the bits to the right. Inverting that value creates a bit mask with zeros in the places that need to be cleared. The & operator will clear those bits in your value, thus rounding the value down. The same trick of adding (PAGE_SIZE-1) to the original value causes it to round up instead of down.

The page rounding macros assume that `PAGE_SIZE is a power of two, such as:
0x0400 -- 1 KiB
0x0800 -- 2 KiB`
0x1000 -- 4 KiB
The value of PAGE_SIZE - 1, therefore, is all one bits:
0x03FF
0x07FF
0x0FFF
Therefore, if integers were 16 bits (instead of 32 or 64 - it saves me some typing), then the value of ~(PAGE_SIZE-1) is:
0xFC00
0xFE00
0xF000
When you take the value of x (assuming, implausibly for real life, but sufficient for the purposes of exposition, that ULONG_PTR is an unsigned 16-bit integer) is 0xBFAB, then
PAGE_SIZE PAGE_ROUND_DN(0xBFAB) PAGE_ROUND_UP(0xBFAB)
0x0400 --> 0xBC00 0xC000
0x0800 --> 0xB800 0xC000
0x1000 --> 0xB000 0xC000
The macros round down and up to the nearest multiple of a page size. The last five bits would only be zeroed out if PAGE_SIZE == 0x20 (or 32).

Based on the current draft standard (C99) this macro is not entirely correct however, note that for negative values of N the result will almost certainly be incorrect.
The formula:
#define ROUND_UP(N, S) ((((N) + (S) - 1) / (S)) * (S))
Makes use of the fact that integer division rounds down for non-negative integers and uses the S - 1 part to force it to round up instead.
However, integer division rounds towards zero (C99, Section 6.5.5. Multiplicative operators, item 6). For negative N, the correct way to 'round up' is: 'N / S', nothing more, nothing less.
It gets even more involved if S is also allowed to be a negative value, but let's not even go there... (see: How can I ensure that a division of integers is always rounded up? for a more detailed discussion of various wrong and one or two right solutions)

The & makes it so.. well ok, lets take some binary numbers.
(with 1000 being page size)
PAGE_ROUND_UP(01101b)=
01101b+1000b-1b & ~(1000b-1b) =
01101b+111b & ~(111b) =
01101b+111b & ...11000b = (the ... means 1's continuing for size of ULONG)
10100b & 11000b=
10000b
So, as you can see(hopefully) This rounds up by adding PAGE_SIZE to x and then ANDing so it cancels out the bottom bits of PAGE_SIZE that are not set

This is what I use:
#define SIGN(x) ((x)<0?-1:1)
#define ROUND(num, place) ((int)(((float)(num) / (float)(place)) + (SIGN(num)*0.5)) * (place))
float A=456.456789
B=ROUND(A, 50.0f) // 450.0
C=ROUND(A, 0.001) // 456.457

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight