How machine code distinguishes between signed and unsigned values (C)

I was reading a textbook that says:
It is important to note how machine code distinguishes between signed
and unsigned values. Unlike in C, it does not associate a data type
with each program value. Instead, it mostly uses the same
(assembly) instructions for the two cases, because many arithmetic
operations have the same bit-level behavior for unsigned and
two’s-complement arithmetic.
I don't understand what it means; could anyone give me an example?

For example, this code:
int main() {
int i = -1;
if(i < 9)
i++;
unsigned u = -1; // Wraps around to UINT_MAX value
if(u < 9)
u++;
}
gives the following output with GCC on x86-64 (no optimization, Intel syntax):
main:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], -1 ; i = -1
cmp DWORD PTR [rbp-4], 8 ; i comparison
jg .L2 ; i comparison
add DWORD PTR [rbp-4], 1 ; i addition
.L2:
mov DWORD PTR [rbp-8], -1 ; u = -1
cmp DWORD PTR [rbp-8], 8 ; u comparison
ja .L3 ; u comparison
add DWORD PTR [rbp-8], 1 ; u addition
.L3:
mov eax, 0
pop rbp
ret
Notice how it uses the same instructions for initialization (mov) and increment (add) for variables i and u. This is because the bit pattern changes identically for unsigned and two's complement.
Comparison also uses the same instruction, cmp, but the jump decision has to be different, because values with the highest bit set mean different things for the two types: jg (jump if greater) for signed, and ja (jump if above) for unsigned.
Which instructions are chosen depends on the architecture and the compiler.

On Intel processors (x86 family) and others that have FLAGS, you get bits in those FLAGS that tell you how the last operation went. The names of the flags vary a little between processors, but in general there are two important ones with regard to arithmetic: CF and OF.
CF is the Carry bit (often called C on other processors).
OF is the Overflow bit (often called V on other processors).
More or less, CF represents an unsigned overflow and OF represents a signed overflow. When the processor does the ADD operation, it has one extra bit, which is CF. So if you add two 64-bit numbers, the result without wrapping may need 65 bits; that 65th bit is the carry. The OF flag is computed from the highest bit (bit 63 in a 64-bit number) of the two sources and the destination, using a few logical operations: it is set when both sources have the same sign bit and the destination's sign bit differs from it.
There is an example of how CF works with 4 bit registers:
R1 = 1010
R2 = 1101
R3 = R1 + R2 = 1 0111   (the leading 1 is the carry, CF)
The extra 1 doesn't fit in R3, so it gets put in the CF bit instead. As a side note, the MIPS processor does not have any FLAGS. It's up to you to determine whether a carry is generated (for an unsigned add, the result is smaller than a source exactly when there was a carry; for signed overflow, you can use XOR and such on the sign bits of the two sources and the destination).
However, in C (and C++), there is no verification of overflow on your integer types (at least not by default). In other words, the CF and OF flags are ignored for all your operations except the four comparison operators (<, <=, >, >=).
As shown in the example presented by user694733, the difference is whether jg or ja is used. Each of the 16 conditional jump instructions tests a different combination of flags to decide whether to jump, and that combination is really what makes the difference.
Another interesting aspect is the difference between ADC and ADD: one adds with the carry and the other doesn't. It's probably not used as much now that we have 64-bit computers, but to add two 64-bit numbers on a 32-bit processor, you add the lower 32 bits as unsigned 32-bit numbers and then add the upper 32 bits (signed or unsigned as may be the case) plus the carry from the first operation.
Say you have two 64 bit numbers in 32 bit registers (ECX:EAX and EDX:EBX), you would add them like this:
ADD EAX, EBX
ADC ECX, EDX
Here EDX and the carry are added to ECX. The carry is 1 if EAX + EBX had an unsigned overflow, meaning the true sum needs 33 bits because it doesn't fit in 32 bits; the CF flag is that 33rd bit.
To be noted, the Intel processors also have:
A Zero bit: ZF (whether the result is zero or not),
CF is called "Borrow" when subtracting (for SBC, SBB), and
Also the AF bit, which is used for "decimal number operations" (which no one in their right mind uses). AF tells you there was a carry or borrow out of the low four bits, which the decimal adjust instructions rely on. I never used that one; I find their use too complicated/cumbersome. Also, the bit is still there in amd64, but the instructions that use it are not available in 64-bit mode (see DAA for example).

The beauty of two's complement is that for addition (and, as a result, subtraction, since that uses an adder; again part of the beauty of two's complement) the add operation itself does not care about signed vs unsigned. The same bit patterns added together produce the same result: 0xFE + 0x01 = 0xFF, which is -2 + 1 = -1 read as signed and 254 + 1 = 255 read as unsigned. Same input bits, same result pattern.
Two's complement only helps for some operations, not all: add and subtract, but not necessarily multiply and divide. Bitwise operations, of course: bits is bits. (Right) shifts want a signed/unsigned difference, but does C deliver?
The comparisons are very sensitive. Equal and not equal, zero and not zero: those are single flag tests and will work. But unsigned less than and signed less than do not use/test the same set of flags. The less than and greater than comparisons, with or without equal, do not work the same way for unsigned vs signed. Likewise, signed overflow and unsigned overflow (often just called the carry bit) are computed differently from each other. And on some instruction sets the carry bit is inverted when the operation is a subtract, but not on all, so for comparisons you need to know whether it is a borrow bit on subtract or always just the unmodified carry out.
Multiplication, and likely division, are "it depends". An N-bit times N-bit equals N-bit result works for both signed and unsigned, but N-bit times N-bit equals 2N-bit (the only really useful hardware multiply) requires a signed and an unsigned version for the hardware/instruction to do all the work; otherwise you have to break the operands up into parts if you don't have both flavors. Simple paper-and-pencil, grade-school multiplication will show why; I leave that to the reader to figure out.
You don't need us at all; you can easily write your own example and see from the compiler output when there is a difference and when there isn't.
#include <stdint.h>
int32_t fun0 ( int32_t a, int32_t b ) { return a+b; }
int32_t fun1 ( int32_t a, int32_t b ) { return a*b; }
int32_t fun2 ( int32_t a, int32_t b ) { return a^b; }
uint32_t fun3 ( uint32_t a, uint32_t b ) { return a+b; }
uint32_t fun4 ( uint32_t a, uint32_t b ) { return a*b; }
uint32_t fun5 ( uint32_t a, uint32_t b ) { return a^b; }
uint32_t fun6 ( uint64_t a, uint64_t b ) { return a+b; }
uint32_t fun7 ( uint64_t a, uint64_t b ) { return a*b; }
uint32_t fun8 ( uint64_t a, uint64_t b ) { return a^b; }
uint64_t fun9 ( uint64_t a, uint64_t b ) { return a*b; }
int64_t fun10 ( int64_t a, int64_t b ) { return a*b; }
uint64_t fun11 ( uint32_t a, uint32_t b ) { return a*b; }
int64_t fun12 ( int32_t a, int32_t b ) { return a*b; }
int32_t comp0 ( int32_t a, int32_t b ) { return a<b; }
uint32_t comp1 ( uint32_t a, uint32_t b ) { return a<b; }
plus other operators and combinations.
EDIT
Okay the real answer...rather than making you do the work.
I want to add -2 and +1
11111110
+ 00000001
============
finish it
00000000
11111110
+ 00000001
============
11111111
-2 + 1 = -1
What about 254 + 1, the same bits read as unsigned?
00000000
11111110
+ 00000001
============
11111111
hmmm... same bits in, same bits out (-2 + 1 = -1, or 254 + 1 = 255), but how I interpret those bits as a programmer varies widely.
You can try as many legal values as you want (ones that don't overflow the result) and you will see that the addition result neither knows nor cares about signed vs unsigned operands. Part of the beauty of two's complement.
Subtraction is just addition in logic. Some may have learned "invert and add one": want to know what the bit pattern 11111111 is? Invert it to get 00000000 and add 1 to get 00000001, so 11111111 is -1. But how does addition really work with two operands? As shown above, each bit position really needs a three-input adder: three bits in (the carry in and the two operand bits) and two bits out (the result bit and the carry out). What if we go back to grade school as well...
-32 - 3 = (-32) + (-3); using invert and add one to make -3 out of 3, we get (-32) + (~3) + 1, with the + 1 going in as the carry in:
1
11100000
+ 11111100
==============
11011101 (carry out 1), which is -35
And that's how a computer does that math: it inverts the carry in and the second operand. SOME instruction sets also invert the carry out, because a 1 on the carry out when the adder is used as a subtractor means no borrow, while a 0 means a borrow happened. So some instruction sets will invert the carry out and some will not. This is hugely important for this topic.
Likewise, the carry out bit is computed from the addition of the msbits of the operands and the carry in to that position; it is the carry out of that addition.
abcxxxxxx
dxxxxxxx
+ exxxxxxx
============
f
a, the carry out, is the carry out of adding bits b+d+e. It is also known as the unsigned overflow flag when the operation is an addition and the operands are considered to be unsigned values. The signed overflow flag, however, is determined by whether b and a are equal or not equal.
In what situations does this happen?
bde af
000 00
001 01
010 01
011 10 <--
100 01 <--
101 10
110 10
111 11
So you can read from this that if the carry in is not equal to the carry out for the msbit, there is a signed overflow. Equivalently, if the msbits of the operands are equal and the msbit of the result is not equal to those operand bits, signed overflow is true. If you generate a table of signed numbers, their results, and which ones overflow, this will start to become clear. You don't have to do 8 bit by 8 bit (256 * 256 combinations); take 3- or 4-bit numbers, synthesize your own 3- or 4-bit addition routines, and that smaller number of combinations will be enough.
So while addition and subtraction themselves, as far as the result bits go, do not know signed from unsigned, the flags do, if you have a processor that uses them: the C (carry) flag has an unsigned use case and the V (overflow) flag a signed one. The carry flag itself can have one of two definitions when produced by a subtract, depending on the instruction set, and since comparisons are generally done with a subtraction, that carry definition matters to how the flags are then used.
Greater than and less than are determined using a subtract. The result itself is not affected by signedness, but how the flags are interpreted very much is.
Take some four bit positive numbers.
1101 - 1100 (13 - 12)
1100 - 1100 (12 - 12)
1011 - 1100 (11 - 12)
11111
1101
+ 0011
=======
0001
carry out 1, zero flag 0, v = 0, n = 0
11111
1100
+ 0011
========
0000
carry out 1, zero flag 1, v = 0, n = 0
00111
1011
+ 0011
========
1111
carry out 0, zero flag 0, v = 0, n = 1
(n is the msbit of the result, the sign bit 1 means signed negative number, zero means signed positive number)
cz
10 greater than but not equal
11 equal
00 less than but not equal
Same bit patterns, now read as signed:
1101 - 1100 (-3 - -4)
1100 - 1100 (-4 - -4)
1011 - 1100 (-5 - -4)
cz
10 greater than but not equal
11 equal
00 less than but not equal
So far nothing has changed.
But if I examine all the combinations:
#include <stdio.h>
int main ( void )
{
    unsigned int ra;
    unsigned int rb;
    unsigned int rc;
    unsigned int rx;
    unsigned int v;
    unsigned int n;
    int sa,sb;
    for(ra=0;ra<0x10;ra++)
        for(rb=0;rb<0x10;rb++)
        {
            for(rx=8;rx;rx>>=1) if(rx&ra) printf("1"); else printf("0");
            printf(" - ");
            for(rx=8;rx;rx>>=1) if(rx&rb) printf("1"); else printf("0");
            rc=ra-rb;
            printf(" = ");
            for(rx=8;rx;rx>>=1) if(rx&rc) printf("1"); else printf("0"); /* the result, not rb */
            printf(" c=%u",(rc>>4)&1); /* borrow out of bit 3 (x86-style, inverse of the adder carry out) */
            printf(" n=%u",(rc>>3)&1);
            n=(rc>>3)&1;
            if((rc&0xF)==0) printf(" z=1"); else printf(" z=0");
            /* signed overflow on subtract: the operands differ in sign and the
               result's sign differs from ra */
            v=0;
            if(((ra^rb)&8) && ((ra^rc)&8)) v=1;
            printf(" v=%u",v);
            printf(" (%2u - %2u)",ra,rb);
            sa=ra;
            if(sa&8) sa-=16; /* sign-extend the 4-bit value */
            sb=rb;
            if(sb&8) sb-=16;
            printf(" (%+2d - %+2d)",sa,sb);
            if(rc&0x10) printf(" C ");
            if(n==v) printf(" NV ");
            printf("\n");
        }
    return 0;
}
You can find fragments within the output that show the problem:
0000 - 0110 = 1010 c=1 n=1 z=0 v=0 ( 0 -  6) (+0 - +6) C
0000 - 0111 = 1001 c=1 n=1 z=0 v=0 ( 0 -  7) (+0 - +7) C
0000 - 1000 = 1000 c=1 n=1 z=0 v=1 ( 0 -  8) (+0 - -8) C  NV
0000 - 1001 = 0111 c=1 n=0 z=0 v=0 ( 0 -  9) (+0 - -7) C  NV
0000 - 1010 = 0110 c=1 n=0 z=0 v=0 ( 0 - 10) (+0 - -6) C  NV
0000 - 1011 = 0101 c=1 n=0 z=0 v=0 ( 0 - 11) (+0 - -5) C  NV
For unsigned, 0 is less than 6, 7, 8, 9, ... and c is set on every one of these rows. Note that c here is a borrow (the program computes it the way x86 does, the inverse of the adder's carry out used in the hand-worked tables above), so c=1 means unsigned less than. But for the same bit patterns read as signed, 0 is less than 6 and 7 but greater than -8, -7, -6, ...
What is not necessarily obvious until you stare at it a lot (or just cheat and look at ARM's documentation) is that for signed, N == V means signed greater than or equal and N != V means signed less than. You don't need to examine the carry out. In particular, the signed bit patterns 0000 and 1000 don't work with the carry the way other bit patterns do.
Hmm, I wrote this all up in other questions before. Anyway, multiply both does and doesn't care about unsigned and signed.
Using your calculator 0xF * 0xF = 0xE1. The biggest 4 bit number times the biggest 4 bit number gives an 8 bit number, we need twice as many bits to cover all the bit patterns.
1111
* 1111
=================
1111
1111
1111
+ 1111
=================
11100001
So we see that the resulting addition is at least 2n-1 bits, and if you end up with a carry out of that last bit, you end up with 2n bits.
But what is -1 * -1? It's equal to 1, right? What are we missing?
unsigned has implied zeros
00001111
* 1111
=================
00001111
00001111
00001111
+00001111
=================
00011100001
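but signed, the sign is extended, and on both operands, so the multiplier 1111 becomes 11111111 too and there are eight partial products (each shifted one more place left) instead of four:
11111111
* 11111111
=================
11111111
11111111
11111111
11111111
11111111
11111111
11111111
+11111111
=================
1111111000000001
The low 8 bits are 00000001: -1 * -1 = 1.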
so sign matters with multiply?
0xC * 0x3 is 0x24 unsigned (12 * 3 = 36) or 0xF4 signed (-4 * 3 = -12 in 8 bits), but the low four bits are 0x4 either way.
#include <stdio.h>
int main ( void )
{
    unsigned int ra;
    unsigned int rb;
    unsigned int rc;
    unsigned int rx;
    int sa;
    int sb;
    int sc;
    for(ra=0;ra<0x10;ra++)
        for(rb=0;rb<0x10;rb++)
        {
            sa=ra;
            if(ra&8) sa-=16; /* sign-extend the 4-bit value */
            sb=rb;
            if(rb&8) sb-=16;
            rc=ra*rb;
            sc=sa*sb;
            /* compare only the low four bits of the unsigned and signed products */
            if((rc&0xF)!=((unsigned)sc&0xF))
            {
                for(rx=8;rx;rx>>=1) if(rx&ra) printf("1"); else printf("0");
                printf(" ");
                for(rx=8;rx;rx>>=1) if(rx&rb) printf("1"); else printf("0");
                printf("\n");
            }
        }
    return 0;
}
And there is no output, as expected. Look at the bits abcd * 1111:
abcd
1111
===============
aaaaabcd
aaaaabcd
aaaaabcd
aaaaabcd
================
Four bits in on each operand; if I only care about the lower four bits out:
abcd
1111
===============
abcd
bcd
cd
d
================
How the operands are sign-extended does not matter as far as the lower four result bits are concerned.
Now, knowing that a significant portion of the possible combinations of an n-bit times n-bit equals n-bit multiply overflow, such a multiply doesn't help you much in any code you want to be useful:
int a,b,c;
c = a * b;
not very useful except for smaller numbers.
But the reality, as far as multiply goes, is that if the result is the same size as the operands, signed vs unsigned does not matter; if the result is the proper twice the size of the operands, you need separate signed and unsigned multiply instructions/operations. You can certainly cascade/synthesize the n x n = 2n multiply out of an n x n = n instruction, as you will see in some instruction sets.
Bitwise operations (xor, or, and) are bitwise; they don't/can't care about sign.
Shift left: start with abcd, shift one gives bcd0, shift two gives cd00, and so on. Not very interesting. Shift right, though, wants separate arithmetic and logical versions: with arithmetic, the msbit is duplicated as the shifted-in bit, and with logical, a zero shifts in. Arithmetic: abcd, aabc, aaab, aaaa. Logical: abcd, 0abc, 00ab, 000a, 0000.
But we don't have two kinds of shift right operator in C: >> on an unsigned type is a logical shift, and >> on a negative signed value is implementation-defined (usually arithmetic in practice). When doing addition and subtraction directly, bits is bits, the beauty of two's complement. When doing a comparison, which is a subtract, the flags used are different for signed vs unsigned for a number of the comparisons. Get the older ARM Architecture Reference Manual; I think they call it the ARMv5 one, even though it goes back to ARMv4 and up to ARMv6.
There is a section called "The condition field" with a table that very nicely shows, at least for the ARM flags, the flag combinations for the unsigned conditions and the signed conditions; the ones that don't care about signedness (equal, not equal, etc.) don't say anything about sign.
Understand/remember that some instruction sets not only invert the carry in bit and the second operand on a subtract but also invert the carry out bit, so the carry flag you see after a subtract is inverted (a borrow). In the tables I worked by hand above, where I tried to use the term carry out instead of carry flag, the carry flag would be inverted on those other instruction sets, and the unsigned greater than and less than table flips over.
Division is not as easy to show, you have to do long division, etc. I will leave that one to the reader.
Not all documentation is as good as the table I am referring to in ARM's docs. Other processor documentation may or may not make the unsigned vs signed distinction clear; it might just say jump if greater than, and you may have to figure out experimentally what that means. Now that you know all of this, you may have already figured out that you don't, for example, need a branch if unsigned greater than or equal: that just means branch if not unsigned less than, so you can
cmp r0,r1
or
cmp r1,r0
and just use branch if carry to cover the unsigned less than, unsigned less than or equal, unsigned greater than, unsigned greater than or equal cases. Although you might upset some programmers doing that because you were trying to save some bits in the instruction.
Having said ALL of that, the processor never distinguishes signed from unsigned. These are concepts that only mean something to the programmer; processors are very stupid. Bits is bits: the processor doesn't know if these bits are an address, a variable, a character in a string, or a floating point number (implemented with a soft float library in fixed point). These interpretations are only meaningful to the programmer, not the processor. The processor does not "distinguish between unsigned and signed in machine code"; the programmer has to properly place bits that are meaningful to the programmer and then select the right instructions and sequences of instructions to perform the task the programmer wants performed. Some 32-bit number in a register is only an address when those bits are used to address something with a load or store: during the one clock cycle when they are sampled to be delivered to an address bus they are an address; before and after that they are just bits. When you increment that pointer in your program, they are not an address, they are just bits you are adding some other bits to. You can certainly build a MIPS-like instruction set with no flags, only N-bit to N-bit multiplies, and only jump-if-two-registers-are-equal-or-not-equal instructions with no other greater than or less than type instructions, and still be able to make useful programs, just like instruction sets that go overboard with unsigned this flag and signed that flag, unsigned this instruction and signed that instruction.
A not so popular alternative, sometimes talked about in school (maybe there was a real instruction set, or many, that did this), is a non-two's-complement solution, which pretty much means sign and magnitude: a sign bit and an unsigned value, so +3 is 0011 and -3 is 1011 for a four-bit register, which burns one bit for the sign when doing signed math. Then, as with two's complement, you have to sit down with pencil and paper, work through the math operations grade-school style, and implement those in logic. Does this result in a separate unsigned and signed add? With two's complement, 4-bit registers hold 0 to 15 unsigned and -8 to +7 signed; with sign magnitude we can declare unsigned is 0 to 15 but signed is -7 to +7. An exercise for the reader; the question/quote had to do with two's complement.

Check out two's complement and its arithmetic operations; it is how signed numbers are represented in binary.
Two's complement is the most common method of representing signed
integers on computers. In this scheme, if the binary number 010₂
encodes the signed integer 2₁₀, then its two's complement, 110₂,
encodes the inverse: -2₁₀. In other words, to reverse the sign of any
integer in this scheme, you can take the two's complement of its
binary representation.
That way it is possible to have arithmetic operations between positive and negative binary values.
Two's Complement Python code snippet:
def twos_complement(input_value, num_bits):
    '''Calculates a two's complement integer from the given input value's bits'''
    mask = 2**(num_bits - 1)
    return -(input_value & mask) + (input_value & ~mask)

Related

CarryFrom operation on arm processors

The ARM Architecture manual says that the ADC instruction sets the C (carry) flag in the CPSR if the S flag is set and a carry "occurred". From the book (page 155):
C Flag = CarryFrom(Rn + shifter_operand + C Flag)
And according to the glossary, CarryFrom is defined as follows:
CarryFrom
Returns 1 if the addition specified as its parameter caused a carry (true result is bigger
than 2^(32)−1, where the operands are treated as unsigned integers), and returns 0 in all other cases.
This delivers further information about an addition which occurred earlier in the pseudo-code. The addition is not repeated.
Now I'm wondering whether the CarryFrom operation is the same as an overflow check. Can anyone explain to me how it works, or how I can "emulate" the CarryFrom operation?
Simple binary addition: x is the carry in to the operation and y is the carry out. For a normal add the carry in is 0, and for a normal subtract the carry in is 1 (adders are used to do subtraction, one of the features of two's complement).
11110   <- carries: y (carry out) on the left, x (carry in) on the right
1111
+ 0001
======
0000
So the result is 0000, and the carry out is 1. Some architectures (out of all of them: x86, ARM, MIPS, PDP-11, 6502, ...; yep, I know about MIPS in this context) invert the carry out for a subtract and leave it uninverted for addition. In this case you are asking about ADC, which is an addition, so the carry out should not be modified by any architecture.
And 4 bits or 40 does not matter; it all works the same.
So say you want to add 0x0F and 0x01 but you only have a 4-bit adder (again, think 64 and 32 bits instead of 8 and 4; it all works the same).
We start with normal addition of the lower bits
11110
1111
+ 0001
======
0000
Then we do an add with carry and use the carry out of the prior addition as the carry in to the second (or next one since you can do this for as much code/memory space as you have)
00001   <- carries: the carry in (rightmost) is the carry out of the first add
0000
+ 0000
======
0001
And the end result is 0x10. 0x0F + 0x01 = 0x10
The first add here happens to have an unsigned overflow, as indicated by a non-zero carry out/carry flag, if you look at that add on its own. If the ADC also had an unsigned overflow, then the whole result is bad, as it won't fit in the number of bits (that is if the programmer considered these to be unsigned values; if signed, you look at the V bit for overflow, but the carry still cascades from the first ADD to the ADC, and then from each ADC to the next, until you have covered the width of the higher-level operation).

Why is my right shift operator functioning logically rather than arithmetically in c?

I’m currently playing around with bit manipulation in C, and I’m noticing that the right shift operator is not behaving as I anticipated. From what I understand, left shifts leave 0s behind as they shift from the LSB, and right shifts leave 1s as they shift from the MSB.
So I tried making a simple bit mask that looks like this:
110000
By creating an int 32>>1.
When I type in 32>>1, I don’t get 48 as expected, I get 16. Why?
The arithmetic right shift only shifts in ones if the high bit is currently 1.
Consider this program:
#include <stdio.h>
int main(void) {
    int a, b;
    a = 32;
    b = a>>1;
    printf("a is %d 0x%08x; b is %d 0x%08x\n",a,(unsigned int)a,
           b,(unsigned int)b);
    a = -10;
    b = a>>1; /* implementation-defined for negative a; usually arithmetic */
    printf("a is %d 0x%08x; b is %d 0x%08x\n",a,(unsigned int)a,
           b,(unsigned int)b);
    return 0;
}
If you run this, you will see:
a is 32 0x00000020; b is 16 0x00000010
a is -10 0xfffffff6; b is -5 0xfffffffb
The most significant bit (MSB) that an arithmetic right-shift replicates is the high bit in the type being shifted, not the high bit in the value being shifted.
Thus, if you have a 32-bit int with value 32, the most significant bit of its value (100000₂) is the bit in the 2⁵ position. But the computer works with all 32 bits (00000000000000000000000000100000₂), and the most significant bit is the 0 in the 2³¹ position.
That said, the behavior of >> with a negative left operand is implementation-defined. It may be arithmetic right shift, but it may be something else.

How to create a mask with the least significant bits set to 1 in C

Can someone please explain this function to me?
A mask with the least significant n bits set to 1.
Ex:
n = 6 --> 0x2F, n = 17 --> 0x1FFFF // I don't get these at all, especially how n = 6 --> 0x2F
Also, what is a mask?
The usual way is to take a 1, and shift it left n bits. That will give you something like: 00100000. Then subtract one from that, which will clear the bit that's set, and set all the less significant bits, so in this case we'd get: 00011111.
A mask is normally used with bitwise operations, especially and. You'd use the mask above to get the 5 least significant bits by themselves, isolated from anything else that might be present. This is especially common when dealing with hardware that will often have a single hardware register containing bits representing a number of entirely separate, unrelated quantities and/or flags.
A mask is a common term for an integer value that is bit-wise ANDed, ORed, XORed, etc with another integer value.
For example, if you want to extract the 8 least significant bits of an int variable, you do variable & 0xFF. 0xFF is a mask.
Likewise if you want to set bits 0 and 8, you do variable | 0x101, where 0x101 is a mask.
Or if you want to invert the same bits, you do variable ^ 0x101, where 0x101 is a mask.
To generate a mask for your case you should exploit the simple mathematical fact that if you add 1 to your mask (the mask having all its least significant bits set to 1 and the rest to 0), you get a value that is a power of 2.
So, if you generate the closest power of 2, then you can subtract 1 from it to get the mask.
Positive powers of 2 are easily generated with the left shift << operator in C.
Hence, 1 << n yields 2ⁿ. In binary it's 10...0 with n 0s.
(1 << n) - 1 will produce a mask with n lowest bits set to 1.
Now, you need to watch out for overflows in left shifts. In C (and in C++) you can't legally shift a variable left by as many bit positions as the variable has, so if ints are 32-bit, 1<<32 results in undefined behavior. Signed integer overflows should also be avoided, so you should use unsigned values, e.g. 1u << 31.
For both correctness and performance, the best way to accomplish this has changed since this question was asked back in 2012 due to the advent of BMI instructions in modern x86 processors, specifically BLSMSK.
Here's a good way of approaching this problem, while retaining backwards compatibility with older processors.
This method is correct, whereas the current top answers produce undefined behavior in edge cases.
Clang and GCC, when allowed to optimize using BMI instructions, will condense gen_mask() to just two ops. With supporting hardware, be sure to add compiler flags for BMI instructions:
-mbmi -mbmi2
#include <inttypes.h>
#include <stdio.h>
uint64_t gen_mask(const uint_fast8_t msb) {
const uint64_t src = (uint64_t)1 << msb;
return (src - 1) ^ src;
}
int main() {
uint_fast8_t msb;
for (msb = 0; msb < 64; ++msb) {
printf("%016" PRIx64 "\n", gen_mask(msb));
}
return 0;
}
First, for those who only want the code to create the mask:
uint64_t bits = 6;
uint64_t mask = ((uint64_t)1 << bits) - 1;
// Results in 0b111111 (or 0x03F)
Thanks to Benni, who asked about using bits = 64. If you need the code to support this value as well, you can use:
uint64_t bits = 6;
uint64_t mask = (bits < 64)
? ((uint64_t)1 << bits) - 1
: (uint64_t)0 - 1;
For those who want to know what a mask is:
A mask is usually a name for value that we use to manipulate other values using bitwise operations such as AND, OR, XOR, etc.
Short masks are usually represented in binary, where we can explicitly see all the bits that are set to 1.
Longer masks are usually represented in hexadecimal, which is really easy to read once you get the hang of it.
You can read more about bitwise operations in C here.
I believe your first example should be 0x3f.
0x3f is hexadecimal notation for the number 63, which is 111111 in binary, so the last 6 bits (the least significant 6 bits) are set to 1.
The following little C program will calculate the correct mask:
#include <stdarg.h>
#include <stdio.h>
int mask_for_n_bits(int n)
{
int mask = 0;
for (int i = 0; i < n; ++i)
mask |= 1 << i;
return mask;
}
int main (int argc, char const *argv[])
{
printf("6: 0x%x\n17: 0x%x\n", mask_for_n_bits(6), mask_for_n_bits(17));
return 0;
}
0x2F is 0010 1111 in binary - this should be 0x3f, which is 0011 1111 in binary and which has the 6 least-significant bits set.
Similarly, 0x1FFFF is 0001 1111 1111 1111 1111 in binary, which has the 17 least-significant bits set.
A "mask" is a value that is intended to be combined with another value using a bitwise operator like &, | or ^ to individually set, unset, flip or leave unchanged the bits in that other value.
For example, if you combine the mask 0x3F with some value n using the & operator, the result will have zeroes in all but the 6 least significant bits, and those 6 bits will be copied unchanged from the value n.
In the case of an & mask, a binary 0 in the mask means "unconditionally set the result bit to 0" and a 1 means "set the result bit to the input value bit". For an | mask, a 0 in the mask sets the result bit to the input bit and a 1 unconditionally sets the result bit to 1, and for an ^ mask, a 0 sets the result bit to the input bit and a 1 sets the result bit to the complement of the input bit.

What do these C operators mean?

I'm reading the book "Programming Challenges: The Programming Contest Training Manual" and am implementing a problem where I do not understand the use of the operators c>>1 and the comparison if (n&1). Could someone help me understand what they mean?
This is the example code:
#include <stdio.h>
#define MAX_N 300
#define MAX_D 150
long cache[MAX_N/2][2];
void make_cache(int n,int d,int mode)
{
long tmp[MAX_D];
int i,count;
for(i=0;i<MAX_D;i++) tmp[i]=0;
tmp[0]=1;count=0;
while(count<=n)
{
count++;
for(i=(count&1);i<=d;i+=2)
{
if(i)
tmp[i] = tmp[i-1] + tmp[i+1];
else if(!mode)
tmp[0]=tmp[1];
else
tmp[0]=0;
}
if((count&1)==(d&1))
cache[count>>1][mode]=tmp[d];
}
}
int main()
{
int n,d,i;
long sum;
while(scanf("%d %d",&n,&d)==2) /* stop at end of input instead of looping forever */
{
if(n&1)
sum=0;
else if(d==1)
sum=1;
else if(n<(d<<1))
sum=0;
else if(n==(d<<1))
sum=1;
else
{
make_cache(n,d,0);
make_cache(n,d,1);
sum=0;
for(i=0;i<=(n>>1);i++)
sum+=cache[i][0]*cache[(n>>1)-i][1];
}
printf("%ld\n",sum);
}
return 0;
}
>> shifts the bits to the right by n bits. So this:
1011 0101
shifted down 1 becomes:
0101 1010
The & operator does a bitwise and, so again take:
1011 0101
& with 1 you get (and means both have to be 1, else it's 0):
1011 0101
&0000 0001
----------
0000 0001
Hopefully this helps answer your question!
c >> 1 is to divide it by 2 (as integer), and n & 0x1 is often to test whether a number is odd or not.
there are some articles here:
http://irc.essex.ac.uk/www.iota-six.co.uk/c/e5_bitwise_shift_operators.asp
http://irc.essex.ac.uk/www.iota-six.co.uk/c/e4_bitwise_operators_and_or_xor.asp
c >> 1 means right shift the variable c by 1 bit, which is effectively the same as dividing it by 2 (for non-negative values). '&' is a bitwise AND operator used for testing whether a particular bit is set or not. When you do n & 1, it is the same as n & 0x0001, which checks whether the least significant bit of the variable is set or not. It will result in true if set, false otherwise.
c>>1 shifts the bits in c one position to the "right", which ends up being the same as an integer division by 2 for unsigned or non-negative integers, e.g. 5/2 = 2 == 0101>>1 = 0010.
n&1 performs a bitwise AND between n and 1. if (n&1) checks whether a number is odd, since odd numbers have an LSB of 1 and even numbers have an LSB of 0.
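A minimal sketch of both identities, restricted to unsigned values where they hold exactly (half and is_odd are hypothetical names). For unsigned operands, optimizing compilers typically generate the same shift for c / 2 as for c >> 1, so the plain division is usually just as fast.

```c
#include <stdbool.h>

/* For unsigned values, shifting right by 1 is integer division by 2. */
unsigned half(unsigned c) { return c >> 1; }

/* The lowest bit is 1 exactly when an unsigned value is odd. */
bool is_odd(unsigned n) { return (n & 1u) != 0; }
```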
Such "tricks" are cute but of little value in general (since the compiler should be doing these kinds of tricks). They are doubly useless in a programming competition, where the foremost goal is to produce a correct solution: such "tricks" only get in the way of easy-to-read source code, making debugging harder.
Those are bitwise operators. << and >> shift bits left and right, respectively. & (a single ampersand) is the bitwise AND operator. When you AND two bits, the result is 1 if both bits are 1, and 0 if either or both of the bits are 0. A good way to think about it: both bits must be "set" for the result to be 1.
I wrote a tutorial on various bit-twiddling techniques.
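As a sketch of the AND rule and of the bit test it enables: print_and_table prints the four single-bit cases, and bit_is_set (a hypothetical helper) tests bit k of a word by ANDing it with a mask that has only bit k set.

```c
#include <stdbool.h>
#include <stdio.h>

/* Print the truth table of & for single bits: only 1 & 1 gives 1. */
void print_and_table(void)
{
    for (unsigned a = 0; a <= 1; a++)
        for (unsigned b = 0; b <= 1; b++)
            printf("%u & %u = %u\n", a, b, a & b);
}

/* Applied to whole words, & works on every bit position at once,
   so ANDing with a single-bit mask tests that one bit.
   k must be less than the width of unsigned int. */
bool bit_is_set(unsigned x, unsigned k)
{
    return (x & (1u << k)) != 0;
}
```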
These operators can be used to tell odd and even numbers apart.
The least significant bit of any odd number is always 1, e.g. 010(1).
So (oddNumber & 1) == 1 and (evenNumber & 1) == 0.
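One caveat worth illustrating: the bit test also works for negative int values on two's-complement machines (all mainstream ones; C23 requires two's complement), whereas a remainder-based test has to compare against 0, because C's % keeps the sign of the dividend (-3 % 2 == -1). The function names below are hypothetical.

```c
#include <stdbool.h>

/* Lowest bit set means odd; this holds for negative values too
   under two's-complement representation. */
bool is_odd_bit(int n) { return (n & 1) != 0; }

/* Remainder version: compare against 0, not 1, since -3 % 2 == -1. */
bool is_odd_rem(int n) { return n % 2 != 0; }
```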
As noted in other answers, these are bitwise operators. They may be unfamiliar because they are very "close to the hardware" operations. They are tied to the particular way that computers store numbers (binary), which is why they aren't taught in standard math classes. The reason they are exposed to the programmer is that they are very fast in hardware, so some algorithms can be significantly optimized with their use.
Note that >> behaves differently on unsigned types (whether char, short, long, or long long) than on signed types. In both cases it shifts right, but for unsigned types the "new" bits on the left are always 0, while for a signed type holding a negative value the result is implementation-defined in C; on typical two's-complement implementations the new bits are copies of the original high bit (0 or 1). So a signed character:
1011 0101
shifted down 1 becomes
1101 1010
This makes it behave as a divide-by-power-of-2 operation that rounds down: -75 / 2 = -37.5, rounded down to -38. (C's / operator gives -37 instead, because integer division truncates toward zero.)
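The difference can be seen in a short sketch. shr_unsigned and shr_signed are hypothetical names; the signed result of -38 assumes a compiler that implements >> on negative values as an arithmetic shift, which GCC, Clang and MSVC do, although the C standard leaves it implementation-defined.

```c
/* Unsigned: vacated bits are always 0 (logical shift). */
unsigned char shr_unsigned(unsigned char v) { return (unsigned char)(v >> 1); }

/* Signed and negative: implementation-defined; on common compilers
   the sign bit is copied in (arithmetic shift), which rounds down. */
int shr_signed(int v) { return v >> 1; }
```

So shr_unsigned(0xB5) gives 0x5A (0101 1010), while shr_signed(-75) gives -38 on such compilers, compared with -75 / 2, which is -37.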

Need help understanding "getbits()" method in Chapter 2 of K&R C

In Chapter 2, in the section on bitwise operators (section 2.9), I'm having trouble understanding how one of the sample methods works.
Here's the method provided:
unsigned int getbits(unsigned int x, int p, int n) {
return (x >> (p + 1 - n)) & ~(~0 << n);
}
The idea is that, for the given number x, it will return the n-bit field whose leftmost bit is at position p (so bits p down to p+1-n), counting from the right (with the farthest right bit being position 0). Given the following main() method:
#include <stdio.h>

int main(void) {
    unsigned int x = 0xF994, z;
    int p = 4, n = 3;
    z = getbits(x, p, n);
    printf("getbits(%u (%x), %d, %d) = %u (%X)\n", x, x, p, n, z, z);
    return 0;
}
The output is:
getbits(63892 (f994), 4, 3) = 5 (5)
I get portions of this, but am having trouble with the "big picture," mostly because of the bits (no pun intended) that I don't understand.
The part I'm specifically having issues with is the complements piece: ~(~0 << n). I think I get the first part, dealing with x; it's this part (and then the mask) that I'm struggling with -- and how it all comes together to actually retrieve those bits. (Which I've verified it is doing, both with code and checking my results using calc.exe -- thank God it has a binary view!)
Any help?
Let's use 16 bits for our example. In that case, ~0 is equal to
1111111111111111
When we left-shift this n bits (3 in your case), we get:
1111111111111000
because the 1s at the left are discarded and 0s are fed in at the right. Then re-complementing it gives:
0000000000000111
so it's just a clever way to get n 1-bits in the least significant part of the number.
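The same three steps, sketched on an explicitly unsigned 16-bit value so that no negative number is ever shifted. low_mask16 is a hypothetical name, and the sketch only handles 0 <= n < 16; n == 16 is excluded because 0xFFFF << 16 would overflow a 32-bit int after promotion.

```c
#include <stdint.h>

/* Build a mask with the n low bits set, mirroring ~(~0 << n)
   on a 16-bit unsigned value. Valid for 0 <= n < 16. */
uint16_t low_mask16(unsigned n)
{
    uint16_t all_ones = UINT16_MAX;                 /* 1111111111111111 */
    uint16_t shifted  = (uint16_t)(all_ones << n);  /* n zeros fed in at the right */
    return (uint16_t)~shifted;                      /* n ones at the right */
}
```

low_mask16(3) returns 0x0007, the 0000000000000111 shown above.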
The "x bit" you describe has shifted the given number (f994 = 1111 1001 1001 0100) right far enough that the least significant 3 bits are the ones you want. In the layout below, only the input bits you're requesting are shown; all other input bits are marked . since they're not important to the final result:
f994               ...........101..  # original number
>> p+1-n      [2]  .............101  # shift desired bits to right
& ~(~0 << n)  [7]  0000000000000101  # clear all the other (left) bits
As you can see, you now have the relevant bits, in the rightmost bit positions.
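To see the actual intermediate values rather than the dotted placeholders, here is a sketch that records each step for one call. The struct and function names are hypothetical, and it uses ~0u instead of ~0 so that the left shift operates on an unsigned value.

```c
#include <stdio.h>

/* Hypothetical record of the three intermediate values of getbits. */
struct getbits_steps {
    unsigned shifted;  /* x >> (p + 1 - n): wanted bits moved to the right */
    unsigned mask;     /* n low 1-bits */
    unsigned result;   /* shifted & mask */
};

struct getbits_steps compute_getbits_steps(unsigned x, int p, int n)
{
    struct getbits_steps s;
    s.shifted = x >> (p + 1 - n);
    s.mask = ~(~0u << n);          /* ~0u keeps the shift unsigned */
    s.result = s.shifted & s.mask;
    return s;
}

void print_getbits_steps(unsigned x, int p, int n)
{
    struct getbits_steps s = compute_getbits_steps(x, p, n);
    printf("x = %#x, x >> %d = %#x, mask = %#x, result = %u\n",
           x, p + 1 - n, s.shifted, s.mask, s.result);
}
```

For x = 0xF994, p = 4, n = 3 the shifted value is 0x3E65 (the bits drawn as dots are not zero), the mask is 0x7, and the result is 5.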
I would say the best thing to do is to work a problem out by hand; that way you'll understand how it works.
Here is what I did, using an 8-bit unsigned value.
Our number is 75, and we want the 4 bits starting at position 6.
The call to the function would be getbits(75,6,4);
75 in binary is 0100 1011
So we create a mask that is 4 bits long, starting with the lowest-order bit. This is done as follows:
~0 = 1111 1111
<<4 = 1111 0000
~ = 0000 1111
Okay we got our mask.
Now we push the bits we want out of the number into the lowest-order bits, so we shift binary 75 right by 6+1-4 = 3.
0100 1011 >>3 0000 1001
Now we have a mask of the correct number of bits in the low order and the bits we want out of the original number in the low order.
So we & them:
0000 1001
& 0000 1111
============
0000 1001
So the answer is decimal 9.
Note: the higher order nibble just happens to be all zeros, making the masking redundant in this case but it could have been anything depending on the value of the number we started with.
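The note above is easy to check with a second input whose high nibble survives the shift. Here is a minimal 8-bit sketch, where getbits8 is a hypothetical helper mirroring the hand calculation and 0xCB (1100 1011) is an arbitrary example value:

```c
#include <stdint.h>

/* 8-bit version of the hand calculation: valid for 0 <= n <= 8
   and 0 <= p + 1 - n < 8. */
uint8_t getbits8(uint8_t x, int p, int n)
{
    uint8_t mask = (uint8_t)~(0xFFu << n);          /* e.g. 0000 1111 for n = 4 */
    uint8_t shifted = (uint8_t)(x >> (p + 1 - n));  /* wanted bits moved right */
    return (uint8_t)(shifted & mask);               /* clear everything else */
}
```

Both getbits8(75, 6, 4) and getbits8(0xCB, 6, 4) return 9, but for 0xCB the shifted value is 0001 1001 (0x19), so without the mask the answer would have been 25.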
~(~0 << n) creates a mask that will have the n right-most bits turned on.
0
0000000000000000
~0
1111111111111111
~0 << 4
1111111111110000
~(~0 << 4)
0000000000001111
ANDing the result with something else will return what's in those n bits.
Edit: I wanted to point out this programmer's calculator I've been using forever: AnalogX PCalc.
Nobody mentioned it yet, but in ANSI C ~0 << n causes undefined behaviour.
This is because ~0 is a negative number and left-shifting negative numbers is undefined.
Reference: C11 6.5.7/4 (C99 has the same wording)
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. [...] If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
In K&R C, this code relied on a property of the systems K&R developed on: left-shifting a signed number naively shifted 1 bits off the left (the code also relies on two's-complement representation). Some other systems don't share those properties, so the C standardization process did not define this behaviour.
So this example is really only interesting as a historical curiosity; it should not have been used in any real code since 1989 (if not earlier).
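A hedged sketch of a getbits that stays within defined behaviour in standard C: the mask starts from ~0u, which is unsigned, and the case where n equals the width of unsigned int is handled separately, since shifting by the full width of a type is also undefined. getbits_safe is a hypothetical name, and the sketch assumes 0 <= n, 0 <= p + 1 - n and p < width.

```c
#include <limits.h>

/* getbits without undefined behaviour for 0 <= n <= width:
   ~0u << n is well defined for n < width, and n == width
   (all bits requested) gets an all-ones mask directly. */
unsigned getbits_safe(unsigned x, int p, int n)
{
    const int width = (int)(sizeof(unsigned) * CHAR_BIT);
    unsigned mask = (n >= width) ? ~0u : ~(~0u << n);
    return (x >> (p + 1 - n)) & mask;
}
```

getbits_safe(0xF994, 4, 3) still returns 5, matching the original output.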
Using the example:
int x = 0xF994, p = 4, n = 3;
int z = getbits(x, p, n);
and focusing on this set of operations
~(~0 << n)
For any bit pattern (10010011, etc.) you want to generate a "mask" that pulls out only the bits you want to see. So for 10010011 (0x93), if I'm interested in the low bits xxxxx011 (0x03), what is the mask that will extract that set? 00000111. Now I want to be independent of sizeof(int), so I'll let the machine do the work, i.e. start with 0: on a byte machine it's 0x00, on a word machine it's 0x0000, and so on; a 64-bit machine would represent it with 64 bits, 0x0000000000000000.
Now apply "not" (~0) and get 11111111
shift left (<<) by n and get 11111000
and "not" that and get 00000111
so 10010011 & 00000111 = 00000011
You remember how Boolean operations work?
In ANSI C, ~0 >> n gives implementation-defined behavior (~0 is a negative int, and right-shifting a negative signed value is implementation-defined).
// The post about left shifting causing a problem is wrong in practice.
unsigned char m,l;
m = ~0 >> 4; is producing 255, which is equal to ~0, but
m = ~0;
l = m >> 4; is producing the correct value 15, the same as:
m = 255 >> 4;
In practice there is no problem with left-shifting the negative ~0 (~0 << n) whatsoever.
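The two results above come from integer promotions and the order of narrowing, not from the shift direction alone. Here is a minimal sketch (function names hypothetical); the 255 result assumes an arithmetic right shift of negative int values, which is implementation-defined but is what GCC, Clang and MSVC do.

```c
/* ~0 is the int -1; shifting it right on an arithmetic-shift
   implementation leaves -1, which converts to 255 as unsigned char. */
unsigned char shift_int_then_narrow(void)
{
    return (unsigned char)(~0 >> 4);
}

/* Here m is narrowed to 255 first; it is promoted to the int 255
   before the shift, so 0 bits come in and the result is 15. */
unsigned char narrow_then_shift(void)
{
    unsigned char m = (unsigned char)~0;
    return (unsigned char)(m >> 4);
}
```

In the first function the shift is applied to the full-width int -1 and only then narrowed, so every bit is still 1; in the second, the value is narrowed to 255 first and the shift brings in 0 bits.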
