Need help understanding "getbits()" method in Chapter 2 of K&R C - c

In chapter 2, the section on bitwise operators (section 2.9), I'm having trouble understanding how one of the sample methods works.
Here's the method provided:
unsigned int getbits(unsigned int x, int p, int n) {
return (x >> (p + 1 - n)) & ~(~0 << n);
}
The idea is that, for the given number x, it will return the n bits starting at position p, counting from the right (with the farthest right bit being position 0). Given the following main() method:
int main(void) {
int x = 0xF994, p = 4, n = 3;
int z = getbits(x, p, n);
printf("getbits(%u (%x), %d, %d) = %u (%X)\n", x, x, p, n, z, z);
return 0;
}
The output is:
getbits(63892 (f994), 4, 3) = 5 (5)
I get portions of this, but am having trouble with the "big picture," mostly because of the bits (no pun intended) that I don't understand.
The part I'm specifically having issues with is the complements piece: ~(~0 << n). I think I get the first part, dealing with x; it's this part (and then the mask) that I'm struggling with -- and how it all comes together to actually retrieve those bits. (Which I've verified it is doing, both with code and checking my results using calc.exe -- thank God it has a binary view!)
Any help?

Let's use 16 bits for our example. In that case, ~0 is equal to
1111111111111111
When we left-shift this n bits (3 in your case), we get:
1111111111111000
because the 1s at the left are discarded and 0s are fed in at the right. Then re-complementing it gives:
0000000000000111
so it's just a clever way to get n 1-bits in the least significant part of the number.
The "x bit" you describe has shifted the given number (f994 = 1111 1001 1001 0100) right far enough so that the least significant 3 bits are the ones you want. In this example, the input bits you're requesting are shown; all other input bits are marked . since they're not important to the final result:
f994             ...........101..   # original number
>> p+1-n     [2] .............101   # shift desired bits to right
& ~(~0 << n) [7] 0000000000000101   # clear all the other (left) bits
As you can see, you now have the relevant bits, in the rightmost bit positions.

I would say the best thing to do is to do a problem out by hand, that way you'll understand how it works.
Here is what I did using an 8-bit unsigned int.
Our number is 75, and we want the 4 bits starting from position 6.
The call to the function would be getbits(75, 6, 4);
75 in binary is 0100 1011
So we create a mask that is 4 bits long, starting with the lowest-order bit. This is done as follows:
~0 = 1111 1111
<<4 = 1111 0000
~ = 0000 1111
Okay we got our mask.
Now, we push the bits we want out of the number into the lowest order bits so
we shift binary 75 by 6+1-4=3.
0100 1011 >>3 0000 1001
Now we have a mask of the correct number of bits in the low order and the bits we want out of the original number in the low order.
so we & them
0000 1001
& 0000 1111
============
0000 1001
so the answer is decimal 9.
Note: the higher order nibble just happens to be all zeros, making the masking redundant in this case but it could have been anything depending on the value of the number we started with.

~(~0 << n) creates a mask that will have the n right-most bits turned on.
0
0000000000000000
~0
1111111111111111
~0 << 4
1111111111110000
~(~0 << 4)
0000000000001111
ANDing the result with something else will return what's in those n bits.
Edit: I wanted to point out this programmer's calculator I've been using forever: AnalogX PCalc.

Nobody mentioned it yet, but in ANSI C ~0 << n causes undefined behaviour.
This is because ~0 is a negative number and left-shifting negative numbers is undefined.
Reference: C11 6.5.7/4 (earlier versions had similar text)
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. [...] If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
In K&R C this code would have relied on the particular class of system that K&R developed on, naively shifting 1 bits off the left when performing left-shift of a signed number (and this code also relies on 2's complement representation), but some other systems don't share those properties so the C standardization process did not define this behaviour.
So this example is really only interesting as a historical curiosity; it should not have been used in any real code since 1989 (if not earlier).

Using the example:
int x = 0xF994, p = 4, n = 3;
int z = getbits(x, p, n);
and focusing on this set of operations
~(~0 << n)
For any bit pattern (10010011, etc.) you want to generate a "mask" that pulls out only the bits you want to see. Take 10010011 (0x93): I'm interested in xxxxx011. What mask will extract that set? 00000111. I want this to be independent of sizeof(int), so I let the machine do the work: start with 0. On a byte machine that's 0x00, on a word machine it's 0x0000, and a 64-bit machine would represent it with 64 bits, 0x0000000000000000.
Now apply "not" (~0) and get 11111111
shift left (<<) by n and get 11111000
and "not" that and get 00000111
so 10010011 & 00000111 = 00000011
You remember how boolean operations work ?

In ANSI C, it is ~0 >> n that gives the surprising result: right-shifting a negative value is implementation-defined.
unsigned char m, l;
m = ~0 >> 4; /* produces 255, the same as ~0, because ~0 is the int -1 and the shift copies the sign bit */
m = ~0;
l = m >> 4; /* produces the expected value 15, the same as 255 >> 4 */
Left-shifting the negative value ~0 caused no visible problem here, although the standard still leaves it undefined.

Related

Value of x when s = x >> 31 and x = (s & ~x) | (~s & x);

If x was to equal 12 in a 32 bit scenario, x = multiple 0's into the lsb 0000 1100. If the above scenario were to run, I believe I would get 0000 1100. Am I wrong?
Along with that, what if I was to use x=-1? Wouldn't s = 1, but then does (s & ~x) look like (0001 & 0000) and (1110 & 1111)? Thanks
I thought that x=-1 would mean x>>31 would be like 0001 (output 1), but I don't know if the above is correct.
The typical implementation of a right shift of a signed integer is an arithmetic shift. Different implementations are unfortunately still allowed, though rare, and they're not relevant to understanding this code (it ignores such possibilities anyway). Two's complement integers are now mandatory (in C23: "The sign representation defined in this document is called two’s complement. Previous revisions of this document additionally allowed other sign representation") so I'm not going to do the usual consideration of hypothetical integer representations that haven't been seen since the stone age.
By assumption the number of bits in an int is 32, so shifting an int right by 31 makes every bit of the result a copy of the sign bit. So if x was negative, s would be -1.
x = (s & ~x) | (~s & x) is a verbose way to spell out x ^= s. XORing x by 0 leaves it the same as before, XORing it by -1 inverts all the bits. Taking into account that s = x < 0 ? -1 : 0, effectively the computation does this:
if (x < 0)
x = ~x; // equivalent to: x = -x - 1;

Trying to perform bitwise operation on a negative number to make 0s in bits 31-16 (32 bit number) in C

I have a number, -3, that I would like to use as a 16 bit integer and use an | (or) operator with a 32 bit integer.
So ideally, I need it so that -3, in binary two's complement, will be 0000 0000 0000 0000 1111 1111 1111 1101
I've tried bit wise operation using
int x = -3;
x = x << 16;
x = x >> 16;
I thought this would ideally put 0's in the bits 31-16 (0 indexed), but it doesn't seem to work. Any help to achieve this would be appreciated.
Two ways to get the two’s complement of 3 in 16 bits, with higher bits clear, are:
Include <stdint.h> and use (uint16_t) -3.
Use -3u & 0xffffu.
Shifting a negative value to the left is undefined behavior.
int x = -3;
x = x << 16; // UB!
Shifting a negative value to the right is implementation-defined behavior.
Best to use an unsigned type (or small positive values) when shifting.
"to perform bitwise operation on a negative number to make 0s in bits 31-16 (32 bit number) in C"
Use & to mask and retain only the least significant 16 bits.
int32_t x = -3;
x &= 0xFFFF;

Analysing program bitshifting

I'll start with the code rightaway:
#include <stdio.h>
int main()
{
unsigned char value = 0xAF;
printf("%02x\n", value);
value = (value << 4) | (value >> 4);
printf("%02x\n", value);
return 0;
}
Firstly I thought you can't store numbers in chars and that you would need to make that an int. Apparently not. Then, if I did the bitshifting math:
value << 4 = 101011110
value >> 4 = 1010111
101011110
| 1010111
=101011111
and that would be 0x15f.
If I compile that code it prints
af
fa
Can anyone explain to me where I'm thinking wrong?
Bit shifting 4 shifts 4 binary digits, not 2 as you seem to be showing. It also shifts 1 hex digit. So if you have 0xAF, shifting left 4 gives you 0xF0. Because it is a char, it only has 8 bits and the A is cut off. Shifting right 4 similarly yields 0xA. 0x0A | 0xF0 == 0xFA.
Start with the baseline: 0xaf is 1010-1111 in binary (and we're assuming an eight-bit char here based on the code, though it's not mandated by the standard).
The expression value << 4 will left-shift that by four bits (not one as you seem to think), giving 1010-1111-0000 in binary and, yes, it's more than an eight-bit char because of integer promotions (both operands of a << expression are promoted to int as per ISO C11 6.5.7 and also in earlier iterations of the standard).
The expression value >> 4 will right-shift it by four bits, giving 1010 in binary.
When you bitwise-or those together, you get:
1010-1111-0000
          1010
==============
1010-1111-1010
and when you finally try to shoe-horn that back into the eight-bit value, it lops off the upper bits, giving 1111-1010 in binary, which is 0xFA.
You might have messed up the bit representations in your calculation.
Ok. I will try to explain according to the code you have provided.
value 0xAF = 10101111
value << 4 = 11110000
value >> 4 = 00001010
  11110000
| 00001010
= 11111010 and hence 0xFA.
Explanation:
1. Representation is in binary 8 bit.
2. When you left/right shift by a number, I think you are considering it in terms of multiplication and division, but in 8-bit binary representation it just gets shifted by 4 places and the bits get replaced by 0.
Hope this helps.
Because sizeof(unsigned char) is equal to 1, "value" holds 8 bits of data.
The range of "value" is 0x00 to 0xFF, so its valid bits are bit 0 to bit 7.
The expression (value << 4) | (value >> 4) is computed as an int and yields 0xAFA, not 0x15F. When that is assigned to "value", only bit 0 to bit 7 are stored and the higher bits are cut off.
0xAFA ---binarization---> 1010 1111 1010
The variable "value" holds 8 bits, so only 1111 1010 is assigned to it.
value ---binarization---> 1111 1010

Why does the following bitwise operation return an unintended result?

3 bits can hold up to a maximum number of 7 (4 + 2 + 1). I'm trying to calculate this using a bitwise operation.
3 is 0b011
~3 is 0b100
Doing a bitwise OR I would expect 0b111 (i.e. 7). Instead I get
int result = (~3) | 3;
printf("%i\n", result);
-1
What am I doing wrong?
You are doing everything right: N | ~N results in a number whose binary representation consists of all ones. Such a number is interpreted as -1 in the two's complement representation of negative numbers.
How many bits wide is an int? You seem to think it's three bits wide. Certainly not correct! Guess again. What is ~0u? Try printf("%u\n", ~0u);. What about ~1u? ... and ~2u? Do you notice a pattern?
Note the u suffix, which tells the compiler that it's an unsigned literal. You can't work with signed integer types with the ~ operator... Well, you can, but you might run into trap representations and negative zeros, according to 6.2.6.2 of n1570.pdf. Using a trap representation is undefined behaviour. That might work on your system, but only by coincidence. Do you want to rely upon coincidence?
Similarly, I suggest using the %u directive to print unsigned values, as %d would produce undefined behaviour according to 7.21.6.1p9 of n1570.pdf.
When you do ~3 you are inverting the bits that make up 3 - so you turn 0000 0000 0000 0000 0000 0000 0000 0011 into 1111 1111 1111 1111 1111 1111 1111 1100. Since the high bit is set, this is interpreted as a negative number - all 1s is -1, one less than that is -2, one less -3 and so on. This number is the signed 32 bit integer for -4.
If you binary OR this with 3, you get all 1s (by definition) - which is the signed 32 bit integer for -1.
Your only problem is that you think you are working with 3 bit numbers, but you are actually working with 32 bit numbers.
After doing this in the code
int result = (~3) | 3;
Add this line
result = result & 0x07;
This will give you the answer that you expect.
#include <stdio.h>
int main (){
unsigned d3 = 0b011;
unsigned invd3 = ~d3;
unsigned d4 = 0b100;
unsigned result = d3 | invd3;
printf("%X\n", result);//FFFFFFFF
result = d3 | d4;
printf("%X\n", result);//7
return 0;
}

How to create mask with least significant bits set to 1 in C

Can someone please explain this function to me?
A mask with the least significant n bits set to 1.
Ex:
n = 6 --> 0x2F, n = 17 --> 0x1FFFF // I don't get these at all, especially how n = 6 --> 0x2F
Also, what is a mask?
The usual way is to take a 1, and shift it left n bits. That will give you something like: 00100000. Then subtract one from that, which will clear the bit that's set, and set all the less significant bits, so in this case we'd get: 00011111.
A mask is normally used with bitwise operations, especially and. You'd use the mask above to get the 5 least significant bits by themselves, isolated from anything else that might be present. This is especially common when dealing with hardware that will often have a single hardware register containing bits representing a number of entirely separate, unrelated quantities and/or flags.
A mask is a common term for an integer value that is bit-wise ANDed, ORed, XORed, etc with another integer value.
For example, if you want to extract the 8 least significant bits of an int variable, you do variable & 0xFF. 0xFF is a mask.
Likewise if you want to set bits 0 and 8, you do variable | 0x101, where 0x101 is a mask.
Or if you want to invert the same bits, you do variable ^ 0x101, where 0x101 is a mask.
To generate a mask for your case you should exploit the simple mathematical fact that if you add 1 to your mask (the mask having all its least significant bits set to 1 and the rest to 0), you get a value that is a power of 2.
So, if you generate the closest power of 2, then you can subtract 1 from it to get the mask.
Positive powers of 2 are easily generated with the left shift << operator in C.
Hence, 1 << n yields 2^n. In binary it's 10...0 with n 0s.
(1 << n) - 1 will produce a mask with n lowest bits set to 1.
Now, you need to watch out for overflows in left shifts. In C (and in C++) you can't legally shift a variable left by as many bit positions as the variable has, so if ints are 32-bit, 1<<32 results in undefined behavior. Signed integer overflows should also be avoided, so you should use unsigned values, e.g. 1u << 31.
For both correctness and performance, the best way to accomplish this has changed since this question was asked back in 2012 due to the advent of BMI instructions in modern x86 processors, specifically BLSMSK.
Here's a good way of approaching this problem, while retaining backwards compatibility with older processors.
This method is correct, whereas the current top answers produce undefined behavior in edge cases.
Clang and GCC, when allowed to optimize using BMI instructions, will condense gen_mask() to just two ops. With supporting hardware, be sure to add compiler flags for BMI instructions:
-mbmi -mbmi2
#include <inttypes.h>
#include <stdio.h>
uint64_t gen_mask(const uint_fast8_t msb) {
const uint64_t src = (uint64_t)1 << msb;
return (src - 1) ^ src;
}
int main() {
uint_fast8_t msb;
for (msb = 0; msb < 64; ++msb) {
printf("%016" PRIx64 "\n", gen_mask(msb));
}
return 0;
}
First, for those who only want the code to create the mask:
uint64_t bits = 6;
uint64_t mask = ((uint64_t)1 << bits) - 1;
// Results in 0b111111 (or 0x3F)
Thanks to @Benni who asked about using bits = 64. If you need the code to support this value as well, you can use:
uint64_t bits = 6;
uint64_t mask = (bits < 64)
? ((uint64_t)1 << bits) - 1
: (uint64_t)0 - 1;
For those who want to know what a mask is:
A mask is usually a name for a value that we use to manipulate other values using bitwise operations such as AND, OR, XOR, etc.
Short masks are usually represented in binary, where we can explicitly see all the bits that are set to 1.
Longer masks are usually represented in hexadecimal, which is easy to read once you get the hang of it.
I believe your first example should be 0x3f.
0x3f is hexadecimal notation for the number 63 which is 111111 in binary, so that last 6 bits (the least significant 6 bits) are set to 1.
The following little C program will calculate the correct mask:
#include <stdio.h>
/* Build the mask one bit at a time; assumes 0 <= n <= number of bits in unsigned. */
unsigned mask_for_n_bits(int n)
{
    unsigned mask = 0;
    for (int i = 0; i < n; ++i)
        mask |= 1u << i; /* unsigned shift avoids signed overflow at the top bit */
    return mask;
}
int main(void)
{
    printf("6: 0x%x\n17: 0x%x\n", mask_for_n_bits(6), mask_for_n_bits(17));
    return 0;
}
0x2F is 0010 1111 in binary - this should be 0x3f, which is 0011 1111 in binary and which has the 6 least-significant bits set.
Similarly, 0x1FFFF is 0001 1111 1111 1111 1111 in binary, which has the 17 least-significant bits set.
A "mask" is a value that is intended to be combined with another value using a bitwise operator like &, | or ^ to individually set, unset, flip or leave unchanged the bits in that other value.
For example, if you combine the mask 0x3F with some value n using the & operator, the result will have zeroes in all but the 6 least significant bits, and those 6 bits will be copied unchanged from the value n.
In the case of an & mask, a binary 0 in the mask means "unconditionally set the result bit to 0" and a 1 means "set the result bit to the input value bit". For an | mask, a 0 in the mask sets the result bit to the input bit and a 1 unconditionally sets the result bit to 1, and for an ^ mask, a 0 sets the result bit to the input bit and a 1 sets the result bit to the complement of the input bit.

Resources