Why doesn't my bit manipulation construct work in C?

What I'm trying to do is make a mask whose leftmost bit is 1 and all remaining bits are 0, irrespective of the variable's size. I tried the following:
unsigned char x = ~(~0 >> 1);
which, to me, should work whether it's done on a char or an int, but it doesn't!
To me, the manipulation looks like this:
||||||||
0|||||||
|0000000
That is what it seems like it should do, and on a 16-bit integer:
|||||||| ||||||||
0||||||| ||||||||
|0000000 00000000
Why doesn't this construct work? It's giving me zero whether I try to assign it to an unsigned char, or an int.
I'm on like page 50 of K&R, so I'm pretty new. I don't know what a literal means, I'm not sure what an "arithmetic" shift is, I don't know how to use suffixes, and I damn sure can't use a structure.

~0 is the int zero with all bits inverted, which is the int consisting of all ones. On a two's complement machine, this is -1. Right-shifting -1 causes sign extension, so ~0 >> 1 is still all ones.
What you want is to right shift an unsigned quantity, which will not invoke sign extension.
~0u >> 1
is an unsigned integer with the high order bit zero and all others set to 1, so
~(~0u >> 1)
is an unsigned integer with the high order bit of one and all others set to zero.
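You can sanity-check this with a short program (a minimal sketch; the printed values assume a 32-bit unsigned int):

#include <stdio.h>

int main(void) {
    printf("%x\n", ~0u >> 1);    /* 7fffffff: high bit zero, all others one */
    printf("%x\n", ~(~0u >> 1)); /* 80000000: high bit one, all others zero */
    return 0;
}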
Now getting this to work for all data sizes is nontrivial because C converts the operands of integer arithmetic to int or unsigned int beforehand. For example,
~(unsigned char)0 >> 1
produces an int result of -1 because the unsigned char is "promoted" to int before the ~ is applied.
So to get what you want with all data types, the only way I can see is to use sizeof to see how many bytes (or octets) are in the data.
#include <stdio.h>
#include <limits.h>

/* 1u avoids signed-overflow UB when the shift reaches the sign bit */
#define LEADING_ONE(X) (1u << (CHAR_BIT * sizeof(X) - 1))

int main(void) {
    printf("%x\n", LEADING_ONE(char));
    printf("%x\n", LEADING_ONE(int));
    return 0;
}
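On a typical platform with CHAR_BIT == 8 and a 32-bit int, this prints 80 and 80000000.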

The general rule in C is that the operands of an expression are converted to a common type, in this case (signed) int. Both (~0) and (~0 >> 1) evaluate as signed integers, and the shift is an arithmetic shift. On your machine that is implemented with sign extension, so:
(0xffffffff >> 1) => (0xffffffff)
A logical shift will inject the zero on the left that you were expecting, so your problem is how to make the compiler do a logical shift. Try:
unsigned char a = ~0;
unsigned char b = a >> 1; // this should do a logical shift
unsigned char c = ~b;
There are better ways to do what you are trying, but this should get you over the current problem.

There are two things that are giving you the unexpected result.
You are starting out with 0, which is treated as a signed int.
The intermediate results get converted to int.
If you work with unsigned char at strategic points, you should be OK.
unsigned char c = ((unsigned char)~0 >> 1);
c = ~c;


How to find the most significant bit of a signed integer in C

I need to find the most significant bit of signed int N and save it in signBitN. I want to do this using bitwise-only operations.
Also, how would I make signBitN extend so that all its bits are equal to its most significant bit?
i.e., if its most significant bit were zero, how would I extend that to be 00000...00?
The closest I've gotten is signBitN=1&(N>>(sizeof(int)-1));
Portable expression:
1 & (x >> (CHAR_BIT * sizeof(int) - 1))
The latest C standards allow three representations for signed integers:
sign and magnitude
ones' complement
two's complement
See section 6.2.6.2 Integer types of the C11 standard.
Only the third option is relevant in practice on modern machines.
As specified in 6.2.6.1:
Values stored in non-bit-field objects of any other object type consist of n × CHAR_BIT bits, where n is the size of an object of that type, in bytes.
Therefore int will consist of sizeof(int) * CHAR_BIT bits, likely 32.
Thus the highest bit of int can be read by shifting right by sizeof(int) * CHAR_BIT - 1 bits and reading the last bit with bitwise & operator.
Note that the exact value of the int after the shift is implementation defined as stated in 6.5.7.5.
On sane machines it would be:
int y = x < 0 ? -1 : 0;
The portable way would be to treat the int as an array of unsigned char and set every byte to -1.
See 6.3.1.3.2:
if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum
value that can be represented in the new type until the value
is in the range of the new type.
And 6.2.6.1.2
Values stored in unsigned bit-fields and objects of type
unsigned char shall be represented using a pure binary notation.
You can use memset() for that.
#include <string.h>

int x = -5;                              /* any starting value */
memset(&x, (x < 0 ? -1 : 0), sizeof x);  /* every byte becomes 0xFF or 0x00 */
If the question is how to check the MSB of an integer (for example the 31st bit of a 32-bit integer), then IMO this is portable:
#include <stdio.h>

#define MSB(i) ((i) & (((~0U) >> 1) ^ (~0U)))
#define issetMSB(i) (!!MSB(i))

int main(void)
{
    printf("%x\n", MSB(-1));      /* 80000000 on a 32-bit int */
    printf("%x\n", issetMSB(-1)); /* 1 */
    return 0;
}

What happens exactly with ~(char)((unsigned char) ~0 >> 1)?

I don't understand what happens in this combination of unary operators exactly. I know that when you type it in, it will end up producing the smallest signed char value; what I don't understand is HOW exactly.
What I think the solution is
========================================================================
~ is a unary operator that effectively means the same as the logical operator NOT, right?
So, NOT char means what? Everything in char is reduced to 0?
Is the char not being cast to the unsigned char?
Then we cast the char to unsigned, but everything that is not a 0 is moved over by one place, since >> 1 is the same thing as dividing by 2^1, right?
========================================================================
#include <stdio.h>

int main(void) {
    printf("signed char min = %d\n", ~(char)((unsigned char) ~0 >> 1));
    return 0;
}
It produces the smallest signed char, which works, but I just want to know what's happening under the hood because it's not completely clear.
~ is not the logical NOT, it is the bitwise NOT, so it flips every bit independently.
~0 is all 1 bits, casting it to unsigned char and shifting once to the right makes the first bit 0. Casting to signed char and applying bitwise NOT makes the first bit 1 and the rest 0, which is the minimum value of a two's complement integer (this assumes that two's complement is used here, which isn't guaranteed by the standard).
The casts are needed to ensure that the shift will fill in the first bit with a 0 since on signed integers it is possible that an arithmetic shift is used which fills in the leading bits with the sign bit of the number (1 in this case).
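To watch each stage, here is a small sketch that prints the intermediate values (it assumes CHAR_BIT == 8 and two's complement; signed char is used in place of the question's char to make the signedness explicit):

#include <stdio.h>

int main(void) {
    unsigned char all_ones = (unsigned char)~0;      /* 0xff */
    unsigned char shifted  = all_ones >> 1;          /* 0x7f: a zero is shifted in at the top */
    signed char   smallest = ~(signed char)shifted;  /* 0x80, i.e. -128 */
    printf("%x %x %d\n", all_ones, shifted, smallest);
    return 0;
}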
This code looked familiar, so I searched my ancient mail and found this macro:
#define MSB(type) (~(((unsigned type)-1)>>1))
in the .signature of Mark S. Brader.
It returns a value of the specified integer type whose bits are all zero except for the Most Significant Bit.
(unsigned char)-1 produces 0xFF
(0xFF)>>1 produces 0x7F
~(0x7F) produces 0x80
It so happens that 0x80 is the smallest negative value for the given type (though in theory C isn't required to use two's complement to store negative integers).
EDIT:
The original MSB(type) was intended to produce a value that would be assigned to a variable of type "type".
As @chux points out, if it's used in other contexts, it could extend extra bits to the left.
A more correct version is:
#define MSB(type) ((unsigned type)~(((unsigned type)-1)>>1))
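A quick check of the corrected macro (a sketch; the printed values assume CHAR_BIT == 8 and a 32-bit int):

#include <stdio.h>

#define MSB(type) ((unsigned type)~(((unsigned type)-1)>>1))

int main(void) {
    printf("%x\n", MSB(char)); /* 80 */
    printf("%x\n", MSB(int));  /* 80000000 */
    return 0;
}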

Is there a generic "isolate a single byte" bit mask for all systems, irrespective of CHAR_BIT?

If CHAR_BIT == 8 on your target system (most cases), it's very easy to mask out a single byte:
unsigned char lsb = foo & 0xFF;
However, there are a few systems and C implementations out there where CHAR_BIT is neither 8 nor a multiple thereof. Since the C standard only mandates a minimum range for char values, there is no guarantee that masking with 0xFF will isolate an entire byte for you.
I've searched around trying to find information about a generic "byte mask", but so far haven't found anything.
There is always the O(n) solution:
unsigned char mask = 1;
size_t i;
for (i = 0; i < CHAR_BIT; i++)
{
    mask |= (mask << i);
}
However, I'm wondering if there is any O(1) macro or line of code somewhere that can accomplish this, given how important this task is in many system-level programming scenarios.
The easiest way to extract an unsigned char from an integer value is simply to cast it to unsigned char:
(unsigned char) SomeInteger
Per C 2018 6.3.1.3 2, the result is the remainder of SomeInteger modulo UCHAR_MAX+1. (This is a non-negative remainder; it is always adjusted to be greater than or equal to zero and less than UCHAR_MAX+1.)
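For example, a minimal illustration of that modulo behavior (the printed values assume UCHAR_MAX == 255):

#include <stdio.h>

int main(void) {
    printf("%d\n", (unsigned char)300); /* 300 - 256 = 44 */
    printf("%d\n", (unsigned char)-1);  /* -1 + 256 = 255 */
    return 0;
}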
Assigning to an unsigned char has the same effect, as assignment performs a conversion (and initializing works too):
unsigned char x;
…
x = SomeInteger;
If you want an explicit bit mask, UCHAR_MAX is such a mask. This is so because unsigned integers are pure binary in C, and the maximum value of an unsigned integer has all value bits set. (Unsigned integers in general may also have padding bits, but unsigned char may not.)
One difference can occur in very old or esoteric systems: If a signed integer is represented with sign-and-magnitude or one’s complement instead of today’s ubiquitous two’s complement, then the results of extracting an unsigned char from a negative value will differ depending on whether you use the conversion method or the bit-mask method.
On review (after accept), @Eric Postpischil's point about UCHAR_MAX makes for a preferable mask.
#define BYTE_MASK UCHAR_MAX
The value UCHAR_MAX shall equal 2^CHAR_BIT - 1. C11dr §5.2.4.2.1 2
unsigned char cannot have padding, so UCHAR_MAX is always the all-bits-set pattern of a character type, and hence of a C "byte".
some_signed & some_unsigned is a problem on non-two's-complement systems, as some_signed is converted to unsigned before the &, changing the bit pattern of negative values. To avoid this, the all-ones mask needs to be signed when masking signed types. This is usually the case with foo & UINT_MAX.
Conclusion
Assume: foo is of some integer type.
If only two's complement is of concern, use a cast - it does not change the bit pattern.
unsigned char lsb = (unsigned char) foo;
Otherwise, with any integer encoding and UCHAR_MAX <= INT_MAX:
unsigned char lsb = foo & UCHAR_MAX;
Otherwise TBD
Shifting an unsigned 1 left by CHAR_BIT and then subtracting 1 will work even on esoteric non-two's-complement systems (per @Some programmer dude). Be sure to use unsigned math.
On such systems, this preserves the bit pattern, unlike an (unsigned char) cast applied to negative integers.
unsigned char mask = (1u << CHAR_BIT) - 1u;
unsigned char lsb = foo & mask;
Or make a define
#define BYTE_MASK ((1u << CHAR_BIT) - 1u)
unsigned char lsb = foo & BYTE_MASK;
To also handle those pesky cases where UINT_MAX == UCHAR_MAX where 1u << CHAR_BIT would be UB, shift in 2 steps.
#define BYTE_MASK (((1u << (CHAR_BIT - 1)) << 1u) - 1u)
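As a sanity check, on a conforming implementation the two-step BYTE_MASK equals UCHAR_MAX (a sketch; both values print as 255 when CHAR_BIT == 8):

#include <stdio.h>
#include <limits.h>

#define BYTE_MASK (((1u << (CHAR_BIT - 1)) << 1u) - 1u)

int main(void) {
    printf("%u %u\n", (unsigned)UCHAR_MAX, (unsigned)BYTE_MASK);
    return 0;
}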
UCHAR_MAX does not have to be equal to (1U << CHAR_BIT) - 1U
You actually need to AND with that calculated value, not with UCHAR_MAX:
value & ((1U << CHAR_BIT) - 1U)
Many real implementations (for example TI's) define UCHAR_MAX as 255 and emit code that behaves like code on machines having 8-bit bytes. This is done to preserve compatibility with code written for other targets.
For example
unsigned char x;
x++;
will generate code that checks whether the value of x is larger than UCHAR_MAX and, if it is, zeroes x.

Bitshift on assignment has no effect on the variable

I thought I'd found something similar in this answer, but in that case they weren't assigning the result of the expression to the variable. In my case I am assigning it, but the bitshift part of the expression has no effect.
unsigned leftmost1 = ((~0)>>20);
printf("leftmost1 %u\n", leftmost1);
Returns
leftmost1 4294967295
Whereas
unsigned leftmost1 = ~0;
leftmost1 = leftmost1 >> 20;
printf("leftmost1 %u\n", leftmost1);
Gives me
leftmost1 4095
I would expect separating the logic into two lines would have no impact, why are the results different?
In the first case, you are doing a signed right shift, because ~0 results in a signed value. The exact behavior of signed right shifts is implementation-defined, but most platforms, including yours, extend the sign bit, so the shift is a no-op for your input of "all ones".
In the second case, you are doing an unsigned right shift, since leftmost1 is an unsigned value. So you shift in zeros from the left.
If you wanted to do an unsigned shift without the intermediate assignment, you can do:
(~0u) >> 20
Where the u suffix indicates an unsigned literal.
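Side by side (a sketch; the output assumes a 32-bit int with an arithmetic right shift):

#include <stdio.h>

int main(void) {
    printf("%u\n", (unsigned)(~0 >> 20)); /* 4294967295: sign bits shifted in */
    printf("%u\n", ~0u >> 20);            /* 4095: zeros shifted in */
    return 0;
}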
~0 is an int. So your first piece of code isn't equivalent to the second, it's equivalent to
int tmp = ~0;
tmp = tmp >> 20;
unsigned leftmost1 = tmp;
You're seeing the results of sign extension when you right-shift a negative number.
0 has type int. ~0 is -1 on a typical two's complement machine. Right-shifting a negative number has implementation-defined results, but a common choice is to shift in 1 bits, which for -1 leaves the number unchanged (i.e. -1 >> anything is -1 again).
You can fix this by writing 0u (which is a literal of type unsigned int). This forces the operations to be done in unsigned int, as in your second example:
unsigned leftmost1 = ~0;
This line is equivalent to unsigned leftmost1 = -1, which implicitly converts -1 (a signed int) to UINT_MAX. The following operation (leftmost1 >> 20) then uses unsigned arithmetic.
Try casting like this. ~0 has type int, which is signed, so the sign bit is carried in when you shift:
unsigned leftmost1 = ((unsigned)(~0)>>20);
printf("leftmost1 %u\n", leftmost1);

What is the most portable way to read and write the highest bit of an integer in C?

What is the most portable way to read and write the highest bit of an integer in C?
This is a Bloomberg interview question. I didn't give the best answer at the time. Can anyone please answer it?
If the type is unsigned, it's easy:
(type)-1-(type)-1/2
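For instance, wrapping the expression in a hypothetical HIGH_BIT_OF macro (the printed values assume a 32-bit unsigned int and CHAR_BIT == 8):

#include <stdio.h>

#define HIGH_BIT_OF(type) ((type)-1 - (type)-1/2)

int main(void) {
    printf("%x\n", HIGH_BIT_OF(unsigned));                /* 80000000 */
    printf("%x\n", (unsigned)HIGH_BIT_OF(unsigned char)); /* 80 */
    return 0;
}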
For signed values, I know no way. If you find a way, it would answer several unanswered questions on SO:
C question: off_t (and other signed integer types) minimum and maximum values
Is there any way to compute the width of an integer type at compile-time?
Maybe others.
First, note that there's no portable way to access the top bit if we're talking about signed integers; there's simply no single portable representation defined in the standard, so the meaning of 'top bit' can in principle vary. Additionally, C does not allow direct access to the bitwise representation; you can access the int as a char buffer, but you have no idea where the 'top bit' is located.
If we're only concerned with the non-negative range of a signed integer, and assuming said range has a size that is a power of two (if not, then we need to care about the signed representation again):
#define INT_MAX_BIT (INT_MAX - (INT_MAX >> 1))
#define SET_MAX_BIT(x) ((x) | INT_MAX_BIT)
#define CLEAR_MAX_BIT(x) ((x) & ~INT_MAX_BIT)
A similar approach can be used with unsigned ints, where it can be used to get the true top bit.
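A usage sketch of these macros (the printed values assume a 32-bit int):

#include <stdio.h>
#include <limits.h>

#define INT_MAX_BIT (INT_MAX - (INT_MAX >> 1))
#define SET_MAX_BIT(x) ((x) | INT_MAX_BIT)
#define CLEAR_MAX_BIT(x) ((x) & ~INT_MAX_BIT)

int main(void) {
    int x = 5;
    printf("%x\n", (unsigned)SET_MAX_BIT(x));                /* 40000005: top bit of the non-negative range set */
    printf("%x\n", (unsigned)CLEAR_MAX_BIT(SET_MAX_BIT(x))); /* 5 */
    return 0;
}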
Here's a silly one, using:
Built-in Function: int __builtin_clz (unsigned int x)
Returns the number of leading 0-bits in x, starting at the most
significant bit position. If x is 0, the result is undefined.
First attempt:
int get_msb(int x) { return x ? __builtin_clz(x) == 0 : 0; }
Note: it's a quirk of C that functions specifying int or unsigned int parameters can be called with the other type without warning. But, this probably involves a conversion - the C++ Standard 4.7.2 says:
If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [Note: In a two's complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). ]
Which implies that the bit pattern may be changed if it's not a two's complement representation, which would stop this "solution" working reliably too. :-(
Chris's comment below provides a solution (incorporated here as a function rather than preprocessor macro):
int get_msb(int x) { return x ? __builtin_clz(*(unsigned*)&x) == 0 : 0; }
What's wrong with this one?
int get_msb(int n) {
    return ((unsigned)n) >> (sizeof(unsigned) * CHAR_BIT - 1);
    // or, optionally: return n < 0;
}

int set_msb(int n, int msb) {
    if (msb)
        return ((unsigned)n) | (1ULL << (sizeof(unsigned) * CHAR_BIT - 1));
    else
        return ((unsigned)n) & ~(1ULL << (sizeof(unsigned) * CHAR_BIT - 1));
}
It takes care of endianness, the number of bits in a byte, and also works on ones' complement.
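A quick sanity check of get_msb (repeated here so the sketch is self-contained; the output assumes a 32-bit int):

#include <stdio.h>
#include <limits.h>

int get_msb(int n) {
    return ((unsigned)n) >> (sizeof(unsigned) * CHAR_BIT - 1);
}

int main(void) {
    printf("%d\n", get_msb(-1)); /* 1 */
    printf("%d\n", get_msb(42)); /* 0 */
    return 0;
}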
#define HIGH_BIT(inttype) (((inttype)1) << (CHAR_BIT * sizeof(inttype) - 1))
example usage:
ptrdiff_t i = 4711;
i |= HIGH_BIT(ptrdiff_t); /* set high bit */
i &= ~HIGH_BIT(ptrdiff_t); /* clear high bit */
