I know there is another post here that explains how to clear a single bit, but how about a whole byte?
For example, if I had 00000000 00000000 00101100 11000100 and I wanted to clear the second byte from the right so the value became 00000000 00000000 00000000 11000100, how would I go about doing that? I know I need to use bitwise &.
To clear specific bits within a memory area you can use the bitwise And operator, &, along with a mask. To create the mask, rather than thinking of which bits you want to clear, you should instead think of which bits you want to keep and create the mask using those bits.
The mask should be the same type of variable as the type of variable stored in the memory area. So if you are wanting to clear bits in a long variable then you would create a mask in another long variable which indicates the bits you want to keep.
Assume you have a variable of type unsigned long which contains the bit pattern 00000000 00000000 00101100 11000100 and you want to clear the bits in the second byte from the right, the eight bits of 00101100 while keeping the other bits.
First you create a mask that has the bits you want to keep as in 11111111 11111111 00000000 11111111 or 0xffff00ff. If you want to clear all other bits except for the least significant byte then you could use 0xff for your mask.
Next you use the bitwise And operator & to AND the variable with the mask. Often you would use the &= operator, with the variable containing the value whose bits you want to clear on one side of the operator and the mask indicating which bits to keep on the other.
unsigned long x1 = 0x2cc4; // variable to mask.
unsigned long mask = 0xffff00ff; // mask to zero all bits in the second byte from the right
x1 &= mask; // keep only the bits we want to keep
Or you could hard code the mask as in:
unsigned long x1 = 0x2cc4;
x1 &= 0xffff00ff; // keep only the bits we want to keep
Or if you don't want to modify the original variable something like:
unsigned long x1 = 0x2cc4; // variable to mask.
unsigned long x2 = 0; // variable for new, masked value of x1
unsigned long mask = 0xffff00ff; // mask to zero all bits in the second byte from the right
x2 = x1 & mask; // keep only the bits we want to keep and put into a new variable
The bitwise And operator does an And operation on each bit and the resulting output bit for each bit is calculated using an And truth table. 0 And either 0 or 1 results in a 0 and only if both bits are 1 will a 1 result.
& | 0 | 1
--+---+---
0 | 0 | 0
1 | 0 | 1
As a side note, you can also set the bits you want to clear in the mask and then use the bitwise Not operator, ~, to turn the mask from a mask of bits you want to clear into a mask of bits you want to keep. However, this is not the way it is usually done, so it may be confusing to other programmers reading the code.
unsigned long x1 = 0x2cc4;
unsigned long mask = ~0xff00; // mask to zero all bits in the second byte from the right
x1 &= mask; // And to keep only the bits we want to keep
Note also that the C Standard is a bit loose about how many bits are actually included in the most commonly used variable types such as int, short, long, etc. There are guarantees about the minimum size but not necessarily the maximum size. This was done in order to provide backward compatibility as well as to allow portability of source code and flexibility for compilers targeting specific hardware platforms. However, it means that if your source code depends on int being 64 bits and is then moved to 32-bit hardware, you may be surprised by the behavior.
The header file stdint.h specifies a number of exact-width integer types such as int8_t and others. See stdint.h — Integer types. However, not all compiler vendors provide support, though most up-to-date compilers do.
As an aside, using hard-coded numeric values such as 0xff for a bit mask that is intended to be used in several places and has a specific meaning (e.g. a buffer size field) is generally frowned upon. The normal practice in C programs is to name the bit mask with a #define macro with a descriptive label, usually within an include file if the constant is needed in multiple source files. Creating a defined constant gives the bit mask a single point of definition, providing consistency of use: it will be right everywhere or wrong everywhere. Doing this also provides a unique, searchable identifier for the bit mask constant should you need to find where it is being used. In C++ a const variable such as const unsigned long BitMaskA = 0x00ff; is normally used instead of a #define, since such a variable defaults to internal linkage in C++ (though it defaults to external linkage in C), and not depending on the preprocessor is encouraged for C++ (see https://stackoverflow.com/a/12043198/1466970).
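As a minimal sketch of that practice (the macro name and the field's meaning are made up for illustration):

#include <stdio.h>

/* Hypothetical named mask; normally this #define would live in a shared header. */
#define BUFFER_SIZE_MASK 0x00ffUL   /* low byte: buffer size field */

int main(void)
{
    unsigned long flags = 0x2cc4;
    unsigned long size    = flags & BUFFER_SIZE_MASK;    /* extract the field */
    unsigned long cleared = flags & ~BUFFER_SIZE_MASK;   /* clear the field   */
    printf("size=0x%lx cleared=0x%lx\n", size, cleared); /* size=0xc4 cleared=0x2c00 */
    return 0;
}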
Note as well that using bit-field variable types is a different mechanism from using the bitwise operators with masks. For instance see When to use bit-fields in C? as well as this answer, which addresses the portability problem: https://stackoverflow.com/a/54054552/1466970
Suppose you want to clear the bits from s (start) to e (end) in the bit vector BV, and the total number of bits in a number is b. Proceed as follows:
mask = ~0;
This will set all the bits. Something like 11111111
mask = mask >> (b - (e - s + 1));
This will give something like 00000111, with number of set bits equal to number of bits you want to clear.
mask = mask << s;
This will move the set bits to the desired positions. Something like 00111000
mask = ~mask;
This will give 11000111
Now perform & with the bit vector
BV &= mask;
This will clear the bits from s to e.
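Putting the recipe together as a function, a minimal sketch (the function name, and the choice of an unsigned type so that the shifts are well defined, are mine):

#include <stdio.h>

/* Clear bits s..e (inclusive, 0-based) of v, where b is the width in bits. */
unsigned int clear_bits(unsigned int v, int s, int e, int b)
{
    unsigned int mask = ~0u;       /* set all the bits: 11111111...          */
    mask >>= b - (e - s + 1);      /* as many set bits as bits to clear      */
    mask <<= s;                    /* move the set bits to positions s..e    */
    mask = ~mask;                  /* invert: zeros at s..e, ones elsewhere  */
    return v & mask;               /* clear bits s..e                        */
}

int main(void)
{
    /* Clear bits 3..5 of 11111111 -> 11000111. */
    printf("0x%x\n", clear_bits(0xffu, 3, 5, 32));  /* prints 0xc7 */
    return 0;
}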
Related
I am tasked with making a function that returns whether or not an int x fits into a short in C (return 1 if it does, 0 otherwise). This would normally be a fairly simple solution, but I'm constrained to using only the following bitwise operators: ! ~ & ^ | + << >>.
Here are some more rules:
I am only allowed to use a maximum of 8 of these operators.
No external libraries are to be included.
I'm not allowed to use any conditional statements (so no ifs, whiles, etc).
The only variables I can work with are those of type int and I cannot define constants like 0x0000ffff. However, I can use 0x0 and 0xff.
It is safe to assume ints are 32 bits and shorts are 16 bits.
I understand the basic functionalities of these operators, but am confused on the logic of implementation. Any ideas?
Supposing two’s complement, arithmetic right-shift, left-shift that discards overflowing bits, and 32-bit int, then:
x<<16>>16 ^ x
is zero if and only if x fits in a 16-bit short.
Since we are asked to return zero for does-not-fit and one for does-fit, we can return:
! (x<<16>>16 ^ x)
Proof:
If x fits in a short, then x·2^16 fits in an int, so x<<16 produces that value with no overflow, and x<<16>>16 restores x (since an arithmetic right shift that only removes zeros is effectively a division with no remainder), after which x ^ x is zero.
If x exceeds a short, then x<<16 overflows (which we assume results in discarding the high bits). Furthermore, its value is one of the values produced by y<<16 for some y that is a short. Then x<<16>>16 must produce the same value as y<<16>>16, which is y, which differs from x, and therefore x<<16>>16 ^ x cannot be zero.
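A minimal sketch of this as a function, under the same assumptions (two's complement, arithmetic right shift, left shift that discards overflowing bits; formally, shifting into the sign bit is undefined behavior in standard C, but the exercise's rules assume it wraps):

#include <stdio.h>

/* 1 iff x fits in a 16-bit short, using only !, <<, >>, and ^. */
static int fits_in_short(int x)
{
    return !(x << 16 >> 16 ^ x);   /* 1 if x survives the round trip */
}

int main(void)
{
    printf("%d %d %d\n", fits_in_short(32767),    /* 1 */
                         fits_in_short(-32768),   /* 1 */
                         fits_in_short(32768));   /* 0 */
    return 0;
}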
Assuming two's complement, your original idea of testing that all bits above the low 15 are either all 0 or all 1 does work.
11111111 11111111 1xxxxxxx xxxxxxxx /* for negative 16 bits int */
00000000 00000000 0xxxxxxx xxxxxxxx /* for positive 16 bits int */
Assuming arithmetic right shift, you can shift right by 15 to eliminate the unknown bits; then only 1s or only 0s will remain (if the value fits in 16 bits).
Thus x>>15 should then be either 0 or -1.
If it is 0, then !(x>>15) is true.
If it is -1, then !(~(x>>15)) is true.
You can thus test !(x>>15) | !(~(x>>15))
Or you can also write it like that: !(x>>15) | !(~x>>15)
Note that the above expressions don't assume a 32-bit x, and would also work for testing whether an int64 would fit into an int16...
There are many other ways...
Since you can also use +, (x>>15)+1 is 0 or 1 exactly when x fits.
You can then clear the last bit with: ((x>>15)+1)>>1.
x does fit in 16bits int if above expression is all 0 (false), thus you want:
! (((x>>15)+1)>>1)
Up to you to find more expressions...
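As a quick sketch checking both variants above (the test harness and function names are mine; it assumes two's complement and arithmetic right shift):

#include <stdio.h>

static int fits_a(int x) { return !(x >> 15) | !(~x >> 15); }
static int fits_b(int x) { return !(((x >> 15) + 1) >> 1); }

int main(void)
{
    int tests[] = { 0, 32767, -32768, 32768, -32769 };
    for (int i = 0; i < 5; i++)   /* expect 1 1, 1 1, 1 1, 0 0, 0 0 */
        printf("%6d: %d %d\n", tests[i], fits_a(tests[i]), fits_b(tests[i]));
    return 0;
}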
I am writing code that may run on architectures of different word size (32-bit, 64-bit, etc) and I want to clear the low byte of a value. There is a macro (MAX) that is set to the maximum value of a word. So, for example, on a 32-bit system MAX = 0xFFFFFFFF and on a 64-bit system MAX = 0xFFFFFFFFFFFFFFFF for unsigned values. If I have a word-sized variable that may be signed or unsigned, how can I clear the low byte of the variable with a single expression (no branching)?
My first idea was:
value & ~( MAX - 0xFF )
but this does not appear to work for signed values. My other thought was:
value = value - (value & 0xFF)
which has the disadvantage that it requires a stack operation.
Clearing the low byte without knowing the integer type's width can result in incorrect code, so the code should be careful.
Consider the below where value is wider than int/unsigned. 0xFF is an int constant with the value 255. ~0xFF is then that value with its bits inverted. With common two's complement, that would be -256, with its upper bits set, as FF...FF00. -256 converted to a wider signed type retains its value and pattern FF...FF00. -256 converted to a wider unsigned type becomes Uxxx_MAX + 1 - 256, again with the bit pattern FF...FF00. In both cases, the & will retain the upper bits and clear the lower 8.
value_low_8bits_cleared = value & ~0xFF;
An alternative is to do all masking operation with unsigned math to avoid unexpected properties of int math and int encodings.
The below has no concerns about sign extension or int overflow. An optimizing compiler will certainly emit efficient code with a simple AND against a mask. Further, there is no need to code the correct matching MAX value corresponding to value.
value_low_8bits_cleared = (value | 0xFFu) ^ 0xFFu;
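A minimal sketch of why this is width-agnostic (the variable names are mine):

#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    /* The same expression clears the low 8 bits whatever the width of
       value, with no sign-extension concerns. */
    int32_t a = -1;   /* bit pattern 0xFFFFFFFF         */
    int64_t b = -1;   /* bit pattern 0xFFFFFFFFFFFFFFFF */
    printf("0x%" PRIx32 "\n", (uint32_t)((a | 0xFFu) ^ 0xFFu));  /* 0xffffff00         */
    printf("0x%" PRIx64 "\n", (uint64_t)((b | 0xFFu) ^ 0xFFu));  /* 0xffffffffffffff00 */
    return 0;
}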
Here is the easy way to clear the low-order 8 bits:
value &= ~0xFF;
I am writing code that may run on architectures of different word size (32-bit, 64-bit, etc) and I want to clear the low byte of a value. There is a macro (MAX) that is set to the maximum value of a word. So, for example, on a 32-bit system MAX = 0xFFFFFFFF and on a 64-bit system MAX = 0xFFFFFFFFFFFFFFFF for unsigned values.
Although C is designed so that implementations can take machine word size into account, the language itself has no inherent sense of machine words. C cares instead about types, and that makes a difference.
Anyway, I take you exactly at your word that you arrange for the replacement text of macro MAX to be one of the two alternatives you give, depending on the architecture of the machine. Note well that when that replacement text is interpreted as an integer constant, its type may vary between C implementations, and maybe even depending on compiler options.
If I have a word-sized variable that may be signed or unsigned, how can I clear the low byte of the variable with a single expression (no branching)?
The only reason I see for needing a single expression that cannot take the actual type of value explicitly into account is that you want to use the expression in a macro itself. In that case, you need to take great care around type conversions, especially when you have to account for signed types. This makes your MAX macro uncomfortable to work with for your purpose.
I'm inclined to suggest a different approach:
(value | 0xFF) ^ 0xFF
The constant 0xFF will be interpreted as a (signed) int with a positive value. Provided that value's type is not smaller than int, both appearances of 0xFF will be converted to that type without change in value, whether that type is signed or unsigned. Furthermore, the result of each operation and of the overall expression then has the same type as value, so no unexpected conversions occur.
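For instance, wrapped in a macro (the macro name is mine), a sketch might look like:

/* Clears the low byte of any integer expression whose type is not
   smaller than int, signed or unsigned. */
#define CLEAR_LOW_BYTE(value) (((value) | 0xFF) ^ 0xFF)

/* e.g. with unsigned int v = 0x12345678, CLEAR_LOW_BYTE(v) yields 0x12345600;
   with long w = -1, it yields -256, i.e. the pattern FF...FF00. */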
How about
value & ~((intptr_t)0xFF)
First you want a mask that has all bits on except those of the low-order byte
MAX ^ 0xFF
This converts 0xFF to the same type as MAX and then does the exclusive or with that value. Because MAX has all bits set to 1, the low-order bits matching 0xFF become 0 and the high-order bits stay as they are, that is 1.
Then you apply that mask to the value that interests you:
value & ( MAX ^ 0xFF )
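A minimal sketch, assuming MAX is defined as in the question (here hard-coded for a 32-bit word):

#include <stdio.h>

#define MAX 0xFFFFFFFFu   /* as the question defines it for a 32-bit word */

int main(void)
{
    unsigned int value = 0x12345678;
    printf("0x%x\n", value & (MAX ^ 0xFF));   /* prints 0x12345600 */
    return 0;
}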
I have some C code that has variables which are either ints, or cast to int for a period of time for easy use (what we care about is the bit value). Int will always be 32 bit in this case. At one point some of them are assigned to a 64 bit variable, some implicitly and some explicitly:
long long 64bitfoo = 32bitbar;
long long 64bitfoo = (long long)32bitbar;
This has not been a problem in the past, but recently I ran into a case where, after this conversion, the top 32 bits of the 64-bit variable are not 0. It seems that some specific sequence of events can more or less populate the top bits with garbage (or just choose a previously used memory location and not clear it out correctly). This won't do, so I'm looking at solutions.
I can obviously do something like this:
long long 64bitfoo = 32bitbar;
64bitfoo &= ~0xFFFFFFFF00000000;
to clear out the top bits, and this should work for what I need, but I feel like there are better options. So far this has only shown up on values that use the implicit casting, so I'm curious if there is a difference between implicit and explicit casting that would allow explicit casting to handle this itself? (Unfortunately I currently can't just add the explicit cast and do a test; the conditions that trigger this are complex and not easily replicated, so code changes need to be pretty firm and not guesses.)
I'm sure there might be other options as well: doing something instead of just using = to set the value, a different way to clear the top 32 bits that is better, or some way of setting the initial 64-bit value to guarantee the top bits stay clear if only the bottom bits are set (the 64-bit variable sometimes gets other 64-bit variables assigned to it, so it can't have the top bits forced to 0 at all times). I wasn't finding a lot when searching; this doesn't seem to be something that comes up much.
edit: I forgot to mention that there are instances where it being signed doesn't seem like the problem. One example is that the initial value was 0xF8452370, then the long long value was shown as -558965697074093200, which is 0xF83E27A8F8452370. So the bottom 32 bits are the same, but the top 32 bits are not just 1's, but a scattering of 1's and 0's. As far as I understand, there's no reason signed vs unsigned would do this (all 1's, sure), but I could definitely be mistaken.
Also, I think the 64-bit variable needs to be signed, as in other instances it takes in values that need to be either negative or positive (actual integers), versus these instances where it just needs to keep track of the bit values. It is a very multi-use variable and I do not have the ability to make it not multi-use.
edit2: It's very possible I am asking the wrong question here, and I am trying to keep an eye on that. But I am working within restrictions, so the actual problem might be something else, and I might just be stuck adding a bandaid for now. The quick rundown is this:
There is a 64-bit variable that is long long (or __int64 on certain systems, but in the instances I am running into it should always be long long). I cannot change what it is or make it unsigned.
I have a function returning a 32 bit memory address. I need to assign that 32 bit memory address (not as a pointer, but as the actual value of the memory location) to this 64 bit variable.
In these cases I need the top 32 bits of the 64-bit variable to be 0, and the bottom 32 bits to be the same as the original value. Sometimes they are not 0, but they aren't all 1s either.
Because I can't change the 64 bit variable to unsigned I think my best option, with what I have, is to manually clear the top 32 bits, and am looking for the best way to do that.
You're running into sign extension -- casting a negative signed value to a larger type will "extend" the sign bit of the original value to all the upper bits of the new type, so that the numeric value is preserved. For instance, (int8_t) 0xFC = -4 converts to (int16_t) 0xFFFC = -4. The extra bits aren't "garbage"; they have a very specific purpose and meaning.
If you want to avoid this, cast through an unsigned type. For example:
long long sixtyfourbits = (unsigned int) thirtytwobits;
As a side point, I'd advise that you use the <stdint.h> integer types throughout your code if you care about their size -- for instance, use int64_t instead of long long, and uint32_t instead of unsigned int. The names will more clearly indicate your intent, and there are some platforms which use different sizes for standard C types. (For instance, AVR microcontrollers use a 16-bit int.)
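A minimal sketch of the difference (variable names are mine, reusing the bit pattern from the question's edit):

#include <stdio.h>

int main(void)
{
    int thirtytwobits = (int) 0xF8452370;    /* a negative 32-bit pattern */

    long long sign_extended = thirtytwobits;                  /* implicit cast    */
    long long zero_extended = (unsigned int) thirtytwobits;   /* via unsigned int */

    printf("%016llx\n", (unsigned long long) sign_extended);  /* fffffffff8452370 */
    printf("%016llx\n", (unsigned long long) zero_extended);  /* 00000000f8452370 */
    return 0;
}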
what we care about is the bit value
Then you should stay away from signed types and always use unsigned.
When a signed (or unsigned) type is converted to a bigger size of the same type, the value is preserved, i.e. 19 becomes 19 and -19 becomes -19.
But signed types don't always preserve the binary pattern by adding zeros in front when going from a smaller type to a bigger type, whereas unsigned types do.
For two's complement (the most common representation of signed types), all negative values will be sign extended, which simply means that ones are added in front instead of zeros:
SIGNED:
8 bit: -3 -> FD
16 bit: -3 -> FFFD
32 bit: -3 -> FFFFFFFD
64 bit: -3 -> FFFFFFFFFFFFFFFD
UNSIGNED:
8 bit: 253 -> FD
16 bit: 253 -> 00FD
32 bit: 253 -> 000000FD
64 bit: 253 -> 00000000000000FD
It seems that some specific version of events can more or less populate the top bits with garbage
No, either the new extra bits will be all zeros or they will be all ones.
If that isn't the case, your system doesn't comply with the C standard.
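A minimal sketch demonstrating the table above (the exact-width types are from <stdint.h>):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int8_t  s = -3;     /* bit pattern 0xFD */
    uint8_t u = 253;    /* same bit pattern */

    /* Signed widening adds ones in front; unsigned widening adds zeros. */
    printf("%04hx %016llx\n", (unsigned short)(int16_t)s,
                              (unsigned long long)(int64_t)s);   /* fffd fffffffffffffffd */
    printf("%04hx %016llx\n", (unsigned short)(uint16_t)u,
                              (unsigned long long)(uint64_t)u);  /* 00fd 00000000000000fd */
    return 0;
}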
A union is the natural way in the C language to handle this. (Note that the struct layout below assumes a little-endian machine such as x86, where the low 32 bits come first in memory.)
typedef union {
    struct {
        int low;    // low 32 bits
        int high;   // high 32 bits
    } e;            // 32-bit halves, mimics x86 CPU eax
    __int64 r;      // 64 bits, mimics x86 CPU rax
} Union64;
Union64 data;
data.r = 0x1122334455667788;
Then data.e.high will be 0x11223344
and data.e.low will be 0x55667788
Vice versa:
data.e.high = 0xaabbccdd;
data.e.low = 0x99eeff00;
Then data.r will be 0xaabbccdd99eeff00
In your case
data.r = 0; // guarantee data.e.high is cleared
data.e.low = 32bitbar; // say 0x11223344
Then data.r will be 0x0000000011223344;
This is exactly what union is for.
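A runnable sketch of that last step (using <stdint.h> types in place of __int64; again, the low/high layout assumes little-endian hardware):

#include <stdio.h>
#include <stdint.h>

typedef union {
    struct {
        int32_t low;    // low 32 bits (first in memory on little-endian)
        int32_t high;   // high 32 bits
    } e;
    int64_t r;
} Union64;

int main(void)
{
    Union64 data;
    data.r = 0;                 /* guarantee data.e.high is cleared */
    data.e.low = 0x11223344;
    printf("%016llx\n", (unsigned long long) data.r);  /* 0000000011223344 */
    return 0;
}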
I am kinda new to bit operations. I am trying to store information in an int64_t variable like this:
int64_t u = 0;
for (i = 0; i < 44; i++)
    u |= 1 << i;
for (; i < 64; i++)
    u |= 0 << i;
int t = __builtin_popcountl(u);
and what I intended with this was to store 44 1s in variable u and make sure that the remaining positions are all 0, so "t" returns 44. However, it always returns 64. With other variables, e.g. int32, it also fails. Why?
The type of an expression is generally determined by the expression itself, not by the context in which it appears.
Your variable u is of type int64_t (incidentally, uint64_t would be better since you're performing bitwise operations).
In this line:
u |= 1 << i;
since 1 is of type int, 1 << i is also of type int. If, as is typical, int is 32 bits, this has undefined behavior for larger values of i.
If you change this line to:
u |= (uint64_t)1 << i;
it should do what you want.
You could also change the 1 to 1ULL. That gives it a type of unsigned long long, which is guaranteed to be at least 64 bits but is not necessarily the same type as uint64_t.
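A complete corrected sketch (using __builtin_popcountll, the GCC/Clang builtin that takes unsigned long long, so the count covers all 64 bits):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int64_t u = 0;
    for (int i = 0; i < 44; i++)
        u |= (uint64_t)1 << i;    /* shift a 64-bit 1, not a 32-bit int */

    int t = __builtin_popcountll(u);
    printf("%d\n", t);            /* prints 44 */
    return 0;
}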
__builtin_popcountl takes unsigned long as its parameter, which is not always a 64-bit integer. I personally use __builtin_popcountll, which takes unsigned long long. But that doesn't look like the problem in your case.
Integer constants have type int by default, and by shifting an int by anything greater than or equal to 32 (to be precise, int's size in bits), you get undefined behavior. Correct usage: u |= 1LL << i; here LL stands for long long.
ORing with zero does nothing. You can't just set a bit to a particular value: you should either OR with a mask (if you want to set some bits to 1) or AND with the mask's negation (if you want to set some bits to 0); negation is done with the tilde (~).
When you shift into the high bit of the 32-bit integer and then convert to 64-bit, the sign bit will extend through the upper 32 bits, which you will then OR in, setting all 64 bits, because your literal 1 is a signed 32-bit int by default. The shift itself does not affect the upper 32 bits because the value is only 32 bits; however, the conversion to 64-bit will, when the value being converted is negative.
This can be fixed by writing your first loop like this:
for (i = 0; i < 44; i++)
    u |= (int64_t)1 << i;
Moreover, this loop does nothing since ORing with 0 will not alter the value:
for (; i < 64; i++)
    u |= 0 << i;
I'm not so good with bitwise operators so please excuse the question but how would I clear the lower 16 bits of a 32-bit integer in C/C++?
For example I have an integer: 0x12345678 and I want to make that: 0x12340000
To clear any particular set of bits, you can use bitwise AND with the complement of a number that has 1s in those places. In your case, since the number 0xFFFF has its lower 16 bits set, you can AND with its complement:
b &= ~0xFFFF; // Clear lower 16 bits.
If you wanted to set those bits, you could instead use a bitwise OR with a number that has those bits set:
b |= 0xFFFF; // Set lower 16 bits.
And, if you wanted to flip those bits, you could use a bitwise XOR with a number that has those bits set:
b ^= 0xFFFF; // Flip lower 16 bits.
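Putting the three together in a quick sketch:

#include <stdio.h>

int main(void)
{
    unsigned int b = 0x12345678;

    b &= ~0xFFFF;  printf("0x%08x\n", b);   /* 0x12340000  clear lower 16 bits */
    b |=  0xFFFF;  printf("0x%08x\n", b);   /* 0x1234ffff  set lower 16 bits   */
    b ^=  0xFFFF;  printf("0x%08x\n", b);   /* 0x12340000  flip lower 16 bits  */
    return 0;
}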
Hope this helps!
To take another path you can try
x = ((x >> 16) << 16);
(Note that if x is signed and negative, the right shift is implementation-defined, so this is safest on unsigned types.)
One way would be to bitwise AND it with 0xFFFF0000 e.g. value = value & 0xFFFF0000
Use an and (&) with a mask made of the top 16 bits all ones (that will leave the top bits as they are) and the bottom 16 bits all zeros (that will kill the bottom bits of the number).
So it'll be
0x12345678 & 0xffff0000
If the size of the type isn't known and you want to mask out only the lower 16 bits you can also build the mask in another way: use a mask that would let pass only the lower 16 bits
0xffff
and invert it with the bitwise not (~), so it will become a mask that kills only the lower 16 bits:
0x12345678 & ~0xffff
int x = 0x12345678;
int mask = 0xffff0000;
x &= mask;
Assuming the value you want to clear bits from has an unsigned type not of "small rank", this is the safest, most portable way to clear the lower 16 bits:
b &= -0x10000;
The value -0x10000 will be converted to the type of b (an unsigned type) by modular arithmetic, resulting in all high bits being set and the low 16 bits being zero.
Edit: Actually James' answer is the safest (broadest use cases) of all, but the way his answer and mine generalize to other similar problems is a bit different and mine may be more applicable in related problems.
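A minimal sketch of that conversion at work across widths (variable names are mine):

#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    /* -0x10000 converts to each unsigned type by modular arithmetic,
       giving all ones above bit 15 and zeros below, whatever the width. */
    uint32_t b32 = 0x12345678;          b32 &= -0x10000;
    uint64_t b64 = 0xAABBCCDD12345678u; b64 &= -0x10000;

    printf("0x%08"  PRIx32 "\n", b32);  /* 0x12340000         */
    printf("0x%016" PRIx64 "\n", b64);  /* 0xaabbccdd12340000 */
    return 0;
}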