Which is better, double negation or bitshift? - c

I need a boolean-type function which determines if a bit, in a variable's bit representation, is set or not.
So if the fourth bit of foo is what I want to inspect, I could make the function return
!!(foo & 0x8) //0x8 = 1000b
or
(foo & 0x8) >> 3
to get either 0 or 1.
Which one is more desirable in terms of performance or portability? I'm working on a small embedded system so a little but detectable cost difference still matters.

This solution
return (foo & 0x8) >> 3;
is the worst. If, for example, the magic constant 0x8 is changed, then you also need to change the magic constant 3. Moreover, there are cases where applying the >> operator is impossible, for example when you need to check more than one bit.
If you want to return either 1 (logical true) or 0 (logical false), I think it is clearer to write
return (foo & 0x8) != 0;
or
return (foo & 0x8) == 0x8;
For example, if instead of the magic constant 0x8 you use a named constant (or variable) such as MASK, then this return statement
return ( foo & MASK ) == MASK;
will not depend on the value of MASK.
Note that these two return statements
return (foo & MASK) != 0;
and
return ( foo & MASK ) == MASK;
are not equivalent. The first return statement means that at least one of the bits selected by MASK is set in the variable foo, while the second return statement means that all of the bits selected by MASK are set.
If the return type of the function is _Bool (or bool defined in <stdbool.h>) and you need to check whether at least one bit is set according to the bit mask then you can just write
return foo & MASK;
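To make the difference between the two tests concrete, here is a minimal sketch. The mask value 0x0A (bits 3 and 1) and the function names are assumptions chosen for illustration, not part of the question.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical example mask selecting bits 3 and 1. */
#define MASK 0x0Au

/* True if at least one of the MASK bits is set in foo. */
static bool any_mask_bit_set(uint32_t foo)
{
    return (foo & MASK) != 0;
}

/* True only if every one of the MASK bits is set in foo. */
static bool all_mask_bits_set(uint32_t foo)
{
    return (foo & MASK) == MASK;
}
```

With this mask, the value 0x08 passes the first test but not the second, because only one of the two selected bits is set.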

Which one is more desirable in terms of performance or portability?
For performance, it all depends on how your compiler optimizes the functions. For example, compiling these two functions:
#include <stdint.h>
uint32_t not_not(uint32_t foo)
{
return !!(foo & 0x8);
}
uint32_t bit_shift(uint32_t foo)
{
return (foo & 0x8) >> 3;
}
These compile to the exact same assembly with x64 GCC 11.1 at -O3:
not_not:
mov eax, edi
shr eax, 3
and eax, 1
ret
bit_shift:
mov eax, edi
shr eax, 3
and eax, 1
ret
So you should check the assembly generated by whatever compiler you're using at your preferred optimization level to see if any one of them is faster than the other.
As for portability, considering that in some cases you may have to change the bitmask to some other value, !! might be the safer option, since changing the bitmask won't force you to change the shift amount as well. You could also use the approaches suggested by Vlad from Moscow.

Both will generate very similar code, so performance-wise they are exactly (or almost exactly) the same.
Double negation has been used by programmers for many years and (IMO) it clearly indicates the programmer's intentions. With the shift version, you need to spend a bit more time reading the code. It is also more error-prone, as humans make stupid mistakes (typos, for example). Can you easily tell me whether (x & 0x8000000000000) >> 52 is correct or not without stopping to count the digits?
((x & 0x1000000000000) >> 48) is not as clear as !!(x & 0x1000000000000)

Performance-wise there won't be much difference if any. Readability-wise, you should use neither form.
... if the fourth bit of foo is what I want to inspect...
...then you should be using the fourth bit as input. Bit 3 being the fourth bit since bits are counted from 0 and upwards:
uint32_t n = 3;
if(foo & (1u << n))
This is by far the most common way to mask bits in C. If you need the value 1 or 0, then you could use !! or just _Bool b = foo & (1u << n);.
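As a small sketch of this bit-index style, the helper below wraps the test in a function. The name bit_is_set is hypothetical, and the sketch assumes the caller keeps n below 32.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if bit n (counted from 0) of foo is set.
   n must be below 32: shifting a 32-bit value by 32 or more is undefined. */
static inline bool bit_is_set(uint32_t foo, unsigned n)
{
    return (foo & (UINT32_C(1) << n)) != 0;
}
```

The "fourth bit" from the question would then be tested as bit_is_set(foo, 3), and only the bit number has to change if a different bit is needed.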

I have no idea about the performance (benchmark maybe?), but here are two more ideas:
(x >> BITNUM) & 1
This uses one bit shift and one binary AND. As a bonus, you specify the NUMBER of the bit you want to see, and no other magical constants. Pretty easy to use.
(x & MASK) != 0
This uses one binary AND and a comparison with zero. AFAIK comparing with zero is a special operation on most processors, so it should be cheap. It's even possible that this result is automatically calculated by the processor as a byproduct of the binary AND and stored in a CPU flag. If so, then the comparison with zero might get optimized out entirely leaving you with just one bitwise AND (depends on the CPU, compiler and the rest of your code though).
Last but not least, if you're just using this for IF statements then maybe you don't really need to coerce it to 1 and 0? I mean, in C anything non-0 is truthy, so this:
if (x & MASK)
would work just fine and produce optimal code.

Related

Is there a better way to define a preprocessor macro for doing bit manipulation?

Take macro:
GPIOxMODE(gpio,mode,port) ( GPIO##gpio->MODER = ((GPIO##gpio->MODER & ~((uint32_t)GPIO2BITMASK << (port*2))) | (mode << (port * 2))) )
Assuming that the reset value of the register is 0xFFFF.FFFF, I want to set a 2-bit-wide field to an arbitrary value. This was written for an STM32 MCU that has 16 pins per port. GPIO2BITMASK is defined as 0x3. Is there a better way of clearing and setting an arbitrary 2 bits anywhere in the 32-bit-wide register?
Valid range for port 0 - 15
Valid range for mode 0 - 3
The method I came up with is to bit shift the mask, invert it, logically AND it with the existing register value, logically OR the result with a bit shifted new value.
I am looking to combine the mask and new value to reduce the number of logical operations bit shift operations. The goal is also keep the process generic enough so that I can use for bit operations of 1,2,3 or 4 bit widths.
Is there a better way?
In the long and short of it, "is there a better way" is really an open question. I am looking specifically for a method that will reduce the number of logical operations and bit shift operations, while being a simple one-line statement.
The answer is NO.
You MUST do reset/set to ensure that the bit field you are writing to has the desired value.
The answers received can be better (as a matter of opinion/preference/philosophy/practice) in that they aren't necessarily macros and have parameter checking. The pitfalls of this style have also been pointed out in both the comments and the responses.
Macros of this kind should be avoided like the plague, for many reasons:
They are not debuggable
They are error-prone, and the errors are hard to find
and many other reasons
You can achieve the same result using inline functions. The resulting code will be just as efficient:
static inline __attribute__((always_inline)) void GPIOMODE(GPIO_TypeDef *gpio, unsigned mode, unsigned pin)
{
gpio -> MODER &= ~(GPIO_MODER_MODE0_Msk << (pin * 2));
gpio -> MODER |= mode << (pin * 2);
}
but if you love macros
#define GPIOxMODE(gpio,mode,port) do { volatile uint32_t *mdr = &GPIO##gpio->MODER; *mdr &= ~(GPIO_MODER_MODE0_Msk << ((port) * 2)); *mdr |= (mode) << ((port) * 2); } while (0)
I am looking to combine the mask and new value to reduce the number of
logical operations bit shift operations.
You can't. You need to reset and then set the bits.
The method I came up with is to bit shift the mask, invert it,
logically AND it with the existing register value, logically OR the
result with a bit shifted new value.
That or an equivalent is the way to do it.
I am looking to combine the mask and new value to reduce the number of
logical operations bit shift operations. The goal is also keep the
process generic enough so that I can use for bit operations of 1,2,3
or 4 bit widths.
Is there a better way?
You must accomplish two basic objectives:
ensure that the bits that should be off in the affected range are in fact off, and
ensure that the bits that should be on in the affected range are in fact on.
In the general case, those require two separate operations: a bitwise AND to force bits off, and a bitwise OR (or XOR, if the bits are first cleared) to turn the wanted bits on. There may be ways to shortcut for specific cases of original and target values, but if you want something general-purpose, as you say, then your options are limited.
Personally, though, I think I would be inclined to build it from multiple pieces, separating the GPIO selection from the actual computation. At minimum, you can separate out a generic macro for setting a range of bits:
#define SETBITS32(x,bits,offset,mask) ((((uint32_t)(x)) & ~(((uint32_t)(mask)) << (offset))) | (((uint32_t)(bits)) << (offset)))
#define GPIOxMODE(gpio,mode,port) (GPIO##gpio->MODER = SETBITS32(GPIO##gpio->MODER, mode, (port) * 2, GPIO2BITMASK))
But do note that there appears to be no good way to avoid such a macro evaluating some of its arguments more than once. It might therefore be safer to write SETBITS32 as a function instead. The compiler will probably inline such a function in any case, but you can maximize the likelihood of that by declaring it static and inline:
static inline uint32_t SETBITS32(uint32_t x, uint32_t bits, unsigned offset, uint32_t mask) {
return (x & ~(mask << offset)) | (bits << offset);
}
That's easier to read, too, though it, like the macro, does assume that bits has no set bits outside the mask region.
Of course there are other, similar formulations. For instance, if you do not need to support discontinuous bit ranges, you might specify a bit count instead of a bit mask. This alternative does that, protects against the user providing bits outside the specified range, and also has some parameter validation:
static inline uint32_t set_bitrange_32(uint32_t x, uint32_t bits, unsigned width,
unsigned offset) {
if (width + offset > 32) {
// error: invalid parameters
return x;
} else if (width == 0) {
return x;
}
uint32_t mask = ~(uint32_t)0 >> (32 - width);
return (x & ~(mask << offset)) | ((bits & mask) << offset);
}
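As a rough sketch of how a width-based helper like this could serve the original 2-bit mode field, the code below computes the new register value from the old one. The name moder_with_pin_mode is hypothetical, and the parameter check is written so that the sum of width and offset cannot wrap around.

```c
#include <stdint.h>

/* Replace `width` bits of x, starting at bit `offset`, with `bits`. */
static inline uint32_t set_bitrange_32(uint32_t x, uint32_t bits,
                                       unsigned width, unsigned offset)
{
    if (width == 0 || width > 32 || offset > 32 - width) {
        return x;   /* nothing to do, or invalid parameters */
    }
    uint32_t mask = ~(uint32_t)0 >> (32 - width);
    return (x & ~(mask << offset)) | ((bits & mask) << offset);
}

/* Hypothetical helper: value of a MODER-style register after setting
   the 2-bit mode field of `pin` (0..15) to `mode` (0..3). */
static inline uint32_t moder_with_pin_mode(uint32_t moder, unsigned pin,
                                           uint32_t mode)
{
    return set_bitrange_32(moder, mode, 2, pin * 2);
}
```

On a real target this would be used as a read-modify-write, along the lines of GPIOA->MODER = moder_with_pin_mode(GPIOA->MODER, 5, 1);, where GPIOA is assumed to come from the vendor header.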

Is there any way in C to check at compile time if you are on an architecture where multiplication is fast?

Is there any way for C code to tell whether it is being compiled on an architecture where multiplication is fast? Is there some macro __FAST_MULT__ or something which is defined on those architectures?
For example, assume you are implementing a function to determine the Hamming weight of a 64-bit integer via the shift-and-add method*. There are two optimal algorithms for that: one requires 17 arithmetic operations, while the other requires only 12, but one of those is a multiplication operation. The second algorithm is thus 30% faster, if you are running on hardware where multiplication takes the same amount of time as addition - but much, much slower on a system where multiplication is implemented as repeated addition.
Thus, when writing such a function, it would be useful to be able to check at compile time whether this is the case, and switch between the two algorithms as appropriate:
unsigned int popcount_64(uint64_t x) {
x -= (x >> 1) & 0x5555555555555555; // put count of each 2 bits into those 2 bits
x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333); // put count of each 4 bits into those 4 bits
x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0f; // put count of each 8 bits into those 8 bits
#ifdef __FAST_MULT__
return (x * 0x0101010101010101)>>56; // returns left 8 bits of x + (x<<8) + (x<<16) + (x<<24) + ...
#else // __FAST_MULT__
x += x >> 8; // put count of each 16 bits into their lowest 8 bits
x += x >> 16; // put count of each 32 bits into their lowest 8 bits
x += x >> 32; // put count of each 64 bits into their lowest 8 bits
return x & 0x7f;
#endif // __FAST_MULT__
}
Is there any way to do this?
* Yes, I am aware of the __builtin_popcount() functions; this is just an example.
Is there any way for C code to tell whether it is being compiled on an architecture where multiplication is fast? Is there some macro __FAST_MULT__ or something which is defined on those architectures?
No, standard C does not provide any such facility. It is possible that particular compilers provide such a thing as an extension, but I am not specifically aware of any that actually do.
This sort of thing can be tested during build configuration, for example via Autoconf or CMake, in which case you can provide the symbol yourself where appropriate.
Alternatively, some C compilers definitely do provide macros that indicate the architecture for which the code is being compiled. You can use that in conjunction with knowledge of the details of various machine architectures to choose between the two algorithms -- that's what such macros are intended for, after all.
Or you can rely on the person building the program to choose, by configuration option, by defining a macro, or whatever.
I don't believe there is a predefined macro that specifically addresses the fast multiplication feature.
There are, however, a lot of predefined compiler macros for different architectures, so if you already know in advance which architectures or CPUs support a fast multiplication instruction, you can use those macros to define your own application-specific one that signifies fast multiplication.
E.g.:
#if (defined __GNUC__ && defined __arm__ && defined __ARM_ARCH_7__) || \
(defined __CC_ARM && (__TARGET_ARCH_ARM == 7))
#define FAST_MULT
#endif
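Putting the pieces together, one possible arrangement is to keep both finishing steps as separate functions and let a project-defined macro pick between them. POPCOUNT_USE_MULT is a hypothetical name that would be set by the build system or by an architecture check like the one above; it is not a predefined compiler macro.

```c
#include <stdint.h>

/* Common first steps: per-byte bit counts in each byte of the result. */
static uint64_t popcount64_bytes(uint64_t x)
{
    x -= (x >> 1) & UINT64_C(0x5555555555555555);
    x = (x & UINT64_C(0x3333333333333333))
      + ((x >> 2) & UINT64_C(0x3333333333333333));
    return (x + (x >> 4)) & UINT64_C(0x0f0f0f0f0f0f0f0f);
}

/* Finish with one multiplication: sums all bytes into the top byte. */
static unsigned popcount64_mul(uint64_t x)
{
    x = popcount64_bytes(x);
    return (unsigned)((x * UINT64_C(0x0101010101010101)) >> 56);
}

/* Finish with shifts and additions only. */
static unsigned popcount64_add(uint64_t x)
{
    x = popcount64_bytes(x);
    x += x >> 8;
    x += x >> 16;
    x += x >> 32;
    return (unsigned)(x & 0x7f);
}

/* POPCOUNT_USE_MULT is a hypothetical project macro, for example
   passed on the command line as -DPOPCOUNT_USE_MULT=1. */
#if defined(POPCOUNT_USE_MULT) && POPCOUNT_USE_MULT
#define popcount64 popcount64_mul
#else
#define popcount64 popcount64_add
#endif
```

Keeping both variants compiled also makes it easy to check them against each other in a unit test, whichever one the build selects.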

Performance of bitwise operators in C

What is the fastest way to make the last 2 bits of a byte zero?
x = x >> 2 << 2;
OR
x &= 252;
Is there a better way?
Depends on many factors, including the compiler, the machine architecture (ie processor).
My experience is that
x &= 252; // or...
x &= ~3;
are more efficient (and faster) than
x = x >> 2 << 2;
If your compiler is smart enough, it might replace
x = x >> 2 << 2;
by
x &= ~3;
The latter is faster than the former, because the latter is only one machine instruction, while the former is two. On most processors, simple bit manipulation instructions can be expected to execute in one cycle each.
Note:
The expression ~3 is the correct way to say: A bit mask with all bits set but the last two. For a one-byte type, this is equivalent to using 252 as you did, but ~3 will work for all types up to int. If you need to specify such a bitmask for a larger type like a long, add the appropriate suffix to the number, ~3l in the case of a long.
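A brief sketch of both forms for a byte, plus a 64-bit variant where the mask is built in the wide type. The function names are made up for illustration. One pitfall worth noting as an aside: with a 32-bit unsigned int, an unsigned mask such as ~3u would zero-extend and clear the upper half of a 64-bit value, so the sketch builds the mask with UINT64_C.

```c
#include <stdint.h>

/* Clear the two least significant bits of a byte with a mask. */
static inline uint8_t clear_low2_u8(uint8_t x)
{
    return (uint8_t)(x & ~3u);
}

/* The shift variant, for comparison; same result for unsigned values. */
static inline uint8_t clear_low2_u8_shift(uint8_t x)
{
    return (uint8_t)((x >> 2) << 2);
}

/* For a wider type, build the mask in that type. */
static inline uint64_t clear_low2_u64(uint64_t x)
{
    return x & ~UINT64_C(3);
}
```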

how do I perform shifts in c without losing bits?

In C when you do something like this:
char var = 1;
while(1)
{
var = var << 1;
}
In the 8th iteration the "<<" operator will shift out the 1 and var will be 0. I need a shift that keeps the bit instead of losing it. In other words, I need this:
initial   - 00000001
1st shift - 00000010
2nd shift - 00000100
3rd shift - 00001000
4th shift - 00010000
5th shift - 00100000
6th shift - 01000000
7th shift - 10000000
8th shift - 00000001 (At the 8th shift the one automatically starts again)
Is there something equivalent to "<<" but to achieve this?
This is known as a circular shift, but C doesn't offer this functionality at the language level.
You will either have to implement this yourself, or resort to inline assembler routines, assuming your platform natively has such an instruction.
For example:
var = (var << 1) | (var >> 7);
(This is not well-defined for negative signed types, though, so you'd have to change your example to unsigned char.)
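A minimal sketch of that expression as a function on an 8-bit unsigned value, generalized to a rotate count. The name rotl8 is an assumption. The operand is promoted to int before shifting, so the cast back to uint8_t is what discards the bits shifted past bit 7.

```c
#include <stdint.h>

/* Rotate an 8-bit value left by n positions (n is reduced modulo 8). */
static inline uint8_t rotl8(uint8_t v, unsigned n)
{
    n &= 7u;
    /* v is promoted to int here; the cast truncates back to 8 bits. */
    return (uint8_t)((v << n) | (v >> ((8u - n) & 7u)));
}
```

Calling var = rotl8(var, 1); in the loop from the question would then cycle through the eight patterns shown there and return to 00000001 on the 8th iteration.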
Yes, you can use a circular shift. (It isn't a built-in C operation, but it is a CPU instruction on x86 CPUs.)
So you want to do a bit rotation, a.k.a. circular shift, then.
#include <limits.h> // Needed for CHAR_BIT
// positive numbits -> right rotate, negative numbits -> left rotate
#define ROTATE(type, var, numbits) ((numbits) >= 0 ? \
(var) >> (numbits) | (var) << (CHAR_BIT * sizeof(type) - (numbits)) : \
(var) << -(numbits) | (var) >> (CHAR_BIT * sizeof(type) + (numbits)))
As sizeof() returns sizes as multiples of the size of char (sizeof(char) == 1), and CHAR_BIT indicates the number of bits in a char (which, while usually 8, won't necessarily be), CHAR_BIT * sizeof(x) will give you the size of x in bits.
This is called a circular shift. There are intel x86 assembly instructions to do this but unless performance is REALLY REALLY A HUGE ISSUE you're better off using something like this:
int i = 0x42;
int by = 13;
int shifted = i << by | i >> ((sizeof(int) * 8) - by);
If you find yourself really needing the performance, you can use inline assembly to use the instructions directly (probably. I've never needed it badly enough to try).
It's also important to note that if you're going to be shifting by more places than the size of your data type, you need additional checks to make sure you're not overshifting. Using by = 48 would probably result in shifted receiving a value of 0, though this behavior may be platform specific (i.e. something to avoid like the plague) because if I recall correctly, some platforms perform this masking automatically and others do not.

How to perform rotate shift in C [duplicate]

I have a question, as described: how do I perform a rotate shift in C without embedded assembly? To be more concrete, how do I rotate shift a 32-bit int?
I'm now solving this problem with the help of type long long int, but I think it is a little bit ugly and want to know whether there is a more elegant method.
Kind regards.
(warning to future readers): Wikipedia's code produces sub-optimal asm (gcc includes a branch or cmov). See Best practices for circular shift (rotate) operations in C++ for efficient UB-free rotates.
From Wikipedia:
unsigned int _rotl(unsigned int value, int shift) {
if ((shift &= 31) == 0)
return value;
return (value << shift) | (value >> (32 - shift));
}
unsigned int _rotr(unsigned int value, int shift) {
if ((shift &= 31) == 0)
return value;
return (value >> shift) | (value << (32 - shift));
}
This answer is a duplicate of what I posted on Best-practices for compiler-friendly rotates.
See my answer on another question for the full details.
The most compiler-friendly way to express a rotate in C that avoids any Undefined Behaviour seems to be John Regehr's implementation:
#include <assert.h> // Needed for assert
#include <limits.h> // Needed for CHAR_BIT
#include <stdint.h> // Needed for uint32_t
uint32_t rotl32 (uint32_t x, unsigned int n)
{
const unsigned int mask = (CHAR_BIT*sizeof(x)-1);
assert ( (n<=mask) &&"rotate by type width or more");
n &= mask; // avoid undef behaviour with NDEBUG. 0 overhead for most types / compilers
return (x<<n) | (x>>( (-n)&mask ));
}
Works for any integer type, not just uint32_t, so you could make multiple versions. This version inlines to a single rol %cl, reg (or rol $imm8, reg) on x86, because the compiler knows that the instruction already has the mask operation built-in.
I would recommend against templating this on the operand type, because you don't want to accidentally do a rotate of the wrong width, when you had a 16bit value stored in an int temporary. Especially since integer-promotion rules can turn the result of an expression involving a narrow unsigned type into an int.
Make sure you use unsigned types for x and the return value, or else it won't be a rotate. (gcc does arithmetic right shifts, shifting in copies of the sign-bit rather than zeroes, leading to a problem when you OR the two shifted values together.)
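As a sketch of what such "multiple versions" might look like, here are a right rotate for 32 bits and a left rotate for 64 bits written in the same pattern. The names rotr32 and rotl64 are assumptions, and the assert from the original is left out so that any count is simply reduced modulo the type width.

```c
#include <limits.h>
#include <stdint.h>

/* Rotate a 32-bit value right by n (reduced modulo 32). */
static inline uint32_t rotr32(uint32_t x, unsigned n)
{
    const unsigned mask = CHAR_BIT * sizeof(x) - 1;
    n &= mask;
    return (x >> n) | (x << ((-n) & mask));
}

/* Rotate a 64-bit value left by n (reduced modulo 64). */
static inline uint64_t rotl64(uint64_t x, unsigned n)
{
    const unsigned mask = CHAR_BIT * sizeof(x) - 1;
    n &= mask;
    return (x << n) | (x >> ((-n) & mask));
}
```

Because both shift counts are masked, a count of 0 shifts by 0 in both directions, so neither shift ever reaches the type width.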
Though this thread is old, I wanted to add my two cents to the discussion and propose my solution to the problem. I hope it's worth a look, but if I'm wrong, please correct me.
When I was looking for an efficient and safe way to rotate, I was actually surprised that there is no real solution to that. I found a few relevant threads here:
https://blog.regehr.org/archives/1063 (Safe, Efficient, and Portable Rotate in C/C++),
Best practices for circular shift (rotate) operations in C++
and wikipedia style (which involves branching but is safe):
uint32_t wikipedia_rotl(uint32_t value, int shift) {
if ((shift &= 31) == 0)
return value;
return (value << shift) | (value >> (32 - shift));
}
After a little bit of contemplation, I discovered that modulo division would fit the criteria, as the resulting remainder is always lower than the divisor, which perfectly fits the condition of shift<32 without branching.
From mathematical point of view:
∀ x>=0, y: (x mod y) < y
In our case every (x % 32) < 32 which is exactly what we want to achieve. (And yes, I have checked that empirically and it always is <32)
#include <stdint.h>
uint32_t rotl32b_i1m (uint32_t x, uint32_t shift)
{
shift %= 32;
return (x<<shift) | (x>>((32-shift)%32)); // second modulo keeps the right-shift count below 32 when shift is 0
}
Additionally, the modulo simplifies the process: rotating by, say, 100 bits amounts to rotating by the full 32 bits three times, which changes nothing, and then by 4 more bits. So isn't it better to calculate 100%32==4 and rotate by 4 bits? It takes a single processor operation anyway and reduces the problem to a rotation by a bounded value plus one instruction (OK, two, as the argument has to be taken from the stack), but it's still better than branching with if() as in the "wikipedia" way.
So, what you guys think of that?
