I have a program that uses the following two functions 99.9999% of the time:
unsigned int getBit(unsigned char *byte, unsigned int bitPosition)
{
return (*byte & (1 << bitPosition)) >> bitPosition;
}
void setBit(unsigned char *byte, unsigned int bitPosition, unsigned int bitValue)
{
*byte = (*byte | (1 << bitPosition)) ^ ((bitValue ^ 1) << bitPosition);
}
Can this be improved? The processing speed of the program mainly depends on the speed of these two functions.
UPDATE
I will benchmark each answer provided below and post the timings I get. For reference, the compiler used is gcc on the Mac OS X platform:
Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn)
I compile without any specific arguments like: gcc -o program program.c
If you think I should set some optimizations, feel free to suggest.
The CPU is:
2,53 GHz Intel Core 2 Duo
While processing 21.5 MB of data with my originally provided functions it takes about:
Time: 13.565221
Time: 13.558416
Time: 13.566042
Time is in seconds (these are three tries).
-- UPDATE 2 --
I've used the -O3 optimization (gcc -O3 -o program program.c) option and now I'm getting these results:
Time: 6.168574
Time: 6.170481
Time: 6.167839
I'll redo the other benchmarks now...
If you want to stick with functions, then for the first one:
unsigned int getBit(unsigned char *byte, unsigned int bitPosition)
{
return (*byte >> bitPosition) & 1;
}
For the second one:
void setBit(unsigned char *byte, unsigned int bitPosition, unsigned int bitValue)
{
if(bitValue == 0)
*byte &= ~(1 << bitPosition);
else
*byte |= (1 << bitPosition);
}
However, I suspect that the function call/return overhead will swamp the actual bit-flipping. A good compiler might inline these function calls anyways, but you may get some improvement by defining these as macros:
#define getBit(b, p) ((*(b) >> (p)) & 1)
#define setBit(b, p, v) (*(b) = ((v) ? (*(b) | (1 << (p))) : (*(b) & (~(1 << (p))))))
@user694733 pointed out that branch prediction might be a problem and could cause a slowdown. As such it might be good to define separate setBit and clearBit functions:
void setBit(unsigned char *byte, unsigned int bitPosition)
{
*byte |= (1 << bitPosition);
}
void clearBit(unsigned char *byte, unsigned int bitPosition)
{
*byte &= ~(1 << bitPosition);
}
And their corresponding macro versions:
#define setBit(b, p) (*(b) |= (1 << (p)))
#define clearBit(b, p) (*(b) &= ~(1 << (p)))
The separate functions/macros would be useful if the calling code hard-codes the value passed for the bitValue argument in the original version.
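If macros feel too risky (the three-argument setBit macro above evaluates its b argument twice), the same operations can be written as static inline functions, which an optimizing compiler will usually inline at -O2 or -O3. This is a minimal sketch; the names bit_get, bit_set and bit_clear are made up here so they don't clash with the macros above:

```c
#include <stdint.h>

/* Hypothetical names; not part of the original answer. */

/* Return bit `pos` of *byte as 0 or 1. */
static inline unsigned bit_get(const uint8_t *byte, unsigned pos)
{
    return (*byte >> pos) & 1u;
}

/* Force bit `pos` of *byte to 1. */
static inline void bit_set(uint8_t *byte, unsigned pos)
{
    *byte |= (uint8_t)(1u << pos);
}

/* Force bit `pos` of *byte to 0. */
static inline void bit_clear(uint8_t *byte, unsigned pos)
{
    *byte &= (uint8_t)~(1u << pos);
}
```

Whether this beats the macros is something only a benchmark on the actual compiler and flags can tell.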
Share and enjoy.
How about:
bool getBit(unsigned char byte, unsigned int bitPosition)
{
return (byte & (1 << bitPosition)) != 0;
}
No need to use a shift operator to "physically" shift the masked-out bit into position 0, just use a comparison operator and let the compiler deal with it. This should of course also be made inline if possible.
For the second one, it's complicated by the fact that it's basically "assignBit", i.e. it takes the new value of the indicated bit as a parameter. I'd try using the explicit branch:
unsigned char setBit(unsigned char byte, unsigned int bitPosition, bool value)
{
const uint8_t mask = 1 << bitPosition;
if(value)
return byte | mask;
return byte & ~mask;
}
Generally, these things are best left to the compiler's optimizer.
But why do you need functions for such trivial tasks? A C programmer should not get shocked when they encounter basic stuff like this:
x |= 1<<n; // set bit
x &= ~(1<<n); // clear bit
x ^= 1<<n; // toggle bit
y = x & (1<<n); // read bit
There is no real reason to hide simple things like these behind functions. You won't make the code more readable, because you can always assume that the reader of your code knows C. It rather seems like pointless wrapper functions to hide away "scary" operators that the programmer isn't familiar with.
That being said, the introduction of the functions may cause a lot of overhead code. To turn your functions back into the core operations shown above, the optimizer would have to be quite good.
If you for some reason persist in using the functions, any attempt at manual optimization is going to be questionable practice. The inline and register keywords and the like are likely superfluous. A compiler with its optimizer enabled should be far more capable than the programmer of deciding when to inline and when to put things in registers.
As usual, it doesn't make sense to manually optimize code, unless you know more about the given CPU than the person who wrote the compiler port for it. Most often this is not the case.
What you can harmlessly do as manual optimization is to get rid of unsigned char (you shouldn't be using the native C types for this anyhow). Instead use the uint_fast8_t type from stdint.h. Using this type means: "I would like to have a uint8_t, but if the CPU prefers a larger type for alignment/performance reasons, it can use that instead".
EDIT
There are different ways to set a bit to either 1 or 0. For maximum readability, you would write this:
uint8_t val = either_1_or_0;
...
if(val == 1)
byte |= 1<<n;
else
byte &= ~(1<<n);
This does however include a branch. Let's assume we know that the branch is a known performance bottleneck on the given system, to justify the otherwise questionable practice of manual optimization. We could then set the bit to either 1 or 0 without a branch, in the following manner:
byte = (byte & ~(1<<n)) | (val<<n);
And this is where the code is turning a bit unreadable. Read the above as:
Take the byte and preserve everything in it, except for the bit we want to set to 1 or 0.
Clear this bit.
Then set it to either 1 or 0.
Note that the whole right side sub-expression is pointless if val is zero. So on a "generic system" this code is possibly slower than the readable version. So before writing code like this, we would have to know that our CPU is very good at bit-flipping and not-so-good at branch prediction.
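To make the two variants easy to compare in a benchmark, they can be wrapped as small functions. This is only a sketch with made-up names (assign_bit_branch, assign_bit_branchless); it assumes val is exactly 0 or 1, as in the text above:

```c
#include <stdint.h>

/* Readable version: one branch. */
static uint8_t assign_bit_branch(uint8_t byte, unsigned n, unsigned val)
{
    if (val == 1)
        byte |= (uint8_t)(1u << n);
    else
        byte &= (uint8_t)~(1u << n);
    return byte;
}

/* Branch-free version: clear bit n, then OR in val shifted into place. */
static uint8_t assign_bit_branchless(uint8_t byte, unsigned n, unsigned val)
{
    return (uint8_t)((byte & ~(1u << n)) | (val << n));
}
```

For example, with byte = 0xA5 (1010 0101), n = 1 and val = 1, the mask step leaves 0xA5 unchanged (bit 1 was already 0) and the OR step gives 0xA7. With n = 0 and val = 0 the mask step gives 0xA4 and the OR step adds nothing.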
You can benchmark with the following variations and keep the best of all solutions.
inline unsigned int getBit(unsigned char *byte, unsigned int bitPosition)
{
const unsigned char mask = (unsigned char)(1U << bitPosition);
return !!(*byte & mask);
}
inline void setBit(unsigned char *byte, unsigned int bitPosition, unsigned int bitValue)
{
const unsigned char mask = (unsigned char)(1U << bitPosition);
bitValue ? (*byte |= mask) : (*byte &= ~mask); // parentheses are required in C: ?: binds tighter than the assignment operators
}
If your algorithm only needs a zero vs. non-zero result from getBit, you can remove the !! from the return. (To return 0 or 1, I found @BobJarvis's version really clean.)
If your algorithm can pass the bit mask to be set or reset to setBit function, you won't need to calculate mask explicitly.
So depending on the code calling these functions, it may be possible to cut on time.
I just tried with this code:
void swapBit(unsigned char* numbA, unsigned char* numbB, short bitPosition)//bitPosition 0-x
{
unsigned char oneShift = 1 << bitPosition;
unsigned char bitA = *numbA & oneShift;
unsigned char bitB = *numbB & oneShift;
if (bitA)
*numbB |= bitA;
else
*numbB &= (~bitA ^ oneShift);
if (bitB)
*numbA |= bitB;
else
*numbA &= (~bitB ^ oneShift);
}
to swap bit position x of a and b but because of the if() I think there's something better.
Also, when I see this:
*numbB &= (~bitA ^ oneShift);
I really think that there's an easier way to do it.
If you have something for me, I would take it :)
Thanks in advance
First you should set the corresponding position in a number to 0, and then OR it with the actual bit, removing all of the conditions:
*numbB &= ~oneShift; // Set the bit to `0`
*numbB |= bitA; // Set to the actual bit value
The same for the other number.
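Put together, the clear-then-OR idea gives a swap with no if statements at all. A minimal sketch, reusing the variable names from the question; the function name swapBitNoBranch is made up for illustration:

```c
/* Swap bit `bitPosition` between *numbA and *numbB without branches. */
void swapBitNoBranch(unsigned char *numbA, unsigned char *numbB,
                     unsigned int bitPosition)
{
    unsigned char oneShift = (unsigned char)(1u << bitPosition);

    /* Read both bits before writing anything, so the swap is symmetric. */
    unsigned char bitA = *numbA & oneShift;
    unsigned char bitB = *numbB & oneShift;

    /* Clear the target bit, then OR in the other number's bit. */
    *numbB = (unsigned char)((*numbB & ~oneShift) | bitA);
    *numbA = (unsigned char)((*numbA & ~oneShift) | bitB);
}
```

Because both bits are read up front, calling it with numbA == numbB simply leaves the value unchanged.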
A single bit is no easier than an arbitrary bitmask, so let's just talk about that. You can always call this function with 1U << bitpos.
If a bit position is the same in both values, no change is needed in either. If it's opposite, they both need to invert.
XOR with 1 flips a bit; XOR with 0 is a no-op.
So what we want is a value that has a 1 everywhere there's a bit-difference between the inputs, and a 0 everywhere else. That's exactly what a XOR b does. Simply mask this to only swap some of the bits, and we have a bit-swap in 3 XORs + 1 AND.
// call with unsigned char mask = 1U << bitPosition; if you want
inline
void swapBit_char(unsigned char *A, unsigned char *B, unsigned char mask)
{
unsigned char tmpA = *A, tmpB = *B; // read into locals in case A==B
unsigned char bitdiff = tmpA ^ tmpB;
bitdiff &= mask; // only swap bits matching the mask
*A = tmpA ^ bitdiff;
*B = tmpB ^ bitdiff;
}
(Godbolt compiler explorer with gcc for x86-64 and ARM, includes a version with unsigned instead of unsigned char.)
You could consider if(bitdiff) { ... }, but unless you're going to avoid dirtying a cache line in memory by avoiding the assignments, it's probably not worth doing any conditional behaviour. With values in registers (after inlining), a branch to save two xor instructions is almost never worth it.
This is not an xor-swap. It does use temporary storage. As @chux's answer demonstrates, a masked xor-swap requires 3 AND operations as well as 3 XOR. (And defeats the only benefit of XOR-swap by requiring a temporary register or other storage for the & results.)
This version only requires 1 AND. Also, the last two XORs are independent of each other, so total latency from inputs to both outputs is only 3 operations. (Typically 3 cycles).
For an x86 asm example, see this code-golf Exchange capitalization of two strings in 14 bytes of x86-64 machine code (with commented asm source)
Form the mask
unsigned char mask = 1u << bitPosition;
And then earn the wrath of your peer group with XOR swap algorithm.
*numbA ^= *numbB & mask;
*numbB ^= *numbA & mask;
*numbA ^= *numbB & mask;
Note this fails when numbA == numbB.
How do I write to a single bit? I have a variable that is either a 1 or 0 and I want to write its value to a single bit in a 8-bit reg variable.
I know this will set a bit:
reg |= mask; // mask is (1 << pin)
And this will clear a bit:
reg &= ~mask; // mask is (1 << pin)
Is there a way for me to do this in one line of code, without having to determine if the value is high or low as the input?
Assuming value is 0 or 1:
REG = (REG & ~(1 << pin)) | (value << pin);
I use REG instead of register because, as @KerrekSB pointed out in the OP comments, register is a C keyword.
The idea here is we compute a value of REG with the specified bit cleared and then depending on value we set the bit.
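The one-liner relies on value being exactly 0 or 1. If the caller might pass any non-zero value for "high", it can be normalized with !! first. A small sketch with a made-up helper name (write_bit), operating on a plain value rather than a real hardware register:

```c
#include <stdint.h>

/* Return `reg` with bit `pin` forced to 1 if `value` is non-zero, else to 0. */
static inline uint8_t write_bit(uint8_t reg, unsigned pin, unsigned value)
{
    unsigned bit = !!value;                        /* normalize to 0 or 1 */
    return (uint8_t)((reg & ~(1u << pin)) | (bit << pin));
}
```

Usage would look like REG = write_bit(REG, pin, value);.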
Because you tagged this with embedded I think the best answer is:
if (set)
reg |= mask; // mask is (1 << pin)
else
reg &= ~mask; // mask is (1 << pin)
(which you can wrap in a macro or inline function). The reason being that embedded architectures like AVR have bit-set and bit-clear instructions and the cost of branching is not high compared to other instructions (as it is on a modern CPU with speculative execution). GCC can identify the idioms in that if statement and produce the right instructions. A more complex version (even if it's branchless when tested on modern x86) might not assemble to the best instructions on an embedded system.
The best way to know for sure is to disassemble the results. You don't have to be an expert (especially in embedded environments) to evaluate the results.
One overlooked feature of C is bit packing, which is great for embedded work. You can define a struct to access each bit individually.
typedef struct
{
unsigned char bit0 : 1;
unsigned char bit1 : 1;
unsigned char bit2 : 1;
unsigned char bit3 : 1;
unsigned char bit4 : 1;
unsigned char bit5 : 1;
unsigned char bit6 : 1;
unsigned char bit7 : 1;
} T_BitArray;
The : 1 tells the compiler that you only want each variable to be 1 bit long. And then just access the address that your variable reg sits on, cast it to your bit array and then access the bits individually.
((T_BitArray *)&reg)->bit1 = value;
&reg is the address of your variable. ((T_BitArray *)&reg) is the same address, but now the compiler thinks of it as a T_BitArray address, and ((T_BitArray *)&reg)->bit1 provides access to the second bit. Of course, it's best to use more descriptive names than bit1.
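A variation that avoids the pointer cast is to put the byte and the bit-fields in a union. This is only a sketch: the name T_RegView is made up, and both the use of unsigned char as a bit-field type and the order in which bit-fields are laid out within the byte are implementation-defined, so which physical bit bit1 maps to depends on the compiler and target:

```c
#include <stdint.h>

/* Hypothetical union: the same storage viewed as a byte or as single bits. */
typedef union
{
    uint8_t byte;                  /* whole-register view */
    struct
    {
        unsigned char bit0 : 1;
        unsigned char bit1 : 1;
        unsigned char bit2 : 1;
        unsigned char bit3 : 1;
        unsigned char bit4 : 1;
        unsigned char bit5 : 1;
        unsigned char bit6 : 1;
        unsigned char bit7 : 1;
    } bits;                        /* per-bit view */
} T_RegView;
```

With T_RegView r; declared, r.bits.bit1 = value; writes one bit and r.byte reads the whole register back. Checking the disassembly is still the only way to know what instructions the compiler actually produces for it.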
// Macros to set, reset (clear) and toggle a bit
#include <stdio.h>
#define set(a,n) ((a) |= (1 << (n)))
#define reset(a,n) ((a) &= ~(1 << (n)))
// toggle the bit chosen by the user
#define toggle(a,n) ((a) ^= (1 << (n)))
int a,n;
int main()
{
printf("Set Reset particular Bit given by User ");
scanf("%d %d",&a,&n);
int b = set(a,n); // the other macros are called the same way
printf("%d",b);
return 0;
}
I think what you're asking is whether you can execute a write instruction on a single bit without first reading the byte that it's in. If so, then in general no, you can't. This has nothing to do with the C language; most microprocessors simply don't have instructions that address single bits in ordinary memory. Even in raw machine code, if you want to set a bit you normally have to read the byte it's in, change the bit, then write it back.
Duplicate of how do you set, clear, and toggle a single bit and I'll repost my answer too as no-one's mentioned SET and CLEAR registers yet:
As this is tagged "embedded" I'll assume you're using a microcontroller. All of the above suggestions are valid & work (read-modify-write, unions, structs, etc.).
However, during a bout of oscilloscope-based debugging I was amazed to find that these methods have a considerable overhead in CPU cycles compared to writing a value directly to the micro's PORTnSET / PORTnCLEAR registers which makes a real difference where there are tight loops / high-frequency ISR's toggling pins.
For those unfamiliar: In my example, the micro has a general pin-state register PORTn which reflects the output pins, so doing PORTn |= BIT_TO_SET results in a read-modify-write to that register.
However, the PORTnSET / PORTnCLEAR registers take a '1' to mean "please make this bit 1" (SET) or "please make this bit zero" (CLEAR) and a '0' to mean "leave the pin alone". So you end up with two port addresses, depending on whether you're setting or clearing the bit (not always convenient), but you get a much faster reaction and smaller assembled code.
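The SET/CLEAR semantics can be mimicked in plain C to make them concrete. This is purely a software model with made-up names (PortModel, port_write_set, port_write_clear); on a real microcontroller the registers are memory-mapped, their names come from the vendor's headers, and the hardware applies the change from a single store with no read-modify-write in your code:

```c
#include <stdint.h>

/* Hypothetical software model of one output port (not a real device API). */
typedef struct
{
    uint32_t latch;   /* current state of the output pins */
} PortModel;

/* Model of writing v to the SET register: each 1 bit drives its pin high. */
static void port_write_set(PortModel *p, uint32_t v)
{
    p->latch |= v;
}

/* Model of writing v to the CLEAR register: each 1 bit drives its pin low. */
static void port_write_clear(PortModel *p, uint32_t v)
{
    p->latch &= ~v;
}
```

In both cases a 0 bit in v leaves the corresponding pin untouched, which is why the caller never needs to know the port's current value.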
Consider a variable unsigned int a; in C.
Now say I want to set any i'th bit in this variable to '1'.
Note that the variable has some value. So a=(1<<i) will not work.
a=a+(1<<i) will work, but I am looking for the fastest way. Any suggestions?
Bitwise or it. e.g. a |= (1<<i)
Some useful bit manipulation macros
#define BIT_MASK(bit) (1 << (bit))
#define SET_BIT(value,bit) ((value) |= BIT_MASK(bit))
#define CLEAR_BIT(value,bit) ((value) &= ~BIT_MASK(bit))
#define TEST_BIT(value,bit) (((value) & BIT_MASK(bit)) ? 1 : 0)
The most common way to do this is:
a |= (1 << i);
This is only two operations - a shift and an OR. It's hard to see how this might be improved upon.
You should use bitwise OR for this operation.
a |= 1 << i;
You could probably use
a |= (1 << i)
But it won't make much of a difference. Performance-wise, you shouldn't see any difference.
You might be able to try building a table where you map i to a bit mask (like 2 => 0x0004 for 0000000000000100), but that's a bit unnecessary.
You could use a bitwise OR:
a |= (1 << i);
Note that this does not have the same behavior as +, which will carry if there's already a 1 in the bit you're setting.
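The difference only shows up when the bit is already set. A tiny sketch with made-up function names (set_with_or, set_with_add) makes it visible:

```c
/* Set bit i with OR: idempotent, never disturbs the other bits. */
static unsigned set_with_or(unsigned a, unsigned i)
{
    return a | (1u << i);
}

/* "Set" bit i with +: only correct when bit i is currently 0. */
static unsigned set_with_add(unsigned a, unsigned i)
{
    return a + (1u << i);
}
```

With a = 5 (binary 101) and i = 0, the OR version returns 5, while the addition returns 6 (binary 110): bit 0 has been cleared and the carry has changed bit 1. With a = 4 (binary 100) both return 5.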
The way I implemented bit flags (to quote straight out of my codebase, you can use it freely for whatever purpose, even commercial):
void SetEnableFlags(int &BitFlags, const int Flags)
{
BitFlags = (BitFlags|Flags);
}
const int EnableFlags(const int BitFlags, const int Flags)
{
return (BitFlags|Flags);
}
void SetDisableFlags(int &BitFlags, const int Flags) // by reference, like SetEnableFlags; by value the change would be lost
{
BitFlags = (BitFlags&(~Flags));
}
const int DisableFlags(const int BitFlags, const int Flags)
{
return (BitFlags&(~Flags));
}
No bitwise shift operation needed.
You might have to tidy up or alter the code to use the particular variable set you're using, but generally it should work fine.
I would like to cast unsigned int (32bit) A to unsigned short int (16bit) B in a following way:
if A <= 2^16-1 then B=A
if A > 2^16-1 then B=2^16-1
In other words to cast A but if it is > of maximum allowed value for 16bit to set it as max value.
How can this be achieved with bit operations or other non branching method?
It will work for unsigned values:
b = -!!(a >> 16) | a;
or, something similar:
static inline unsigned short int fn(unsigned int a){
return (-(a >> 16) >> 16) | a;
}
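Both expressions can be wrapped as functions and checked against a few boundary values. A minimal sketch, assuming a 32-bit unsigned input as in the question; the names sat16_a and sat16_b are made up:

```c
#include <stdint.h>

/* Variant 1: !!(a >> 16) is 1 if any high bit is set; negating gives all ones. */
static inline uint16_t sat16_a(uint32_t a)
{
    return (uint16_t)(-(uint32_t)!!(a >> 16) | a);
}

/* Variant 2: negating a non-zero high half sets the top 16 bits,
   which are then shifted down to form 0xFFFF. */
static inline uint16_t sat16_b(uint32_t a)
{
    return (uint16_t)((-(a >> 16) >> 16) | a);
}
```

For a = 0x12345: a >> 16 is 1, its negation is 0xFFFFFFFF, shifting right by 16 gives 0x0000FFFF, and OR-ing with a then truncating to 16 bits gives 0xFFFF. For any a up to 0xFFFF the high half is 0, the mask is 0, and a passes through unchanged.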
Find minimum of two integers without branching:
http://graphics.stanford.edu/~seander/bithacks.html#IntegerMinOrMax
On some rare machines where branching is very expensive and no condition move instructions exist, the above expression might be faster than the obvious approach, r = (x < y) ? x : y, even though it involves two more instructions. (Typically, the obvious approach is best, though.)
Just to kick things off, here's a brain-dead benchmark. I'm trying to get a 50/50 mix of large and small values "at random":
#include <iostream>
#include <stdint.h>
int main() {
uint32_t total = 0;
uint32_t n = 27465;
for (int i = 0; i < 1000*1000*500; ++i) {
n *= 30029; // worst PRNG in the world
uint32_t a = n & 0x1ffff;
#ifdef EMPTY
uint16_t b = a; // gives the wrong total, of course.
#endif
#ifdef NORMAL
uint16_t b = (a > 0xffff) ? 0xffff : a;
#endif
#ifdef RUSLIK
uint16_t b = (-(a >> 16) >> 16) | a;
#endif
#ifdef BITHACK
uint16_t b = a ^ ((0xffff ^ a) & -(0xffff < a));
#endif
total += b;
}
std::cout << total << "\n";
}
On my compiler (gcc 4.3.4 on cygwin with -O3), NORMAL wins, followed by RUSLIK, then BITHACK, respectively 0.3, 0.5 and 0.9 seconds slower than the empty loop. Really this benchmark means nothing, I haven't even checked the emitted code to see whether the compiler's smart enough to outwit me somewhere. But I like ruslik's anyway.
1) With an intrinsic on a CPU that natively does this sort of conversion.
2) You're probably not going to like this, but:
c = a >> 16; /* c previously declared as an unsigned short, so the shifts below stay well-defined */
/* Saturate 'c' with 1s if there are any 1s, by first propagating
1s rightward, then leftward. */
c |= c >> 8;
c |= c >> 4;
c |= c >> 2;
c |= c >> 1;
c |= c << 1;
c |= c << 2;
c |= c << 4;
c |= c << 8;
b = a | c; /* implicit truncation */
First off, the phrase "non-branching method" doesn't technically make sense when discussing C code; the optimizer may find ways to remove branches from "branchy" C code, and conversely would be entirely within its rights to replace your clever non-branching code with a branch just to spite you (or because some heuristic said it would be faster).
That aside, the simple expression:
uint16_t b = a > UINT16_MAX ? UINT16_MAX : a;
despite "having a branch", will be compiled to some sort of (branch-free) conditional move (or possibly just a saturate) by many compilers on many systems (I just tried three different compilers for ARM and Intel, and all generated a conditional move).
I would use that simple, readable expression. If and only if your compiler isn't smart enough to optimize it (or your target architecture doesn't have conditional moves), and if you have benchmark data that shows this to be a bottleneck for your program, then I would (a) find a better compiler and (b) file a bug against your compiler and only then look for clever hacks.
If you're really, truly devoted to being too clever by half, then ruslik's second suggestion is actually quite beautiful (much nicer than a generic min/max).
I have a question as described: how to perform rotate shift in C without embedded assembly. To be more concrete, how to rotate shift a 32-bit int.
I'm currently solving this problem with the help of the type long long int, but I think it's a little bit ugly and I want to know whether there is a more elegant method.
Kind regards.
(warning to future readers): Wikipedia's code produces sub-optimal asm (gcc includes a branch or cmov). See Best practices for circular shift (rotate) operations in C++ for efficient UB-free rotates.
From Wikipedia:
unsigned int _rotl(unsigned int value, int shift) {
if ((shift &= 31) == 0)
return value;
return (value << shift) | (value >> (32 - shift));
}
unsigned int _rotr(unsigned int value, int shift) {
if ((shift &= 31) == 0)
return value;
return (value >> shift) | (value << (32 - shift));
}
This answer is a duplicate of what I posted on Best-practices for compiler-friendly rotates.
See my answer on another question for the full details.
The most compiler-friendly way to express a rotate in C that avoids any Undefined Behaviour seems to be John Regehr's implementation:
#include <assert.h>
#include <limits.h> // CHAR_BIT
#include <stdint.h>
uint32_t rotl32 (uint32_t x, unsigned int n)
{
const unsigned int mask = (CHAR_BIT*sizeof(x)-1);
assert ( (n<=mask) &&"rotate by type width or more");
n &= mask; // avoid undef behaviour with NDEBUG. 0 overhead for most types / compilers
return (x<<n) | (x>>( (-n)&mask ));
}
Works for any integer type, not just uint32_t, so you could make multiple versions. This version inlines to a single rol %cl, reg (or rol $imm8, reg) on x86, because the compiler knows that the instruction already has the mask operation built-in.
I would recommend against templating this on the operand type, because you don't want to accidentally do a rotate of the wrong width when you had a 16-bit value stored in an int temporary. Especially since integer-promotion rules can turn the result of an expression involving a narrow unsigned type into an int.
Make sure you use unsigned types for x and the return value, or else it won't be a rotate. (gcc does arithmetic right shifts, shifting in copies of the sign-bit rather than zeroes, leading to a problem when you OR the two shifted values together.)
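The right rotate follows the same pattern with the two shift directions exchanged. This is a sketch of that counterpart, not code from the original answer; the assert is left out so that counts of 32 or more simply wrap around:

```c
#include <limits.h>   /* CHAR_BIT */
#include <stdint.h>

/* Rotate x right by n bits; n is reduced modulo the type width (32). */
static inline uint32_t rotr32(uint32_t x, unsigned int n)
{
    const unsigned int mask = CHAR_BIT * sizeof(x) - 1;   /* 31 */
    n &= mask;                            /* keep the shift count in 0..31 */
    return (x >> n) | (x << ((-n) & mask));
}
```

For n == 0 both shift counts are 0, so the function returns x | x, which is x, without ever shifting by the full width. For example, rotr32(0x12345678, 4) gives 0x81234567.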
Though the thread is old, I wanted to add my two cents to the discussion and propose my solution to the problem. I hope it's worth a look, but if I'm wrong, please correct me.
When I was looking for an efficient and safe way to rotate, I was actually surprised that there is no real solution to that. I found a few relevant threads here:
https://blog.regehr.org/archives/1063 (Safe, Efficient, and Portable Rotate in C/C++),
Best practices for circular shift (rotate) operations in C++
and wikipedia style (which involves branching but is safe):
uint32_t wikipedia_rotl(uint32_t value, int shift) {
if ((shift &= 31) == 0)
return value;
return (value << shift) | (value >> (32 - shift));
}
After a little bit of contemplation I discovered that modulo division would fit the criteria, as the resulting remainder is always lower than the divisor, which perfectly fits the condition of shift<32 without branching.
From mathematical point of view:
∀ x>=0, y: (x mod y) < y
In our case every (x % 32) < 32 which is exactly what we want to achieve. (And yes, I have checked that empirically and it always is <32)
#include <stdint.h>
uint32_t rotl32b_i1m (uint32_t x, uint32_t shift)
{
shift %= 32;
return (x<<shift) | (x>>((32-shift) % 32)); // second % keeps the right shift below 32 when shift == 0 (x>>32 is undefined)
}
Additionally, the modulo simplifies the process, because rotating by, let's say, 100 bits is the same as rotating a full 32 bits 3 times, which essentially changes nothing, and then by 4 more bits. So isn't it better to calculate 100%32==4 and rotate by 4 bits? It takes a single processor operation anyway and reduces the problem to a rotation by a small value plus one instruction (OK, two, as the argument has to be taken from the stack), but it's still better than branching with if() as in the "wikipedia" way.
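For an unsigned shift count, reducing modulo 32 and masking with 31 are the same operation, which is why the modulo costs no more than a single AND. A small sketch with made-up names (reduce_mod, reduce_mask):

```c
#include <stdint.h>

/* Reduce a shift count to 0..31 with the remainder operator. */
static inline uint32_t reduce_mod(uint32_t shift)
{
    return shift % 32;
}

/* Same reduction with a bit mask; identical result for unsigned values. */
static inline uint32_t reduce_mask(uint32_t shift)
{
    return shift & 31;
}
```

For 100 (binary 1100100) both give 4. The equivalence does not hold for signed counts: in C, -1 % 32 is -1, whereas -1 & 31 is 31 on two's-complement machines, so the shift parameter should stay unsigned.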
So, what you guys think of that?