C speed of comparison: Equals "==" vs Bitwise and "&" - c

Suppose I have an integer that is a power of 2, eg. 1024:
int a = 1 << 10; //works with any power of 2 no.
Now I want to check whether another integer b is the same as a. Which is faster/better (especially on weak embedded systems):
if (b == a) {}
or
if (b & a) {}
?
Sorry if this is a noob question, but couldn't find an answer using the search.
edit: thanks for many insightful answers. I could select only one of them, but all of them are welcome.

These operations are not even equivalent, because a & b will be false when both a and b are 0.
So I'd suggest expressing the semantics that you want (i.e. a == b) and letting the compiler do the optimization.
If you then measure that you have performance issues at that point, you can start analyzing/optimizing...
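To make the difference concrete, here is a minimal sketch; the function names is_same and shares_bit are invented for illustration. It shows that b & a only tests for a shared set bit, so the two forms disagree as soon as b has extra bits set or both values are 0.

```c
#include <stdbool.h>

/* True only when b holds exactly the same value as a. */
bool is_same(int b, int a)
{
    return b == a;
}

/* True whenever b and a have at least one set bit in common.
   This is what `if (b & a)` actually tests. */
bool shares_bit(int b, int a)
{
    return (b & a) != 0;
}
```

With a = 1 << 10, both functions agree for b = 1024, but shares_bit(1025, a) is true while is_same(1025, a) is false, and for a = b = 0 it is the other way round.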

The short answer is this - it depends on what sort of things you're comparing. However, in this case, I'll assume that you're comparing two variables to each other (as opposed to a variable and an immediate, etc.)
This website, although rather old, studied how many clock cycles different instructions took on the x86 platform. The two instructions we're interested in here are the "AND" instruction and the "CMP" instruction (which the compiler uses for & and == respectively). What we can see here is that both of these instructions take about 1/3 of a cycle - that is to say, you can execute 3 of them in 1 cycle on average. Compare this to the "DIV" instruction which (in 1996) took 23 cycles to execute.
However, this omits one important detail. An "AND" instruction is not sufficient to complete the behavior you're looking for. In fact, a brief compilation on x86_64 suggests that you need both an "AND" and a "TEST" instruction for the "&" version, while "==" simply uses the "CMP" instruction. Because all these instructions are otherwise equivalent in IPC, the "==" will in fact be slightly faster...as of 1996.
Nowadays, processors optimize so well at the bare metal layer that you're unlikely to notice a difference. That said, if you wanted to see for sure...simply write a test program and find out for yourself.
As noted above though, even in the case that you have a power of 2, these instructions are still not equivalent, since it doesn't work for 0. Well...I guess technically zero ISN'T a power of 2. :) However you want to spin it though, use "==".

An X86 CPU sets a flag according to how the result of any operation compares to zero.
For the ==, your compiler will either use a dedicated compare instruction or a subtraction, setting this flag in both cases. The if() is then implemented by a jump that is conditional on this bit.
For the &, another instruction is used, the logical bitwise and instruction. That too sets the flag appropriately. So, again, the next instruction will be the conditional branch.
So, the question boils down to: Is there a performance difference between a subtraction and a bitwise and instruction? And the answer is "no" on any sane architecture. Both instructions use the same ALU, both set the same flags, and this ALU is typically designed to perform a subtraction in a single clock cycle.
Bottom line: Write readable code, and don't try to microoptimize what cannot be optimized.

Related

Translate C program to other programming languages

I am trying to translate a C program. The destination language doesn't really matter, I am just trying to understand what every single part of the program is doing.
I cannot find any detail about:
variable=1;
while(variable);
I understand that this is a loop and that is true (I have read similar questions on stack overflow where a code was actually executed) but in this case there is no code related to this while. So I am wondering, is the program "sleeping" - while this while is executing?
Then, another part I don't understand is:
variable=0;
variable=variable^0x800000;
I believe that value should be 24bits but is this really needed in any other programming language that is not low level as C?
Many thanks
while(variable); implements a spin-lock; i.e. this thread will remain at this statement until variable is 0. I've introduced the term to help you search for a good technique in your new language.
It obviously burns the CPU, but can be quite an efficient way of doing this if only used for a few clock cycles. For it to work well, variable needs to be qualified with volatile.
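A rough sketch of such a busy-wait, assuming the flag is cleared by something outside this code (another thread, an interrupt handler, or hardware). The function name spin_until_clear and the max_spins bound are additions for illustration; the original loop has no timeout.

```c
/* Busy-waits while *flag is non-zero.
   `volatile` forces a fresh read of the flag on every iteration.
   Gives up after max_spins iterations (hypothetical safety bound).
   Returns 1 if the flag was seen as zero, 0 on timeout. */
int spin_until_clear(volatile int *flag, unsigned long max_spins)
{
    while (*flag) {
        if (max_spins-- == 0) {
            return 0;
        }
    }
    return 1;
}
```

Note that volatile only keeps the compiler from caching the value; it does not by itself provide the synchronization guarantees that proper atomics or locks give between threads.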
variable = variable ^ 0x800000; is an XOR operation, actually a single bit toggle in this case. (I would have preferred to see variable ^= 0x800000 in multi-threaded code.) Its exact use is probably explainable from its context. Note that the arguments of the XOR are promoted to int if they are smaller than that. It's doubtful that variable^0x800000 is a 24 bit type unless int is that size on your platform (unlikely but possible).
I am trying to translate a C program.
Don't translate a C program unless you are writing a compiler (sometimes called a transpiler, or source-to-source compiler, when the target is another programming language rather than assembler) to do that task. That is a lot of work (at least several months for a naive compiler à la TinyCC, and more probably many dozens of years).
Think in C and try to understand its semantics (much more important than its syntax).
while(variable);
that loop has an empty body. It is more readable to make that empty body apparent (semantics remain the same):
while(variable) {};
Since the body (and the test) of the loop don't change variable (it has no observable side-effect) the loop will run indefinitely as soon as the initial value of variable is non-zero. This will heat your processor.
But you might have declared that variable as volatile and then have something external changing it.
variable=variable^0x800000;
The ^ is a bitwise XOR. You are toggling (replacing 0 with 1 and 1 with 0) a single bit: 0x800000 is 1 << 23, i.e. bit 23 when counting from 0.
To answer your second question:
variable=0;
variable=variable^0x800000;
This operation is a bitwise operation called XOR.
An XOR operation is usually used to toggle bits regardless of their previous state:
0 ^ 1 = 1
1 ^ 1 = 0
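Applied to the snippet from the question, this might look like the following sketch. The helper name toggle_bit23 and the choice of uint32_t are assumptions; the question does not show the type of variable.

```c
#include <stdint.h>

/* Flips bit 23 (mask 0x800000) and leaves every other bit untouched. */
uint32_t toggle_bit23(uint32_t value)
{
    return value ^ UINT32_C(0x800000);
}
```

Starting from 0, one call yields 0x800000 and a second call yields 0 again, which is why XOR is the usual way to toggle a bit.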

Bitwise operation over a simple piece of code

Recently I came across a code that can compute the largest number given two numbers using XOR. While this looks nifty, the same thing can be achieved by a simple ternary operator or an if else. Not pertaining to just this example, but do bitwise operations have any advantage over normal code? If so, is this advantage in speed of computation or memory usage? I am assuming in bitwise operations the assembly code will look much simpler than normal code. On a related note, while programming embedded systems which is more efficient?
*Normal code refers to how you'd normally do it. For example a*2 is normal code and I can achieve the same thing with a<<1
do bitwise operations have any advantage over normal code?
Bitwise operations are normal code. Most compilers these days have optimizers that generate the same instruction for a << 1 as for a * 2. On some hardware, especially on low-powered microprocessors, shift operations take fewer CPU cycles than multiplication, but there is hardware on which this makes no difference.
In your specific case there is an advantage, though: the code with XOR avoids branching, which has a great potential of speeding up the code. When there is no branching, CPU can use pipelining to perform the same operations much faster.
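The question does not show the exact code, but one widely circulated branch-free formulation looks like the sketch below. The name max_xor is made up, and it assumes two's complement integers, so that -(x < y) is either 0 or all ones.

```c
/* Branch-free maximum of two ints.
   (x < y) evaluates to 0 or 1, so -(x < y) is 0 or -1 (all bits set).
   With an all-ones mask the result is x ^ (x ^ y) == y,
   with a zero mask the result is x ^ 0 == x. */
int max_xor(int x, int y)
{
    return x ^ ((x ^ y) & -(x < y));
}
```

Whether this beats a plain `x > y ? x : y` depends on the compiler and the target; many compilers already turn the ternary into a conditional move without a branch.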
when programming embedded systems which is more efficient?
Embedded systems often have less powerful CPUs, so bitwise operations do have an advantage. For example, on 68HC11 CPU multiplication takes 10 cycles, while shifting left takes only 3.
Note, however, that it does not mean that you should be using bitwise operations explicitly. Most compilers, including embedded ones, will convert multiplication by a constant to a sequence of shifts and additions in case it saves CPU cycles.
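As an illustration of that kind of rewrite, multiplication by 10 can be decomposed by hand as 8a + 2a. This is only a sketch of what a compiler may do internally; the function names are invented.

```c
/* a * 2 written as a shift. */
unsigned times_two(unsigned a)
{
    return a << 1;
}

/* a * 10 written as (a * 8) + (a * 2), using shifts and one addition. */
unsigned times_ten(unsigned a)
{
    return (a << 3) + (a << 1);
}
```

In ordinary code, writing a * 10 and checking the generated assembly is usually the better choice, since the compiler picks whichever form is cheaper on the target.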
Bitwise operators generally have the advantage of being constant time, regardless of input values. Conditional moves and branches may be the target of timing attacks in certain applications, such as crypto libraries, while bitwise operations are not subject to such attacks. (Disregarding cache timing attacks, etc.)
Generally, if a processor is capable of pipelining, it would be more efficient to use bitwise operations than conditional moves or branches, bypassing the entire branch prediction problem. This may or may not speed up your resulting code.
You do have to be careful, though, as some operations constitute undefined behavior in C, such as left-shifting a negative signed integer or shifting by at least the width of the type. For this reason, it may be to your advantage to do things the "normal" way.
On some platforms branches are expensive, so finding a way to get the min(x,y) without branching has some merit. I think this is particularly useful in CUDA, where the pipelines in the hardware are long.
Of course, on other platforms (like ARM) with conditional execution and compilers that emit those op-codes, it boils down to a compare and a conditional move (two instructions) with no pipeline bubble. Almost certainly better than a compare, and a few logical operations.
Since the poster asks it with the Embedded tag listed, I will try to reflect primarily that in my answer.
In short, usually you shouldn't try to be "creative" with your coding, since it becomes harder to understand later! (The old saying, "premature optimization is the root of all evil")
So only do anything like this when you know precisely what you are doing, and in any other case, try to write the most understandable C code.
Well, this was the general part, now let's get to what such tricks could do and how they could affect the execution time.
First thing, in embedded, it is good to check the disassembly listing. If you use a variant of GCC with -O2 optimizations, you can usually assume it is quite clever at understanding what the code is meant to do, and will produce a result that is likely fine. It can even apply such tricks by itself if it "sees" that they will be faster on the target CPU, so you don't need to ruin the understandability of your code with tricks. With other compilers, results may vary; when in doubt, inspect the assembly listing to see whether execution times could be improved with such bit-hack tricks.
On the usual embedded platform, especially at 8 bits, you don't need to care that much about pipeline (and related, branch mispredictions) since it is short (or nonexistent). So you usually gain nothing by eliminating a conditional at the cost of an arithmetic operation, and could actually ruin performance by utilizing some elaborate hacks.
On faster 32 bit CPUs there is usually a longer pipeline and a branch predictor to avoid flushes (which cost many cycles), so eliminating conditionals may pay off. But only if they are of such a nature that the branch predictor cannot guess them right (such as comparisons on "random" data); otherwise the conditionals may still be the better choice, taking minimal time (a single cycle, or even "less" if the CPU can process more than one operation per cycle) when they are predicted right.

Operator performance in loops

What is quicker in C: operator != or >?
I am asking because what if we have a large amount of loops and we have to use one of the above conditions (while(x!=-1) or while(x>0)).
Also what about other languages.
On most modern processors it will not make any difference.
This is usually compiled as a comparison instruction, which sets certain flags followed by a jump which branches on the combination of some of the flags. There is generally no timing difference between the relational operators.
Some optimizations might omit the branching jumps, but then it is not possible to tell which operator will be more performant, if any. It probably depends on the context.
Of course, if you really want to know for sure, you'll have to do a few test runs and/or profile the code.
hypothetical hardware, version one: (x!=-1)
cmp %r1, -1
jeq addr
same hypothetical hardware, version two: (x>0)
cmp %r1, 0
jle addr
Unless we know the exact hardware, we can't tell, but generally expect them to be the same or similar.
Either way, I would recommend the version that most clearly expresses intent.

Are bitwise operations still practical?

Wikipedia, the one true source of knowledge, states:
On most older microprocessors, bitwise operations are slightly faster than addition and subtraction operations and usually significantly faster than multiplication and division operations. On modern architectures, this is not the case: bitwise operations are generally the same speed as addition (though still faster than multiplication).
Is there a practical reason to learn bitwise operation hacks or it is now just something you learn for theory and curiosity?
Bitwise operations are worth studying because they have many applications. It is not their main use to substitute arithmetic operations. Cryptography, computer graphics, hash functions, compression algorithms, and network protocols are just some examples where bitwise operations are extremely useful.
The lines you quoted from the Wikipedia article just tried to give some clues about the speed of bitwise operations. Unfortunately the article fails to provide some good examples of applications.
Bitwise operations are still useful. For instance, they can be used to create "flags" using a single variable, and save on the number of variables you would use to indicate various conditions. Concerning performance on arithmetic operations, it is better to leave the compiler do the optimization (unless you are some sort of guru).
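A small sketch of the "flags in a single variable" idea; the flag names and helper functions here are invented for illustration.

```c
/* Each condition gets its own bit in one unsigned variable. */
enum status_flags {
    FLAG_READY = 1 << 0,
    FLAG_ERROR = 1 << 1,
    FLAG_BUSY  = 1 << 2
};

/* Returns state with the given flag bit switched on. */
unsigned set_flag(unsigned state, unsigned flag)
{
    return state | flag;
}

/* Returns state with the given flag bit switched off. */
unsigned clear_flag(unsigned state, unsigned flag)
{
    return state & ~flag;
}

/* Non-zero result means the flag bit is currently set. */
int has_flag(unsigned state, unsigned flag)
{
    return (state & flag) != 0;
}
```

Several conditions can then be passed around, stored, or tested together as one integer instead of as a handful of separate variables.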
They're useful for getting to understand how binary "works"; otherwise, no. In fact, I'd say that even if the bitwise hacks are faster on a given architecture, it's the compiler's job to make use of that fact — not yours. Write what you mean.
The only case where it makes sense to use them is if you're actually using your numbers as bitvectors. For instance, if you're modeling some sort of hardware and the variables represent registers.
If you want to perform arithmetic, use the arithmetic operators.
Depends what your problem is. If you are controlling hardware you need ways to set single bits within an integer.
Buy an OGD1 PCI board (open graphics card) and talk to it using libpci. http://en.wikipedia.org/wiki/Open_Graphics_Project
It is true that in most cases when you multiply an integer by a constant that happens to be a power of two, the compiler optimises it to use the bit-shift. However, when the shift amount is itself a variable, the compiler cannot deduce it unless you explicitly use the shift operation.
Funny nobody saw fit to mention the ctype[] array in C/C++ - also implemented in Java. This concept is extremely useful in language processing, especially when using different alphabets, or when parsing a sentence.
ctype[] is an array of 256 short integers, and in each integer, there are bits representing different character types. For example, ctype['A'] through ctype['Z'] have bits set to show they are upper-case letters of the alphabet; ctype['0'] through ctype['9'] have bits set to show they are numeric. To see if a character x is alphanumeric, you can write something like 'if (ctype[x] & (UC | LC | NUM))' which is somewhat faster and much more elegant than writing 'if (('A' <= x && x <= 'Z') || ....'.
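A minimal sketch of such a table, not the actual library implementation. The names ctype_table, init_ctype_table and is_alnum_table are hypothetical, and the initialisation loops assume an ASCII-compatible character set in which the letters are contiguous.

```c
/* One bit per character class, as in the description above. */
enum { UC = 1, LC = 2, NUM = 4 };

/* Hypothetical lookup table indexed by character code. */
static unsigned short ctype_table[256];

/* Fills the table; assumes ASCII-style contiguous letter ranges. */
void init_ctype_table(void)
{
    int c;
    for (c = 'A'; c <= 'Z'; c++) ctype_table[c] |= UC;
    for (c = 'a'; c <= 'z'; c++) ctype_table[c] |= LC;
    for (c = '0'; c <= '9'; c++) ctype_table[c] |= NUM;
}

/* One table lookup and one AND replace a chain of range comparisons. */
int is_alnum_table(unsigned char x)
{
    return (ctype_table[x] & (UC | LC | NUM)) != 0;
}
```

In real code the standard isalnum() from <ctype.h> does the same job; the table above only illustrates the bit-mask idea.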
Once you start thinking bitwise, you find lots of places to use it. For instance, I had two text buffers. I wrote one to the other, replacing all occurrences of FINDstring with REPLACEstring as I went. Then for the next find-replace pair, I simply switched the buffer indices, so I was always writing from buffer[in] to buffer[out]. 'in' started as 0, 'out' as 1. After completing a copy I simply wrote 'in ^= 1; out ^= 1;'. And after handling all the replacements I just wrote buffer[out] to disk, not needing to know what 'out' was at that time.
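The index swap can be sketched as follows; the function name swap_buffer_roles is made up, and it only works because the two indices are restricted to 0 and 1.

```c
/* XOR with 1 flips 0 to 1 and 1 to 0, so the source and
   destination buffers trade places after each pass. */
void swap_buffer_roles(int *in, int *out)
{
    *in ^= 1;
    *out ^= 1;
}
```

Starting with in = 0 and out = 1, one call gives in = 1 and out = 0, and the next call restores the original assignment.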
If you think this is low-level, consider that certain mental errors such as deja-vu and its twin jamais-vu are caused by cerebral bit errors!
Working with IPv4 addresses frequently requires bit-operations to discover if a peer's address is within a routable network or must be forwarded onto a gateway, or if the peer is part of a network allowed or denied by firewall rules. Bit operations are required to discover the broadcast address of a network.
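A hedged sketch of those two checks, assuming addresses and netmasks are held as 32-bit values in host byte order; the function names are invented for illustration.

```c
#include <stdint.h>

/* Two hosts are on the same network when their addresses
   agree in every bit covered by the netmask. */
int same_network(uint32_t a, uint32_t b, uint32_t netmask)
{
    return ((a ^ b) & netmask) == 0;
}

/* The broadcast address keeps the network bits and
   sets every host bit to 1. */
uint32_t broadcast_address(uint32_t addr, uint32_t netmask)
{
    return (addr & netmask) | (uint32_t)~netmask;
}
```

For example, with 192.168.1.10 (0xC0A8010A) and netmask 255.255.255.0 (0xFFFFFF00), the address 192.168.1.200 is on the same network, 192.168.2.1 is not, and the broadcast address is 192.168.1.255 (0xC0A801FF).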
Working with IPv6 addresses requires the same fundamental bit-level operations, but because they are so long, I'm not sure how they are implemented. I'd wager money that they are still implemented using the bit operators on pieces of the data, sized appropriately for the architecture.
Of course (to me) the answer is yes: there can be practical reasons to learn them. The fact that nowadays, e.g., an add instruction on typical processors is as fast as an or/xor or an and just means that: an add is as fast as, say, an or on those processors.
The improvements in speed of instructions like add, divide, and so on just mean that on those processors you can now use them while being less worried about the performance impact; but it is as true now as in the past that you usually won't rewrite every add as a sequence of bitwise operations. That is, it depends on the hack: some hacks must now be considered educational and no longer practical; others may still have practical applications.

Is "faked subtraction" ever used in the real world?

I'm taking a Computer Systems class as a pre-req for my Masters and came across something I found fascinating and hard to see practical use of and that is "faking subtraction" and the fact that there doesn't need to be a subtraction instruction.
Something like:
x - y
Can be written as:
x + (~y + 1)
Now, that's all well and good but it seems like that is overly complicated for a simple subtraction, especially when you could just easily put "x - y". Are there situations where it would be necessary to do this, or is it just something that CAN be done but isn't.
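The identity is easy to check in C with unsigned arithmetic, where wrap-around is well defined. The following is a small sketch; the name fake_sub and the use of uint32_t are choices made here, not part of the question.

```c
#include <stdint.h>

/* Computes x - y without a subtraction operator:
   ~y + 1 is the two's complement negation of y, so adding it
   to x gives x - y modulo 2^32. */
uint32_t fake_sub(uint32_t x, uint32_t y)
{
    return (uint32_t)(x + (~y + 1u));
}
```

The result matches x - y even when y is larger than x, because both expressions wrap around modulo 2^32 in the same way.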
This is often how it's done at the hardware level (i.e. inside the ALU).
At the software level, it's generally useless, as it can never be more efficient than the straightforward subtraction (unless you have a truly bizarre compiler/platform combination).
The two's complement implementation is done in hardware, so you do not need to implement them like that for builtin datatypes.
If you are making an n-bit integer arithmetic library, then you need to emulate the integer addition, subtraction, multiplication and division etc operations, in which case such a technique might be implemented to add the n-bit length numbers, but using the carry flag to do so is a better implementation in my opinion.
It should be obvious that that is how subtraction is done internally, so I'm not sure what you mean by "being used in the real world". This is why two's complement was chosen in the first place, because subtraction is just overflowing negative addition.
I do not see any reason to do it in your C code. Doing it in software is no faster than subtracting using the minus operator - and is a lot more unclear.
However, that is the way processors execute subtraction. I bet you have seen this code as an example of what hardware does, since it is easier to see how x + (~y + 1) will become a logic circuit.
So... no, you will not use this code in real world, but this operation is executed a lot of times in your processor.
I couldn't see the point of doing this. It is not any more efficient. In fact if it's not optimised out by the compiler it ends up generating more opcodes.
Stuff like this was more common back before CPUs had billions of transistors to play with. A particular CPU might not implement a specific subtract opcode, and so a compiler (or assembly program) targeting it would have to know that trick.
These manipulations can also help you understand the internal implementation of CPUs. For example, a CPU's division operations are sometimes accomplished by taking the reciprocal of the divisor and multiplying it by the dividend; the reciprocal is the only actual "division" being performed.
