Related
Recently I came across a code that can compute the largest number given two numbers using XOR. While this looks nifty, the same thing can be achieved by a simple ternary operator or an if else. Not pertaining to just this example, but do bitwise operations have any advantage over normal code? If so, is this advantage in speed of computation or memory usage? I am assuming in bitwise operations the assembly code will look much simpler than normal code. On a related note, while programming embedded systems which is more efficient?
*Normal code refers to how you'd normally do it. For example a*2 is normal code and I can achieve the same thing with a<<1
do bitwise operations have any advantage over normal code?
Bitwise operations are normal code. Most compilers these days have optimizers that generate the same instruction for a << 1 as for a * 2. On some hardware, especially on low-powered microprocessors, shift operations take fewer CPU cycles than multiplication, but there is hardware on which this makes no difference.
In your specific case there is an advantage, though: the code with XOR avoids branching, which has a great potential of speeding up the code. When there is no branching, CPU can use pipelining to perform the same operations much faster.
when programming embedded systems which is more efficient?
Embedded systems often have less powerful CPUs, so bitwise operations do have an advantage. For example, on 68HC11 CPU multiplication takes 10 cycles, while shifting left takes only 3.
Note, however, that it does not mean that you should be using bitwise operations explicitly. Most compilers, including embedded ones, will convert multiplication by a constant to a sequence of shifts and additions in case it saves CPU cycles.
Bitwise operators generally have the advantage of being constant time, regardless of input values. Conditional moves and branches may be the target of timing attacks in certain applications, such as crypto libraries, while bitwise operations are not subject to such attacks. (Disregarding cache timing attacks, etc.)
Generally, if a processor is capable of pipelining, it would be more efficient to use bitwise operations than conditional moves or branches, bypassing the entire branch prediction problem. This may or may not speed up your resulting code.
You do have to be careful, though, as some operations constitute undefined behavior in C, such as shifting signed integers, etc. For this reason, it may be to your advantage to do things the "normal" way.
On some platforms branches are expensive, so finding a way to get the min(x,y) without branching has some merit. I think this is particularly useful in CUDA, where the pipelines in the hardware are long.
Of course, on other platforms (like ARM) with conditional execution and compilers that emit those op-codes, it boils down to a compare and a conditional move (two instructions) with no pipeline bubble. Almost certainly better than a compare, and a few logical operations.
Since the poster asks it with the Embedded tag listed, I will try to reflect primarily that in my answer.
In short, usually you shouldn't try to be "creative" with your coding, since it becomes harder to understand later! (The old saying, "premature optimization is the root of all evils")
So only do anything alike when you know what you are doing, precisely, and in any other case, try to write the most understandable C code.
Well, this was the general part, now lets get on what such tricks could do, how they could affect the execution time.
First thing, in embedded, it is good to check the disassembly listing. If you use a variant of GCC with -O2 optimizations, you can usually assume it is quite clever understanding what the code is meant to do, and will produce the result which is likely fine. It can even use such tricks by itself figuring out the code, if it "sees" that it will be faster on the target CPU, so you don't need to ruin the understandability of your code with tricks. With other compilers, results may vary, in doubt, the assembly listing should be observed to see if execution times could be improved utilizing such bit hack tricks.
On the usual embedded platform, especially at 8 bits, you don't need to care that much about pipeline (and related, branch mispredictions) since it is short (or nonexistent). So you usually gain nothing by eliminating a conditional at the cost of an arithmetic operation, and could actually ruin performance by utilizing some elaborate hacks.
On faster 32 bit CPUs usually there is a longer pipeline and branch predictor to eliminate flushes (costing many cycles), so eliminating conditionals may pay off. But only if they are of such nature that the branch predictor can not guess them right (such as comparisons on "random" data), otherwise the conditionals may still be the better, taking the most minimal time (single cycle or even "less" if the CPU is capable to process more than one operation per cycle) when they were predicted right.
I wrote an Ansi C compiler for a friend's custom 16-bit stack-based CPU several years ago but I never got around to implementing all the data types. Now I would like to finish the job so I'm wondering if there are any math libraries out there that I can use to fill the gaps. I can handle 16-bit integer data types since they are native to the CPU and therefore I have all the math routines (ie. +, -, *, /, %) done for them. However, since his CPU does not handle floating point then I have to implement floats/doubles myself. I also have to implement the 8-bit and 32-bit data types (bother integer and floats/doubles). I'm pretty sure this has been done and redone many times and since I'm not particularly looking forward to recreating the wheel I would appreciate it if someone would point me at a library that can help me out.
Now I was looking at GMP but it seems to be overkill (library must be absolutely huge, not sure my custom compiler would be able to handle it) and it takes numbers in the form of strings which would be wasteful for obvious reasons. For example :
mpz_set_str(x, "7612058254738945", 10);
mpz_set_str(y, "9263591128439081", 10);
mpz_mul(result, x, y);
This seems simple enough, I like the api... but I would rather pass in an array rather than a string. For example, if I wanted to multiply two 32-bit longs together I would like to be able to pass it two arrays of size two where each array contains two 16-bit values that actually represent a 32-bit long and have the library place the output into an output array. If I needed floating point then I should be able to specify the precision as well.
This may seem like asking for too much but I'm asking in the hopes that someone has seen something like this.
Many thanks in advance!
Let's divide the answer.
8-bit arithmetic
This one is very easy. In fact, C already talks about this under the term "integer promotion". This means that if you have 8-bit data and you want to do an operation on them, you simply pad them with zero (or one if signed and negative) to make them 16-bit. Then you proceed with the normal 16-bit operation.
32-bit arithmetic
Note: so long as the standard is concerned, you don't really need to have 32-bit integers.
This could be a bit tricky, but it is still not worth using a library for. For each operation, you would need to take a look at how you learned to do them in elementary school in base 10, and then do the same in base 216 for 2 digit numbers (each digit being one 16-bit integer). Once you understand the analogy with simple base 10 math (and hence the algorithms), you would need to implement them in assembly of your CPU.
This basically means loading the most significant 16 bit on one register, and the least significant in another register. Then follow the algorithm for each operation and perform it. You would most likely need to get help from overflow and other flags.
Floating point arithmetic
Note: so long as the standard is concerned, you don't really need to conform to IEEE 754.
There are various libraries already written for software emulated floating points. You may find this gcc wiki page interesting:
GNU libc has a third implementation, soft-fp. (Variants of this are also used for Linux kernel math emulation on some targets.) soft-fp is used in glibc on PowerPC --without-fp to provide the same soft-float functions as in libgcc. It is also used on Alpha, SPARC and PowerPC to provide some ABI-specified floating-point functions (which in turn may get used by GCC); on PowerPC these are IEEE quad functions, not IBM long double ones.
Performance measurements with EEMBC indicate that soft-fp (as speeded up somewhat using ideas from ieeelib) is about 10-15% faster than fp-bit and ieeelib about 1% faster than soft-fp, testing on IBM PowerPC 405 and 440. These are geometric mean measurements across EEMBC; some tests are several times faster with soft-fp than with fp-bit if they make heavy use of floating point, while others don't make significant use of floating point. Depending on the particular test, either soft-fp or ieeelib may be faster; for example, soft-fp is somewhat faster on Whetstone.
One answer could be to take a look at the source code for glibc and see if you could salvage what you need.
Wikipedia, the one true source of knowledge, states:
On most older microprocessors, bitwise
operations are slightly faster than
addition and subtraction operations
and usually significantly faster than
multiplication and division
operations. On modern architectures,
this is not the case: bitwise
operations are generally the same
speed as addition (though still faster
than multiplication).
Is there a practical reason to learn bitwise operation hacks or it is now just something you learn for theory and curiosity?
Bitwise operations are worth studying because they have many applications. It is not their main use to substitute arithmetic operations. Cryptography, computer graphics, hash functions, compression algorithms, and network protocols are just some examples where bitwise operations are extremely useful.
The lines you quoted from the Wikipedia article just tried to give some clues about the speed of bitwise operations. Unfortunately the article fails to provide some good examples of applications.
Bitwise operations are still useful. For instance, they can be used to create "flags" using a single variable, and save on the number of variables you would use to indicate various conditions. Concerning performance on arithmetic operations, it is better to leave the compiler do the optimization (unless you are some sort of guru).
They're useful for getting to understand how binary "works"; otherwise, no. In fact, I'd say that even if the bitwise hacks are faster on a given architecture, it's the compiler's job to make use of that fact — not yours. Write what you mean.
The only case where it makes sense to use them is if you're actually using your numbers as bitvectors. For instance, if you're modeling some sort of hardware and the variables represent registers.
If you want to perform arithmetic, use the arithmetic operators.
Depends what your problem is. If you are controlling hardware you need ways to set single bits within an integer.
Buy an OGD1 PCI board (open graphics card) and talk to it using libpci. http://en.wikipedia.org/wiki/Open_Graphics_Project
It is true that in most cases when you multiply an integer by a constant that happens to be a power of two, the compiler optimises it to use the bit-shift. However, when the shift is also a variable, the compiler cannot deduct it, unless you explicitly use the shift operation.
Funny nobody saw fit to mention the ctype[] array in C/C++ - also implemented in Java. This concept is extremely useful in language processing, especially when using different alphabets, or when parsing a sentence.
ctype[] is an array of 256 short integers, and in each integer, there are bits representing different character types. For example, ctype[;A'] - ctype['Z'] have bits set to show they are upper-case letters of the alphabet; ctype['0']-ctype['9'] have bits set to show they are numeric. To see if a character x is alphanumeric, you can write something like 'if (ctype[x] & (UC | LC | NUM))' which is somewhat faster and much more elegant than writing 'if ('A' = x <= 'Z' || ....'.
Once you start thinking bitwise, you find lots of places to use it. For instance, I had two text buffers. I wrote one to the other, replacing all occurrences of FINDstring with REPLACEstring as I went. Then for the next find-replace pair, I simply switched the buffer indices, so I was always writing from buffer[in] to buffer[out]. 'in' started as 0, 'out' as 1. After completing a copy I simply wrote 'in ^= 1; out ^= 1;'. And after handling all the replacements I just wrote buffer[out] to disk, not needing to know what 'out' was at that time.
If you think this is low-level, consider that certain mental errors such as deja-vu and its twin jamais-vu are caused by cerebral bit errors!
Working with IPv4 addresses frequently requires bit-operations to discover if a peer's address is within a routable network or must be forwarded onto a gateway, or if the peer is part of a network allowed or denied by firewall rules. Bit operations are required to discover the broadcast address of a network.
Working with IPv6 addresses requires the same fundamental bit-level operations, but because they are so long, I'm not sure how they are implemented. I'd wager money that they are still implemented using the bit operators on pieces of the data, sized appropriately for the architecture.
Of course (to me) the answer is yes: there can be practical reasons to learn them. The fact that nowadays, e.g., an add instruction on typical processors is as fast as an or/xor or an and just means that: an add is as fast as, say, an or on those processors.
The improvements in speed of instructions like add, divide, and so on, just means that now on those processors you can use them and being less worried about performance impact; but it is true now as in the past that you usually won't change every adds to bitwise operations to implement an add. That is, in some cases it may depend on which hacks: likely some hack now must be considered educational and not practical anymore; others could have still their practical application.
I'm taking a Computer Systems class as a pre-req for my Masters and came across something I found fascinating and hard to see practical use of and that is "faking subtraction" and the fact that there doesn't need to be a subtraction instruction.
Something like:
x - y
Can be written as:
x + (~y + 1)
Now, that's all well and good but it seems like that is overly complicated for a simple subtraction, especially when you could just easily put "x - y". Are there situations where it would be necessary to do this, or is it just something that CAN be done but isn't.
This is often how it's done at the hardware level (i.e. inside the ALU).
At the software level, it's generally useless, as it can never be more efficient than the straightfoward subtraction (unless you have a truly bizarre compiler/platform combination).
The two's complement implementation is done in hardware, so you do not need to implement them like that for builtin datatypes.
If you are making an n-bit integer arithmetic library, then you need to emulate the integer addition, subtraction, multiplication and division etc operations, in which case such a technique might be implemented to add the n-bit length numbers, but using the carry flag to do so is a better implementation in my opinion.
It should be obvious that that is how substraction is done internally, so I'm not sure what you mean by "being used in the real world". This is why two's complement was chosen in the first place, because subtraction is just overflowing negative addition.
I do not see any reason to do it in your C code. Doing it in software is no faster than subtracting using the minus operator - and is a lot more unclear.
However, that is the way processors execute subtraction. I bet you have seen this code as an example of what hardware does, since it is easier to see how x + (~y + 1) will become a logic circuit.
So... no, you will not use this code in real world, but this operation is executed a lot of times in your processor.
I couldn't see the point of doing this. It is not anymore efficient. In fact if it's not optimised out by the compiler it ends up generating more opcodes.
Stuff like this was more common back before CPU's had billions of transistors to play with. A particular CPU might not implement a specific subtract opcode, and so a compiler (or assembly program) targeting it would have to know that trick.
These manipulations can also help you understand the internal implementation of CPU's. For example, CPU's division operations are sometimes accomplished by taking the reciprocal of the divisor and multiplying it by the dividend; the reciprocal is the only actual "division" being performed.
1) I've got many constants in my C algo.
2) my code works both in floating-point and fixed-point.
Right now, these constants are initialized by a function, float2fixed, whereby in floating-point it does nothing, while in fixed-point, it finds their fixed-point representation. For instance, 0.5f stays 0.5f if working in floating-point, whereas it uses the pow() routine and becomes 32768 if working in fixed-point and the fixed-point representation is Qx.16.
That's easy to maintain, but it takes a lot of time actually to compute these constants in fixed-point (pow is a floatin-point function). In C++, I'd use some meta-programming, so the compiler computes these values at compile-time, so there's no hit at run-time. But in C, thats not possible. Or is it? Anybody knows of such a trick? Is any compiler clever enough to do that?
Looking forward to any answers.
A
Rather than using (unsigned)(x*pow(2,16)) to do your fixed point conversion, write it as (unsigned)(0.5f * (1 << 16))
This should be an acceptable as a compile-time constant expression since it involves only builtin operators.
When using fixed-point, can you write a program that takes your floating point values and converts them into correct, constant initializers for the fixed point type, so you effectively add a step to the compilation that generates the fixed point values.
One advantage of this will be that you can then define and declare your constants with const so that they won't change at run-time - whereas with the initialization functions, of course, the values have to be modifiable because they are calculated once.
I mean write a simple program that can scan for formulaic lines that might read:
const double somename = 3.14159;
it would read that and generate:
const fixedpoint_t somename = { ...whatever is needed... };
You design the operation to make it easy to manage for both notations - so maybe your converter always reads the file and sometimes rewrites it.
datafile.c: datafile.constants converter
converter datafile.constants > datafile.c
In plain C, there's not much you can do. You need to do the conversion at some point, and the compiler doesn't give you any access to call interesting user-provided functions at compile time. Theoretically, you could try to coax the preprocessor to do it for you, but that's the quick road to total insanity (i.e. you'd have to implement pow() in macros, which is pretty hideous).
Some options I can think of:
Maintain a persistent cache on disk. At least then it'd only be slow once, though you still have to load it, make sure it's not corrupt, etc.
As mentioned in another comment, use template metaprogramming anyway and compile with a C++ compiler. Most C works just fine (arguably better) with a C++ compiler.
Hmm, I guess that's about all I can think of. Good luck.
Recent versions of GCC ( around 4.3 ) added the ability to use GMP and MPFR to do some compile-time optimisations by evaluating more complex functions that are constant. That approach leaves your code simple and portable, and trust the compiler to do the heavy lifting.
Of course, there are limits to what it can do, and it would be hard to know if it's optimizing a given instance without going and looking at the assembly. But it might be worth checking out. Here's a link to the description in the changelog