Optimized way to find new bits turned on - C

We need to find the bits newly turned ON in the interlock status received from the device, compared to the last status read. This is for raising error codes for bits that are newly set. I am using the following statement:
bits_on = ~last_status & new_status;
Is there any better way to do this?

It's only 2 operations and an assignment, so the only way to improve it would be to do it in 1 operation and an assignment. It doesn't correspond to any of the simple C bit manipulation operators, so doing it in 1 operation is not possible.
However, depending on your architecture your compiler might actually already be compiling it to a single instruction.
ANDN (Logical AND NOT) is part of the BMI1 instruction set, and is equivalent to ~x & y.
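To illustrate, here is a minimal sketch wrapping the pattern in a function; whether the compiler actually emits ANDN depends on your flags and target, so treat the comment as a hint, not a guarantee:

#include <stdint.h>

/* Bits set in new_status that were clear in last_status.
 * With BMI1 enabled (e.g. gcc/clang -mbmi), the body may compile
 * to a single ANDN instruction on x86-64. */
static inline uint32_t new_bits_on(uint32_t last_status, uint32_t new_status)
{
    return ~last_status & new_status;
}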

Related

C speed of comparison: Equals "==" vs Bitwise and "&"

Suppose I have an integer that is a power of 2, eg. 1024:
int a = 1 << 10; // works with any power-of-2 number
Now I want to check whether another integer b is the same as a. Which is faster/better (especially on weak embedded systems):
if (b == a) {}
or
if (b & a) {}
?
Sorry if this is a noob question, but couldn't find an answer using the search.
edit: thanks for many insightful answers. I could select only one of them, but all of them are welcome.
These operations are not even equivalent: a & b will be false when both a and b are 0, and it will be true whenever b merely shares the bit with a (e.g. b = a | 1, which is 1025 for a = 1024 and clearly not equal to a).
So I'd suggest expressing the semantics that you want (i.e. a == b) and letting the compiler do the optimization.
If you then measure and find that you have performance issues at that point, you can start analyzing/optimizing...
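To make that concrete, a minimal demonstration (values picked for illustration):

#include <stdio.h>

int main(void)
{
    int a = 1 << 10;   /* 1024 */
    int b = a | 1;     /* 1025: shares a's bit, but is not equal to a */

    printf("%d\n", b == a);        /* prints 0 */
    printf("%d\n", (b & a) != 0);  /* prints 1 -- a different answer */
    return 0;
}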
The short answer is this - it depends on what sort of things you're comparing. However, in this case, I'll assume that you're comparing two variables to each other (as opposed to a variable and an immediate, etc.)
This website, although rather old, studied how many clock cycles different instructions took on the x86 platform. The two instructions we're interested in here are the "AND" instruction and the "CMP" instruction (which the compiler uses for & and == respectively). What we can see here is that both of these instructions take about 1/3 of a cycle - that is to say, you can execute 3 of them in 1 cycle on average. Compare this to the "DIV" instruction which (in 1996) took 23 cycles to execute.
However, this omits one important detail. An "AND" instruction is not sufficient to complete the behavior you're looking for. In fact, a brief compilation on x86_64 suggests that you need both an "AND" and a "TEST" instruction for the "&" version, while "==" simply uses the "CMP" instruction. Because all these instructions are otherwise equivalent in IPC, the "==" will in fact be slightly faster...as of 1996.
Nowadays, processors optimize so well at the bare metal layer that you're unlikely to notice a difference. That said, if you wanted to see for sure...simply write a test program and find out for yourself.
As noted above though, even in the case that you have a power of 2, these instructions are still not equivalent, since it doesn't work for 0. Well...I guess technically zero ISN'T a power of 2. :) However you want to spin it though, use "==".
An X86 CPU sets a flag according to how the result of any operation compares to zero.
For the ==, your compiler will either use a dedicated compare instruction or a subtraction, setting this flag in both cases. The if() is then implemented by a jump that is conditional on this bit.
For the &, another instruction is used: the logical bitwise AND instruction. That too sets the flag appropriately. So, again, the next instruction will be the conditional branch.
So, the question boils down to: Is there a performance difference between a subtraction and a bitwise and instruction? And the answer is "no" on any sane architecture. Both instructions use the same ALU, both set the same flags, and this ALU is typically designed to perform a subtraction in a single clock cycle.
Bottom line: Write readable code, and don't try to micro-optimize what cannot be optimized.

Translate C program to other programming languages

I am trying to translate a C program. The destination language doesn't really matter, I am just trying to understand what every single part of the program is doing.
I cannot find any detail about:
variable=1;
while(variable);
I understand that this is a loop and that the condition is true (I have read similar questions on Stack Overflow where some code was actually executed), but in this case there is no code attached to this while. So I am wondering: is the program "sleeping" while this while is executing?
Then, another part I don't understand is:
variable=0;
variable=variable^0x800000;
I believe that value should be 24 bits, but is this really needed in any other programming language that is not as low-level as C?
Many thanks
while(variable); implements a spin-lock; i.e. this thread will remain at this statement until variable is 0. I've introduced the term to help you search for a good technique in your new language.
It obviously burns the CPU, but can be quite an efficient way of doing this if only used for a few clock cycles. For it to work well, variable needs to be qualified with volatile.
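A minimal sketch of the idiom; the flag name and the interrupt handler are hypothetical, assuming something external (here an ISR) clears the flag:

#include <stdbool.h>

static volatile bool busy;   /* volatile: modified outside normal flow */

void wait_for_transfer(void) /* hypothetical caller */
{
    busy = true;
    /* ... start the hardware transfer here ... */
    while (busy)             /* spin until the ISR clears the flag */
        ;
}

void transfer_done_isr(void) /* hypothetical interrupt handler */
{
    busy = false;
}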
variable = variable ^ 0x800000; is an XOR operation, actually a single bit toggle in this case. (I would have preferred to see variable ^= 0x800000 in multi-threaded code.) Its exact use is probably explainable from its context. Note that the arguments of the XOR are promoted to int if they are smaller than that. It's doubtful that variable^0x800000 is a 24 bit type unless int is that size on your platform (unlikely but possible).
I am trying to translate a C program.
Don't translate a C program, unless you are writing a compiler (sometimes called a transpiler, or source-to-source compiler, when translating to some programming language other than assembly) which would do such a task. And you'll need a lot of work (at least several months for a naive compiler à la TinyCC, and more probably many dozens of years).
Think in C and try to understand its semantics (much more important than its syntax).
while(variable);
that loop has an empty body. It is more readable to make that empty body apparent (semantics remain the same):
while (variable) {}
Since the body (and the test) of the loop don't change variable (it has no observable side-effect) the loop will run indefinitely as soon as the initial value of variable is non-zero. This will heat your processor.
But you might have declared that variable as volatile and then have something external changing it.
variable=variable^0x800000;
The ^ is a bitwise XOR. You are toggling (replacing 0 with 1 and 1 with 0) a single bit (bit 23, counting from 0).
To answer your second question:
variable=0;
variable=variable^0x800000;
This operation is a bitwise operation called XOR.
An XOR operation is usually used to toggle bits regardless of their previous state:
0 ^ 0 = 0
0 ^ 1 = 1
1 ^ 0 = 1
1 ^ 1 = 0
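To make the toggle concrete, a short example using the same 0x800000 mask (bit 23) from the question:

#include <stdio.h>

int main(void)
{
    unsigned variable = 0;

    variable ^= 0x800000;           /* bit 23: 0 -> 1 */
    printf("0x%06X\n", variable);   /* 0x800000 */

    variable ^= 0x800000;           /* bit 23: 1 -> 0 */
    printf("0x%06X\n", variable);   /* 0x000000 */
    return 0;
}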

bitwise - how are the bitmasks operations implemented?

Context
I am using a lot of bitwise operations but I don't even know how they are implemented at the lowest level possible.
I would like to see how the Intel/AMD devs manage to implement such operations. Not to replace them in my code, that would be dumb... but to get a broader grasp of what is going on.
I tried to find some info but most of the time, people ask about its use or to replace it with other bitwise operations, which is not the case here.
Questions
Is it doing basic iterations in assembly (SSE) over the 32 bits and comparing?
Are there some tricks to get it up to speed ?
Thanks
Almost all are implemented directly on the CPU, as basic, native instructions, not part of SSE. These are the oldest, most basic operations on the CPU register.
As to how AND, OR, XOR, etc. are implemented, if you are really interested, look up digital logic design or discrete math. Look up flip-flops, AND gates, and NAND / NOR / XOR gates:
https://en.wikipedia.org/wiki/NAND_logic
Also look up K-maps (Karnaugh maps); these are what you can use to implement a logic circuit by hand:
https://en.wikipedia.org/wiki/Karnaugh_map
If you really enjoy the reading, you can signup for a digital logic design class if you have access to an engineering or computer science university. You will get to build logic circuits with large ICs on a breadboard, but nowadays most CPUs are "written" with code, like software, and "printed" on a silicon wafer.
Of particular interest is NAND and NOR due to their functional completeness (you can use NAND or NOR to construct any truth table).
NAND (logical symbol looks like =Do-):

A
  =Do- Q    where Q = NOT(A AND B)
B

Truth table:

A B | Q
0 0 | 1
0 1 | 1
1 0 | 1
1 1 | 0
You can rewrite any logic with NAND.
As you can also see, it's pretty efficient; you can't get any lower level than a single gate with binary (though there is ternary / tri-state logic), so it's a single clock state change. So for a 64-bit CPU register, you'll need 64 of these babies side by side, PER register... PER core... PER instruction. And that is only the "logical" registers. Because advanced processors (like Intel Core) do register renaming, you have more physical registers in silicon than are logically available to you by name.
AND, OR, XOR, and NOT operations are implemented quite efficiently in silicon, and so are generally a single-cycle native instruction on most processors. That is, for a 16-bit processor, whole 16-bit registers are ANDed at once; on a 32-bit processor, 32 bits at once, etc. The only performance issue you might want to be aware of is alignment: on an ARM processor, for example, if a 32-bit value starts at a memory address that is a multiple of 4, then a read-modify-write can be done in two or three cycles. If it's at an unaligned address, it has to do two reads at the neighboring aligned addresses and two writes, and so is slower.
Bit shifting in some older processors may involve looping over single shifts. That is, 1 << 5 will take longer than 1 << 2. But most modern processors have what is called a "barrel shifter" that equalizes all shifts up to the register size, so on a Pentium, 1 << 31 takes no longer than 1 << 2.
Addition and subtraction are fast primitives as well. Multiplication and division are tricky: these are mostly implemented as microcode loops. Multiplication can be sped up by unrolling the loops into huge swaths of silicon in a high-end processor, but division cannot, so generally division is the slowest basic operation in a microprocessor.
Bitwise operations are what processors are made of, so it is natural to expose those operations with instructions. Operations like AND, OR, XOR, NOR, NAND and NOT can be performed by the ALU with only a few logic gates per bit. Importantly, each bit of the result only relies on two bits of the input (unlike multiplication or addition), so the entire operation can proceed in parallel without any complication.
As you know, data in computers is represented in a binary format.
For example, if you have the integer 13 it's represented as 1101b (where b means binary). This works out to (1) * 8 + (1) * 4 + (0) * 2 + (1) * 1 = 13, just like (1) * 10 + (3) * 1 = 13 -- different bases.
However, for basic operations computers need to know how much data you're working with. A typical integer size is 32 bits. So it's not just 1101b, it's 00000000000000000000000000001101b -- 32 bits, most of them unused.
Bitwise operations are just that -- they operate only on a bit level. Adding, multiplying, and other operations consider multiple bits at a time to perform their function, but bitwise operators do not. For example:
What's 10 bitwise-and 7? (in C vernacular, 10 & 7)
1010b 10 &
0111b 7
----- =
0010b 2
Why? Think vertically! Look at the left set of digits -- 1 and 0 is 0. Then, 0 and 1 is 0. Then, 1 and 1 is 1. Finally, 0 and 1 is 0.
This is based on the and truth table that states these rules -- that only true (aka 1) and true (aka 1) results in true (aka 1). All other resultant values are false (aka 0).
Likewise, the or truth table states that all results are true (aka 1) except for false (aka 0) and false (aka 0) which results in false (aka 0).
Let's do the same example, but this time let's compute 10 bitwise-or 7. (Or in C vernacular, 10 | 7)
1010b 10 |
0111b 7
----- =
1111b 15
And finally, let's consider one other principal bitwise operator: not. This is a unary operator where you simply flip each bit. Let's compute bitwise-not 7 (or in C vernacular, ~7)
0111b ~7
----- =
1000b 8
But wait.. What about all those leading zeroes? Well, yes, before I was omitting them because they weren't important, but now they surely are:
00000000000000000000000000000111b ~7
--------------------------------- =
11111111111111111111111111111000b ... big number?
If you're instructing the computer to treat the result as an unsigned integer (32-bit), that's a really big number. (Little less than 4 billion). If you're instructing the computer to treat the result as a signed integer (32-bit) that's -8.
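The worked examples above can be checked directly in C (assuming a 32-bit int, as the text does):

#include <stdio.h>

int main(void)
{
    printf("%d\n", 10 & 7);   /* 2  */
    printf("%d\n", 10 | 7);   /* 15 */
    printf("%d\n", ~7);       /* -8 as a signed integer */
    printf("%u\n", ~7u);      /* 4294967288 as an unsigned 32-bit one */
    return 0;
}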
As you may have guessed, since the logic is really quite simple for all these operations, there's not much you can do to make them individually faster. However, bitwise operations obey the same logic as boolean logic, and thus you can use boolean logic reduction techniques to reduce the number of bitwise operations you may need.
e.g. (A & B) | (A & C) results in the same as A & (B | C)
However, that's a much larger topic. Karnaugh maps are one technique, but boolean algebra is usually what I end up using while programming.
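For a quick sanity check of that particular reduction (it's the distributive law, so the assertion never fires):

#include <assert.h>

int main(void)
{
    /* exhaustively verify (A & B) | (A & C) == A & (B | C) over small values */
    for (unsigned a = 0; a < 16; a++)
        for (unsigned b = 0; b < 16; b++)
            for (unsigned c = 0; c < 16; c++)
                assert(((a & b) | (a & c)) == (a & (b | c)));
    return 0;
}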

Are bitwise operations still practical?

Wikipedia, the one true source of knowledge, states:
On most older microprocessors, bitwise operations are slightly faster than addition and subtraction operations and usually significantly faster than multiplication and division operations. On modern architectures, this is not the case: bitwise operations are generally the same speed as addition (though still faster than multiplication).
Is there a practical reason to learn bitwise operation hacks or it is now just something you learn for theory and curiosity?
Bitwise operations are worth studying because they have many applications. It is not their main use to substitute arithmetic operations. Cryptography, computer graphics, hash functions, compression algorithms, and network protocols are just some examples where bitwise operations are extremely useful.
The lines you quoted from the Wikipedia article just tried to give some clues about the speed of bitwise operations. Unfortunately the article fails to provide some good examples of applications.
Bitwise operations are still useful. For instance, they can be used to create "flags" using a single variable, and save on the number of variables you would use to indicate various conditions. Concerning performance on arithmetic operations, it is better to leave the compiler do the optimization (unless you are some sort of guru).
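A minimal sketch of the flags idiom just described; the flag names are made up:

#include <stdio.h>

enum {
    FLAG_READY   = 1 << 0,
    FLAG_ERROR   = 1 << 1,
    FLAG_TIMEOUT = 1 << 2,
};

int main(void)
{
    unsigned flags = 0;

    flags |= FLAG_READY | FLAG_TIMEOUT;   /* set two conditions      */
    flags &= ~FLAG_TIMEOUT;               /* clear one of them       */

    if (flags & FLAG_READY)               /* test a single condition */
        puts("ready");
    return 0;
}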
They're useful for getting to understand how binary "works"; otherwise, no. In fact, I'd say that even if the bitwise hacks are faster on a given architecture, it's the compiler's job to make use of that fact — not yours. Write what you mean.
The only case where it makes sense to use them is if you're actually using your numbers as bitvectors. For instance, if you're modeling some sort of hardware and the variables represent registers.
If you want to perform arithmetic, use the arithmetic operators.
Depends what your problem is. If you are controlling hardware you need ways to set single bits within an integer.
Buy an OGD1 PCI board (open graphics card) and talk to it using libpci. http://en.wikipedia.org/wiki/Open_Graphics_Project
It is true that in most cases when you multiply an integer by a constant that happens to be a power of two, the compiler optimises it to use the bit-shift. However, when the power of two is itself a variable, the compiler cannot deduce it, and you must use the shift operation explicitly.
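A sketch of the distinction, with invented function names:

#include <stdint.h>

uint32_t times_eight(uint32_t x)
{
    return x * 8;    /* constant power of two: compilers typically emit x << 3 */
}

uint32_t scale(uint32_t x, unsigned n)
{
    return x << n;   /* variable power of two: write the shift yourself */
}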
Funny nobody saw fit to mention the ctype[] array in C/C++ - also implemented in Java. This concept is extremely useful in language processing, especially when using different alphabets, or when parsing a sentence.
ctype[] is an array of 256 short integers, and in each integer there are bits representing different character types. For example, ctype['A'] through ctype['Z'] have bits set to show they are upper-case letters of the alphabet; ctype['0'] through ctype['9'] have bits set to show they are numeric. To see if a character x is alphanumeric, you can write something like if (ctype[x] & (UC | LC | NUM)), which is somewhat faster and much more elegant than writing if ('A' <= x && x <= 'Z' || ...).
Once you start thinking bitwise, you find lots of places to use it. For instance, I had two text buffers. I wrote one to the other, replacing all occurrences of FINDstring with REPLACEstring as I went. Then for the next find-replace pair, I simply switched the buffer indices, so I was always writing from buffer[in] to buffer[out]. 'in' started as 0, 'out' as 1. After completing a copy I simply wrote 'in ^= 1; out ^= 1;'. And after handling all the replacements I just wrote buffer[out] to disk, not needing to know what 'out' was at that time.
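The index-toggling trick in isolation (buffer handling omitted; this just shows the XOR role swap):

#include <stdio.h>

int main(void)
{
    int in = 0, out = 1;

    for (int pass = 0; pass < 4; pass++) {
        printf("pass %d: read buffer[%d], write buffer[%d]\n", pass, in, out);
        in ^= 1;    /* 0 <-> 1 */
        out ^= 1;   /* 1 <-> 0 */
    }
    return 0;
}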
If you think this is low-level, consider that certain mental errors such as deja-vu and its twin jamais-vu are caused by cerebral bit errors!
Working with IPv4 addresses frequently requires bit-operations to discover if a peer's address is within a routable network or must be forwarded onto a gateway, or if the peer is part of a network allowed or denied by firewall rules. Bit operations are required to discover the broadcast address of a network.
Working with IPv6 addresses requires the same fundamental bit-level operations, but because they are so long, I'm not sure how they are implemented. I'd wager money that they are still implemented using the bit operators on pieces of the data, sized appropriately for the architecture.
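For the IPv4 case, a minimal sketch with hard-coded addresses; real code would obtain them via inet_pton/ntohl:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t net  = 0xC0A80100;   /* 192.168.1.0  */
    uint32_t mask = 0xFFFFFF00;   /* /24 netmask  */
    uint32_t peer = 0xC0A80142;   /* 192.168.1.66 */

    /* same network? compare the network portions */
    if ((peer & mask) == (net & mask))
        puts("peer is on the local network");

    /* broadcast address: network bits plus all host bits set */
    printf("broadcast: 0x%08X\n", (unsigned)(net | ~mask));   /* 192.168.1.255 */
    return 0;
}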
Of course (to me) the answer is yes: there can be practical reasons to learn them. The fact that nowadays, e.g., an add instruction on typical processors is as fast as an or/xor or an and means just that: an add is as fast as, say, an or on those processors.
The improvement in the speed of instructions like add, divide, and so on just means that on those processors you can now use them with less worry about the performance impact; but it is as true now as in the past that you usually wouldn't replace every add with bitwise operations anyway. That is, it depends on the hack: some hacks must now be considered educational and no longer practical; others still have practical applications.

Embedded C: what does var = 0xFF; do?

I'm working with embedded C for the first time. Although my C is rusty, I can read the code, but I don't really have a grasp on why certain lines are the way they are. For example, I want to know if a variable is true or false and send it back to another application. Rather than setting the variable to 1 or 0, the original implementor chose 0xFF.
Is he trying to set it to an address space? Or else, why set a boolean variable to 255?
0xFF sets all the bits in a char.
The original implementer probably decided that the standard 0 and 1 wasn't good enough and decided that if all bits off is false then all bits on is true.
That works because in C any value other than 0 is true.
Though this sets all the bits in a char, it will also work for any other variable type, since any one bit being set in a variable makes it true.
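A short illustration of both points, including the pitfall of comparing such a flag to 1:

#include <stdio.h>

int main(void)
{
    unsigned char flag = 0xFF;   /* all eight bits set */

    if (flag)                    /* any nonzero value is true in C */
        puts("true");

    printf("%d\n", flag == 1);   /* 0 -- it is truthy, but not equal to 1 */
    return 0;
}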
If you are in desperate need of memory, you might want to store 8 booleans in one byte (or 32 in a long, or whatever)
This can easily be done by using a flag mask:
// FLAGMASK = 1 << n for some n in 0..7
FLAGMASK = 0x10; // e.g. n = 4
flags &= ~FLAGMASK; // clear bit
flags |= FLAGMASK; // set bit
flags ^= FLAGMASK; // flip bit
flags = (flags & ~FLAGMASK) | (booleanFunction() & FLAGMASK); // clear, then maybe set
The last line only works when booleanFunction() returns 0 (all bits clear) or -1 (all bits set).
0xFF is the hex representation of ~0 (i.e. 11111111)
In, for example, VB and Access, -1 is used as True.
These young guys, what do they know?
In one of the original embedded languages - PL/M (-51 yes as in 8051, -85, -86, -286, -386) - there was no difference between logical operators (!, &&, || in C) and bitwise (~, &, |, ^). Instead PL/M has NOT, AND, OR and XOR taking care of both categories. Are we better off with two categories? I'm not so sure. I miss the logical ^^ operator (xor) in C, though. Still, I guess it would be possible to construct programs in C without having to involve the logical category.
In PL/M False is defined as 0. Booleans are usually represented in byte variables. True is defined as NOT False which will give you 0ffh (PL/M-ese for C's 0xff).
To see how the conversion of the carry status flag took place before being stored in a byte variable (boolean wasn't available as a type), PL/M could use the assembly instruction sbb al,al before storing. If carry was set, al would contain 0ffh; if it wasn't, it would contain 0h. If the opposite value was required, PL/M would insert a cmc before the sbb or append a not al after (actually xor - one or the other).
So the 0xff for TRUE is a direct compatibility port from PL/M. Necessary? Probably not, unless you're unsure of your skills (in C) AND playing it super safe.
As I would have.
PL/M-80 (used for the 8080, 8085 and Z80) did not have support for integers or floats, and I suspect it was the same for PL/M-51. PL/M-86 (used for the 8086, 8088, 80188 and 80186) added integers, single precision floating point, segment:offset pointers and the standard memory models small, medium, compact and large. For those so inclined there were special directives to create do-it-yourself hybrid memory models. Microsoft's huge memory model was equivalent to intel's large. MS also sported tiny, small, compact, medium and large models.
Often in embedded systems there is one programmer who writes all the code and his/her idiosyncrasies are throughout the source. Many embedded programmers were HW engineers and had to get a system running as best they could. There was no requirement nor concept of "portability". Another consideration in embedded systems is the compiler is specific for the CPU HW. Refer to the ISA for this CPU and check all uses of the "boolean".
As others have said, it's setting all the bits to 1. And since this is embedded C, you might be storing this into a register where each bit is important for something, so you want to set them all to 1. I know I did similar when writing in assembler.
What's really important to know about this question is the type of "var". You say "boolean", but is that a C++/C99's bool, or is it (highly likely, being an embedded C app), something of a completely different type that's being used as a boolean?
Also, adding 1 to 0xFF wraps it to 0 (assuming unsigned char), so the check might have been in a loop that uses an increment to break.
Here's a likely reason: 0xff is the binary complement of 0. It may be that on your embedded architecture, storing 0xff into a variable is more efficient than storing, say, 1 which might require extra instructions or a constant stored in memory.
Or perhaps the most efficient way to check the "truth value" of a register in your architecture is with a "check bit set" instruction. With 0xff as the TRUE value, it doesn't matter which bit gets checked... they're all set.
The above is just speculation, of course, without knowing what kind of embedded processor you're using. 8-bit, 16-bit, 32-bit? PIC, AVR, ARM, x86???
(As others have pointed out, any integer value other than zero is considered TRUE for the purposes of boolean expressions in C.)
