could logical negation have been implemented as bitwise negation in legacy compilers? - c

I just found legacy code which tests a flag like this:
if( some_state & SOME_FLAG )
So far, so good!
But further in code, I see an improper negation
if( ! some_state & SOME_FLAG )
My understanding is that it is interpreted as (! some_state) & SOME_FLAG which is probably a bug, and gcc logically barks with -Wlogical-not-parentheses...
Though it could eventually have worked in the past if ever !some_state was implemented as ~some_state by some legacy compiler. Does anyone know if it was possibly the case?
EDIT
sme_state is declared as int (presumably 32 bits, 2 complement on target achitecture).
SOME_FLAG is a constant set to a single bit 0x00040000, so SOME_FLAG & 1 == 0

Logical negation and bitwise negation have never been equivalent. No conforming compiler could have implemented one as the other. For example, the bitwise negation of 1 is not 0, so ~1 != !1.
It is true that the expression ! some_state & SOME_FLAG is equivalent to (! some_state) & SOME_FLAG because logical negation has higher precedence than bitwise and. That is indeed suspicious, but the original code is not necessarily in error. In any case, it is more likely that the program is buggy in this regard than that any C implementation evaluated the original expression differently than the current standard requires, even prior to standardization.
Since the expressions (! some_state) & SOME_FLAG and !(some_state & SOME_FLAG) will sometimes evaluate to the same value -- especially if SOME_FLAG happens to expand to 1 -- it is also possible that even though they are inequivalent, their differences do not manifest during actual execution of the program.

While there was no standard before 1989, and thus compilers could do things as they wished, no compiler to my knowledge has ever done this; changing the meaning of operators wouldn't be a smart call if you want people to use your compiler.
There's very little reason to write an expression like (!foo & FLAG_BAR); the result is just !foo if FLAG_BAR is odd or always zero if it is even. What you've found is almost certainly just a bug.

It would not be possible for a legacy compiler to implement ! as bitwise negation, because such approach would produce incorrect results in situations when the value being negated is outside the {0, 0xFF...FF} set.
Standard requires the result of !x to produce zero for any non-zero value of x. Hence, applying ! to, say, 1 would yield 0xFF..FFFE, which is non-zero.
The only situation when the legacy code would have worked as intended is when SOME_FLAG is set to 1.

Let's start with the most interesting (and least obvious) part: gcc logically barks with -Wlogical-not-parentheses. What does this mean?
C has two different operators that have similar looking characters (but different behaviour and intended for very different purposes) - the & which is a bitwise AND, and && which is a boolean AND. Unfortunately this led to typos, in the same way that typing = when you meant == can cause problems, so some compilers (GCC) decided to warn people about "& without parenthesis used as a condition" (even though it's perfectly legal) to reduce the risk of typos.
Now...
You're showing code that uses & (and not showing code that uses &&). This implies that some_state is not a boolean and is number. More specifically it implies that each bit in some_state may be completely independent and unrelated.
For an example of this, let's pretend that we're implementing a Pacman game and need a nice compact way to store the map for each level. We decide that each tile in the map might be a wall or not, might be a collected dot or not, might be power pill or not, and might be a cherry or not. Someone suggests that this can be an array of bytes, like this (assuming the map is 30 tiles wide and 20 tiles high):
#define IS_WALL 0x01
#define HAS_DOT 0x02
#define HAS_POWER_PILL 0x04
#define HAS_CHERRY 0x08
uint8_t level1_map[20][30] = { ..... };
If we want to know if a tile happens to be safe to move into (no wall) we could do this:
if( level1_map[y][x] & IS_WALL == 0) {
For the opposite, if we want to know if a tile is a wall we could do any of these:
if( level1_map[y][x] & IS_WALL != 0) {
if( !level1_map[y][x] & IS_WALL == 0) {
if( level1_map[y][x] & IS_WALL == IS_WALL) {
..because it makes no difference which one it is.
Of course (to avoid the risk of typos) GCC might (or might not) warn about some of these.

Related

C speed of comparison: Equals "==" vs Bitwise and "&"

Suppose I have an integer that is a power of 2, eg. 1024:
int a = 1 << 10; //works with any power of 2 no.
Now I want to check whether another integer b is the same as a. Which is faster/better (especially on weak embedded systems):
if (b == a) {}
or
if (b & a) {}
?
Sorry if this is a noob question, but couldn't find an answer using the search.
edit: thanks for many insightful answers. I could select only one of them, but all of them are welcome.
These operations are not even equivalent, because a & b will be false when both a and b are 0.
So I'd suggest to express the semantics that you want (i.e. a == b) and let the compiler to the optimization.
If you then measuer that you have performance issues at that point, then you can start analyzing/optimizing...
The short answer is this - it depends on what sort of things you're comparing. However, in this case, I'll assume that you're comparing two variables to each other (as opposed to a variable and an immediate, etc.)
This website, although rather old, studied how many clock cycles different instructions took on the x86 platform. The two instructions we're interested in here are the "AND" instruction and the "CMP" instruction (which the compiler uses for & and == respectively). What we can see here is that both of these instructions take about 1/3 of a cycle - that is to say, you can execute 3 of them in 1 cycle on average. Compare this to the "DIV" instruction which (in 1996) took 23 cycles to execute.
However, this omits one important detail. An "AND" instruction is not sufficient to complete the behavior you're looking for. In fact, a brief compilation on x86_64 suggests that you need both an "AND" and a "TEST" instruction for the "&" version, while "==" simply uses the "CMP" instruction. Because all these instructions are otherwise equivalent in IPC, the "==" will in fact be slightly faster...as of 1996.
Nowadays, processors optimize so well at the bare metal layer that you're unlikely to notice a difference. That said, if you wanted to see for sure...simply write a test program and find out for yourself.
As noted above though, even in the case that you have a power of 2, these instructions are still not equivalent, since it doesn't work for 0. Well...I guess technically zero ISN'T a power of 2. :) However you want to spin it though, use "==".
An X86 CPU sets a flag according to how the result of any operation compares to zero.
For the ==, your compiler will either use a dedicated compare instruction or a subtraction, setting this flag in both cases. The if() is then implemented by a jump that is conditional on this bit.
For the &, another instructions is used, the logical bitwise and instruction. That too sets the flag appropriately. So, again, the next instruction will be the conditional branch.
So, the question boils down to: Is there a performance difference between a subtraction and a bitwise and instruction? And the answer is "no" on any sane architecture. Both instructions use the same ALU, both set the same flags, and this ALU is typically designed to perform a subtraction in a single clock cycle.
Bottom line: Write readable code, and don't try to microoptimize what cannot be optimized.

AVR: if statement

I am new in AVR programming. I would like to control a variable (uint8_t received_msg) if it is equal to 0xFF. would it be correct to do:
if (!(received_msg ^ 0xFF))
or do I need to compare bit by bit
uint8_t test = 0;
test = received_msg ^ 0xFF
for (i =0; i<8; i++){
test = 0 & (1<<received_msg)
}
if(test==0)
If you want to know if a variable is equal to 0xff, just test for equality:
if (received_message == 0xff)
Your question had fairly little to do with the AVR but some mistaken ideas about how compilers and microcontrollers work. That's not a complaint that it's a bad question - any question that helps you learn is good!
(TLDR: "use bitwise operators" is only in contrast to AVR specific stuff, feel absolutely free to use all your normal operations.)
First, you've expressed what you want to do - an equality test - in English. The whole point of a programming language like C is to allow you to express computed operations in a fairly readable manner, so use the most obvious (and thus clear) translation of received_msg == 0xFF - it is the compiler's job to convert this into code for the specific computer (AVR), and even if it does a horrible job of it it will waste no more than a few microseconds. (It doesn't, but if you make the code convoluted enough it can fail to do an excellent job.)
Second, you've attempted to express the same operation - comparing every bit against a set value, and collecting the result to see if they were all equal - in two other manners. This gets tricky both to read and write, as is shown by the bugs in the second version, but more importantly the second version shows a misunderstanding of what C's bitwise operators do. Bitwise here means each bit of a value is processed independent of the other bits; they are still all processed. Therefore splitting it into a loop is not needed, and only makes the job of both programmer and compiler harder. The technique used to make bitwise operators only affect single bits, not to be confused with which they operate on, is known as masking; it relies on properties like "0 or n = n", "1 and n = n", and "0 xor n = n".
I'm also getting the impression this was based around the idea that a microcontroller like the AVR would be working on individual bits all the time. This is extremely rare, but frequently emulated by PLCs. What we do have is operations making single bit work less costly than on general purpose CPUs. For instance, consider "PORTB |= 1<<3". This can be read as a few fundamental operations:
v0 := 1 // load immediate
v1 := 3
v2 := v0 shiftleft v1 // shift left
v3 := PORTB // load I/O register
v4 := v3 or v2
PORTB := v4 // store back to I/O register
This interpretation would be an extremely reduced instruction set, where loads and stores never combine with ALU operations such as shift and or. You may even get such code out of the compiler if you ask it not to optimize at all. But since it's such a common operation for a microcontroller, the AVR has a single instruction to do this without spending registers on holding v0-v4:
SBI PORTB, 3 // (set bit in I/O register)
This brings us from needing two registers (from reusing vN which are no longer needed) and six instructions to zero registers and one instruction. Further gains are possible because once it's a single instruction, one can use a skip instead of a branch. But it relies on a few things being known, such as 1<<3 setting only a single, fixed bit, and PORTB being among the lowest 32 I/O registers. If the compiler did not know these things, it could never use the SBI instructions, and there was such a time. This is why we have the advice "use the bitwise operators" - you no longer need to write sbi(PORTB,PB3);, which is inobvious to people who don't know the AVR instruction set, but can now write PORTB |= 1<<3; which is standard C, and therefore clearer while being just as effective. Arguably better macro naming might make more readable code too, but many of these macros came along as typing shorthands instead - for instance _BV(x) which is equal to 1<<x.
Sadly some of the standard C formulations become rather tricky, like clearing bit N: port &= ~(1<<N); It makes a pretty good case for a "clear_bit(port, bit)" macro, like Arduino's digitalWrite. Some microcontrollers (such as 8051) provide specific addresses for single bit work, and some compilers provide syntax extensions such as port.3. I sometimes wonder why AVR Libc doesn't declare bitfields for bit manipulation. Pardon the rant. There also remain some optimizations the compiler doesn't know of, such as converting PORTB ^= x; into PINB = x; (which really looks weird - PIN registers aren't writable, so they used that operation for another function).
See also the AVR Libc manual section on bit manipulation, particularly "Porting programs that use the deprecated sbi/cbi macros".
You can also try useful switch(){ case } statement like :
#define OTHER_CONST_VALUE 0x19
switch(received_msg){
case 0xff:
do_this();
break;
case 0x0f:
do_that();
break;
case OTHER_CONST_VALUE:
do_other_thing();
break;
case 1:
case 2:
received_1_or_2();
break;
default:
received_somethig_else();
break;
}
this code will execute command depending on value of received_msg, it is important to place constant value after case word, and be careful with break statement it tells when jump off from { } block.
I'm unsure of what received_msg will be representing. If it is a numerical value, than by all means use a switch-case, if-else or other structure of comparison; no need for a bitmask.
However, if received_msg contains binary data and you only want to look at certain elements and exclude others, a bitmask would be the appropriate approach.

Why might an enum bit mask be checked with an extra comparison?

I've come across this a few times recently:
if ((flags & PERFORM_DELETION_CONCURRENTLY) == PERFORM_DELETION_CONCURRENTLY)
...
What's the reason for the extra comparison? Why not this?
if (flags & PERFORM_DELETION_CONCURRENTLY)
...
My guess is that it's a leftover habit to silence warnings from the days of yore when compilers were more strict.
There is also the possibility that in the mask there is more than one bit set. In that case, the two comparisons have different semantics.
To be precise, the condition is true iff all of the bits in PERFORM_DELETION_CONCURRENTLY are set in flags. A slightly more efficient way (on some architectures) to do this is if ((~flags & PERFORM_DELETION_CONCURRENTLY) == 0) ... which I bury in an ALL_BITS_SET macro in my standard header file, which makes the code more readable/understandable as well.

~0 vs. !0 vs. 1 in C

I was reading some code today by someone I consider to be a reasonable programmer, and I noticed they used a =~0 to set a loop quit variable.
Is there any compelling reason to do this rather than simply quit = 1;?
I'm mostly just curious, before I go ahead and change it. Thanks!
Example:
while(!quit){
...;
if(!strcmp(s, "q"))
quit=~0;
}
There is no strong reason for that, unless some other code tests quit differently, such as testing any other bit. !0 is one, but ~0 is -1 on all modern architectures.
On some architectures ~0 is faster than !0, though that should be optimized away by any decent compiler.
~0 is usually -1, while !0 is defined to be 1.
Of course, !~0 and !!0 are both 0, so there is no compelling reason to use one or the other, aside from the fact that ~0 is non-idiomatic (meaning that people won't know what the heck you're doing).
In C, ~ is the unary operator for ones compliment which flips bits to their opposite state. So technically the code works because the if() clause is only evaluating for the presence of 0 (false) or anything not 0 (true). Frankly I think this is a bit overkill for evaluating true or false. I'd only consider using it if I was truly doing evaluations on a bitwise basis. Maybe there could be an argument made for performance but I somehow doubt it.

Embedded C: what does var = 0xFF; do?

I'm working with embedded C for the first time. Although my C is rusty, I can read the code but I don't really have a grasp on why certain lines are the way the are. For example, I want to know if a variable is true or false and send it back to another application. Rather than setting the variable to 1 or 0, the original implementor chose 0xFF.
Is he trying to set it to an address space? or else why set a boolean variable to be 255?
0xFF sets all the bits in a char.
The original implementer probably decided that the standard 0 and 1 wasn't good enough and decided that if all bits off is false then all bits on is true.
That works because in C any value other than 0 is true.
Though this will set all bytes in a char, it will also work for any other variable type, since any one bit being set in a variable makes it true.
If you are in desperate need of memory, you might want to store 8 booleans in one byte (or 32 in a long, or whatever)
This can easily be done by using a flag mask:
// FLAGMASK = ..1<<n for n in 0..7...
FLAGMASK = 0x10; // e.g. n=4
flags &= ~FLAGMASK; // clear bit
flags |= FLAGMASK; // set bit
flags ^= FLAGMASK; // flip bit
flags = (flags & ~FLAGMASK) | (booleanFunction() & FLAGMASK); // clear, then maybe set
this only works when booleanFunction() returns 0 (all bits clear) or -1 (all bits set).
0xFF is the hex representation of ~0 (i.e. 11111111)
In, for example, VB and Access, -1 is used as True.
These young guys, what do they know?
In one of the original embedded languages - PL/M (-51 yes as in 8051, -85, -86, -286, -386) - there was no difference between logical operators (!, &&, || in C) and bitwise (~, &, |, ^). Instead PL/M has NOT, AND, OR and XOR taking care of both categories. Are we better off with two categories? I'm not so sure. I miss the logical ^^ operator (xor) in C, though. Still, I guess it would be possible to construct programs in C without having to involve the logical category.
In PL/M False is defined as 0. Booleans are usually represented in byte variables. True is defined as NOT False which will give you 0ffh (PL/M-ese for C's 0xff).
To see how the conversion of the status flag carry took place defore being stored in a byte (boolean wasn't available as a type) variable, PL/M could use the assembly instruction "sbb al,al" before storing. If carry was set al would contain 0ff, if it wasn't it would contain 0h. If the opposite value was required, PL/M would insert a "cmc" before the sbb or append a "not al" after (actually xor - one or the other).
So the 0xff for TRUE is a direct compatibility port from PL/M. Necessary? Probably not, unless you're unsure of your skills (in C) AND playing it super safe.
As I would have.
PL/M-80 (used for the 8080, 8085 and Z80) did not have support for integers or floats, and I suspect it was the same for PL/M-51. PL/M-86 (used for the 8086, 8088, 80188 and 80186) added integers, single precision floating point, segment:offset pointers and the standard memory models small, medium, compact and large. For those so inclined there were special directives to create do-it-yourself hybrid memory models. Microsoft's huge memory model was equivalent to intel's large. MS also sported tiny, small, compact, medium and large models.
Often in embedded systems there is one programmer who writes all the code and his/her idiosyncrasies are throughout the source. Many embedded programmers were HW engineers and had to get a system running as best they could. There was no requirement nor concept of "portability". Another consideration in embedded systems is the compiler is specific for the CPU HW. Refer to the ISA for this CPU and check all uses of the "boolean".
As others have said, it's setting all the bits to 1. And since this is embedded C, you might be storing this into a register where each bit is important for something, so you want to set them all to 1. I know I did similar when writing in assembler.
What's really important to know about this question is the type of "var". You say "boolean", but is that a C++/C99's bool, or is it (highly likely, being an embedded C app), something of a completely different type that's being used as a boolean?
Also adding 1 to 0xff sets it to 0( assuming unsigned char) and the checking might have been in a loop with an increment to break.
Here's a likely reason: 0xff is the binary complement of 0. It may be that on your embedded architecture, storing 0xff into a variable is more efficient than storing, say, 1 which might require extra instructions or a constant stored in memory.
Or perhaps the most efficient way to check the "truth value" of a register in your architecture is with a "check bit set" instruction. With 0xff as the TRUE value, it doesn't matter which bit gets checked... they're all set.
The above is just speculation, of course, without knowing what kind of embedded processor you're using. 8-bit, 16-bit, 32-bit? PIC, AVR, ARM, x86???
(As others have pointed out, any integer value other than zero is considered TRUE for the purposes of boolean expressions in C.)

Resources