How the logical NOT operator ! actually works in c?
How it turns all non-zero int into 0 and vice-versa?
For example:
#include <stdio.h>
void main() {
if(!(-76))
printf("I won't print anything");
if(!(2))
printf("I will also not print anything");
}
doesn't print anything, which could mean -76 and 2 was turned into zero...
So, I tried this:
#include <stdio.h>
void main() {
int x = !4;
printf("%d", x);
}
which indeed printed 0
and now I don't get how, is it flipping all the bits to 0 or what?
Most CPU architectures include an instruction to compare to zero; or even check the result against zero for most operations run on the processor. How this construct is implemented will differ from compiler to compiler.
For example, in x86, there are two instructions: JZ and JNZ: jump zero and jump not zero, which can be used if your test is an if statement. If the last value looked at was zero, jump (or don't jump) to a new instruction.
Given this, it's trivial to implement int x = !4; at assembly level as a jump if 4 is zero or not, though this particular example would be likely calculated at compile time, since all values are constant.
Additionally, most versions of the x86 instruction set support the SETZ instruction directly, which will set a register directly to 1 or 0, based on whether the processors zero flag is currently set. This can be used to implement the logical NOT operation directly.
6.5.3.3 Unary arithmetic operators
Constraints
1 The operand of the unary + or - operator shall have arithmetic type; of the ~ operator, integer type; of the ! operator, scalar type.
Semantics
...
5 The result of the logical negation operator ! is 0 if the value of its operand compares unequal to 0, 1 if the value of its operand compares equal to 0. The result has type int. The expression !E is equivalent to (0==E).
C 202x Working Draft
So, that's what language definition says should happen; if the expression x evaluates to non-zero, then the expression !x should evaluate to zero; if x evaluates to zero, then !x should evaluate to 1. The bits of the operand are not affected.
How that's accomplished in the machine code is up to the specific implementation; it depends on the available instruction set, the compiler, and various other factors such that no one answer works everywhere. It could translate to a branch statement, it could take advantage of specialized instructions, etc.
Related
Context
We are porting C code that was originally compiled using an 8-bit C compiler for the PIC microcontroller. A common idiom that was used in order to prevent unsigned global variables (for example, error counters) from rolling over back to zero is the following:
if(~counter) counter++;
The bitwise operator here inverts all the bits and the statement is only true if counter is less than the maximum value. Importantly, this works regardless of the variable size.
Problem
We are now targeting a 32-bit ARM processor using GCC. We've noticed that the same code produces different results. So far as we can tell, it looks like the bitwise complement operation returns a value that is a different size than we would expect. To reproduce this, we compile, in GCC:
uint8_t i = 0;
int sz;
sz = sizeof(i);
printf("Size of variable: %d\n", sz); // Size of variable: 1
sz = sizeof(~i);
printf("Size of result: %d\n", sz); // Size of result: 4
In the first line of output, we get what we would expect: i is 1 byte. However, the bitwise complement of i is actually four bytes which causes a problem because comparisons with this now will not give the expected results. For example, if doing (where i is a properly-initialized uint8_t):
if(~i) i++;
we will see i "wrap around" from 0xFF back to 0x00. This behaviour is different in GCC compared with when it used to work as we intended in the previous compiler and 8-bit PIC microcontroller.
We are aware that we can resolve this by casting like so:
if((uint8_t)~i) i++;
or, by
if(i < 0xFF) i++;
however in both of these workarounds, the size of the variable must be known and is error-prone for the software developer. These kinds of upper bounds checks occur throughout the codebase. There are multiple sizes of variables (eg., uint16_t and unsigned char etc.) and changing these in an otherwise working codebase is not something we're looking forward to.
Question
Is our understanding of the problem correct, and are there options available to resolving this that do not require re-visiting each case where we've used this idiom? Is our assumption correct, that an operation like bitwise complement should return a result that is the same size as the operand? It seems like this would break, depending on processor architectures. I feel like I'm taking crazy pills and that C should be a bit more portable than this. Again, our understanding of this could be wrong.
On the surface this might not seem like a huge issue but this previously-working idiom is used in hundreds of locations and we're eager to understand this before proceeding with expensive changes.
Note: There is a seemingly similar but not exact duplicate question here: Bitwise operation on char gives 32 bit result
I didn't see the actual crux of the issue discussed there, namely, the result size of a bitwise complement being different than what's passed into the operator.
What you are seeing is the result of integer promotions. In most cases where an integer value is used in an expression, if the type of the value is smaller than int the value is promoted to int. This is documented in section 6.3.1.1p2 of the C standard:
The following may be used in an expression wherever an intor
unsigned int may be used
An object or expression with an integer type (other than intor unsigned int) whose integer conversion rank is less
than or equal to the rank of int and unsigned int.
A bit-field of type _Bool, int ,signed int, orunsigned int`.
If an int can represent all values of the original type (as
restricted by the width, for a bit-field), the value is
converted to an int; otherwise, it is converted to an
unsigned int. These are called the integer promotions. All
other types are unchanged by the integer promotions.
So if a variable has type uint8_t and the value 255, using any operator other than a cast or assignment on it will first convert it to type int with the value 255 before performing the operation. This is why sizeof(~i) gives you 4 instead of 1.
Section 6.5.3.3 describes that integer promotions apply to the ~ operator:
The result of the ~ operator is the bitwise complement of its
(promoted) operand (that is, each bit in the result is set if and only
if the corresponding bit in the converted operand is not set). The
integer promotions are performed on the operand, and the
result has the promoted type. If the promoted type is an unsigned
type, the expression ~E is equivalent to the maximum value
representable in that type minus E.
So assuming a 32 bit int, if counter has the 8 bit value 0xff it is converted to the 32 bit value 0x000000ff, and applying ~ to it gives you 0xffffff00.
Probably the simplest way to handle this is without having to know the type is to check if the value is 0 after incrementing, and if so decrement it.
if (!++counter) counter--;
The wraparound of unsigned integers works in both directions, so decrementing a value of 0 gives you the largest positive value.
in sizeof(i); you request the size of the variable i, so 1
in sizeof(~i); you request the size of the type of the expression, which is an int, in your case 4
To use
if(~i)
to know if i does not value 255 (in your case with an the uint8_t) is not very readable, just do
if (i != 255)
and you will have a portable and readable code
There are multiple sizes of variables (eg., uint16_t and unsigned char etc.)
To manage any size of unsigned :
if (i != (((uintmax_t) 2 << (sizeof(i)*CHAR_BIT-1)) - 1))
The expression is constant, so computed at compile time.
#include <limits.h> for CHAR_BIT and #include <stdint.h> for uintmax_t
Here are several options for implementing “Add 1 to x but clamp at the maximum representable value,” given that x is some unsigned integer type:
Add one if and only if x is less than the maximum value representable in its type:
x += x < Maximum(x);
See the following item for the definition of Maximum. This method
stands a good chance of being optimized by a compiler to efficient
instructions such as a compare, some form of conditional set or move,
and an add.
Compare to the largest value of the type:
if (x < ((uintmax_t) 2u << sizeof x * CHAR_BIT - 1) - 1) ++x
(This calculates 2N, where N is the number of bits in x, by shifting 2 by N−1 bits. We do this instead of shifting 1 N bits because a shift by the number of bits in a type is not defined by the C standard. The CHAR_BIT macro may be unfamiliar to some; it is the number of bits in a byte, so sizeof x * CHAR_BIT is the number of bits in the type of x.)
This can be wrapped in a macro as desired for aesthetics and clarity:
#define Maximum(x) (((uintmax_t) 2u << sizeof (x) * CHAR_BIT - 1) - 1)
if (x < Maximum(x)) ++x;
Increment x and correct if it wraps to zero, using an if:
if (!++x) --x; // !++x is true if ++x wraps to zero.
Increment x and correct if it wraps to zero, using an expression:
++x; x -= !x;
This is is nominally branchless (sometimes beneficial for performance), but a compiler may implement it the same as above, using a branch if needed but possibly with unconditional instructions if the target architecture has suitable instructions.
A branchless option, using the above macro, is:
x += 1 - x/Maximum(x);
If x is the maximum of its type, this evaluates to x += 1-1. Otherwise, it is x += 1-0. However, division is somewhat slow on many architectures. A compiler may optimize this to instructions without division, depending on the compiler and the target architecture.
Before stdint.h the variable sizes can vary from compiler to compiler and the actual variable types in C are still int, long, etc and are still defined by the compiler author as to their size. Not some standard nor target specific assumptions. The author(s) then need to create stdint.h to map the two worlds, that is the purpose of stdint.h to map the uint_this that to int, long, short.
If you are porting code from another compiler and it uses char, short, int, long then you have to go through each type and do the port yourself, there is no way around it. And either you end up with the right size for the variable, the declaration changes but the code as written works....
if(~counter) counter++;
or...supply the mask or typecast directly
if((~counter)&0xFF) counter++;
if((uint_8)(~counter)) counter++;
At the end of the day if you want this code to work you have to port it to the new platform. Your choice as to how. Yes, you have to spend the time hit each case and do it right, otherwise you are going to keep coming back to this code which is even more expensive.
If you isolate the variable types on the code before porting and what size the variable types are, then isolate the variables that do this (should be easy to grep) and change their declarations using stdint.h definitions which hopefully won't change in the future, and you would be surprised but the wrong headers are used sometimes so even put checks in so you can sleep better at night
if(sizeof(uint_8)!=1) return(FAIL);
And while that style of coding works (if(~counter) counter++;), for portability desires now and in the future it is best to use a mask to specifically limit the size (and not rely on the declaration), do this when the code is written in the first place or just finish the port and then you won't have to re-port it again some other day. Or to make the code more readable then do the if x<0xFF then or x!=0xFF or something like that then the compiler can optimize it into the same code it would for any of these solutions, just makes it more readable and less risky...
Depends on how important the product is or how many times you want send out patches/updates or roll a truck or walk to the lab to fix the thing as to whether you try to find a quick solution or just touch the affected lines of code. if it is only a hundred or few that is not that big of a port.
6.5.3.3 Unary arithmetic operators
...
4 The result of the ~ operator is the bitwise complement of its (promoted) operand (that is,
each bit in the result is set if and only if the corresponding bit in the converted operand is
not set). The integer promotions are performed on the operand, and the result has the
promoted type. If the promoted type is an unsigned type, the expression ~E is equivalent
to the maximum value representable in that type minus E.
C 2011 Online Draft
The issue is that the operand of ~ is being promoted to int before the operator is applied.
Unfortunately, I don't think there's an easy way out of this. Writing
if ( counter + 1 ) counter++;
won't help because promotions apply there as well. The only thing I can suggest is creating some symbolic constants for the maximum value you want that object to represent and testing against that:
#define MAX_COUNTER 255
...
if ( counter < MAX_COUNTER-1 ) counter++;
I would like to ask if this short code:
int i = 0;
has 1 operand or 2? The i is an operand, but is 0 too? According to wikipedia, 0 shouldn't (or maybe I misunderstand). If 0 isn't operand, is it a constant or something?
If it is important, the code is in C99.
In int i = 0;, = is not an operator. It's simply a part of the variable initializaton syntax. On the other hand, in int i; i = 0; it would be an operator.
Since = here is not an operator, there are no operands. Instead, 0 is the initializer.
Since you've tagged the question as "C", one way to look at it is by reading the C standard. As pointed out by this answer, initialization (such as int i = 0) is not an expression in itself, and by a strict reading of the standard, the = here is not an operator according to the usage of those terms in the standard.
It is not as clear whether i and 0 are operands, however. On one hand, the C standard does not seem to refer to the parts of the initialization as operands. On the other hand, it doesn't define the term "operand" exhaustively. For example, one could interpret section 6.3 as calling almost any non-operator part of an expression an "operand", and therefore at least the 0 would qualify as one.
(Also note that if the code was int i; i = 0; instead, the latter i and the 0 would definitely both be operands of the assignment operator =. It remains unclear whether the intent of the question was to make a distinction between assignment and initialization.)
Apart from the C standard, you also refer to Wikipedia, in particular:
In computing, an operand is the part of a computer instruction which specifies what data is to be manipulated or operated on, while at the same time representing the data itself.
If we consider the context of a "computer instruction", the C code might naively be translate to assembly code like mov [ebp-4], 0 where the two operands would clearly be the [ebp-4] (a location where the variable called i is stored) and the 0, which would make both i and 0 operands by this definition. Yet, in reality the code is likely to be optimized by the compiler into a different form, such as only storing i in a register, in which case zeroing it might become xor eax, eax where the 0 no longer exists as an explicit operand (but is the result of the operation). Or, the whole 0 might be optimized away and replaced by some different value that inevitably gets assigned. Or, the whole variable might end up being removed, e.g., if it is used as a loop counter and the loop is unrolled.
So, in the end, either it is something of a philosophical question ("does the zero exist as an operand if it gets optimized away"), or just a matter of deciding on the desired usage of the terms (perhaps depending on the context of discussion).
The i is an operand, but is 0 too? According to wikipedia, 0 shouldn't
(or maybe I misunderstand).
The question links to the Wikipedia page describing the term "operand" in a mathematical context. This likely factors in to your confusion about whether the 0 on the right-hand side of the = is an operand, because in a mathematical statement of equality such as appears in the article, it is in no way conventional to consider = an operator. It does not express a computation, so the expressions it relates are not operated upon. I.e. they do not function as operands.
You have placed the question in C context, however, and in C, = can be used as an assignment operator. This is a bona fide operator in that it expresses a (simple) operation that produces a result from two operands, and also has a side effect on the left-hand operand. In an assignment statement such as
i = 0;
, then, both i and 0 are operands.
With respect to ...
If 0 isn't operand, is it a constant or something?
... 0 standing on its own is a constant in C, but that has little to do with whether it serves as an operand in any given context. Being an operand is a way an expression, including simply 0, can be used. More on that in a moment.
Now it's unclear whether it was the intent, but the code actually presented in the question is very importantly different from the assignment statement above. As the other answers also observe,
int i = 0;
is a declaration, not a statement. In that context, the i is the identifier being declared, the 0 is used as its initializer, and just as the = in a mathematical equation is not an operator, the = introducing an initializer in a C declaration is not an operator either. No operation is performed here, in that no value computation is associated with this =, as opposed to when the same symbol is used as an assignment operator. There being no operator and no operation being performed, there also are no operands in this particular line of code.
Given that x is a variable of type int with the number 5 as its value, consider the following statement:
int y = !!x;
This is what I think it happens: x is implicitly casted to a bool and the first negation is executed, after that the last negation is made, so a cast and two negations.
My question is, isn't just casting to bool (executing int y = (bool)x; instead of int y = !!x) faster than using double negation, as you are saving two negations from executing.
I might be wrong because I see the double negation a lot in the Linux kernel, but I don't understand where my intuition goes wrong, maybe you can help me out.
There was no bool type when Linux was first written. The C language treated everything that was not zero as true in Boolean expressions. So 7, -2 and 0xFF are all "true". No bool type to cast to. The double negation trick ensures the result is either zero or whatever bit pattern the compiler writers chose to represent true in Boolean expressions. When you're debugging code and looking at memory and register values, it's easier to recognize true values when they all have the same bit patterns.
Addendum: According the C89 draft standard, section 3.3.3.3:
The result of the logical negation operator ! is 0 if the value of its operand compares unequal to 0, 1 if the value of its operand compares equal to 0. The result has type int . The expression !E is equivalent to (0==E).
So while there was no Boolean type in the early days of the Linux OS, the double negation would have yielded either a 0 or a 1 (thanks to Gox for pointing this out), depending on the truthiness of the expression. In other words any bit pattern in the range of INT_MIN..-1 and 1..INT_MAX would have yielded a 1 and the zero bit pattern is self-explanatory.
C language unlike other languages does not have bool type. bool in C is actually defined in stdbool.h which is not included in many C projects. Linux kernel is one such projects, it would be a pain to go through Linux code and update everything to use bool now as well. That is reason why Linux kernel does not use bool in C.
why !!x? This is done to ensure that value of y is either 1 or 0. As an example if you have this cocd
x=5;
int y = !!x;
We know that everything that non-zero values in C mean true. So above code would brake down to y= !!(5) followed by y = !(0) and than y = 1.
EDIT:
One more thing, I just saw OP mentioned casting to bool. In C there is no bool as base type, bool is defined type, thus compilers do not cast integers to Boolean type.
EDIT 2:
To further explain, in C++, Java and other languages when you type bool a = false you do not have to use headers or compile some other libraries or define bool type for bool type to work, it is already incorporated into compilers, where as in c you have to.
EDIT 3:
bool is not the same as _Bool.
The only reason I can imagine is because this saves some typing (7 chars vs 2 chars).
As #jwdonahue and #Gox have already mentioned, this is not the correct reason. C did not have bool when the linux kernel was written therefore casting to bool was not an option.
As far as efficiency goes, both are equivalent because compilers can easily figure this out. See https://godbolt.org/g/ySo6K1
bool cast_to_bool_1(int x) {
return !!x;
}
bool cast_to_bool_2(int x) {
return (bool) x;
}
Both the functions compile to the same assembly which uses the test instruction to check if the argument is zero or not.
test edi, edi // checks if the passed argument is 0 or not
setne al // set al to 0 or 1 based on the previous comparison
ret // returns the result
I am new to C language and trying to figure out the meaning of the following code.
In here if (!msize) checking to see if msize is zero or if msize is NULL ?
if (!msize)
msize = 1 / msize; /* provoke a signal */
//Example 1: A division-by-zero misuse, in lib/mpi/mpi-pow.c of the Linux kernel, where the entire code will be optimized away.
//Compilers, GCC 4.7 and Clang 3.1
It depends on the type of msize.
If msize is a pointer, it tests whether it is NULL.
If msize is not a pointer, it tests whether it is 0.
This distinction may seem pedantic, but it's important. While NULL is actually 0 on most systems, the C standard allows it to be any other value.
I did some further reading, because I started to doubt whether my understanding above is correct.
Here are relevant parts of the C standard.
§6.5.3.3 Unary arithmetic operators
(5) The result of the logical negation operator ! is 0 if the value of its
operand compares unequal to 0, 1 if the value of its operand compares
equal to 0. The result has type int. The expression !E is equivalent
to (0==E).
§6.3.2.3 Pointers
(3) An integer constant expression with the value 0, or such an expression
cast to type void *, is called a null pointer constant. 66) If a null
pointer constant is converted to a pointer type, the resulting
pointer, called a null pointer, is guaranteed to compare unequal to a
pointer to any object or function.
(6) Any pointer type may be converted to an integer type. Except as
previously specified, the result is implementation-defined. If the
result cannot be represented in the integer type, the behavior is
undefined. The result need not be in the range of values of any integer
type.
Footnotes: 66 The macro NULL is defined in <stddef.h> (and other headers) as a null pointer constant; see 7.19.
As you can see, 0 is kind of a magic number in C. For systems with a non-zero NULL, I expect that the actual behaviour of !msize may be implementation-defined. In any case, this is all a bit nit-picky.
I tracked down the source of your example in the paper: Undefined Behavior: What Happened to My Code?. The text discussing your example states:
As mentioned earlier, at the instruction set level, x86 raises an
exception for a division by zero [17, 3.2], while MIPS [22, A.6] and
PowerPC [15, 3.3.38] silently ignore it. A division by zero in C is
undefined behavior [19, 6.5.5], and a compiler can thus simply assume
that the divisor is always non-zero.
Figure 1 shows a division-by-zero misuse in the Linux kernel. From the
programmer’s comment it is clear that the intention is to signal an
error in case msize is zero. When compiling with GCC, this code
behaves as intended on an x86, but not on a PowerPC, because it will
not generate an exception. When compiling with Clang, the result is
even more surprising. Clang assumes that the divisor msize must be
non-zero—on any system—since otherwise the division is undefined.
Combined with this assumption, the zero check !msize becomes always
false, since msize cannot be both zero and non-zero. The compiler
determines that the whole block of code is unreachable and removes it,
which has the unexpected effect of removing the programmer’s original
intention of guarding against the case when msize is zero.
So in your case, the answer you really needed was yes: it tests whether msize is 0.
This question already has answers here:
What does !!(x) mean in C (esp. the Linux kernel)?
(3 answers)
Closed 9 years ago.
I've seen code where people have used conditional clauses with two '!'s
#define check_bit(var, pos) (!!((var) & (1 << (pos))))
#define likely(x) __builtin_expect(!!(x),1)
#define unlikely(x) __builtin_expect(!!(x),0)
are some of the examples I could find.
Is there any advantage in using !!(condition) over (condition)?
Well if the variable you are applying !! is not already a bool(either zero or one) then it will normalize the value to either 0 or 1.
With respect to __builtin_expect this kernel newbies thread discusses the notation and one of the responses explains (emphasis mine):
The signature of __builtin_expect
http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html) is:
long __builtin_expect (long exp, long c)
Note that the exp parameter should be an integral expression, thus no pointers
or floating point types there. The double negation handles the conversion from
these types to integral expressions automatically. This way, you can simply
write: likely(ptr) instead of likely(ptr != NULL).
For reference in C99 bool macro expands to _Bool, true expands to 1 and false expands to 0. The details are given in the draft standard section 7.16 Boolean type and values .
Logical negation is covered in 6.5.3.3 Unary arithmetic operators in paragraph 5:
The result of the logical negation operator ! is 0 if the value of its operand compares
unequal to 0, 1 if the value of its operand compares equal to 0. The result has type int.
The expression !E is equivalent to (0==E).
The unary ! logical negation operator, applied to any scalar, yields the int value 0 if its operand is non-zero, 1 if the operand is equal to zero. Quoting the standard:
The expression !E is equivalent to (0==E).
Applying ! twice to the same scalar value yields a result that's false if the value is false, true if the value is true -- but the result is normalized to 0 or 1, respectively.
In most cases, this isn't necessary, since any scalar value can be used directly as a condition. But in some cases you actually need a 0 or 1 value.
In C99 or later, casting the expression to _Bool (or to bool if you have #include <stdbool.h> behaves similarly and might be considered clearer. But (a) the result is of type _Bool rather than int, and (b) if you're using a pre-C99 compiler that doesn't support _Bool and you've defined your own bool type, it won't behave the same way as C99's _Bool.
The biggest I can see, is that it will force (or normalize) the value into 1 or 0 (that is a boolean value) regardless of how the x or var expression expands (e.g. char or double or int or etc.).
It casts to a boolean, which can sometimes be useful.