In https://github.com/numpy/numpy/issues/6428, the root cause for the bug seems to be that at simd.inc.src:543, a compiler optimizes !(tmp == 0.) to tmp != 0..
A comment says that these are "not quite the same thing" but doesn't specify any details. NaNs are mentioned further on, but a test shows that a NaN compares to 0. the expected way.
What are the cases where == and != can both return true/false?
Or is the discrepancy somewhere else, e.g. in returning values that have the same truth value but differ as ints (though testing suggests that isn't the case either)?
The standard says:
The == (equal to) and != (not equal to) operators are analogous to the relational operators except for their lower precedence. [...] For any pair of operands, exactly one of the relations is true.
(C2011, 6.5.9/3; emphasis added)
Therefore, for any expressions X and Y that are jointly allowed as operands of these operators, (X) != (Y) must evaluate to the same result as !((X) == (Y)). If they are found in practice not to do so, then the compiler that yielded that result is non-conforming in that respect. If that non-conformance is unexpected, then it constitutes a bug in the compiler.
Additionally, I observe that 6.5.9/3 applies just as much to NaNs, infinities, and subnormals as to any other operands. NaNs are special with respect to these operators for a different reason: NaNs compare unequal to all operands, including themselves (supposing IEEE semantics).
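A minimal sketch of how one might check this claim on a given compiler. The helper name forms_agree is made up for illustration, and the sketch assumes IEEE semantics with no value-changing flags such as -ffast-math:

```c
#include <float.h>  /* DBL_MIN */
#include <math.h>   /* NAN, INFINITY */

/* Returns 1 when !(x == 0.) and x != 0. give the same answer for x.
 * A conforming compiler must make this return 1 for every double. */
int forms_agree(double x)
{
    int negated_eq = !(x == 0.);
    int not_eq     = (x != 0.);
    return negated_eq == not_eq;
}
```

Calling it with NAN, INFINITY, -0.0 and a subnormal such as DBL_MIN / 2 exercises the special values discussed above.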
From the linked post:
charris commented on Oct 9, 2015
I'm going to guess the !(tmp == 0.) is optimized to tmp != 0., which is not quite the same thing.
Comment by the OP:
The author says it's a guess but they are quite positive that !(tmp==0.) and tmp!=0. are not equivalent and express that as if it's common knowledge
How do we reconcile these two?
Clearly, they are logically equivalent. But implementation-wise, they may not be. A compiler might implement !(a == b) as the test a == b followed by a negation. Alternately, it might optimize the expression, and directly test a != b. Resulting assembly code would be different in those two cases. The same result should (must) be achieved, but the execution time could be different.
"not quite the same thing" would simply be an acknowledgement that !(a == b) and a != b are actually different combinations of characters and the compiler might do something technically different with them, that must yield the same result. And if different results are observed, then a bug might exist in the compiler.
I am getting a warning:
warning: '<<' in boolean context, did you mean '<' ? [-Wint-in-bool-context]
for the code similar to the following:
int a=7,b=3;
int index=((a<<1)||b)&&5;
To explain the rationale behind such warnings:
C did get a boolean type _Bool/bool as per C99, but no changes were made to the behavior of the various logical operators of the language. That is:
Equality operators ==/!=
Relational operators <,<=,>,>=
Logical AND &&
Logical OR ||
Logical negation !.
Despite C having a boolean type, all of these operators return type int with value 1 or 0, unlike C++, where all of these operators actually return type bool with value true/false. This is a known flaw of C.
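One way to see this for yourself, sketched with C11 _Generic. The macro and function names here are invented for the example:

```c
#include <stdbool.h>

/* Evaluate to 1 when the expression has exactly the named type.
 * _Generic does not evaluate its controlling expression. */
#define IS_INT(x)  _Generic((x), int: 1, default: 0)
#define IS_BOOL(x) _Generic((x), bool: 1, default: 0)

/* Equality and relational operators: the result type is int. */
int comparison_yields_int(int a, int b)
{
    return IS_INT(a == b) && IS_INT(a != b) && IS_INT(a < b);
}

/* Logical AND, OR and negation: the result type is int as well. */
int logical_ops_yield_int(int a, int b)
{
    return IS_INT(a && b) && IS_INT(a || b) && IS_INT(!a);
}

/* For contrast, an object declared bool really does have type bool. */
int bool_object_is_bool(void)
{
    bool flag = true;
    return IS_BOOL(flag);
}
```

All three functions should return 1 on a C11 compiler.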
It's common good practice however to treat such expressions as if they were boolean. Coding guidelines like MISRA C encourage the use of a fictional type "essentially boolean", meaning any expression that could be treated as _Bool. The reason for this is that it makes code more self-documenting and also makes it harder to create various typo-related bugs.
For example if(str) could mean check for NULL, or it could mean check for null termination but oops we forgot to dereference. Same thing with if(*str). Whereas if we only pass the result of a logical operator to if, the code becomes much clearer and it becomes harder to write bugs: if(str != NULL) could only mean check for NULL and if(*str != '\0') could only mean check for null termination.
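A small sketch of the explicit style described above. The function names are hypothetical:

```c
#include <stddef.h>  /* NULL */

/* Can only mean "is the pointer non-null". */
int is_present(const char *str)
{
    return str != NULL;
}

/* Can only mean "does the string have at least one character".
 * The NULL check comes first, so *str is never read through NULL. */
int is_nonempty(const char *str)
{
    return str != NULL && *str != '\0';
}
```

Spelling out both comparisons makes it hard to confuse the pointer test with the character test.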
In your case, the || operator only cares if the operands are zero or non-zero ("essentially boolean"). In case a is non-zero then a<<1 will not change that. In case a is zero, then a<<1 is zero as well. Since the result is passed on to a logical operator, "in a boolean context", the shift is pointless - hence the warning. But of course a<<1 might take all kinds of other values in another context such as for example if(a<<1 & mask).
It would be reasonable to strongly suspect that you actually meant to write a<1 instead, because that would yield either 0 or 1: < is a relational operator which, like ||, produces an int of value 0 or 1.
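To illustrate why the shift is redundant in that position, here is a sketch with two hypothetical helpers. It assumes a is small enough that a<<1 does not overflow, and it writes the comparisons against zero explicitly, which should also avoid the warning:

```c
/* The question's expression with the zero tests spelled out.
 * The trailing "&& 5" is dropped: 5 is non-zero, so it never
 * changes the outcome. */
int index_explicit(int a, int b)
{
    return ((a << 1) != 0) || (b != 0);
}

/* The same test without the shift. For non-overflowing a,
 * (a << 1) is zero exactly when a is zero. */
int index_simplified(int a, int b)
{
    return (a != 0) || (b != 0);
}
```

With a=7 and b=3 as in the question, both versions yield 1.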
I got the solution. We should avoid using integers with logical operators (e.g. ||, &&). Using integers with bitwise operators (e.g. <<, &, |) is fine.
Sometimes we don't see this warning because the compiler's warning settings filter out low-priority warnings. With stricter warning settings it shows up.
Thanks
This is tricking my mind a little bit, because I can't seem to find any problem with this code.
Anyways, here's the riddle:
Suppose someone injects a random value into a, b:
int foo(int a, int b)
{
return b ? (a / b): 0;
}
b != 0 always!
Is it possible for an integer zero division exception to occur?
I'm starting to think this is a prank, yet....
NOTE:
This question was published in a conference, there is a possibility that the author was specific to a compiler/architecture where this is problematic.
No, divide by zero is not possible here.
Quoting C11, chapter §6.5.15, Conditional operator (emphasis mine)
The first operand is evaluated; there is a sequence point between its evaluation and the
evaluation of the second or third operand (whichever is evaluated). The second operand
is evaluated only if the first compares unequal to 0; the third operand is evaluated only if
the first compares equal to 0; [...]
So, in case b is 0, the expression (a/b) will not be evaluated.
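A sketch that makes the skipped evaluation observable. The counter and the wrapper counted_div are additions for illustration only:

```c
/* Counts how many times a division was actually carried out. */
int divisions_performed = 0;

/* Hypothetical wrapper around a / b that records each call. */
int counted_div(int a, int b)
{
    ++divisions_performed;
    return a / b;
}

/* Same shape as foo() from the question: the second operand of ?:
 * is evaluated only when b compares unequal to 0. */
int foo_counted(int a, int b)
{
    return b ? counted_div(a, b) : 0;
}
```

Calling foo_counted(10, 0) returns 0 and leaves the counter untouched, since the division branch is never entered.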
That said, just as note:
The division is integer division.
If you have a wrapper, that ensures b != 0, then you could cut down the whole function call and simply write someVar = a/b;
Also, I don't know of any architecture which (possibly) changes the aforesaid behavior.
There is no possibility of a division by 0 in your example, yet there is another special case you should check for: dividing INT_MIN by -1 may cause a division overflow, and usually causes a fatal exception on Intel hardware. This is surprising behavior, yet consistent with the C Standard, which makes the behavior undefined when the quotient is not representable.
If you need to protect against such unwanted behavior, you must special case these operands and handle them specifically:
#include <limits.h>
int foo(int a, int b) {
if (b == 0) return 0;
if (a == INT_MIN && b == -1) return INT_MAX;
return a / b;
}
Since the values of a and b can be crafted by an external source, you definitely need to worry about division overflow. It is a fun game to try to crash unsafe calculators by feeding them these values.
Let's say we have these inequalities:
if (a*a+b*b>0) {
...
}
if (a*b+c*d>0) {
...
}
Obviously, both of them require 2 multiplications to evaluate.
The thing is, do we really need to calculate 2 full-precision products just to check whether these expressions are positive or not?
Is there any mathematical trickery that allows me to write those if commands without the need to evaluate 2 products?
Will it be faster?
Or perhaps the compiler takes care of making it as fast as possible?
Am I overthinking?
EDIT:
Well, that escalated quickly.
I just want to point out that I am speaking in general terms. I don't need such a micro-optimization in any project of mine anyway.
Also, yes, I could have omitted the first one for being too trivial. Possibly the second one is more interesting.
Your "am I overthinking" question suggests to me that you haven't found this to be an actual bottleneck by really profiling your code. So I'd say yes, you're just trying to do premature optimization.
However, if this really is a major performance-critical part of your application, then the only improvement I can think of right now is the following. Since squares of real numbers can never be negative, then "a squared is greater than zero" is equivalent with "a is not zero". So if comparisons are fast (well, that's relative -- faster than multiplication) on your architecture, then
if (a*a+b*b>0) {
...
}
can be written as
if (a || b) {
...
}
(provided that no corner cases arise. If the variables are signed integers or floating-point numbers representing real numbers, then this should be fine. If, however, unsigned integer overflow or complex numbers are involved, then you will have to perform additional checks, and at that point, it's hard to reason about the relative performance without true profiling.)
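The unsigned corner case can be made concrete. In this sketch (all names invented here) the value 2^(w/2), where w is the width of unsigned int, is non-zero but its square wraps to exactly 0. It assumes unsigned int has an even number of value bits and no padding bits:

```c
#include <limits.h>  /* CHAR_BIT */

/* The original test, on unsigned operands (arithmetic wraps modulo 2^w). */
int sum_of_squares_positive(unsigned a, unsigned b)
{
    return a * a + b * b > 0;
}

/* The proposed replacement. */
int either_nonzero(unsigned a, unsigned b)
{
    return a || b;
}

/* 2^(w/2): non-zero, yet its square is 2^w, which wraps to 0. */
unsigned wrapping_value(void)
{
    return 1u << (sizeof(unsigned) * CHAR_BIT / 2);
}
```

For a = wrapping_value() and b = 0 the two tests disagree, so the rewrite is only safe when wrap-around cannot happen.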
I don't have such a "clever" "optimization" for the second case in my mind, but perhaps someone else can come up with something similar -- if and only if it is absolutely necessary. Not otherwise -- code readability is preferred over performance when performance is not critical.
I'm assuming none of these expressions will overflow either because the types don't have a concept of overflow or because the values are in range. When overflow and potential wrap-around enters the picture (e.g. if a and b are unsigned int) different rules apply.
The first statement is obviously equivalent to
if (a != 0 || b != 0)
or
if (a || b)
which trades an extra branch for two multiplications and an addition.
The second statement is a bit more interesting: I'd think it could be reasonable to determine the signs of the operand and only do the actual math when a*b and c*d have opposite signs. In all other cases the condition can be determined without knowing the actual values. Whether the resulting logic is faster than the computations will depend on the types, I'd guess.
The first one will always be >= 0. It will be 0 if and only if a and b are 0, so it's equivalent to:
if (a || b) {
...
}
About the second one: if sign of a is equal to sign of b, and sign of c is equal to sign of d, then it's the same situation as above:
if (sign(a)==sign(b) && sign(c)==sign(d))
{
if ((a && b) || (c && d))
{
... > 0
}
else
{
... = 0
}
}
else
{
if (sign(a)*sign(b)==sign(c)*sign(d))
{
... <= 0
}
else
{
/* must do the actual product to find out */
}
}
For an IEEE-754 compliant floating point number, the sign is the most significant bit of each number.
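A runnable sketch of the sign-based idea for int operands. It is a flattened variant of the nested tests above, not the exact same control flow. The helper names are made up, and the fallback assumes long long is wide enough to hold a product of two ints. Whether this beats two plain multiplications is something only profiling can tell:

```c
/* Sign of x as -1, 0 or +1. */
static int sign(int x)
{
    return (x > 0) - (x < 0);
}

/* Returns 1 when a*b + c*d > 0. The full products are computed
 * only when the signs alone cannot decide the answer. */
int sum_of_products_positive(int a, int b, int c, int d)
{
    int s_ab = sign(a) * sign(b);  /* sign of a*b */
    int s_cd = sign(c) * sign(d);  /* sign of c*d */

    if (s_ab >= 0 && s_cd >= 0)
        return s_ab > 0 || s_cd > 0;  /* positive iff either product is non-zero */
    if (s_ab <= 0 && s_cd <= 0)
        return 0;                     /* sum of non-positive terms is never > 0 */

    /* Opposite signs: do the arithmetic, widened to avoid int overflow. */
    return (long long)a * b + (long long)c * d > 0;
}
```

For example, sum_of_products_positive(2, 3, -1, 5) falls through to the arithmetic and returns 1, because 6 - 5 > 0.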
For environments in which FP is emulated, there's one thing you can do to optimize the comparison a bit: you can avoid the addition by comparing the two products directly, negating one of them (which is just a sign-bit flip), like this:
if (a*b > -(c*d)) {
...
}
This is a bit faster because comparing two floating point numbers is close to comparing them as if they were signed integers (negative values need their order reversed, since IEEE-754 uses sign-magnitude), and a FP-less CPU surely will have resources to compare two integers faster than the time it spends doing a FP software addition.
Another rewrite (assuming you are using floats, they are 32 bits wide and IEEE 754 compliant, and the same size as int; yes, this is hacky and platform-dependent).
For the first case, you can use a single bitwise 'or' and 'and' (the 'and' masks off the sign bit so that -0 counts as zero; you can remove it if there cannot be any -0s. The exponent bits must be kept, since a value such as 1.0 has an all-zero mantissa. NaNs and squares that underflow to zero are ignored here):
if ((*(int *)&a | *(int *)&b) & 0x7FFFFFFF) { // a*a + b*b>0
...
}
I really doubt that there is any similarly branch-less magic for the second case.
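A sketch of the same bit test that avoids the pointer cast by copying the bytes with memcpy. The function names are invented, and it assumes float is a 32-bit IEEE 754 type. The mask 0x7FFFFFFF clears only the sign bit, so -0.0f counts as zero while 1.0f (whose mantissa bits are all zero) still counts as non-zero. NaN inputs and squares that underflow to zero are not handled the way a*a + b*b > 0 would handle them:

```c
#include <stdint.h>  /* uint32_t, UINT32_C */
#include <string.h>  /* memcpy */

/* Copies the object representation of a float into a 32-bit integer.
 * Assumes sizeof(float) == 4. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Returns 1 when a or b is non-zero, treating +0 and -0 alike. */
int either_nonzero_bits(float a, float b)
{
    return ((float_bits(a) | float_bits(b)) & UINT32_C(0x7FFFFFFF)) != 0;
}
```

Compilers typically turn the memcpy into a plain register move, so this form need not be slower than the cast.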
Imagine I have the following piece of C-code where foo() produces a side effect and returns an integer:
if(bar) {
foo();
return 0;
}
Now, say I really like making my code compact, possibly at the reader's expense, and I change it into this:
if (bar)
return foo() && 0;
Can I be sure these two pieces of code will produce the same behavior, or would I risk the call to foo() not being executed due to possible compiler optimizations or something like that, thus not producing the desired side-effect?
NOTE: This is not a question about which piece of code is better, but whether the two pieces actually produce the same behavior in all cases. I think the majority (and I) can agree that the former piece of code should be used.
Yes, those two are the same. foo() will always be called (assuming bar is true).
The two forms you give are equivalent. The C11 standard (draft n1570) states,
6.5.13 Logical AND operator
...
Semantics
3 The && operator shall yield 1 if both of its operands compare unequal to 0;
otherwise, it yields 0. The result has type int.
4 Unlike the bitwise binary & operator, the && operator guarantees left-to-right
evaluation; if the second operand is evaluated, there is a sequence point between
the evaluations of the first and second operands. If the first operand compares
equal to 0, the second operand is not evaluated.
Similar language appeared in all C standards so far.
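To see the guarantee in action, here is a sketch in which foo() counts its own calls. The counter, the return value 42, and the -1 fallback for a false bar are assumptions added for the example:

```c
/* Number of times foo() has run; stands in for "the side effect". */
int foo_calls = 0;

int foo(void)
{
    ++foo_calls;
    return 42;
}

/* Compact form: foo() is the left operand of &&, so it is always
 * evaluated; the right operand 0 then forces the result to 0. */
int compact(int bar)
{
    if (bar)
        return foo() && 0;
    return -1;
}

/* Plain form from the question. */
int plain(int bar)
{
    if (bar) {
        foo();
        return 0;
    }
    return -1;
}
```

Both compact(1) and plain(1) return 0 and each increments foo_calls once.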
You should probably prefer using the comma operator here (return foo(), 0;) because:
It's shorter (one character versus two for the operator, and you can get away with removing the left space character when using a comma, for a total of two fewer characters).
It gives you more flexibility, as you can return non-scalar types (such as structs), and a wider range of integers than just 0 or 1.
It conveys the intent better: "Discard return value of foo() and return something else (0) instead".
Now if you do chance upon a compiler that deletes the call to foo(), then either the compiler managed to prove that foo() is a function with no visible side-effects, or more likely it has a serious bug and you should report it.
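A sketch of the comma-operator variant, including the struct case mentioned above. The struct status type and the counter are hypothetical:

```c
/* Hypothetical non-scalar return type. */
struct status {
    int code;
    const char *text;
};

/* Stands in for foo()'s side effect. */
int side_effects = 0;

int foo(void)
{
    ++side_effects;
    return 42;
}

/* foo() is evaluated, its value is discarded, and 0 is returned. */
int with_comma(void)
{
    return foo(), 0;
}

/* Unlike &&, the comma operator can yield any integer value... */
int with_comma_value(void)
{
    return foo(), 7;
}

/* ...or a struct. */
struct status with_comma_struct(void)
{
    struct status ok = { 0, "ok" };
    return foo(), ok;
}
```

Each of the three functions calls foo() exactly once before returning its second operand.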
Why obfuscate your code in the latter?
Use the former.
It is easier to read, i.e. this is easier to understand:
if(bar) {
foo();
return 0;
}
Unless, that is, you've got a problem with job security.
Is !!(x) guaranteed by the standard to return 0/1?
Note that I am not asking about c++, where a bool type is defined.
Yes, in C99, see §6.5.3.3/4:
The result of the logical negation operator ! is 0 if the value of its operand compares
unequal to 0, 1 if the value of its operand compares equal to 0. The result has type int.
The expression !E is equivalent to (0==E).
So !x and !!y can only yield 0 or 1, as ints.
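As a short illustration, with helper names made up for the example, !! can be used to normalize any scalar to 0 or 1:

```c
/* Maps 0 to 0 and every other int to 1. */
int to_bool_int(int x)
{
    return !!x;
}

/* Works for pointers too: a null pointer compares equal to 0. */
int ptr_to_bool_int(const void *p)
{
    return !!p;
}
```

For instance, to_bool_int(-3) yields 1, and ptr_to_bool_int applied to a null pointer yields 0.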
For other operators, in C99, see also Is the "true" result of >, <, !, &&, || or == defined?
This is a comment really, but it's too long.
I found a very bizarre document while looking for the standard to answer your question: The New C Standard: An Economic and Cultural Commentary. And they say academia is under-funded. (Here is the full, 2083 page 10.5MB PDF. The former link is just the section on double negation.)
It has this to say on the subject of double negation:
A double negative is very often interpreted as a positive statement in English (e.g., “It is not unknown for double negatives to occur in C source”). The same semantics that apply in C. However, in some languages (e.g., Spanish) a double negative is interpreted as making the statement more negative (this usage does occur in casual English speech, e.g., “you haven’t seen nothing yet”, but it is rare and frowned on socially).
I believe that the author would be happy knowing that this is of no use whatsoever in answering your real question (the answer to which is yes.)