In the C programing language, why do the bitwise operators (& and |) have lower precedence than the equality operator (==)? It does not make sense to me.
You need to ask Brian Kernighan or Dennis Ritchie.
From this forum: http://bytes.com/topic/c/answers/167377-operator-precedence
The && and || operators were added later
for their "short-circuiting" behavior. Dennis Ritchie admits in
retrospect that the precedence of the bitwise operators should have
been changed when the logical operators were added. But with several
hundred kilobytes of C source code in existence at that point and an
installed base of three computers, Dennis thought it would be too big
of a change in the C language...
So, that might be a reason? I'm guessing since there are several layers of bitwise precendence (unlike relational comparisons) that it's cruft that's existed since...forever...and just was never corrected.
It doesn't make sense to Dennis Ritchie, either, in retrospect.
http://www.lysator.liu.se/c/dmr-on-or.html
&& and || were added to the language after | and &, and precedence was maintained for reasons of compatibility.
I don't have an authoritative answer as to why K&R chose the precedence they did. One example that makes a fair amount of sense would be this one:
if (x == 1 & y == 0) {
/* ... */
}
Since this is the bitwise AND operator it uses a non-short-circuiting evaluation mode, as would
if (x == 1 | y == 0) {
/* ... */
}
use the non-short-circuiting OR operator. This is probably why they chose to have the precedence group this way, but I agree with you that in retrospect it doesn't seem like a good idea.
Related
I'm learning C programming in university, and for a quiz the question above came. I would like to understand how it will execute. Does it have something to do with the order of precedence?
Yes it does, but that's only half the story.
To solve this one, you need to know two things:
the operator precedence of += and |=
if these are the same, the associativity of these operators (left-to-right or right-to-left)
Fortunately, there is a table at cppreference.
This tells us that:
both += and |= have the same precedence
their associativity is right-to-left
The answer to the quiz (as shown in your screenshot!) is therefore a += (b |= c), that is to say
b |= c is evaluated first and the result is then added to a.
But, as bolov points out, any self-respecting programmer would, at minimum, put the brackets in for you, or (ideally) code this as two separate statements.
When the calculation formulas have the same priority.
It will be resolved from the right side.
In other words, the result is the same as the following formula.
b=b|c;
a=a+b;
In the "Introduction" section of K&R C (2E) there is this paragraph:
C, like any other language, has its blemishes. Some of the operators have the wrong precedence; ...
Which operators are these? How are their precedence wrong?
Is this one of these cases?
Yes, the situation discussed in the message you link to is the primary gripe with the precedence of operators in C.
Historically, C developed without &&. To perform a logical AND operation, people would use the bitwise AND, so a==b AND c==d would be expressed with a==b & c==d. To facilitate this, == had higher precedence than &. Although && was added to the language later, & was stuck with its precedence below ==.
In general, people might like to write expressions such as (x&y) == 1 much more often than x & (y==1). So it would be nicer if & had higher precedence than ==. Hence people are dissatisfied with this aspect of C operator precedence.
This applies generally to &, ^, and | having lower precedence than ==, !=, <, >, <=, and >=.
There is a clear rule of precedence that is incontrovertible.
The rule is so clear that for a strongly typed system (think Pascal) the wrong precedence would give clear unambiguous syntax errors at compile time. The problem with C is that since its type system is laissez faire the errors turn out to be more logical errors resulting in bugs rather than errors catch-able at compile time.
The Rule
Let ○ □ be two operators with type
○ : α × α → β
□ : β × β → γ
and α and γ are distinct types.
Then
x ○ y □ z can only mean (x ○ y) □ z, with type assignment
x: α, y : α, z : β
whereas x ○ (y □ z) would be a type error because ○ can only take an α whereas the right sub-expression can only produce a γ which is not α
Now lets
Apply this to C
For the most part C gets it right
(==) : number × number → boolean
(&&) : boolean × boolean → boolean
so && should be below == and it is so
Likewise
(+) : number × number → number
(==) : number × number → boolean
and so (+) must be above (==) which is once again correct
However in the case of bitwise operators
the &/| of two bit-patterns aka numbers produce a number
ie
(&), (|) : number × number → number
(==) : number × number → boolean
And so a typical mask query eg. x & 0x777 == 0x777
can only make sense if (&) is treated as an arithmetic operator ie above (==)
C puts it below which in light of the above type rules is wrong
Of course Ive expressed the above in terms of math/type-inference
In more pragmatic C terms x & 0x777 == 0x777 naturally groups as
x & (0x777 == 0x777) (in the absence of explicit parenthesis)
When can such a grouping have a legitimate use?
I (personally) dont believe there is any
IOW Dennis Ritchie's informal statement that these precedences are wrong can be given a more formal justification
Wrong may sound a bit too harsh. Normal people generally only care about the basic operators like +-*/^ and if those don't work like how they write in math, that may be called wrong. Fortunately those are "in order" in C (except power operator which doesn't exist)
However there are some other operators that might not work as many people expect. For example the bitwise operators have lower precedence than comparison operators, which was already mentioned by Eric Postpischil. That's less convenient but still not quite "wrong" because there wasn't any defined standard for them before. They've just been invented in the last century during the advent of computers
Another example is the shift operators << >> which have lower precedence than +-. Shifting is thought as multiplication and division, so people may expect that it should be at a higher level than +-. Writing x << a + b may make many people think that it's x*2a + b until they look at the precedence table. Besides (x << 2) + (x << 4) + (y << 6) is also less convenient than simple additions without parentheses. Golang is one of the languages that fixed this by putting <</>> at a higher precedence than + and -
In other languages there are many real examples of "wrong" precedence
One example is T-SQL where -100/-100*10 = 0
PHP with the wrong associativity of ternary operators
Excel with wrong precedence (lower than unary minus) and associativity (left-to-right instead of right-to-left) of ^:
According to Excel, 4^3^2 = (4^3)^2. Is this really the standard mathematical convention for the order of exponentiation?
Why does =-x^2+x for x=3 in Excel result in 12 instead of -6?
Why is it that Microsoft Excel says that 8^(-1^(-8^7))) = 8 instead of 1/8?
It depends which precedence convention is considered "correct". There's no law of physics (or of the land) requiring precedence to be a certain way; it's evolved through practice over time.
In mathematics, operator precedence is usually taken as "BODMAS" (Brackets, Order, Division, Multiplication, Addition, Subtraction). Brackets come first and Subtraction comes last.Ordering Mathematical Operations | BODMAS Order of operations
Operator precedence in programming requires more rules as there are more operators, but you can distil out how it compares to BODMAS.
The ANSI C precedence scheme is pictured here:
As you can see, Unary Addition and Subtraction are at level 2 - ABOVE Multiplication and Division in level 3. This can be confusing to a mathematician on a superficial reading, as can precedence around suffix/postfix increment and decrement.
To that extent, it is ALWAYS worth considering adding brackets in your mathematical code - even where syntactically unnecessary - to make sure to a HUMAN reader that your intention is clear. You lose nothing by doing it (although you might get flamed a bit by an uptight code reviewer, in which you can flame back about coding risk management). You might lose readability, but intention is always more important when debugging.
And yes, the link you provide is a good example. Countless expensive production errors have resulted from this.
While at least from a hand wave point of view I believe I know what an "arithmetic operator" is, I'm looking for a formal definition. I've examined the C17 standard document and I can't find such a definition, although it uses the term "arithmetic operator" in several places.
The closest I've been able to find is in the index of C17, where page numbers are provided for additive, bitwise, increment and decrement, multiplicative, shift, and unary under the common heading "arithmetic operators". I've looked online at various sources and the most common thing I've found only says that binary +, -, *, /, and % are the C arithmetic operators. Some also throw in ++ and --.
I'm pretty sure I'm simply missing something since I do find the standard quite daunting. However, I also find the various online sources somewhat dubious since they often seem to differ.
Thanks!
Update: Since some readers objected to my references to both C and C++ in the same posting, I've removed the references to C++ in the modified version above and will do an entirely separate posting for it later if I can first get the issue resolved for C.
The C standard does not explicitly define the term arithmetic operator, though it defines what an arithmetic operand is. If you read carefully, nothing in C is defined by using the term arithmetic operator, it exists only as a grouping in the index and in a title of one section. The term arithmetic operator by itself does not appear in any paragraph.
From the index, we indeed can get a list
arithmetic operators
additive, 6.2.6.2, 6.5.6, G.5.2
bitwise, 6.2.6.2, 6.5.3.3, 6.5.10, 6.5.11, 6.5.12
increment and decrement, 6.5.2.4, 6.5.3.1
multiplicative, 6.2.6.2, 6.5.5, G.5.1
shift, 6.2.6.2, 6.5.7
unary, 6.5.3.3
From this we could formulate that the arithmetic operators are those that require the operands to be arithmetic operands, i.e. of an arithmetic type (except in special cases such as pointer addition, subtraction), i.e.
additive + and -
bitwise &, | and ^
increment and decrement ++ and --
multiplicative *, / and %
shift << and >>
unary -, ~ and +. It is debatable whether ! is an arithmetic operator or not, even though it is listed in section 6.5.3.3.
Another notable thing about these operators are that the operands might undergo usual arithmetic conversions.
Arithmatic operators are operators used to perform mathematical operations like addition, substraction, multiplication and division. As simple as that.
ex: a+b = c
I just found legacy code which tests a flag like this:
if( some_state & SOME_FLAG )
So far, so good!
But further in code, I see an improper negation
if( ! some_state & SOME_FLAG )
My understanding is that it is interpreted as (! some_state) & SOME_FLAG which is probably a bug, and gcc logically barks with -Wlogical-not-parentheses...
Though it could eventually have worked in the past if ever !some_state was implemented as ~some_state by some legacy compiler. Does anyone know if it was possibly the case?
EDIT
sme_state is declared as int (presumably 32 bits, 2 complement on target achitecture).
SOME_FLAG is a constant set to a single bit 0x00040000, so SOME_FLAG & 1 == 0
Logical negation and bitwise negation have never been equivalent. No conforming compiler could have implemented one as the other. For example, the bitwise negation of 1 is not 0, so ~1 != !1.
It is true that the expression ! some_state & SOME_FLAG is equivalent to (! some_state) & SOME_FLAG because logical negation has higher precedence than bitwise and. That is indeed suspicious, but the original code is not necessarily in error. In any case, it is more likely that the program is buggy in this regard than that any C implementation evaluated the original expression differently than the current standard requires, even prior to standardization.
Since the expressions (! some_state) & SOME_FLAG and !(some_state & SOME_FLAG) will sometimes evaluate to the same value -- especially if SOME_FLAG happens to expand to 1 -- it is also possible that even though they are inequivalent, their differences do not manifest during actual execution of the program.
While there was no standard before 1989, and thus compilers could do things as they wished, no compiler to my knowledge has ever done this; changing the meaning of operators wouldn't be a smart call if you want people to use your compiler.
There's very little reason to write an expression like (!foo & FLAG_BAR); the result is just !foo if FLAG_BAR is odd or always zero if it is even. What you've found is almost certainly just a bug.
It would not be possible for a legacy compiler to implement ! as bitwise negation, because such approach would produce incorrect results in situations when the value being negated is outside the {0, 0xFF...FF} set.
Standard requires the result of !x to produce zero for any non-zero value of x. Hence, applying ! to, say, 1 would yield 0xFF..FFFE, which is non-zero.
The only situation when the legacy code would have worked as intended is when SOME_FLAG is set to 1.
Let's start with the most interesting (and least obvious) part: gcc logically barks with -Wlogical-not-parentheses. What does this mean?
C has two different operators that have similar looking characters (but different behaviour and intended for very different purposes) - the & which is a bitwise AND, and && which is a boolean AND. Unfortunately this led to typos, in the same way that typing = when you meant == can cause problems, so some compilers (GCC) decided to warn people about "& without parenthesis used as a condition" (even though it's perfectly legal) to reduce the risk of typos.
Now...
You're showing code that uses & (and not showing code that uses &&). This implies that some_state is not a boolean and is number. More specifically it implies that each bit in some_state may be completely independent and unrelated.
For an example of this, let's pretend that we're implementing a Pacman game and need a nice compact way to store the map for each level. We decide that each tile in the map might be a wall or not, might be a collected dot or not, might be power pill or not, and might be a cherry or not. Someone suggests that this can be an array of bytes, like this (assuming the map is 30 tiles wide and 20 tiles high):
#define IS_WALL 0x01
#define HAS_DOT 0x02
#define HAS_POWER_PILL 0x04
#define HAS_CHERRY 0x08
uint8_t level1_map[20][30] = { ..... };
If we want to know if a tile happens to be safe to move into (no wall) we could do this:
if( level1_map[y][x] & IS_WALL == 0) {
For the opposite, if we want to know if a tile is a wall we could do any of these:
if( level1_map[y][x] & IS_WALL != 0) {
if( !level1_map[y][x] & IS_WALL == 0) {
if( level1_map[y][x] & IS_WALL == IS_WALL) {
..because it makes no difference which one it is.
Of course (to avoid the risk of typos) GCC might (or might not) warn about some of these.
In C (and some other C-like languages) we have 2 unary operators for working with pointers: the dereference operator (*) and the 'address of' operator (&). They are left unary operators, which introduces an uncertainty in order of operations, for example:
*ptr->field
or
*arr[id]
The order of operations is strictly defined by the standard, but from a human perspective, it is confusing. If the * operator was a right unary operator, the order would be obvious and wouldn't require extra parentheses:
ptr*->field vs ptr->field*
and
arr*[id] vs arr[id]*
So is there a good reason why are the operators left unary, instead of right. One thing that comes to mind would be the declaration of types. Left operators stay near the type name (char *a vs char a*), but there are type declarations, which already break this rule, so why bother (char a[num], char (*a)(char), etc).
Obviously, there are some problems with this approach too, like the
val*=2
Which would be either an *= short hand for val = val * 2 or dereference and assign val* = 2.
However this can be easily solved by requiring a white space between the * and = tokens in case of dereferencing. Once again, nothing groundbreaking, since there is a precedent of such a rule (- -a vs --a).
So why are they left instead of right operators?
Edit:
I want to point out, that I asked this question, because many of the weirder aspects of C have interesting explanations, for why they are the way they are, like the existence of the -> operator or the type declarations or the indexing starting from 0. And so on. The reasons may be no longer valid, but they are still interesting in my opinion.
There indeed is an authoritative source: "The Development of the C Language" by the creator of the language, Dennis M. Ritchie:
An accident of syntax contributed to the perceived complexity of the language. The indirection operator, spelled * in C, is syntactically a unary prefix operator, just as in BCPL and B. This works well in simple expressions, but in more complex cases, parentheses are required to direct the parsing. For example, to distinguish indirection through the value returned by a function from calling a function designated by a pointer, one writes *fp() and (*pf)() respectively. The style used in expressions carries through to declarations, so the names might be declared
int *fp();
int (*pf)();
In more ornate but still realistic cases, things become worse:
int *(*pfp)();
is a pointer to a function returning a pointer to an integer. There are two effects occurring. Most important, C has a relatively rich set of ways of describing types (compared, say, with Pascal). Declarations in languages as expressive as C—Algol 68, for example—describe objects equally hard to understand, simply because the objects themselves are complex. A second effect owes to details of the syntax. Declarations in C must be read in an `inside-out' style that many find difficult to grasp [Anderson 80]. Sethi [Sethi 81] observed that many of the nested declarations and expressions would become simpler if the indirection operator had been taken as a postfix operator instead of prefix, but by then it was too late to change.
Thus the reason why * is on the left in C is because it was on the left in B.
B was partially based on BCPL, where the dereferencing operator was !.
This was on the left; the binary ! was an array indexing operator:
a!b
is equivalent to !(a+b).
!a
is the content of the cell whose address is given by a; it can appear on the left of an assignment.
Yet the 50 year old BCPL manual doesn't even contain mentions of the ! operator - instead, the operators were words: unary lv and rv. Since these were understood as if they were functions, it was natural that they preceded the operand; later the longish rv a could then be replaced with syntactic sugar !a.
Many of the current C operator practices can be traced via this route. B alike had a[b] being equivalent to *(a + b) to *(b + a) to b[a] just like in BCPL one could use a!b <=> b!a.
Notice that in B variables were untyped, so certainly similarity with declarations could not have been the reason to use * on the left there.
So the reason for unary * being on the left in C is as boring as "there wasn't any problem in the simpler programs with the unary * being on the left, in the position that everyone was accustomed to have the dereferencing operator in other languages, that no one really thought that some other way would have been better until it was too late to change it".