Bitwise operator precedence - sql-server

I just came across that baffles me in MSSQL.
If I do something like:
SELECT 1 | 0 & 0
The output is 0 - I would've thought that the & operator would have a precedence over the |operator.
Any ideas ? Is this a bug or an intended behavior in MSSQL ?

According to SQL Server operator precedence table (SQL2k8+) as #MartinSmith already referenced it in a comment this is how precedence looks like:
Level Operators
1 ~ (Bitwise NOT)
2 * (Multiply), / (Division), % (Modulo)
3 + (Positive), - (Negative), + (Add), (+ Concatenate), - (Subtract), & (Bitwise AND), ^ (Bitwise Exclusive OR), | (Bitwise OR)
4 =, >, <, >=, <=, <>, !=, !>, !< (Comparison operators)
5 NOT
6 AND
7 ALL, ANY, BETWEEN, IN, LIKE, OR, SOME
8 = (Assignment)
When two operators in an expression have the same operator precedence level, they are evaluated left to right based on their position in the expression.
Your issue with bitwise operators
As you can see bitwise operators are all in the same level (3) that's why they're evaluated left to right, so in your example | is evaluated first and then &.
Boolean logical operators are different
These are of course bitwise operators and not logical ones, which would of course evaluate in correct order since AND has level 6 and OR has 7. This means that AND would be evaluated first and OR afterwards.
Brackets are of course the preferred choice for any non trivial expression. In your case it seems trivial, yes, hence brackets weren't used, but since bitwise operators operate on the same level, they must be used when combining at least 3 operands or make sure you order operators in the way you want them to evaluate. In your case:
select 0 & 0 | 1
Would yield correct result.
It hasn't always been this way
Bitwise operator precedence has been equalised since SQL Server 2008. SQL Server 2005 and older had bitwise | (along with exlusive or ^) on a lower level (5), so your expression would evaluate correctly. Why Microsoft decided to put all bitwise operators on the same level is a mystery to me.
It turned out that SQL Books Online were stating invalid real life processing (Thanks to #MartinSmith). So they just updated operator precedence levels documentation page for SQL Server 2008. So apparently this has been working this way for a much longer time (maybe all the time).

Related

Are innermost parentheses evaluated first in C?

Consider the below question in regard to this expression: (a+b)+ (c +(d+e)) +(f+g)
In C we use precedence and associativity to decide which operator should be evaluated first and if there is more than one operator with the same precedence, then in what order they should be associated.
My question is:
Is (d+e) evaluated first by the compiler as it is the innermost nested parenthesis in the expression?
If so, do associativity and precedence somehow recommend innermost parenthesis evaluation should be done first?
I strongly feel that it doesn't, and if it doesn't then why would the compiler even decide to evaluate the innermost parenthesis first? Because going from left to right and evaluating parentheses at the same level seems much more logical to me.
You are confusing operator precedence with order of evaluation. These are different though related terms. See What is the difference between operator precedence and order of evaluation?
Specifically:
Do (d+e) is evaluated first by compiler as it is innermost nested parenthesis in the expression?
This depends on the order of evaluation of the + (additive) operators, nothing else. The order is unspecified for + (as it is for most operators) and we can't know it, nor should we rely on it. Compilers are allowed to do as they please and don't need to tell you (document) how and when they pick a certain order.
If so does associativity and precedence somehow explains innermost parenthesis evaluation should be done first?
No, this has nothing to do with operator precedence.
I strongly feel that it doesn't and if it don't then why we decided to evaluate innermost parenthesis first, because going from left to right and evaluating parenthesis at same level seems much logical to me
See 1)
Consider instead
(a() + b()) + (c() + (d() + e())) + (f() + g())
where each of the functions prints it's letter and, of course, returns a value
int a(void) { putchar('a'); return 42; }
When that expression is evaluated, something like "abcdefg" will be printed according to "order of evaluation", regardless of the operator precedence.
The order of evaluation is not something the Standard has rules for. Each compiler implementation can do the evaluation as it likes best ... even changing the order between compilations or runs.

Which operator(s) in C have wrong precedence?

In the "Introduction" section of K&R C (2E) there is this paragraph:
C, like any other language, has its blemishes. Some of the operators have the wrong precedence; ...
Which operators are these? How are their precedence wrong?
Is this one of these cases?
Yes, the situation discussed in the message you link to is the primary gripe with the precedence of operators in C.
Historically, C developed without &&. To perform a logical AND operation, people would use the bitwise AND, so a==b AND c==d would be expressed with a==b & c==d. To facilitate this, == had higher precedence than &. Although && was added to the language later, & was stuck with its precedence below ==.
In general, people might like to write expressions such as (x&y) == 1 much more often than x & (y==1). So it would be nicer if & had higher precedence than ==. Hence people are dissatisfied with this aspect of C operator precedence.
This applies generally to &, ^, and | having lower precedence than ==, !=, <, >, <=, and >=.
There is a clear rule of precedence that is incontrovertible.
The rule is so clear that for a strongly typed system (think Pascal) the wrong precedence would give clear unambiguous syntax errors at compile time. The problem with C is that since its type system is laissez faire the errors turn out to be more logical errors resulting in bugs rather than errors catch-able at compile time.
The Rule
Let ○ □ be two operators with type
○ : α × α → β
□ : β × β → γ
and α and γ are distinct types.
Then
x ○ y □ z can only mean (x ○ y) □ z, with type assignment
x: α, y : α, z : β
whereas x ○ (y □ z) would be a type error because ○ can only take an α whereas the right sub-expression can only produce a γ which is not α
Now lets
Apply this to C
For the most part C gets it right
(==) : number × number → boolean
(&&) : boolean × boolean → boolean
so && should be below == and it is so
Likewise
(+) : number × number → number
(==) : number × number → boolean
and so (+) must be above (==) which is once again correct
However in the case of bitwise operators
the &/| of two bit-patterns aka numbers produce a number
ie
(&), (|) : number × number → number
(==) : number × number → boolean
And so a typical mask query eg. x & 0x777 == 0x777
can only make sense if (&) is treated as an arithmetic operator ie above (==)
C puts it below which in light of the above type rules is wrong
Of course Ive expressed the above in terms of math/type-inference
In more pragmatic C terms x & 0x777 == 0x777 naturally groups as
x & (0x777 == 0x777) (in the absence of explicit parenthesis)
When can such a grouping have a legitimate use?
I (personally) dont believe there is any
IOW Dennis Ritchie's informal statement that these precedences are wrong can be given a more formal justification
Wrong may sound a bit too harsh. Normal people generally only care about the basic operators like +-*/^ and if those don't work like how they write in math, that may be called wrong. Fortunately those are "in order" in C (except power operator which doesn't exist)
However there are some other operators that might not work as many people expect. For example the bitwise operators have lower precedence than comparison operators, which was already mentioned by Eric Postpischil. That's less convenient but still not quite "wrong" because there wasn't any defined standard for them before. They've just been invented in the last century during the advent of computers
Another example is the shift operators << >> which have lower precedence than +-. Shifting is thought as multiplication and division, so people may expect that it should be at a higher level than +-. Writing x << a + b may make many people think that it's x*2a + b until they look at the precedence table. Besides (x << 2) + (x << 4) + (y << 6) is also less convenient than simple additions without parentheses. Golang is one of the languages that fixed this by putting <</>> at a higher precedence than + and -
In other languages there are many real examples of "wrong" precedence
One example is T-SQL where -100/-100*10 = 0
PHP with the wrong associativity of ternary operators
Excel with wrong precedence (lower than unary minus) and associativity (left-to-right instead of right-to-left) of ^:
According to Excel, 4^3^2 = (4^3)^2. Is this really the standard mathematical convention for the order of exponentiation?
Why does =-x^2+x for x=3 in Excel result in 12 instead of -6?
Why is it that Microsoft Excel says that 8^(-1^(-8^7))) = 8 instead of 1/8?
It depends which precedence convention is considered "correct". There's no law of physics (or of the land) requiring precedence to be a certain way; it's evolved through practice over time.
In mathematics, operator precedence is usually taken as "BODMAS" (Brackets, Order, Division, Multiplication, Addition, Subtraction). Brackets come first and Subtraction comes last.Ordering Mathematical Operations | BODMAS Order of operations
Operator precedence in programming requires more rules as there are more operators, but you can distil out how it compares to BODMAS.
The ANSI C precedence scheme is pictured here:
As you can see, Unary Addition and Subtraction are at level 2 - ABOVE Multiplication and Division in level 3. This can be confusing to a mathematician on a superficial reading, as can precedence around suffix/postfix increment and decrement.
To that extent, it is ALWAYS worth considering adding brackets in your mathematical code - even where syntactically unnecessary - to make sure to a HUMAN reader that your intention is clear. You lose nothing by doing it (although you might get flamed a bit by an uptight code reviewer, in which you can flame back about coding risk management). You might lose readability, but intention is always more important when debugging.
And yes, the link you provide is a good example. Countless expensive production errors have resulted from this.

C What is an "arithmetic operator"?

While at least from a hand wave point of view I believe I know what an "arithmetic operator" is, I'm looking for a formal definition. I've examined the C17 standard document and I can't find such a definition, although it uses the term "arithmetic operator" in several places.
The closest I've been able to find is in the index of C17, where page numbers are provided for additive, bitwise, increment and decrement, multiplicative, shift, and unary under the common heading "arithmetic operators". I've looked online at various sources and the most common thing I've found only says that binary +, -, *, /, and % are the C arithmetic operators. Some also throw in ++ and --.
I'm pretty sure I'm simply missing something since I do find the standard quite daunting. However, I also find the various online sources somewhat dubious since they often seem to differ.
Thanks!
Update: Since some readers objected to my references to both C and C++ in the same posting, I've removed the references to C++ in the modified version above and will do an entirely separate posting for it later if I can first get the issue resolved for C.
The C standard does not explicitly define the term arithmetic operator, though it defines what an arithmetic operand is. If you read carefully, nothing in C is defined by using the term arithmetic operator, it exists only as a grouping in the index and in a title of one section. The term arithmetic operator by itself does not appear in any paragraph.
From the index, we indeed can get a list
arithmetic operators
additive, 6.2.6.2, 6.5.6, G.5.2
bitwise, 6.2.6.2, 6.5.3.3, 6.5.10, 6.5.11, 6.5.12
increment and decrement, 6.5.2.4, 6.5.3.1
multiplicative, 6.2.6.2, 6.5.5, G.5.1
shift, 6.2.6.2, 6.5.7
unary, 6.5.3.3
From this we could formulate that the arithmetic operators are those that require the operands to be arithmetic operands, i.e. of an arithmetic type (except in special cases such as pointer addition, subtraction), i.e.
additive + and -
bitwise &, | and ^
increment and decrement ++ and --
multiplicative *, / and %
shift << and >>
unary -, ~ and +. It is debatable whether ! is an arithmetic operator or not, even though it is listed in section 6.5.3.3.
Another notable thing about these operators are that the operands might undergo usual arithmetic conversions.
Arithmatic operators are operators used to perform mathematical operations like addition, substraction, multiplication and division. As simple as that.
ex: a+b = c

Lazy arithmetic in C

As far as I know, C uses lazy calculation for logical expressions, e. g. in expression
f(x) && g(x)
g(x) will not be called if f(x) is false.
But what about arithmetic expressions like
f(x)*g(x)
Does g(x) will be called if f(x) is zero?
Yes, arithmetic operations are eager, not lazy.
So in f(x)*g(x) both f and g are always called (pedantically the compiler is transforming that into some A-normal form and could even avoid some calls if that is not observable), but there is no guarantee about the order of calling f before or after g. And evaluating x*1/x or y*1/x is undefined behavior when x is 0.
This is not true in Haskell AFAIU
Yes, g(x) will still be called.
Generally, it would be a quite slow to conditionally elide the evaluation of the right-hand side just because the left-hand side is zero. Perhaps not in the case where the right-hand side is an expensive function call, but the compiler wouldn't presume to know that.
It's called "Short Circuit" instead of lazy. And, at least as far as the standard cares, yes -- i.e., it doesn't specify short-circuit evaluation for *.
A compiler might be able to do short-circuit evaluation if it can be certain g() has no side effects, but only under the as-if rule (i.e., it can do so only by finding that there's no externally observable difference, not because the standard gives it any direct permission to do so).
In case of logical operators && and || order of evaluation bound to take place from left to right and short circuiting takes place.
There is a sequence point between evaluation of the left and right operands of the && (logical AND), || (logical OR) (as part of short-circuit evaluation). For example, in the expression *p++ != 0 && *q++ != 0, all side effects of the sub-expression *p++ != 0 are completed before any attempt to access q, but not in case of arithmetic operators .
While that optimization would be possible, there are a few arguments against it:
You might pay more for the optimization than you get back from it: Unlike with logical operators, the optimization is likely to be beneficial in only a small percentage of all cases with arithmetic operators, but at the same time requires an additional check for 0 for every operation.
Because boolean truth values only have two possible values, there is a theoretical 50 % chance (1 &div; 2) with short-circuiting boolean expressions that the second operand will not have to be evaluated. (This assumes uniform distribution, which is perhaps not realistic, but bear with me.) That is, you are likely to profit from the optimization in a relatively large percentage of cases.
Contrast this with integral numbers, where 0 is only one out of millions of possible values. The probability that the first operand is 0 is much lower: 1 &div; 232 (for 32-bit integers, again assuming uniform distribution). Even if 0 were in fact somewhat more probable to occur than that (i.e. with a non-uniform distribution), it's still unlikely that we're dealing with the same order of magnitude as with truth values.
Floating point math further aggravates that issue. Here you need to deal with the possibility of rounding errors and denormalization. The probability that some calculation yields exactly 0 is likely to be even lower than with integral numbers.
Therefore the optimization is relatively unlikely to result in the remaining operand not being evaluated. But it will result in an added check for zero, 100 % of the time!
If you want evaluation rules to remain reasonably consistent, you would have to redefine short-circuit evaluation order of && and ||: Division has one important corner case, namely division by 0: Even if the first operand is 0, the quotient is not necessarily 0. Divison by 0 is to be treated as an error (except perhaps in IEEE floating-point math); therefore, you always have to evaluate the second operand in order to determine whether the calculation is valid.
There is one alternative optimization for /: division by 1. In that case, you wouldn't have to divide at all, but simply return the first operand. / would therefore be better optimised by starting with the second operand (divisor).
Now, unless you want &&, ||, and * to start evaluation with the first operand, but / to start with the second (which might seem unintuitive), you would have to generally re-define short-circuiting behavior such that the second operand always gets evaluated first, which would be a departure from the status quo.
This is not per se a problem, but might break a lot of existing code if the C language were thus changed.
The optimization might break "compatibility" with C++ code where operators can be overloaded. Would the optimizations still apply to overloaded * and / operators? Or would there have to be two different forms of these operators, one short-circuiting, and one with eager evaluation?
Again, this is not a deficiency inherent in short-circuit arithmetic operators, but an issue that would arise if such short-circuiting were introduced into the C (and C++) language as a breaking change.

Order of operations for dereference and bracket-ref in C

If I do *ptr[x], is that equivalent to *(ptr[x]), or (*ptr)[x]?
*(ptr[x])
See the Wikipedia operator precedence table, or, for a more detailed table, this C/C++ specific table.
In C, all postfix operators have higher precedence than prefix operators, and prefix operators have higher precedence than infix operators. So its *(ptr[x])
Using the counter-clockwise movement of analyzing and parsing that simple example
1. starting with ptr, work in counter-clockwise until you hit asterisk operator
2. asterisk, in counter-clockwise until you hit subscript operator
3. we arrive here, at subscript operator [x]
Since [] has higher precedence than the asterisk as per this table , that makes it *(ptr[x])

Resources