BETWEEN operator vs. >= AND <=: Is there a performance difference? - sql-server

These two statements are logically equivalent:
SELECT * FROM table WHERE someColumn BETWEEN 1 AND 100
SELECT * FROM table WHERE someColumn >= 1 AND someColumn <= 100
Is there a potential performance benefit to one versus the other?

No benefit; it's just syntactic sugar.
By using the BETWEEN version you write the tested expression only once, which can avoid re-evaluating a function in some cases.
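For example, when the tested value is an expression you only have to write it once (a sketch with a hypothetical Orders table; whether the engine actually evaluates the expression once or twice is plan-dependent, as a later answer shows):
-- Written once with BETWEEN ...
SELECT * FROM Orders
WHERE DATEPART(YEAR, OrderDate) BETWEEN 2019 AND 2021
-- ... versus twice with explicit comparisons
SELECT * FROM Orders
WHERE DATEPART(YEAR, OrderDate) >= 2019
  AND DATEPART(YEAR, OrderDate) <= 2021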

There's no performance benefit, it's just easier to read/write the first one.

No, no performance benefit. It's just a little candy.
If you were to check a query comparison, something like
DECLARE @Table TABLE(
    ID INT
)

SELECT *
FROM @Table
WHERE ID >= 1 AND ID <= 100

SELECT *
FROM @Table
WHERE ID BETWEEN 1 AND 100
and check the execution plan, you should notice that it is exactly the same.
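One quick way to compare the plans yourself (a sketch; dbo.SomeTable is a placeholder name):
SET SHOWPLAN_TEXT ON
GO
SELECT * FROM dbo.SomeTable WHERE ID BETWEEN 1 AND 100
SELECT * FROM dbo.SomeTable WHERE ID >= 1 AND ID <= 100
GO
SET SHOWPLAN_TEXT OFF
GO
In the plan output you should see the BETWEEN predicate expanded into the same pair of >= and <= comparisons, which is why the two plans match.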

Hmm, here was a surprising result. I don't have SQL Server here, so I tried this in Postgres. Obviously disclaimers apply: this won't necessarily give the same results, your mileage may vary, consult a physician before using. But still ...
I just wrote a simple query in two different ways:
select *
from foo
where (select code from bar where bar.barid=foo.barid) between 'A' and 'B'
and
select *
from foo
where (select code from bar where bar.barid=foo.barid)>='A'
and (select code from bar where bar.barid=foo.barid)<='B'
Surprisingly to me, both had almost identical run times. When I did an EXPLAIN PLAN, they gave identical results. Specifically, the first query did the lookup against bar twice, once for the >= test and again for the <= test, just like the second query.
Conclusion: In Postgres, at least, BETWEEN is indeed just syntactic sugar.
Personally, I use it regularly because it is clearer to the reader, especially if the value being tested is an expression. Figuring out that two complex expressions are identical can be a non-trivial exercise. Figuring out that two complex expressions SHOULD BE identical even though they're not is even more difficult.

Ah, but you're all referring to the case where the search value is on the left-hand side of the comparison. Did anybody look at the differences when it's on the other side, i.e. a value tested against two columns? (Note that the explicit version must flip the operators to stay equivalent.)
SELECT * FROM table WHERE @date BETWEEN someCol1 AND someCol2
SELECT * FROM table WHERE someCol1 <= @date AND someCol2 >= @date

Related

SQL Server: random number in WHERE clause

As far as I am aware, the only way to get a random value per row in a SELECT statement is by using the newid() function, as the rand() function doesn't generate new values for each row.
This leads to the following awkward construction to get a random number from, say, 0-9:
abs(checksum(newid())) % 10
If I use this expression in the SELECT clause, it behaves as expected. However, if I try something like the following:
select *
from table
where abs(checksum(newid())) % 10 > 4;
I should have thought that I would get roughly half the rows. Instead I get all or none of them. Apparently newid() is only evaluated once, instead of once for each row.
The question is, how can I use a random number in the WHERE clause?
More
There is a similar question which asks for a fixed number of rows at random. In the above example I could have used:
select top 50 percent * from table order by newid();
which will get me what I am looking for.
The question remains, how can I use a random number in the WHERE clause. For example, is it possible to do something like this?
select *
from table
where code={random number};
Here is one way to get around the problem:
SELECT *
FROM (SELECT *,
             Abs(Checksum(Newid())) % 10 AS ran
      FROM yourtable) a
WHERE ran > 4;
For some reason newid() in the WHERE clause is executed only once and compared against as a constant. When I check the execution plans, your query is missing the Compute Scalar operator, whereas my query has it present.
The function newid() is calculated only once in the WHERE clause, not row by row. The trick is to force it to run row by row.
Of course it is possible to include it in a SELECT clause, and, in turn, include that in a CTE or a subquery, as per the other answers.
Microsoft offers a solution here: https://learn.microsoft.com/en-us/previous-versions/sql/sql-server-2008-r2/ms189108(v=sql.105)?redirectedfrom=MSDN
The trick is to force newid() to be recalculated by combining it with some per-row value (here the table's id column), which is easily done inside the checksum() function.
For example:
SELECT *
FROM table
WHERE abs(checksum(newid(), id)) % 10 > 4;
I should have thought that I would get roughly half the rows. Instead I get all or none of them

You may get all of the rows or none of them, since NEWID() is evaluated once per query when you use it in the WHERE clause. This is explained by Conor Cunningham; the technical term for it is a runtime constant.
You can look at your execution plan and watch for the expression
Const ConstValue
which, as you can see, is calculated once and used throughout; in the end you are doing just a single boolean comparison, so you will end up with all rows or none.
You have to use a CTE like the one stated in another answer, or use TOP with ORDER BY NEWID(), or TABLESAMPLE, to return random rows.
You may find the TABLESAMPLE option more helpful, since it may not have to go through all the table data to get a sample set of rows, unlike NEWID().
Below is one example on a table having 1,000,000 rows:
select * from Orders
TABLESAMPLE (50 PERCENT)
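Note that TABLESAMPLE samples whole storage pages rather than individual rows, so the returned row count is only approximate. A small sketch adding REPEATABLE, which keeps the sample stable across runs as long as the data doesn't change:
select * from Orders
TABLESAMPLE (10 PERCENT)
REPEATABLE (42)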

Group by an evaluated field (sql server) [duplicate]

Why are column ordinals legal for ORDER BY but not for GROUP BY? That is, can anyone tell me why this query
SELECT OrgUnitID, COUNT(*) FROM Employee AS e GROUP BY OrgUnitID
cannot be written as
SELECT OrgUnitID, COUNT(*) FROM Employee AS e GROUP BY 1
When it's perfectly legal to write a query like
SELECT OrgUnitID FROM Employee AS e ORDER BY 1
?
I'm really wondering if there's something subtle about the relational calculus, or something, that would prevent the grouping from working right.
The thing is, my example is pretty trivial. It's common that the column that I want to group by is actually a calculation, and having to repeat the exact same calculation in the GROUP BY is (a) annoying and (b) makes errors during maintenance much more likely. Here's a simple example:
SELECT DATEPART(YEAR,LastSeenOn), COUNT(*)
FROM Employee AS e
GROUP BY DATEPART(YEAR,LastSeenOn)
I would think that SQL's principle of representing data only once in the database ought to extend to code as well. I'd want to write that calculation expression only once (in the SELECT column list) and be able to refer to it by ordinal in the GROUP BY.
Clarification: I'm specifically working on SQL Server 2008, but I wonder about an overall answer nonetheless.
One of the reasons is that ORDER BY is the last thing that runs in a SQL query; here is the logical order of operations:
FROM clause
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause
so by the time ORDER BY runs, the SELECT list has already been evaluated and ordinal positions refer to something; GROUP BY runs before the SELECT list exists, so there is nothing for an ordinal to refer to
EDIT, added this based on the comment
Take this for example
create table test (a int, b int)
insert test values(1,2)
go
The query below will parse without a problem, but it won't run:
select a as b, b as a
from test
order by 6
here is the error
Msg 108, Level 16, State 1, Line 3
The ORDER BY position number 6 is out of range of the number of items in the select list.
This also parses fine
select a as b, b as a
from test
group by 1
But it blows up with this error
Msg 164, Level 15, State 1, Line 3
Each GROUP BY expression must contain at least one column that is not an outer reference.
There are a lot of elementary inconsistencies in SQL, and the use of scalars is one of them. For example, anyone might expect
select * from countries
order by 1
and
select * from countries
order by 1.00001
to be similar queries (the difference between the two can be made infinitesimally small, after all), but they are not.
I'm not sure if the standard specifies if it is valid, but I believe it is implementation-dependent. I just tried your first example with one SQL engine, and it worked fine.
Use aliases:
SELECT DATEPART(YEAR, LastSeenOn) AS seen_year, COUNT(*) AS total
FROM Employee AS e
GROUP BY seen_year
** EDIT **
if GROUP BY alias is not allowed for you, here's a solution / workaround:
SELECT seen_year,
       COUNT(*) AS Total
FROM (
    SELECT DATEPART(YEAR, LastSeenOn) AS seen_year, *
    FROM Employee AS e
) AS inline_view
GROUP BY seen_year
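Another SQL Server option (a sketch, not from the original answers) is CROSS APPLY, which lets you name the expression once and reference the alias in both the SELECT list and the GROUP BY:
SELECT x.seen_year, COUNT(*) AS total
FROM Employee AS e
CROSS APPLY (SELECT DATEPART(YEAR, e.LastSeenOn) AS seen_year) AS x
GROUP BY x.seen_year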
Databases that don't support this are basically choosing not to. I understand the order of processing of the various steps, but it is very easy (as many databases have shown) to parse the SQL, understand it, and apply the translation for you. Where it's really a pain is when a column is a long CASE statement; having to repeat that in the GROUP BY clause is super annoying. Yes, you can do the nested-query workaround as someone demonstrated above, but at this point it is just a lack of care about your users not to support GROUP BY column numbers.

Select top n query - what can I plug into n to select all?

I have a query:
Select top(@val) * from myTable
In fact, this is in Dapper in an ASP.NET MVC 5 site, but I assume the principle is the same.
Rather than have an if with one branch selecting all (dropping the top(@val)) and the other branch selecting the top n records: is there a value I can pass to @val to just select all, as if the TOP clause weren't there? (E.g. passing in 0 or -1, say; I tried these: 0 returns 0 rows, -1 returns a syntax error.)
I don't want to just pass an arbitrarily high number in, as this may cause issues later, is inelegant, and may be less efficient.
Is there a solution here, or am I stuck with passing in a high number or doing an if statement?
I can't find anything on Google about this.
I am not sure what you need, but maybe this will work for your needs:
select top (select count(*) from myTable) * from myTable
But I have no idea if you will be able to produce such a query from Dapper...
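If it has to stay a single batch with one parameter, here is a sketch that resolves a sentinel value to the actual row count first (assuming the convention that a negative @val means "all rows"):
DECLARE @val int = -1 -- assumed convention: negative means "return everything"
DECLARE @n bigint = @val

IF @n < 0
    SELECT @n = COUNT_BIG(*) FROM myTable -- resolve "all" to the real row count

SELECT TOP (@n) * FROM myTable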

NOT IN subquery fails when there are NULL-valued results

Sorry guys, I had no idea how to phrase this one, but I have the following in a where clause:
person_id not in (
    SELECT distinct person_id
    FROM protocol_application_log_devl pal
    WHERE pal.set_id = @set_id
)
When the subquery returns NULL-valued results, my whole select fails to return anything. To work around this, I replaced person_id in the subquery with isnull(person_id, '00000000-0000-0000-0000-000000000000').
It seems to work, but is there a better way to solve this?
It is better to use NOT EXISTS anyway:
WHERE NOT EXISTS (
    SELECT 1
    FROM protocol_application_log_devl pal
    WHERE pal.person_id = t.person_id -- t: alias of the outer table
      AND pal.set_id = @set_id
)
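For clarity, the same pattern with the outer table spelled out (persons is a hypothetical name for the outer table; the important part is qualifying person_id on both sides):
SELECT p.*
FROM persons AS p -- hypothetical outer table
WHERE NOT EXISTS (
    SELECT 1
    FROM protocol_application_log_devl pal
    WHERE pal.person_id = p.person_id
      AND pal.set_id = @set_id
)
Unlike NOT IN, NOT EXISTS behaves as expected even when the subquery's person_id column contains NULLs.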
Should I use NOT IN, OUTER APPLY, LEFT OUTER JOIN, EXCEPT, or NOT EXISTS?
A pattern I see quite a bit, and wish that I didn't, is NOT IN. When I see this pattern, I cringe. But not for performance reasons – after all, it creates a decent enough plan in this case.
The main problem is that the results can be surprising if the target column is NULLable (SQL Server processes this as a left anti semi join, but can't reliably tell you if a NULL on the right side is equal to – or not equal to – the reference on the left side). Also, optimization can behave differently if the column is NULLable, even if it doesn't actually contain any NULL values.
Instead of NOT IN, use a correlated NOT EXISTS for this query pattern. Always. Other methods may rival it in terms of performance, when all other variables are the same, but all of the other methods introduce either performance problems or other challenges.
While I support Tim's answer as being correct-in-practice (NOT IN is not appropriate here), this is an interesting case noted in the IN / NOT IN documentation:
Caution: Any null values returned by subquery or expression that are compared to test_expression using IN or NOT IN return UNKNOWN. Using null values together with IN or NOT IN can produce unexpected results. [1]
This is why the isnull "fixes" the problem - it masks any such NULL values and avoids the unexpected behavior. With that in mind, the following approach would also work (but please heed the advice about not using NOT IN to begin with):
person_id not in (
    SELECT distinct person_id
    FROM protocol_application_log_devl pal
    WHERE pal.set_id = @set_id
      AND person_id IS NOT NULL -- guard here
)
However, a NULL person_id is suspicious and might indicate other issues...
[1] Here is the proof (pudding):
select case when 1 not in (2) then 1 else 0 end as r1,
case when 1 not in (2, NULL) then 1 else 0 end as r2
-- r1: 1, r2: 0
I just replaced the null values with an empty value using the isnull function, as in the example below. It solved my issue:
where isnull(UserId, '') not in (select UserID from users where ...)
This should work (NVL is Oracle's equivalent of SQL Server's ISNULL):
nvl(person_id, '') not in (
    SELECT distinct person_id
    FROM protocol_application_log_devl pal
    WHERE pal.set_id = @set_id
)

Divide by zero encountered with a computed column - sql-server [duplicate]

I've been trying to understand why I get a "divide by zero encountered" error (Msg 8134) with my SQL query, but I must be missing something. I would like to know the why for the specific case below; I am not looking for NULLIF, CASE WHEN... or similar, as I already know about them (and can of course use them in a situation like the one below).
I have an SQL statement with a computed column similar to
SELECT
    TotalSize,
    FreeSpace,
    (FreeSpace / TotalSize * 100)
FROM
    tblComputer
    ...[ couple of joins ]...
WHERE
    SomeCondition = SomeValue
Running this statement errors with the above mentioned error messages, which, in itself, is not the problem - obviously TotalSize might well be 0 and therefore cause the error.
Now what I don't understand is that I do not have any rows where the TotalSize column is 0 when I comment the computed column out; I double-checked that this isn't the case.
Then I thought that for some reason the column computation would be performed on the whole result set before actually filtering with the conditions of the where clause, but this a) wouldn't make sense imho and b) when trying to reproduce the error with a test set-up everything works fine (see below):
INSERT INTO tblComputer (ComputerName, IsServer) VALUES ('PC0001',1)
INSERT INTO tblComputer (ComputerName, IsServer) VALUES ('PC0002',1)
INSERT INTO tblComputer (ComputerName, IsServer) VALUES ('PC0003',1)
INSERT INTO tblComputer (ComputerName, IsServer) VALUES ('PC0004',0)
INSERT INTO tblComputer (ComputerName, IsServer) VALUES ('PC0005',1)
INSERT INTO tblComputer (ComputerName, IsServer) VALUES ('PC0006',0)
INSERT INTO tblComputer (ComputerName, IsServer) VALUES ('PC0007',1)
INSERT INTO tblHDD (ComputerID, TotalSize, FreeSpace) VALUES (1,100,21)
INSERT INTO tblHDD (ComputerID, TotalSize, FreeSpace) VALUES (2,100,10)
INSERT INTO tblHDD (ComputerID, TotalSize, FreeSpace) VALUES (3,100,55)
INSERT INTO tblHDD (ComputerID, TotalSize, FreeSpace) VALUES (4,0,10)
INSERT INTO tblHDD (ComputerID, TotalSize, FreeSpace) VALUES (5,100,23)
INSERT INTO tblHDD (ComputerID, TotalSize, FreeSpace) VALUES (6,100,18)
INSERT INTO tblHDD (ComputerID, TotalSize, FreeSpace) VALUES (7,100,11)
-- This statement does not throw an error as apparently the row for ComputerID 4
-- is filtered out before computing the (FreeSpace / TotalSize * 100)
SELECT
    TotalSize,
    FreeSpace,
    (FreeSpace / TotalSize * 100)
FROM
    tblComputer
JOIN
    tblHDD ON tblComputer.ID = tblHDD.ComputerID
WHERE
    IsServer = 1
I am quite stumped and would like to know what the reason is.
Any ideas or pointers in the right direction are very welcome; thanks in advance.
Update
Thank you so far for your input, but unfortunately I seem not to be getting closer to the root of the problem. I managed to strip the statement down a little bit and now have the case that I can execute it without errors if one JOIN is removed (I would need it for additional columns in the output which I temporarily removed).
I do not understand, why using the JOIN leads to the error, shouldn't a standard INNER JOIN always either return the same number of rows or less, but never more?
Working code
SELECT
    TotalSize,
    FreeSpace,
    ((FreeSpace / TotalSize) * 100)
FROM
    MyTable1
INNER JOIN
    MyTable2 ON MyTable1.ID = MyTable2.Table1ID
WHERE
    SomeCondition
Error causing code
SELECT
    TotalSize,
    FreeSpace,
    ((FreeSpace / TotalSize) * 100)
FROM
    MyTable1
INNER JOIN
    MyTable2 ON MyTable1.ID = MyTable2.Table1ID
-- This JOIN causes the "divide by zero encountered" error
INNER JOIN
    MyTable3 ON MyTable2.ID = MyTable3.Table2ID
WHERE
    SomeCondition
I also tried my luck using a cursor and looping over the result row by row, but in that case no error occurred (no matter, which of the two statements above I tried).
Sorry for the messy code indentation, somehow the correct formatting doesn't seem to be applied.
G.
SQL is a declarative language; you write a query that logically describes the result you want, but it is up to the optimizer to produce a physical plan. This physical plan may not bear much relation to the written form of the query, because the optimizer does not simply reorder 'steps' derived from the textual form of the query, it can apply over 300 different transformations to find an efficient execution strategy.
The optimizer has considerable freedom to reorder expressions, joins, and other logical query constructions. This means that you cannot, in general, rely on any written query form to force one thing to be evaluated before another. In particular, the rewrite given by Lieven does not force the WHERE clause predicate to be evaluated before the expression. The optimizer may, depending on cost estimations, decide to evaluate the expression wherever it seems most efficient to do so. This may even mean, in some cases, that the expression is evaluated more than once.
The original question considered this possibility, but rejected it as 'not making much sense'. Nevertheless, this is the way the product works - if SQL Server estimates that a join will reduce the set size enough to make it cheaper to compute the expression on the result of the join, it is free to do so.
The general rule is to never depend on a particular evaluation order to avoid things like overflow or divide-by-zero errors. In this example, one would employ a CASE statement to check for a zero divisor - an example of defensive programming.
The optimizer's freedom to reorder things is a fundamental tenet of its design. You can find cases where it leads to counter-intuitive behaviours, but overall the benefits far outweigh the disadvantages.
Paul
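For example, a defensive rewrite of the computed column from the question (a sketch of the CASE approach Paul mentions; the 100.0 also avoids the silent integer truncation in the original all-integer expression):
SELECT
    TotalSize,
    FreeSpace,
    CASE WHEN TotalSize = 0 THEN NULL -- skip the division entirely
         ELSE FreeSpace * 100.0 / TotalSize
    END AS PctFree
FROM tblComputer
JOIN tblHDD ON tblComputer.ID = tblHDD.ComputerID
WHERE IsServer = 1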
The basic steps that SQL Server uses to process a single SELECT statement include the following:
1. The parser scans the SELECT statement and breaks it into logical units such as keywords, expressions, operators, and identifiers.
2. A query tree, sometimes referred to as a sequence tree, is built describing the logical steps needed to transform the source data into the format required by the result set.
3. The query optimizer analyzes different ways the source tables can be accessed. It then selects the series of steps that returns the results fastest while using fewer resources. The query tree is updated to record this exact series of steps. The final, optimized version of the query tree is called the execution plan.
4. The relational engine starts executing the execution plan. As the steps that require data from the base tables are processed, the relational engine requests that the storage engine pass up data from the rowsets requested from the relational engine.
5. The relational engine processes the data returned from the storage engine into the format defined for the result set and returns the result set to the client.
My interpretation is that there is no guarantee that your WHERE clause gets evaluated before the computed column is evaluated for all rows.
You could test that assumption by changing your query as below, attempting to force the WHERE clause to be evaluated before the computation (though, as noted above, the optimizer is still free to reorder this):
SELECT
    TotalSize,
    FreeSpace,
    (FreeSpace / TotalSize * 100)
FROM (
    SELECT
        TotalSize,
        FreeSpace
    FROM
        tblComputer
        ...[ couple of joins ]...
    WHERE
        SomeCondition = SomeValue
) t
What rows are returned when you run:
SELECT
    TotalSize
FROM
    tblComputer
    ...[ couple of joins ]...
WHERE
    SomeCondition = SomeValue
    AND ((TotalSize * 100) = 0)
This might give you a clue as to how SQL Server is evaluating (TotalSize * 100) to be zero.
Another idea: is there anything else in your WHERE statement which might be the problem? You're assuming it's TotalSize, but it might be somewhere else.
I was running into the same issue. In my case NULLs were acceptable, so I was able to fix it this way:
Select Expression1 / Expression2 -- caused divide by zero
Select Expression1 / NULLIF(Expression2, 0) -- causes the result to be NULL
If you need other handling, you can wrap the entire expression in an ISNULL function like this:
Select ISNULL(Expression1 / NULLIF(Expression2, 0), -5) -- returns -5 instead of NULL or a divide-by-zero error
