What makes a SQL statement sargable? - sql-server

By definition (at least from what I've seen) sargable means that a query is capable of having the query engine optimize the execution plan that the query uses. I've tried looking up the answers, but there doesn't seem to be a lot on the subject matter. So the question is, what does or doesn't make an SQL query sargable? Any documentation would be greatly appreciated.
For reference: Sargable

The most common thing that will make a query non-sargable is to include a field inside a function in the where clause:
SELECT ... FROM ...
WHERE Year(myDate) = 2008
The SQL optimizer can't use an index on myDate, even if one exists. It will literally have to evaluate this function for every row of the table. Much better to use:
WHERE myDate >= '01-01-2008' AND myDate < '01-01-2009'
Some other examples:
Bad: Select ... WHERE isNull(FullName,'Ed Jones') = 'Ed Jones'
Fixed: Select ... WHERE ((FullName = 'Ed Jones') OR (FullName IS NULL))
Bad: Select ... WHERE SUBSTRING(DealerName,4) = 'Ford'
Fixed: Select ... WHERE DealerName Like 'Ford%'
Bad: Select ... WHERE DateDiff(mm,OrderDate,GetDate()) >= 30
Fixed: Select ... WHERE OrderDate < DateAdd(mm,-30,GetDate())

Don't do this:
WHERE Field LIKE '%blah%'
That causes a table/index scan, because the LIKE value begins with a wildcard character.
Don't do this:
WHERE FUNCTION(Field) = 'BLAH'
That causes a table/index scan.
The database server will have to evaluate FUNCTION() against every row in the table and then compare it to 'BLAH'.
If possible, do it in reverse:
WHERE Field = INVERSE_FUNCTION('BLAH')
This will run INVERSE_FUNCTION() against the parameter once and will still allow use of the index.

In this answer I assume the database has sufficient covering indexes. There are enough questions about this topic.
A lot of the times the sargability of a query is determined by the tipping point of the related indexes. The tipping point defines the difference between seeking and scanning an index while joining one table or result set onto another. One seek is of course much faster than scanning a whole table, but when you have to seek a lot of rows, a scan could make more sense.
So among other things a SQL statement is more sargable when the optimizer expects the number of resulting rows of one table to be less than the tipping point of a possible index on the next table.
You can find a detailed post and example here.

For an operation to be considered sargable, it is not sufficient for it to just be able to use an existing index. In the example above, adding a function call against an indexed column in the where clause, would still most likely take some advantage of the defined index. It will "scan" aka retrieve all values from that column (index) and then eliminate the ones that do not match to the filter value provided. It is still not efficient enough for tables with high number of rows.
What really defines sargability is the query ability to traverse the b-tree index using the binary search method that relies on half-set elimination for the sorted items array. In SQL, it would be displayed on the execution plan as a "index seek".

Related

How to match a substring exactly in a string in SQL server?

I have a column workId in my table which has values like :
W1/2009/12345, G2/2018/2345
Now a user want to get this particular id G2/2018/2345. I am using like operator in my query as below:
select * from u_table as s where s.workId like '%2345%' .
It is giving me both above mentioned workids. I tried following query:
select * from u_table as s where s.workId like '%2345%' and s.workId not like '_2345'
This query also giving me same result.
If anyone please provide me with the correct query. Thanks!
Why not use the existing delimiters to match with your criteria?
select *
from u_table
where concat('/', workId, '/') like concat('%/', '2345', '/%');
Ideally of course your 3 separate values would be 3 separate columns; delimiting multiple values in a single column goes against first-normal form and prevents the optimizer from performing an efficient index seek, forcing a scan of all rows every time, hurting performance and concurrency.

SQL Server : function precedence and short curcuiting in where clause

Consider this setup:
create table #test (val varchar(10))
insert into #test values ('20100101'), ('1')
Now if I run this query
select *
from #test
where ISDATE(val) = 1
and CAST(val as datetimeoffset) > '2005-03-01 00:00:00 +00:00'
it will fail with
Conversion failed when converting date and/or time from character string
which tells me that the where conditions are not short-circuited and both functions are evaluated. OK.
However if I run
select *
from #test
where LEN(val) > 2
and CAST(val as datetimeoffset) > '2005-03-01 00:00:00 +00:00'
it doesn't fail, which tells me that where clause is short-circuited in this case.
This
select *
from #test
where ISDATE(val) = 1
and CAST(val as datetimeoffset) > '2005-03-01 00:00:00 +00:00'
and LEN(val) > 2
fails again, but if I move length check to before cast, it work. So looks like the functions are evaluated in the order they appear in query.
Can anyone explain why first query fails?
It fails because SQL is declarative so the order of your conditions is not taken into account when the plan is generated (nor is it required to do so).
The usual way to get around this is to use CASE which has strict rules about sequence and when to stop.
In your case you will probably need nested CASEs, something like this:
WHERE
(
case when ISDATE(val) = 1 then
case when CAST(val as datetimeoffset) > '2005-03-01 00:00:00 +00:00' and
LEN(val) > 2
THEN 1 ELSE 0 END
ELSE 0
END
) = 1
(note this is unlikely to be actually correct SQL as I just typed it in).
By the way, even if you get it "working" by rearranging the conditions, I'd advise you don't. Accept that SQL simply doesn't work in that way. As the data changes & stats change, SQL is upgraded, workload varies, indexes are added the query plan could change. Any attempts to "get it working" are going to be short-lived at best so go with the CASE which will continue to work once you've got it right (provided you nest CASE statements where required and don't fall into the same precedence trap in the CASE conditions!)
The mystery is answered if you examine the Execution Plan. Both the CAST() and the LEN() are applied as part of the Table Scan step, while the test for IsDate() is a separate Filter test after the Table Scan.
It appears that the SQL Engine's internal optimizations use certain filtering functions as part of the retrieval of the data, and others as separate filters, almost certainly as a form of query optimization to minimize the load from disk into main memory. However, more complex functions, such as IsDate(), which is dependent on system variables such as system date format in some cases (is '01/02/2017' Jan 2nd or Feb 1st?), need to have the data retrieved before the filter is applied.
Although I have no hard information on this, I strongly suspect that any filter more resource intensive than a certain level is delegated to the Filter steps in the query plan, and anything simple/fast enough to be checked as the data is being read in is applied during the Scan/Seek steps. Also, if a filter could be applied on the data in the index, I am certain that it will be tested before any non-index data is tested, solely to minimize disk reads, which are bad performance juju (this may not apply on the Clustered index of the table). In these cases, the short-circuiting might not be straightforward, with an IsDate() test specified on a non-index field being executed after a similar test on an indexed field, no matter where they are in the list of conditions.
That said, it appears to be true that conditions short-circuit when they are executed in the same step of the query plan. If you insert a string like '201612123' into the temp table, then add a check on Len(val) < 9 after the date comparison, it still generates an error, instead of checking both LEN() conditions at the same time in a tiny optimization.
which tells me that where conditions are not short-circuited and both functions are evaluated.
To expand on LoztInSpace's answer, your terminology suggests you are not interpreting SQL correctly, on its own terms.
The various parts of a SELECT statement are not "functions". The entire statement is atomic. You supply the query as unit, and the DBMS responds. There is no "before" and no "after". There is just the query.
Those are the rules. Your job in formulating the query is to supply one that is valid. It's a logical progression: valid question, valid answer, etc. The moment you step out of that frame, you might as well be asking, "why is the sky seven?".
One a small clarification to #LoztInSpace's answer. When he refers to the order of your statements, he's presumably talking about the phrasing of your query, which for purposes of evaluation is inconsequential. Sequential SQL statements are executed sequentially, as presented. That is guaranteed by the SQL standard.

Character-matching queries in SQL

I'm attempting to optimize a T-SQL stored procedure I have. It's for pulling records based on a VIN (a 17-character alphanumeric string); usually people only know a few of the digits—e.g. the first digit could be '1', '2', or 'J'; the second is 'H' but the third could be 'M' or 'G'; and so on.
This leads to a pretty convoluted query whose WHERE clause is something like
WHERE SUBSTRING(VIN,1,1) IN ('J','1','2')
AND SUBSTRING(VIN,2,1) IN ('H')
AND SUBSTRING(VIN,3,1) IN ('M','G')
AND SUBSTRING(VIN,4,1) IN ('E')
AND ... -- and so on for however many digits we need to search on
The table I'm querying on is huge (millions of records) so the queries I'm running that have this kind of WHERE clause can take hours to run if there are more than a couple digits being searched on, even if I'm only requesting the top 3000 records. I feel like there has to be a way to get this substring character matching to run faster. Hours are completely unacceptable; I'd like to have these kinds of queries run in just a few minutes.
I don't have any editing privileges on the database, sadly, so I can't add indexes or anything like that; all I can do is change my stored procedure (although I can try to beg the DBAs to modify the table).
You can use
WHERE VIN LIKE '[J12]H[MG]E%'
At least that should hopefully lead to 3 index seeks on the ranges JH%, 1H%, and 2H% rather than a full scan.
Edit Although testing locally I found that it does not do multiple index seeks as I had hoped it converts the above to a single seek on the larger range VIN >= '1' and VIN < 'K' with a residual predicate to evaluate the LIKE
I'm not sure whether it will do this for larger tables or not but otherwise it may well be worth trying to encourage this plan with
WHERE (VIN LIKE 'JH%' OR VIN LIKE '1H%' OR VIN LIKE '2H%')
AND VIN LIKE '[J12]H[MG]E%'
You could use the LIKE keyword
SELECT
*
FROM Table
WHERE VIN LIKE '[J12]H[MG]E%'
This would even allow you to work with instance where they know the second character is not 'A' by using [^A] in the statement, such as:
WHERE VIN LIKE '[J12][^A][MG]E%'
Reference
http://msdn.microsoft.com/en-us/library/ms179859.aspx
I like the LIKE answers, but here's another alternative (especially if your input isn't always the same).
I would do this as a series of queries on ever-smaller temp tables (Yes, I'm in love with temp tables- sue me.)
So I would do something like
SELECT [Fields]
INTO #tempResultsFirstTwoDigits
FROM VIN
WHERE [Clause]
Then keep moving down the chain digit by digit until you've searched each of the provided characters. So you might do this:
if len(#input) > 2
SELECT [Fields]
INTO #tempResultsThreeDigits
FROM VIN
WHERE Substring(VIN, 3, 1) = Substring(#input, 3, 1)
//NOTE: That where clause might be sped up by initializing a variable at
// the beginning of the SP for each character you got.
Else Select * From #tempResultsFirstTwoDigits
GOTO Stop //Where "Stop" just defines the end of the SP to skip any further checks
Again, LIKE might be a better answer for you, but I would try both approaches and benchmark both of them.

How to force table select to go over blocks

How can I make Sybase's database engine return an unsorted list of records in non-numeric order?
~~~
I have an issue where I need to reproduce an error in the application where I select from a table where the ID is generated in sequence, but the ID is not the last one in the selection.
Let me explain.
ID STATUS
_____________
1234 C
1235 C
1236 O
Above is 3 IDs. I had code where these would be the results of a
select #p_id = ID from table where (conditions).
However, there wasn't a clause to check for status = 'O' (open). Remember Sybase saves the last returned record into a variable.
~~~~~
I'm being asked to give the testing team something that will make the results not work. If Sybase selects the above in an unordered list, it could appear in ascending order, or, if the database engine needs to change blocks of stored data or something technical magic stuff, the order could be messed up. The original error was when the procedure would return say 1234 instead of 1236.
Is there a way that I can have a 100% guarantee that Sybase will search over a block of data and have to double back, effectively 'breaking' the ascending search, and returning not the last record, but any other one? (all records except the maximum will end up erroring, because they are all 'Closed')
I want some sort of magical SQL code that will make sure things don't search the table in exactly numeric order. Ideally I'd like to not have to change the procedure, as the test team want to see the exact same procedure breaking (as easy as plonking a order by id desc would fudge the results).
If you don't specify an order, there is no way to guarantee the return order of the results. It will be however the index is built - and can depend on the order of insertion, the type of index, and the content of index keys.
It's generally a bad idea to do those sorts of singleton SELECTs. You should always specify a specific record with the WHERE clause, or use a cursor, or TOPn or similar. The problem comes when someone tries to understand your code, because some databases when they see multiple hits take the first value, some take the last value, some take a random value (they call that "implementation-defined"), and some throw an error.
Is this by any chance related to 1156837? :)

T-SQL Where Clause Case Statement Optimization (optional parameters to StoredProc)

I've been battling this one for a while now. I have a stored proc that takes in 3 parameters that are used to filter. If a specific value is passed in, I want to filter on that. If -1 is passed in, give me all.
I've tried it the following two ways:
First way:
SELECT field1, field2...etc
FROM my_view
WHERE
parm1 = CASE WHEN #PARM1= -1 THEN parm1 ELSE #PARM1 END
AND parm2 = CASE WHEN #PARM2 = -1 THEN parm2 ELSE #PARM2 END
AND parm3 = CASE WHEN #PARM3 = -1 THEN parm3 ELSE #PARM3 END
Second Way:
SELECT field1, field2...etc
FROM my_view
WHERE
(#PARM1 = -1 OR parm1 = #PARM1)
AND (#PARM2 = -1 OR parm2 = #PARM2)
AND (#PARM3 = -1 OR parm3 = #PARM3)
I read somewhere that the second way will short circuit and never eval the second part if true. My DBA said it forces a table scan. I have not verified this, but it seems to run slower on some cases.
The main table that this view selects from has somewhere around 1.5 million records, and the view proceeds to join on about 15 other tables to gather a bunch of other information.
Both of these methods are slow...taking me from instant to anywhere from 2-40 seconds, which in my situation is completely unacceptable.
Is there a better way that doesn't involve breaking it down into each separate case of specific vs -1 ?
Any help is appreciated. Thanks.
I read somewhere that the second way will short circuit and never eval the second part if true. My DBA said it forces a table scan.
You read wrong; it will not short circuit. Your DBA is right; it will not play well with the query optimizer and likely force a table scan.
The first option is about as good as it gets. Your options to improve things are dynamic sql or a long stored procedure with every possible combination of filter columns so you get independent query plans. You might also try using the "WITH RECOMPILE" option, but I don't think it will help you.
if you are running SQL Server 2005 or above you can use IFs to make multiple version of the query with the proper WHERE so an index can be used. Each query plan will be placed in the query cache.
also, here is a very comprehensive article on this topic:
Dynamic Search Conditions in T-SQL by Erland Sommarskog
it covers all the issues and methods of trying to write queries with multiple optional search conditions
here is the table of contents:
Introduction
The Case Study: Searching Orders
The Northgale Database
Dynamic SQL
Introduction
Using sp_executesql
Using the CLR
Using EXEC()
When Caching Is Not Really What You Want
Static SQL
Introduction
x = #x OR #x IS NULL
Using IF statements
Umachandar's Bag of Tricks
Using Temp Tables
x = #x AND #x IS NOT NULL
Handling Complex Conditions
Hybrid Solutions – Using both Static and Dynamic SQL
Using Views
Using Inline Table Functions
Conclusion
Feedback and Acknowledgements
Revision History
If you pass in a null value when you want everything, then you can write your where clause as
Where colName = IsNull(#Paramater, ColName)
This is basically same as your first method... it will work as long as the column itself is not nullable... Null values IN the column will mess it up slightly.
The only approach to speed it up is to add an index on the column being filtered on in the Where clause. Is there one already? If not, that will result in a dramatic improvement.
No other way I can think of then doing:
WHERE
(MyCase IS NULL OR MyCase = #MyCaseParameter)
AND ....
The second one is more simpler and readable to ther developers if you ask me.
SQL 2008 and later make some improvements to optimization for things like (MyCase IS NULL OR MyCase = #MyCaseParameter) AND ....
If you can upgrade, and if you add an OPTION (RECOMPILE) to get decent perf for all possible param combinations (this is a situation where there is no single plan that is good for all possible param combinations), you may find that this performs well.
http://blogs.msdn.com/b/bartd/archive/2009/05/03/sometimes-the-simplest-solution-isn-t-the-best-solution-the-all-in-one-search-query.aspx

Resources