SQL Server, big procedure performance

I have a stored procedure with about 1k lines and 16 non-nested IF clauses. Each IF tests a set of 4 variables, like:
if @a is null and @b is null and @c is null and @d is null
if @a is null and @b is null and @c is null and @d is not null
and so on (hence 2^4 = 16 combinations). I wrote the procedure this way:
First, because it's more readable than nested IFs (there's a lot of work happening in this procedure).
Second, because each IF block gets a very simple piece of code (a select over a primary key, or a select union over 3 or 4 primary keys).
Third, because the presence of variable @a means a UNION ALL, the presence of @b means a join and a distance calculation, the presence of @c means a join, and the presence of @d means another join.
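For example, a branch where only @c is supplied might look roughly like this (table and column names here are invented for illustration):
if @a is null and @b is null and @c is not null and @d is null
begin
    select t.ID, t.Name
    from dbo.MainTable t                 -- hypothetical tables
    join dbo.TableC c on c.MainID = t.ID -- the join implied by @c
    where c.ID = @c                      -- simple lookup over a primary key
end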
Now I have been wondering about performance, and I'm not sure how to write the procedure to get the best performance. Any tips?

You need to read Dynamic Search Conditions in T-SQL by Erland Sommarskog. There are many techniques for supporting queries that use numerous optional filter parameters; the trick is picking the proper strategy, and doing a thousand IFs isn't the best one.
If you are on the proper SQL Server 2008 version (SQL 2008 SP1 CU5, build 10.0.2746, and later), you can use a little trick to actually use an index: add OPTION (RECOMPILE) to your query (see Erland's article), and SQL Server will resolve the OR inside (@OptionalParameter IS NULL OR YourColumn = @OptionalParameter) before the query plan is created, based on the runtime values of the local variables, so an index can be used.
This will return proper results on any SQL Server version, but only include the OPTION (RECOMPILE) if you are on SQL 2008 SP1 CU5 (10.0.2746) or later. OPTION (RECOMPILE) recompiles the query on any version, but only the versions listed will recompile it based on the current runtime values of the local variables, which is what gives you the best performance. If you are not on that version of SQL Server 2008, just leave that line off. Just remember that OR can kill index usage, but on the proper SQL Server 2008 version the OPTION (RECOMPILE) will let the index be used.
--sample procedure that uses optional search parameters
CREATE PROCEDURE YourProcedure
@FirstName varchar(25) = null,
@LastName varchar(25) = null,
@Title varchar(25) = null
AS
BEGIN
SELECT ID, FirstName, LastName, Title
FROM tblUsers
WHERE
(@FirstName IS NULL OR (FirstName = @FirstName))
AND (@LastName IS NULL OR (LastName = @LastName))
AND (@Title IS NULL OR (Title = @Title))
OPTION (RECOMPILE) ---<<<< use only if on SQL 2008 SP1 CU5 (10.0.2746) or later
END
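For example, a hypothetical call that passes only one of the optional parameters:
EXEC YourProcedure @LastName = 'Smith'
The other two parameters default to NULL, so their (@Parameter IS NULL OR ...) predicates are always true and only the LastName filter actually restricts the result.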

Related

Unexpected order running OR operator on SQL Server 2019

I'm having a problem when migrating from SQL Server 2008 R2 to SQL Server 2019.
My code
DECLARE @str NVARCHAR(50) = 'all',
@int TINYINT = 1
DECLARE @tmp TABLE (val nvarchar(MAX))
INSERT INTO @tmp VALUES('123')
INSERT INTO @tmp VALUES('all')
SELECT val
FROM @tmp
WHERE @str = 'ALL' OR @int = val
When using SQL Server 2008 R2, it's fine. The output is as expected, like below:
val
123
all
However, when I migrate to SQL Server 2019, the error below occurs. What's more, it only happens intermittently on 2019.
Msg 245, Level 16, State 1, Line 8
Conversion failed when converting the nvarchar value 'all' to data type int.
As you can see, the second condition, OR @int = val, was evaluated unexpectedly.
I was wondering whether it fails due to some breaking change related to the evaluation order of the OR operator, or to case sensitivity ('ALL' vs 'all'), in versions after SQL Server 2008 R2.
Update
Sorry, my repro code was confusing. This is my original code
You should do two of these three things:
(
Either use DECLARE @int nvarchar(max) = 1
OR
Use WHERE val = CONVERT(nvarchar(max), @int)
)
AND
Change to using STRING_SPLIT. That looping function is among the least efficient methods you could ever use to split strings, even before native solutions existed. See https://sqlblog.org/split
This db<>fiddle demonstrates.
And this one shows why WHERE @str = 'ALL' OR (@str <> 'ALL' AND @int = val) is not a solution. These patterns you're choosing only work if @str is always 'all', because they all break when it's anything else. So why have the OR at all?
You keep insisting that SQL Server should obey left to right evaluation, but we keep telling you that is simply not the case.
Here is an article by Bart Duncan at Microsoft, who worked on SQL Server, that you should absolutely read in full before posting any more comments or editing your question further. The critical point, though, is:
You cannot depend on expression evaluation order for things like WHERE <expr1> OR <expr2>, since the optimizer might choose a plan that evaluates the second predicate before the first one.
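A minimal sketch of the type-safe rewrite suggested above: converting the tinyint side makes the comparison string-to-string, so the evaluation order no longer matters.
SELECT val
FROM @tmp
WHERE @str = 'ALL' OR val = CONVERT(nvarchar(MAX), @int)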

SQL Server 2019 ignoring WHERE clause?

I'm getting an error when I try to run a simple aggregating query.
SELECT MAX(CAST(someDate as datetime)) AS MAX_DT FROM #SomeTable WHERE ISDATE(someDate) = 1
ERROR: Conversion failed when converting date and/or time from character string.
Non-date entries should be removed by the WHERE clause, but that doesn't seem to be happening. I can work around it with an explicit CASE statement inside the MAX(), but I don't want to hack up the query if I can avoid it. If I use a lower COMPATIBILITY_LEVEL, it works fine. If I have fewer than 2^17 rows, it works fine.
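The CASE workaround mentioned above would presumably look something like this:
SELECT MAX(CASE WHEN ISDATE(someDate) = 1 THEN CAST(someDate AS datetime) END) AS MAX_DT
FROM #SomeTable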
-- SQLServer 15.0.4043.16
USE AdventureWorks;
GO
ALTER DATABASE AdventureWorks SET COMPATIBILITY_LEVEL = 150;
GO
-- delete temp table if exists
DROP TABLE IF EXISTS #SomeTable;
GO
-- create temp table
CREATE TABLE #SomeTable (
someDate varchar(20) DEFAULT GETDATE()
);
-- load data, need at least 2^17 rows with at least 1 bad date value
INSERT #SomeTable DEFAULT VALUES;
DECLARE @i int = 0;
WHILE @i < 17
BEGIN
INSERT INTO #SomeTable (someDate) SELECT someDate FROM #SomeTable
SET @i = @i + 1;
END
GO
-- create invalid date row
WITH cteUpdate AS (SELECT TOP 1 * FROM #SomeTable)
UPDATE cteUpdate SET someDate='NOT_A_DATE'
-- error query
SELECT MAX(CAST(someDate as datetime)) AS MAX_DT
FROM #SomeTable
WHERE ISDATE(someDate) = 1
--ERROR: Conversion failed when converting date and/or time from character string.
-- delete temp table if exists
DROP TABLE IF EXISTS #SomeTable;
GO
I would recommend try_cast() rather than isdate():
SELECT MAX(TRY_CAST(someDate as datetime)) AS MAX_DT
FROM #SomeTable
This is a much more reliable approach: instead of relying on some heuristic to guess whether the value is convertible to a datetime (as isdate() does), try_cast actually attempts to convert and returns null if that fails - which the aggregate function max() happily ignores.
try_cast() (and its sister function try_convert()) is a very handy function that many other databases are missing.
I actually just encountered the same issue (except I did cast to float). It seems that the SQL Server 2019 Optimizer sometimes (yes, it's not reliable) decides to execute the calculations in the SELECT part before it applies the WHERE.
If you set the compatibility level to a lower version, a different optimizer is used as well (it always uses the optimizer matching the compatibility level). Older query optimizers seem to always apply the WHERE part first.
Lowering the compatibility level seems to be the best solution for now, unless you want to replace every CAST with TRY_CAST (which would also mean you won't spot actual errors as easily, such as a faulty WHERE that causes your calculation to return NULL instead of the correct value).

Is there a way to prevent SQL Server silently truncating data in local variables and stored procedure parameters?

I recently encountered an issue while porting an app to SQL Server. It turned out that this issue was caused by a stored procedure parameter being declared too short for the data being passed to it: the parameter was declared as VARCHAR(100) but in one case was being passed more than 100 characters of data. What surprised me was that SQL Server didn't report any errors or warnings -- it just silently truncated the data to 100 characters.
The following SQLCMD session demonstrates this:
1> create procedure WhereHasMyDataGone (@data varchar(5)) as
2> begin
3> print 'Your data is ''' + @data + '''.';
4> end;
5> go
1> exec WhereHasMyDataGone '123456789';
2> go
Your data is '12345'.
Local variables also exhibit the same behaviour:
1> declare @s varchar(5) = '123456789';
2> print @s;
3> go
12345
Is there an option I can enable to have SQL Server report errors (or at least warnings) in such situations? Or should I just declare all local variables and stored procedure parameters as VARCHAR(MAX) or NVARCHAR(MAX)?
SQL Server has no such option. You will either have to manually check the length of strings in your stored procedure and somehow handle the longer strings or use the nvarchar(max) option. If disk space isn't an issue then the nvarchar(max) option is certainly the easiest and quickest solution.
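A minimal sketch of that manual check, using the procedure from the question: widen the parameter so the full string arrives, then validate its length yourself (the limit of 5 and the error text are just examples):
create procedure WhereHasMyDataGone (@data varchar(max)) as
begin
    -- reject input that would have been silently truncated by varchar(5)
    if LEN(@data) > 5
    begin
        raiserror('@data must be at most 5 characters.', 16, 1);
        return;
    end;
    print 'Your data is ''' + @data + '''.';
end;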
You don't have to use nvarchar(max); just use nvarchar(length+1) [e.g. if your column length is 50 then you would set the parameter to be nvarchar(51)]. See the answer from DavidHyogo - SQL Server silently truncates varchar's in stored procedures.
I don't know of a way to make the server do it, but I've been using the SQL Server Projects feature of Visual Studio Team System Developer Edition. It includes code analysis which caught a truncation problem of mine: using an int parameter to insert into a smallint column.
Though it's awkward, you can dynamically check the parameter length before a call, e.g.
CREATE FUNCTION MyFunction(@MyParameter varchar(10))
RETURNS int
AS
BEGIN
RETURN LEN(@MyParameter)
END
GO
DECLARE @MyValue varchar(15) = '123456789012345'
DECLARE @ParameterMaxLength int
SELECT @ParameterMaxLength = CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.PARAMETERS
WHERE SPECIFIC_SCHEMA = 'dbo' AND
SPECIFIC_NAME = 'MyFunction' AND
PARAMETER_NAME = '@MyParameter'
IF @ParameterMaxLength <> -1 AND
LEN(@MyValue) > @ParameterMaxLength
PRINT 'It''s too looooooooooooooooooong'
I omitted the called function's database name in the query and in the reference to INFORMATION_SCHEMA.PARAMETERS to ensure that my sample would run without edits.
I don't necessarily advocate this, but I wanted to point out that the information may be available to detect imminent truncation dynamically, if in some critical situation this is needed.
You can use LEFT in SQL and specify the length that you want to insert.
For example:
CREATE TABLE Table1
(
test varchar(10)
)
insert into Table1 values (LEFT('abcdefghijklmnopqrstuvwxyz',10))
This will insert only abcdefghij into the table.

Why is insert-select into a table variable from an XML variable so slow?

I'm trying to insert some data from an XML document into a table variable. What blows my mind is that the same query as a select-into (bulk) runs in no time, while the insert-select takes ages and pegs the SQL Server process at 100% CPU for the duration of the query.
I took a look at the execution plan and INDEED there's a difference. The insert-select adds an extra "Table Spool" node, even though it is assigned no cost; the "Table Valued Function [XML Reader]" then gets 92%. With select-into, the two "Table Valued Function [XML Reader]" nodes get 49% each.
Please explain why this is happening and how to resolve it (elegantly). I can indeed bulk insert into a temporary table and then insert from that into the table variable, but that's just creepy.
I tried this on SQL Server builds 10.50.1600 and 10.00.2531 with the same results.
Here's a test case:
declare @xColumns xml
declare @columns table(name nvarchar(300))
if OBJECT_ID('tempdb.dbo.#columns') is not null drop table #columns
insert @columns select name from sys.all_columns
set @xColumns = (select name from @columns for xml path('columns'))
delete @columns
print 'XML data size: ' + cast(datalength(@xColumns) as varchar(30))
--raiserror('selecting', 10, 1) with nowait
--select ColumnNames.value('.', 'nvarchar(300)') name
--from @xColumns.nodes('/columns/name') T1(ColumnNames)
raiserror('selecting into #columns', 10, 1) with nowait
select ColumnNames.value('.', 'nvarchar(300)') name
into #columns
from @xColumns.nodes('/columns/name') T1(ColumnNames)
raiserror('inserting @columns', 10, 1) with nowait
insert @columns
select ColumnNames.value('.', 'nvarchar(300)') name
from @xColumns.nodes('/columns/name') T1(ColumnNames)
Thanks a bunch!!
This is a bug in SQL Server 2008.
Use
insert @columns
select ColumnNames.value('.', 'nvarchar(300)') name
from @xColumns.nodes('/columns/name') T1(ColumnNames)
OPTION (OPTIMIZE FOR ( @xColumns = NULL ))
This workaround is from an item on the Microsoft Connect site, which also mentions that a hotfix for this Eager Spool / XML Reader issue is available (under trace flag 4130).
The reason for the performance regression is explained in a different Connect item:
The spool was introduced due to a general Halloween protection logic
(that is not needed for the XQuery expressions).
Looks to be an issue specific to SQL Server 2008. When I run the code in SQL Server 2005, both inserts run quickly and produce identical execution plans that start with the fragment shown below as Plan 1. In 2008, the first insert uses Plan 1 but the second insert produces Plan 2. The remainder of both plans beyond the fragment shown are identical.
Plan 1
Plan 2

SQL Server Query Optimization: Where (Col = @Col or @Col is Null)

Not sure where to start on this one -- not sure if the problem is that I'm fooling the query optimizer, or if it's something intrinsic to the way indexes work when nulls are involved.
One coding convention I've followed is to code stored procedures like such:
create procedure SomeProc
@ID int = null
as
select
st.ID,st.Col1,st.Col2
from
SomeTable st
where
(st.ID = @ID or @ID is null) --works, but very slow (relatively)
Not very useful in that simple test case, of course, but useful in other scenarios when you want a stored proc to act on either the entire table OR rows that meet some criteria. However, that's quite slow when used on bigger tables... roughly 3-5x slower than if I replaced the where clause with:
where
st.ID = @ID --3-5x faster than first example
I'm even more puzzled by the fact that replacing the null with -1 gives me nearly the same speed as that "fixed" WHERE clause above:
create procedure SomeProc
@ID int = -1
as
select
st.ID,st.Col1,st.Col2
from
SomeTable st
where
(st.ID = @ID or @ID = -1) --much better... but why?
Clearly it's the null that's making things wacky but why, exactly? The answer is not clear to me from examining the execution plan. This is something I've noticed over the years on various databases, tables, and editions of SQL Server so I don't think it's a quirk of my current environment. I've resolved the issue by switching the default parameter value from null to -1; my question is why this works.
Notes
SomeTable.ID is indexed
It may be related to (or may, in fact, be) a parameter sniffing issue
Parameter Sniffing (or Spoofing) in SQL Server
For whatever it's worth, I've been testing almost exclusively with "exec SomeProc" after each edit/recompile of the proc, i.e., with the optional parameter omitted.
You have a combination of issues, most likely
Parameter sniffing
OR is not a good operator to use
But without seeing the plans, these are educated guesses.
Parameter sniffing
... of the default "NULL". Try it with different defaults, say -1 or no default.
With a default of NULL and parameter sniffing, the @ID = -1 comparison amounts to a trivial check, so it's faster.
You could also try OPTIMIZE FOR UNKNOWN in SQL Server 2008.
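A sketch of that hint applied to the query from the question:
select st.ID, st.Col1, st.Col2
from SomeTable st
where (st.ID = @ID or @ID is null)
option (optimize for unknown)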
The OR operator
Some ideas..
If the column is not nullable, in most cases the optimiser ignores the condition:
st.ID = ISNULL(@ID, st.ID)
Also, you can use an IF statement:
IF @ID IS NULL
SELECT ... FROM...
ELSE
SELECT ... FROM... WHERE st.ID = @ID
Or UNION ALL in a similar fashion.
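The UNION ALL variant might be sketched like this (same idea as the IF: each branch has a sargable predicate, and only one branch returns rows for a given @ID):
select st.ID, st.Col1, st.Col2
from SomeTable st
where st.ID = @ID
union all
select st.ID, st.Col1, st.Col2
from SomeTable st
where @ID is null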
Personally, I'd use parameter masking (always) and ISNULL in most cases (I'd try it first)
alter procedure SomeProc
@ID int = NULL
AS
declare @maskID int
select @maskID = @ID
...
