DATETIME search predicate on DATETIME column much slower than string literal predicate - sql-server

I'm doing a search on a large table of about 10 million rows. I want to specify a start and end date and return all records in the table created between those dates.
It's a straight-forward query:
declare @StartDateTime datetime = '2016-06-21',
@EndDateTime datetime = '2016-06-22';
select *
FROM Archive.dbo.Order O WITH (NOLOCK)
where O.Created >= @StartDateTime
AND O.Created < @EndDateTime;
Created is a DATETIME column which has a non-clustered index.
This query took about 15 seconds to complete.
However, if I modify the query slightly, as follows, it takes only 1 second to return the same result:
declare @StartDateTime datetime = '2016-06-21',
@EndDateTime datetime = '2016-06-22';
select *
FROM Archive.dbo.Order O WITH (NOLOCK)
where O.Created >= '2016-06-21'
AND O.Created < @EndDateTime;
The only change is replacing the @StartDateTime search predicate with a string literal. Looking at the execution plan, when I used @StartDateTime it did an index scan but when I used a string literal it did an index seek and was 15 times faster.
Does anyone know why using the string literal is so much faster?
I would have thought doing a comparison between a DATETIME column and a DATETIME variable would be quicker than comparing the column to a string representation of a date. I've tried dropping and recreating the index on the Created column and it made no difference. I notice I get similar results on the production system as I do on the test system so the weird behaviour doesn't seem specific to a particular database or SQL Server instance.

Every variable has an instance (a scope) within which it is recognized.
In OOP languages, we usually distinguish static/constant variables from temporary variables with keywords, or by passing a variable into a function, where inside that instance the variable can be treated as a constant if the function transforms it, such as the following in C++:
void MyFunction(std::string& name);
// technically, `&` refers to the actual location of the variable
// instead of using a logical representation (a copy). The concept is the same.
In SQL Server, the Standard chose to implement it a bit differently. There are no constant data types, so instead we use literals, which are either
object names (which have similar precedence in the call as system keywords),
names with an object delimiter (including ', []),
or strings with the delimiter CHAR(39) (').
This is the reason the two queries you noticed perform so differently: those variables are not constants to the Optimizer, which means SQL Server will already have chosen its execution path beforehand.
If you have SSMS installed, include the Actual Execution Plan (Ctrl + M) and look at the Estimated Number of Rows on the operators. This is the key part of the plan: the greater the difference between Estimated and Actual rows, the more likely your query can benefit from optimization. In your example, SQL Server had to guess how many rows would qualify, overshot the estimate, and lost efficiency.
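A quick way to confirm this on the original query, without restructuring anything, is OPTION (RECOMPILE), which compiles the plan with the actual variable values (a sketch against the Order table from the question):
DECLARE @StartDateTime datetime = '2016-06-21',
        @EndDateTime   datetime = '2016-06-22';

SELECT *
FROM Archive.dbo.[Order] AS O WITH (NOLOCK)
WHERE O.Created >= @StartDateTime
  AND O.Created < @EndDateTime
OPTION (RECOMPILE);   -- the optimizer now sees the variable values, so the row estimate matches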
Either way the solution is the same idea: turn the variable into a parameter whose value SQL Server can see when it builds the plan. You can still encapsulate everything if you want to. We use the AdventureWorks2012 database for this example:
1) Declare the Variable in the Procedure
CREATE PROC dbo.TEST1 (@NameStyle INT, @FirstName VARCHAR(50))
AS
BEGIN
SELECT *
FROM Person.Person
WHERE FirstName = @FirstName
AND NameStyle = @NameStyle; -- NameStyle is 0
END
2) Pass the variable into Dynamic SQL
CREATE PROC dbo.TEST2 (@NameStyle INT)
AS
BEGIN
DECLARE @Name NVARCHAR(50) = N'Ken';
DECLARE @String NVARCHAR(MAX);
SET @String =
N'SELECT *
FROM Person.Person
WHERE FirstName = @Other
AND NameStyle = @NameStyle';
EXEC sp_executesql @String
, N'@Other VARCHAR(50), @NameStyle INT'
, @Other = @Name
, @NameStyle = @NameStyle;
END
Both will produce the same results. I could have used EXEC by itself, but sp_executesql can cache and reuse the plan for the statement (plus, it's safer against SQL injection).
Notice how in both cases the scope of the instance allowed SQL Server to treat the variable as a constant value (meaning it entered the object with a set value), so the Optimizer was able to choose the most efficient execution plan available.
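To compare the plans, the two procedures can then be exercised like this (the values match the example; 'Ken' is a first name that exists in AdventureWorks2012):
EXEC dbo.TEST1 @NameStyle = 0, @FirstName = 'Ken';
EXEC dbo.TEST2 @NameStyle = 0;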
-- Remove Procs
DROP PROC dbo.TEST1
DROP PROC dbo.TEST2
A great article was highlighted in the comment section of the OP, but you can see it here: Optimizing Variables and Parameters - SQLMAG

Related

Perform SQL query parallel processing on table with n SQL statements?

I am running a cursor to automatically generate SQL statements to search a DB for a list of specific values. This will generate, for example, 180 queries stored in SQL_QueryTable. Then, as seen below, I use a cursor to fetch each statement from SQL_QueryTable, execute it against a table with 150 million records, and ultimately store the results in a result table.
However, this works, but it takes a very long time.
Looking for a suggestion to improve running time.
DECLARE @SQLQuery nvarchar(max)
DECLARE @Counter int = 1
DECLARE @TrackerID nvarchar(max)
DECLARE SQLQuery CURSOR
FOR SELECT SQL_Query, TrackerID FROM SQL_QueryTable
OPEN SQLQuery
FETCH NEXT FROM SQLQuery INTO @SQLQuery, @TrackerID
WHILE @@FETCH_STATUS = 0
BEGIN
PRINT @SQLQuery
PRINT @Counter
Insert Into Table_1 (column1, column2, column3, column4)
Exec(@SQLQuery)
Update Table_1
Set TrackerID = @TrackerID
where TrackerID is null
SET @Counter = @Counter + 1
FETCH NEXT FROM SQLQuery INTO @SQLQuery, @TrackerID
END
Close SQLQuery
Deallocate SQLQuery
Regardless of standard optimization methods (indexes, an optimized SELECT statement, ...), you query the table once per value you need to find, which multiplies the effort. You need a different approach here to speed things up.
A simple upgrade would be to search once for a whole SET of values. If, for example, your operator is (=), this is simple: store the search values in a helper table and then build one statement for all stored values, using IN or an INNER JOIN to get the filtered data. The more values you search for that way, the better the performance. If your values are spread over different columns, repeat this per column, because combining them with OR will degrade performance significantly.
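A rough sketch of that idea (dbo.BigTable, its column SomeColumn, and the #SearchValues helper table are hypothetical names; Table_1 is the results table from the question): load the search values once, then hit the large table a single time.
CREATE TABLE #SearchValues (SearchValue varchar(100) NOT NULL);

-- load the 180 search values once (from wherever they currently come from)
INSERT INTO #SearchValues (SearchValue)
VALUES ('value1'), ('value2'), ('value3');   -- illustrative values

-- one pass over the 150M-row table instead of 180 separate queries
INSERT INTO Table_1 (column1, column2, column3, column4)
SELECT b.column1, b.column2, b.column3, b.column4
FROM dbo.BigTable AS b
INNER JOIN #SearchValues AS s
    ON b.SomeColumn = s.SearchValue;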
The operators complicate things to varying degrees, depending on your custom searches. The equality operator (=) is easy to handle. Range operators (less than, greater than, BETWEEN) define a set of their own and are harder to combine. The same goes for the LIKE operator, plus the extra effort of searching strings.
In conclusion: if you search using (=), you can group the searches even if they are split across different columns. If you do not use other operators, you are fine. If you use them only in small numbers, use the grouping for the majority and leave the complex searches the way they work now.

Dynamic SQL Statement is too long

I am using quite a long dynamic SQL statement (a bit more than 13,000 characters), but when I try to execute it, I notice that the EXEC isn't reading the statement completely and cuts the last part off.
I am using:
DECLARE @Statement nvarchar(max)
SET @Statement = N'[LONG STATEMENT]'
EXEC (@Statement)
I did notice that it could read even fewer characters if I am not using the brackets in EXEC (@Statement).
I also tried using EXEC sp_executesql @Statement.
It just stops reading the statement after 12482 characters...
I have the problem with both SQL Server 2008 R2 and SQL Server 2014.
EDIT: OK, now I noticed something different. It seems that the length of the statement itself is not exactly the problem. As I mentioned in a comment below, I am using this long dynamic SQL statement because I am creating an update script that adds a new stored procedure, and within this procedure I use table names which can differ. So I created variables which contain the table names and used these variables in the dynamic SQL statement, so I don't need to change the table names within the procedures and functions that this update script adds; I just change the content of the variables.
However, if I do NOT use these variables and hardcode the table names in the statement, the statement executes successfully...
I guess the answer is here:
So I created variables, which contain the table names and used these
variables with the dynamic sql statement, so I don't need to change
the table names within the procedures and functions I am adding with
this update script, but just changing the content of the variables.
I guess, your dynamic T-SQL statement is built using string concatenation. So, let's say we have something like this:
DECLARE @DynamicSQLSTatement NVARCHAR(MAX);
DECLARE @TableName01 NVARCHAR(128) = 'T01';
DECLARE @TableName02 NVARCHAR(128) = 'T02';
DECLARE @TSQL NVARCHAR(4000) = REPLICATE(N'X', 4000);
SET @DynamicSQLSTatement = @TableName01 + @TSQL + @TableName02;
We have three short strings (length < max) and when they are concatenated, we expect that the new NVARCHAR(MAX) value will be capable of storing the whole new string (it is with max length, after all).
So, the following statement will give us T02, right?
SELECT RIGHT(@DynamicSQLSTatement, 3);
But no, the output is XXX. So the question is: why is the whole concatenated text not preserved?
When you concatenate nvarchar(n) strings (n between 1 and 4000), the resulting string is not promoted to nvarchar(max), so anything beyond 4,000 characters is silently truncated.
In order to fix this, we can cast the first part of the string to nvarchar(max):
SET @DynamicSQLSTatement = CAST(@TableName01 AS NVARCHAR(MAX)) + @TSQL + @TableName02
SELECT RIGHT(@DynamicSQLSTatement, 3);
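To confirm the truncation, comparing the lengths of the two expressions makes it visible (same variables as above):
-- without the CAST the concatenation is capped at 4,000 characters; with it, all 4,006 survive
SELECT LEN(@TableName01 + @TSQL + @TableName02)                        AS UncastLength,  -- 4000
       LEN(CAST(@TableName01 AS NVARCHAR(MAX)) + @TSQL + @TableName02) AS CastLength;    -- 4006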
I'd imagine you're hitting a limit of about 16K (8K used to be the most data you could hold in a variable before nvarchar(max) etc. was invented). You could try using varchar(max) instead of nvarchar, as that will be half the size (Unicode is 16-bit, ASCII is 8-bit). I have a feeling that won't help much, though.
Really though, you're hitting this issue because whatever you're trying to do shouldn't really be done like that, super long SQL statements are a sign that you've gone down the wrong path. Find a way to break down your query or operations. If this is a query, consider if you could restrict your data set over a few steps rather than in one query.

TVF is much slower when using parameterized query

I am trying to run an inline TVF as a raw parameterized SQL query.
When I run the following query in SSMS, it takes 2-3 seconds
select * from dbo.history('2/1/15','1/1/15','1/31/15',2,2021,default)
I was able to capture the following query through SQL profiler (parameterized, as generated by Entity framework) and run it in SSMS.
exec sp_executesql N'select * from dbo.history(@First,@DatedStart,@DatedEnd,@Number,@Year,default)',N'@First date,@DatedStart date,@DatedEnd date,@Maturity int,@Number decimal(10,5)',@First='2015-02-01',@DatedStart='2015-01-01',@DatedEnd='2015-01-31',@Year=2021,@Number=2
Running the above query in SSMS takes 1:08, which is around 30x longer than the non-parameterized version.
I have tried adding option(recompile) to the end of the parameterized query, but it did absolutely nothing as far as performance. This is clearly an indexing issue to me, but I have no idea how to resolve it.
When looking at the execution plan, it appears that the parameterized version mostly gets hung up on an Eager Spool (46%) and then a Clustered Index Scan (30%), neither of which is present in the execution plan without parameters.
Perhaps there is something I am missing, can someone please point me in the right direction as to how I can get this parameterized query to work properly?
EDIT: Parameterized query execution plan, non-parameterized plan
Maybe it's a parameter sniffing problem.
Try modifying your function so that the parameters are set to local variables, and use the local vars in your SQL instead of the parameters.
So your function would have this structure
CREATE FUNCTION history(
@First Date,
@DatedStart Date,
@DatedEnd Date,
@Maturity int,
@Number decimal(10,5))
RETURNS @table TABLE (
--tabledef
)
AS
BEGIN
Declare @FirstVar Date = @First
Declare @DatedStartVar Date = @DatedStart
Declare @DatedEndVar Date = @DatedEnd
Declare @MaturityVar int = @Maturity
Declare @NumberVar decimal(10,5) = @Number
--SQL Statement which uses the local 'Var' variables and not the parameters
RETURN;
END
;
I've had similar problems in the past where this has been the culprit, and mapping to local variables stops SQL Server from coming up with a dud execution plan.
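If rewriting the function is not practical, a related thing worth testing (no guarantee it helps, since it targets the same parameter-sniffing behaviour as the local-variable copies) is the OPTIMIZE FOR UNKNOWN hint on the calling statement, which makes the optimizer use average density statistics instead of the sniffed parameter values:
exec sp_executesql N'select * from dbo.history(@First,@DatedStart,@DatedEnd,@Number,@Year,default)
option (optimize for unknown)',
N'@First date, @DatedStart date, @DatedEnd date, @Number decimal(10,5), @Year int',
@First = '2015-02-01', @DatedStart = '2015-01-01', @DatedEnd = '2015-01-31', @Number = 2, @Year = 2021;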

SQL WHERE clause: COALESCE vs ISNULL vs dynamic SQL

I have a question about best practice when creating a WHERE clause in a SQL procedure.
I have written a query three different ways: one using COALESCE in the WHERE clause, one using an IS NULL OR construct, and one which is dynamic, using sp_executesql.
Coalesce:
WHERE ClientID = COALESCE(@Client, ClientID) AND
AccessPersonID = COALESCE(@AccessPerson, AccessPersonID)
IsNull Or:
WHERE (@Client IS NULL OR @Client = ClientID)
AND (@AccessPerson IS NULL OR @AccessPerson = AccessPersonID)
and dynamically:
SET @sql = @sql + Char(13) + Char(10) + N'WHERE 1 = 1';
IF @Client <> 0
BEGIN
SET @sql = @sql + Char(13) + Char(10) + N' AND ClientID = @Client '
END
IF @AccessPerson <> 0
BEGIN
SET @sql = @sql + Char(13) + Char(10) + N' AND AccessPersonID = @AccessPerson '
END
When I use SQL Sentry Plan Explorer, the results show that the COALESCE version has the best estimated cost but is the least accurate when comparing estimated to actual rows, whereas the dynamic version has the worst estimate but is 100% accurate to the actual.
This is a very simple procedure; I am just trying to figure out the best way to write procedures like this. I would think the dynamic version is the way to go, since it is the most accurate.
The correct answer is the 'dynamic' option. It's good you left parameters in because it protects against SQL Injection (at this layer anyway).
The reason 'dynamic' is the best is that it will create a query plan that is best for the given query. With your example you might get up to 3 plans for this query, depending on which parameters are > 0, but each plan generated will be optimized for that scenario (it will leave out the unnecessary parameter comparisons).
The other two styles will generate one plan (each), and it will only be optimized for the parameter values you used AT THAT TIME ONLY. Each subsequent execution will reuse the old plan, which may have been cached for parameter values different from the ones you are calling with.
'Dynamic' is not as clean-code as the other two options, but for performance, it will give you the optimal query plan each time.
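A sketch of that approach with the parameters kept (the table name dbo.ClientAccess is a stand-in, since the question does not name the table):
DECLARE @Client int = 1, @AccessPerson int = 0;   -- example values
DECLARE @sql nvarchar(max) = N'SELECT * FROM dbo.ClientAccess WHERE 1 = 1';

IF @Client <> 0
    SET @sql = @sql + N' AND ClientID = @Client';
IF @AccessPerson <> 0
    SET @sql = @sql + N' AND AccessPersonID = @AccessPerson';

-- parameters that end up unused in the final string may still be passed; sp_executesql ignores them
EXEC sp_executesql @sql,
    N'@Client int, @AccessPerson int',
    @Client = @Client,
    @AccessPerson = @AccessPerson;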
And the dynamic SQL operates in a different scope than your sproc will, so even though you declare a variable in your sproc, you'll have to redeclare it in your dynamic SQL. Or concat it into the statement. But then you should also do NULL checks in your dynamic SQL AND in your sproc, because NULL isn't equal to 0 nor is it not equal to 0. You can't compare it because it doesn't exist. :-S
DECLARE @Client int = 1
, @AccessPerson int = NULL
;
DECLARE @sql nvarchar(2000) = N'SELECT * FROM ##TestClientID WHERE 1=1'
;
IF @Client <> 0
BEGIN
SET @sql = CONCAT(@sql, N' AND ClientID = ', CONVERT(nvarchar(10), @Client))
END
;
IF @AccessPerson <> 0
BEGIN
SET @sql = CONCAT(@sql, N' AND AccessPersonID = ', CONVERT(nvarchar(10), @AccessPerson))
END
;
PRINT @sql
EXEC sp_executesql @sql
Note: For demo purposes, I also had to modify my temp table above and make it a global temp instead of a local temp, since I'm calling it from dynamic SQL. It exists in a different scope. Don't forget to clean it up after you're done. :-)
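For reference, a matching global temp table for the demo above could be created like this (a sketch; it mirrors the local #TestClientID table shown below):
CREATE TABLE ##TestClientID (ClientID int, AccessPersonID int);

INSERT INTO ##TestClientID (ClientID, AccessPersonID)
SELECT 1, 1 UNION ALL
SELECT NULL, 1 UNION ALL
SELECT 1, NULL UNION ALL
SELECT 0, 0;

-- ...run the dynamic SQL above, then clean up:
DROP TABLE ##TestClientID;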
Your top two statements don't do quite the same things if either value is NULL.
http://sqlfiddle.com/#!9/d0aa3/4
IF OBJECT_ID (N'tempdb..#TestClientID', N'U') IS NOT NULL
DROP TABLE #TestClientID;
GO
CREATE TABLE #TestClientID ( ClientID int , AccessPersonID int )
INSERT INTO #TestClientID (ClientID, AccessPersonID)
SELECT 1,1 UNION ALL
SELECT NULL,1 UNION ALL
SELECT 1,NULL UNION ALL
SELECT 0,0
DECLARE @ClientID int = NULL
DECLARE @AccessPersonID int = 1
SELECT * FROM #TestClientID
WHERE ClientID = COALESCE(@ClientID, ClientID)
AND AccessPersonID = COALESCE(@AccessPersonID, AccessPersonID)
SELECT * FROM #TestClientID
WHERE (@ClientID IS NULL OR @ClientID = ClientID)
AND (@AccessPersonID IS NULL OR @AccessPersonID = AccessPersonID)
That said, if you're looking to eliminate a NULL input value, then use the COALESCE(). NULLs can get weird when doing comparisons. COALESCE(a,b) is more akin to MS SQL's ISNULL(a,b). In other words, if a IS NULL, use b.
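A tiny sketch of that NULL weirdness, and of the two functions side by side:
SELECT COALESCE(NULL, 5) AS CoalesceResult,  -- 5
       ISNULL(NULL, 5)   AS IsNullResult;    -- 5

-- NULL is neither equal nor not equal to 0, so neither of these returns a row:
SELECT 1 WHERE NULL = 0;
SELECT 1 WHERE NULL <> 0;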
And again, it really all depends on what you're ultimately trying to do. sp_ExecuteSQL is MS-centric, so if you don't plan to port this to any other database, you can use that. But honestly, in 15 years I've probably ported an application from one db to another fewer than a dozen times. It's more important if you're writing an application that will be used by other people who will install it on different systems, but if it's an enclosed system, the benefits of the database you're using usually outweigh the lack of portability.
I probably should have included one more section of the query.
For the ISNULL and the COALESCE versions I am converting a value of 0 to NULL, whereas in the dynamic version I leave the value as 0 for the IF clause. That is why they look a bit different.
From what I have been seeing, the COALESCE version is consistently the worst performing.
Surprisingly, from what I have tested, the ISNULL and dynamic versions are very similar, with the ISNULL version being slightly better in most cases.
In most cases this has revealed indexes that needed to be added, and in most cases the indexes improved the queries the most, but after they have been added the ISNULL and dynamic versions still perform better than the COALESCE one.
Also, I cannot see us switching from MS SQL in the near or distant future.

SQL Server query with lots of substrings - performance

I have a stored procedure in which I perform a bulk insert into a temp table and then use SUBSTRING on its field to derive the different columns required for the main table.
The main table has 66 columns, and each run of the SP adds approximately 5,500 rows.
Code for bulk insert part:
CREATE TABLE [dbo].[#TestData] (
logdate DATETIME,
id CHAR(15),
value VARCHAR(max)
)
BEGIN TRANSACTION
DECLARE @sql VARCHAR(max)
SET @sql = 'BULK INSERT [dbo].[#TestData] FROM ''' + @pfile + ''' WITH (
firstrow = 2,
fieldterminator = ''\t'',
rowterminator = ''\n''
)'
EXEC(@sql)
IF (@@ERROR <> 0)
BEGIN
ROLLBACK TRANSACTION
RETURN 1
END
COMMIT TRANSACTION
Code for substring part :
CASE
WHEN (PATINDEX('%status="%', value) > 0)
THEN (nullif(SUBSTRING(value, (PATINDEX('%status="%', value) + 8), (CHARINDEX('"', value, (PATINDEX('%status="%', value) + 8)) - (PATINDEX('%status="%', value) + 8))), ''))
ELSE NULL
END,
This substring code is used in insert into and is similar for all the 66 columns.
It takes around 20-25 seconds for the SP to execute. I have tried indexing the temp table, dropping foreign keys, dropping all indexes, and dropping the primary key, but it still takes the same time.
So my question is can the performance be improved?
Edit: The interface application is Visual FoxPro 6.0.
Since SQL Server is slow with string manipulation, I am now doing all the string manipulation in FoxPro. I am new to FoxPro; any suggestions on how to send NULL from FoxPro to SQL Server?
I have never worked with NULL in FoxPro 6.0.
Since you are not really leveraging the features of PATINDEX() here you may want to examine the use of CHARINDEX() instead, which, despite its name, operates on strings and not only on characters. CHARINDEX() may prove to be faster than PATINDEX() since it is a somewhat simpler function.
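As a small illustration (the variable is just an example): PATINDEX needs the % wildcards, CHARINDEX takes the plain substring, and both return the same 1-based position when no real pattern matching is involved.
DECLARE @v varchar(100) = '<row id="1" status="OK" />';

SELECT PATINDEX('%status="%', @v) AS ViaPatindex,   -- 13
       CHARINDEX('status="', @v)  AS ViaCharindex;  -- 13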
Indexes won't help you with those string operations because you're not searching for prefixes of strings.
You should definitely look into options to avoid the excessive use of PATINDEX() or CHARINDEX() inside the statement; there are up to 4(!) invocations thereof in your CASE for each of your 66 columns in every record processed.
For this you may want to split the string operations into multiple statements to pre-compute the values for the start and end index of the substring of interest, like
-- assumes the helper columns value_start_index, value_end_index and value_str have been added to the temp table
UPDATE temptable
SET value_start_index = CHARINDEX('status="', value) + 8

UPDATE temptable
SET value_end_index = CHARINDEX('"', value, value_start_index)
WHERE value_start_index > 8   -- a start index of 8 means CHARINDEX returned 0, i.e. 'status="' was not found

UPDATE temptable
SET value_str = SUBSTRING(value, value_start_index, value_end_index - value_start_index)
WHERE value_end_index IS NOT NULL
  AND value_end_index > value_start_index   -- guards against a missing closing quote
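Alternatively, if adding helper columns to the temp table is not desirable, the same once-per-row evaluation can be done inline with CROSS APPLY; this is a sketch for a single attribute against the #TestData table from the question (the status_value alias is just illustrative):
SELECT NULLIF(SUBSTRING(t.value, s.start_index, e.end_index - s.start_index), '') AS status_value
FROM [dbo].[#TestData] AS t
CROSS APPLY (SELECT CHARINDEX('status="', t.value) + 8 AS start_index) AS s
CROSS APPLY (SELECT CHARINDEX('"', t.value, s.start_index) AS end_index) AS e
WHERE s.start_index > 8              -- 8 means CHARINDEX returned 0, i.e. 'status="' was not found
  AND e.end_index > s.start_index;   -- guards against a missing closing quote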
SQL Server is rather slow at dealing with strings. For this number of invocations it would be best to use a SQL CLR user-defined function. There is not much more you can do beyond that.
