SQL WHERE clause: COALESCE vs ISNULL vs dynamic - sql-server

I have a question about best practice when creating a WHERE clause in a SQL procedure.
I have written a query three different ways: one using COALESCE in the WHERE clause, one using an IS NULL OR construct, and one built dynamically with sp_executesql.
Coalesce:
WHERE ClientID = COALESCE(@Client, ClientID) AND
AccessPersonID = COALESCE(@AccessPerson, AccessPersonID)
IS NULL OR:
WHERE (@Client IS NULL OR @Client = ClientID)
AND (@AccessPerson IS NULL OR @AccessPerson = AccessPersonID)
and dynamically:
SET @sql = @sql + CHAR(13) + CHAR(10) + N'WHERE 1 = 1';
IF @Client <> 0
BEGIN
SET @sql = @sql + CHAR(13) + CHAR(10) + N' AND ClientID = @Client '
END
IF @AccessPerson <> 0
BEGIN
SET @sql = @sql + CHAR(13) + CHAR(10) + N' AND AccessPersonID = @AccessPerson '
END
When I look at the results in SQL Sentry Plan Explorer, the COALESCE version has the best estimated cost but the least accurate estimate compared to the actual, while the dynamic version has the worst estimated cost but its estimate matches the actual exactly.
This is a very simple procedure; I am just trying to figure out the best way to write procedures like this. I would think the dynamic version is the way to go, since it is the most accurate.

The correct answer is the 'dynamic' option. It's good you left the parameters in, because that protects against SQL injection (at this layer, anyway).
The reason 'dynamic' is best is that it creates a query plan tailored to the given query. With your example you might get up to 3 plans for this query, depending on which parameters are > 0, but each plan generated will be optimized for that scenario (the unnecessary parameter comparisons are left out).
The other two styles will generate one plan (each), and it will only be optimized for the parameter values used AT THAT TIME ONLY. Each subsequent execution will reuse the old plan, which may have been cached for a parameter combination you are no longer calling with.
'Dynamic' is not as clean as the other two options, but for performance it will give you the optimal query plan each time.
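The same idea translates to any client language: append only the predicates you actually need, but keep the values bound as parameters so each WHERE shape stays injection-safe. A minimal sketch using Python's sqlite3 (the `Clients` table and its columns are invented for illustration):

```python
import sqlite3

def build_query(client_id, access_person_id):
    """Build a WHERE clause dynamically, binding values as parameters."""
    sql = "SELECT * FROM Clients WHERE 1 = 1"
    params = []
    if client_id:               # mirrors IF @Client <> 0
        sql += " AND ClientID = ?"
        params.append(client_id)
    if access_person_id:        # mirrors IF @AccessPerson <> 0
        sql += " AND AccessPersonID = ?"
        params.append(access_person_id)
    return sql, params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Clients (ClientID int, AccessPersonID int)")
conn.executemany("INSERT INTO Clients VALUES (?, ?)", [(1, 1), (1, 2), (2, 1)])

# Only the ClientID predicate is appended; the unused one never appears.
sql, params = build_query(1, 0)
rows = conn.execute(sql, params).fetchall()
print(sql)    # SELECT * FROM Clients WHERE 1 = 1 AND ClientID = ?
print(rows)   # [(1, 1), (1, 2)]
```

Each distinct query text gets its own plan, which is exactly why the dynamic variant's estimates match the actuals.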

Also, the dynamic SQL runs in a different scope than your sproc does, so even though you declare a variable in your sproc, you'll have to redeclare it in your dynamic SQL, or concatenate its value into the statement. But then you should also do NULL checks in your dynamic SQL AND in your sproc, because NULL is neither equal to 0 nor not equal to 0; you can't compare it, because it doesn't exist. :-S
DECLARE @Client int = 1
, @AccessPerson int = NULL
;
DECLARE @sql nvarchar(2000) = N'SELECT * FROM ##TestClientID WHERE 1=1'
;
IF @Client <> 0
BEGIN
SET @sql = CONCAT(@sql, N' AND ClientID = ', CONVERT(nvarchar(10), @Client))
END
;
IF @AccessPerson <> 0
BEGIN
SET @sql = CONCAT(@sql, N' AND AccessPersonID = ', CONVERT(nvarchar(10), @AccessPerson))
END
;
PRINT @sql
EXEC sp_executesql @sql
Note: For demo purposes, I also had to modify my temp table above and make it a global temp instead of a local temp, since I'm calling it from dynamic SQL. It exists in a different scope. Don't forget to clean it up after you're done. :-)

Your top two statements don't do quite the same things if either value is NULL.
http://sqlfiddle.com/#!9/d0aa3/4
IF OBJECT_ID (N'tempdb..#TestClientID', N'U') IS NOT NULL
DROP TABLE #TestClientID;
GO
CREATE TABLE #TestClientID ( ClientID int , AccessPersonID int )
INSERT INTO #TestClientID (ClientID, AccessPersonID)
SELECT 1,1 UNION ALL
SELECT NULL,1 UNION ALL
SELECT 1,NULL UNION ALL
SELECT 0,0
DECLARE @ClientID int = NULL
DECLARE @AccessPersonID int = 1
SELECT * FROM #TestClientID
WHERE ClientID = COALESCE(@ClientID, ClientID)
AND AccessPersonID = COALESCE(@AccessPersonID, AccessPersonID)
SELECT * FROM #TestClientID
WHERE (@ClientID IS NULL OR @ClientID = ClientID)
AND (@AccessPersonID IS NULL OR @AccessPersonID = AccessPersonID)
That said, if you're looking to eliminate a NULL input value, then use the COALESCE(). NULLs can get weird when doing comparisons. COALESCE(a,b) is more akin to MS SQL's ISNULL(a,b). In other words, if a IS NULL, use b.
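The difference between the two forms is easy to see on a tiny dataset: when the column itself is NULL, the COALESCE form drops the row (because `ClientID = ClientID` evaluates to NULL, not true), while the IS NULL form keeps it when the parameter is NULL. A quick illustration mirroring the fiddle above, using Python's sqlite3, whose NULL comparison semantics match SQL Server's here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TestClientID (ClientID int, AccessPersonID int)")
conn.executemany("INSERT INTO TestClientID VALUES (?, ?)",
                 [(1, 1), (None, 1), (1, None), (0, 0)])

client_id, access_person_id = None, 1

# COALESCE form: with a NULL parameter the predicate collapses to
# ClientID = ClientID, which is NULL (not true) when ClientID is NULL.
coalesce_rows = conn.execute(
    "SELECT * FROM TestClientID "
    "WHERE ClientID = COALESCE(?, ClientID) "
    "AND AccessPersonID = COALESCE(?, AccessPersonID)",
    (client_id, access_person_id)).fetchall()

# IS NULL form: a NULL parameter short-circuits the predicate entirely,
# so the (NULL, 1) row survives.
isnull_rows = conn.execute(
    "SELECT * FROM TestClientID "
    "WHERE (? IS NULL OR ? = ClientID) "
    "AND (? IS NULL OR ? = AccessPersonID)",
    (client_id, client_id, access_person_id, access_person_id)).fetchall()

print(coalesce_rows)  # [(1, 1)] -- the (NULL, 1) row is lost
print(isnull_rows)    # [(1, 1), (None, 1)]
```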
And again, it really all depends on what you're ultimately trying to do. sp_ExecuteSQL is MS-centric, so if you don't plan to port this to any other database, you can use that. But honestly, in 15 years I've probably ported an application from one db to another fewer than a dozen times. It's more important if you're writing an application that will be used by other people who will install it on different systems, but if it's an enclosed system, the benefits of the database you're using usually outweigh the lack of portability.

I probably should have included one more section of the query.
For the ISNULL and the COALESCE versions I am converting a value of 0 to NULL, whereas in the dynamic version I leave the value as 0 for the IF clause. That is why they look a bit different.
From what I have been seeing, the COALESCE version is consistently the worst performer.
Surprisingly, from what I have tested, the ISNULL and dynamic versions are very similar, with the ISNULL version slightly better in most cases.
In most cases this exercise has revealed indexes that needed to be added, and in most cases those indexes improved the queries the most; but even after they were added, the ISNULL and dynamic versions still perform better than the COALESCE one.
Also, I cannot see us switching from MSSQL in the near or distant future.

Related

How to run a sql query for each entry by the user?

So the user enters the policy numbers in the form: 2000, 2001, 2002
I need to run a query for each of those 3 policy numbers. I am not sure how to do it.
This is the code I have right now. I was thinking about some sort of string manipulation and then a loop, but I am not sure how to do that. Can anyone please help me?
declare @sql1 varchar(1000)
declare @policy varchar(1000)
set @policy = '2000, 2001, 2002'
--THIS IS WHERE I NEED HELP???
set @policy = replace(@policy, ' ', '')
set @policy = '''' + replace(@policy, ',', ''',''') + ''''
print (@policy)
if @policy <> 'null'
set @sql1 =
(SELECT top 1
[MIL]
FROM
[DataManagement].[dbo].[lookuptable] where [policy] = @policy group by [MIL] )
exec (@sql1)
print(@sql1)
Options:
Are you sure you actually need one result set per item in @policy, rather than a single result set that matches any of the IDs? An IN query would get you that result. Sadly you can't do WHERE IN (@List), but the list can be concatenated into a dynamic query (SQL injection concerns aside; make sure of your data source, but that applies to anything taking input like you seem to be).
If you declare @Policy as a user-defined table type variable, then you can pass it in as a list of IDs and simply loop through it. (More info - http://blog.sqlauthority.com/2008/08/31/sql-server-table-valued-parameters-in-sql-server-2008/, https://www.brentozar.com/archive/2014/02/using-sql-servers-table-valued-parameters/)
There's no shortage of examples online of code to split a delimited string list into a table of values. That would let you do the same as I've suggested in point 2.
The FOR XML PATH('') trick can be used to take a table of results and flatten it into a single variable. If you join against your values table and build the SELECT query from your question as the result, you can then do a single EXEC with no need for a loop at all.
Depends what you want for your environment really. I'd use option 2 to split the string, then build a combined single query using FOR XML PATH and execute that. But for some environments the table-valued parameter and loop approach would definitely be superior.
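Splitting the list in client code and building one placeholder per value gives the same effect as the IN-query approach while keeping the values parameterized. A sketch in Python's sqlite3 (the `lookuptable` columns are taken from the question; the sample data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lookuptable (policy text, MIL text)")
conn.executemany("INSERT INTO lookuptable VALUES (?, ?)",
                 [("2000", "A"), ("2001", "B"), ("2003", "C")])

policy = "2000, 2001, 2002"

# Split the delimited string into a clean list of values.
policies = [p.strip() for p in policy.split(",")]

# One placeholder per value; the values themselves are bound,
# never concatenated, so there is no injection surface.
placeholders = ", ".join("?" for _ in policies)
sql = f"SELECT MIL FROM lookuptable WHERE policy IN ({placeholders}) GROUP BY MIL"

rows = sorted(conn.execute(sql, policies).fetchall())
print(rows)  # [('A',), ('B',)] -- 2002 has no match
```

This is the single-result-set variant; if you genuinely need one result set per policy, the same `policies` list can drive a loop of parameterized single-value queries instead.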

What is the advantage of using @ParmDefinition in sp_executesql

DECLARE @id int
DECLARE @name nvarchar(20)
SET @id = 5
SET @name = 'Paul'
What is the difference between these two options:
Set @SQLQueryInnen = 'SELECT * FROM someTable WHERE ID = ' + CAST(@id AS nvarchar(10)) + ' AND NAME = ''' + @name + ''''
Execute sp_Executesql @SQLQueryInnen
and
Set @SQLQueryInnen = 'SELECT * FROM someTable WHERE ID = @id AND NAME = @name'
Set @ParmDefinition = '@id int, @name nvarchar(20)'
Execute sp_Executesql @SQLQueryInnen, @ParmDefinition, @id, @name
So far I only see the overhead of declaring the data types of @id and @name twice when using @ParmDefinition. On the other hand, the string-building seems a bit easier with @ParmDefinition.
The first case is SQL-injection prone and a security risk. The discussion stops here.
You also avoid stringly-typed code, where you have to convert everything into a string so you can shove it into the @SQLQueryInnen parameter, and then introduce issues because you have to work out how to safely and unambiguously convert to and from strings back into the correct original data types.
For ints, the conversion issues aren't very apparent. But if you look at the number of issues people report (here and on other forums) when converting between datetimes and strings, you'll realise that it causes real problems. Best to keep the data as its natural type throughout.
I see no one mentioned one of the most important things. When you're using a parameterized query, your execution plans are cached.
Your query is:
SELECT *
FROM someTable
WHERE ID = @id
AND NAME = @name;
Its execution plan will be stored in memory and reused each time you run it (which is a great benefit). Meanwhile, if you generate your code using string concatenation like this:
Set @SQLQueryInnen = 'SELECT * FROM someTable WHERE ID = ' + CAST(@id AS nvarchar(10)) + ' AND NAME = ''' + @name + ''''
Execute sp_Executesql @SQLQueryInnen
your code will generate a new execution plan for each parameter combination (unless it repeats), and the cached plan will not be reused. Imagine you're passing @Id = 5 and @Name = 'Paul'; your generated query will look like:
SELECT *
FROM someTable
WHERE ID = 5
AND NAME = 'Paul';
If you change the name to 'Rob', the generated query text changes, and SQL Server will have to create a new plan for it:
SELECT *
FROM someTable
WHERE ID = 5
AND NAME = 'Rob';
Meaning plans won't be reused. I hope it helps.
This article explains it in a bit more detail: EXEC vs. sp_executeSQL (don't rely on the article title; it explains exactly the differences you asked about). Quote from it:
The TSQL string is built only one time; after that, every time the same
query is called with sp_executesql, SQL Server retrieves the query
plan from the cache and reuses it
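Both points (injection safety and plan reuse) hinge on the same property: with parameter binding, the query text stays constant while only the bound values change. A small illustration in Python's sqlite3 (the `someTable` name and data come from the question; the hostile input is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE someTable (ID int, NAME text)")
conn.executemany("INSERT INTO someTable VALUES (?, ?)",
                 [(5, "Paul"), (6, "Rob")])

# Parameterized: one constant query string (so a cached plan is reusable),
# and any hostile NAME value is treated as data, not as SQL.
sql = "SELECT * FROM someTable WHERE ID = ? AND NAME = ?"
print(conn.execute(sql, (5, "Paul")).fetchall())  # [(5, 'Paul')]
print(conn.execute(sql, (6, "Rob")).fetchall())   # [(6, 'Rob')]

# A classic injection payload is harmless when bound as a parameter:
evil = "Paul' OR '1'='1"
safe_rows = conn.execute(sql, (5, evil)).fetchall()
print(safe_rows)  # [] -- no NAME literally equals the payload string

# Concatenation builds a different string per value (defeating plan reuse)
# and lets the payload rewrite the query itself:
unsafe = f"SELECT * FROM someTable WHERE ID = 5 AND NAME = '{evil}'"
unsafe_rows = conn.execute(unsafe).fetchall()
print(unsafe_rows)  # [(5, 'Paul'), (6, 'Rob')] -- the OR '1'='1' matched every row
```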

Which column is being truncated? [duplicate]

The year is 2010.
SQL Server licenses are not cheap.
And yet, this error still does not indicate the row or the column or the value that produced the problem. Hell, it can't even tell you whether it was "string" or "binary" data.
Am I missing something?
A quick-and-dirty way of fixing these is to select the rows into a new physical table like so:
SELECT * INTO dbo.MyNewTable FROM <the rest of the offending query goes here>
...and then compare the schema of this table to the schema of the table into which the INSERT was previously going - and look for the larger column(s).
I realize that this is an old one. Here's a small piece of code that I use that helps.
What this does is return a table of the max lengths of the columns in the table you're trying to select from. You can then compare the field lengths to the max returned for each column and figure out which ones are causing the issue. Then it's just a simple query to clean up or exclude the data.
DECLARE @col NVARCHAR(50)
DECLARE @sql NVARCHAR(MAX);
CREATE TABLE ##temp (colname nvarchar(50), maxVal int)
DECLARE oLoop CURSOR FOR
SELECT COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'SOURCETABLENAME' AND TABLE_SCHEMA = 'dbo'
OPEN oLoop
FETCH NEXT FROM oLoop INTO @col;
WHILE (@@FETCH_STATUS = 0)
BEGIN
SET @sql = '
DECLARE @val INT;
SELECT @val = MAX(LEN(' + @col + ')) FROM dbo.SOURCETABLENAME;
INSERT INTO ##temp
( colname, maxVal )
VALUES ( N''' + @col + ''', -- colname - nvarchar(50)
@val -- maxVal - int
)';
EXEC(@sql);
FETCH NEXT FROM oLoop INTO @col;
END
CLOSE oLoop;
DEALLOCATE oLoop
SELECT * FROM ##temp
DROP TABLE ##temp;
Another approach is to use binary search.
Comment out half of the columns in your code and try again. If the error persists, comment out half of the remaining columns and try again. You will narrow the search down to a column or two in the end.
You could check the length of each inserted value with an if condition, and if the value needs more width than the current column width, truncate the value and throw a custom error.
That should work if you just need to identify which field is causing the problem. I don't know if there's a better way to do this, though.
Recommend you vote for the enhancement request on Microsoft's site. It's been active for 6 years now so who knows if Microsoft will ever do anything about it, but at least you can be a squeaky wheel: Microsoft Connect
For string truncation, I came up with the following solution to find the max lengths of all of the columns:
1) Select all of the data to a temporary table (supply column names where needed), e.g.
SELECT col1
,col2
,col3_4 = col3 + '-' + col4
INTO #temp;
2) Run the following SQL Statement in the same connection (adjust the temporary table name if needed):
DECLARE @table VARCHAR(MAX) = '#temp'; -- change this to your temp table name
DECLARE @select VARCHAR(MAX) = '';
DECLARE @prefix VARCHAR(256) = 'MAX(LEN(';
DECLARE @suffix VARCHAR(256) = ')) AS max_';
DECLARE @nl CHAR(2) = CHAR(13) + CHAR(10);
SELECT @select = @select + @prefix + name + @suffix + name + @nl + ','
FROM tempdb.sys.columns
WHERE object_id = object_id('tempdb..' + @table);
SELECT @select = 'SELECT ' + @select + '0' + @nl + 'FROM ' + @table
EXEC(@select);
It will return a result set with the column names prefixed with 'max_' and show the max length of each column.
Once you identify the faulty column you can run other select statements to find extra long rows and adjust your code/data as needed.
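The same max-length sweep can be done generically from client code: list the columns from the catalog, then take the maximum length of each. A sketch in Python's sqlite3 (the `source` table and its data are invented; `PRAGMA table_info` plays the role of `tempdb.sys.columns`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (col1 text, col2 text, col3 text)")
conn.executemany("INSERT INTO source VALUES (?, ?, ?)",
                 [("ab", "x", "toolongvalue"), ("abc", "xy", "ok")])

# Read the column names from table metadata, like INFORMATION_SCHEMA.COLUMNS.
cols = [row[1] for row in conn.execute("PRAGMA table_info(source)")]

# One MAX(LENGTH(...)) per column; the names come from the catalog,
# not from user input, so interpolating them into the SQL is safe.
select_list = ", ".join(f"MAX(LENGTH({c})) AS max_{c}" for c in cols)
max_lens = conn.execute(f"SELECT {select_list} FROM source").fetchone()

for name, max_len in zip(cols, max_lens):
    print(name, max_len)
# col1 3
# col2 2
# col3 12
```

Comparing these maxima against the destination column widths points straight at the column that would truncate.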
I can't think of a really good way.
I once spent a lot of time debugging a very informative "Division by zero" message.
Usually you comment out various pieces of output code to find the one causing the problem.
Then you take the piece you found and make it return a value that indicates there's a problem instead of the actual value (in your case, replace the string output with LEN() of the output). Then manually compare that to the length of the column you're inserting into.
From the line number in the error message, you should be able to identify the INSERT query causing the error. Modify it into a SELECT query and add AND LEN(your_expression_or_column_here) > CONSTANT_COL_INT_LEN for the various string columns in your query. Look at the output and it will give you the bad rows.
Technically, there isn't a row to point to, because SQL didn't write the data to the table. I typically just capture the trace, run it in Query Analyzer (unless the problem is already obvious from the trace, which it may be in this case), and quickly debug from there with the age-old "modify my UPDATE to a SELECT" method. Doesn't it really just break down to one of two things:
a) Your column definition is wrong, and the width needs to be changed
b) Your column definition is right, and the app needs to be more defensive
?
The best thing that worked for me was to put the rows into a temporary table first, using SELECT ... INTO #temptable.
Then I took the max length of each column in that temp table, e.g. SELECT MAX(LEN(jobid)) AS Jobid, ...
and compared that to the source table's field definitions.

SQL Injection in Code/Static SQL (T-SQL)

Are parameterized static/code SQL statements subject to SQL injection attacks?
For example, let's say I have the following simplified stored procedure:
Does the fact that I am passing the input @Pseries_desc mean I am subject to injection attacks, even though it is parameterized?
Previously this was a dynamic SQL statement and the code was executed using exec as opposed to sp_executesql, so it definitely was open to attacks.
CREATE procedure get_product_by_title
@PSearchType varchar(20) = NULL
, @Pseries_desc varchar(40) = NULL
as
begin
declare
@whereLikeBeg varchar(1)
, @whereLikeEnd varchar(1)
set @whereLikeBeg = ''
set @whereLikeEnd = ''
if @PSearchType = 'contains'
begin
set @whereLikeBeg = '%'
set @whereLikeEnd = '%'
end
if @PSearchType = 'starts_with'
begin
set @whereLikeEnd = '%'
end
select distinct
B.parent_product_id
from
tableA B
where
B.parent_product_id = B.child_product_id
and B.product_title like @whereLikeBeg + @Pseries_desc + @whereLikeEnd
end
This code looks safe to me...
A parameterized query is not the only way to protect yourself from SQL injection attacks, but it's probably the simplest and safest way to do it.
And even if you set SQL injection aside, building queries dynamically is not good practice, especially when you're working with strings, because they might contain SQL reserved words or characters that will have an impact on your query.
If you are using parameterized queries in the access code, you don't need to worry. Checking for it inside the stored procedure is improper.

Conditional select query

I have a large SQL query with multiple statements and UNION ALL. I am doing something like this now:
DECLARE @condition BIT;
SET @condition = 0;
SELECT * FROM table1
WHERE @condition = 1
UNION ALL
SELECT * FROM table2
In this case, table1 won't return any results. However, that query is complex with many joins (such as FullTextTable). The execution plan's estimate shows a high cost, but the actual number of rows and the time to execute suggest otherwise. Is this the most efficient way of filtering a whole query, or is there a better way? I don't want anything in the first SELECT to run, if possible.
I would imagine that your eventual SQL query, with all of the unions and conditions that depend on pre-calculated values, gets pretty complicated. If you're interested in reducing the complexity of the query (not for the machine but for maintenance purposes), I would move the individual queries into views or table-valued functions to keep that logic elsewhere. Then you can use the if @condition = 1 syntax that has been suggested elsewhere.
The best way to solve this is with dynamic SQL. The problem with DForck's solution is that it may lead to parameter sniffing. Just to give a rough idea, your query might look something like this:
DECLARE @query NVARCHAR(MAX) = N'';
IF (@condition = 0)
SET @query = N'SELECT * FROM table1
UNION ALL ';
SET @query = @query + N'SELECT * FROM table2';
EXEC sp_executesql @query;
This is just a simplified case, but in an actual implementation you would parameterize the dynamic query, which solves the parameter sniffing problem. Here is an excellent explanation of the problem: Parameter Sniffing (or Spoofing) in SQL Server
I think you might be better off with this:
if (@condition = 1)
begin
select * from table1
union all
select * from table2
end
else
begin
select * from table2
end
