Is there any way to optimize this query? - sql-server

I need to optimize the following query:
IF object_id('tempdb..#TAB001') IS NOT NULL
DROP TABLE #TAB001;
select *
into #TAB001
from dbo.uvw_TAB001
where 1 = 1
and isnull(COD_CUSTOMER,'') = isnull(#cod_customer,isnull(COD_CUSTOMER,''))
and isnull(TAXCODE,'') = isnull(#taxcode, isnull(TAXCODE,''))
and isnull(SURNAME,'') = isnull(#surname,isnull(SURNAME,''))
and isnull(VATCODE,'') = isnull(#vatCode,isnull(VATCODE,''))
The goal is to improve the performance of this query.
It is currently quite fast but I would like to speed it up even more.
This query has the optional parameters for which it is necessary to make a query that regardless of whether all or 1 parameter is set, returns results in the shortest possible time.

What you have here is known as a "catch-all" or "kitchen sink" query, which need a little helping hand sometimes.
Firstly, you need to get rid of those ISNULLs; they are making the query non-SARGable. Also, I would suggest getting rid of the SELECT * and limiting the query to the columns you need.
Then, finally, we can add OPTION (RECOMPILE) to the query; why is discussed in the articles I linked above. This gives you the following:
SELECT * --Replace with Column Names
INTO #TAB001 --Do you actually need to do this?
FROM dbo.uvw_TAB001
--Removed WHERE 1 = 1 as it's always true, thus pointless
WHERE (COD_CUSTOMER = #cod_customer OR #cod_customer IS NULL)
AND (TAXCODE = #taxcode OR #taxcode IS NULL)
AND (SURNAME = #surname OR #surname IS NULL)
AND (VATCODE = #vatCode OR #vatCode IS NULL)
OPTION (RECOMPILE);
Note I am assuming that when a variable (for example #cod_customer) has the value NULL you mean that the variable should be "ignored" and not matched against NULL.
If you actually want {Column} = #{Variable} including NULL then use SQL with the format below instead:
({Column} = #{Variable} OR ({Column} IS NULL AND #{Variable} IS NULL))

Related

SQL: how to do an IF check for data range parameters

I am writing a stored procedure that takes in 4 parameters: confirmation_number, payment_amount, start_range, end_range.
The parameters are optional, so I am doing a check in this fashion for the confirmation_number, and the payment_amount parameters:
IF (#s_Confirmation_Number IS NOT NULL)
SET #SQL = #SQL + ' AND pd.TransactionNumber = #s_Confirmation_Number'
IF (#d_Payment_Amount IS NOT NULL)
SET #SQL = #SQL + ' AND pd.PaymentAmount = #d_Payment_Amount'
I would like to ask for help because I am not sure what is the best method to check for the date range parameters.
If someone could give me en example, or several on how this is best achieved it would be great.
Thank you in advance.
UPDATE - after receiving some great help -.
This is what I have so far, I am following scsimon recommendation, but I am not sure about the dates, I got the idea from another post I found and some playing around with it. Would you care looking at it and tell me what you all think?
Many thanks.
#s_Confirmation_Number NVARCHAR(50) = NULL
, #d_Payment_Amount DECIMAL(18, 2) = NULL
, #d_Start_Range DATE = NULL
, #d_End_Range DATE = NULL
...
....
WHERE
ph.SourceType = #s_Source_Type
AND ((pd.TransConfirmID = #s_Confirmation_Number) OR #s_Confirmation_Number IS NULL)
AND ((pd.PaymentAmount = #d_Payment_Amount) OR #d_Payment_Amount IS NULL)
AND (((NULLIF(#d_Start_Range, '') IS NULL) OR CAST(pd.CreatedDate AS DATE) >= #d_Start_Range)
AND ((NULLIF(#d_End_Range, '') IS NULL) OR CAST(pd.CreatedDate AS DATE) <= #d_End_Range))
(The parameter sourceType is a hard-coded value)
This is called a catch all or kitchen sink query. It is usually written as such:
create procedure myProc
(#Payment_Amount int = null
,#Confirmation_Number = null
,#start_range datetime
,#end_range datetime)
as
select ...
from ...
where
(pd.TransactionNumber = #Confirmation_Number or #Confirmation_Number is null)
and (pd.PaymentAmount = #Payment_Amount or #Payment_Amount is null)
The NULL on the two parameters gives them a default of NULL and makes them "optional". The WHERE clause evaluates this to only return rows where your user input matches the column value, or all rows when no user input was supplied (i.e. parameter IS NULL). You can use this with the date parameters as well. Just pay close attention to your parentheses. They matter a lot here because we are mixing and and or logic.
Aaron Bertrand has blogged extensively on this.
I do it like this
WHERE
COALESCE(#s_Confirmation_Number,pd.TransactionNumber) = pd.TransactionNumber AND
COALESCE(#d_Payment_Amount,pd.PaymentAmount) = pd.PaymentAmount
If we have a value for each of these parameters then it will check against the filter value otherwise it will always match the filter value if the parameter is null.
I've found that using COALESCE is faster and clearer than IF control statements or using OR in the WHERE clause.
There is another way.
But I tested and realized that a scsimon query is faster than mine.
AND (CASE
WHEN #Confirmation_Number is not null
THEN (CASE
WHEN pd.TransactionNumber = #Confirmation_Number
THEN 1
ELSE 0
END)
ELSE 1
END = 1)

Query with integers not working

I've been searching here on stackoverflow and other sources but not found a solution to this
The query below works as expected expect for when either custinfo.cust_cntct_id or custinfo.cust_corrcntct_id = '' (blank not NULL) then I get no results. Both are integer fields and if both have an integer value then I get results. I still want a value returned for either cntct_email or corrcntct_email even if custinfo.cust_cntct_id or custinfo.cust_corrcntct_id = blank
Can someone help me out in making this work? The database is PostgreSQL.
SELECT
cntct.cntct_email AS cntct_email,
corrcntct.cntct_email AS corrcntct_email
FROM
public.custinfo,
public.invchead,
public.cntct,
public.cntct corrcntct
WHERE
invchead.invchead_cust_id = custinfo.cust_id AND
cntct.cntct_id = custinfo.cust_cntct_id AND
corrcntct.cntct_id = custinfo.cust_corrcntct_id;
PostgreSQL won't actually let you test an integer field for a blank value (unless you're using a truly ancient version - 8.2 or older), so you must be using a query generator that's "helpfully" transforming '' to NULL or a tool that's ignoring errors.
Observe this, on Pg 9.2:
regress=> CREATE TABLE test ( a integer );
CREATE TABLE
regress=> insert into test (a) values (1),(2),(3);
INSERT 0 3
regress=> SELECT a FROM test WHERE a = '';
ERROR: invalid input syntax for integer: ""
LINE 1: SELECT a FROM test WHERE a = '';
If you are attempting to test for = NULL, this is not correct. You must use IS NOT NULL or IS DISTINCT FROM NULL instead. Testing for = NULL always results in NULL, which is treated as false in a WHERE clause.
Example:
regress=> insert into test (a) values (null);
INSERT 0 1
regress=> SELECT a FROM test WHERE a = NULL;
a
---
(0 rows)
regress=> SELECT a FROM test WHERE a IS NULL;
a
---
(1 row)
regress=> SELECT NULL = NULL as wrong, NULL IS NULL AS right;
wrong | right
-------+-------
| t
(1 row)
By the way, you should really be using ANSI JOIN syntax. It's more readable and it's much easier to forget to put a condition in and get a cartesian product by accident. I'd rewrite your query for identical functionality and performance but better readability as:
SELECT
cntct.cntct_email AS cntct_email,
corrcntct.cntct_email AS corrcntct_email
FROM
public.custinfo ci
INNER JOIN public.invchead
ON (invchead.invchead_cust_id = ci.cust_id)
INNER JOIN public.cntct
ON (cntct.cntct_id = ci.cust_cntct_id)
INNER JOIN public.cntct corrcntct
ON (corrcntct.cntct_id = ci.cust_corrcntct_id);
Use of table aliases usually keeps it cleaner; here I've aliased the longer name custinfo to ci for brevity.

Using a bit input in stored procedure to determine how to filter results in the where clause

I'm beating my head against the wall here... can't figure out a way to pull this off.
Here's my setup:
My table has a column for the date something was completed. If it was never completed, the field is null. Simple enough.
On the front end, I have a checkbox that defaults to "Only show incomplete entries". When only pulling incomplete entries, it's easy.
SELECT
*
FROM Sometable
WHERE Completed_Date IS NULL
But offering the checkbox option complicates things a great deal. My checkbox inputs a bit value: 1=only show incomplete, 0=show all.
The problem is, I can't use a CASE statement within the where clause, because an actual value uses "=" to compare, and checking null uses "IS". For example:
SELECT
*
FROM Sometable
WHERE Completed_Date IS <---- invalid syntax
CASE WHEN
...
END
SELECT
*
FROM Sometable
WHERE Completed_Date =
CASE WHEN #OnlyIncomplete = 1 THEN
NULL <----- this translates to "WHERE Completed_Date = NULL", which won't work.. I have to use "IS NULL"
...
END
Any idea how to accomplish this seemly easy task? I'm stumped... thanks.
...
WHERE #OnlyIncomplete = 0
OR (#OnlyIncomplete = 1 AND Completed_Date IS NULL)
Hmmm... I think what you want is this:
SELECT
*
FROM Sometable
WHERE Completed_Date IS NULL OR (#OnlyIncomplete = 0)
So that'll show Date=NULL plus, if OnlyIncomplete=0, Date != Null. Yeah, I think that's it.
If you still want to use a CASE function (although it may be overkill in this case) :
SELECT
*
FROM Sometable
WHERE 1 =
(CASE WHEN #OnlyIncomplete = 0 THEN 1
WHEN #OnlyIncomplete = 1 AND Completed_Date IS NULL THEN 1
END)

SQL Server 2008 Stored Proc Performance where Column = NULL

When I execute a certain stored procedure (which selects from a non-indexed view) with a non-null parameter, it's lightning fast at about 10ms. When I execute it with a NULL parameter (resulting in a FKColumn = NULL query) it's much slower at about 1200ms.
I've executed it with the actual execution plan and it appears the most costly portion of the query is a clustered index scan with the predicate IS NULL on the fk column in question - 59%! The index covering this column is (AFAIK) good.
So what can I do to improve the performance here? Change the fk column to NOT NULL and fill the nulls with a default value?
SELECT top 20 dbo.vwStreamItems.ItemId
,dbo.vwStreamItems.ItemType
,dbo.vwStreamItems.AuthorId
,dbo.vwStreamItems.AuthorPreviewImageURL
,dbo.vwStreamItems.AuthorThumbImageURL
,dbo.vwStreamItems.AuthorName
,dbo.vwStreamItems.AuthorLocation
,dbo.vwStreamItems.ItemText
,dbo.vwStreamItems.ItemLat
,dbo.vwStreamItems.ItemLng
,dbo.vwStreamItems.CommentCount
,dbo.vwStreamItems.PhotoCount
,dbo.vwStreamItems.VideoCount
,dbo.vwStreamItems.CreateDate
,dbo.vwStreamItems.Language
,dbo.vwStreamItems.ProfileIsFriendsOnly
,dbo.vwStreamItems.IsActive
,dbo.vwStreamItems.LocationIsFriendsOnly
,dbo.vwStreamItems.IsFriendsOnly
,dbo.vwStreamItems.IsDeleted
,dbo.vwStreamItems.StreamId
,dbo.vwStreamItems.StreamName
,dbo.vwStreamItems.StreamOwnerId
,dbo.vwStreamItems.StreamIsDeleted
,dbo.vwStreamItems.RecipientId
,dbo.vwStreamItems.RecipientName
,dbo.vwStreamItems.StreamIsPrivate
,dbo.GetUserIsFriend(#RequestingUserId, vwStreamItems.AuthorId) as IsFriend
,dbo.GetObjectIsBookmarked(#RequestingUserId, vwStreamItems.ItemId) as IsBookmarked
from dbo.vwStreamItems WITH (NOLOCK)
where 1 = 1
and vwStreamItems.IsActive = 1
and vwStreamItems.IsDeleted = 0
and vwStreamItems.StreamIsDeleted = 0
and (
StreamId is NULL
or
ItemType = 'Stream'
)
order by CreateDate desc
When it's not null, do you have
and vwStreamItems.StreamIsDeleted = 0
and (
StreamId = 'xxx'
or
ItemType = 'Stream'
)
or
and vwStreamItems.StreamIsDeleted = 0
and (
StreamId = 'xxx'
)
You have an OR clause there which is most likely the problem, not the IS NULL as such.
The plans will show why: the OR forces a SCAN but it's manageable with StreamId = 'xxx'. When you use IS NULL, you lose selectivity.
I'd suggest changing your index make StreamId the right-most column.
However, a view is simply a macro that expands so the underlying query on the base tables could be complex and not easy to optimise...
The biggest performance gain would be for you to try to loose GetUserIsFriend and GetObjectIsBookmarked functions and use JOIN to make the same functionality. Using functions or stored procedures inside a query is basically the same as using FOR loop - the items are called 1 by 1 to determine the value of a function. If you'd use joining tables instead, all of the items values would be determined together as a group in 1 pass.

LINQ to SQL Take w/o Skip Causes Multiple SQL Statements

I have a LINQ to SQL query:
from at in Context.Transaction
select new {
at.Amount,
at.PostingDate,
Details =
from tb in at.TransactionDetail
select new {
Amount = tb.Amount,
Description = tb.Desc
}
}
This results in one SQL statement being executed. All is good.
However, if I attempt to return known types from this query, even if they have the same structure as the anonymous types, I get one SQL statement executed for the top level and then an additional SQL statement for each "child" set.
Is there any way to get LINQ to SQL to issue one SQL statement and use known types?
EDIT: I must have another issue. When I plugged a very simplistic (but still hieararchical) version of my query into LINQPad and used freshly created known types with just 2 or 3 members, I did get one SQL statement. I will post and update when I know more.
EDIT 2: This appears to be due to a bug in Take. See my answer below for details.
First - some reasoning for the Take bug.
If you just Take, the query translator just uses top. Top10 will not give the right answer if cardinality is broken by joining in a child collection. So the query translator doesn't join in the child collection (instead it requeries for the children).
If you Skip and Take, then the query translator kicks in with some RowNumber logic over the parent rows... these rownumbers let it take 10 parents, even if that's really 50 records due to each parent having 5 children.
If you Skip(0) and Take, Skip is removed as a non-operation by the translator - it's just like you never said Skip.
This is going to be a hard conceptual leap to from where you are (calling Skip and Take) to a "simple workaround". What we need to do - is force the translation to occur at a point where the translator can't remove Skip(0) as a non-operation. We need to call Skip, and supply the skipped number at a later point.
DataClasses1DataContext myDC = new DataClasses1DataContext();
//setting up log so we can see what's going on
myDC.Log = Console.Out;
//hierarchical query - not important
var query = myDC.Options.Select(option => new{
ID = option.ParentID,
Others = myDC.Options.Select(option2 => new{
ID = option2.ParentID
})
});
//request translation of the query! Important!
var compQuery = System.Data.Linq.CompiledQuery
.Compile<DataClasses1DataContext, int, int, System.Collections.IEnumerable>
( (dc, skip, take) => query.Skip(skip).Take(take) );
//now run the query and specify that 0 rows are to be skipped.
compQuery.Invoke(myDC, 0, 10);
This produces the following query:
SELECT [t1].[ParentID], [t2].[ParentID] AS [ParentID2], (
SELECT COUNT(*)
FROM [dbo].[Option] AS [t3]
) AS [value]
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY [t0].[ID]) AS [ROW_NUMBER], [t0].[ParentID]
FROM [dbo].[Option] AS [t0]
) AS [t1]
LEFT OUTER JOIN [dbo].[Option] AS [t2] ON 1=1
WHERE [t1].[ROW_NUMBER] BETWEEN #p0 + 1 AND #p1 + #p2
ORDER BY [t1].[ROW_NUMBER], [t2].[ID]
-- #p0: Input Int (Size = 0; Prec = 0; Scale = 0) [0]
-- #p1: Input Int (Size = 0; Prec = 0; Scale = 0) [0]
-- #p2: Input Int (Size = 0; Prec = 0; Scale = 0) [10]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1
And here's where we win!
WHERE [t1].[ROW_NUMBER] BETWEEN #p0 + 1 AND #p1 + #p2
I've now determined this is the result of a horrible bug. The anonymous versus known type turned out not to be the cause. The real cause is Take.
The following result in 1 SQL statement:
query.Skip(1).Take(10).ToList();
query.ToList();
However, the following exhibit the one sql statement per parent row problem.
query.Skip(0).Take(10).ToList();
query.Take(10).ToList();
Can anyone think of any simple workarounds for this?
EDIT: The only workaround I've come up with is to check to see if I'm on the first page (IE Skip(0)) and then make two calls, one with Take(1) and the other with Skip(1).Take(pageSize - 1) and addRange the lists together.
I've not had a chance to try this but given that the anonymous type isn't part of LINQ rather a C# construct I wonder if you could use:
from at in Context.Transaction
select new KnownType(
at.Amount,
at.PostingDate,
Details =
from tb in at.TransactionDetail
select KnownSubType(
Amount = tb.Amount,
Description = tb.Desc
)
}
Obviously Details would need to be an IEnumerable collection.
I could be miles wide on this but it might at least give you a new line of thought to pursue which can't hurt so please excuse my rambling.

Resources