SQL Server 2008 Stored Proc Performance where Column = NULL - sql-server

When I execute a certain stored procedure (which selects from a non-indexed view) with a non-null parameter, it's lightning fast at about 10ms. When I execute it with a NULL parameter (resulting in a FKColumn = NULL query) it's much slower at about 1200ms.
I've executed it with the actual execution plan and it appears the most costly portion of the query is a clustered index scan with the predicate IS NULL on the fk column in question - 59%! The index covering this column is (AFAIK) good.
So what can I do to improve the performance here? Change the fk column to NOT NULL and fill the nulls with a default value?
SELECT top 20 dbo.vwStreamItems.ItemId
,dbo.vwStreamItems.ItemType
,dbo.vwStreamItems.AuthorId
,dbo.vwStreamItems.AuthorPreviewImageURL
,dbo.vwStreamItems.AuthorThumbImageURL
,dbo.vwStreamItems.AuthorName
,dbo.vwStreamItems.AuthorLocation
,dbo.vwStreamItems.ItemText
,dbo.vwStreamItems.ItemLat
,dbo.vwStreamItems.ItemLng
,dbo.vwStreamItems.CommentCount
,dbo.vwStreamItems.PhotoCount
,dbo.vwStreamItems.VideoCount
,dbo.vwStreamItems.CreateDate
,dbo.vwStreamItems.Language
,dbo.vwStreamItems.ProfileIsFriendsOnly
,dbo.vwStreamItems.IsActive
,dbo.vwStreamItems.LocationIsFriendsOnly
,dbo.vwStreamItems.IsFriendsOnly
,dbo.vwStreamItems.IsDeleted
,dbo.vwStreamItems.StreamId
,dbo.vwStreamItems.StreamName
,dbo.vwStreamItems.StreamOwnerId
,dbo.vwStreamItems.StreamIsDeleted
,dbo.vwStreamItems.RecipientId
,dbo.vwStreamItems.RecipientName
,dbo.vwStreamItems.StreamIsPrivate
,dbo.GetUserIsFriend(#RequestingUserId, vwStreamItems.AuthorId) as IsFriend
,dbo.GetObjectIsBookmarked(#RequestingUserId, vwStreamItems.ItemId) as IsBookmarked
from dbo.vwStreamItems WITH (NOLOCK)
where 1 = 1
and vwStreamItems.IsActive = 1
and vwStreamItems.IsDeleted = 0
and vwStreamItems.StreamIsDeleted = 0
and (
StreamId is NULL
or
ItemType = 'Stream'
)
order by CreateDate desc

When it's not null, do you have
and vwStreamItems.StreamIsDeleted = 0
and (
StreamId = 'xxx'
or
ItemType = 'Stream'
)
or
and vwStreamItems.StreamIsDeleted = 0
and (
StreamId = 'xxx'
)
You have an OR clause there which is most likely the problem, not the IS NULL as such.
The plans will show why: the OR forces a SCAN but it's manageable with StreamId = 'xxx'. When you use IS NULL, you lose selectivity.
I'd suggest changing your index make StreamId the right-most column.
However, a view is simply a macro that expands so the underlying query on the base tables could be complex and not easy to optimise...

The biggest performance gain would be for you to try to loose GetUserIsFriend and GetObjectIsBookmarked functions and use JOIN to make the same functionality. Using functions or stored procedures inside a query is basically the same as using FOR loop - the items are called 1 by 1 to determine the value of a function. If you'd use joining tables instead, all of the items values would be determined together as a group in 1 pass.

Related

Is there any way to optimize this query?

I need to optimize the following query:
IF object_id('tempdb..#TAB001') IS NOT NULL
DROP TABLE #TAB001;
select *
into #TAB001
from dbo.uvw_TAB001
where 1 = 1
and isnull(COD_CUSTOMER,'') = isnull(#cod_customer,isnull(COD_CUSTOMER,''))
and isnull(TAXCODE,'') = isnull(#taxcode, isnull(TAXCODE,''))
and isnull(SURNAME,'') = isnull(#surname,isnull(SURNAME,''))
and isnull(VATCODE,'') = isnull(#vatCode,isnull(VATCODE,''))
The goal is to improve the performance of this query.
It is currently quite fast but I would like to speed it up even more.
This query has the optional parameters for which it is necessary to make a query that regardless of whether all or 1 parameter is set, returns results in the shortest possible time.
What you have here is known as a "catch-all" or "kitchen sink" query, which need a little helping hand sometimes.
Firstly, you need to get rid of those ISNULLs; they are making the query non-SARGable. Also, I would suggest getting rid of the SELECT * and limiting the query to the columns you need.
Then, finally, we can add OPTION (RECOMPILE) to the query; why is discussed in the articles I linked above. This gives you the following:
SELECT * --Replace with Column Names
INTO #TAB001 --Do you actually need to do this?
FROM dbo.uvw_TAB001
--Removed WHERE 1 = 1 as it's always true, thus pointless
WHERE (COD_CUSTOMER = #cod_customer OR #cod_customer IS NULL)
AND (TAXCODE = #taxcode OR #taxcode IS NULL)
AND (SURNAME = #surname OR #surname IS NULL)
AND (VATCODE = #vatCode OR #vatCode IS NULL)
OPTION (RECOMPILE);
Note I am assuming that when a variable (for example #cod_customer) has the value NULL you mean that the variable should be "ignored" and not matched against NULL.
If you actually want {Column} = #{Variable} including NULL then use SQL with the format below instead:
({Column} = #{Variable} OR ({Column} IS NULL AND #{Variable} IS NULL))

How to Improve the ADO Lookup Speed?

I write a C++ application via Visual Studio 2008 + ADO(not ADO.net). Which will do the following tasks one by one:
Create a table in SQL Server database, as follows:
CREATE TABLE MyTable
(
[S] bigint,
[L] bigint,
[T] tinyint,
[I1] int,
[I2] smallint,
[P] bigint,
[PP] bigint,
[NP] bigint,
[D] bit,
[U] bit
);
Insert 5,030,242 records via BULK INSERT
Create an index on the table:
CREATE Index [MyIndex] ON MyTable ([P]);
Start a function which will lookup for 65,000,000 times. Each lookup using the following query:
SELECT [S], [L]
FROM MyTable
WHERE [P] = ?
Each time the query will either return nothing, or return one row. If getting one row with the [S] and [L], I will convert [S] to a file pointer and then read data from offset specified by [L].
Step 4 takes a lot of time. So I try to profile it and find out the lookup query takes the most of the time. Each lookup will take about 0.01458 second.
I try to improve the performance by doing the following tasks:
Use parametered ADO query. See step 4
Select only the required columns. Originally I use "Select *" for step 4, now I use Select [S], [L] instead. This improves performance by about 1.5%.
Tried both clustered and non-clustered index for [P]. It seems that using non-clustered index will be a little better.
Are there any other spaces to improve the lookup performance?
Note: [P] is unique in the table.
Thank you very much.
You need to batch the work and perform one query that returns many rows, instead of many queries each returning only one row (and incurring a separate round-trip to the database).
The way to do it in SQL Server is to rewrite the query to use a table-valued parameter (TVP), and pass all the search criteria (denoted as ? in your question) together in one go.
First we need to declare the type that the TVP will use:
CREATE TYPE MyTableSearch AS TABLE (
P bigint NOT NULL
);
And then the new query will be pretty simple:
SELECT
S,
L
FROM
#input I
JOIN MyTable
ON I.P = MyTable.P;
The main complication is on the client side, in how to bind the TVP to the query. Unfortunately, I'm not familiar with ADO - for what its worth, this is how it would be done under ADO.NET and C#:
static IEnumerable<(long S, long L)> Find(
SqlConnection conn,
SqlTransaction tran,
IEnumerable<long> input
) {
const string sql = #"
SELECT
S,
L
FROM
#input I
JOIN MyTable
ON I.P = MyTable.P
";
using (var cmd = new SqlCommand(sql, conn, tran)) {
var record = new SqlDataRecord(new SqlMetaData("P", SqlDbType.BigInt));
var param = new SqlParameter("input", SqlDbType.Structured) {
Direction = ParameterDirection.Input,
TypeName = "MyTableSearch",
Value = input.Select(
p => {
record.SetValue(0, p);
return record;
}
)
};
cmd.Parameters.Add(param);
using (var reader = cmd.ExecuteReader())
while (reader.Read())
yield return (reader.GetInt64(0), reader.GetInt64(1));
}
}
Note that we reuse the same SqlDataRecord for all input rows, which minimizes allocations. This is documented behavior, and it works because ADO.NET streams TVPs.
Note: [P] is unique in the table.
Then you should make the index on P unique too - for correctness and to avoid wasting space on the uniquifier.

mysql complex select query from multiple tables

Table Visits;
fields[id,patient_id(fk),doctor_id(fk),flag(Xfk),type(Xfk),time_booked,date,...]
Xfk = it refer to other table, but its not a must to exist so i dont use constrain.
SELECT `v`.`date`, `v`.`time_booked`, `v`.`stats`, `p`.`name` as pt_name,
`d`.`name` as dr_name, `f`.`name` as flag_name, `f`.`color` as flag_color,
`vt`.`name` as type, `vt`.`color` as type_color
FROM (`visits` v, `users` p, `users` d, `flags` f, `visit_types` vt)
WHERE `p`.`id`=`v`.`patient_id`
AND `d`.`id`=`v`.`doctor_id`
AND `v`.`flag`=`f`.`id`
AND `v`.`type`=`vt`.`id`
AND `v`.`date` >= '2013-02-27'
AND (v.date <= DATE_ADD('2013-02-27', INTERVAL 7 DAY))
AND (`v`.`doctor_id`='00002' OR `v`.`doctor_id`='00001')
ORDER BY `v`.`date` ASC, `v`.`time_booked` ASC;
One big statmeant i have !
my question is,
1: should i consider using join instead of select multiple tables ?
and if i should why ?
this query execution time is 0.0009 so i think its fine, and since i get all my data in one query, or is it bad practice ?
2: in the select part i want to say
if v.type != 0 select f.name,f.color else i dont want to select them nither there tables flags f
is it possible ?
also currently if flag was not found, it replicate all rows as much as flag table have in rows ! is there a way i can prevent this ? both for
flag and visit_types table ?
If it's running fast, I wouldn't mess with it. I generally prefer to use joins instead of matching stuff in the where clause.
Any chance you'd remove the ` characters? Just makes it a bit harder to read in my opinion.
Look at the case statement for MySQL: http://dev.mysql.com/doc/refman/5.0/en/case.html
select case when v.type <> 0 then
f.name
else
''
end as name, ...

Query with integers not working

I've been searching here on stackoverflow and other sources but not found a solution to this
The query below works as expected expect for when either custinfo.cust_cntct_id or custinfo.cust_corrcntct_id = '' (blank not NULL) then I get no results. Both are integer fields and if both have an integer value then I get results. I still want a value returned for either cntct_email or corrcntct_email even if custinfo.cust_cntct_id or custinfo.cust_corrcntct_id = blank
Can someone help me out in making this work? The database is PostgreSQL.
SELECT
cntct.cntct_email AS cntct_email,
corrcntct.cntct_email AS corrcntct_email
FROM
public.custinfo,
public.invchead,
public.cntct,
public.cntct corrcntct
WHERE
invchead.invchead_cust_id = custinfo.cust_id AND
cntct.cntct_id = custinfo.cust_cntct_id AND
corrcntct.cntct_id = custinfo.cust_corrcntct_id;
PostgreSQL won't actually let you test an integer field for a blank value (unless you're using a truly ancient version - 8.2 or older), so you must be using a query generator that's "helpfully" transforming '' to NULL or a tool that's ignoring errors.
Observe this, on Pg 9.2:
regress=> CREATE TABLE test ( a integer );
CREATE TABLE
regress=> insert into test (a) values (1),(2),(3);
INSERT 0 3
regress=> SELECT a FROM test WHERE a = '';
ERROR: invalid input syntax for integer: ""
LINE 1: SELECT a FROM test WHERE a = '';
If you are attempting to test for = NULL, this is not correct. You must use IS NOT NULL or IS DISTINCT FROM NULL instead. Testing for = NULL always results in NULL, which is treated as false in a WHERE clause.
Example:
regress=> insert into test (a) values (null);
INSERT 0 1
regress=> SELECT a FROM test WHERE a = NULL;
a
---
(0 rows)
regress=> SELECT a FROM test WHERE a IS NULL;
a
---
(1 row)
regress=> SELECT NULL = NULL as wrong, NULL IS NULL AS right;
wrong | right
-------+-------
| t
(1 row)
By the way, you should really be using ANSI JOIN syntax. It's more readable and it's much easier to forget to put a condition in and get a cartesian product by accident. I'd rewrite your query for identical functionality and performance but better readability as:
SELECT
cntct.cntct_email AS cntct_email,
corrcntct.cntct_email AS corrcntct_email
FROM
public.custinfo ci
INNER JOIN public.invchead
ON (invchead.invchead_cust_id = ci.cust_id)
INNER JOIN public.cntct
ON (cntct.cntct_id = ci.cust_cntct_id)
INNER JOIN public.cntct corrcntct
ON (corrcntct.cntct_id = ci.cust_corrcntct_id);
Use of table aliases usually keeps it cleaner; here I've aliased the longer name custinfo to ci for brevity.

LINQ to SQL Take w/o Skip Causes Multiple SQL Statements

I have a LINQ to SQL query:
from at in Context.Transaction
select new {
at.Amount,
at.PostingDate,
Details =
from tb in at.TransactionDetail
select new {
Amount = tb.Amount,
Description = tb.Desc
}
}
This results in one SQL statement being executed. All is good.
However, if I attempt to return known types from this query, even if they have the same structure as the anonymous types, I get one SQL statement executed for the top level and then an additional SQL statement for each "child" set.
Is there any way to get LINQ to SQL to issue one SQL statement and use known types?
EDIT: I must have another issue. When I plugged a very simplistic (but still hieararchical) version of my query into LINQPad and used freshly created known types with just 2 or 3 members, I did get one SQL statement. I will post and update when I know more.
EDIT 2: This appears to be due to a bug in Take. See my answer below for details.
First - some reasoning for the Take bug.
If you just Take, the query translator just uses top. Top10 will not give the right answer if cardinality is broken by joining in a child collection. So the query translator doesn't join in the child collection (instead it requeries for the children).
If you Skip and Take, then the query translator kicks in with some RowNumber logic over the parent rows... these rownumbers let it take 10 parents, even if that's really 50 records due to each parent having 5 children.
If you Skip(0) and Take, Skip is removed as a non-operation by the translator - it's just like you never said Skip.
This is going to be a hard conceptual leap to from where you are (calling Skip and Take) to a "simple workaround". What we need to do - is force the translation to occur at a point where the translator can't remove Skip(0) as a non-operation. We need to call Skip, and supply the skipped number at a later point.
DataClasses1DataContext myDC = new DataClasses1DataContext();
//setting up log so we can see what's going on
myDC.Log = Console.Out;
//hierarchical query - not important
var query = myDC.Options.Select(option => new{
ID = option.ParentID,
Others = myDC.Options.Select(option2 => new{
ID = option2.ParentID
})
});
//request translation of the query! Important!
var compQuery = System.Data.Linq.CompiledQuery
.Compile<DataClasses1DataContext, int, int, System.Collections.IEnumerable>
( (dc, skip, take) => query.Skip(skip).Take(take) );
//now run the query and specify that 0 rows are to be skipped.
compQuery.Invoke(myDC, 0, 10);
This produces the following query:
SELECT [t1].[ParentID], [t2].[ParentID] AS [ParentID2], (
SELECT COUNT(*)
FROM [dbo].[Option] AS [t3]
) AS [value]
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY [t0].[ID]) AS [ROW_NUMBER], [t0].[ParentID]
FROM [dbo].[Option] AS [t0]
) AS [t1]
LEFT OUTER JOIN [dbo].[Option] AS [t2] ON 1=1
WHERE [t1].[ROW_NUMBER] BETWEEN #p0 + 1 AND #p1 + #p2
ORDER BY [t1].[ROW_NUMBER], [t2].[ID]
-- #p0: Input Int (Size = 0; Prec = 0; Scale = 0) [0]
-- #p1: Input Int (Size = 0; Prec = 0; Scale = 0) [0]
-- #p2: Input Int (Size = 0; Prec = 0; Scale = 0) [10]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1
And here's where we win!
WHERE [t1].[ROW_NUMBER] BETWEEN #p0 + 1 AND #p1 + #p2
I've now determined this is the result of a horrible bug. The anonymous versus known type turned out not to be the cause. The real cause is Take.
The following result in 1 SQL statement:
query.Skip(1).Take(10).ToList();
query.ToList();
However, the following exhibit the one sql statement per parent row problem.
query.Skip(0).Take(10).ToList();
query.Take(10).ToList();
Can anyone think of any simple workarounds for this?
EDIT: The only workaround I've come up with is to check to see if I'm on the first page (IE Skip(0)) and then make two calls, one with Take(1) and the other with Skip(1).Take(pageSize - 1) and addRange the lists together.
I've not had a chance to try this but given that the anonymous type isn't part of LINQ rather a C# construct I wonder if you could use:
from at in Context.Transaction
select new KnownType(
at.Amount,
at.PostingDate,
Details =
from tb in at.TransactionDetail
select KnownSubType(
Amount = tb.Amount,
Description = tb.Desc
)
}
Obviously Details would need to be an IEnumerable collection.
I could be miles wide on this but it might at least give you a new line of thought to pursue which can't hurt so please excuse my rambling.

Resources