Microsoft SQL Server: wrong query execution plan taking too long - sql-server

This is on Windows SQL Server Cluster.
Query is coming from 3rd party application so I can not modify the query permanently.
Query is:
DECLARE #FromBrCode INT = 1001
DECLARE #ToBrCode INT = 1637
DECLARE #Cdate DATE = '31-mar-2017'
SELECT
a.PrdCd, a.Name, SUM(b.Balance4) as Balance
FROM
D009021 a, D010014 b
WHERE
a.PrdCd = LTRIM(RTRIM(SUBSTRING(b.PrdAcctId, 1, 8)))
AND substring(b.PrdAcctId, 9, 24) = '000000000000000000000000'
AND a.LBrCode = b.LBrCode
AND a.LBrCode BETWEEN #FromBrCode AND #ToBrCode
AND b.CblDate = (SELECT MAX(c.CblDate)
FROM D010014 c
WHERE c.PrdAcctId = b.PrdAcctId
AND c.LBrCode = b.LBrCode
AND c.CblDate <= #Cdate)
GROUP BY
a.PrdCd, a.Name
HAVING
SUM(b.Balance4) <> 0
ORDER BY
a.PrdCd
This particular query is taking too much time to complete execution. The same problem happens on a different SQL Server.
No table lock was found, processor and memory usage is normal while the query is running.
Normal "select top 1000" working and showing output instantly in both tables (D009021, D010014)
Reindex and rebuild / update stats done in both tables but problem did not resolve (D009021, D010014)
The same query is working if we reduce number of branch but slowly
(
DECLARE #FromBrCode INT =1001
DECLARE #ToBrCode INT =1001
)
The same query is working faster giving output within 2 mins if we replace any one variable and use the value directly
AND a.LBrCode BETWEEN #FromBrCode AND #ToBrCode
changed to
AND a.LBrCode BETWEEN 1001 AND #ToBrCode
The same query is working faster and giving output within 2 mins if we add "OPTION (RECOMPILE)" at end
I tried to clean cache query execution plan and optimized new one but problem still exists
Found that the query estimate plan and actual execution plan are different (see screenshots)

Table D010014 is aliased twice once as b and once as c
the they are joined to the same table.
Try toto remove the sub query below and create a temp table to store
the values you need. I added * to the fields you self join
SELECT MAX(c.CblDate)
FROM D010014 c
WHERE c.PrdAcctId = b.PrdAcctId
AND c.LBrCode = b.LBrCode
AND c.CblDate <= #Cdate
if you cant do that then try
SELECT TOP 1 c.CblDate
FROM D010014 c
WHERE c.PrdAcctId = b.PrdAcctId
AND c.LBrCode = b.LBrCode
AND c.CblDate <= #Cdate
ORDER BY c.CblDate DESC

Related

Peewee select query with multiple joins and multiple counts

I've been attempting to write a peewee select query which results in a table with 2 counts (one for the number of prizes associated with the lottery, and the for the number of packages associated with the lottery), as well as the fields in the Lottery model.
I've managed to write select queries with 1 count working (seen below), and then I've had to convert the ModelSelects to lists and join them manually (which I think is very hacky).
I did manage to write a select query where the results were joined, but it would multiply the packages count with the prizes count (I've since lost that query).
I also tried using a .switch(Lottery) but I didn't have any luck with this.
query1 = (Lottery.select(Lottery,fn.count(Package.id).alias('packages'))
.join(LotteryPackage)
.join(Package)
.order_by(Lottery.id)
.group_by(Lottery)
.dicts())
query2 = (Lottery.select(Lottery.id.alias('lotteryID'), fn.count(Prize.id).alias('prizes'))
.join(LotteryPrize)
.join(Prize)
.group_by(Lottery)
.order_by(Lottery.id)
.dicts())
lottery = list(query1)
query3 = list(query2)
for x in range(len(lottery)):
lottery[x]['prizes'] = query3[x]['prizes']
While the above code works, is there a cleaner way to write this query?
Your best bet is to do this with subqueries.
# Create query which gets lottery id and count of packages.
L1 = Lottery.alias()
subq1 = (L1
.select(L1.id, fn.COUNT(LotteryPackage.package).alias('packages'))
.join(LotteryPackage, JOIN.LEFT_OUTER)
.group_by(L1.id))
# Create query which gets lottery id and count of prizes.
L2 = Lottery.alias()
subq2 = (L2
.select(L2.id, fn.COUNT(LotteryPrize.prize).alias('prizes'))
.join(LotteryPrize, JOIN.LEFT_OUTER)
.group_by(L2.id))
# Select from lottery, joining on each subquery and returning
# the counts.
query = (Lottery
.select(Lottery, subq1.c.packages, subq2.c.prizes)
.join(subq1, on=(Lottery.id == subq1.c.id))
.join(subq2, on=(Lottery.id == subq2.c.id))
.order_by(Lottery.name))
for row in query.objects():
print(row.name, row.packages, row.prizes)

Parameterized query creates an unfavorable execution plan. Optimize for parameters that are not NULL

As a forewarning, I'm working with a SQL query generated from entity framework, entity framework is irrelevant to the question though.
Some context:
I am trying to pull specific records from a batch of 4,000 C# objects and perform an update or insert on them. I do not have the primary key of the records, for the objects come from an API, so I have to use a unique set of columns to pull the correct record.
The (simplified) queries and their execution plans:
Parameterized query (parameters are declared with a value set to 1 for demonstration purposes)
SELECT [x].[Id]
FROM [Gradebook].[AssignmentScore] AS [x]
WHERE
( ( ((([x].[CourseSectionAssignment_Id] = #__CSA_1) AND ([x].[Student_Id] = #__S_ID_1)) AND ([x].[AssessmentGBID] = #__A_GBID_1))
OR ((([x].[CourseSectionAssignment_Id] = #__CSA_2) AND ([x].[Student_Id] = #__S_ID_2)) AND ([x].[AssessmentGBID] = #__A_GBID_2)) )
OR ((([x].[CourseSectionAssignment_Id] = #__CSA_3) AND ([x].[Student_Id] = #__S_ID_3)) AND ([x].[AssessmentGBID] = #__A_GBID_3)) )
And it's (unfavorable) execution plan:
Literal values query:
SELECT [x].[Id]
FROM [Gradebook].[AssignmentScore] AS [x]
WHERE
( ( ((([x].[CourseSectionAssignment_Id] = 1) AND ([x].[Student_Id] = 2)) AND ([x].[AssessmentGBID] = 3))
OR ((([x].[CourseSectionAssignment_Id] = 4) AND ([x].[Student_Id] = 5)) AND ([x].[AssessmentGBID] = 6)) )
OR ((([x].[CourseSectionAssignment_Id] = 7) AND ([x].[Student_Id] = 8)) AND ([x].[AssessmentGBID] = 9)) )
And it's (favorable) execution plan:
I know why, or at least I believe I know why, the execution plans are different. That is because the optimizer does not know whether the parameters will be NULL and must optimize around that case. Testing the literal query with NULLs creates the unfavorable execution plan. (Why wouldn't parameter sniffing see that the values are not null and create the better execution plan?)
Currently in the C# code, I am manually using expression trees to replace the object's properties with expression constants so that the generated query is the literal query. As far as I'm aware, literals force SQL server to generate a new execution plan each time, which isn't great.
I would like the parameterized query to generate the favorable execution plan. So far, the only answer I've found is using the OPTION(RECOMPILE) hint, which isn't exactly what I want, since it forces the execution plan to be recreated.
How can I use the same, favorable execution plan each time with the parameterized query?
If you can't get around a bad index, in my experience, this will consistently produce a good execution plan.
SELECT [x].[Id]
FROM [Gradebook].[AssignmentScore] AS [x]
WHERE [x].[CourseSectionAssignment_Id] = 1
AND [x].[Student_Id] = 2
AND [x].[AssessmentGBID] = 3
UNION
SELECT [x].[Id]
FROM [Gradebook].[AssignmentScore] AS [x]
WHERE [x].[CourseSectionAssignment_Id] = 4
AND [x].[Student_Id] = 5
AND [x].[AssessmentGBID] = 6
UNION
SELECT [x].[Id]
FROM [Gradebook].[AssignmentScore] AS [x]
WHERE [x].[CourseSectionAssignment_Id] = 7
AND [x].[Student_Id] = 8
AND [x].[AssessmentGBID] = 9
Alternatively, you could create a temp table with the "valid" combinations and simply join to it on these three fields.

Never ending query in SQL Server 2012

SQL Server 2012, Amazon RDS
This is my simple query
update [dbo].[DeliveryPlan]
set [Amount] = dp.Amount +
case when #useAmountColumn = 1 and dbo.ConvertToInt(bs.Amount) > 0
then dbo.ConvertToInt(bs.Amount)
else #amount
end
from
BaseSpecification bs
join
BaseSpecificationStatusType t on (StatusTypeID = t.StatusTypeID)
join
[DeliveryPlan] dp on (dp.BaseSpecificationID = bs.BaseSpecificationID and dp.ItemID = #itemID)
where
bs.BaseID = 130 and t.IsActive = 1
It can't be finished. If where condition bs.BaseID=130 (update 7000 rows) change for bs.BaseID=3 (update 1000000 rows) it lasts 13 sec.
Statistics are actual, I think
In performance monitor I see 5% processor usage
When I use sp to watch active connections and for this query
tempdb_allocations is 32, tembdb_current - 32, reads - 32 000 000, cpu - 860 000 (query lasts 20 minutes)
What is the problem?
UPDATE: I added non-clustered index for [DeliveryPlan] - by BaseSpecificationID + ItemID and problem is gone. Unfortunately I see this problem every day with different queries. And problem disappears unpredicatedly.
This will perform better and in a different way as the join conditions will narrow down the number of rows in the first go itself, rather than waiting for the where clause to execute. The execution plan will be different for both (with where/ without where).
UPDATE dp
SET Amount = dp.Amount + CASE
WHEN #useAmountColumn = 1
AND dbo.ConvertToInt( bs.Amount ) > 0 THEN dbo.ConvertToInt( bs.Amount )
ELSE #amount
END
FROM BaseSpecification bs
JOIN BaseSpecificationStatusType t ON
( bs.StatusTypeID = t.StatusTypeID
AND bs.BaseID = 130
AND t.IsActive = 1
)
JOIN DeliveryPlan dp ON
( dp.BaseSpecificationID = bs.BaseSpecificationID
AND dp.ItemID = #itemID
);
You may suffer from a locking condition for your base tables.
Optimize your query to update dp directly to avoid update all rows of DeliveryPlan
update dp set [Amount] = dp.Amount +
case
when #useAmountColumn=1 and dbo.ConvertToInt(bs.Amount)>0 then
dbo.ConvertToInt(bs.Amount)
else #amount
end
from
BaseSpecification bs
join BaseSpecificationStatusType t on (bs.StatusTypeID = t.StatusTypeID)
join [DeliveryPlan] dp on (dp.BaseSpecificationID = bs.BaseSpecificationID)
where
bs.BaseID = 130
and t.IsActive = 1
and dp.ItemID = #itemID
If the problem mentioned in the update part is that it comes and goes randomly, it sounds like bad parameter sniffing. When the problem happens you could look into plan cache to check if the query plan looks ok and in case it doesn't, what are the values the plan was created with (you can find them in the leftmost object in the plan) and for example use sp_recompile and see what kind of plan you'll get the next time.

How can I find what locks are involved in a process

I am running a SQL transaction with a bunch of statements in it.
The transaction was causing other processes to deadlock very occasionally, so I removed some of the things from the transaction that weren't really important. These are now done separately before the transaction.
I want to be able to compare the locking that occurs between the SQL before and after my change so that I can be confident the change will make a difference.
I expect more locking occurred before because more things were in the transaction.
Are there any tools that I can use? I can pretty easily get a SQL profile of both cases.
I am aware of things like sp_who, sp_who2, but the thing I struggle with for those things is that this is a snapshot in a particular moment in time. I would like the full picture from start to finish.
You can use SQL Server Profiler. Set up a profiler trace that includes the Lock:Acquired and Lock:Released events. Run your "before" query. Run your "after" query. Compare and contrast the locks taken (and types of locks). For context, you probably still want to also include some of the statement or batch events also, to see which statements are causing each lock to be taken.
you can use in built procedure:-- sp_who2
sp_who2 also takes a optional parameter of a SPID. If a spid is passed, then the results of sp_who2 only show the row or rows of the executing SPID.
for more detail info you can check: master.dbo.sysprocesses table
SELECT * FROM master.dbo.sysprocesses where spid=#1
below code shows reads and writes for the current command, along with the number of reads and writes for the entire SPID. It also shows the protocol being used (TCP, NamedPipes, or Shared Memory).
CREATE PROCEDURE sp_who3
(
#SessionID int = NULL
)
AS
BEGIN
SELECT
SPID = er.session_id
,Status = ses.status
,[Login] = ses.login_name
,Host = ses.host_name
,BlkBy = er.blocking_session_id
,DBName = DB_Name(er.database_id)
,CommandType = er.command
,SQLStatement =
SUBSTRING
(
qt.text,
er.statement_start_offset/2,
(CASE WHEN er.statement_end_offset = -1
THEN LEN(CONVERT(nvarchar(MAX), qt.text)) * 2
ELSE er.statement_end_offset
END - er.statement_start_offset)/2
)
,ObjectName = OBJECT_SCHEMA_NAME(qt.objectid,dbid) + '.' + OBJECT_NAME(qt.objectid, qt.dbid)
,ElapsedMS = er.total_elapsed_time
,CPUTime = er.cpu_time
,IOReads = er.logical_reads + er.reads
,IOWrites = er.writes
,LastWaitType = er.last_wait_type
,StartTime = er.start_time
,Protocol = con.net_transport
,transaction_isolation =
CASE ses.transaction_isolation_level
WHEN 0 THEN 'Unspecified'
WHEN 1 THEN 'Read Uncommitted'
WHEN 2 THEN 'Read Committed'
WHEN 3 THEN 'Repeatable'
WHEN 4 THEN 'Serializable'
WHEN 5 THEN 'Snapshot'
END
,ConnectionWrites = con.num_writes
,ConnectionReads = con.num_reads
,ClientAddress = con.client_net_address
,Authentication = con.auth_scheme
FROM sys.dm_exec_requests er
LEFT JOIN sys.dm_exec_sessions ses
ON ses.session_id = er.session_id
LEFT JOIN sys.dm_exec_connections con
ON con.session_id = ses.session_id
OUTER APPLY sys.dm_exec_sql_text(er.sql_handle) as qt
WHERE #SessionID IS NULL OR er.session_id = #SessionID
AND er.session_id > 50
ORDER BY
er.blocking_session_id DESC
,er.session_id
END

LINQ to SQL Take w/o Skip Causes Multiple SQL Statements

I have a LINQ to SQL query:
from at in Context.Transaction
select new {
at.Amount,
at.PostingDate,
Details =
from tb in at.TransactionDetail
select new {
Amount = tb.Amount,
Description = tb.Desc
}
}
This results in one SQL statement being executed. All is good.
However, if I attempt to return known types from this query, even if they have the same structure as the anonymous types, I get one SQL statement executed for the top level and then an additional SQL statement for each "child" set.
Is there any way to get LINQ to SQL to issue one SQL statement and use known types?
EDIT: I must have another issue. When I plugged a very simplistic (but still hieararchical) version of my query into LINQPad and used freshly created known types with just 2 or 3 members, I did get one SQL statement. I will post and update when I know more.
EDIT 2: This appears to be due to a bug in Take. See my answer below for details.
First - some reasoning for the Take bug.
If you just Take, the query translator just uses top. Top10 will not give the right answer if cardinality is broken by joining in a child collection. So the query translator doesn't join in the child collection (instead it requeries for the children).
If you Skip and Take, then the query translator kicks in with some RowNumber logic over the parent rows... these rownumbers let it take 10 parents, even if that's really 50 records due to each parent having 5 children.
If you Skip(0) and Take, Skip is removed as a non-operation by the translator - it's just like you never said Skip.
This is going to be a hard conceptual leap to from where you are (calling Skip and Take) to a "simple workaround". What we need to do - is force the translation to occur at a point where the translator can't remove Skip(0) as a non-operation. We need to call Skip, and supply the skipped number at a later point.
DataClasses1DataContext myDC = new DataClasses1DataContext();
//setting up log so we can see what's going on
myDC.Log = Console.Out;
//hierarchical query - not important
var query = myDC.Options.Select(option => new{
ID = option.ParentID,
Others = myDC.Options.Select(option2 => new{
ID = option2.ParentID
})
});
//request translation of the query! Important!
var compQuery = System.Data.Linq.CompiledQuery
.Compile<DataClasses1DataContext, int, int, System.Collections.IEnumerable>
( (dc, skip, take) => query.Skip(skip).Take(take) );
//now run the query and specify that 0 rows are to be skipped.
compQuery.Invoke(myDC, 0, 10);
This produces the following query:
SELECT [t1].[ParentID], [t2].[ParentID] AS [ParentID2], (
SELECT COUNT(*)
FROM [dbo].[Option] AS [t3]
) AS [value]
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY [t0].[ID]) AS [ROW_NUMBER], [t0].[ParentID]
FROM [dbo].[Option] AS [t0]
) AS [t1]
LEFT OUTER JOIN [dbo].[Option] AS [t2] ON 1=1
WHERE [t1].[ROW_NUMBER] BETWEEN #p0 + 1 AND #p1 + #p2
ORDER BY [t1].[ROW_NUMBER], [t2].[ID]
-- #p0: Input Int (Size = 0; Prec = 0; Scale = 0) [0]
-- #p1: Input Int (Size = 0; Prec = 0; Scale = 0) [0]
-- #p2: Input Int (Size = 0; Prec = 0; Scale = 0) [10]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1
And here's where we win!
WHERE [t1].[ROW_NUMBER] BETWEEN #p0 + 1 AND #p1 + #p2
I've now determined this is the result of a horrible bug. The anonymous versus known type turned out not to be the cause. The real cause is Take.
The following result in 1 SQL statement:
query.Skip(1).Take(10).ToList();
query.ToList();
However, the following exhibit the one sql statement per parent row problem.
query.Skip(0).Take(10).ToList();
query.Take(10).ToList();
Can anyone think of any simple workarounds for this?
EDIT: The only workaround I've come up with is to check to see if I'm on the first page (IE Skip(0)) and then make two calls, one with Take(1) and the other with Skip(1).Take(pageSize - 1) and addRange the lists together.
I've not had a chance to try this but given that the anonymous type isn't part of LINQ rather a C# construct I wonder if you could use:
from at in Context.Transaction
select new KnownType(
at.Amount,
at.PostingDate,
Details =
from tb in at.TransactionDetail
select KnownSubType(
Amount = tb.Amount,
Description = tb.Desc
)
}
Obviously Details would need to be an IEnumerable collection.
I could be miles wide on this but it might at least give you a new line of thought to pursue which can't hurt so please excuse my rambling.

Resources