select * from FOO.MBR_DETAILS where BAR= 'BAZ' and MBR_No = '123'
execution time = 0.25 seconds
CREATE PROCEDURE My.MEMBER_SEARCH
(
i_BAR varchar(3),
i_member_surname varchar(50),
i_member_code varchar(10),
i_member_given_name varchar(50)
)
RESULT SETS 1
LANGUAGE SQL
BEGIN
DECLARE c1 cursor with return for
select *
FROM FOO.MBR_DETAILS m
WHERE
BAR= i_BAR
and (i_member_code = '' or m.MBR_No = i_member_code)
and (i_member_surname = '' or m.surname = i_member_surname)
and (i_member_given_name = '' or m.given_names LIKE '%'||i_member_given_name||'%');
OPEN c1;
END
call My.MEMBER_SEARCH('BAZ','','123','')
execution time = 1.9 seconds
I thought both queries should take a similar time: since i_member_surname and i_member_given_name are both empty, they would not be evaluated.
The solution is to enable REOPT ALWAYS for any stored procedure that runs a flexible, parameter-driven search.
The REOPT ALWAYS option will force the optimizer to analyze the input parameter values and come up with a new access plan every time the procedure is executed, instead of just once when the procedure is compiled. Although REOPT ALWAYS adds a few extra milliseconds of optimizer overhead for each and every execution of the stored procedure, that is most likely faster than continually reusing the one-size-fits-all access plan that the optimizer guessed at while initially compiling the stored procedure.
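As a rough sketch of how the option might be applied, assuming DB2 for Linux, UNIX and Windows and the procedure name from the question, there are two common routes: set the routine bind options before re-creating the procedure, or rebind the package of the existing procedure. Check the exact option strings against the documentation for your DB2 version.

```sql
-- Option 1 (assumption: DB2 LUW): set the bind options used for SQL
-- routines created later in this session, then re-create the procedure.
CALL SYSPROC.SET_ROUTINE_OPTS('REOPT ALWAYS');

-- ... re-run the CREATE PROCEDURE My.MEMBER_SEARCH statement here ...

-- Option 2: rebind the package of the already-created procedure.
-- 'P' means the routine name identifies a procedure.
CALL SYSPROC.REBIND_ROUTINE_PACKAGE('P', 'MY.MEMBER_SEARCH', 'REOPT ALWAYS');
```

Either way, the next call such as My.MEMBER_SEARCH('BAZ','','123','') should be optimized with the actual parameter values, so the empty-string branches can be recognized as always true.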
I am a neophyte to creating stored procedures and functions and I just can't figure out why one of these versions runs so much faster than the other. This is a function that just returns a string with a description when called. The original function relies on supplying about 10 variables (Version running in about 4 seconds). I wanted to cut that down to a single variable (version running long).
The code below the declaration of the variables is identical; the only difference is that I'm attempting to pull the variables from the appropriate tables within the function itself rather than having to supply them on the query side.
i.e. dbo.cf_NoRateReason(V1) as ReasonCode
rather than
dbo.cf_NoRateReason(V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,V11,V12)
I apologize up front if I am not supplying enough information, as I said, new to functions/stored procedures.
This version takes about 2.5 minutes to run
declare @Agencyid int
declare @ServiceCode varchar(10)
declare @Mod1 varchar(2)=null
declare @Mod2 varchar(2)=null
declare @Mod3 varchar(2)=null
declare @Mod4 varchar(2)=null
declare @POS int
declare @ServiceDate datetime
declare @ProvType varchar(1)
declare @PayerID int
declare @BirthDate datetime
declare @RenderingStaffID int
declare @SupervisingStaffID int
Select @Agencyid=s.agencyid, @ServiceCode = ServiceCode,
@Mod1 = ModifierCodeId, @Mod2 = ModifierCodeId2,
@Mod3 = ModifierCodeId3, @Mod4 = ModifierCodeId4,
@POS=PlaceOfServiceId, @ServiceDate = ServiceDate,
@RenderingStaffId=isnull(dbo.GetProviderStaffId('S',s.ServiceTransactionId,'82'),0),
@SupervisingStaffId=isnull(dbo.GetProviderStaffId('C',ClaimId,'DQ'),0),
@ProvType=s.servicetype, @Payerid=pmt.payerid,
@BirthDate=i.birthdate
From ServiceTransaction s
join individual i on s.servicetransactionid = i.individualid
join pmtadjdetail pmt on s.servicetransactionid = pmt.servicetransactionid
declare @Result Varchar(100) = ''
declare @Age int = dbo.getageatservicedate(@birthdate, @ServiceDate)
declare @ModString varchar(8) = dbo.sortmodifiers(@Mod1, @Mod2, @Mod3, @Mod4)
declare @DirectSupervision int = (iif(@Mod1 in ('U1','U6','U7','U9','UA')
or @Mod2 in ('U1','U6','U7','U9','UA')
or @Mod3 in ('U1','U6','U7','U9','UA')
or @Mod4 in ('U1','U6','U7','U9','UA'),1,0))
'************************************************************************************'
'This version takes about 4 seconds to run'
'************************************************************************************'
begin
declare @Result Varchar(100) = ''
declare @Age int = dbo.getageatservicedate(@birthdate, @ServiceDate)
declare @RenderingStaffID int = dbo.getstaffid(@STID,'DQ')
declare @SupervisingStaffID int = dbo.getstaffid(@STID,'82')
declare @ModString varchar(8) = dbo.sortmodifiers(@Mod1, @Mod2, @Mod3, @Mod4)
declare @DirectSupervision int = (iif(@Mod1 in ('U1','U6','U7','U9','UA')
or @Mod2 in ('U1','U6','U7','U9','UA')
or @Mod3 in ('U1','U6','U7','U9','UA')
or @Mod4 in ('U1','U6','U7','U9','UA'),1,0))
This kind of falls under "typo" or simple oversight, but....
When you see that big a performance difference for no discernible reason (those functions were used in the original version as well), that is usually when you need to start looking for these kinds of mistakes: typos, missing conditions, incorrect conditions from leaning too hard on intellisense/code-completion, etc.
When replacing multiple parameters with one that can be used to retrieve the others automatically, always make sure to actually use that parameter.
The version you listed first has no filter (no WHERE clause) on the SELECT it uses to get the "parameter" values it is normally passed. You're effectively getting the entire join result set, with the cost of the function calls for every result row, and only taking the last result's values.
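A minimal T-SQL sketch of the difference, using a hypothetical table variable in place of the real ServiceTransaction join and assuming the single parameter is called @STID (as in the fast version):

```sql
-- Hypothetical stand-in for the ServiceTransaction join in the question.
DECLARE @Service TABLE (ServiceTransactionId int PRIMARY KEY, ServiceCode varchar(10));
INSERT @Service (ServiceTransactionId, ServiceCode)
VALUES (1, 'A100'), (2, 'B200'), (3, 'C300');

DECLARE @STID int = 2;             -- the one parameter passed to the function
DECLARE @ServiceCode varchar(10);

-- Unfiltered: every row is read (and any function in the SELECT list runs
-- once per row); the variable keeps the value of whichever row comes last.
SELECT @ServiceCode = ServiceCode FROM @Service;

-- Filtered: only the row identified by the parameter is read.
SELECT @ServiceCode = ServiceCode
FROM @Service
WHERE ServiceTransactionId = @STID;

SELECT @ServiceCode AS ServiceCode;  -- B200
```

In the slow function the equivalent fix would be a WHERE clause on the key column matching the parameter; which column that is depends on the real schema.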
You are correct - the only difference is using the function. Please see similar questions where this has been addressed.
In short, functions are going to be performed on a row-by-row basis whereas code on the query side is going to have other options with no overhead calls to the function.
You may be able to use a scalar function with SCHEMABINDING and RETURNS NULL ON NULL INPUT for better performance.
Additional consideration for the schema plan would be valuable. There are also joins and other embedded logic here that aren't clear without sample data.
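For illustration only, a scalar function declared with both options could look like the sketch below. The function name is hypothetical and the body simply reuses the modifier check from the question; whether this helps in practice depends on the real function and the execution plan.

```sql
-- Hypothetical helper: returns 1 when the modifier implies direct supervision.
-- SCHEMABINDING lets SQL Server verify that the function accesses no data;
-- RETURNS NULL ON NULL INPUT skips the body when the argument is NULL.
CREATE FUNCTION dbo.cf_IsDirectSupervisionModifier (@Mod varchar(2))
RETURNS int
WITH SCHEMABINDING, RETURNS NULL ON NULL INPUT
AS
BEGIN
    RETURN IIF(@Mod IN ('U1','U6','U7','U9','UA'), 1, 0);
END;
```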
I am hitting application slowness during a load test for an NHibernate LINQ query in a .NET application when using LINQ ANY().
Column FileContent is VARCHAR(max).
bool hasIllustrations = CensusIllustration.Linq()
.Any(c => c.Participant.Census.Id == census.Id && c.FileContent != null);
The above query started taking around 1 min from code, while it executes in 1 sec in SSMS. I captured the generated SQL, which is shown below:
DECLARE @p0 AS SQL_VARIANT;
SET @p0 = NULL;
select top 1 censusillu0_.CensusIllustration_Id as CensusIl1_5_,
censusillu0_.FileName as FileName5_, censusillu0_.ParticipantId as Particip3_5_,
censusillu0_.FileContent as FileCont4_5_,
censusillu0_.CensusParticipant_Id as CensusPa5_5_ from CensusIllustration censusillu0_,
CensusParticipant censuspart1_ where censusillu0_.CensusParticipant_Id=censuspart1_.CensusParticipant_Id and censuspart1_.Census_id=@p0 and (censusillu0_.FileContent is not null)
If I replace the code as below, it also executes in 1 sec from code:
bool hasIllustrations2 = CensusIllustration.Linq().Where(c => c.Participant.Census.Id == census.Id && c.FileContent != null).Count() > 0;
Generated SQL for this is
--Type and value data was not available for the following variables. Their values have been set to defaults.
DECLARE @p0 AS SQL_VARIANT;
SET @p0 = NULL;
select cast(count(*) as INT) as col_0_0_ from CensusIllustration censusillu0_,
CensusParticipant censuspart1_ where
censusillu0_.CensusParticipant_Id=censuspart1_.CensusParticipant_Id and
censuspart1_.Census_id=@p0 and (censusillu0_.FileContent is not null)
I spent time studying the slowness of ANY() on big data columns, and every post suggests that ANY(), WHERE().COUNT()>0, and FirstOrDefault() won't make any difference.
Can someone help me understand why the 1st query takes around 1 min from code and the 2nd one 1 sec from code?
It is because a count is always faster than a select with a top.
Unfortunately, I have two tables to compare float datatypes between. I've read up on trying casts, converts, using a small difference and tried them all.
The strange part is, this only fails when I'm executing a stored procedure. If I cut-and-paste the body of the stored procedure into a SSMS window, it works just great.
Sample SQL:
set @newEnvRiskLevel = -1
select
@newEnvRiskLevel = rl.RiskLevelId
from
LookupTypes lt
inner join
RiskLevels rl on lt.LookupTypeId = rl.RiskLevelTypeFk
where
lt.Code = 'RISK_LEVEL_ENVIRONMENTAL'
and convert(numeric(1, 0), rl.RiskFactor) = @newEnvScore
set @errorCode = @@ERROR
if (@newEnvRiskLevel = -1 or @errorCode != 0)
begin
print 'newEnvScore = ' + cast(@newEnvScore as varchar) + ' and risk level = ' + cast(isnull(@newEnvRiskLevel, -1) as varchar)
print 'ERROR finding environmental risk level for code ' + @itemCode + ', skipping record'
set @recordsErrored = @recordsErrored + 1
goto NEXTREC
end
My @newEnvScore variable is also a float converted to numeric(1, 0). I've verified that the only values in the RiskFactor column are 0, 1, 2, and 3, and (via debug) that @newEnvScore has a value of 2. I've also verified that my query has a row with code = 'RISK_LEVEL_ENVIRONMENTAL' and RiskFactor = 2.
I've verified via debug that the failure is due to @newEnvRiskLevel staying at -1 and that @errorCode is 0.
I've also tried cast to both decimal and int, convert to int, and "rl.RiskFactor - @newEnvScore < 1" in my where clause, none of which set @newEnvRiskLevel.
As I say, it's only when running this as a stored procedure that failure happens, which is the part I really don't understand. I'd expect SQL Server to be deterministic, whether the SQL is running the body of a stored procedure, or running the exact same SQL in a SSMS tab.
It is unfortunate that you posted neither your stored procedure nor a complete script. It is difficult to diagnose a problem without a useful demonstration. But I see the use of "goto", which is concerning in many ways. I also see the use of a select statement to assign a local variable, which is often a problem because the developer might be assuming an assignment always occurs. To demonstrate, with a bonus at the end:
set nocount on;
declare @risk smallint;
declare @risklevels table (risklevel float primary key, code varchar(10));
insert @risklevels(risklevel, code) values (1, 'test'), (2, 'test'), (-5, 'test');
-- here is your assignment logic. Notice that @risk is
-- never changed because there are no matching rows.
set @risk = 0;
select @risk = risklevel from @risklevels where code = 'zork';
select @risk;
-- here is a better IMO way to make the assignment. Note that
-- @risk is set to NULL when there are no matching rows.
set @risk = -1;
set @risk = (select risklevel from @risklevels where code = 'zork');
select @risk;
-- and a last misconception. What value is @risk set to? and why?
set @risk = -1;
select @risk = risklevel from @risklevels where code = 'test';
select @risk;
Whether this is the source of your problem (or contributes to it) I can't say. But it is a possibility. And storing integers in a floating point datatype is just a problem generally. Even if you cannot change your table, you can change your local variables and force the use of a more appropriate datatype. So perhaps that is another change you should consider.
This is on Windows SQL Server Cluster.
Query is coming from 3rd party application so I can not modify the query permanently.
Query is:
DECLARE @FromBrCode INT = 1001
DECLARE @ToBrCode INT = 1637
DECLARE @Cdate DATE = '31-mar-2017'
SELECT
a.PrdCd, a.Name, SUM(b.Balance4) as Balance
FROM
D009021 a, D010014 b
WHERE
a.PrdCd = LTRIM(RTRIM(SUBSTRING(b.PrdAcctId, 1, 8)))
AND substring(b.PrdAcctId, 9, 24) = '000000000000000000000000'
AND a.LBrCode = b.LBrCode
AND a.LBrCode BETWEEN @FromBrCode AND @ToBrCode
AND b.CblDate = (SELECT MAX(c.CblDate)
FROM D010014 c
WHERE c.PrdAcctId = b.PrdAcctId
AND c.LBrCode = b.LBrCode
AND c.CblDate <= @Cdate)
GROUP BY
a.PrdCd, a.Name
HAVING
SUM(b.Balance4) <> 0
ORDER BY
a.PrdCd
This particular query takes too long to complete. The same problem happens on a different SQL Server.
No table lock was found, and processor and memory usage are normal while the query is running.
A plain "select top 1000" works and returns output instantly for both tables (D009021, D010014).
Reindexing, rebuilding, and updating stats on both tables (D009021, D010014) did not resolve the problem.
The same query works, but slowly, if we reduce the number of branches:
(
DECLARE @FromBrCode INT =1001
DECLARE @ToBrCode INT =1001
)
The same query runs faster, giving output within 2 mins, if we replace any one variable with its literal value:
AND a.LBrCode BETWEEN @FromBrCode AND @ToBrCode
changed to
AND a.LBrCode BETWEEN 1001 AND @ToBrCode
The same query also runs faster, giving output within 2 mins, if we add "OPTION (RECOMPILE)" at the end.
I tried clearing the cached query execution plan so that a new one would be compiled, but the problem still exists.
I found that the estimated plan and the actual execution plan are different (see screenshots).
Table D010014 is aliased twice, once as b and once as c,
so it is joined to itself.
Try to remove the subquery below and create a temp table to store
the values you need. I added * to the fields you self join
SELECT MAX(c.CblDate)
FROM D010014 c
WHERE c.PrdAcctId = b.PrdAcctId
AND c.LBrCode = b.LBrCode
AND c.CblDate <= @Cdate
If you can't do that, then try:
SELECT TOP 1 c.CblDate
FROM D010014 c
WHERE c.PrdAcctId = b.PrdAcctId
AND c.LBrCode = b.LBrCode
AND c.CblDate <= @Cdate
ORDER BY c.CblDate DESC
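One way the temp table suggestion could look is sketched below. It is untested against the real tables; the temp table name #LatestBalanceDate and its column MaxCblDate are made up for illustration, and the rewrite assumes the result should match the original correlated subquery.

```sql
DECLARE @FromBrCode INT = 1001;
DECLARE @ToBrCode INT = 1637;
DECLARE @Cdate DATE = '31-mar-2017';

-- Step 1: compute the latest balance date per account and branch once,
-- instead of once per outer row in a correlated subquery.
SELECT c.PrdAcctId, c.LBrCode, MAX(c.CblDate) AS MaxCblDate
INTO #LatestBalanceDate
FROM D010014 c
WHERE c.CblDate <= @Cdate
  AND c.LBrCode BETWEEN @FromBrCode AND @ToBrCode
GROUP BY c.PrdAcctId, c.LBrCode;

-- Step 2: join the precomputed dates back to the balance rows.
SELECT a.PrdCd, a.Name, SUM(b.Balance4) AS Balance
FROM D009021 a
JOIN D010014 b
  ON a.LBrCode = b.LBrCode
 AND a.PrdCd = LTRIM(RTRIM(SUBSTRING(b.PrdAcctId, 1, 8)))
JOIN #LatestBalanceDate l
  ON l.PrdAcctId = b.PrdAcctId
 AND l.LBrCode = b.LBrCode
 AND l.MaxCblDate = b.CblDate
WHERE SUBSTRING(b.PrdAcctId, 9, 24) = '000000000000000000000000'
  AND a.LBrCode BETWEEN @FromBrCode AND @ToBrCode
GROUP BY a.PrdCd, a.Name
HAVING SUM(b.Balance4) <> 0
ORDER BY a.PrdCd;

DROP TABLE #LatestBalanceDate;
```

The branch filter is repeated in step 1 only to keep the temp table small; it follows from the join conditions of the original query.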
I have a LINQ to SQL query:
from at in Context.Transaction
select new {
at.Amount,
at.PostingDate,
Details =
from tb in at.TransactionDetail
select new {
Amount = tb.Amount,
Description = tb.Desc
}
}
This results in one SQL statement being executed. All is good.
However, if I attempt to return known types from this query, even if they have the same structure as the anonymous types, I get one SQL statement executed for the top level and then an additional SQL statement for each "child" set.
Is there any way to get LINQ to SQL to issue one SQL statement and use known types?
EDIT: I must have another issue. When I plugged a very simplistic (but still hierarchical) version of my query into LINQPad and used freshly created known types with just 2 or 3 members, I did get one SQL statement. I will post an update when I know more.
EDIT 2: This appears to be due to a bug in Take. See my answer below for details.
First - some reasoning for the Take bug.
If you just Take, the query translator just uses top. Top10 will not give the right answer if cardinality is broken by joining in a child collection. So the query translator doesn't join in the child collection (instead it requeries for the children).
If you Skip and Take, then the query translator kicks in with some RowNumber logic over the parent rows... these rownumbers let it take 10 parents, even if that's really 50 records due to each parent having 5 children.
If you Skip(0) and Take, Skip is removed as a non-operation by the translator - it's just like you never said Skip.
This is going to be a hard conceptual leap from where you are (calling Skip and Take) to a "simple workaround". What we need to do is force the translation to occur at a point where the translator can't remove Skip(0) as a non-operation. We need to call Skip, and supply the skipped number at a later point.
DataClasses1DataContext myDC = new DataClasses1DataContext();
//setting up log so we can see what's going on
myDC.Log = Console.Out;
//hierarchical query - not important
var query = myDC.Options.Select(option => new{
ID = option.ParentID,
Others = myDC.Options.Select(option2 => new{
ID = option2.ParentID
})
});
//request translation of the query! Important!
var compQuery = System.Data.Linq.CompiledQuery
.Compile<DataClasses1DataContext, int, int, System.Collections.IEnumerable>
( (dc, skip, take) => query.Skip(skip).Take(take) );
//now run the query and specify that 0 rows are to be skipped.
compQuery.Invoke(myDC, 0, 10);
This produces the following query:
SELECT [t1].[ParentID], [t2].[ParentID] AS [ParentID2], (
SELECT COUNT(*)
FROM [dbo].[Option] AS [t3]
) AS [value]
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY [t0].[ID]) AS [ROW_NUMBER], [t0].[ParentID]
FROM [dbo].[Option] AS [t0]
) AS [t1]
LEFT OUTER JOIN [dbo].[Option] AS [t2] ON 1=1
WHERE [t1].[ROW_NUMBER] BETWEEN @p0 + 1 AND @p1 + @p2
ORDER BY [t1].[ROW_NUMBER], [t2].[ID]
-- @p0: Input Int (Size = 0; Prec = 0; Scale = 0) [0]
-- @p1: Input Int (Size = 0; Prec = 0; Scale = 0) [0]
-- @p2: Input Int (Size = 0; Prec = 0; Scale = 0) [10]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1
And here's where we win!
WHERE [t1].[ROW_NUMBER] BETWEEN @p0 + 1 AND @p1 + @p2
I've now determined this is the result of a horrible bug. The anonymous versus known type turned out not to be the cause. The real cause is Take.
The following result in 1 SQL statement:
query.Skip(1).Take(10).ToList();
query.ToList();
However, the following exhibit the one sql statement per parent row problem.
query.Skip(0).Take(10).ToList();
query.Take(10).ToList();
Can anyone think of any simple workarounds for this?
EDIT: The only workaround I've come up with is to check whether I'm on the first page (i.e. Skip(0)) and then make two calls, one with Take(1) and the other with Skip(1).Take(pageSize - 1), and AddRange the lists together.
I've not had a chance to try this but given that the anonymous type isn't part of LINQ rather a C# construct I wonder if you could use:
from at in Context.Transaction
select new KnownType {
Amount = at.Amount,
PostingDate = at.PostingDate,
Details =
from tb in at.TransactionDetail
select new KnownSubType {
Amount = tb.Amount,
Description = tb.Desc
}
}
Obviously Details would need to be an IEnumerable collection.
I could be miles wide on this but it might at least give you a new line of thought to pursue which can't hurt so please excuse my rambling.