C# Performance of Linq.Any() vs Linq.Where(..).count()>0

C# Performance of Linq.Any() vs Linq.Where(..).count()>0 - sql-server

I am hitting an application slowness during a load test for NHibernate LINQ query in .NET application when using LINQ ANY().
Column FileContent is VARCHAR(max).
bool hasIllustrations = CensusIllustration.Linq()
.Any(c => c.Participant.Census.Id == census.Id && c.FileContent != null);
Above query started taking around 1 min from code while SSMS executes in 1 sec. I took the generated SQL and is as below
DECLARE #p0 AS SQL_VARIANT;
SET #p0 = NULL;
select top 1 censusillu0_.CensusIllustration_Id as CensusIl1_5_,
censusillu0_.FileName as FileName5_, censusillu0_.ParticipantId as Particip3_5_,
censusillu0_.FileContent as FileCont4_5_,
censusillu0_.CensusParticipant_Id as CensusPa5_5_ from CensusIllustration censusillu0_,
CensusParticipant censuspart1_ where censusillu0_.CensusParticipant_Id=censuspart1_.CensusParticipant_Id and censuspart1_.Census_id=#p0 and (censusillu0_.FileContent is not null)
If I replace the code as below, it executes in 1 sec also from code
bool hasIllustrations2 = CensusIllustration.Linq().Where(c => c.Participant.Census.Id == census.Id && c.FileContent != null).Count() > 0;
Generated SQL for this is
--Type and value data was not available for the following variables. Their values have been set to defaults.
DECLARE #p0 AS SQL_VARIANT;
SET #p0 = NULL;
select cast(count(*) as INT) as col_0_0_ from CensusIllustration censusillu0_,
CensusParticipant censuspart1_ where
censusillu0_.CensusParticipant_Id=censuspart1_.CensusParticipant_Id and
censuspart1_.Census_id=#p0 and (censusillu0_.FileContent is not null)
I tried spending time studying the slowness of ANY() on big data columns and every post suggests ANY() and WHERE().COUNT()>0 or FirstOrdefault() won't have any difference.
Can someone help me understand why the 1st query takes around 1 min from code and 2nd one 1 sec from code

Is because a count always is speeder than a select with a top.

Related

Microsoft SQL Server: wrong query execution plan taking too long

This is on Windows SQL Server Cluster.
Query is coming from 3rd party application so I can not modify the query permanently.
Query is:
DECLARE #FromBrCode INT = 1001
DECLARE #ToBrCode INT = 1637
DECLARE #Cdate DATE = '31-mar-2017'
SELECT
a.PrdCd, a.Name, SUM(b.Balance4) as Balance
FROM
D009021 a, D010014 b
WHERE
a.PrdCd = LTRIM(RTRIM(SUBSTRING(b.PrdAcctId, 1, 8)))
AND substring(b.PrdAcctId, 9, 24) = '000000000000000000000000'
AND a.LBrCode = b.LBrCode
AND a.LBrCode BETWEEN #FromBrCode AND #ToBrCode
AND b.CblDate = (SELECT MAX(c.CblDate)
FROM D010014 c
WHERE c.PrdAcctId = b.PrdAcctId
AND c.LBrCode = b.LBrCode
AND c.CblDate <= #Cdate)
GROUP BY
a.PrdCd, a.Name
HAVING
SUM(b.Balance4) <> 0
ORDER BY
a.PrdCd
This particular query is taking too much time to complete execution. The same problem happens on a different SQL Server.
No table lock was found, processor and memory usage is normal while the query is running.
Normal "select top 1000" working and showing output instantly in both tables (D009021, D010014)
Reindex and rebuild / update stats done in both tables but problem did not resolve (D009021, D010014)
The same query is working if we reduce number of branch but slowly
(
DECLARE #FromBrCode INT =1001
DECLARE #ToBrCode INT =1001
)
The same query is working faster giving output within 2 mins if we replace any one variable and use the value directly
AND a.LBrCode BETWEEN #FromBrCode AND #ToBrCode
changed to
AND a.LBrCode BETWEEN 1001 AND #ToBrCode
The same query is working faster and giving output within 2 mins if we add "OPTION (RECOMPILE)" at end
I tried to clean cache query execution plan and optimized new one but problem still exists
Found that the query estimate plan and actual execution plan are different (see screenshots)

Table D010014 is aliased twice once as b and once as c
the they are joined to the same table.
Try toto remove the sub query below and create a temp table to store
the values you need. I added * to the fields you self join
SELECT MAX(c.CblDate)
FROM D010014 c
WHERE c.PrdAcctId = b.PrdAcctId
AND c.LBrCode = b.LBrCode
AND c.CblDate <= #Cdate
if you cant do that then try
SELECT TOP 1 c.CblDate
FROM D010014 c
WHERE c.PrdAcctId = b.PrdAcctId
AND c.LBrCode = b.LBrCode
AND c.CblDate <= #Cdate
ORDER BY c.CblDate DESC

For loop cursor in teradata

In my Teradata Stored Procedure, I want to have a for loop cursor against a dynamic sql.
Below is the code snippet
SET get_exclude_condition = '';
SET colum_id = 'SELECT MIN (parent_criteria_id) ,MAX (parent_criteria_id) FROM arc_mdm_tbls.intnl_mtch_criteria WHERE act_ind = 1 AND criteria_typ = ''Exclude'' AND mtch_technique_id ='||mtch_technique_id||';' ;
PREPARE input_stmt FROM colum_id;
OPEN flex_cursor;
FETCH flex_cursor INTO parent_criteria_id_min , parent_criteria_id_max ;
CLOSE flex_cursor;
SET get_exclude_condition = '';
WHILE (parent_criteria_id_min <= parent_criteria_id_max)
DO
SET get_exclude_condition = get_exclude_condition || '( ';
SET for_loop_stmt = 'SELECT criteria FROM arc_mdm_tbls.intnl_mtch_criteria WHERE act_ind = 1 AND mtch_technique_id ='||mtch_technique_id||' AND criteria_typ= ''Exclude'' AND parent_criteria_id ='||parent_criteria_id_min||';';
FOR for_loop_rule AS c_cursor_rule CURSOR FOR
for_loop_stmt
DO
Can I declare a for loop cursor like this ?
Or do I need to have something like this only ?
FOR for_loop_rule AS c_cursor_rule CURSOR FOR
SELECT rule_id
FROM arc_stage_tbls.assmt_scoring_rules
WHERE rule_typ = :v_RuleType
ORDER BY rule_id
DO
I mean can I first frame the dynamic sql and then have a for loop cursor on top of that or with the cursor declaration only I need to have a static sql query ?
Please clarify.

While you haven't posted everything that the stored procedure is trying to accomplish, it does appear that what you are asking can be accomplished using SET based logic and not looping through a cursor. If you need to parameterize the 'mtch_technique_id' you can use a Teradata macro which will allow you to maintain a SET based approach.
Here is the SQL for creating a macro that returns a result set based on my interpretation of what your snippet of the stored procedure is trying to accomplish:
REPLACE MACRO {MyDB}.Intnl_Mtch_Criteria(mtch_technique_id INTEGER) AS
(
SELECT criteria
FROM arc_mdm_tbls.intnl_mtch_criteria
WHERE act_ind = 1
AND (much_technique_id, criteria_typ) IN
(SELECT MIN((parent_criteria_id), MAX (parent_criteria_id)
FROM arc_mdm_tbls.intnl_mtch_criteria
WHERE act_ind = 1
AND criteria_typ = 'Exclude'
AND mtch_technique_id = :mtch_technique_id;
);

Cannot understand how will Entity Framewrok generate a SQL statement for an Update operation using timestamp?

I have the following method inside my asp.net mvc web application :
var rack = IT.ITRacks.Where(a => !a.Technology.IsDeleted && a.Technology.IsCompleted);
foreach (var r in rack)
{
long? it360id = technology[r.ITRackID];
if (it360resource.ContainsKey(it360id.Value))
{
long? CurrentIT360siteid = it360resource[it360id.Value];
if (CurrentIT360siteid != r.IT360SiteID)
{
r.IT360SiteID = CurrentIT360siteid.Value;
IT.Entry(r).State = EntityState.Modified;
count = count + 1;
}
}
IT.SaveChanges();
}
When I checked SQL Server profiler I noted that EF will generated the following SQL statement:
exec sp_executesql N'update [dbo].[ITSwitches]
set [ModelID] = #0, [Spec] = null, [RackID] = #1, [ConsoleServerID] = null, [Description] = null, [IT360SiteID] = #2, [ConsoleServerPort] = null
where (([SwitchID] = #3) and ([timestamp] = #4))
select [timestamp]
from [dbo].[ITSwitches]
where ##ROWCOUNT > 0 and [SwitchID] = #3',N'#0 int,#1 int,#2 bigint,#3 int,#4 binary(8)',#0=1,#1=539,#2=1502,#3=1484,#4=0x00000000000EDCB2
I can not understand the purpose of having the following section :-
select [timestamp]
from [dbo].[ITSwitches]
where ##ROWCOUNT > 0 and [SwitchID] = #3',N'#0 int,#1 int,#2 bigint,#3 int,#4 binary(8)',#0=1,#1=539,#2=1502,#3=1484,#4=0x00000000000EDCB2
Can anyone advice?

Entity Framework uses timestamps to check whether a row has changed. If the row has changed since the last time EF retrieved it, then it knows it has a concurrency problem.
Here's an explanation:
http://www.remondo.net/entity-framework-concurrency-checking-with-timestamp/

This is because EF (and you) want to update the updated client-side object by the newly generated rowversion value.
First the update is executed. If this succeeds (because the rowversion is still the one you had in the client) a new rowversion is generated by the database and EF retrieves that value. Suppose you'd immediately want to make a second update. That would be impossible if you didn't have the new rowversion.
This happens with all properties that are marked as identity or computed (by DatabaseGenertedOption).

Stored Proc performance slow

select * from FOO.MBR_DETAILS where BAR= 'BAZ' and MBR_No = '123'
execution time = 0.25 seconds
CREATE PROCEDURE My.MEMBER_SEARCH
(
i_BAR varchar(3),
i_member_surname varchar(50),
i_member_code varchar(10),
i_member_given_name varchar(50)
)
RESULT SETS 1
LANGUAGE SQL
BEGIN
DECLARE c1 cursor with return for
select *
FROM FOO.MBR_DETAILS m
WHERE
BAR= i_BAR
and (i_member_code = '' or m.MBR_No = i_member_code)
and (i_member_surname = '' or m.surname = i_member_surname)
and (i_member_given_name = '' or m.given_names LIKE '%'||i_member_given_name||'%');
OPEN c1;
END
call My.MEMBER_SEARCH('BAZ','','123','')
execution time = 1.9 seconds
I thought both queries should have a similar time as i_member_surname and i_member_given_name are both empty they would not be evaulated.

The solution is to enable REOPT ALWAYS for any stored procedure that runs a flexible, parameter-driven search.
The REOPT ALWAYS option will force the optimizer to analyze the input parameter values and come up with a new access plan every time the procedure is executed, instead of just once when the procedure is compiled. Although REOPT ALWAYS adds a few extra milliseconds of optimizer overhead for each and every execution of the stored procedure, that is most likely faster than continually reusing the one-size-fits-all access plan that the optimizer guessed at while initially compiling the stored procedure.

LINQ to SQL Take w/o Skip Causes Multiple SQL Statements

I have a LINQ to SQL query:
from at in Context.Transaction
select new {
at.Amount,
at.PostingDate,
Details =
from tb in at.TransactionDetail
select new {
Amount = tb.Amount,
Description = tb.Desc
}
}
This results in one SQL statement being executed. All is good.
However, if I attempt to return known types from this query, even if they have the same structure as the anonymous types, I get one SQL statement executed for the top level and then an additional SQL statement for each "child" set.
Is there any way to get LINQ to SQL to issue one SQL statement and use known types?
EDIT: I must have another issue. When I plugged a very simplistic (but still hieararchical) version of my query into LINQPad and used freshly created known types with just 2 or 3 members, I did get one SQL statement. I will post and update when I know more.
EDIT 2: This appears to be due to a bug in Take. See my answer below for details.

First - some reasoning for the Take bug.
If you just Take, the query translator just uses top. Top10 will not give the right answer if cardinality is broken by joining in a child collection. So the query translator doesn't join in the child collection (instead it requeries for the children).
If you Skip and Take, then the query translator kicks in with some RowNumber logic over the parent rows... these rownumbers let it take 10 parents, even if that's really 50 records due to each parent having 5 children.
If you Skip(0) and Take, Skip is removed as a non-operation by the translator - it's just like you never said Skip.
This is going to be a hard conceptual leap to from where you are (calling Skip and Take) to a "simple workaround". What we need to do - is force the translation to occur at a point where the translator can't remove Skip(0) as a non-operation. We need to call Skip, and supply the skipped number at a later point.
DataClasses1DataContext myDC = new DataClasses1DataContext();
//setting up log so we can see what's going on
myDC.Log = Console.Out;
//hierarchical query - not important
var query = myDC.Options.Select(option => new{
ID = option.ParentID,
Others = myDC.Options.Select(option2 => new{
ID = option2.ParentID
})
});
//request translation of the query! Important!
var compQuery = System.Data.Linq.CompiledQuery
.Compile<DataClasses1DataContext, int, int, System.Collections.IEnumerable>
( (dc, skip, take) => query.Skip(skip).Take(take) );
//now run the query and specify that 0 rows are to be skipped.
compQuery.Invoke(myDC, 0, 10);
This produces the following query:
SELECT [t1].[ParentID], [t2].[ParentID] AS [ParentID2], (
SELECT COUNT(*)
FROM [dbo].[Option] AS [t3]
) AS [value]
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY [t0].[ID]) AS [ROW_NUMBER], [t0].[ParentID]
FROM [dbo].[Option] AS [t0]
) AS [t1]
LEFT OUTER JOIN [dbo].[Option] AS [t2] ON 1=1
WHERE [t1].[ROW_NUMBER] BETWEEN #p0 + 1 AND #p1 + #p2
ORDER BY [t1].[ROW_NUMBER], [t2].[ID]
-- #p0: Input Int (Size = 0; Prec = 0; Scale = 0) [0]
-- #p1: Input Int (Size = 0; Prec = 0; Scale = 0) [0]
-- #p2: Input Int (Size = 0; Prec = 0; Scale = 0) [10]
-- Context: SqlProvider(Sql2005) Model: AttributedMetaModel Build: 3.5.30729.1
And here's where we win!
WHERE [t1].[ROW_NUMBER] BETWEEN #p0 + 1 AND #p1 + #p2

I've now determined this is the result of a horrible bug. The anonymous versus known type turned out not to be the cause. The real cause is Take.
The following result in 1 SQL statement:
query.Skip(1).Take(10).ToList();
query.ToList();
However, the following exhibit the one sql statement per parent row problem.
query.Skip(0).Take(10).ToList();
query.Take(10).ToList();
Can anyone think of any simple workarounds for this?
EDIT: The only workaround I've come up with is to check to see if I'm on the first page (IE Skip(0)) and then make two calls, one with Take(1) and the other with Skip(1).Take(pageSize - 1) and addRange the lists together.

I've not had a chance to try this but given that the anonymous type isn't part of LINQ rather a C# construct I wonder if you could use:
from at in Context.Transaction
select new KnownType(
at.Amount,
at.PostingDate,
Details =
from tb in at.TransactionDetail
select KnownSubType(
Amount = tb.Amount,
Description = tb.Desc
)
}
Obviously Details would need to be an IEnumerable collection.
I could be miles wide on this but it might at least give you a new line of thought to pursue which can't hurt so please excuse my rambling.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

C# Performance of Linq.Any() vs Linq.Where(..).count()>0 - sql-server

Is because a count always is speeder than a select with a top.

Related

Microsoft SQL Server: wrong query execution plan taking too long

For loop cursor in teradata

Cannot understand how will Entity Framewrok generate a SQL statement for an Update operation using timestamp?

Stored Proc performance slow

LINQ to SQL Take w/o Skip Causes Multiple SQL Statements

Categories

Resources