I have a simple, uncorrelated subquery that performs very poorly on SQL Server. I'm not very experienced at reading execution plans, but it looks like the inner query is being executed once for every row in the outer query, even though the results are the same each time. What can I do to tell SQL Server to execute the inner query only once?
The query looks like this:
select *
from Record record0_
where record0_.RecordTypeFK='c2a0ffa5-d23b-11db-9ea3-000e7f30d6a2'
and (
record0_.EntityFK in (
select record1_.EntityFK
from Record record1_
join RecordTextValue textvalues2_ on record1_.PK=textvalues2_.RecordFK
and textvalues2_.FieldFK = '0d323c22-0ec2-11e0-a148-0018f3dde540'
and (textvalues2_.Value like 'O%' escape '~')
)
)
Analyze your SQL Statement and put it in the Database Engine Tuning Advisor (2005+) and see what indexes it suggests.
I don't think you're giving SQL Server enough credit on determining the best way to run a query. Make sure you have indexes on the fields in your joins and where clauses.
This may be an alternative to your query, but will probably run the same:
select record0_.*
from Record record0_
inner join Record record1_
on record0_.EntityFK = record1_.EntityFK
inner join RecordTextValue textvalues2_
on record1_.PK=textvalues2_.RecordFK
and textvalues2_.FieldFK = '0d323c22-0ec2-11e0-a148-0018f3dde540'
and (textvalues2_.Value like 'O%' escape '~')
where record0_.RecordTypeFK='c2a0ffa5-d23b-11db-9ea3-000e7f30d6a2'
You should be able to change this to a straight forward join, does that help:
select r.*
from Record r
join ( select record1_.EntityFK
from Record record1_
join RecordTextValue textvalues2_ on record1_.PK=textvalues2_.RecordFK
and textvalues2_.FieldFK = '0d323c22-0ec2-11e0-a148-0018f3dde540'
and (textvalues2_.Value like 'O%' escape '~')
) s on s.EntityFK = r.EntityFK
where r.RecordTypeFK='c2a0ffa5-d23b-11db-9ea3-000e7f30d6a2'
This looks a lot more sensible.. (but pretty much the same query)
select r.*
from Record r
join ( select ri.EntityFK
from Record ri
join RecordTextValue t on ri.PK=t.RecordFK
where
t.FieldFK = '0d323c22-0ec2-11e0-a148-0018f3dde540'
and t.Value like 'O%'
) s on s.EntityFK = r.EntityFK
where r.RecordTypeFK='c2a0ffa5-d23b-11db-9ea3-000e7f30d6a2'
Related
I am struggling with figuring out what is happening with the T-SQL query shown below.
You will see two inner joins to the same table, although with different join criteria. The first join by itself runs in approximately 21 seconds and if I run the second join by itself it completes in approximately 27 seconds.
If I leave both joins in place, the query runs and runs and runs, until I finally stop the query. The appropriate indices appear to be in place and I know this query runs in a different environment with less horsepower, the only difference being the other server is running SQL Server 2012 and I am running SQL Server 2016, although the database is in 2012 compatibility mode:
This join runs in ~21 seconds.
SELECT
COUNT(*)
FROM
dbo.SPONSORSHIP as s
INNER JOIN
dbo.SPONSORSHIPTRANSACTION AS st
ON st.SPONSORSHIPCOMMITMENTID = s.SPONSORSHIPCOMMITMENTID
AND st.TRANSACTIONSEQUENCE = (SELECT MIN(TRANSACTIONSEQUENCE)
FROM dbo.SPONSORSHIPTRANSACTION AS ms
WHERE ms.SPONSORSHIPCOMMITMENTID = s.SPONSORSHIPCOMMITMENTID
AND ms.TARGETSPONSORSHIPID = s.ID)
This join runs in ~27 seconds.
SELECT
COUNT(*)
FROM
dbo.SPONSORSHIP AS s
INNER JOIN
dbo.SPONSORSHIPTRANSACTION AS lt ON lt.SPONSORSHIPCOMMITMENTID = s.SPONSORSHIPCOMMITMENTID
AND lt.TRANSACTIONSEQUENCE = (SELECT MAX(TRANSACTIONSEQUENCE)
FROM dbo.SPONSORSHIPTRANSACTION AS ms
WHERE ms.SPONSORSHIPCOMMITMENTID = s.SPONSORSHIPCOMMITMENTID
AND s.ID IN (ms.CONTEXTSPONSORSHIPID,
ms.TARGETSPONSORSHIPID,
ms.DECLINEDSPONSORSHIPID)
AND ms.ACTIONCODE <> 9)
These are both considered correlated subqueries. You should typically avoid this pattern, as it causes what is known as "RBAR"... which is "Row by Agonizing Row". Before you focus on troubleshooting this particular query, I'd suggest revisiting the query itself and see if you can solve this in a more set based approach. You'll find that in most cases you have other ways to accomplish this and cut cost down dramatically.
As one example:
select
total_count
,row_sequence
from
(
SELECT
total_count = COUNT(*)
,row_sequence = row_number() over(order by st.TRANSACTIONSEQUENCE asc)
FROM
dbo.SPONSORSHIP as s
INNER JOIN dbo.SPONSORSHIPTRANSACTION AS st
ON st.SPONSORSHIPCOMMITMENTID = s.SPONSORSHIPCOMMITMENTID
) as x
where
x.row_sequence = 1
This was a quick example that is not tested. For future reference, if you want the best answer, it's a great idea to generate a temp table or test data set that's able to be used so someone can provide a full working example.
The example I gave shows what is called a windowing function. Take a look more into them for helping with selecting results when you see the word sequence, need the the first/last in a group and more.
Hope this gives you some ideas! Welcome to Stack Overflow! 👋
I have a SQL statement that has a weird 2nd nested SQL statement that I think is causing this query to run for 6+ min and any suggestions/help would be appreciated. I tried creating a TEMP table for the values in the nested SQL statement and just do a simple join but there is nothing to join on in the SQL code so that is why they used a 1=1 in the ON statement for the join. Here is the SQL code:
Declare #TransactionEndDate datetime;
Select #TransactionEndDate = lastmonth_end from dbo.DTE_udfCommonDates(GETDATE());
Select ''''+TreatyName as Treaty,
cast(EndOfMonth as Date) as asOfDate,
Count(Distinct ClaimSysID) as ClaimCount,
Count(Distinct FeatureSysID) as FeatureCount,
Sum(OpenReserve) as OpenReserve
From (
Select
TreatyName,
EndOfMonth,
dbo.CMS_Claims.ClaimSysID,
FeatureSysID,
sum(IW_glGeneralLedger.TransactionAmount)*-1 as OpenReserve
From dbo.CMS_Claims
Inner Join dbo.CMS_Claimants
On dbo.CMS_Claims.ClaimSysID = dbo.CMS_Claimants.ClaimSysID
Inner Join dbo.CMS_Features
On dbo.CMS_Features.ClaimantSysID = dbo.CMS_Claimants.ClaimantSysID
Left Join dbo.IW_glGeneralLedger
On IW_glGeneralLedger.FeatureID = dbo.CMS_Features.FeatureSysID
Left Join dbo.IW_glSubChildAccount
On dbo.IW_glSubChildAccount.glSubChildAccountID = dbo.IW_glGeneralLedger.glSubChildAccountSysID
Left Join dbo.IW_glAccountGroup
On dbo.IW_glAccountGroup.glAccountGroupID = dbo.IW_glSubChildAccount.glAccountGroupSysID
Left Join dbo.IW_BankRegister
On dbo.IW_BankRegister.BankRegisterSysID = dbo.IW_glGeneralLedger.BankRegisterID
Left Join dbo.IW_BankRegisterStatus
On dbo.IW_BankRegisterStatus.BankRegisterStatusSysID = dbo.IW_BankRegister.BankRegisterStatusID
**Left Join (Select Distinct dbo.DTE_get_month_end(dt) as EndOfMonth
From IW_Calendar
Where dt Between '3/1/2004'
and #TransactionEndDate) as dates
on 1=1**
Left Join dbo.IW_ReinsuranceTreaty
On dbo.IW_ReinsuranceTreaty.TreatySysID = IW_glGeneralLedger.PolicyTreatyID
Where dbo.IW_glGeneralLedger.TransactionDate Between '1/1/2004 00:00:00' And EndOfMonth
And dbo.IW_glAccountGroup.Code In ('RESERVEINDEMNITY')
And (
(dbo.IW_glGeneralLedger.BankRegisterID Is Null)
Or (
(IW_BankRegister.PrintedDate Between '1/1/2004 00:00:00' And EndOfMonth Or dbo.IW_glGeneralLedger.BankRegisterID = 0)
And
(dbo.IW_BankRegisterStatus.EnumValue In ('Approved','Outstanding','Cleared','Void') Or dbo.IW_glGeneralLedger.BankRegisterID = 0))
)
Group By TreatyName, dbo.CMS_Claims.ClaimSysID, FeatureSysID, EndOfMonth
Having sum(IW_glGeneralLedger.TransactionAmount) <> 0
) As Data
Group By TreatyName,EndOfMonth
Order By EndOfMonth, TreatyName
This nested SQL code only provides a table of End of Month values in one column called EndOfMonth and this is what I'm trying to fix:
Select Distinct dbo.DTE_get_month_end(dt) as EndOfMonth
From IW_Calendar
Where dt Between '3/1/2004'
and #TransactionEndDate
Please use the below methods to increase the query performance.
Use temporary tables. ( load relevant data into temporary tables with necessary where conditions and then join).
Use clustered and non clustered indexes to your tables.
Create Multiple-Column Indexes.
Index the ORDER-BY / GROUP-BY / DISTINCT Columns for Better Response Time.
Use Parameterized Queries.
Use query hints accordingly.
NOLOCK: In the event that data is locked, this tells SQL Server to read data from the last known value available, also known as a dirty read. Since it is possible to use some old values and some new values, data sets can contain inconsistencies. Do not use this in any place in which data quality is important.
RECOMPILE: Adding this to the end of a query will result in a new execution plan being generated each time this query executed. This should not be used on a query that is executed often, as the cost to optimize a query is not trivial. For infrequent reports or processes, though, this can be an effective way to avoid undesired plan reuse. This is often used as a bandage when statistics are out of date or parameter sniffing is occurring.
MERGE/HASH/LOOP: This tells the query optimizer to use a specific type of join as part of a join operation. This is super-risky as the optimal join will change as data, schema, and parameters evolve over time. While this may fix a problem right now, it will introduce an element of technical debt that will remain for as long as the hint does.
OPTIMIZE FOR: Can specify a parameter value to optimize the query for. This is often used when we want performance to be controlled for a very common use case so that outliers do not pollute the plan cache. Similar to join hints, this is fragile and when business logic changes, this hint usage may become obsolete.
I have a query:
SELECT TOP 1 * FROM vwTimeBlocks
WHERE vwTimeBlocks.id = #id
AND vwTimeBlocks.organization = #org
organization is indexed in some of the tables within the view. I expected that the WHERE would be evaluated on the underlying tables in the view before any JOINs, but this isn't the case:
The execution plan suggests that the WHERE is being applied on the result of vwBlocks. Replacing the vwBlocks definition gives me this:
SELECT TOP 1 * FROM
( SELECT ... all the things...
FROM tblTimeBlocks
LEFT OUTER JOIN tblA
ON tblA.id = tblTimeBlocks.F
AND tblTimeBlocks.B = 1000
LEFT OUTER JOIN tblB
ON tblB.id = tblTimeBlocks.F
AND tblTimeBlocks.B = 2000
) AS subTimeBlocks
WHERE subTimeBlocks.id = #timeblock AND subTimeBlocks.organization = #org;
Comparing the execution plan of this against the original query shows they're identical. However, when I place the WHERE clause inside the subquery:
SELECT TOP 1 * FROM
( SELECT ... all the things...
FROM tblTimeBlocks
LEFT OUTER JOIN tblA
ON tblA.id = tblTimeBlocks.F
AND tblTimeBlocks.B = 1000
LEFT OUTER JOIN tblB
ON tblB.id = tblTimeBlocks.F
AND tblTimeBlocks.B = 2000
WHERE subTimeBlocks.id = #timeblock AND subTimeBlocks.organization = #org
) AS subTimeBlocks
The overall query cost drops, and the execution plan for the new query is much better; here's a comparison, with the top execution plan showing the WHERE outside the subquery, and the bottom plan with the WHERE inside the subquery.
Note that the relative costs of the queries is at 91%-9%, and the first filters on organization quite late, while the second uses two seeks.
Why is this the case? I expected the query optimizer to optimize better than it did.
Given this situation, I'd obviously like to use the second query, and somehow push the predicate into the view portion of the query. Because the predicate relies on a parameter, I can't just alter the view definition. One option is to replace the view with a table-defined function, but this situation is replicated all over my database. Is there any other way?
Is there any way I can hint, or otherwise ask sqlserver to use the predicate in my view? I don't know why the query optimizer doesn't look at the query and push the WHERE as early as possible.
In case it's relevant, this is running on:
Microsoft SQL Azure (RTM) - 12.0.2000.8
The idea of the below query is to use the CTE to get the primary key of all rows in [Archive].[tia_tia_object] that meet the filter.
The execution time for the query within the CTE is 0 seconds.
The second part is supposed to do joins on other tables, to filter the data some more, but only if there are any rows returned in the CTE. This was the only way I could get the SQL server to use the correct indexes.
Why does it spend time (see execution plan) looking in TIA_TIA_AGREEMENT_LINE and TIA_TIA_OBJECT, when CTE returns 0 rows?
WITH cte_vehicle
AS (SELECT O.[Seq_no],
O.Object_No
FROM [Archive].[tia_tia_object] O
WHERE O.RECORD_TIMESTAMP >
(SELECT LastLoadTimeStamp FROM staging.Ufngetlastloadtimestamp('Staging.CoveredObject'))
AND O.[Meta_iscurrent] = 1
AND O.OBJECT_TYPE IN ( 'BIO01', 'CAO01', 'DKV', 'GFO01',
'KMA', 'KNO01', 'MCO01', 'VEO01',
'SVO01', 'AUO01' ))
SELECT O.[Seq_no] AS [Bkey_CoveredObject],
Cast(O.[Agr_Line_No] AS BIGINT) AS [Agr_Line_No],
O.[Cover_Start_Date] AS [CoverageFrom],
O.[Cover_End_Date] AS [CoverageTo],
O.[Timestamp] AS [TIMESTAMP],
O.[Record_Timestamp] AS [RECORD_TIMESTAMP],
O.[Newest] AS [Newest],
O.LOCATION_ID AS LocationNo,
O.[Cust_no],
O.[N01]
FROM cte_vehicle AS T
INNER JOIN [Archive].[tia_tia_object] O
ON t.Object_No = O.Object_No
AND t.Seq_No = O.Seq_No
INNER JOIN [Archive].[tia_tia_agreement_line] AL
ON O.Agr_line_no = AL.Agr_line_no
INNER JOIN [Archive].[tia_tia_policy] P
ON AL.Policy_no = P.Policy_no
WHERE P.[Transaction_type] <> 'D'
Execution plan:
Because it still needs to check and look for records. Even if there are no records in that table, it doesn't know that until it actually checks.
Much like if someone gives you a sealed box, you don't know it's empty or not till you open it.
I have a SQL Server 2005 database that is linked to an Oracle database. What I want to do is run a query to pull some ID numbers out of it, then find out which ones are in Oracle.
So I want to take the results of this query:
SELECT pidm
FROM sql_server_table
And do something like this to query the Oracle database (assuming that the results of the previous query are stored in #pidms):
OPENQUERY(oracledb,
'
SELECT pidm
FROM table
WHERE pidm IN (' +
#pidms + ')')
GO
But I'm having trouble thinking of a good way to do this. I suppose that I could do an inner join of queries similar to these two. Unfortunately, there are a lot of records to pull within a limited timeframe so I don't think that will be a very performant option to choose.
Any suggestions? I'd ideally like to do this with as little Dynamic SQL as possible.
Ahhhh, pidms. Brings back bad memories! :)
You could do the join, but you would do it like this:
select sql.pidm,sql.field2 from sqltable as sql
inner join
(select pidm,field2 from oracledb..schema.table) as orcl
on
sql.pidm = orcl.pidm
I'm not sure if you could write a PL/SQL procedure that would take a table variable from sql...but maybe.....no, I doubt it.
Store openquery results in a temp table, then do an inner join between the SQL table and the temp table.
I don't think you can do a join since OPENQUERY requires a pure string (as you wrote above).
BG: Actually JOIN IN SQLServer to Oracle by OpenQuery works, avoiding #tmp table and allowing JOIN to SQL without Param* - ex.
[SQL SP] LEFT JOIN OPENQUERY(ORADB,
'SELECT COUNT(distinct O.ORD_NUM) LCNT,
O.ORD_MAIN_NUM
FROM CUSTOMER.CUST_FILE C
JOIN CUSTOMER.ORDER_NEW O
ON C.ID = O.ORD_ID
WHERE C.CUS_ID NOT IN (''2'',''3'')
GROUP BY O.ORD_MAIN_MACNUM') LC
ON T.ID = LC.ORD_MAIN_ID*
Cheers, Bill Gibbs