I am experiencing a strange performance issue. I have a view based on a CTE. It's a view that I wrote years ago, and it has been running without issue. Suddenly, 4 days ago, the query that used to run in 1-2 minutes ran for hours before we identified the long-running query and halted it.
The CTE produces a time-stamped list of transactions that an agent performs. I then SELECT from the CTE, left joining back to the CTE on the timestamp of the subsequent transaction to determine the length of time an agent spends on each transaction.
WITH [CTE_TABLE] (COLUMNS) AS
(
    SELECT [INDEXED COLUMNS]
        ,[WINDOWED FUNCTION] AS ROWNUM
    FROM [DB_TABLE]
    WHERE [EMPLOYEE_ID] = 111213
)
SELECT [T1].[EMPLOYEE_ID]
    ,[T1].[TRANSACTION_NAME]
    ,[T1].[TIMESTAMP] AS [START_TIME]
    ,[T2].[TIMESTAMP] AS [END_TIME]
FROM [CTE_TABLE] [T1]
LEFT OUTER JOIN [CTE_TABLE] [T2] ON
(
    [T1].[EMPLOYEE_ID] = [T2].[EMPLOYEE_ID]
    AND [T1].[ROWNUM] = [T2].[ROWNUM] + 1
)
In testing I filter for a specific agent. If I run the inner portion of the CTE, it produces 500 records in less than a second. (When not filtering for a single agent, it produces 95K records in 14 seconds. This is the normal running timeframe.) If I run the CTE with a simple SELECT * FROM [CTE_TABLE], it also runs in less than a second. When I run it using an INNER JOIN back to itself, it again runs in less than a second. Finally, when I run it as a LEFT OUTER JOIN, it takes over a minute and a half just for the 500 records of a single agent. I need the LEFT OUTER JOIN because the final record of the day is the agent's log-off from the system, and it never has a record to join to.
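(For reference, on SQL Server 2012 and later the same start/end pairing can be written without the self-join at all, using LEAD(), which returns NULL for the last row and so covers the log-off record naturally. A rough sketch, reusing the placeholder names from the query above:

SELECT [EMPLOYEE_ID]
    ,[TRANSACTION_NAME]
    ,[TIMESTAMP] AS [START_TIME]
    ,LEAD([TIMESTAMP]) OVER (PARTITION BY [EMPLOYEE_ID] ORDER BY [TIMESTAMP]) AS [END_TIME] -- NULL on the final record
FROM [DB_TABLE]
WHERE [EMPLOYEE_ID] = 111213
)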
The table that I pull from is over 22GB in size and has 500 million rows. Selecting the records from this table for a single day takes 14 seconds, or for a single agent less than a second, so I don't think the performance bottleneck comes from the source table. The bottleneck occurs in the LEFT OUTER JOIN back to the CTE, but I have always had the LEFT OUTER JOIN. Again, the very strange aspect is that this only began 4 days ago. I have checked space on the server; there is plenty. The CPU spikes to approx. 25% and remains there until the query ends, either on its own or when halted by an admin.
I am hoping someone has some ideas as to what could have caused this. It appears to have cropped up out of nowhere.
Again, the very strange aspect is that this only began 4 days ago
I recommend updating statistics on the tables involved, and also trying a rebuild of the indexes.
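For example, a minimal sketch with [DB_TABLE] standing in for the real table name:

UPDATE STATISTICS [DB_TABLE] WITH FULLSCAN;
ALTER INDEX ALL ON [DB_TABLE] REBUILD;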
The bottleneck occurs in the LEFT OUTER JOIN back to the CTE
A CTE will not have any statistics, so I would recommend materializing the CTE into a temp table to see if this helps.
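A rough sketch of that materialization, untested and reusing the placeholder names from the question:

SELECT [INDEXED COLUMNS]
    ,[WINDOWED FUNCTION] AS ROWNUM
INTO #CTE_TABLE
FROM [DB_TABLE]
WHERE [EMPLOYEE_ID] = 111213

SELECT [T1].[EMPLOYEE_ID]
    ,[T1].[TRANSACTION_NAME]
    ,[T1].[TIMESTAMP] AS [START_TIME]
    ,[T2].[TIMESTAMP] AS [END_TIME]
FROM #CTE_TABLE [T1]
LEFT OUTER JOIN #CTE_TABLE [T2] ON
(
    [T1].[EMPLOYEE_ID] = [T2].[EMPLOYEE_ID]
    AND [T1].[ROWNUM] = [T2].[ROWNUM] + 1
)

DROP TABLE #CTE_TABLE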
I have two tables: Client and Transaction (they can be seen as an example in this db-fiddle). A Client may have thousands of transactions.
I'm creating a query to get a list of clients and their last transaction, to know which ones are inactive (e.g. which have not made transactions in the last 30/90/180 days). And for that I'm using this query:
SELECT C.*, T.[CreationDate] AS LastTransactionDate FROM Client AS C
OUTER APPLY (
SELECT TOP 1 T.CreationDate
FROM [Transaction] AS T
WHERE T.ClientId = C.ClientId
ORDER BY T.CreationDate DESC
) AS T;
And it works very well, but as the data grows so does the query delay. I've tested it on a table with approximately 50 million transactions and it took about 1 minute. What strategy can I adopt here to improve this performance?
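One strategy often suggested for this top-1-per-group pattern is a covering index keyed on the correlation column and the ordering column, so that each OUTER APPLY probe becomes a single seek. A sketch, with a hypothetical index name:

CREATE NONCLUSTERED INDEX IX_Transaction_ClientId_CreationDate -- hypothetical name
ON [Transaction] (ClientId, CreationDate DESC);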
I am struggling with figuring out what is happening with the T-SQL query shown below.
You will see two inner joins to the same table, although with different join criteria. The first join by itself runs in approximately 21 seconds, and if I run the second join by itself it completes in approximately 27 seconds.
If I leave both joins in place, the query runs and runs and runs, until I finally stop it. The appropriate indices appear to be in place, and I know this query runs in a different environment with less horsepower. The only difference is that the other server is running SQL Server 2012 while I am running SQL Server 2016, although the database is in 2012 compatibility mode:
This join runs in ~21 seconds.
SELECT
COUNT(*)
FROM
dbo.SPONSORSHIP as s
INNER JOIN
dbo.SPONSORSHIPTRANSACTION AS st
ON st.SPONSORSHIPCOMMITMENTID = s.SPONSORSHIPCOMMITMENTID
AND st.TRANSACTIONSEQUENCE = (SELECT MIN(TRANSACTIONSEQUENCE)
FROM dbo.SPONSORSHIPTRANSACTION AS ms
WHERE ms.SPONSORSHIPCOMMITMENTID = s.SPONSORSHIPCOMMITMENTID
AND ms.TARGETSPONSORSHIPID = s.ID)
This join runs in ~27 seconds.
SELECT
COUNT(*)
FROM
dbo.SPONSORSHIP AS s
INNER JOIN
dbo.SPONSORSHIPTRANSACTION AS lt ON lt.SPONSORSHIPCOMMITMENTID = s.SPONSORSHIPCOMMITMENTID
AND lt.TRANSACTIONSEQUENCE = (SELECT MAX(TRANSACTIONSEQUENCE)
FROM dbo.SPONSORSHIPTRANSACTION AS ms
WHERE ms.SPONSORSHIPCOMMITMENTID = s.SPONSORSHIPCOMMITMENTID
AND s.ID IN (ms.CONTEXTSPONSORSHIPID,
ms.TARGETSPONSORSHIPID,
ms.DECLINEDSPONSORSHIPID)
AND ms.ACTIONCODE <> 9)
These are both correlated subqueries. You should typically avoid this pattern, as it causes what is known as "RBAR", which stands for "Row By Agonizing Row". Before you focus on troubleshooting this particular query, I'd suggest revisiting the query itself to see if you can solve it with a more set-based approach. You'll find that in most cases there are other ways to accomplish this and cut the cost down dramatically.
As one example:
SELECT COUNT(*) AS total_count
FROM
(
    SELECT
        row_sequence = ROW_NUMBER() OVER (
            PARTITION BY st.SPONSORSHIPCOMMITMENTID
            ORDER BY st.TRANSACTIONSEQUENCE ASC)
    FROM
        dbo.SPONSORSHIP AS s
        INNER JOIN dbo.SPONSORSHIPTRANSACTION AS st
            ON st.SPONSORSHIPCOMMITMENTID = s.SPONSORSHIPCOMMITMENTID
) AS x
WHERE
    x.row_sequence = 1
This was a quick example that is not tested. For future reference, if you want the best answer, it's a great idea to generate a temp table or test data set that's able to be used so someone can provide a full working example.
The example I gave shows what is called a windowing function. Take a look more into them; they help with selecting results when you see the word "sequence", need the first/last in a group, and more.
Hope this gives you some ideas! Welcome to Stack Overflow! 👋
I have the following query, which returns 1550 rows.
SELECT *
FROM V_InventoryMovements -- 2 seconds
ORDER BY V_InventoryMovements.TransDate -- 23 seconds
It takes about 2 seconds to return the results.
But when I include the ORDER BY clause, then it takes about 23 seconds.
It is a BIG change just for adding an ORDER BY.
I would like to know what is happening, and a way to improve the query with the ORDER BY. Simply removing the ORDER BY should not be the solution.
Here is a bit of information; please let me know if you need more.
V_InventoryMovements
CREATE VIEW [dbo].[V_InventoryMovements]
AS
SELECT some_fields
FROM FinTime
RIGHT OUTER JOIN V_Outbound ON FinTime.StdDate = dbo.TruncateDate(V_Outbound.TransDate)
LEFT OUTER JOIN ReasonCode_Grouping ON dbo.V_Outbound.ReasonCode = dbo.ReasonCode_Grouping.ReasonCode
LEFT OUTER JOIN Items ON V_Outbound.ITEM = Items.Item
LEFT OUTER JOIN FinTime AS FinTime2 ON V_Outbound.EventDay = FinTime2.StdDate
V_Outbound
CREATE VIEW [dbo].[V_Outbound]
AS
SELECT V_Outbound_WMS.*
FROM V_Outbound_WMS
UNION
SELECT V_Transactions_Calc.*
FROM V_Transactions_Calc
V_OutBound_WMS
CREATE VIEW [dbo].[V_OutBound_WMS]
AS
SELECT some_fields
FROM Transaction_Log
INNER JOIN MFL_StartDate ON Transaction_Log.TransDate >= MFL_StartDate.StartDate
LEFT OUTER JOIN Rack ON Transaction_Log.CHARGE = Rack.CHARGE AND Transaction_Log.CHARGE_LFD = Rack.CHARGE_LFD
V_Transactions_Calc
CREATE VIEW [dbo].[V_Transactions_Calc]
AS
SELECT some_fields
FROM Transactions_Calc
INNER JOIN MFL_StartDate ON dbo.Transactions_Calc.EventDay >= dbo.MFL_StartDate.StartDate
And here I will also share the part of the execution plan where you can see the main cost. I don't know exactly how to read it and improve the query. Let me know if you need to see the rest of the execution plan, but all the other parts are 0% of cost. The main cost is in the Nested Loops (Left Outer Join) operator, at 95%.
Execution Plan With ORDER BY
Execution Plan Without ORDER BY
I think the short answer is that the optimizer is executing in a different order in an attempt to minimize the cost of the sorting, and doing a poor job. Its job is made very hard by the views within views within views, as GuidoG suggests. You might be able to convince it to execute differently by creating some additional index or statistics, but it's going to be hard to advise on that remotely.
A possible workaround might be to select into a temp table, then apply the ordering afterwards:
SELECT *
INTO #temp
FROM V_InventoryMovements;
SELECT *
FROM #temp
ORDER BY TransDate
We have a huge database with over 100 tables and millions of rows.
I created a stored procedure for a job, tested it locally, and got 500,000 results in less than 10 seconds. I tested the same query on a second PC and waited about 1 hour for the same result.
The simple version of the query is:
select * from Table1
inner join Table2 on Table1.Table2Id = Table2.Id
where Table1.Segment = @segment
Table1: 38,553,864 rows
Table2: 10,647,167 rows
I used the execution plan and got the following result:
On the local PC I got the result:
(I could send the whole execution plan if needed)
The second PC is a virtual server (test system). It has a lot more memory and more space... I also stopped every job on the server and ran only the SQL query, but got the same result. So there isn't any SQL query blocking the tables.
Later I created an index on the foreign key of Table1 and tried to use it, but it didn't improve the query.
Does anyone have an idea where the problem could be and how I could solve it?
It would take a while to create an execution plan for both queries, but here are a few steps which already helped a lot. Thanks, guys, for your help.
The statistics on the tables on the second PC are from September last year. We don't use the query that much on the server, so an update of the statistics is a good point.
https://msdn.microsoft.com/en-us/library/ms190397(v=sql.120).aspx
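For example, with Table1/Table2 being the placeholders used above:

UPDATE STATISTICS dbo.Table1;
UPDATE STATISTICS dbo.Table2;
-- or refresh the statistics of every table in the database:
EXEC sp_updatestats;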
Another thing was to improve my SQL query. I removed the WHERE and added it as a condition on the first INNER JOIN, so it filters the rows of the first table before joining the huge number of rows from the second. (The WHERE filters out about 90% of the first table, and Table3 is really small.)
select * from Table1
inner join Table3 on Table1.Segment = @segment
and Table1.Table3Id = Table3.Id
inner join Table2 on Table1.Table2Id = Table2.Id
The next step: I created a SQL Server Agent job which rebuilds all indexes, so they are up to date.
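The core of such a job can be as simple as the following, again with the placeholder table names:

ALTER INDEX ALL ON dbo.Table1 REBUILD;
ALTER INDEX ALL ON dbo.Table2 REBUILD;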
It's already a lot better, but I'm still open to other inputs.
We have had an issue since a recent update on our database (I made this update, I am the guilty one here): one of the queries used was much slower since then. I tried to modify the query to get faster results, and managed to achieve my goal with temp tables, which is not bad, but I fail to understand why this solution performs better than a CTE-based one which does the same queries. Maybe it has to do with some tables being in a different DB?
Here's the query that performs badly (22 minutes on our hardware):
WITH CTE_Patterns AS (
SELECT
PEL.iId_purchased_email_list,
PELE.sEmail
FROM OtherDb.dbo.Purchased_Email_List PEL WITH(NOLOCK)
INNER JOIN OtherDb.dbo.Purchased_Email_List_Email AS PELE WITH(NOLOCK) ON PELE.iId_purchased_email_list = PEL.iId_purchased_email_list
WHERE PEL.bPattern = 1
),
CTE_Emails AS (
SELECT
ILE.iId_newsletterservice_import_list,
ILE.iId_newsletterservice_import_list_email,
ILED.sEmail
FROM dbo.NewsletterService_import_list_email AS ILE WITH(NOLOCK)
INNER JOIN dbo.NewsletterService_import_list_email_distinct AS ILED WITH(NOLOCK) ON ILED.iId_newsletterservice_import_list_email_distinct = ILE.iId_newsletterservice_import_list_email_distinct
WHERE ILE.iId_newsletterservice_import_list = 1000
)
SELECT I.iId_newsletterservice_import_list,
I.iId_newsletterservice_import_list_email,
BL.iId_purchased_email_list
FROM CTE_Patterns AS BL WITH(NOLOCK)
INNER JOIN CTE_Emails AS I WITH(NOLOCK) ON I.sEmail LIKE BL.sEmail
When running both CTE queries separately, they're super fast (0 seconds in SSMS, returning 122 rows and 13k rows). When running the full query, with the INNER JOIN on sEmail, it's super slow (22 minutes).
Here's the query that performs well with temp tables (0 seconds on our hardware), which does the exact same thing and returns the same result:
SELECT
PEL.iId_purchased_email_list,
PELE.sEmail
INTO #tb1
FROM OtherDb.dbo.Purchased_Email_List PEL WITH(NOLOCK)
INNER JOIN OtherDb.dbo.Purchased_Email_List_Email PELE ON PELE.iId_purchased_email_list = PEL.iId_purchased_email_list
WHERE PEL.bPattern = 1
SELECT
ILE.iId_newsletterservice_import_list,
ILE.iId_newsletterservice_import_list_email,
ILED.sEmail
INTO #tb2
FROM dbo.NewsletterService_import_list_email AS ILE WITH(NOLOCK)
INNER JOIN dbo.NewsletterService_import_list_email_distinct AS ILED ON ILED.iId_newsletterservice_import_list_email_distinct = ILE.iId_newsletterservice_import_list_email_distinct
WHERE ILE.iId_newsletterservice_import_list = 1000
SELECT I.iId_newsletterservice_import_list,
I.iId_newsletterservice_import_list_email,
BL.iId_purchased_email_list
FROM #tb1 AS BL WITH(NOLOCK)
INNER JOIN #tb2 AS I WITH(NOLOCK) ON I.sEmail LIKE BL.sEmail
DROP TABLE #tb1
DROP TABLE #tb2
Table stats:
OtherDb.dbo.Purchased_Email_List: 13 rows, 2 rows flagged bPattern = 1
OtherDb.dbo.Purchased_Email_List_Email: 324,289 rows, 122 rows with patterns (which are used in this issue)
dbo.NewsletterService_import_list_email: 15.5M rows
dbo.NewsletterService_import_list_email_distinct: ~1.5M rows
WHERE ILE.iId_newsletterservice_import_list = 1000 retrieves ~13k rows
I can post more info about tables on request.
Can someone help me understand this ?
UPDATE
Here is the query plan for the CTE query:
Here is the query plan with temp tables:
As you can see in the query plan, with CTEs the engine reserves the right to apply them basically as a lookup, even when you want a join.
If it isn't sure it can safely run the whole thing independently in advance, essentially generating a temp table, it may just run the CTE once for each row.
This is perfect for the recursive queries they can do like magic.
But you're seeing, in the nested Nested Loops, where it can go terribly wrong.
You're already finding the answer on your own by trying the real temp table.
Parallelism. If you look at your temp table query, the third query's plan indicates parallelism in both distributing and gathering the work of the first query, and parallelism again when combining the results of the first and second queries. The first query also, incidentally, has a relative cost of 77%. So in your temp table example the query engine was able to determine that the first query can benefit from parallelism, especially with the Gather Streams and Distribute Streams operators: it allows divvying up the work of the join, because the data is distributed in a way that lets the work be split up and then recombined. Notice the cost of the second query is 0%, so you can ignore it as having no cost other than when its results need to be combined.
Looking at the CTE plan, it is processed entirely serially, not in parallel. So somehow with the CTE the engine could not figure out that the first query can be run in parallel, nor the relationship between the first and second queries. It's possible that with multiple CTE expressions it assumes some dependency and does not look ahead far enough.
Another test you can do with the CTE is to keep CTE_Patterns but eliminate CTE_Emails by putting it as a derived table (subquery) in the third query. It would be interesting to see the execution plan for that, and whether there is parallelism when it is expressed that way.
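A sketch of that variant, untested, keeping CTE_Patterns and inlining CTE_Emails as a derived table:

WITH CTE_Patterns AS (
    SELECT
        PEL.iId_purchased_email_list,
        PELE.sEmail
    FROM OtherDb.dbo.Purchased_Email_List AS PEL
    INNER JOIN OtherDb.dbo.Purchased_Email_List_Email AS PELE ON PELE.iId_purchased_email_list = PEL.iId_purchased_email_list
    WHERE PEL.bPattern = 1
)
SELECT I.iId_newsletterservice_import_list,
    I.iId_newsletterservice_import_list_email,
    BL.iId_purchased_email_list
FROM CTE_Patterns AS BL
INNER JOIN (
    SELECT
        ILE.iId_newsletterservice_import_list,
        ILE.iId_newsletterservice_import_list_email,
        ILED.sEmail
    FROM dbo.NewsletterService_import_list_email AS ILE
    INNER JOIN dbo.NewsletterService_import_list_email_distinct AS ILED ON ILED.iId_newsletterservice_import_list_email_distinct = ILE.iId_newsletterservice_import_list_email_distinct
    WHERE ILE.iId_newsletterservice_import_list = 1000
) AS I ON I.sEmail LIKE BL.sEmail;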
In my experience it's best to use CTEs for recursion and temp tables when you need to join back to the data. That typically makes for a much faster query.