slow query performance issue with partition and max - sql-server

This is a poorly performing query I have ... what have I done so wrong?
Please help me: it is executed tons of times in my system, and solving this will give me a ladder to heaven.
I checked the system with sp_Blitz and no critical issues were found.
Here is the query:
SELECT MAX(F.id) OVER (PARTITION BY idstato ORDER BY F.id DESC) AS id
FROM jfel_tagxml_invoicedigi F
INNER JOIN jfel_invoice_state S ON F.id = S.idinvoice
WHERE S.idstato = @idstato
AND S.id = F.idstatocorrente
AND F.sequence_invoice % @number_service_installed = @idServizio
ORDER BY F.id DESC, F.idstatocorrente
OFFSET 0 ROWS FETCH NEXT 1 ROWS ONLY;
Here is the query plan
https://www.brentozar.com/pastetheplan/?id=SyYL5JOeE
I can share my system properties privately.
Update:
I made some modifications and it is better, but I think it could be better still...
Here is the new query:
SELECT MAX(F.id) AS id
FROM jfel_tagxml_invoicedigi F
INNER JOIN jfel_invoice_state S ON F.id = S.idinvoice
WHERE S.idstato = @idstato
AND S.id = F.idstatocorrente
AND F.sequence_invoice % @number_service_installed = @idServizio;
And the new plan:
https://www.brentozar.com/pastetheplan/?id=SJ-5GDqeE
Update:
I made more modifications and it is better again, but I think it could still be better...
Here is the new query:
SELECT TOP 1 F.id AS id
FROM jfel_tagxml_invoicedigi AS F
INNER JOIN jfel_invoice_state AS S
ON F.idstatocorrente = S.id
WHERE S.idstato = 1 AND S.id = F.idstatocorrente
AND S.datastato > DATEADD(DAY, -5, GETDATE())
AND F.progressivo_fattura % 1 = 0
ORDER BY S.datastato
And the new plan:
https://www.brentozar.com/pastetheplan/?id=S1xRkL51S

Filtering on calculated fields can hurt performance badly. Apply your other filters first, and do the calculated filter as a last step, so that it has fewer rows to match. This may fill TEMPDB, because the intermediate recordset is stored there; in that case you can either increase its size or use another method.
Here is your second query written like this (you may need to adjust it; I just wrote it in Notepad++):
SELECT MAX(id) AS id
FROM (
SELECT F.id, F.sequence_invoice % @number_service_installed AS idServizio
FROM jfel_tagxml_invoicedigi F
INNER JOIN jfel_invoice_state S ON F.id = S.idinvoice
WHERE S.idstato = @idstato
AND S.id = F.idstatocorrente
-- AND F.sequence_invoice % @number_service_installed = @idServizio
) AS T
WHERE T.idServizio = @idServizio
;
Instead of the subquery, you can also try a temp table or a CTE; maybe one is the clear winner over the others, so it is worth trying all of them if you want maximum performance.
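For instance, a minimal sketch of the CTE variant, reusing the tables and parameters from the question (untested, adjust as needed):
-- Sketch only: same filtering order expressed as a CTE.
WITH filtered AS (
    SELECT F.id,
           F.sequence_invoice % @number_service_installed AS idServizio
    FROM jfel_tagxml_invoicedigi F
    INNER JOIN jfel_invoice_state S ON F.id = S.idinvoice
    WHERE S.idstato = @idstato
      AND S.id = F.idstatocorrente
)
SELECT MAX(id) AS id
FROM filtered
WHERE idServizio = @idServizio;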

The date calculation is non-SARGable. You could try using a variable with OPTION (RECOMPILE):
DECLARE @d date
SET @d = DATEADD(DAY, -5, GETDATE())
SELECT TOP 1 F.id AS id
FROM jfel_tagxml_invoicedigi AS F
INNER JOIN jfel_invoice_state AS S
ON F.idstatocorrente = S.id
WHERE S.idstato = 1 AND S.id = F.idstatocorrente
AND S.datastato > @d
AND F.progressivo_fattura % 1 = 0
ORDER BY S.datastato
OPTION (RECOMPILE)

I think you need a NONCLUSTERED INDEX for the query you describe above.
If you cannot identify which fields of your table the nonclustered index needs, simply generate an actual execution plan from SQL Server Management Studio: when an index is missing, SQL Server shows the missing-index details as green text at the top of the plan.
Hover your mouse pointer over the missing-index text and Management Studio will show the T-SQL code required to create the missing index, or right-click on the missing-index text and select Missing Index Details from the menu to see its full definition.
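For illustration only, such a suggestion for the third query might come out looking roughly like this; the key and INCLUDE columns are my guess from the WHERE and JOIN clauses, not taken from your actual plan:
-- Hypothetical sketch of a missing-index suggestion; verify the column
-- list against what the execution plan actually recommends.
CREATE NONCLUSTERED INDEX IX_jfel_invoice_state_idstato_datastato
ON jfel_invoice_state (idstato, datastato)
INCLUDE (id, idinvoice);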
For more information, you can visit this article Create Missing Index From the Actual Execution Plan
I hope this solution helps you.

Window aggregations all carry a very big performance penalty. Taking this sliding-window mechanism outside the database (i.e. into your application's RAM) is the universal way to optimize it.
Otherwise, you can try to give more RAM to each database session (in PostgreSQL you can tweak this via a parameter; in other databases you may or may not be able to).
The main reason it is so slow is that it forces a sort and materializes the sorted table.

Related

Slow performing T-SQL query with two joins to the same table

I am struggling with figuring out what is happening with the T-SQL query shown below.
You will see two inner joins to the same table, although with different join criteria. The first join by itself runs in approximately 21 seconds and if I run the second join by itself it completes in approximately 27 seconds.
If I leave both joins in place, the query runs and runs, until I finally stop it. The appropriate indices appear to be in place. I know this query runs in a different environment with less horsepower; the only difference is that the other server is running SQL Server 2012 and I am running SQL Server 2016, although the database is in 2012 compatibility mode:
This join runs in ~21 seconds.
SELECT
COUNT(*)
FROM
dbo.SPONSORSHIP as s
INNER JOIN
dbo.SPONSORSHIPTRANSACTION AS st
ON st.SPONSORSHIPCOMMITMENTID = s.SPONSORSHIPCOMMITMENTID
AND st.TRANSACTIONSEQUENCE = (SELECT MIN(TRANSACTIONSEQUENCE)
FROM dbo.SPONSORSHIPTRANSACTION AS ms
WHERE ms.SPONSORSHIPCOMMITMENTID = s.SPONSORSHIPCOMMITMENTID
AND ms.TARGETSPONSORSHIPID = s.ID)
This join runs in ~27 seconds.
SELECT
COUNT(*)
FROM
dbo.SPONSORSHIP AS s
INNER JOIN
dbo.SPONSORSHIPTRANSACTION AS lt ON lt.SPONSORSHIPCOMMITMENTID = s.SPONSORSHIPCOMMITMENTID
AND lt.TRANSACTIONSEQUENCE = (SELECT MAX(TRANSACTIONSEQUENCE)
FROM dbo.SPONSORSHIPTRANSACTION AS ms
WHERE ms.SPONSORSHIPCOMMITMENTID = s.SPONSORSHIPCOMMITMENTID
AND s.ID IN (ms.CONTEXTSPONSORSHIPID,
ms.TARGETSPONSORSHIPID,
ms.DECLINEDSPONSORSHIPID)
AND ms.ACTIONCODE <> 9)
These are both correlated subqueries. You should typically avoid this pattern, as it causes what is known as "RBAR", which is "Row By Agonizing Row". Before you focus on troubleshooting this particular query, I'd suggest revisiting the query itself to see if you can solve it with a more set-based approach. In most cases you'll find you have other ways to accomplish this and can cut the cost down dramatically.
As one example:
select
total_count = count(*)
from
(
SELECT
st.SPONSORSHIPCOMMITMENTID
,row_sequence = row_number() over(partition by st.SPONSORSHIPCOMMITMENTID
                                  order by st.TRANSACTIONSEQUENCE asc)
FROM
dbo.SPONSORSHIP as s
INNER JOIN dbo.SPONSORSHIPTRANSACTION AS st
ON st.SPONSORSHIPCOMMITMENTID = s.SPONSORSHIPCOMMITMENTID
) as x
where
x.row_sequence = 1
This was a quick example and is not tested. For future reference, if you want the best answer, it's a great idea to provide a temp table or test data set that can be used, so someone can give you a full working example.
The example I gave uses what is called a windowing function; a sketch of another one follows below. Look into them for help with selecting results whenever you see the word "sequence", need the first/last row in a group, and more.
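As a quick illustrative sketch (my example, not tested against your data): FIRST_VALUE and LAST_VALUE can pick the boundary rows per group, though LAST_VALUE needs an explicit frame to behave as you'd expect:
-- Illustrative only: first and last TRANSACTIONSEQUENCE per commitment.
SELECT DISTINCT
    SPONSORSHIPCOMMITMENTID,
    FIRST_VALUE(TRANSACTIONSEQUENCE) OVER (
        PARTITION BY SPONSORSHIPCOMMITMENTID
        ORDER BY TRANSACTIONSEQUENCE) AS first_sequence,
    LAST_VALUE(TRANSACTIONSEQUENCE) OVER (
        PARTITION BY SPONSORSHIPCOMMITMENTID
        ORDER BY TRANSACTIONSEQUENCE
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_sequence
FROM dbo.SPONSORSHIPTRANSACTION;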
Hope this gives you some ideas! Welcome to Stack Overflow! 👋

Why does SQL Server do a scan on joins when there are no records in the source table

The idea of the below query is to use the CTE to get the primary key of all rows in [Archive].[tia_tia_object] that meet the filter.
The execution time for the query within the CTE is 0 seconds.
The second part is supposed to join to other tables to filter the data some more, but only if any rows were returned by the CTE. This was the only way I could get SQL Server to use the correct indexes.
Why does it spend time (see execution plan) looking in TIA_TIA_AGREEMENT_LINE and TIA_TIA_OBJECT, when CTE returns 0 rows?
WITH cte_vehicle
AS (SELECT O.[Seq_no],
O.Object_No
FROM [Archive].[tia_tia_object] O
WHERE O.RECORD_TIMESTAMP >
(SELECT LastLoadTimeStamp FROM staging.Ufngetlastloadtimestamp('Staging.CoveredObject'))
AND O.[Meta_iscurrent] = 1
AND O.OBJECT_TYPE IN ( 'BIO01', 'CAO01', 'DKV', 'GFO01',
'KMA', 'KNO01', 'MCO01', 'VEO01',
'SVO01', 'AUO01' ))
SELECT O.[Seq_no] AS [Bkey_CoveredObject],
Cast(O.[Agr_Line_No] AS BIGINT) AS [Agr_Line_No],
O.[Cover_Start_Date] AS [CoverageFrom],
O.[Cover_End_Date] AS [CoverageTo],
O.[Timestamp] AS [TIMESTAMP],
O.[Record_Timestamp] AS [RECORD_TIMESTAMP],
O.[Newest] AS [Newest],
O.LOCATION_ID AS LocationNo,
O.[Cust_no],
O.[N01]
FROM cte_vehicle AS T
INNER JOIN [Archive].[tia_tia_object] O
ON t.Object_No = O.Object_No
AND t.Seq_No = O.Seq_No
INNER JOIN [Archive].[tia_tia_agreement_line] AL
ON O.Agr_line_no = AL.Agr_line_no
INNER JOIN [Archive].[tia_tia_policy] P
ON AL.Policy_no = P.Policy_no
WHERE P.[Transaction_type] <> 'D'
Execution plan:
Because it still needs to check and look for records. Even if there are no records in that table, it doesn't know that until it actually checks.
Much like if someone gives you a sealed box: you don't know whether it's empty until you open it.
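One possible workaround, offered as my own sketch rather than part of the original answer: materialize the CTE into a temp table first, so the expensive joins only run when it actually returned rows:
-- Sketch: materialize the CTE, then skip the big joins if it is empty.
-- (Filters abbreviated; reuse the full WHERE clause from the CTE.)
SELECT O.[Seq_no], O.Object_No
INTO #cte_vehicle
FROM [Archive].[tia_tia_object] O
WHERE O.[Meta_iscurrent] = 1
  AND O.OBJECT_TYPE IN ('BIO01', 'CAO01', 'DKV', 'GFO01');

IF EXISTS (SELECT 1 FROM #cte_vehicle)
BEGIN
    SELECT O.[Seq_no] AS [Bkey_CoveredObject], O.[Cust_no]
    FROM #cte_vehicle AS T
    INNER JOIN [Archive].[tia_tia_object] O
        ON T.Object_No = O.Object_No AND T.Seq_no = O.Seq_no
    INNER JOIN [Archive].[tia_tia_agreement_line] AL
        ON O.Agr_line_no = AL.Agr_line_no;
END;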

Why LEFT JOIN increase query time so much?

I'm using SQL Server 2012 and have encountered a strange problem.
This is the original query I've been using:
DELETE FROM [TABLE_TEMP]
INSERT INTO [TABLE_TEMP]
SELECT H.*, NULL
FROM [TABLE_Accounts_History] H
INNER JOIN [TABLE_For_Filtering] A ON H.[RSIN] = A.[RSIN]
WHERE
H.[NUM] = (SELECT TOP 1 [NUM] FROM [TABLE_Accounts_History]
WHERE [RSIN] = H.[RSIN]
AND [AccountSys] = H.[AccountSys]
AND [Cl_Acc_Typ] = H.[Cl_Acc_Typ]
AND [DATE_DEAL] < @dte
ORDER BY [DATE_DEAL] DESC)
AND H.[TYPE_DEAL] <> 'D'
Table TABLE_Accounts_History consists of 3 200 000 records.
Table TABLE_For_Filtering is around 1 500 records.
Insert took me 2m 40s and inserted 1 600 000 records for further work.
But then I decided to attach a column from pretty small table TABLE_Additional (only around 100 recs):
DELETE FROM [TABLE_TEMP]
INSERT INTO [TABLE_TEMP]
SELECT H.*, P.[prof_type]
FROM [TABLE_Accounts_History] H
INNER JOIN [TABLE_For_Filtering] A ON H.[RSIN] = A.[RSIN]
LEFT JOIN [TABLE_Additional] P ON H.[ACCOUNTSYS] = P.[AccountSys]
WHERE H.[NUM] = ( SELECT TOP 1 [NUM]
FROM [TABLE_Accounts_History]
WHERE [RSIN] = H.[RSIN]
AND [AccountSys] = H.[AccountSys]
AND [Cl_Acc_Typ] = H.[Cl_Acc_Typ]
AND [DATE_DEAL] < @dte
ORDER BY [DATE_DEAL] DESC)
AND H.[TYPE_DEAL] <> 'D'
And now it takes ages for this query to complete. Why is that? How can such a small left join dump performance like this? How can I improve it?
An update: no luck so far with the LEFT JOIN. Indexes, no indexes, hinted indexes... For now I've found a workaround by using my first query and an UPDATE after it:
UPDATE [TABLE_TEMP]
SET [PROF_TYPE] = P1.[prof_type]
FROM [TABLE_TEMP] A1
LEFT JOIN
[TABLE_Additional] P1
ON A1.[ACCOUNTSYS] = P1.[AccountSys]
It takes only 5s and does pretty much the same thing I've been trying to achieve. SQL Server performance is still a mystery to me.
The 'small' left join is actually doing a lot of extra work for you. SQL Server has to go back to TABLE_Additional for each row from your inner join between TABLE_Accounts_History and TABLE_For_Filtering. You can help SQL Server speed this up with some indexing; example statements are sketched after this list. You could:
1) Ensure TABLE_Accounts_History has an index on the foreign key H.[ACCOUNTSYS]
2) If you think that TABLE_Additional will always be accessed by AccountSys, i.e. you will be requesting AccountSys in ordered groups, create a clustered index on TABLE_Additional.AccountSys (in other words, physically order the table on disk by AccountSys)
3) Also ensure there is a foreign key index on TABLE_Accounts_History.
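Here are untested sketches of the indexes described above; adjust the names to your schema:
-- (1) Nonclustered index on the join column of the big history table.
CREATE NONCLUSTERED INDEX IX_Accounts_History_AccountSys
ON [TABLE_Accounts_History] ([ACCOUNTSYS]);

-- (2) Clustered index physically ordering the small lookup table.
CREATE CLUSTERED INDEX CX_Additional_AccountSys
ON [TABLE_Additional] ([AccountSys]);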
A LEFT OUTER JOIN selects all rows from the left table. In your case the left table has 3,200,000 rows, each of which is then compared against the right table. One solution is to use indexes, which will reduce retrieval time.

How to improve SQL Query Performance

I have the following DB Structure (simplified):
Payments
----------------------
Id | int
InvoiceId | int
Active | bit
Processed | bit
Invoices
----------------------
Id | int
CustomerOrderId | int
CustomerOrders
------------------------------------
Id | int
ApprovalDate | DateTime
ExternalStoreOrderNumber | nvarchar
Each Customer Order has an Invoice and each Invoice can have multiple Payments.
The ExternalStoreOrderNumber is a reference to the order from the external partner store we imported the order from and the ApprovalDate the timestamp when that import happened.
Now we have the problem that we had a wrong import and need to change some payments to other invoices (several hundred, so too many to do by hand) according to the following logic:
Find the invoice of the order which has the same external number as the current one, but starting with 0 instead of the current digit.
To do that I created the following query:
UPDATE DB.dbo.Payments
SET InvoiceId=
(SELECT TOP 1 I.Id FROM DB.dbo.Invoices AS I
WHERE I.CustomerOrderId=
(SELECT TOP 1 O.Id FROM DB.dbo.CustomerOrders AS O
WHERE O.ExternalOrderNumber='0'+SUBSTRING(
(SELECT TOP 1 OO.ExternalOrderNumber FROM DB.dbo.CustomerOrders AS OO
WHERE OO.Id=I.CustomerOrderId), 1, 10000)))
WHERE Id IN (
SELECT P.Id
FROM DB.dbo.Payments AS P
JOIN DB.dbo.Invoices AS I ON I.Id=P.InvoiceId
JOIN DB.dbo.CustomerOrders AS O ON O.Id=I.CustomerOrderId
WHERE P.Active=0 AND P.Processed=0 AND O.ApprovalDate='2012-07-19 00:00:00')
Now I started that query on a test system using the live data (~250,000 rows in each table) and it has been running for 16 hours. Did I do something completely wrong in the query, or is there a way to speed it up a little?
It is not required to be really fast, as it is a one-time task, but several hours seems long to me, and since I want to learn for the (hopefully never happening) next time, I would like some feedback on how to improve.
You might as well kill the query. Your update subquery is completely uncorrelated with the table being updated. From the looks of it, when it completes, EVERY SINGLE dbo.Payments record will have the same value.
To break down your query, you might find that the subquery runs fine on its own.
SELECT TOP 1 I.Id FROM DB.dbo.Invoices AS I
WHERE I.CustomerOrderId=
(SELECT TOP 1 O.Id FROM DB.dbo.CustomerOrders AS O
WHERE O.ExternalOrderNumber='0'+SUBSTRING(
(SELECT TOP 1 OO.ExternalOrderNumber FROM DB.dbo.CustomerOrders AS OO
WHERE OO.Id=I.CustomerOrderId), 1, 10000))
That is always a BIG worry.
The next thing is that it is running this row-by-row for every record in the table.
You are also double-dipping into Payments by selecting from it where the id comes from a join involving the same table. You can reference the table being updated in the JOIN clause using this pattern:
UPDATE P
....
FROM DB.dbo.Payments AS P
JOIN DB.dbo.Invoices AS I ON I.Id=P.InvoiceId
JOIN DB.dbo.CustomerOrders AS O ON O.Id=I.CustomerOrderId
WHERE P.Active=0 AND P.Processed=0 AND O.ApprovalDate='2012-07-19 00:00:00'
Moving on, another mistake is using TOP without ORDER BY; that's asking for random results. If you know there's only one result, you wouldn't even need TOP. In this case, maybe you're OK with randomly choosing one from many possible matches. Since you have three levels of TOP 1 without ORDER BY, you might as well mash them all up (join) and take a single TOP 1 across all of them. That would make it look like this:
SET InvoiceId=
(SELECT TOP 1 I.Id
FROM DB.dbo.Invoices AS I
JOIN DB.dbo.CustomerOrders AS O
ON I.CustomerOrderId=O.Id
JOIN DB.dbo.CustomerOrders AS OO
ON O.ExternalOrderNumber='0'+SUBSTRING(OO.ExternalOrderNumber,1,100)
AND OO.Id=I.CustomerOrderId)
However, as I mentioned very early on, this is not being correlated to the main FROM clause at all. We move the entire search into the main query so that we can make use of JOIN-based set operations rather than row-by-row subqueries.
Before I show the final query (fully commented), I think your SUBSTRING is supposed to implement "but starts with 0 instead of the current digit". If I read that correctly, it means that for an order number '5678' you are looking for '0678', which in turn means the SUBSTRING should use (2, 10000) instead of (1, 10000).
UPDATE P
SET InvoiceId=II.Id
FROM DB.dbo.Payments AS P
-- invoices for payments
JOIN DB.dbo.Invoices AS I ON I.Id=P.InvoiceId
-- orders for invoices
JOIN DB.dbo.CustomerOrders AS O ON O.Id=I.CustomerOrderId
-- another order with '0' as leading digit
JOIN DB.dbo.CustomerOrders AS OO
ON OO.ExternalOrderNumber='0'+substring(O.ExternalOrderNumber,2,1000)
-- invoices for this other order
JOIN DB.dbo.Invoices AS II ON OO.Id=II.CustomerOrderId
-- conditions for the Payments records
WHERE P.Active=0 AND P.Processed=0 AND O.ApprovalDate='2012-07-19 00:00:00'
It is worth noting that SQL Server allows UPDATE .. FROM .. JOIN, which is less supported by other DBMSs, e.g. Oracle. This matters because, for a single row in Payments (the update target), I hope you can see that there could be many candidate II.Id values to choose from across all the joins; you will get a random possible II.Id.
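If you want that pick to be deterministic instead, one possible variation (my sketch; ordering by II.Id is an arbitrary but repeatable tiebreaker) is to fetch the target invoice with CROSS APPLY:
-- Sketch: deterministic variant; ORDER BY II.Id is an assumed tiebreaker.
UPDATE P
SET InvoiceId = X.Id
FROM DB.dbo.Payments AS P
JOIN DB.dbo.Invoices AS I ON I.Id = P.InvoiceId
JOIN DB.dbo.CustomerOrders AS O ON O.Id = I.CustomerOrderId
CROSS APPLY (
    SELECT TOP 1 II.Id
    FROM DB.dbo.CustomerOrders AS OO
    JOIN DB.dbo.Invoices AS II ON II.CustomerOrderId = OO.Id
    WHERE OO.ExternalOrderNumber = '0' + SUBSTRING(O.ExternalOrderNumber, 2, 1000)
    ORDER BY II.Id
) AS X
WHERE P.Active = 0 AND P.Processed = 0 AND O.ApprovalDate = '2012-07-19 00:00:00';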
I think something like this will be more efficient, if I understood your query correctly. I wrote it by hand and didn't run it, so it may have some syntax errors.
UPDATE DB.dbo.Payments
SET InvoiceId=(SELECT TOP 1 I.Id FROM DB.dbo.Invoices AS I
inner join DB.dbo.CustomerOrders AS O ON I.CustomerOrderId=O.Id
inner join DB.dbo.CustomerOrders AS OO On OO.Id=I.CustomerOrderId
and O.ExternalOrderNumber='0'+SUBSTRING(OO.ExternalOrderNumber, 1, 10000))
FROM DB.dbo.Payments
JOIN DB.dbo.Invoices AS I ON I.Id=Payments.InvoiceId
JOIN DB.dbo.CustomerOrders AS O ON O.Id=I.CustomerOrderId
WHERE Payments.Active=0
AND Payments.Processed=0
AND O.ApprovalDate='2012-07-19 00:00:00'
Try to re-write it using JOINs. This will highlight some of the problems. Will the following do just the same? (The queries are somewhat different, but I guess this is roughly what you're trying to do.)
UPDATE P
SET InvoiceId = I.Id
FROM DB.dbo.Payments AS P
CROSS JOIN DB.dbo.Invoices AS I
INNER JOIN DB.dbo.CustomerOrders AS O
ON I.CustomerOrderId = O.Id
INNER JOIN DB.dbo.CustomerOrders AS OO
ON O.ExternalOrderNumber = '0' + SUBSTRING(OO.ExternalOrderNumber, 1, 10000)
AND OO.Id = I.CustomerOrderId
WHERE P.Active=0 AND P.Processed=0 AND O.ApprovalDate='2012-07-19 00:00:00'
As you see, two problems stand out:
The unconditional join between Payments and Invoices (of course, you've capped it with a TOP 1, but set-wise it's still unconditional). I'm not really sure whether this really is a problem in your query; it will be in mine though :).
The join on a 10000-character column (the SUBSTRING), embodied in a condition. This is highly inefficient.
If you need a one-time speedup, just take the queries on each table, store the in-between results in temporary tables, create indices on those temporary tables, and use the temporary tables to perform the update.
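A rough sketch of that temp-table idea, reusing the names from the question (the staging steps and index choice are my assumptions):
-- Step 1: stage the old-order -> new-order mapping once.
SELECT O.Id AS OldOrderId, OO.Id AS NewOrderId
INTO #OrderMap
FROM DB.dbo.CustomerOrders AS O
JOIN DB.dbo.CustomerOrders AS OO
    ON OO.ExternalOrderNumber = '0' + SUBSTRING(O.ExternalOrderNumber, 2, 1000)
WHERE O.ApprovalDate = '2012-07-19 00:00:00';

-- Step 2: index the staging table for the join below.
CREATE INDEX IX_OrderMap_OldOrderId ON #OrderMap (OldOrderId);

-- Step 3: update the payments through the staged mapping.
UPDATE P
SET InvoiceId = II.Id
FROM DB.dbo.Payments AS P
JOIN DB.dbo.Invoices AS I ON I.Id = P.InvoiceId
JOIN #OrderMap AS M ON M.OldOrderId = I.CustomerOrderId
JOIN DB.dbo.Invoices AS II ON II.CustomerOrderId = M.NewOrderId
WHERE P.Active = 0 AND P.Processed = 0;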

Need help optimizing database query. Very inexperienced with indices

I need to optimize this query. The professor recommends using indices, but I'm very confused about how. If I could get just one example of what a good index is and why, plus the actual code needed, I could definitely do the rest by myself. Any help would be awesome. (PostgreSQL, btw)
SELECT
x.enteredBy
, x.id
, count(DISTINCT xr.id)
, count(DISTINCT c.id)
, 'l'
FROM
((locationsV x left outer join locationReviews xr on x.id = xr.lid)
left outer join reviews r on r.id = xr.id)
left outer join comments c on xr.id = c.reviewId
WHERE
x.vNo = 0
AND (r.enteredBy IS NULL OR
(r.enteredBy <> x.enteredBy
AND c.enteredBy <> x.enteredBy
AND r.enteredBY NOT IN
(SELECT requested FROM friends WHERE requester = x.enteredBY)
AND r.enteredBY NOT IN
(SELECT requester FROM friends WHERE requested = x.enteredBY)))
AND (c.enteredBy IS NULL OR
(c.enteredBY NOT IN
(SELECT requested FROM friends WHERE requester = x.enteredBY)
AND c.enteredBY NOT IN
(SELECT requester FROM friends WHERE requested = x.enteredBY)))
GROUP BY
x.enteredBy
, x.id
I tried adding something like this to the beginning, but the overall time it took didn't change.
CREATE INDEX friends1_idx ON friends(requested);
CREATE INDEX friends2_idx ON friends(requester);
I think the SQL itself could be optimized to improve performance, in addition to looking at indexes. Having those IN clauses in the WHERE clause may cause the optimizer to do full table scans, so if you could move them to be tables in the FROM section you would get better performance. Also, the COUNT(DISTINCT ...) clauses in the SELECT statement seem problematic. You would likely be better off if you could make changes so the DISTINCT clauses were unnecessary there and you could simply use the COUNT aggregate function.
Consider using a SQL statement in the FROM clause before you do the left join; a structure something like this:
SELECT ...
FROM Table1 LEFT JOIN
(SELECT ... FROM Table2 INNER JOIN Table3 ON ...) AS Table4 ON
Table1.somecolumn = Table4.somecolumn
...
I know this isn't giving you the solution, but hopefully it helps you think about other aspects of the problem and explore other ways to address performance. One concrete index example follows below.
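Since the question asks for one concrete index and the reasoning: each NOT IN subquery filters on one friends column and returns the other, so a two-column index in each direction can answer the subquery from the index alone. A hedged example in PostgreSQL syntax (the column choice is my reading of the subqueries):
-- Each composite index covers one lookup direction used by the subqueries.
CREATE INDEX friends_requester_requested_idx ON friends (requester, requested);
CREATE INDEX friends_requested_requester_idx ON friends (requested, requester);
Compared to the two single-column indexes you tried, the composite versions let each subquery be satisfied without touching the table itself.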
