Optimising query - sql-server

This view query takes 2 minutes to load 500,000 rows.
I'm using EF 4.0, and a background thread loads this view into a DataGrid.
How can I optimize it so that it loads faster?
Update: I changed the query to the one below and it now takes 55 seconds, which is still too long!
SELECT ROW_NUMBER() OVER ( ORDER BY ss.IDStore_CardStore DESC ) AS Id_PK ,
       SC.IDStockCardIndex ,
       SC.Designation ,
       ISNULL(P.PMP, 0) PMP ,
       ISNULL(SS.Quantity, 0) Quantity ,
       SS.UnitePrice * SS.Quantity AS SubTotalStockCard ,
       S.StoreName ,
       SS.IDPurchaseInvoice
FROM dbo.Stores S
     INNER JOIN dbo.StockCardsStores ss ON S.IDStore = ss.IDStore
     RIGHT OUTER JOIN dbo.StockCard SC ON ss.IDStockCardIndex = SC.IDStockCardIndex
     LEFT OUTER JOIN ( SELECT SUM(UnitePrice * Quantity) / SUM(Quantity) AS PMP ,
                              IDStockCardIndex
                       FROM dbo.StockCardsStores AS SCS
                       GROUP BY IDStockCardIndex
                     ) AS P ON P.IDStockCardIndex = SC.IDStockCardIndex

Use the Estimated Execution Plan in SSMS. If you're using 2008 R2, SSMS will suggest a "missing index" that may improve the overall time. 500,000 rows in 55 seconds suggests that one or more table scans are kicking in, and the Estimated Execution Plan will identify them. The plan also shows which part of the query "costs" the most overall, helping you to zero in.
Highlight the inner sub-query and look at the plan for that first. Then work your way outwards.
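If the graphical plan is hard to read, the same missing-index suggestions can also be listed from the dynamic management views. The following is a minimal sketch, assuming SQL Server 2005 or later and VIEW SERVER STATE permission. The ordering expression is only a rough heuristic, and the suggestions are hints to evaluate, not indexes to create blindly:

```sql
-- List missing-index suggestions recorded since the last restart,
-- roughly ordered by estimated benefit (heuristic, not an exact measure).
SELECT TOP (20)
       mid.statement AS table_name,
       mid.equality_columns,
       mid.inequality_columns,
       mid.included_columns,
       migs.user_seeks,
       migs.avg_total_user_cost,
       migs.avg_user_impact
FROM sys.dm_db_missing_index_details AS mid
INNER JOIN sys.dm_db_missing_index_groups AS mig
        ON mig.index_handle = mid.index_handle
INNER JOIN sys.dm_db_missing_index_group_stats AS migs
        ON migs.group_handle = mig.index_group_handle
WHERE mid.database_id = DB_ID()   -- current database only
ORDER BY migs.avg_total_user_cost
         * migs.avg_user_impact
         * (migs.user_seeks + migs.user_scans) DESC;
```

Run the slow view query first so that the optimizer has a chance to record its suggestions, then look for rows that mention dbo.StockCardsStores or dbo.StockCard.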

Related

Slow performing T-SQL query with two joins to the same table

I am struggling with figuring out what is happening with the T-SQL query shown below.
You will see two inner joins to the same table, although with different join criteria. The first join by itself runs in approximately 21 seconds and if I run the second join by itself it completes in approximately 27 seconds.
If I leave both joins in place, the query runs and runs until I finally stop it. The appropriate indexes appear to be in place, and I know this query runs in a different environment with less horsepower. The only difference is that the other server is running SQL Server 2012 and I am running SQL Server 2016, although the database is in 2012 compatibility mode.
This join runs in ~21 seconds.
SELECT
    COUNT(*)
FROM
    dbo.SPONSORSHIP AS s
    INNER JOIN dbo.SPONSORSHIPTRANSACTION AS st
        ON st.SPONSORSHIPCOMMITMENTID = s.SPONSORSHIPCOMMITMENTID
        AND st.TRANSACTIONSEQUENCE = (SELECT MIN(TRANSACTIONSEQUENCE)
                                      FROM dbo.SPONSORSHIPTRANSACTION AS ms
                                      WHERE ms.SPONSORSHIPCOMMITMENTID = s.SPONSORSHIPCOMMITMENTID
                                        AND ms.TARGETSPONSORSHIPID = s.ID)
This join runs in ~27 seconds.
SELECT
    COUNT(*)
FROM
    dbo.SPONSORSHIP AS s
    INNER JOIN dbo.SPONSORSHIPTRANSACTION AS lt
        ON lt.SPONSORSHIPCOMMITMENTID = s.SPONSORSHIPCOMMITMENTID
        AND lt.TRANSACTIONSEQUENCE = (SELECT MAX(TRANSACTIONSEQUENCE)
                                      FROM dbo.SPONSORSHIPTRANSACTION AS ms
                                      WHERE ms.SPONSORSHIPCOMMITMENTID = s.SPONSORSHIPCOMMITMENTID
                                        AND s.ID IN (ms.CONTEXTSPONSORSHIPID,
                                                     ms.TARGETSPONSORSHIPID,
                                                     ms.DECLINEDSPONSORSHIPID)
                                        AND ms.ACTIONCODE <> 9)
These are both correlated subqueries. You should typically avoid this pattern, as it tends to cause what is known as "RBAR", which stands for "Row By Agonizing Row". Before you focus on troubleshooting this particular query, I'd suggest revisiting the query itself to see whether you can solve this with a more set-based approach. In most cases you'll find there are other ways to accomplish this that cut the cost dramatically.
As one example:
select
    total_count
    ,row_sequence
from
(
    SELECT
        total_count = COUNT(*) OVER ()  -- windowed count, so no GROUP BY is required
        ,row_sequence = row_number() over(order by st.TRANSACTIONSEQUENCE asc)
    FROM
        dbo.SPONSORSHIP as s
        INNER JOIN dbo.SPONSORSHIPTRANSACTION AS st
            ON st.SPONSORSHIPCOMMITMENTID = s.SPONSORSHIPCOMMITMENTID
) as x
where
    x.row_sequence = 1
This was a quick example that is not tested. For future reference, if you want the best answer, it's a great idea to provide a temp table or test data set so that someone can give you a full working example.
The example I gave uses what is called a window function. Look into them further: they help with selecting results whenever you see the word "sequence" or need the first/last row in a group, and more.
Hope this gives you some ideas! Welcome to Stack Overflow! 👋
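To make the window-function idea concrete for the first query (the MIN(TRANSACTIONSEQUENCE) join), here is an untested sketch. It assumes that SPONSORSHIP.ID is unique and that TRANSACTIONSEQUENCE is non-NULL and unique within a SPONSORSHIPCOMMITMENTID, so that the lowest-sequence row targeting the sponsorship is the same row the original join finds. If those assumptions do not hold, the counts can differ:

```sql
-- Number each sponsorship's candidate transactions by sequence,
-- then count only the first one per sponsorship.
SELECT COUNT(*)
FROM (
    SELECT s.ID,
           ROW_NUMBER() OVER (PARTITION BY s.ID
                              ORDER BY st.TRANSACTIONSEQUENCE ASC) AS row_sequence
    FROM dbo.SPONSORSHIP AS s
    INNER JOIN dbo.SPONSORSHIPTRANSACTION AS st
            ON st.SPONSORSHIPCOMMITMENTID = s.SPONSORSHIPCOMMITMENTID
           AND st.TARGETSPONSORSHIPID = s.ID
) AS x
WHERE x.row_sequence = 1;
```

The point of the PARTITION BY is that the "first row" is picked per sponsorship in a single pass over the join, instead of re-running a MIN() subquery for every row.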

slow query performance issue with partition and max

This is a poorly performing query of mine ... what have I done wrong?
Please help me; it is executed a huge number of times in my system, so fixing it would be my stairway to heaven.
I checked the system with sp_Blitz and no critical issues were found.
Here is the query:
SELECT MAX(F.id) OVER (PARTITION BY idstato ORDER BY F.id DESC) AS id
FROM jfel_tagxml_invoicedigi F
INNER JOIN jfel_invoice_state S ON F.id = S.idinvoice
WHERE S.idstato = @idstato
  AND S.id = F.idstatocorrente
  AND F.sequence_invoice % @number_service_installed = @idServizio
ORDER BY F.id DESC,
         F.idstatocorrente
OFFSET 0 ROWS FETCH NEXT 1 ROWS ONLY;
Here is the query plan
https://www.brentozar.com/pastetheplan/?id=SyYL5JOeE
I can send you my system properties privately.
Update:
I made some modifications. It is better, but I think it could be better still.
Here is the new query:
SELECT MAX(F.id) AS id
FROM jfel_tagxml_invoicedigi F
INNER JOIN jfel_invoice_state S ON F.id = S.idinvoice
WHERE S.idstato = @idstato
  AND S.id = F.idstatocorrente
  AND F.sequence_invoice % @number_service_installed = @idServizio;
And the new plan:
https://www.brentozar.com/pastetheplan/?id=SJ-5GDqeE
Update:
I made some further modifications. It is better, but I think it could be better still.
Here is the new query:
SELECT top 1 F.id as id
FROM jfel_tagxml_invoicedigi AS F
INNER JOIN jfel_invoice_state AS S
ON F.idstatocorrente = S.id
WHERE S.idstato= 1 AND S.id = F.idstatocorrente
and S.datastato > dateadd(DAY,-5,getdate())
AND F.progressivo_fattura % 1 = 0
ORDER BY S.datastato
And the new new plan
https://www.brentozar.com/pastetheplan/?id=S1xRkL51S
Filtering on calculated fields tends to hurt performance. You can apply your other filters first and do the calculated filter as the last step, so that there are fewer rows to match. This may fill tempdb, because the intermediate result set is stored there; in that case, either increase its size or use another method.
Here is your second query written this way (you may need to adjust it; I just wrote it in Notepad++):
SELECT MAX(id) AS id
FROM (
    SELECT F.id, F.sequence_invoice % @number_service_installed AS [idServizio]
    FROM jfel_tagxml_invoicedigi F
    INNER JOIN jfel_invoice_state S ON F.id = S.idinvoice
    WHERE S.idstato = @idstato
      AND S.id = F.idstatocorrente
    -- AND F.sequence_invoice % @number_service_installed = @idServizio
) AS x -- a derived table needs an alias in T-SQL
WHERE idServizio = @idServizio
;
Instead of the subquery, you can also try a temp table or a CTE. One of them may be the clear winner, so it is worth trying all of them if you want maximum performance.
The date calculation is non-sargable; you could try using a variable with OPTION (RECOMPILE):
DECLARE @d date
SET @d = dateadd(DAY,-5,getdate())
SELECT top 1 F.id as id
FROM jfel_tagxml_invoicedigi AS F
INNER JOIN jfel_invoice_state AS S
ON F.idstatocorrente = S.id
WHERE S.idstato= 1 AND S.id = F.idstatocorrente
and S.datastato > @d
AND F.progressivo_fattura % 1 = 0
ORDER BY S.datastato
OPTION (RECOMPILE)
I think you need a nonclustered index for the query you describe above.
If you are not sure which columns of your table need the nonclustered index, simply generate an execution plan in SQL Server 2008 Management Studio. When SQL Server detects a missing index,
it shows the details of the missing index as green text in the plan.
Hover over the missing index text and Management Studio shows the T-SQL required to create it, or right-click the text and select the Missing Index Details option to see the details.
For more information, see the article Create Missing Index From the Actual Execution Plan.
I hope this solution helps you.
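As an illustration only, an index supporting the filters in the third version of the query (idstato and datastato on jfel_invoice_state) might look like the sketch below. The index name is made up, and the column choice is an assumption based purely on the WHERE and ORDER BY clauses shown above; the suggestion from the actual execution plan should take precedence:

```sql
-- Hypothetical index: equality column first, then the range/sort column.
-- INCLUDE (id) lets the join on S.id be answered from the index alone.
CREATE NONCLUSTERED INDEX IX_jfel_invoice_state_idstato_datastato
    ON jfel_invoice_state (idstato, datastato)
    INCLUDE (id);
```

After creating an index like this, compare the new plan with the old one to confirm that the scan on jfel_invoice_state has become a seek before keeping it.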
Window aggregation can carry a significant performance penalty. Moving the sliding-window logic out of the database (i.e. into your application's RAM) is a general way to optimize it.
Otherwise, you can try giving more memory to the relevant database operations (in PostgreSQL you can tune this via a parameter; in other databases you may or may not be able to).
The main reason it is slow is that it requires sorting and materializing the sorted table.

SQL Server how to optimise this query which gets related records with a max value

I have a database view that needs to get values from two rows in a table, which we will call Jobs for our purposes here.
Alongside the main job (J1) the query needs to find the job (J2) which has the same CustomerID, but where of all such jobs J2 is the one with the task with the earliest deadline. Please note that there isn't a customers table, it is just matching the IDs. If this doesn't make sense, see the query below!
SELECT J1.ID, J2.ID
FROM Jobs J1, Jobs J2
WHERE J2.ID = (SELECT Job_ID FROM Tasks T1
               WHERE T1.Job_ID = J2.ID
                 AND T1.Deadline = (SELECT MIN(Deadline)
                                    FROM Tasks T2, Jobs J3
                                    WHERE T2.Job_ID = J3.ID
                                      AND J3.CustomerID = J1.CustomerID))
The above query is correct but very slow: 38 seconds if you restrict it with TOP 5. I don't know how to optimise this, can anyone help please?
Edits:
Below is the execution plan. I've had to change it in an image editor since my table names have been changed due to confidentiality.
The main view (which this will be a part of) uses LEFT OUTER JOIN instead. If you use this revised query, the execution time drops to 9 seconds for the top 5 rows, and 10 seconds for the entire database:
SELECT TOP 5 J1.ID, J2.ID
FROM Jobs J1 LEFT OUTER JOIN Jobs J2
    ON J2.ID = (SELECT Job_ID FROM Tasks T1
                WHERE T1.Job_ID = J2.ID
                  AND T1.Deadline = (SELECT MIN(Deadline)
                                     FROM Tasks T2
                                     LEFT OUTER JOIN Jobs J3
                                         ON T2.Job_ID = J3.ID
                                         AND CustomerID = CustomerID))
Creating a clustered index on the Tasks table on Job_ID saves about 10-20%.
My solution was to create a view:
ALTER VIEW EarliestJob AS
SELECT J1.ID AS JobID, J2.ID AS EarliestJobID, ROW_NUMBER() OVER(PARTITION BY J1.ID ORDER BY T.Deadline) AS RowNumber
FROM Jobs J1
LEFT OUTER JOIN Jobs J2 ON J1.CustomerID = J2.CustomerID
LEFT OUTER JOIN Tasks T ON J2.ID = T.Job_ID
The query then just becomes:
SELECT JobID, EarliestJobID
FROM EarliestJob
WHERE RowNumber=1
The execution time for 1884 rows then decreases from 8 seconds to 0.056 seconds!
I'd be interested if anyone knows how to do it without using a view. I've tried putting the ROW_NUMBER() function in a WHERE clause with =1, but it doesn't work; it just returns all the rows.
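On the question of doing this without a view: windowed functions can only appear in the SELECT or ORDER BY clauses, so the row number has to be computed in an inner query and filtered in an outer one. A sketch of the same logic using a common table expression (SQL Server 2005 or later), with RankedJobs as a made-up name and everything else taken from the view above:

```sql
-- Compute the row number in the CTE, filter on it in the outer query.
WITH RankedJobs AS (
    SELECT J1.ID AS JobID,
           J2.ID AS EarliestJobID,
           ROW_NUMBER() OVER (PARTITION BY J1.ID
                              ORDER BY T.Deadline) AS RowNumber
    FROM Jobs AS J1
    LEFT OUTER JOIN Jobs AS J2 ON J1.CustomerID = J2.CustomerID
    LEFT OUTER JOIN Tasks AS T ON J2.ID = T.Job_ID
)
SELECT JobID, EarliestJobID
FROM RankedJobs
WHERE RowNumber = 1;
```

As in the view, a job with no tasks produces a NULL Deadline, which SQL Server sorts first in ascending order, so it is worth checking whether such rows should be excluded with an extra filter.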

Why does sql server do a scan on joins when there are no records in source table

The idea of the below query is to use the CTE to get the primary key of all rows in [Archive].[tia_tia_object] that meet the filter.
The execution time for the query within the CTE is 0 seconds.
The second part is supposed to do joins on other tables, to filter the data some more, but only if there are any rows returned in the CTE. This was the only way I could get the SQL server to use the correct indexes.
Why does it spend time (see execution plan) looking in TIA_TIA_AGREEMENT_LINE and TIA_TIA_OBJECT, when CTE returns 0 rows?
WITH cte_vehicle
     AS (SELECT O.[Seq_no],
                O.Object_No
         FROM   [Archive].[tia_tia_object] O
         WHERE  O.RECORD_TIMESTAMP >
                (SELECT LastLoadTimeStamp FROM staging.Ufngetlastloadtimestamp('Staging.CoveredObject'))
                AND O.[Meta_iscurrent] = 1
                AND O.OBJECT_TYPE IN ( 'BIO01', 'CAO01', 'DKV', 'GFO01',
                                       'KMA', 'KNO01', 'MCO01', 'VEO01',
                                       'SVO01', 'AUO01' ))
SELECT O.[Seq_no]                      AS [Bkey_CoveredObject],
       Cast(O.[Agr_Line_No] AS BIGINT) AS [Agr_Line_No],
       O.[Cover_Start_Date]            AS [CoverageFrom],
       O.[Cover_End_Date]              AS [CoverageTo],
       O.[Timestamp]                   AS [TIMESTAMP],
       O.[Record_Timestamp]            AS [RECORD_TIMESTAMP],
       O.[Newest]                      AS [Newest],
       O.LOCATION_ID                   AS LocationNo,
       O.[Cust_no],
       O.[N01]
FROM   cte_vehicle AS T
       INNER JOIN [Archive].[tia_tia_object] O
               ON t.Object_No = O.Object_No
                  AND t.Seq_No = O.Seq_No
       INNER JOIN [Archive].[tia_tia_agreement_line] AL
               ON O.Agr_line_no = AL.Agr_line_no
       INNER JOIN [Archive].[tia_tia_policy] P
               ON AL.Policy_no = P.Policy_no
WHERE  P.[Transaction_type] <> 'D'
Execution plan:
Because it still needs to check for records. Even if there are no records in that table, SQL Server doesn't know that until it actually checks.
It's much like being handed a sealed box: you don't know whether it's empty until you open it.
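If the goal is to skip the joins entirely when nothing qualifies, one option is to materialize the keys first and run the join only when there is something to join. This is a different technique from the single-statement CTE in the question, sketched under the assumption that a temp table is acceptable in this load; #vehicle is a made-up name and the column list is shortened:

```sql
-- Step 1: materialize the candidate keys once (hypothetical temp table).
SELECT O.Seq_no, O.Object_No
INTO #vehicle
FROM [Archive].[tia_tia_object] AS O
WHERE O.RECORD_TIMESTAMP >
      (SELECT LastLoadTimeStamp
       FROM staging.Ufngetlastloadtimestamp('Staging.CoveredObject'))
  AND O.Meta_iscurrent = 1
  AND O.OBJECT_TYPE IN ('BIO01', 'CAO01', 'DKV', 'GFO01', 'KMA',
                        'KNO01', 'MCO01', 'VEO01', 'SVO01', 'AUO01');

-- Step 2: only touch the other tables when step 1 found rows.
IF EXISTS (SELECT 1 FROM #vehicle)
BEGIN
    SELECT O.Seq_no AS Bkey_CoveredObject,
           O.Cust_no
           -- remaining columns as in the original SELECT list
    FROM #vehicle AS T
    INNER JOIN [Archive].[tia_tia_object] AS O
            ON T.Object_No = O.Object_No
           AND T.Seq_no = O.Seq_no
    INNER JOIN [Archive].[tia_tia_agreement_line] AS AL
            ON O.Agr_line_no = AL.Agr_line_no
    INNER JOIN [Archive].[tia_tia_policy] AS P
            ON AL.Policy_no = P.Policy_no
    WHERE P.Transaction_type <> 'D';
END;

DROP TABLE #vehicle;
```

Note that when the temp table is empty this batch returns no result set at all rather than an empty one, which may matter to whatever consumes the output.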

SQL Server : pagination into Excel

I've got a large data set that I need to get into Excel to get some pivot tables and analysis going.
Normally I am able to do this because the data never reaches the 1 million row mark. I just do a SQL Server data import and specify my SQL statement.
Here is my current SQL
WITH n AS (
    SELECT A1.AccountID, A1.ParentAccountID, A1.Name
    FROM Account AS A1
    WHERE A1.ParentAccountID = 92
    UNION ALL
    SELECT A2.AccountID, A2.ParentAccountID, A2.Name
    FROM Account AS A2
    JOIN n
        ON A2.ParentAccountID = n.AccountID
)
SELECT n.*, D.DeviceID, A.*, P.*
FROM n
LEFT OUTER JOIN Device AS D
    ON D.AccountID = n.AccountID
LEFT OUTER JOIN Audit AS A
    ON A.AccountID = n.AccountID
RIGHT OUTER JOIN DeviceAudit AS P
    ON P.AuditID = A.AuditID
WHERE A.AuditDate > CAST('2013-03-11' AS DATETIME)
ORDER BY n.AccountID ASC, P.DeviceID ASC, A.AuditDate DESC
Right now this returns 100% of what I need: 18 million records for the past 30 days. I was hoping there would be a simple way to fetch the next 100,000 or 500,000 records.
I can use TOP 100000 to get my first chunk, but I do not seem to have an offset available to me.
At present this runs and completes in 20 minutes. This is one of many account hierarchies that I have to do this for. Hopefully this pagination will not be too expensive CPU-wise.
I did try exporting to a CSV in the hope of importing it, but that just gives me a 12 GB CSV file that I do not have time to break apart.
Yes, you can paginate with subqueries on the row number since SQL Server 2005. Add a row number to the select clause of your original query:
, ROW_NUMBER() OVER (ORDER BY {whatever id}) AS row
Then you can make your old query a subquery and filter on the row number:
SELECT TOP {results per page} *
FROM ({your previous sql statement}) AS paged
WHERE row > {page# * results per page}
ORDER BY row
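Applied to the query in the question, the pattern might look like the sketch below. The @PageNumber and @PageSize variables, the numbered CTE name, and the explicit column list are placeholders: A.* and P.* cannot be used inside the CTE because duplicate column names are not allowed there, so list whichever columns you actually need. The ORDER BY moves into the OVER clause because an inner query cannot carry its own ORDER BY without TOP:

```sql
DECLARE @PageNumber int, @PageSize int;
SET @PageNumber = 0;        -- zero-based page index (placeholder)
SET @PageSize   = 100000;   -- rows per page (placeholder)

WITH n AS (
    SELECT A1.AccountID, A1.ParentAccountID, A1.Name
    FROM Account AS A1
    WHERE A1.ParentAccountID = 92
    UNION ALL
    SELECT A2.AccountID, A2.ParentAccountID, A2.Name
    FROM Account AS A2
    INNER JOIN n ON A2.ParentAccountID = n.AccountID
),
numbered AS (
    SELECT n.AccountID,
           n.Name,
           D.DeviceID AS AccountDeviceID,
           A.AuditID,
           A.AuditDate,
           P.DeviceID AS AuditDeviceID,
           ROW_NUMBER() OVER (ORDER BY n.AccountID ASC,
                                       P.DeviceID ASC,
                                       A.AuditDate DESC) AS row_num
    FROM n
    LEFT OUTER JOIN Device AS D ON D.AccountID = n.AccountID
    LEFT OUTER JOIN Audit AS A ON A.AccountID = n.AccountID
    RIGHT OUTER JOIN DeviceAudit AS P ON P.AuditID = A.AuditID
    WHERE A.AuditDate > CAST('2013-03-11' AS DATETIME)
)
SELECT *
FROM numbered
WHERE row_num >  @PageNumber * @PageSize
  AND row_num <= (@PageNumber + 1) * @PageSize
ORDER BY row_num;
```

Two caveats: if the ORDER BY columns can tie, add a unique column as a tiebreaker so that pages do not overlap between runs, and each page may re-evaluate the whole 18 million row query, so the total cost can exceed a single full export. On SQL Server 2012 and later, ORDER BY ... OFFSET ... ROWS FETCH NEXT ... ROWS ONLY expresses the same paging more directly.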
