I am using a similar query. I cannot post the actual query or the execution plans here. I tried adding the non-clustered index suggested by the execution plan, but it slowed the query down further.
I know this is incomplete information, but can you suggest what I could try? I am out of options!
I am putting the condition below in the WHERE clause when I query the view. The date filter alone is fine, but as soon as I add either of the other two conditions, the query takes hours.
where Date_Time between '2021-11-01 00:00:00.000' and '2022-11-02 00:00:00.000'
and Visit_code not in ('12', '13')
and mode_code <>'99'
Execution plan XML
CREATE VIEW [dbo].[vw_Test] AS
select fields
from table1 ed
left join table2 e on ed.field1_id = e.field1_id
left join table3 et on et.field1_id = ed.field1_id
left join table4 etf on etf.field1_id = e.field1_id
and etf.field2_cd= 85429041
and etf.dt_tm_field >= '2025-01-01 00:00:00.0000000'
left join table5 etf_dt on etf_dt.field1 = e.field1
and etf_dt.field3= 85429039
and etf_dt.dt_tm_field >= '2025-01-01 00:00:00.0000000'
left join table6 ei on ei.field1 = ed.field1
and ei.field4_cd = 123485.00
left join table7 cvo_ModeOfArrival on cvo_ModeOfArrival.field = ed.field6
and cvo_ModeOfArrival.field5 = 12345
left join table7 cvo_ModeOfSep on cvo_ModeOfSep.field = ei.field7
and cvo_ModeOfSep.field5 = 23456
left join table7 cvo_FinancialClass on cvo_FinancialClass.field = e.field8
and cvo_FinancialClass.field5 = 34567
left join table7 cvo_Specialty on cvo_Specialty.field = e.field9
and cvo_Specialty.field5 = 45678
left join table8 ea on ea.field1_id = e.field1_id
left join table7 cvo_ea on cvo_ea.field = ea.field10
and cvo_ea.field11 = 345666
GO
Looking at your code, I can't see anything to improve in the T-SQL statement itself.
I would advise the following:
Check each table and which of its columns you need in the fields list. The engine may be reading whole rows instead of only the needed columns because an index is missing; you can create nonclustered indexes to reduce the IO.
Check whether any of these new indexes can be filtered indexes, as you have a lot of hard-coded criteria (e.g. ei.field4_cd = 123485.00).
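SQL Server filtered indexes correspond roughly to SQLite partial indexes, so you can try the idea in a few lines. The sketch below runs on SQLite through Python's sqlite3 module, not on SQL Server. The table and column names mirror the anonymized ones above, and the index name is made up. The sketch builds an index that only holds rows matching one of the view's hard-coded predicates. It then uses EXPLAIN QUERY PLAN to check that the planner picks the index only when the query repeats that predicate. In T-SQL the comparable statement would be something like CREATE NONCLUSTERED INDEX IX_table6_field4 ON dbo.table6 (field1) INCLUDE (field7) WHERE field4_cd = 123485.00; which columns to include depends on the real view.

```python
import sqlite3

# SQLite stands in for SQL Server here; SQLite calls filtered indexes
# "partial indexes". Names follow the anonymized view; the index is made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table6 (field1 INTEGER, field4_cd INTEGER, field7 INTEGER)")

# Only rows satisfying the view's hard-coded predicate are stored in the index.
conn.execute(
    "CREATE INDEX ix_table6_f4 ON table6 (field1, field7) WHERE field4_cd = 123485"
)


def plan(sql):
    """Return the detail text of EXPLAIN QUERY PLAN as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# The query repeats the index predicate, so the partial index is usable.
with_filter = plan("SELECT field7 FROM table6 WHERE field4_cd = 123485 AND field1 = 42")
# Without the predicate the index is missing rows, so SQLite cannot use it.
without_filter = plan("SELECT field7 FROM table6 WHERE field1 = 42")

print(with_filter)
print(without_filter)
```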
If the above is not enough, consider creating a separate table to store this information and populating it in advance.
To debug, you can add the following line before the query:
SET STATISTICS IO ON;
and then paste the results from the Messages tab here. They show which tables consume the most IO, so you can start with those.
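Once you have the Messages output, ranking the tables is mechanical. Below is a minimal Python sketch that assumes the usual "Table 'name'. Scan count N, logical reads N, ..." line format. The sample text is hypothetical. Newer SQL Server versions print extra counters after these, which the regular expression simply ignores.

```python
import re

# Hypothetical excerpt of the Messages tab after SET STATISTICS IO ON.
# The numbers are made up; real lines carry more counters after these.
MESSAGES = """\
Table 'table7'. Scan count 4, logical reads 120, physical reads 0, read-ahead reads 0.
Table 'table1'. Scan count 1, logical reads 98765, physical reads 12, read-ahead reads 4000.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0.
Table 'table4'. Scan count 1, logical reads 4321, physical reads 0, read-ahead reads 0.
"""

# Only the table name, scan count and logical reads are captured.
LINE = re.compile(r"Table '([^']+)'\. Scan count (\d+), logical reads (\d+)")


def rank_by_logical_reads(text):
    """Return (table, scans, logical_reads) tuples, heaviest reader first."""
    totals = {}
    for name, scans, reads in LINE.findall(text):
        # The same table can appear once per statement; add the counters up.
        prev_scans, prev_reads = totals.get(name, (0, 0))
        totals[name] = (prev_scans + int(scans), prev_reads + int(reads))
    return sorted(
        ((name, s, r) for name, (s, r) in totals.items()),
        key=lambda row: row[2],
        reverse=True,
    )


ranking = rank_by_logical_reads(MESSAGES)
for table, scans, reads in ranking:
    print(f"{table:<10} scans={scans:<3} logical_reads={reads}")
```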
I would investigate breaking this query into multiple parts using derived tables; there are plenty of examples of this online. I always try to use SELECT TOP (2147483647) ....
I have a SQL query as follows; it is pretty straightforward.
SELECT
p.ProgressClaimID,
min(p.ClaimDate) as ClaimDate,
min(p.PClaimValue) as PClaimValue,
sum(d.total) as TotalDaycost,
sum(i.amount) as TotalInvoice,
sum(round(pcd.QtyClaimed * pcd.SellRate,2)) as SellClaim
FROM
(ProgressClaim as p
LEFT JOIN
ProgressClaimDetail as pcd ON p.ProgressClaimID = pcd.ProgressClaimID)
LEFT JOIN
[DayCost] as d ON p.ProgressClaimID = d.ReportPeriod
LEFT JOIN
Invoice as i ON p.ProgressClaimID = i.ReportPeriod
WHERE
p.projectID = 4
GROUP BY
p.ProgressClaimID
But it runs very slowly (a couple of seconds) on very few rows (a few hundred at most) in SQL Server 2014. Stranger still, the same query runs as expected (almost instantly) on identical data in a SQL Server CE database.
On the SQL Server install, if I remove any one of the joins, the query runs as expected with the remaining three tables, regardless of which one is removed.
I have checked foreign keys, indexes, etc., and nothing seems obvious. Any pointers appreciated.
Edit: execution plan at http://textuploader.com/5eurg (XML)
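The problem is in my query design. This is what happens when you have been writing SQL for a while. Essentially, the query takes a long time because each LEFT JOIN makes multiple copies of the rows produced by the previous joins. All of those copies are then summed, so by the third join there is a lot of computation going on. As a side effect, the totals are also inflated, because each value is counted once for every matching row in the other joined tables.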
I rewrote my query to do what I meant it to do and it is all good.
Embarrassing mistake; here is the fixed query for the curious:
SELECT p.ProgressClaimID, p.ClaimDate, p.PClaimValue,
pcd.pcdTotal,
d.TotalDaycost,
i.TotalInvoice
FROM ProgressClaim as p
left join (select ProgressClaimID, sum(round(QtyClaimed * SellRate,2)) as pcdTotal from ProgressClaimDetail group by ProgressClaimID) as pcd on p.ProgressClaimID=pcd.ProgressClaimID
left join (select ReportPeriod, sum(total) as TotalDaycost from DayCost group by ReportPeriod) as d on p.ProgressClaimID= d.ReportPeriod
left join (select ReportPeriod, sum(amount) as TotalInvoice from Invoice group by ReportPeriod) as i on p.ProgressClaimID= i.ReportPeriod
where p.projectID=4
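Here is a small reproduction that shows both the slowdown and its side effect on the totals. It uses SQLite through Python's sqlite3 as a stand-in for SQL Server 2014, with made-up data: one claim with three detail rows, two day-cost rows and two invoices. The original shape produces 3 × 2 × 2 = 12 joined rows for that claim, so every SUM is inflated. The pre-aggregated rewrite returns the real totals.

```python
import sqlite3

# Tiny hypothetical data set; SQLite stands in for SQL Server 2014.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProgressClaim (ProgressClaimID INTEGER, projectID INTEGER,
                            ClaimDate TEXT, PClaimValue REAL);
CREATE TABLE ProgressClaimDetail (ProgressClaimID INTEGER, QtyClaimed REAL, SellRate REAL);
CREATE TABLE DayCost (ReportPeriod INTEGER, total REAL);
CREATE TABLE Invoice (ReportPeriod INTEGER, amount REAL);
INSERT INTO ProgressClaim VALUES (1, 4, '2015-01-31', 1000);
INSERT INTO ProgressClaimDetail VALUES (1, 1, 10), (1, 2, 10), (1, 3, 10);
INSERT INTO DayCost VALUES (1, 100), (1, 200);
INSERT INTO Invoice VALUES (1, 50), (1, 70);
""")

# Original shape: every LEFT JOIN multiplies the rows of the previous ones.
naive = conn.execute("""
SELECT p.ProgressClaimID, COUNT(*) AS joined_rows,
       SUM(d.total), SUM(i.amount), SUM(ROUND(pcd.QtyClaimed * pcd.SellRate, 2))
FROM ProgressClaim p
LEFT JOIN ProgressClaimDetail pcd ON p.ProgressClaimID = pcd.ProgressClaimID
LEFT JOIN DayCost d ON p.ProgressClaimID = d.ReportPeriod
LEFT JOIN Invoice i ON p.ProgressClaimID = i.ReportPeriod
WHERE p.projectID = 4
GROUP BY p.ProgressClaimID
""").fetchone()

# Rewrite: aggregate each child table first, then join one row per claim.
fixed = conn.execute("""
SELECT p.ProgressClaimID, pcd.pcdTotal, d.TotalDaycost, i.TotalInvoice
FROM ProgressClaim p
LEFT JOIN (SELECT ProgressClaimID, SUM(ROUND(QtyClaimed * SellRate, 2)) AS pcdTotal
           FROM ProgressClaimDetail GROUP BY ProgressClaimID) pcd
       ON p.ProgressClaimID = pcd.ProgressClaimID
LEFT JOIN (SELECT ReportPeriod, SUM(total) AS TotalDaycost
           FROM DayCost GROUP BY ReportPeriod) d
       ON p.ProgressClaimID = d.ReportPeriod
LEFT JOIN (SELECT ReportPeriod, SUM(amount) AS TotalInvoice
           FROM Invoice GROUP BY ReportPeriod) i
       ON p.ProgressClaimID = i.ReportPeriod
WHERE p.projectID = 4
""").fetchone()

print(naive)  # (1, 12, 1800.0, 720.0, 240.0): 12 joined rows, every sum inflated
print(fixed)  # (1, 60.0, 300.0, 120.0): the real per-table totals
```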
I ran into a problem with my T-SQL query. I have a warehouse database, and I want to add currency exchange rates to my transactions so I can see them in EUR and USD.
To do that, I am using European Central Bank exchange rates.
My query looks like this:
SELECT
Companys.Companys_name,
Warehouse_oper.Pavad_num,
[Items]![Quantity]*[Items]![Price] AS Expr1,
[Items]![Quantity]*[Items]![Price]*[Exchange_rates]![USD] AS Expr2
FROM
(Companys
LEFT JOIN
(Exchange_rates
RIGHT JOIN
Warehouse_oper ON Exchange_rates.Date = Warehouse_oper.Date)
ON Companys.Companys_num_d_b = Warehouse_oper.Companys_nr_d_b)
LEFT JOIN
Items ON Warehouse_oper.Warehouse_oper_num_d_b = Items.Warehouse_oper_num_d_b;
Sorry if it's hard to understand; I translated all the variable names into English.
Anyway, this query works fine with LEFT JOIN (Exchange_rates RIGHT JOIN Warehouse_oper ON Exchange_rates.Date = Warehouse_oper.Date), but the bank does not publish rates on holidays, so on those dates I get NULL values.
How can I edit this query (I know it's messy, but it comes from Access) to select the most recent available date instead?
I tried:
LEFT JOIN
(Exchange_rates RIGHT JOIN Warehouse_oper ON
(SELECT TOP 1 Exchange_rates.Date FROM Exchange_rates WHERE
Exchange_rates.Date <= Warehouse_oper.Date) = Warehouse_oper.Date)
but with no success.
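One common way to express "latest rate on or before the operation date" is to join on a correlated MAX subquery. The attempt above compares the subquery's result back to Warehouse_oper.Date, which reintroduces the exact-date requirement. Also, TOP 1 without ORDER BY returns an arbitrary row. Below is a hedged sketch using SQLite through Python's sqlite3, with made-up rows and a trimmed-down schema (only the columns needed here). In SQL Server, OUTER APPLY (SELECT TOP 1 ... ORDER BY Date DESC) is a common alternative.

```python
import sqlite3

# Hypothetical rows; SQLite stands in for SQL Server / Access here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Exchange_rates (Date TEXT, USD REAL);
CREATE TABLE Warehouse_oper (Warehouse_oper_num_d_b INTEGER, Date TEXT);
-- No rates in this sample for 2021-12-25 or 2021-12-26.
INSERT INTO Exchange_rates VALUES ('2021-12-23', 1.13), ('2021-12-24', 1.14),
                                  ('2021-12-27', 1.12);
INSERT INTO Warehouse_oper VALUES (1, '2021-12-24'), (2, '2021-12-26'),
                                  (3, '2021-12-27'), (4, '2021-12-01');
""")

# Join each operation to the latest rate dated on or before the operation date.
# ISO date strings compare correctly as text.
rows = conn.execute("""
SELECT w.Warehouse_oper_num_d_b, w.Date, r.Date AS rate_date, r.USD
FROM Warehouse_oper AS w
LEFT JOIN Exchange_rates AS r
       ON r.Date = (SELECT MAX(r2.Date)
                    FROM Exchange_rates AS r2
                    WHERE r2.Date <= w.Date)
ORDER BY w.Warehouse_oper_num_d_b
""").fetchall()

for row in rows:
    print(row)
```

Operation 2 falls on a day without a rate and picks up the 2021-12-24 rate. Operation 4 predates every rate, so it still gets NULLs.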
I have a big query for an ETL view that has a Cartesian join (see below), which is then left-joined to five other tables.
SELECT W.Field1, W.Field2
FROM datedim AS d
INNER JOIN employee AS W
ON 1 = 1
The query takes 5 minutes to run, so I'm trying to optimise it. The Cartesian join has a big impact on performance.
Any ideas?
-- Additional info
The Cartesian results are then used in joins like the one below; there are several very similar ones.
LEFT OUTER JOIN detail AS det
ON det.id = W.id
AND d.datevalue >= det.validfrom
AND d.datevalue <= det.validto
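To make the cost concrete, here is a hedged miniature using SQLite through Python's sqlite3, with hypothetical tables shaped like the ones above. The ON 1 = 1 join yields |datedim| × |employee| rows before any of the LEFT JOINs run. Each validity-range join is then evaluated for every one of those rows. If the ETL only needs part of the calendar, filtering datedim before the cross join shrinks everything downstream.

```python
import sqlite3

# Hypothetical miniature of the ETL view.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE datedim (datevalue TEXT);
CREATE TABLE employee (id INTEGER, Field1 TEXT);
CREATE TABLE detail (id INTEGER, validfrom TEXT, validto TEXT, grade TEXT);
INSERT INTO datedim VALUES ('2024-01-01'), ('2024-01-02'), ('2024-01-03');
INSERT INTO employee VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO detail VALUES (1, '2024-01-01', '2024-01-02', 'A'),
                          (1, '2024-01-03', '9999-12-31', 'B');
""")

rows = conn.execute("""
SELECT d.datevalue, W.Field1, det.grade
FROM datedim AS d
INNER JOIN employee AS W ON 1 = 1          -- every date paired with every employee
LEFT OUTER JOIN detail AS det
       ON det.id = W.id
      AND d.datevalue >= det.validfrom
      AND d.datevalue <= det.validto
ORDER BY d.datevalue, W.Field1
""").fetchall()

# 3 dates * 2 employees = 6 rows; each picks the detail row valid on that date.
print(len(rows))
for row in rows:
    print(row)
```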
I am building a view in SQL Server 2000 (and 2005) and I've noticed that the order of the join statements greatly affects the execution plan and speed of the query.
select sr.WTSASessionRangeID,
-- bunch of other columns
from WTSAVW_UserSessionRange us
inner join WTSA_SessionRange sr on sr.WTSASessionRangeID = us.WTSASessionRangeID
left outer join WTSA_SessionRangeTutor srt on srt.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeClass src on src.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeStream srs on srs.WTSASessionRangeID = sr.WTSASessionRangeID
--left outer join MO_Stream ms on ms.MOStreamID = srs.MOStreamID
left outer join WTSA_SessionRangeEnrolmentPeriod srep on srep.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeStudent stsd on stsd.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionSubrange ssr on ssr.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID
left outer join MO_Stream ms on ms.MOStreamID = srs.MOStreamID
On SQL Server 2000, the query above consistently generates a plan of cost 946. If I uncomment the MO_Stream join in the middle of the query and comment out the one at the bottom, the cost drops to 263, and the execution time drops accordingly. I always thought that the query optimizer would interpret the query appropriately without considering join order, but it seems that order matters.
So since order does seem to matter, is there a join strategy I should be following for writing faster queries?
(Incidentally, on SQL Server 2005, with almost identical data, the query plan costs were 0.675 and 0.631 respectively.)
Edit: On SQL Server 2000, here are the profiled stats:
946-cost query: 9094ms CPU, 5121 reads, 0 writes, 10123ms duration
263-cost query: 172ms CPU, 7477 reads, 0 writes, 170ms duration
Edit: Here is the logical structure of the tables.
SessionRange ---+--- SessionRangeTutor
|--- SessionRangeClass
|--- SessionRangeStream --- MO_Stream
|--- SessionRangeEnrolmentPeriod
|--- SessionRangeStudent
+----SessionSubrange --- SessionSubrangeRoom
Edit: Thanks to Alex and gbn for pointing me in the right direction. I also found this question.
Here's the new query:
select sr.WTSASessionRangeID -- + lots of columns
from WTSAVW_UserSessionRange us
inner join WTSA_SessionRange sr on sr.WTSASessionRangeID = us.WTSASessionRangeID
left outer join WTSA_SessionRangeTutor srt on srt.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeClass src on src.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeEnrolmentPeriod srep on srep.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeStudent stsd on stsd.WTSASessionRangeID = sr.WTSASessionRangeID
-- SessionRangeStream is a many-to-many mapping table between SessionRange and MO_Stream
left outer join (
WTSA_SessionRangeStream srs
inner join MO_Stream ms on ms.MOStreamID = srs.MOStreamID
) on srs.WTSASessionRangeID = sr.WTSASessionRangeID
-- SessionRanges MAY have Subranges and Subranges MAY have Rooms
left outer join (
WTSA_SessionSubrange ssr
left outer join WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID
) on ssr.WTSASessionRangeID = sr.WTSASessionRangeID
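The parenthesised block works because it is resolved as a unit before being outer-joined to sr. As a result, the INNER JOIN between WTSA_SessionRangeStream and MO_Stream cannot remove session ranges. Below is a hedged sketch with SQLite through Python's sqlite3 and made-up rows. The nested join is written as a derived table, which behaves the same way here. The sketch contrasts it with a flat LEFT JOIN followed by an INNER JOIN, which does drop session ranges.

```python
import sqlite3

# Hypothetical data: range 1 maps to an existing stream, range 2 maps to a
# stream id missing from MO_Stream, range 3 has no mapping at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE WTSA_SessionRange (WTSASessionRangeID INTEGER);
CREATE TABLE WTSA_SessionRangeStream (WTSASessionRangeID INTEGER, MOStreamID INTEGER);
CREATE TABLE MO_Stream (MOStreamID INTEGER, Name TEXT);
INSERT INTO WTSA_SessionRange VALUES (1), (2), (3);
INSERT INTO WTSA_SessionRangeStream VALUES (1, 10), (2, 99);
INSERT INTO MO_Stream VALUES (10, 'Stream A');
""")

# Flat form: the INNER JOIN after the LEFT JOIN filters out ranges without a
# matching stream, undoing the outer join.
flat = conn.execute("""
SELECT sr.WTSASessionRangeID, ms.Name
FROM WTSA_SessionRange sr
LEFT JOIN WTSA_SessionRangeStream srs ON srs.WTSASessionRangeID = sr.WTSASessionRangeID
INNER JOIN MO_Stream ms ON ms.MOStreamID = srs.MOStreamID
ORDER BY sr.WTSASessionRangeID
""").fetchall()

# Nested form: resolve mapping + stream first, then outer-join the result,
# so every session range is kept.
nested = conn.execute("""
SELECT sr.WTSASessionRangeID, x.Name
FROM WTSA_SessionRange sr
LEFT JOIN (SELECT srs.WTSASessionRangeID, ms.Name
           FROM WTSA_SessionRangeStream srs
           INNER JOIN MO_Stream ms ON ms.MOStreamID = srs.MOStreamID) AS x
       ON x.WTSASessionRangeID = sr.WTSASessionRangeID
ORDER BY sr.WTSASessionRangeID
""").fetchall()

print(flat)    # [(1, 'Stream A')]
print(nested)  # [(1, 'Stream A'), (2, None), (3, None)]
```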
SQL Server 2000 cost: 24.9
I have to disagree with all previous answers, and the reason is simple: if you change the order of your left join, your queries are logically different and as such they produce different result sets. See for yourself:
SELECT 1 AS a INTO #t1
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4;
SELECT 1 AS b INTO #t2
UNION ALL SELECT 2;
SELECT 1 AS c INTO #t3
UNION ALL SELECT 3;
SELECT a, b, c
FROM #t1 LEFT JOIN #t2 ON #t1.a=#t2.b
LEFT JOIN #t3 ON #t2.b=#t3.c
ORDER BY a;
SELECT a, b, c
FROM #t1 LEFT JOIN #t3 ON #t1.a=#t3.c
LEFT JOIN #t2 ON #t3.c=#t2.b
ORDER BY a;
a b c
----------- ----------- -----------
1 1 1
2 2 NULL
3 NULL NULL
4 NULL NULL
(4 row(s) affected)
a b c
----------- ----------- -----------
1 1 1
2 NULL NULL
3 NULL 3
4 NULL NULL
Join order does make a difference to the resulting query. This is documented in Books Online (BOL), in the documentation for FROM:
<joined_table>
Is a result set that is the product of two or more tables. For multiple joins, use parentheses to change the natural order of the joins.
You can alter the join order by putting parentheses around the joins (BOL does show this in the syntax at the top of the page, but it is easy to miss).
This is known as chiastic behaviour. You can also use the query hint OPTION (FORCE ORDER) to force a specific join order, but this can result in what are called "bushy plans", which may not be optimal for the query being executed.
Obviously, the SQL Server 2005 optimizer is a lot better than the SQL Server 2000 one.
However, there's a lot of truth in your question. Outer joins will cause execution to vary wildly based on order (inner joins tend to be optimized into the most efficient route, but again, order matters). If you think about it, as you build up left joins, you need to figure out what the heck is on the left. As such, each join must be calculated before the next one can be done. It becomes sequential, not parallel. Obviously there are things you can do to combat this (such as indexes, views, etc.). But the point stands: the engine needs to know what's on the left before it can do a left outer join. And if you just keep adding joins, it becomes less and less clear what exactly is on the left (especially if you use joined tables as the left table!).
With inner joins, however, you can parallelize those quite a bit, so there's less of a dramatic difference as far as order's concerned.
A general strategy for optimizing queries containing JOINs is to look at your data model and the data, and try to determine which JOINs will most quickly reduce the number of records that must be considered. The fewer records that must be considered, the faster the query will run, and the server will generally produce a better query plan too.
Along with the above optimization, make sure that any fields used in JOINs are indexed.
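One quick way to check that an index on a join column is actually picked up is to compare plans before and after creating it. Below is a minimal sketch with SQLite through Python's sqlite3; the tables and index name are hypothetical. In SQL Server you would compare the actual execution plans instead. Because it is a LEFT JOIN, child has to be the inner side, so the new index is what lets it seek instead of scanning.

```python
import sqlite3

# Hypothetical tables; SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parent (id INTEGER, name TEXT);
CREATE TABLE child (parent_id INTEGER, amount REAL);
""")

QUERY = """
SELECT p.name, c.amount
FROM parent AS p
LEFT JOIN child AS c ON c.parent_id = p.id
"""


def plan():
    """Return the EXPLAIN QUERY PLAN detail lines joined into one string."""
    return " | ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + QUERY))


before = plan()  # no index on child yet: a scan or a temporary automatic index
conn.execute("CREATE INDEX ix_child_parent_id ON child (parent_id)")
after = plan()   # the inner side of the LEFT JOIN can now seek on the new index

print(before)
print(after)
```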
Your query is probably wrong anyway. Alex is correct. Eric may be correct too, but the query is wrong.
Let's take this subset:
WTSA_SessionRange sr
left outer join
WTSA_SessionSubrange ssr on ssr.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join
WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID
You are joining WTSA_SessionSubrangeRoom onto WTSA_SessionSubrange. You may have no rows from WTSA_SessionSubrange.
The join should be this:
WTSA_SessionRange sr
left outer join
(SELECT WTSASessionRangeID, columns I need
FROM
WTSA_SessionSubrange ssr
left outer join
WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID
) foo on foo.WTSASessionRangeID = sr.WTSASessionRangeID
This is why the join order affects the results: it's a different query, declaratively speaking.
You'd also need to change the MO_Stream and WTSA_SessionRangeStream join too.
It depends on which of the join fields are indexed. If it has to table-scan for the first field but can use an index on the second, it's slow. If your first join field is indexed, it'll be quicker. My guess is that 2005 optimizes it better by determining the indexed fields and performing those joins first.
At DevConnections a few years ago, a session on SQL Server performance stated that (a) the order of outer joins DOES matter, and (b) when a query has a lot of joins, the optimizer will not evaluate all of them before settling on a plan. If you know certain joins will help speed up a query, put them early in the FROM list if you can.