Custom order for multiple joins in SQL Server - sql-server

I am trying to write a query in SQL Server that replicates following figure:
I want the result of first left join (order_defect & ncdef) to be left join with third table (filter) and again the result of these three left join with last one (nsdic).
Each of these tables are huge, so I'm trying to find most efficient way to do it because i have limited space and I get "out of memory" error... any suggestion for an efficient query?
If I do:
Select *
from A
left join B on a.id = B.id
left join C on a.id = c.id
it's joining A and B first and then A and C...but I want the result of "A & B" to be join with "C".
Basically my question is how to use a result of one join, to join with another table.
Thank you

select
c.id
,c.colum1
,c.colum2
,c.colum3
,c.colum4
,t3.colum1
from
(
select
t1.id as id
,t1.colum1 as colum1
,t1.colum2 as colum2
,t2.column1 as colum3
,t2.colum2 as colum4
from table1 t1
left join table2 t2
on t1.id = t2.id
) as c
left join table3 t3
on c.id = t3.id

It's dificult to help you without the tables design and fields|columns, keys, ...
But I'll considerate:
1 - Primary keys fields, and how to relation the tables
2 - How to add left joins with "filters", or how to reduce the number of results
3 - Evaluate if it'll be better to use Sub-querys
Plus: Try the query with TOP 100 <--- to make test.
And remember: sometimes it's imposible to optimizate querys because of the hardware limits, like the RAM, in those case you have to show the data in sections.

Related

MS SQL Server trouble with JOIN

I'm new to SQL and need a push in the right direction.
I currently have a working SQL query accessing 3 tables in a database, and I need to add a JOIN using a 4th table, but the syntax escapes me. To simplify, what I have now is:
SELECT
t1_col1, t1_col2, t2_col1, t2_col2, t3_col1
FROM
table1, table2, table3
WHERE
{some conditions}
ORDER BY
t1_col1 ASC;
What I need to do is to add a LEFT OUTER JOIN selecting 2 columns from table4 and have ON t1_field1 = t4_field1, but whatever I try, I'm getting syntax errors all over the place. I don't seem to understand the correct syntax.
I tried
SELECT *
FROM table1
LEFT OUTER JOIN table2;
which has no errors, but as soon as I start SELECTing columns and adding conditions, I get stuck.
I would greatly appreciate any assistance with this.
You do not specify the join criteria. Those would be in your WHERE clause under "some conditions". So, I will make up so that I can show syntax. The syntax you show is often termed "old". It has been discouraged for 15 years or more in the SQL Server documentation. Microsoft consistently threatens to stop recognizing the syntax. But, apparently they have not followed through on that threat.
The syntax errors you are getting occur because you are mixing the old style joins (comma separated with WHERE clause) with the new style (LEFT OUTER JOIN) with ON clauses.
Your existing query should be changed to something like this. The tables are aliased because it makes it easier to read and is customary. I just made up the JOIN criteria.
SELECT t1_col1, t1_col2, t2_col1, t2_col2, t3_col1
FROM table1 t1
INNER JOIN table2 t2 ON t2.one_ID = t1.one_ID
INNER JOIN table3 t3 ON t3.two_ID = t2.two_ID
LEFT OUTER JOIN table4 t4 ON t4.three_ID = t3.three_ID
I hope that helps with "a push in the right direction."
You may also want to read this post that explains the different ways to join tables in a query. What's the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN?
Also for the record the "OLD STYLE" of joining tables (NOT RECOMMENDED) would look like this (but do NOT do this - it is a horrible way to write SQL). And it does not work for left outer joins. Get familiar with using the ...JOIN...ON... syntax:
SELECT t1_col1, t1_col2, t2_col1, t2_col2, t3_col1
FROM table1 t1, table2 t2, table3 t3
LEFT OUTER JOIN table4 t4 ON t4.three_ID = t3.three_ID
WHERE
t2.one_ID = t1.one_ID
AND t3.two_ID = t2.two_ID

LEFT OUTER JOIN 3 TABLES -SQL SERVER

I need to join 3 tables using left outer join .Let me give you an example using 3 tables .In the below image we can see Table:[A.TableMaster] the drugDescription Column must be equal to Table:[C.Table2]-DrugDescription column [A.drugDescription=C.DrugDescription] and [C.drug=B.drug] according to the drug the Price is to be assigned from Table:[B.Table1] .
and also with the b.date
in simple english the user selects a date for the particular drug and price assigned for the drug
Image of the Tables:TableMaster,Table1,Table2
e.g
case WHEN Drug='OCTAGAM' THEN [b.price],
but i m not able to relate with the outer join and the three table seems confusing please Help..
I guess the query is simple as that:
SELECT a.*, c.Price
FROM TableMaster AS a
LEFT OUTER JOIN Table2 AS b ON b.DrugDescription = a.DrugDescription
LEFT OUTER JOIN Table1 AS c ON c.Drug = b.Drug
AND c.Date = a.Date

Does this query working like cross join?

I am trying to run this query but couldn't understand its working process.
SELECT *
FROM TABLE1 T1
INNER JOIN TABLE2 T2 ON T1.ID = T2.ID
LEFT JOIN TABLE1 T3 ON T1.ID = T2.ID
table1 contains 3 records with sequential id 1, 2, 3 and table2 contains 4 records with sequential id 1, 2, 3, 4
and one another thing i also want to know that is, does this query executing from right to left? i mean left join process executes first then inner join? i am saying this according to query execution plan.
Here is your current query:
SELECT *
FROM table1 t1
INNER JOIN table2 t2
ON t1.id = t2.id
LEFT JOIN table1 t3
ON t1.id = t2.id
The first join is just a normal inner join between table1 and table2. The second join is a left join, but the ON condition is superfluous, and I believe the behavior would be the same without this condition even being there. The reason for this is that the join condition t1.id = t2.id will already always be true at that point in the query, for every record in the intermediate table. Hence, it appears that the second join would effectively be a cross join with table1.
Typically, your join condition will involve the two tables being joined.
Yes it's working like a cross join. If this is the intention you should rewrite it correctly. What you have there is misleading and confusing (where did it come from originally?)
select * from table1 t1
inner join table2 t2
on t1.id=t2.id
cross join table1 t3
The order that tables, filters, joins are evalulated are dictated by the query plan (press CTRL-L). This may change at any time. You shouldn't be concerned about the ordered these run in - you just need to know that you will get the same results no matter how it is executed. The query planner might choose one method over the other if it thinks it will be faster

SQL Server speed: left outer join vs inner join

In theory, why would inner join work remarkably faster then left outer join given the fact that both queries return same result set. I had a query which would take long time to describe, but this is what I saw changing single join: left outer join - 6 sec, inner join - 0 sec (the rest of the query is the same). Result set: the same
Actually depending on the data, left outer join and inner join would not return the same results..most likely left outer join will have more result and again depends on the data..
I'd be worried if I changed a left join to an inner join and the results were not different. I would suspect that you have a condition on the left side of the table in the where clause effectively (and probably incorrectly) turning it into an inner join.
Something like:
select *
from table1 t1
left join table2 t2 on t1.myid = t2.myid
where t2.somefield = 'something'
Which is not the same thing as
select *
from table1 t1
left join table2 t2
on t1.myid = t2.myid and t2.somefield = 'something'
So first I would be worried that my query was incorrect to begin with, then I would worry about performance. An inner join is NOT a performance enhancement for a Left Join, they mean two different things and should return different results unless you have a table where there will always be a match for every record. In this case you change to an inner join because the other is incorrect not to improve performance.
My best guess as to the reason the left join takes longer is that it is joining to many more rows that then get filtered out by the where clause. But that is just a wild guess. To know you need to look at the Execution plans.

Why does the order of join clauses affect the query plan in SQL Server?

I am building a view in SQL Server 2000 (and 2005) and I've noticed that the order of the join statements greatly affects the execution plan and speed of the query.
select sr.WTSASessionRangeID,
-- bunch of other columns
from WTSAVW_UserSessionRange us
inner join WTSA_SessionRange sr on sr.WTSASessionRangeID = us.WTSASessionRangeID
left outer join WTSA_SessionRangeTutor srt on srt.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeClass src on src.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeStream srs on srs.WTSASessionRangeID = sr.WTSASessionRangeID
--left outer join MO_Stream ms on ms.MOStreamID = srs.MOStreamID
left outer join WTSA_SessionRangeEnrolmentPeriod srep on srep.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeStudent stsd on stsd.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionSubrange ssr on ssr.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID
left outer join MO_Stream ms on ms.MOStreamID = srs.MOStreamID
On SQL Server 2000, the query above consistently generates a plan of cost 946. If I uncomment the MO_Stream join in the middle of the query and comment out the one at the bottom, the cost drops to 263. The execution speed drops accordingly. I always thought that the query optimizer would interpret the query appropriately without considering join order, but it seems that order matters.
So since order does seem to matter, is there a join strategy I should be following for writing faster queries?
(Incidentally, on SQL Server 2005, with almost identical data, the query plan costs were 0.675 and 0.631 respectively.)
Edit: On SQL Server 2000, here are the profiled stats:
946-cost query: 9094ms CPU, 5121 reads, 0 writes, 10123ms duration
263-cost query: 172ms CPU, 7477 reads, 0 writes, 170ms duration
Edit: Here is the logical structure of the tables.
SessionRange ---+--- SessionRangeTutor
|--- SessionRangeClass
|--- SessionRangeStream --- MO_Stream
|--- SessionRangeEnrolmentPeriod
|--- SessionRangeStudent
+----SessionSubrange --- SessionSubrangeRoom
Edit: Thanks to Alex and gbn for pointing me in the right direction. I also found this question.
Here's the new query:
select sr.WTSASessionRangeID // + lots of columns
from WTSAVW_UserSessionRange us
inner join WTSA_SessionRange sr on sr.WTSASessionRangeID = us.WTSASessionRangeID
left outer join WTSA_SessionRangeTutor srt on srt.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeClass src on src.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeEnrolmentPeriod srep on srep.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeStudent stsd on stsd.WTSASessionRangeID = sr.WTSASessionRangeID
// SessionRangeStream is a many-to-many mapping table between SessionRange and MO_Stream
left outer join (
WTSA_SessionRangeStream srs
inner join MO_Stream ms on ms.MOStreamID = srs.MOStreamID
) on srs.WTSASessionRangeID = sr.WTSASessionRangeID
// SessionRanges MAY have Subranges and Subranges MAY have Rooms
left outer join (
WTSA_SessionSubrange ssr
left outer join WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID
) on ssr.WTSASessionRangeID = sr.WTSASessionRangeID
SQLServer2000 cost: 24.9
I have to disagree with all previous answers, and the reason is simple: if you change the order of your left join, your queries are logically different and as such they produce different result sets. See for yourself:
SELECT 1 AS a INTO #t1
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4;
SELECT 1 AS b INTO #t2
UNION ALL SELECT 2;
SELECT 1 AS c INTO #t3
UNION ALL SELECT 3;
SELECT a, b, c
FROM #t1 LEFT JOIN #t2 ON #t1.a=#t2.b
LEFT JOIN #t3 ON #t2.b=#t3.c
ORDER BY a;
SELECT a, b, c
FROM #t1 LEFT JOIN #t3 ON #t1.a=#t3.c
LEFT JOIN #t2 ON #t3.c=#t2.b
ORDER BY a;
a b c
----------- ----------- -----------
1 1 1
2 2 NULL
3 NULL NULL
4 NULL NULL
(4 row(s) affected)
a b c
----------- ----------- -----------
1 1 1
2 NULL NULL
3 NULL 3
4 NULL NULL
The join order does make a difference to the resulting query. This is documented in BOL in the docs for FROM:
<joined_table>
Is a result set that is the product of two or more tables. For multiple joins, use parentheses to change the natural order of the joins.
You can alter the join order using parenthesis around the joins (BOL does show this in the syntax at the top of the docs, but it is easy to miss).
This is known as chiastic behaviour. You can also use the query hint OPTION (FORCE ORDER) to force a specific join order, but this can result in what are called "bushy plans" which may not be the most optimal for the query being executed.
Obviously, the SQL Server 2005 optimizer is a lot better than the SQL Server 2000 one.
However, there's a lot of truth in your question. Outer joins will cause execution to vary wildly based on order (inner joins tend to be optimized to the most efficient route, but again, order matters). If you think about it, as you build up left joins, you need to figure out what the heck is on the left. As such, each join must be calculated before every other join can be done. It becomes sequential, and not parallel. Now, obviously, there are things you can do to combat this (such as indexes, views, etc). But, the point stands: The table needs to know what's on the left before it can do a left outer join. And if you just keep adding joins, you're getting more and more abstraction to what, exactly is on the left (especially if you use joined tables as the left table!).
With inner joins, however, you can parallelize those quite a bit, so there's less of a dramatic difference as far as order's concerned.
A general strategy for optimizing queries containing JOINs is to look at your data model and the data and try to determine which JOINs will reduce number of records that must be considered the most quickly. The fewer records that must be considered, the faster the query will run. The server will generally produce a better query plan too.
Along with the above optimization make sure that any fields used in JOINs are indexed
You query is probably wrong anyway. Alex is correct. Eric may be correct too, but the query is wrong.
Lets' take this subset:
WTSA_SessionRange sr
left outer join
WTSA_SessionSubrange ssr on ssr.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join
WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID
You are joining WTSA_SessionSubrangeRoom onto WTSA_SessionSubrange. You may have no rows from WTSA_SessionSubrange.
The join should be this:
WTSA_SessionRange sr
left outer join
(SELECT WTSASessionRangeID, columns I need
FROM
WTSA_SessionSubrange ssr
left outer join
WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID
) foo on foo.WTSASessionRangeID = sr.WTSASessionRangeID
This is why the join order is affecting results because it's a different query, declaratively speaking.
You'd also need to change the MO_Stream and WTSA_SessionRangeStream join too.
it depends on which of the join fields are indexed - if it has to table scan the first field, but use an index on the second, it's slow. If your first join field is an index, it'll be quicker. My guess is that 2005 optimizes it better by determining the indexed fields and performing those first
At DevConnections a few years ago a session on SQL Server performance stated that (a) order of outer joins DOES matter, and (b) when a query has a lot of joins, it will not look at all of them before making a determination on a plan. If you know you have joins that will help speed up a query, they should be early on in the FROM list (if you can).

Resources