SQL Cartesian join optimisation - sql-server

I have a big query for an ETL view that has a cartesian join (see below) which is then left joined to 5 other tables.
SELECT W.Field1, W.Field2
FROM datedim AS d
INNER JOIN employee AS W
ON 1 = 1
The query takes 5 minutes to run hence I'm trying to optimise it. The cartesian join is having a big impact on performance.
Any ideas?
-- Additional info
The Cartesian results are then used in a join like the one below. There are several joins very similar to it.
LEFT OUTER JOIN detail AS det
ON det.id = W.id
AND d.datevalue >= det.validfrom
AND d.datevalue <= det.validto
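For anyone who wants to experiment with the shape of this query, here is a minimal sketch run through Python's built-in sqlite3 module rather than SQL Server (an assumption made only so the example is self-contained). The table contents and the grade column are invented for illustration. It shows that INNER JOIN ... ON 1 = 1 returns the same rows as CROSS JOIN, that the row count is the product of the two table sizes, and that the range LEFT OUTER JOIN keeps every one of those rows.

```python
import sqlite3

# In-memory SQLite database; all data below is made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE datedim (datevalue TEXT);
    CREATE TABLE employee (id INTEGER);
    CREATE TABLE detail (id INTEGER, validfrom TEXT, validto TEXT, grade TEXT);
    INSERT INTO datedim VALUES ('2024-01-01'), ('2024-01-02'), ('2024-01-03');
    INSERT INTO employee VALUES (1), (2);
    INSERT INTO detail VALUES (1, '2024-01-01', '2024-01-02', 'A');
""")

# The join from the question: an inner join whose condition is always true.
on_true = conn.execute("""
    SELECT d.datevalue, W.id FROM datedim AS d
    INNER JOIN employee AS W ON 1 = 1
    ORDER BY d.datevalue, W.id
""").fetchall()

# The same thing written as an explicit CROSS JOIN.
cross = conn.execute("""
    SELECT d.datevalue, W.id FROM datedim AS d
    CROSS JOIN employee AS W
    ORDER BY d.datevalue, W.id
""").fetchall()

# The range join from the question, applied on top of the Cartesian product.
with_detail = conn.execute("""
    SELECT d.datevalue, W.id, det.grade
    FROM datedim AS d
    CROSS JOIN employee AS W
    LEFT OUTER JOIN detail AS det
        ON det.id = W.id
        AND d.datevalue >= det.validfrom
        AND d.datevalue <= det.validto
    ORDER BY d.datevalue, W.id
""").fetchall()

print(on_true == cross)   # True
print(len(cross))         # 6 (3 dates * 2 employees)
print(len(with_detail))   # 6 (the left join keeps every date/employee pair)
```

With a real date dimension and employee table the product grows quickly, which is consistent with the cost described above.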

Related

joining a small table significantly slows down a query

Problem: Joining a relatively small table into a query with many joins doubles the execution time of a query.
Challenge: How could query or data structures be optimized so that the query executes much faster?
There are indexes everywhere, and statistics are up to date.
3 passes, measured with SQL Server Profiler:
Without extra join: avg 3741ms
With extra join: avg 6733ms - this is +80%
Basically, the query is about filtering and aggregating values from a large budget table.
Table budget contains around 700,000 records
For filtering, we have tables f1,f2,f3 which contain values selected by user. These filter tables have 1 to max. 70 records
For aggregating, we have tables a1,a2,a3 which are used for aggregating the dimensions to group aggregates.
Budget Table: 700,000 records
Final Result: 2,400 records
Original query:
select a1.agg1, a2.agg2, a3.agg3, sum(b.value)
from
budget b -- 700.000 records
inner join f1 on b.dim1 = f1.dim1
inner join f2 on b.dim2 = f2.dim2
inner join f3 on b.dim3 = f3.dim3
inner join a1 on b.dim1 = a1.dim1
inner join a2 on b.dim2 = a2.dim2
inner join a3 on b.dim3 = a3.dim3
group by a1.agg1, a2.agg2, a3.agg3
order by a1.agg1, a2.agg2, a3.agg3
Now I'm adding the left join. It brings in an additional count of an extra table which holds comments to some figures.
So this join adds a column in the result, but does not change the rows.
The table comments only has around 1,500 records.
New query:
select a1.agg1, a2.agg2, a3.agg3, sum(b.value), count(c.CommentText) as commentcount
from
budget b -- 700.000 records
inner join f1 on b.dim1 = f1.dim1
inner join f2 on b.dim2 = f2.dim2
inner join f3 on b.dim3 = f3.dim3
inner join a1 on b.dim1 = a1.dim1
inner join a2 on b.dim2 = a2.dim2
inner join a3 on b.dim3 = a3.dim3
left join comments c on b.dim1= c.dim1 and b.dim2=c.dim2 and b.dim3=c.dim3
group by a1.agg1, a2.agg2, a3.agg3
order by a1.agg1, a2.agg2, a3.agg3
I compared the execution plans of both queries in detail. The extra left join leads to 3 extra operations with the following costs:
Clustered Index scan (on the joined table): 1%
Merge Join (left outer join): 10%
Sort: 13%
The Sort is an extra sort which appears only with the join. With extra join, I have two sorts (one before and one after the merge join), otherwise only one sort.
Interestingly, the costs are not really matching with reality. But anyway. The basic question is: How could that be improved? Any ideas?
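One idea that may be worth testing is to count the comments per (dim1, dim2, dim3) key first and join that much smaller result, instead of joining the raw comments table. The sketch below uses Python's sqlite3 module with a single dimension and invented data, so it says nothing about SQL Server timings; whether the plan actually improves is an assumption to verify with the real execution plan. What it does show is a correctness point: if a key has more than one comment, the direct join repeats the budget row and inflates sum(b.value), while the pre-aggregated join does not.

```python
import sqlite3

# Simplified, hypothetical schema: one dimension instead of three.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE budget (dim1 INTEGER, value INTEGER);
    CREATE TABLE comments (dim1 INTEGER, CommentText TEXT);
    INSERT INTO budget VALUES (1, 100), (2, 50);
    INSERT INTO comments VALUES (1, 'first'), (1, 'second');
""")

# Direct left join, as in the new query from the question.
direct = conn.execute("""
    SELECT b.dim1, SUM(b.value), COUNT(c.CommentText) AS commentcount
    FROM budget b
    LEFT JOIN comments c ON b.dim1 = c.dim1
    GROUP BY b.dim1
    ORDER BY b.dim1
""").fetchall()

# Comments counted per key first, then joined as a derived table.
pre_aggregated = conn.execute("""
    SELECT b.dim1, SUM(b.value), COALESCE(SUM(c.commentcount), 0) AS commentcount
    FROM budget b
    LEFT JOIN (
        SELECT dim1, COUNT(CommentText) AS commentcount
        FROM comments
        GROUP BY dim1
    ) c ON b.dim1 = c.dim1
    GROUP BY b.dim1
    ORDER BY b.dim1
""").fetchall()

print(direct)          # [(1, 200, 2), (2, 50, 0)]  value for dim1 = 1 is doubled
print(pre_aggregated)  # [(1, 100, 2), (2, 50, 0)]
```

If every key in comments is guaranteed to be unique, both forms return the same numbers and only the plan differs.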

Lookup on huge table is not happening sql-server

I am using SQL Server. This is my query:
select asst_id,camp_asst.amp_asst_id,asst.camp_asst_id,lyty_no,campaign_id
into camp.asst_respy
from camp.asst_respy respy
inner join camp.camp_wave wave on wave.wave_cd=resp.camp_id
inner join camp.camp_cust custy on cust.cust_lyty_no=resp.big_id
inner join camp.camp_asst assty on asst.sst_trck_url=resp.dum_url
inner join camp.camp_camp_assty camp_asst on camp_asst.camp_asst_id=asst.asst_id
inner join camp.camp_cust_assty cust_asst on cust_asst.camp_camp_asst_id=camp_asst.asst_id -- this table has about 16 billion rows.
inner join camp.camp_camp_custy camp_cust on camp_cust.camp_camp_cust_id=cust_asst.cust_id
Could somebody please guide me on this join? It is taking a very long time to run.
There are indexes defined on the table, and I am looking at partitioning the table to make this work. Please advise.
All the remaining tables used have upwards of 10 million rows.

SQL Server speed: left outer join vs inner join

In theory, why would an inner join run remarkably faster than a left outer join, given that both queries return the same result set? I had a query which would take a long time to describe, but this is what I saw when changing a single join: left outer join, 6 sec; inner join, 0 sec (the rest of the query is the same). Result set: the same.
Actually, depending on the data, a left outer join and an inner join would not return the same results. Most likely the left outer join will return more rows, but again that depends on the data.
I'd be worried if I changed a left join to an inner join and the results were not different. I would suspect that you have a condition on the left-joined table in the where clause, effectively (and probably incorrectly) turning it into an inner join.
Something like:
select *
from table1 t1
left join table2 t2 on t1.myid = t2.myid
where t2.somefield = 'something'
Which is not the same thing as
select *
from table1 t1
left join table2 t2
on t1.myid = t2.myid and t2.somefield = 'something'
So first I would be worried that my query was incorrect to begin with, then I would worry about performance. An inner join is NOT a performance enhancement for a left join; they mean two different things and should return different results unless you have a table where there will always be a match for every record. In that case you change to an inner join because the other is incorrect, not to improve performance.
My best guess as to the reason the left join takes longer is that it is joining to many more rows that then get filtered out by the where clause. But that is just a wild guess. To know you need to look at the Execution plans.
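The difference between the two queries above can be checked on a toy data set. This is a sketch using Python's sqlite3 module rather than SQL Server, with invented rows; the table and column names are the ones from the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (myid INTEGER);
    CREATE TABLE table2 (myid INTEGER, somefield TEXT);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (1, 'something'), (2, 'other');
""")

# Filter on the outer-joined table in WHERE: rows with no match are discarded,
# so the left join behaves like an inner join.
filter_in_where = conn.execute("""
    SELECT t1.myid, t2.somefield
    FROM table1 t1
    LEFT JOIN table2 t2 ON t1.myid = t2.myid
    WHERE t2.somefield = 'something'
    ORDER BY t1.myid
""").fetchall()

# Same filter moved into the ON clause: every table1 row is kept.
filter_in_on = conn.execute("""
    SELECT t1.myid, t2.somefield
    FROM table1 t1
    LEFT JOIN table2 t2
        ON t1.myid = t2.myid AND t2.somefield = 'something'
    ORDER BY t1.myid
""").fetchall()

print(filter_in_where)  # [(1, 'something')]
print(filter_in_on)     # [(1, 'something'), (2, None), (3, None)]
```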

Why does the order of join clauses affect the query plan in SQL Server?

I am building a view in SQL Server 2000 (and 2005) and I've noticed that the order of the join statements greatly affects the execution plan and speed of the query.
select sr.WTSASessionRangeID,
-- bunch of other columns
from WTSAVW_UserSessionRange us
inner join WTSA_SessionRange sr on sr.WTSASessionRangeID = us.WTSASessionRangeID
left outer join WTSA_SessionRangeTutor srt on srt.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeClass src on src.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeStream srs on srs.WTSASessionRangeID = sr.WTSASessionRangeID
--left outer join MO_Stream ms on ms.MOStreamID = srs.MOStreamID
left outer join WTSA_SessionRangeEnrolmentPeriod srep on srep.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeStudent stsd on stsd.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionSubrange ssr on ssr.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID
left outer join MO_Stream ms on ms.MOStreamID = srs.MOStreamID
On SQL Server 2000, the query above consistently generates a plan of cost 946. If I uncomment the MO_Stream join in the middle of the query and comment out the one at the bottom, the cost drops to 263. The execution time drops accordingly. I always thought that the query optimizer would interpret the query appropriately without considering join order, but it seems that order matters.
So since order does seem to matter, is there a join strategy I should be following for writing faster queries?
(Incidentally, on SQL Server 2005, with almost identical data, the query plan costs were 0.675 and 0.631 respectively.)
Edit: On SQL Server 2000, here are the profiled stats:
946-cost query: 9094ms CPU, 5121 reads, 0 writes, 10123ms duration
263-cost query: 172ms CPU, 7477 reads, 0 writes, 170ms duration
Edit: Here is the logical structure of the tables.
SessionRange ---+--- SessionRangeTutor
                |--- SessionRangeClass
                |--- SessionRangeStream --- MO_Stream
                |--- SessionRangeEnrolmentPeriod
                |--- SessionRangeStudent
                +----SessionSubrange --- SessionSubrangeRoom
Edit: Thanks to Alex and gbn for pointing me in the right direction. I also found this question.
Here's the new query:
select sr.WTSASessionRangeID -- + lots of columns
from WTSAVW_UserSessionRange us
inner join WTSA_SessionRange sr on sr.WTSASessionRangeID = us.WTSASessionRangeID
left outer join WTSA_SessionRangeTutor srt on srt.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeClass src on src.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeEnrolmentPeriod srep on srep.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join WTSA_SessionRangeStudent stsd on stsd.WTSASessionRangeID = sr.WTSASessionRangeID
-- SessionRangeStream is a many-to-many mapping table between SessionRange and MO_Stream
left outer join (
WTSA_SessionRangeStream srs
inner join MO_Stream ms on ms.MOStreamID = srs.MOStreamID
) on srs.WTSASessionRangeID = sr.WTSASessionRangeID
-- SessionRanges MAY have Subranges and Subranges MAY have Rooms
left outer join (
WTSA_SessionSubrange ssr
left outer join WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID
) on ssr.WTSASessionRangeID = sr.WTSASessionRangeID
SQLServer2000 cost: 24.9
I have to disagree with all previous answers, and the reason is simple: if you change the order of your left join, your queries are logically different and as such they produce different result sets. See for yourself:
SELECT 1 AS a INTO #t1
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4;
SELECT 1 AS b INTO #t2
UNION ALL SELECT 2;
SELECT 1 AS c INTO #t3
UNION ALL SELECT 3;
SELECT a, b, c
FROM #t1 LEFT JOIN #t2 ON #t1.a=#t2.b
LEFT JOIN #t3 ON #t2.b=#t3.c
ORDER BY a;
SELECT a, b, c
FROM #t1 LEFT JOIN #t3 ON #t1.a=#t3.c
LEFT JOIN #t2 ON #t3.c=#t2.b
ORDER BY a;
a b c
----------- ----------- -----------
1 1 1
2 2 NULL
3 NULL NULL
4 NULL NULL
(4 row(s) affected)
a b c
----------- ----------- -----------
1 1 1
2 NULL NULL
3 NULL 3
4 NULL NULL
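The same experiment can be repeated outside SQL Server. The sketch below uses Python's sqlite3 module, with ordinary tables named t1, t2 and t3 standing in for the temp tables (an assumption for portability), and returns the same two result sets as shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (a INTEGER);
    CREATE TABLE t2 (b INTEGER);
    CREATE TABLE t3 (c INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3), (4);
    INSERT INTO t2 VALUES (1), (2);
    INSERT INTO t3 VALUES (1), (3);
""")

# t2 joined first, then t3 joined to t2.
t2_first = conn.execute("""
    SELECT a, b, c
    FROM t1 LEFT JOIN t2 ON t1.a = t2.b
            LEFT JOIN t3 ON t2.b = t3.c
    ORDER BY a
""").fetchall()

# t3 joined first, then t2 joined to t3.
t3_first = conn.execute("""
    SELECT a, b, c
    FROM t1 LEFT JOIN t3 ON t1.a = t3.c
            LEFT JOIN t2 ON t3.c = t2.b
    ORDER BY a
""").fetchall()

print(t2_first)  # [(1, 1, 1), (2, 2, None), (3, None, None), (4, None, None)]
print(t3_first)  # [(1, 1, 1), (2, None, None), (3, None, 3), (4, None, None)]
```

Note that the ON conditions differ between the two queries as well as the order, which is why the results differ.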
The join order does make a difference to the resulting query. This is documented in BOL in the docs for FROM:
<joined_table>
Is a result set that is the product of two or more tables. For multiple joins, use parentheses to change the natural order of the joins.
You can alter the join order using parentheses around the joins (BOL does show this in the syntax at the top of the docs, but it is easy to miss).
This is known as chiastic behaviour. You can also use the query hint OPTION (FORCE ORDER) to force a specific join order, but this can result in what are called "bushy plans" which may not be the most optimal for the query being executed.
Obviously, the SQL Server 2005 optimizer is a lot better than the SQL Server 2000 one.
However, there's a lot of truth in your question. Outer joins will cause execution to vary wildly based on order (inner joins tend to be optimized to the most efficient route, but again, order matters). If you think about it, as you build up left joins, you need to figure out what the heck is on the left. As such, each join must be calculated before every other join can be done. It becomes sequential, and not parallel. Now, obviously, there are things you can do to combat this (such as indexes, views, etc). But, the point stands: The table needs to know what's on the left before it can do a left outer join. And if you just keep adding joins, you're getting more and more abstraction to what, exactly is on the left (especially if you use joined tables as the left table!).
With inner joins, however, you can parallelize those quite a bit, so there's less of a dramatic difference as far as order's concerned.
A general strategy for optimizing queries containing JOINs is to look at your data model and the data and try to determine which JOINs will reduce number of records that must be considered the most quickly. The fewer records that must be considered, the faster the query will run. The server will generally produce a better query plan too.
Along with the above optimization, make sure that any fields used in JOINs are indexed.
Your query is probably wrong anyway. Alex is correct. Eric may be correct too, but the query is wrong.
Let's take this subset:
WTSA_SessionRange sr
left outer join
WTSA_SessionSubrange ssr on ssr.WTSASessionRangeID = sr.WTSASessionRangeID
left outer join
WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID
You are joining WTSA_SessionSubrangeRoom onto WTSA_SessionSubrange. You may have no rows from WTSA_SessionSubrange.
The join should be this:
WTSA_SessionRange sr
left outer join
(SELECT WTSASessionRangeID, columns I need
FROM
WTSA_SessionSubrange ssr
left outer join
WTSA_SessionSubrangeRoom ssrr on ssrr.WTSASessionSubrangeID = ssr.WTSASessionSubrangeID
) foo on foo.WTSASessionRangeID = sr.WTSASessionRangeID
This is why the join order is affecting results: it's a different query, declaratively speaking.
You'd also need to change the MO_Stream and WTSA_SessionRangeStream join too.
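The grouping matters most visibly when the inner pair uses an INNER JOIN, as with the stream mapping table and MO_Stream. Here is a minimal sketch using Python's sqlite3 module and shortened, hypothetical table names (session_range, range_stream, stream) in place of the real ones. A flat chain of LEFT JOIN followed by INNER JOIN drops session ranges that have no stream, while joining the pair as a derived table keeps them.

```python
import sqlite3

# Hypothetical, shortened stand-ins for WTSA_SessionRange,
# WTSA_SessionRangeStream and MO_Stream.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE session_range (id INTEGER);
    CREATE TABLE range_stream (range_id INTEGER, stream_id INTEGER);
    CREATE TABLE stream (id INTEGER, name TEXT);
    INSERT INTO session_range VALUES (1), (2);
    INSERT INTO range_stream VALUES (1, 10);
    INSERT INTO stream VALUES (10, 'S');
""")

# Flat chain: the trailing inner join filters out session range 2,
# because its srs.stream_id is NULL after the left join.
flat = conn.execute("""
    SELECT sr.id, ms.name
    FROM session_range sr
    LEFT JOIN range_stream srs ON srs.range_id = sr.id
    INNER JOIN stream ms ON ms.id = srs.stream_id
    ORDER BY sr.id
""").fetchall()

# Derived table: the inner join is resolved first, then left joined as a unit.
grouped = conn.execute("""
    SELECT sr.id, foo.name
    FROM session_range sr
    LEFT JOIN (
        SELECT srs.range_id, ms.name
        FROM range_stream srs
        INNER JOIN stream ms ON ms.id = srs.stream_id
    ) foo ON foo.range_id = sr.id
    ORDER BY sr.id
""").fetchall()

print(flat)     # [(1, 'S')]
print(grouped)  # [(1, 'S'), (2, None)]
```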
It depends on which of the join fields are indexed. If it has to table scan the first field but can use an index on the second, it's slow. If your first join field is indexed, it'll be quicker. My guess is that 2005 optimizes it better by determining the indexed fields and performing those joins first.
At DevConnections a few years ago a session on SQL Server performance stated that (a) order of outer joins DOES matter, and (b) when a query has a lot of joins, it will not look at all of them before making a determination on a plan. If you know you have joins that will help speed up a query, they should be early on in the FROM list (if you can).

LEFT JOIN vs. LEFT OUTER JOIN in SQL Server

What is the difference between LEFT JOIN and LEFT OUTER JOIN?
As per the documentation: FROM (Transact-SQL):
<join_type> ::=
[ { INNER | { { LEFT | RIGHT | FULL } [ OUTER ] } } [ <join_hint> ] ]
JOIN
The keyword OUTER is marked as optional (enclosed in square brackets). In this specific case, whether you specify OUTER or not makes no difference. Note that while the other elements of the join clause are also marked as optional, leaving them out will make a difference.
For instance, the entire type-part of the JOIN clause is optional, in which case the default is INNER if you just specify JOIN. In other words, this is legal:
SELECT *
FROM A JOIN B ON A.X = B.Y
Here's a list of equivalent syntaxes:
A LEFT JOIN B            A LEFT OUTER JOIN B
A RIGHT JOIN B           A RIGHT OUTER JOIN B
A FULL JOIN B            A FULL OUTER JOIN B
A INNER JOIN B           A JOIN B
Also take a look at the answer I left on this other SO question: SQL left join vs multiple tables on FROM line?.
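The equivalences in the list are easy to confirm on a small data set. This sketch uses Python's sqlite3 module instead of SQL Server (an assumption for the sake of a runnable example; OUTER is optional in both), with invented rows for tables A and B. RIGHT and FULL joins are left out because older SQLite versions do not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (X INTEGER);
    CREATE TABLE B (Y INTEGER);
    INSERT INTO A VALUES (1), (2), (3);
    INSERT INTO B VALUES (2), (3), (4);
""")


def run(join_keywords):
    """Run the same query with the given join keywords and return all rows."""
    sql = f"SELECT A.X, B.Y FROM A {join_keywords} B ON A.X = B.Y ORDER BY A.X"
    return conn.execute(sql).fetchall()


left_join = run("LEFT JOIN")
left_outer_join = run("LEFT OUTER JOIN")
plain_join = run("JOIN")
inner_join = run("INNER JOIN")

print(left_join == left_outer_join)  # True
print(plain_join == inner_join)      # True
print(left_join)                     # [(1, None), (2, 2), (3, 3)]
print(plain_join)                    # [(2, 2), (3, 3)]
```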
To answer your question: there is no difference between LEFT JOIN and LEFT OUTER JOIN; they are exactly the same. That said...
At the top level there are mainly 3 types of joins:
INNER
OUTER
CROSS
INNER JOIN - fetches data if present in both the tables.
OUTER JOIN are of 3 types:
LEFT OUTER JOIN - fetches data if present in the left table.
RIGHT OUTER JOIN - fetches data if present in the right table.
FULL OUTER JOIN - fetches data if present in either of the two tables.
CROSS JOIN, as the name suggests, does [n X m], joining everything to everything.
This is similar to the scenario where we simply list the tables to join (in the FROM clause of the SELECT statement), using commas to separate them.
Points to be noted:
If you just mention JOIN then by default it is an INNER JOIN.
An OUTER join has to be LEFT | RIGHT | FULL; you cannot simply say OUTER JOIN.
You can drop OUTER keyword and just say LEFT JOIN or RIGHT JOIN or FULL JOIN.
For those who want to visualise these in a better way, please go to this link:
A Visual Explanation of SQL Joins
What is the difference between left join and left outer join?
Nothing. LEFT JOIN and LEFT OUTER JOIN are equivalent.
Left Join and Left Outer Join are one and the same. The former is the shorthand for the latter. The same can be said about the Right Join and Right Outer Join relationship. The demonstration will illustrate the equality. Working examples of each query have been provided via SQL Fiddle. This tool will allow for hands on manipulation of the query.
Given
Left Join and Left Outer Join
Results
Right Join and Right Outer Join
Results
I'm a PostgreSQL DBA. As far as I can tell, the difference between joins written with and without OUTER is a topic of considerable discussion all around the internet. Until today I had never seen a difference between the two, so I went further and tried to find one.
In the end I read the whole documentation about it and found the answer.
If you look at the documentation (at least in PostgreSQL) you can find this phrase:
"The words INNER and OUTER are optional in all forms. INNER is the default; LEFT, RIGHT, and FULL imply an outer join."
In other words,
LEFT JOIN and LEFT OUTER JOIN ARE THE SAME
RIGHT JOIN and RIGHT OUTER JOIN ARE THE SAME
I hope this helps those who are still trying to find the answer.
I find it easier to think of Joins in the following order:
CROSS JOIN - a Cartesian product of both tables. ALL joins begin here
INNER JOIN - a CROSS JOIN with a filter added.
OUTER JOIN - an INNER JOIN with missing elements (from either LEFT or RIGHT table) added afterward.
Until I figured out this (relatively) simple model, JOINS were always a bit more of a black art. Now they make perfect sense.
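That model can be checked directly. The sketch below uses Python's sqlite3 module with two invented single-column tables (the values are arbitrary); it builds an inner join from a cross join plus a filter, and a left outer join from the inner join plus the unmatched left rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (y INTEGER);
    INSERT INTO a VALUES (1), (2), (3), (4);
    INSERT INTO b VALUES (5), (1), (6), (2);
""")

# Step 1: the Cartesian product, every row of a paired with every row of b.
cross = conn.execute("SELECT a.x, b.y FROM a CROSS JOIN b").fetchall()

# Step 2: an inner join is the cross join with a filter applied.
cross_filtered = conn.execute(
    "SELECT a.x, b.y FROM a CROSS JOIN b WHERE a.x = b.y ORDER BY a.x"
).fetchall()
inner = conn.execute(
    "SELECT a.x, b.y FROM a INNER JOIN b ON a.x = b.y ORDER BY a.x"
).fetchall()

# Step 3: a left outer join is the inner join plus the unmatched left rows.
inner_plus_missing = conn.execute("""
    SELECT a.x, b.y FROM a INNER JOIN b ON a.x = b.y
    UNION ALL
    SELECT a.x, NULL FROM a
    WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.y = a.x)
    ORDER BY 1
""").fetchall()
left_outer = conn.execute(
    "SELECT a.x, b.y FROM a LEFT JOIN b ON a.x = b.y ORDER BY a.x"
).fetchall()

print(len(cross))                        # 16
print(cross_filtered == inner)           # True
print(inner_plus_missing == left_outer)  # True
print(left_outer)                        # [(1, 1), (2, 2), (3, None), (4, None)]
```

This describes the logical result only; the optimizer is free to compute the join without ever materialising the full product.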
Hope this helps more than it confuses.
To answer your question
In SQL Server join syntax, OUTER is optional.
It is mentioned in this MSDN article: https://msdn.microsoft.com/en-us/library/ms177634(v=sql.130).aspx
So the following list shows equivalent join syntaxes with and without OUTER
LEFT OUTER JOIN => LEFT JOIN
RIGHT OUTER JOIN => RIGHT JOIN
FULL OUTER JOIN => FULL JOIN
Other equivalent syntaxes
INNER JOIN => JOIN
CROSS JOIN => ,
Strongly recommend the Dotnet Mob article: Joins in Sql Server
Why are LEFT/RIGHT and LEFT OUTER/RIGHT OUTER the same? Let's explain the reason for this vocabulary.
Understand that LEFT and RIGHT joins are specific cases of the OUTER join, and therefore couldn't be anything other than OUTER LEFT/OUTER RIGHT. The OUTER join is also called FULL OUTER, as opposed to LEFT and RIGHT joins, which are PARTIAL results of the OUTER join. Indeed:
Table A | Table B      Table A | Table B      Table A | Table B      Table A | Table B
   1    |    5            1    |    1            1    |    1            1    |    1
   2    |    1            2    |    2            2    |    2            2    |    2
   3    |    6            3    |  null           3    |  null           -    |    -
   4    |    2            4    |  null           4    |  null           -    |    -
                         null  |    5            -    |    -           null  |    5
                         null  |    6            -    |    -           null  |    6
                       OUTER JOIN (FULL)     LEFT OUTER (partial)   RIGHT OUTER (partial)
It is now clear why those operations have aliases, and it is also clear that only 3 cases exist: INNER, OUTER, CROSS, with two sub-cases for the OUTER.
The vocabulary, the way teachers explain this, as well as some answers above, often make it look like there are lots of different types of join. But it's actually very simple.
There are only 3 joins:
A) Cross Join = Cartesian (E.g: Table A, Table B)
B) Inner Join = JOIN (E.g: Table A Join/Inner Join Table B)
C) Outer join:
There are three types of outer join
Left Outer Join = Left Join
Right Outer Join = Right Join
Full Outer Join = Full Join
There are mainly three types of JOIN
Inner: fetches data, that are present in both tables
Only JOIN means INNER JOIN
Outer: are of three types
LEFT OUTER - fetches all rows from the left table, plus the right-table rows that satisfy the matching condition
RIGHT OUTER - fetches all rows from the right table, plus the left-table rows that satisfy the matching condition
FULL OUTER - fetches data present in either or both tables
(LEFT or RIGHT or FULL) OUTER JOIN can be written w/o writing "OUTER"
Cross Join: joins everything to everything
Syntactic sugar, makes it more obvious to the casual reader that the join isn't an inner one.
Just in the context of this question, I want to post the 2 'APPLY' operators as well:
JOINS:
INNER JOIN = JOIN
OUTER JOIN
LEFT OUTER JOIN = LEFT JOIN
RIGHT OUTER JOIN = RIGHT JOIN
FULL OUTER JOIN = FULL JOIN
CROSS JOIN
SELF-JOIN: This is not exactly a separate type of join. This is basically joining a table to itself using one of the above joins. But I felt it is worth mentioning in the context JOIN discussions as you will hear this term from many in the SQL Developer community.
APPLY:
CROSS APPLY -- Similar to INNER JOIN (But has added advantage of being able to compute something in the Right table for each row of the Left table and would return only the matching rows)
OUTER APPLY -- Similar to LEFT OUTER JOIN (But has added advantage of being able to compute something in the Right table for each row of the Left table and would return all the rows from the Left table irrespective of a match on the Right table)
https://www.mssqltips.com/sqlservertip/1958/sql-server-cross-apply-and-outer-apply/
https://sqlhints.com/2016/10/23/outer-apply-in-sql-server/
Real life example, when to use OUTER / CROSS APPLY in SQL
I find the APPLY operators very beneficial, as they give better performance than having to do the same computation in a subquery. They can also replace many analytical functions in older versions of SQL Server. That is why I believe that, once comfortable with JOINs, a SQL developer should try to learn the APPLY operators next.

Resources