how to improve performance when rewrite join SQL?

how to improve performance when rewrite join SQL? - sql-server

Suppose I have 2 table need to join. There are 2 way to write the sql:
select * from taba a join tabb b on a.id =b.id where ...
select * from taba a, tabb b where a.id = b.id and ...
which one has better performance or this is only syntax issue with different SQL standard regardless of performance?

Has been already answered here
stackoverflow.com/questions/1129923/is-a-join-faster-than-a-where
The query optimizer usually use more a join than a where clause (so in theory is better the join) but the last word is said by the db engine you're using
The best advice is to try

Related

Merge Join over sorted columns instead of Hash Join

I have two tables
Table A
(
id int,
name varchar(39),
lname varchar (49),
...
)
Table B
(
id int,
city varchar(39),
...
)
Both tables are sorted on column ID. IDs are simply identities and are populated by auto incremented integers 1 to n.
However, if I input a query e.g.,
SELECT *
FROM A, B
WHERE A.id = B.id;
I get a hash join instead of the efficient merge join. How can I enforce the merge join in SQL Server instead? I don't want to use an index, thus no index-based plans.
Note that I don't want a merge-join with a sort-enforcer either, I know that one can hint the planner by rewriting the query to
SELECT *
FROM A
INNER MERGE JOIN B ON A.ID = B.ID;
By the way I'm using SQL Server Express edition. But I can change to any open source DB if the latter supports the query plan that I'm aiming.
Thanks in advance

If you believe you are smarted then the SQL Engine :-) you can use hits like this:
SELECT *
FROM A
INNER HASH JOIN B
ON A.id = B.id
OR
SELECT *
FROM A
INNER MERGE JOIN B
ON A.id = B.id
At least, you can test if the MERGE will be really better. And even it is better in this case, does not mean that it will be the best choice always. It can reduce the performance in other cases, so generally it will be better to leave this work to the engine.

SQLite join multiple values from two tables [duplicate]

Is there any difference (performance, best-practice, etc...) between putting a condition in the JOIN clause vs. the WHERE clause?
For example...
-- Condition in JOIN
SELECT *
FROM dbo.Customers AS CUS
INNER JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
AND CUS.FirstName = 'John'
-- Condition in WHERE
SELECT *
FROM dbo.Customers AS CUS
INNER JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
WHERE CUS.FirstName = 'John'
Which do you prefer (and perhaps why)?

The relational algebra allows interchangeability of the predicates in the WHERE clause and the INNER JOIN, so even INNER JOIN queries with WHERE clauses can have the predicates rearrranged by the optimizer so that they may already be excluded during the JOIN process.
I recommend you write the queries in the most readable way possible.
Sometimes this includes making the INNER JOIN relatively "incomplete" and putting some of the criteria in the WHERE simply to make the lists of filtering criteria more easily maintainable.
For example, instead of:
SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
ON ca.CustomerID = c.CustomerID
AND c.State = 'NY'
INNER JOIN Accounts a
ON ca.AccountID = a.AccountID
AND a.Status = 1
Write:
SELECT *
FROM Customers c
INNER JOIN CustomerAccounts ca
ON ca.CustomerID = c.CustomerID
INNER JOIN Accounts a
ON ca.AccountID = a.AccountID
WHERE c.State = 'NY'
AND a.Status = 1
But it depends, of course.

For inner joins I have not really noticed a difference (but as with all performance tuning, you need to check against your database under your conditions).
However where you put the condition makes a huge difference if you are using left or right joins. For instance consider these two queries:
SELECT *
FROM dbo.Customers AS CUS
LEFT JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
WHERE ORD.OrderDate >'20090515'
SELECT *
FROM dbo.Customers AS CUS
LEFT JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
AND ORD.OrderDate >'20090515'
The first will give you only those records that have an order dated later than May 15, 2009 thus converting the left join to an inner join.
The second will give those records plus any customers with no orders. The results set is very different depending on where you put the condition. (Select * is for example purposes only, of course you should not use this in production code.)
The exception to this is when you want to see only the records in one table but not the other. Then you use the where clause for the condition not the join.
SELECT *
FROM dbo.Customers AS CUS
LEFT JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
WHERE ORD.OrderID is null

Most RDBMS products will optimize both queries identically. In "SQL Performance Tuning" by Peter Gulutzan and Trudy Pelzer, they tested multiple brands of RDBMS and found no performance difference.
I prefer to keep join conditions separate from query restriction conditions.
If you're using OUTER JOIN sometimes it's necessary to put conditions in the join clause.

WHERE will filter after the JOIN has occurred.
Filter on the JOIN to prevent rows from being added during the JOIN process.

I prefer the JOIN to join full tables/Views and then use the WHERE To introduce the predicate of the resulting set.
It feels syntactically cleaner.

I typically see performance increases when filtering on the join. Especially if you can join on indexed columns for both tables. You should be able to cut down on logical reads with most queries doing this too, which is, in a high volume environment, a much better performance indicator than execution time.
I'm always mildly amused when someone shows their SQL benchmarking and they've executed both versions of a sproc 50,000 times at midnight on the dev server and compare the average times.

Agree with 2nd most vote answer that it will make big difference when using LEFT JOIN or RIGHT JOIN. Actually, the two statements below are equivalent. So you can see that AND clause is doing a filter before JOIN while the WHERE clause is doing a filter after JOIN.
SELECT *
FROM dbo.Customers AS CUS
LEFT JOIN dbo.Orders AS ORD
ON CUS.CustomerID = ORD.CustomerID
AND ORD.OrderDate >'20090515'
SELECT *
FROM dbo.Customers AS CUS
LEFT JOIN (SELECT * FROM dbo.Orders WHERE OrderDate >'20090515') AS ORD
ON CUS.CustomerID = ORD.CustomerID

Joins are quicker in my opinion when you have a larger table. It really isn't that much of a difference though especially if you are dealing with a rather smaller table. When I first learned about joins, i was told that conditions in joins are just like where clause conditions and that i could use them interchangeably if the where clause was specific about which table to do the condition on.

Putting the condition in the join seems "semantically wrong" to me, as that's not what JOINs are "for". But that's very qualitative.
Additional problem: if you decide to switch from an inner join to, say, a right join, having the condition be inside the JOIN could lead to unexpected results.

It is better to add the condition in the Join. Performance is more important than readability. For large datasets, it matters.

Join with Or Condition

Is there a more efficient way to write this? I'm not sure this is the best way to implement this.
select *
from stat.UniqueTeams uTeam
Left Join stat.Matches match
on match.AwayTeam = uTeam.Id or match.HomeTeam = uTeam.id

OR in JOINS is a bad practice, because MSSQL can not use indexes in right way.
Better way - use two selects with UNION:
SELECT *
FROM stat.UniqueTeams uTeam
LEFT JOIN stat.Matches match
ON match.AwayTeam = uTeam.Id
UNION
SELECT *
FROM stat.UniqueTeams uTeam
LEFT JOIN stat.Matches match
ON match.HomeTeam = uTeam.id

Things to be noted while using LEFT JOIN in query:
1) First of all, left join can introduce NULL(s) that can be a performance issue because NULL(s) are treated separately by server engine.
2) The table being join as null-able should not be bulky otherwise it will be costly to execute (performance + resource).
3) Try to include column(s) that has been already indexed. Otherwise, if you need to include such column(s) than better first you build some index(es) for them.
In your case you have two columns from the same table to be left joined to another table. So, in this case a good approach would be if you can have a single table with same column of required data as I have shown below:
; WITH Match AS
(
-- Select all required columns and alise the key column(s) as shown below
SELECT match1.*, match1.AwayTeam AS TeamId FROM stat.Matches match1
UNION
SELECT match2.*, match2.HomeTeam AS TeamId FROM stat.Matches match2
)
SELECT
*
FROM
stat.UniqueTeams uTeam
OUTER APPLY Match WHERE Match.TeamId = uTeam.Id
I have used OUTER APPLY which is almost similar to LEFT OUTER JOIN but it is different during query execution. It works as Table-Valued Function that can preform better in your case.

my answer is not to the point, but i found this question seeking for "or" condition for inner join, so it maybe be useful for the next seeker
we can use legacy syntax for case of inner join:
select *
from stat.UniqueTeams uTeam, stat.Matches match
where match.AwayTeam = uTeam.Id or match.HomeTeam = uTeam.id
note - this query has bad perfomance (cross join at first, then filter). but it can work with lot of conditions, and suitable for dirty data research(for example t1.id=t2.id or t1.name=t2.name)

Shortcut for adding table to column name SQL-server 2014

Stupidly simple question, but I just don't know what to google!
If I create a query like this:
Select id, data
from table1
Now I want to join with table2. I can immediately see that the id column is no longer unique and I have to change it to
table1.id
Is there any smart way (like a keyboard-shortcut) to do this, instead of manually adding table1 to every column? Either before I add the Join to secure that all columns will be unique, or after with suggestions based on the different possible tables.

No, there is no helper.
But do not you can alias the table name:
select x.Col1, y.Col2
from ALongTableName x
inner join AReallyReallyLongTableName y on x.Id = y.OtherId
which can also make queries clearer, and is very much necessary when doing self joins.

First of all, you should start using aliases:
SQL aliases are used to give a database table, or a column in a table,
a temporary name.
Basically aliases are created to make column names more readable.
This will narrow down your problem and make your code maintenance easier. If that's not enough, I guess you could start using auto-completion tools, such as these:
SQL Complete
SQL Prompt
ApexSQL Complete
These have your desired functionality, however, they do not always work as expected (at least for me).

Oh! You can use alias table name. Like this:
SELECT A.ID, A.data
FROM TableA A
INNER JOIN TableB B
ON A.ID = B.ID
You just only use A. or B. if two table have same this column selected. If they different, you don't need: Like this:
SELECT A.ID, data -- if Table B not have column data
FROM TableA A
INNER JOIN TableB B
ON A.ID = B.ID
Or:
Select A.*, B.ID
FROM TableA A
INNER JOIN TableB B
ON A.ID = B.ID

Is moving a constraint into a join more efficient than join and a where clause?

I have been trying to test this, but I have doubts about my tests as the timings vary so much.
-- Scenario 1
SELECT * FROM Foo f
INNER JOIN Bar b ON f.id = b.id
WHERE b.flag = true;
-- Scenario 2
SELECT * FROM Foo f
INNER JOIN Bar b ON b.flag = true AND f.id = b.id;
Logically it seems like scenario 2 would be more efficient, but I wasn't sure if SQL server is smart enough to optimize this or not.

Not sure why you think scenario 2 would "logically" be more efficient. On an INNER JOIN everything is basically a filter so SQL Server can collapse the logic to the exact same underlying plan shape. Here's an example from AdventureWorks2012 (click to enlarge):
I prefer separating the join criteria from the filter criteria, so will always write the query in the format on the left. However #HLGEM makes a good point, these clauses are interchangeable in this case only because it's an INNER JOIN. For an OUTER JOIN, it is very important to place the filters on the outer table in the join criteria, else you unwittingly end up with an INNER JOIN and drastically change the semantics of the query. So my advice about how the plan can be collapsed only holds true for inner joins.
If you're worried about performance, I'd start by getting rid of SELECT * and only pulling the columns you actually need (and make sure there's a covering index).
Four months later, another answer has emerged claiming that there usually will be a difference in performance, and that putting filter criteria in the ON clause will be better. While I won't dispute that it is certainly plausible that this could happen, I contend that it certainly isn't the norm and shouldn't be something you use as an excuse to always put all filter criteria in the ON clause.

The accepted answer is correct only for your test case.
An answer to the headline question as stated is yes, moving the constraint to the join condition can greatly improve the query and ensures. I have seen forms similar to this (but perhaps not exactly)...
select *
from A
inner join B
on B.id = a.id
inner join C
on C.id = A.id
where B.z = 1 and C.z = 2;
...not optimize to the same plan as the "on join" equivalents so I tend to use the "on join" constraints as a best practice even for the simpler cases that might have resolved optimally either way.