Joins - Difference in condition : Placed to left or right - sql-server

In the following two queries, the only difference is that the two sides of the join condition are swapped.
Will it make any performance difference?
Which one is advisable? I have searched the web with no luck. Please help.
First Query :
select order_date, order_amount
from customers
join orders
on customers.customer_id = orders.customer_id
where customers.customer_id = 3
Second Query :
select order_date, order_amount
from customers
join orders
on orders.customer_id = customers.customer_id
where customers.customer_id = 3

Prdp's comment sums up the answer beautifully. The answer is no. But to further clarify and give you some more info:
SQL Server uses TSQL which is a declarative language. To steal from this post, the definition of declarative is:
Programming paradigm that expresses the desired result of computation
without describing the steps to achieve it (also abbreviated with
"describe what, not how")
What this basically translates to is that you tell SQL Server what you want returned and provide the logic for things like the joins, and SQL Server will figure out the best way to do it. If it has to rearrange joins or perform implicit conversions to produce a good plan, it will.
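A quick way to convince yourself that the operand order only changes how the condition is written, not what it means, is to run both forms against the same data. The sketch below is an assumption-laden stand-in: it uses Python's built-in sqlite3 module rather than SQL Server, and the sample rows and the name column are made up for illustration. It only checks that the results match; for the actual plans you would still compare execution plans in SQL Server itself.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                         order_date TEXT, order_amount REAL);
    -- Hypothetical sample data.
    INSERT INTO customers VALUES (1, 'Ann'), (3, 'Bob');
    INSERT INTO orders VALUES
        (10, 1, '2020-01-05', 20.0),
        (11, 3, '2020-01-06', 35.5),
        (12, 3, '2020-02-01', 12.0);
""")

TEMPLATE = """
    SELECT order_date, order_amount
    FROM customers
    JOIN orders ON {condition}
    WHERE customers.customer_id = 3
    ORDER BY orders.order_id
"""

# Same join condition, written with the operands in either order.
left_first = conn.execute(TEMPLATE.format(
    condition="customers.customer_id = orders.customer_id")).fetchall()
right_first = conn.execute(TEMPLATE.format(
    condition="orders.customer_id = customers.customer_id")).fetchall()

print(left_first == right_first)
```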

Related

SELF Referential SQL Query

I have a table in my MS SQL Database called PolicyTransactions. This table has two important columns:
trans_id INT IDENTITY(1,1),
policy_id INT NOT NULL,
I need help writing a query that will, for each trans_id/policy_id in the table, join it to the most recent previous trans_id for that policy_id. This seems like a simple enough query, but for some reason I can't get it to gel in my brain right now.
Thanks!
I cooked this up for you. Hopefully it's what you're looking for: http://sqlfiddle.com/#!6/e7dc39/8
Basically, a cross apply is different from a derived table or a regular join. It is a query that is logically evaluated once for each row that the outer portion of the query returns. This is why it has visibility into the outer tables (a derived table in a regular join would not have this ability), and this is why it's using the old-school join syntax (old-school meaning the join condition _ = _ is in the WHERE clause).
Just be really careful with this solution, as cross apply isn't necessarily the fastest thing on earth. However, if the indexing on the tables is decent, that tiny query should run pretty quickly.
It's the only way I could think of to solve it, but that doesn't mean it's the only way!
Just a super quick edit: if you notice, some rows are not returned because they are the FIRST transaction for their policy and therefore don't have a trans_id less than their own with the same policy_id. If you want to simulate an outer join with an apply, use outer apply :)
If you are using SQL Server 2012 or later, you should use the LAG() function. See the snippet below; I feel that it's much cleaner than the other answer given here.
SELECT trans_id, policy_id, LAG(trans_id) OVER (PARTITION BY policy_id ORDER BY trans_id) AS prev_trans_id
FROM PolicyTransactions
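To see what "the last previous trans_id for that policy_id" should look like on concrete rows, here is a minimal sketch. It runs on SQLite through Python's sqlite3 module, which has no APPLY operator, so a correlated subquery with MAX() stands in for the OUTER APPLY approach; it should yield the same value per row that LAG() would. The sample rows and the prev_trans_id alias are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PolicyTransactions (
        trans_id  INTEGER PRIMARY KEY,
        policy_id INTEGER NOT NULL
    );
    -- Hypothetical sample data: two policies with interleaved transactions.
    INSERT INTO PolicyTransactions VALUES
        (1, 100), (2, 200), (3, 100), (4, 100), (5, 200);
""")

# For each row, find the highest earlier trans_id with the same policy_id.
# The first transaction of each policy has no predecessor and gets NULL.
rows = conn.execute("""
    SELECT cur.trans_id,
           cur.policy_id,
           (SELECT MAX(prev.trans_id)
              FROM PolicyTransactions AS prev
             WHERE prev.policy_id = cur.policy_id
               AND prev.trans_id < cur.trans_id) AS prev_trans_id
    FROM PolicyTransactions AS cur
    ORDER BY cur.trans_id
""").fetchall()

for row in rows:
    print(row)
```

Rows whose prev_trans_id is NULL are the ones the plain CROSS APPLY version drops.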

Execution of TSQL statement

I am aware of the sequence in which SQL statements are executed, but I still want to confirm a few things with the help of the SQL experts here. I have a big SQL query which returns thousands of rows. Here is a minimized version of the query I wrote, which I think is correct.
Select *
from property p
inner join tenant t on (t.hproperty = p.hmy and p.hmy = 7)
inner join commtenant ct on ct.htenant = t.hmyperson
where 1=1
My colleague says that the above query is equivalent to the query below, performance-wise (he is very confident about it):
Select *
from property p
inner join tenant t on (t.hproperty = p.hmy)
inner join commtenant ct on ct.htenant = t.hmyperson
where p.hmy = 7
Could anybody help me with the explanation about why above queries are not equivalent or equivalent? Thanks.
If you want to know if two queries are equivalent, learn how to look at the execution plans in SQL Server Management Studio. You can put the two queries in different windows, look at the estimated execution plans, and see for yourself if they are the same.
In this case, they probably are the same. SQL is intended to be a declarative language, not a procedural language. That is, it describes the output you want, but the SQL engine is allowed to rewrite the query to be as efficient as possible. The two forms you have describe the same output. Do note that if there were a left outer join instead of an inner join, then the queries would be different, because a filter in the ON clause of an outer join does not remove unmatched rows from the preserved table.
In all likelihood, the engine will read the table and filter the records during the read or use an index for the read. The key idea, though, is that the output is the same and SQL Server can recognize this.
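The point above about inner versus left outer joins can be checked on a tiny data set. This is only a sketch: it uses SQLite via Python's sqlite3 module instead of SQL Server, the rows are invented, the commtenant join is left out to keep it short, and the run helper is a hypothetical name introduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE property (hmy INTEGER PRIMARY KEY);
    CREATE TABLE tenant (hmyperson INTEGER PRIMARY KEY, hproperty INTEGER);
    -- Hypothetical sample data.
    INSERT INTO property VALUES (7), (8);
    INSERT INTO tenant VALUES (1, 7), (2, 7), (3, 8);
""")

def run(join_kind, filter_in_on):
    """Run the query with the p.hmy = 7 filter in ON or in WHERE."""
    on = "t.hproperty = p.hmy" + (" AND p.hmy = 7" if filter_in_on else "")
    where = "" if filter_in_on else "WHERE p.hmy = 7"
    sql = f"""
        SELECT p.hmy, t.hmyperson
        FROM property p
        {join_kind} JOIN tenant t ON {on}
        {where}
        ORDER BY p.hmy, t.hmyperson
    """
    return conn.execute(sql).fetchall()

inner_on = run("INNER", filter_in_on=True)
inner_where = run("INNER", filter_in_on=False)
left_on = run("LEFT", filter_in_on=True)
left_where = run("LEFT", filter_in_on=False)

print(inner_on == inner_where)  # same rows for the inner join
print(left_on == left_where)    # different rows for the left join
```

With the left join, the filter in the ON clause keeps property 8 in the output with a NULL tenant, while the filter in the WHERE clause removes it.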
"p.hmy = 7" is not a join condition, as it relates only to a single table. As such, it doesn't really belong in the ON clause of the join. Since you are not adding any information by placing the condition in the ON clause, having it in the WHERE clause (in which it really belongs) will not make any difference to the query plan generated. If in doubt, look at the query plans.

Is an index still effective after data has been selected?

I have two tables that I want to join, and both have an index on the column I am trying to join on.
QUERY 1
SELECT * FROM [A] INNER JOIN [B] ON [A].F = [B].F;
QUERY 2
SELECT * FROM (SELECT * FROM [A]) [A1] INNER JOIN (SELECT * FROM B) [B1] ON [A1].F=[B1].F
The first query will clearly utilize the index, but what about the second one?
After the two SELECT statements in the brackets are executed, the join would occur, but my guess is that the index wouldn't help to speed up the query because each result is pretty much a new table.
The query isn't executed quite so literally as you suggest, where the inner queries are executed first and then their results are combined with the outer query. The optimizer will take your query and will look at many possible ways to get your data through various join orders, index usages, etc. etc. and come up with a plan that it feels is optimal enough.
If you execute both queries and look at their respective execution plans, I think you will find that they use the exact same one.
Here's a simple example of the same concept. I created my schema as so:
CREATE TABLE A (id int, value int)
CREATE TABLE B (id int, value int)
INSERT INTO A (id, value)
VALUES (1,900),(2,800),(3,700),(4,600)
INSERT INTO B (id, value)
VALUES (2,800),(3,700),(4,600),(5,500)
CREATE CLUSTERED INDEX IX_A ON A (id)
CREATE CLUSTERED INDEX IX_B ON B (id)
And ran queries like the ones you provided.
SELECT * FROM A INNER JOIN B ON A.id = B.id
SELECT * FROM (SELECT * FROM A) A1 INNER JOIN (SELECT * FROM B) B1 ON A1.id = B1.id
In the plans that were generated, both queries utilize the index.
Chances are high that the SQL Server Query Optimizer will be able to detect that Query 2 is in fact the same as Query 1 and use the same indexed approach.
Whether this happens depends on a lot of factors: your table design, your table statistics, the complexity of your query, etc. If you want to know for certain, let SQL Server Query Analyzer show you the execution plan. Here are some links to help you get started:
Displaying Graphical Execution Plans
Examining Query Execution Plans
SQL Server uses predicate pushing (a.k.a. predicate pushdown) to move query conditions as far toward the source tables as possible. It doesn't slavishly do things in the order you parenthesize them. The optimizer uses complex rules--what is essentially a kind of geometry--to determine the meaning of your query, and restructure its access to the data as it pleases in order to gain the most performance while still returning the same final set of data that your query logic demands.
When queries become more and more complicated, there is a point where the optimizer cannot exhaustively search all possible execution plans and may end up with something that is suboptimal. However, you can pretty much assume that a simple case like you have presented is going to always be "seen through" and optimized away.
So the answer is that you should get just as good performance as if the two queries were combined. Now, if the values you are joining on are composite, that is they are the result of a computation or concatenation, then you are almost certainly not going to get the predicate push you want that will make the index useful, because the server won't or can't do a seek based on a partial string or after performing reverse arithmetic or something.
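The effect of comparing against a computed value can be seen in miniature with any engine that exposes its plan. The sketch below is an analogy only: it uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module, not SQL Server, and the exact plan wording varies between SQLite versions. The table mirrors the A table from the example above, but with an ordinary index because SQLite has no clustered index syntax; the plan helper is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER, value INTEGER);
    CREATE INDEX IX_A ON A (id);
    INSERT INTO A VALUES (1, 900), (2, 800), (3, 700), (4, 600);
""")

def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

# Comparing the bare column lets the engine seek into the index.
plain_sql = "SELECT * FROM A WHERE id = 3"
# Wrapping the column in arithmetic hides it from the index lookup.
computed_sql = "SELECT * FROM A WHERE id + 0 = 3"

plain_plan = plan(plain_sql)
computed_plan = plan(computed_sql)
print(plain_plan)
print(computed_plan)

# Both forms still return the same rows; only the access path differs.
plain_rows = conn.execute(plain_sql).fetchall()
computed_rows = conn.execute(computed_sql).fetchall()
```

The same idea applies when the computed value is a join key: the logic is unchanged, but the engine may no longer be able to seek on the index.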
May I suggest that in the future, before asking questions like this here, you simply examine the execution plan for yourself to validate that it is using the index? You could have answered your own question with a little experimentation. If you still have questions, then come post, but in the meantime try to do some of your own research as a sign of respect for the people who are helping you.
To see execution plans, in SQL Server Management Studio (2005 and up) you can click the "Include Actual Execution Plan" button on the toolbar (in SQL Query Analyzer for SQL 2000 it is "Show Execution Plan"), run your query, and switch to the tab at the bottom that displays a graphical version of the execution plan. A little poking around and hovering your mouse over various pieces will quickly show you which indexes are being used on which tables.
However, if things aren't as you expect, don't automatically think that the server is making a mistake. It may decide that scanning your main table without using the index costs less, and it will almost always be right. There are many reasons that scanning can be less expensive: one is a very small table, and another is that the number of rows the server estimates from its statistics that it will have to return makes up a significant portion of the table.
Both queries are the same. During optimization, the second query will be transformed into the same form as the first one.
However, if you have a specific requirement, I would suggest that you post the whole code. Then it would be much easier to answer your question.

SQL Server performance - Subselect or Inner Join?

I've been pondering which of these two statements performs better (and why):
select * from formelement
where formid = (select id from form where name = 'Test')
or
select *
from formelement fe
inner join form f on fe.formid = f.id
where f.name = 'Test'
One form contains several form elements, one form element is always part of one form.
Thanks,
Dennis
Look at the execution plan; most likely it will be the same if you add the filtering to the join. That said, the join will return everything from both tables, while the subquery version will not.
I actually prefer EXISTS over those two
select * from formelement fe
where exists (select 1 from form f
where f.name='Test'
and fe.formid =f.id)
The performance depends on the query plan chosen by the SQL Server engine. The query plan depends on a lot of factors, including (but not limited to) the SQL, the exact table structure, the statistics of the tables, available indexes, etc.
Since your two queries are quite simple, my guess would be that they result in the same (or a very similar) execution plan, thus yielding comparable performance.
(For large, complicated queries, the exact wording of the SQL can make a difference, the book SQL Tuning by Dan Tow gives a lot of great advice on that.)
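Before comparing plans, it helps to confirm that the three formulations discussed here really return the same formelement rows. The following sketch does that on SQLite through Python's sqlite3 module, as a stand-in for SQL Server; the sample rows and the label column are hypothetical, and the join selects only fe.* so that its output is comparable to the other two.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE form (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE formelement (id INTEGER PRIMARY KEY, formid INTEGER, label TEXT);
    -- Hypothetical sample data: form 1 is the one named 'Test'.
    INSERT INTO form VALUES (1, 'Test'), (2, 'Other');
    INSERT INTO formelement VALUES (10, 1, 'a'), (11, 1, 'b'), (12, 2, 'c');
""")

queries = {
    "subselect": """
        SELECT fe.* FROM formelement fe
        WHERE fe.formid = (SELECT id FROM form WHERE name = 'Test')
    """,
    "join": """
        SELECT fe.* FROM formelement fe
        INNER JOIN form f ON fe.formid = f.id
        WHERE f.name = 'Test'
    """,
    "exists": """
        SELECT fe.* FROM formelement fe
        WHERE EXISTS (SELECT 1 FROM form f
                      WHERE f.name = 'Test' AND fe.formid = f.id)
    """,
}

# Sort each result so the comparison does not depend on row order.
results = {name: sorted(conn.execute(sql).fetchall())
           for name, sql in queries.items()}

print(results["subselect"] == results["join"] == results["exists"])
```

Note that the equivalence relies on form names being unique: with two forms named 'Test', the scalar subselect raises an error in SQL Server, whereas the join and EXISTS versions still work.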

SQL Server: Design: Embedded Select statement or INNER JOIN?

I have the following table structure:
Site: Master table for Site
Org: Master table for Org
User: Master table for User (each user links to a unique Org via User.OrgId)
OrgSite: Stores some Org-specific Site details (OrgId, SiteId, SiteName, SiteCode). Not ALL sites, only those which are accessible to the Org.
UserSite: Links a User to his accessible Site(s) (UserId, SiteId). As a user is linked to an Org, UserSite will be a subset of the OrgSite table.
ItemSite: Stores some Item- and Site-specific details (ItemID, SiteId, OrgId, ...)
Now, I've to filter\display records from the 'ItemSite' and in that I also need to display the Sitecode. So, I see the following two options -
1. Create a VIEW: vw_ItemSite_UserSite_OrgSite (INNER JOIN all the tables on SiteId) - this will give me access to ALL the Org specific details available in the 'OrgSite' table (i.e. SiteCode, etc..)
Note that I have to include 'OrgSite' in the view only because I want the Org-specific SiteCode & SiteName. UserSite is already filtering the Sites, so I could 'exclude' the OrgSite table and eliminate an unnecessary INNER JOIN.
2. Based on the above note - the second option is to create a VIEW: vw_ItemSite_UserSite and in the 'SELECT' statement of the VIEW I can embed the following SELECT like -
CREATE VIEW vw_ItemSite_UserSite AS
SELECT ItemSite.SiteID,
(SELECT TOP 1 [SiteCode] FROM OrgSite WHERE OrgId = ItemSite.OrgId) AS SiteCode,
...
FROM ItemSite INNER JOIN UserSite ON ItemSite.SiteId = UserSite.SiteId
My reasoning is that the INNER JOIN and WHERE will be evaluated before the embedded SELECT statement is evaluated. So, does this save me some performance? Or is the idea of having vw_ItemSite_UserSite_OrgSite better?
Option#1 or option#2?
Thank you.
Beware of Premature optimization. If both queries return the same result, use the one that is easier for you to understand and maintain. It's SQL Server's task to make sure that the query operations (join, select, ...) are performed in the order which optimizes performance. And, usually, SQL Server does quite a good job on that.
That said, there are some occasions where the SQL Server query optimizer does not find the optimal query plan and you need to fine-tune it yourself. However, these are rare cases. Unless you already have performance problems with your query (and they cannot be fixed by introducing missing indexes), this is something you should not worry about right now.
I'll take the easy answer approach. Create some tests, check them for performance, and see which one really performs best in your environment.
Option 1 will almost certainly be faster; an embedded SELECT is usually a bad idea for performance.
BUT - don't take our word for it. Code up both and try them, checking the query plans. It's probably premature optimisation in this case, but it's also a good simple test case on which to learn so you know properly how to do it and what the implications are for when you have a problem that really needs the right way to do it. There are sometimes huge performance differences between different ways of writing the same query that the optimiser can do nothing about so learn the general rules up front and your life will be happier.
