My Database Professor told us to use:
SELECT A.a1, B.b1 FROM A, B WHERE A.a2 = B.b2;
Rather than:
SELECT A.a1, B.b1 FROM A INNER JOIN B ON A.a2 = B.b2;
Supposedly Oracle doesn't like the JOIN syntax, because it is harder to optimize than a WHERE restriction on the Cartesian product.
I can't imagine why this should be the case. The only performance issue could be that the parser needs to parse a few more characters, but that is negligible in my eyes.
I found these Stack Overflow questions:
Is there an Oracle official recommendation on the use of explicit ANSI JOINs vs implicit joins?
Explicit vs implicit SQL joins
And this sentence in the Oracle documentation: https://docs.oracle.com/cd/B19306_01/server.102/b14200/queries006.htm
Oracle recommends that you use the FROM clause OUTER JOIN syntax rather than the Oracle join operator.
Can someone give me up-to-date recommendations from Oracle, with a link? She doesn't acknowledge Stack Overflow (anyone can answer here), and the 10g documentation is outdated in her eyes.
If I am wrong and Oracle really doesn't like JOINs now, then that's also OK, but I can't find any articles. I just want to know who is right.
Thanks a lot to everyone who can help me!
Your professor should speak with Gordon Linoff, who is a computer science professor at Columbia University. Gordon, and most SQL enthusiasts on this site, will almost always tell you to use explicit join syntax. The reasons for this are many, including (but not limited to):
Explicit joins make it easy to see what the actual join logic is. Implicit joins, on the other hand, obfuscate the join logic, by spreading it out across both the FROM and WHERE clauses.
The ANSI-92 standard introduced the explicit join syntax. The implicit comma join your professor seems to be pushing is still valid SQL, but it is the older style and is widely discouraged.
Regarding performance, as far as I know, both versions of the query you wrote would be optimized to the same thing under the hood. You can always check the execution plans of both, but I doubt you would see a significant difference very often.
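If you want to see this for yourself before arguing with anyone, here is a minimal sketch. It uses SQLite through Python's `sqlite3` module purely as a runnable stand-in; it says nothing about Oracle's optimizer, and the sample rows are made up. On Oracle you would compare the two execution plans with its own tooling instead.

```python
import sqlite3

# SQLite stands in for Oracle only so that the sketch runs anywhere.
# Table and column names follow the question; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (a1 TEXT, a2 INTEGER);
    CREATE TABLE B (b1 TEXT, b2 INTEGER);
    INSERT INTO A VALUES ('x', 1), ('y', 2), ('z', 3);
    INSERT INTO B VALUES ('p', 1), ('q', 2), ('r', 4);
""")

IMPLICIT = "SELECT A.a1, B.b1 FROM A, B WHERE A.a2 = B.b2"
EXPLICIT = "SELECT A.a1, B.b1 FROM A INNER JOIN B ON A.a2 = B.b2"


def rows(sql):
    """Run the query and sort the rows so the two results are comparable."""
    return sorted(conn.execute(sql).fetchall())


def plan(sql):
    """Return the human-readable plan steps (last column of EXPLAIN QUERY PLAN)."""
    return [r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql)]


implicit_rows = rows(IMPLICIT)
explicit_rows = rows(EXPLICIT)

print(implicit_rows == explicit_rows)  # same result set for both spellings
print(plan(IMPLICIT))
print(plan(EXPLICIT))
```

The two printed plans are what you would compare by eye; the equality check only confirms that both spellings return the same rows.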
An average SQL query you will encounter in real business has 7-8 joins with 12-16 join conditions. One in every 10 or 20 queries may involve nested joins or other more advanced cases.
Explicit join syntax is simply far easier to maintain, debug and develop. And those factors are critical for business software - the faster and safer the better.
Implicit joins are somewhat easier to code if you create statements dynamically through application code. Perhaps there are other uses that I am unaware of.
As with many non-trivial things there is no simple yes / no answer.
The first thing is that for trivial queries (like the example in your question) it doesn't matter which syntax you use. The classic syntax is even more compact for simple queries.
Only with non-trivial queries (say, more than five joins) will you learn the benefits of the ANSI syntax. The main benefit is that the join predicates are kept separate from the WHERE condition.
A simple example: this is a completely valid query in the pre-ANSI syntax
SELECT A.a1, B.b1
FROM A, B
WHERE A.a1 = B.b1 and
A.a1 = B.b1(+);
Is it an inner or an outer join? Furthermore, if this construct is scattered among 10 other join conditions in the WHERE clause, it is very easy to misread.
Anyway, it would be very naïve to assume that those two syntax options are only syntactic sugar and that the resulting execution plan is identical for all queries, any data and all Oracle versions.
Yes, there were times (around Oracle 10) when you had to be careful. But with versions 12 and 18 I do not see a reason to be defensive, and I'm convinced it is safe to use the ANSI syntax for the reason given above: better overview and readability.
Final remark for your professor: if you find yourself in the position of optimizing the WHERE restriction of a Cartesian product, you typically have a performance problem. Try a thought experiment with a Cartesian product of four tables with 1,000 rows each…
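To put a number on that thought experiment, here is a small sketch, scaled down to 10 rows per table so it finishes instantly. SQLite is used only as a convenient stand-in, and the table names `t1` to `t4` are made up for illustration. Real optimizers do not materialize the full product when join predicates are present; the point is only how quickly the unrestricted product grows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
ROWS_PER_TABLE = 10  # scaled down from the 1,000 rows in the thought experiment

for name in ("t1", "t2", "t3", "t4"):  # hypothetical table names
    conn.execute(f"CREATE TABLE {name} (id INTEGER)")
    conn.executemany(
        f"INSERT INTO {name} VALUES (?)",
        [(i,) for i in range(ROWS_PER_TABLE)],
    )

# No join predicate at all: every row is paired with every other row.
cartesian_count = conn.execute(
    "SELECT COUNT(*) FROM t1, t2, t3, t4"
).fetchone()[0]

# With equality predicates, only rows whose ids match survive.
joined_count = conn.execute(
    "SELECT COUNT(*) FROM t1, t2, t3, t4 "
    "WHERE t1.id = t2.id AND t2.id = t3.id AND t3.id = t4.id"
).fetchone()[0]

# The same arithmetic at the size from the thought experiment.
full_scale = 1000 ** 4

print(cartesian_count, joined_count, full_scale)
```

At 10 rows per table the product already has 10,000 rows for 10 useful ones; at 1,000 rows per table it would be 10^12.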
There are rare occasions when the optimiser suffers from a bug when using the explicit JOIN syntax as opposed to the implicit one. For example, I once could not benefit from the join elimination optimisation in Oracle 12c when using explicit joins, whereas the join was properly eliminated with the implicit join syntax. When working with views querying views querying views, lack of join elimination can indeed cause performance issues. I've explained the concept of join elimination in a blog post, here.
That was a bug (and a rare one at that, these days), and not a good reason to avoid the explicit join syntax in general. I think in current versions of Oracle, there's no reason in favour of one or the other syntax other than personal taste, when join trees are simple. With complex join trees, the explicit syntax tends to be superior, as it is more clear, and some relationships (e.g. full outer joins or joins with complex join predicates) are not possible otherwise. But neither of these arguments is about performance.
My DBA tells me that I should always use OPTION (FORCE ORDER) in my SQL statements when accessing a particular set of views. I understand this is to prevent the server vetoing the order of his joins.
Fair enough - it's worth while keeping the DBA happy and I am happy to comply.
However, I would like to write a couple of views in my own schema, but this isn't supported apparently.
How then, can I achieve the same when writing my views, ie having OPTION (FORCE ORDER) being enforced?
Thanks
Fred
Blindly appending OPTION (FORCE ORDER) onto all queries that reference a particular view is extremely poor blanket advice.
OPTION (FORCE ORDER) is a query hint and these are not valid inside a view - you would need to put it on the outer level on all queries referencing your own views.
It is valid to use Join hints inside views though and
If a join hint is specified for any two tables, the query optimizer
automatically enforces the join order for all joined tables in the
query, based on the position of the ON keywords.
So
SELECT v1.Foo,
v2.Bar
FROM v1
INNER HASH JOIN v2
ON v1.x = v2.x;
Would enforce the join order inside v1 and v2 (as well as enforcing the join ordering and algorithm between them).
But I would not recommend this. These types of hints should only be used in an extremely targeted manner, as a last resort after not being able to get a satisfactory plan any other way. Not as a matter of policy without even testing alternatives.
I am converting lots of Access queries to SQL Server stored procedures, so the SQL needs to meet the T-SQL standard (for example IIF, etc.).
Is there a tool that can convert big Access queries to T-SQL? What is the best way of doing this?
As far as a "tool" that will just convert the queries for you, I'm not aware of one. Neither is anyone on this thread or this site.
There are a couple places I can direct you, though, that can possibly help with the transition.
Here is a cheat sheet you can use as a quick glance when converting your queries.
If your queries use any [Forms]! references, there could also be an issue with that. (I've never tried it, but I am going to assume it doesn't work.)
This resource has probably the most detailed explanations on things you might need to learn in SQL Server. From stored queries, to handling NULLs to some of the other differences. There are also differences in MS Access SQL compared to T-SQL. Gordon Linoff briefly describes 10 important differences in his blog.
Access does not support the case statement, so conditional logic is
done with the non-standard IIf() or Switch() functions.
Access requires parentheses around each pair-wise join, resulting in
a proliferation of nesting in from clauses that only serves to
confuse people learning SQL.
Access join syntax requires the INNER for INNER JOIN. While it may
be a good idea to use inner for clarity, it is often omitted in
practice (in other databases).
Access does not support full outer join.
Access does not allow union or union all in subqueries.
Access requires the AS for table aliases. In most databases, this
is optional, and I prefer to only use as for column aliases.
Ironically, the use of as for table aliases is forbidden in Oracle.
Access uses double quotes to delimit strings (as opposed to single
quotes) and is the only database (to my knowledge) that uses & as a
string concatenation operator.
Access uses * for the wildcard in like rather than %.
Access allows BETWEEN with the bounds reversed (BETWEEN high AND low). This is allowed in other databases, but
will always evaluate to false.
Access does not support window/analytic functions (using the over
and partition by clauses).
In sum, no, there is no tool that I have seen.
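This is not a conversion tool, but to make one item from the list concrete, here is a minimal sketch of a single rewrite rule: the LIKE wildcards. The helper name `access_like_to_tsql` is made up for illustration. It only handles `*`, `?` and `#`, and it ignores Access bracket expressions such as `[a-c]`, so treat it as a starting point rather than something to run over 7,000 queries unattended.

```python
def access_like_to_tsql(pattern: str) -> str:
    """Translate simple Access LIKE wildcards to their T-SQL counterparts.

    Hypothetical helper, not part of any library. Assumptions:
      * '*' (any run of characters)  -> '%'
      * '?' (any single character)   -> '_'
      * '#' (any single digit)       -> '[0-9]'
      * literal '%' and '_' are wrapped in brackets so that T-SQL does
        not treat them as wildcards
    Access bracket expressions like [a-c] are NOT handled.
    """
    mapping = {
        "*": "%",
        "?": "_",
        "#": "[0-9]",
        "%": "[%]",
        "_": "[_]",
    }
    return "".join(mapping.get(ch, ch) for ch in pattern)


print(access_like_to_tsql("Sm*th?"))   # Sm%th_
print(access_like_to_tsql("100%"))     # 100[%]
print(access_like_to_tsql("A##"))      # A[0-9][0-9]
```

The other differences in the list (IIf versus CASE, nested join parentheses, string delimiters) need a real SQL parser to rewrite safely, which is probably why nobody has seen a tool that does it well.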
So we are migrating from Informix to SQL Server. And I have noticed that in Informix the queries are written in this manner:
select [col1],[col2],[col3],[col4],[col5]
from tableA, tableB
where tableA.[col1] = tableB.[gustavs_custom_chrome_id]
Whereas all the queries I write in SQL Server are written as:
select [col1],[col2],[col3],[col4],[col5]
from tableA
inner join tableB on tableA.[col1] = tableB.[gustavs_custom_chrome_id]
Now, my first thought was: that first query is bad. It probably creates this huge record set then whittles to the actual record set using the Where clause. Therefore, it's bad for performance. And it's non-ansi. So it's double bad.
However, after some googling, it seems that they both are, in theory, pretty much the same. And they both are ANSI compliant.
So my questions are:
Do both queries perform the same? That is, do they run just as fast and always give the same answer?
Are both really ANSI-compliant?
Are there any outstanding reasons why I should push for one style over another? Or should I just leave good enough alone?
Note: These are just examples of the queries. I've seen some queries (of the first kind) join up to 5 tables at a time.
Well, "better" is subjective. There is some style here. But I'll address your questions directly.
Both perform the same
Both are ANSI-compliant.
The problem with the first example is that
it is very easy to inadvertently derive the cross product (since it is easier to leave out join criteria)
it also becomes difficult to debug the join criteria as you add more and more tables to the join
since the old-style outer join (*=) syntax has been deprecated (it has long been documented to return incorrect results), when you need to introduce outer joins, you need to mix new style and old style joins ... why promote inconsistency?
while it's not exactly the authority on best practices, Microsoft recommends explicit INNER/OUTER JOIN syntax
with the latter method:
you are using consistent join syntax regardless of inner / outer
it is tougher (not impossible) to accidentally derive the cross product
isolating the join criteria from the filter criteria can make debugging easier
I wrote the post Kevin pointed to.
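The "accidental cross product" point is easy to reproduce. Below is a minimal sketch using SQLite as a stand-in (not Informix or SQL Server) and made-up `customers`, `orders` and `items` tables. Dropping one predicate from the old-style WHERE clause still parses and runs; it just quietly returns more rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data, only for illustration.
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER);
    CREATE TABLE items (id INTEGER, order_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
    INSERT INTO items VALUES (100, 10), (101, 10), (102, 11), (103, 12);
""")


def count(sql):
    """Return the single COUNT(*) value produced by the query."""
    return conn.execute(sql).fetchone()[0]


# All join criteria present: one row per item.
complete = count("""
    SELECT COUNT(*) FROM customers c, orders o, items i
    WHERE c.id = o.customer_id AND o.id = i.order_id
""")

# The orders-to-items predicate was left out. No error is raised;
# every customer/order pair is now combined with every item.
missing_predicate = count("""
    SELECT COUNT(*) FROM customers c, orders o, items i
    WHERE c.id = o.customer_id
""")

print(complete, missing_predicate)
```

With four items the mistake triples the row count; with production-sized tables it is usually noticed as a query that never finishes.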
I'm looking for a high-level, algorithmic understanding so that I can get a Big-O sense of what SQL-Server is doing to perform joins. Feel free to be concise, I'm not looking for the extremely nitty gritty. The thing that prompted me to understand how joins are implemented better is the scenario behind this question that I also just posted. I felt like they were ultimately two separate questions though, which is why I didn't combine them.
Thanks!
EDIT (2020-03-11): The links in this 9+ year old answer are all invalid today. I would delete the answer, but SO won't let me since it was accepted back when it was actually useful.
Original Answer:
Here's some reading to get you started.
Nested Loop Join ( Wayback )
Merge Join ( Wayback )
Hash Join ( Wayback )
Summary of Join Properties ( Wayback )
Honestly if you are interested at that level of detail I would suggest you read
Microsoft SQL Server 2008 Internals.
And learn to read execution plans. SQL Server has a pretty good optimization engine. It doesn't always do things the way we humans would expect, though, or even the same way for two queries that appear to us to be similar.
SQL Server can choose from a variety of different joins: the most common ones are merge, loop and hash. See this KB article.
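For a Big-O feel of those three strategies, here is a textbook-style sketch in plain Python. It is an assumption-laden illustration of the general algorithms, not SQL Server's actual implementation, which adds indexes, spilling to disk, batching and much more. Rows are plain tuples and `key_l` / `key_r` are the positions of the join columns.

```python
from collections import defaultdict


def nested_loop_join(left, right, key_l, key_r):
    """Compare every pair of rows: O(len(left) * len(right))."""
    return [(l, r) for l in left for r in right if l[key_l] == r[key_r]]


def hash_join(left, right, key_l, key_r):
    """Build a hash table on one input, probe it with the other.

    Roughly O(len(left) + len(right)) plus the size of the output.
    """
    buckets = defaultdict(list)
    for r in right:                      # build phase
        buckets[r[key_r]].append(r)
    return [(l, r) for l in left for r in buckets.get(l[key_l], [])]


def merge_join(left, right, key_l, key_r):
    """Sort both inputs, then walk them with two cursors.

    Sorting costs O(n log n + m log m); the merge itself is linear
    plus the size of the output. If both inputs are already sorted
    (for example by an index), only the merge cost remains.
    """
    left = sorted(left, key=lambda row: row[key_l])
    right = sorted(right, key=lambda row: row[key_r])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key_l], right[j][key_r]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows sharing this key...
            j_end = j
            while j_end < len(right) and right[j_end][key_r] == lk:
                j_end += 1
            # ...and pair it with every left row that has the same key.
            while i < len(left) and left[i][key_l] == lk:
                out.extend((left[i], r) for r in right[j:j_end])
                i += 1
            j = j_end
    return out


customers = [(1, "Ann"), (2, "Bob"), (3, "Cy")]   # (id, name)
orders = [(10, 1), (11, 1), (12, 2), (13, 4)]     # (order_id, customer_id)

nl = nested_loop_join(customers, orders, 0, 1)
hj = hash_join(customers, orders, 0, 1)
mj = merge_join(customers, orders, 0, 1)
print(sorted(nl) == sorted(hj) == sorted(mj))
```

All three return the same pairs; what differs is how the cost grows with input size, which is what the optimizer weighs when it picks one.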
I have my business logic in ~7000 lines of T-SQL stored procedures, and most of them use the following JOIN syntax:
SELECT A.A, B.B, C.C
FROM aaa AS A, bbb AS B, ccc AS C
WHERE
A.B = B.ID
AND B.C = C.ID
AND C.ID = @param
Will I get a performance gain if I replace such a query with this:
SELECT A.A, B.B, C.C
FROM aaa AS A
JOIN bbb AS B
ON A.B = B.ID
JOIN ccc AS C
ON B.C = C.ID
AND C.ID = @param
Or they are the same?
The two queries are the same, except the second is ANSI-92 SQL syntax and the first is the older SQL syntax which didn't incorporate the join clause. They should produce exactly the same internal query plan, although you may like to check.
You should use the ANSI-92 syntax for several reasons:
The use of the JOIN clause separates the relationship logic from the filter logic (the WHERE) and is thus cleaner and easier to understand.
It doesn't matter with this particular query, but there are a few circumstances where the older outer join syntax (using + ) is ambiguous and the query results are hence implementation dependent - or the query cannot be resolved at all. These do not occur with ANSI-92
It's good practice as most developers and dba's will use ANSI-92 nowadays and you should follow the standard. Certainly all modern query tools will generate ANSI-92.
As pointed out by @gbn, it does tend to avoid accidental cross joins.
I resisted ANSI-92 myself for some time, as there is a slight conceptual advantage to the old syntax: it's easier to envisage the SQL as a mass Cartesian join of all tables used followed by a filtering operation - a mental technique that can be useful for grasping what a SQL query is doing. However, I decided a few years ago that I needed to move with the times, and after a relatively short adjustment period I now strongly prefer it - predominantly because of the first reason given above. The only place that one should depart from the ANSI-92 syntax, or rather not use the option, is with natural joins, which are implicitly dangerous.
The second construct is known as the "infixed join syntax" in the SQL community. The first construct AFAIK doesn't have a widely accepted name, so let's call it the 'old style' inner join syntax.
The usual arguments go like this:
Pros of the 'Traditional' syntax: the predicates are physically grouped together in the WHERE clause in whatever order, which makes the query generally, and n-ary relationships particularly, easier to read and understand (the ON clauses of the infixed syntax can spread out the predicates so you have to look for the appearance of one table or column over a visual distance).
Cons of the 'Traditional' syntax: There is no parse error when omitting one of the 'join' predicates and the result is a Cartesian product (known as a CROSS JOIN in the infixed syntax) and such an error can be tricky to detect and debug. Also, 'join' predicates and 'filtering' predicates are physically grouped together in the WHERE clause, which can cause them to be confused for one another.
The two queries are equal - the first is using non-ANSI JOIN syntax, the 2nd is ANSI JOIN syntax. I recommend sticking with the ANSI JOIN syntax.
And yes, LEFT OUTER JOINs (which, btw are also ANSI JOIN syntax) are what you want to use when there's a possibility that the table you're joining to might not contain any matching records.
Reference: Conditional Joins in SQL Server
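A minimal sketch of that last point, using SQLite as a stand-in for SQL Server and made-up `customers` and `orders` tables: the inner join drops the customer with no orders, while the LEFT OUTER JOIN keeps that customer and fills the order column with NULL (`None` in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: Bob has no matching order.
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1);
""")

inner_rows = conn.execute("""
    SELECT c.name, o.id
    FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id
""").fetchall()

left_rows = conn.execute("""
    SELECT c.name, o.id
    FROM customers c
    LEFT OUTER JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id
""").fetchall()

print(inner_rows)  # only customers that have an order
print(left_rows)   # every customer; missing orders show up as None
```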
OK, they execute the same. That's agreed.
Unlike many, I use the older convention. That SQL-92 is "easier to understand" is debatable. Having written programming languages for pushing 40 years (gulp), I know that 'easy to read' begins first, before any other convention, with 'visual acuity' (a misapplied term here, but it's the best phrase I can use).
When reading SQL, the FIRST thing your mind cares about is what tables are involved and then which table (most) defines the grain. Then you care about relevant constraints on the data, then the attributes selected. While SQL-92 mostly separates these ideas out, there are so many noise words that the mind's eye has to interpret and deal with them, and it makes reading the SQL slower.
SELECT Mgt.attrib_a AS attrib_a
,Sta.attrib_b AS attrib_b
,Stb.attrib_c AS attrib_c
FROM Main_Grain_Table Mgt
,Surrounding_TabA Sta
,Surrounding_tabB Stb
WHERE Mgt.sta_join_col = Sta.sta_join_col
AND Mgt.stb_join_col = Stb.stb_join_col
AND Mgt.bus_logic_col = 'TIGHT'
Visual Acuity!
Put the commas for new attributes in front. It makes commenting code easier too.
Use a specific case for functions and keywords
Use a specific case for tables
Use a specific case for attributes
Vertically Line up operators and operations
Make the first table(s) in the FROM represent the grain of the data
Make the first tables of the WHERE be join constraints and let the specific, tight constraints float to the bottom.
Select 3 character alias for ALL tables in your database and use the alias EVERYWHERE you reference the table. You should use that alias as a prefix for (many) indexes on that table as well.
Six of one, half a dozen of the other, right? Maybe. But even if you're using the ANSI-92 convention (as I have, and in some cases will continue to do), use visual acuity principles and vertical alignment to let your mind's eye jump to the places you want to see and easily skip things (particularly noise words) you don't need to.
Execute both and check their query plans. They should be equal.
In my mind the FROM clause is where I decide what columns I need in the rows for my SELECT clause to work on. It is where a business rule is expressed that will bring onto the same row, values needed in calculations. The business rule can be customers who have invoices, resulting in rows of invoices including the customer responsible. It could also be venues in the same postcode as clients, resulting in a list of venues and clients that are close together.
It is where I work out the centricity of the rows in my result set. After all, we are simply shown the metaphor of a list in RDBMSs, each list having a topic (the entity) and each row being an instance of the entity. If the row centricity is understood, the entity of the result set is understood.
The WHERE clause, which conceptually executes after the rows are defined in the from clause, culls rows not required (or includes rows that are required) for the SELECT clause to work on.
Because join logic can be expressed in both the FROM clause and the WHERE clause, and because the clauses exist to divide and conquer complex logic, I choose to put join logic that involves values in columns in the FROM clause because that is essentially expressing a business rule that is supported by matching values in columns.
i.e. I won't write a WHERE clause like this:
WHERE Column1 = Column2
I will put that in the FROM clause like this:
ON Column1 = Column2
Likewise, if a column is to be compared to external values (values that may or may not be in a column) such as comparing a postcode to a specific postcode, I will put that in the WHERE clause because I am essentially saying I only want rows like this.
i.e. I won't write a FROM clause like this:
ON PostCode = '1234'
I will put that in the WHERE clause like this:
WHERE PostCode = '1234'
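For inner joins this placement is a matter of style, which the sketch below checks: the same filter in ON or in WHERE returns the same rows. The second half shows a case where the choice is not just style. With a LEFT JOIN, the two placements return different results. SQLite is used as a stand-in and the `venues` / `clients` tables are made up to match the postcode example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables matching the venues/clients postcode example.
conn.executescript("""
    CREATE TABLE venues (name TEXT, postcode TEXT);
    CREATE TABLE clients (name TEXT, postcode TEXT);
    INSERT INTO venues VALUES ('Hall', '1234'), ('Barn', '5678');
    INSERT INTO clients VALUES ('Ann', '1234'), ('Bob', '5678');
""")


def run(sql):
    """Run the query and return its rows sorted by venue name."""
    return sorted(conn.execute(sql).fetchall(), key=lambda row: row[0])


# Inner join: filter in WHERE versus filter in ON.
inner_where = run("""
    SELECT v.name, c.name FROM venues v
    INNER JOIN clients c ON v.postcode = c.postcode
    WHERE v.postcode = '1234'
""")
inner_on = run("""
    SELECT v.name, c.name FROM venues v
    INNER JOIN clients c ON v.postcode = c.postcode AND v.postcode = '1234'
""")

# Left join: the same two placements no longer agree.
left_where = run("""
    SELECT v.name, c.name FROM venues v
    LEFT JOIN clients c ON v.postcode = c.postcode
    WHERE v.postcode = '1234'
""")
left_on = run("""
    SELECT v.name, c.name FROM venues v
    LEFT JOIN clients c ON v.postcode = c.postcode AND v.postcode = '1234'
""")

print(inner_where == inner_on)  # True: placement is only style here
print(left_where)               # the WHERE filter removes the 'Barn' row
print(left_on)                  # 'Barn' is kept, with no client attached
```

So the "join logic in ON, filters in WHERE" habit also keeps the meaning stable if an inner join is later changed to an outer one.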
ANSI syntax enforces neither predicate placement in the proper clause (be that ON or WHERE), nor the affinity of the ON clause to the adjacent table reference. A developer is free to write a mess like this
SELECT
C.FullName,
C.CustomerCode,
O.OrderDate,
O.OrderTotal,
OD.ExtendedShippingNotes
FROM
Customer C
CROSS JOIN Order O
INNER JOIN OrderDetail OD
ON C.CustomerID = O.CustomerID
AND C.CustomerStatus = 'Preferred'
AND O.OrderTotal > 1000.0
WHERE
O.OrderID = OD.OrderID;
Speaking of query tools that "will generate ANSI-92", I'm commenting here because one generated
SELECT 1
FROM DEPARTMENTS C
JOIN EMPLOYEES A
JOIN JOBS B
ON C.DEPARTMENT_ID = A.DEPARTMENT_ID
ON A.JOB_ID = B.JOB_ID
The only syntax that escapes the conventional "restrict-project-Cartesian product" model is the outer join. This operation is more complicated because it is not associative (both with itself and with the normal join). One has to judiciously parenthesize a query with outer joins, at least. However, it is an exotic operation; if you are using it too often, I suggest taking a relational database class.