SQL FROM clause using n>1 tables

If you add more than one table to the FROM clause (in a query), how does this impact the result set? Does it first select from the first table then from the second and then create a union (i.e., only the rowspace is impacted?) or does it actually do something like a join (i.e., extend the column space)? And when you use multiple tables in the FROM clause, does the WHERE clause filter both sub-result-sets?

Specifying two tables in your FROM clause will execute a JOIN, so the column space is extended (it is not a UNION of the rows). You then use the WHERE clause to specify your JOIN conditions; it filters the combined rows, not each table's rows separately. If you fail to do this, you will end up with a Cartesian product (every row in the first table indiscriminately joined to every row in the second).
The code will look something like this:
SELECT a.*, b.*
FROM table1 a, table2 b
WHERE a.id = b.id
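To see what the WHERE clause does with a comma-separated FROM list, here is a minimal sketch that runs the query in SQLite through Python's built-in sqlite3 module. The tables and their contents are hypothetical. Without a condition you get every pairing of rows (3 × 2 = 6); with `WHERE a.id = b.id` only the matching pairs survive, and each result row carries the columns of both tables.

```python
import sqlite3

def demo_comma_join():
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Hypothetical tables: 3 rows in table1, 2 rows in table2
    cur.execute("CREATE TABLE table1 (id INTEGER, name TEXT)")
    cur.execute("CREATE TABLE table2 (id INTEGER, city TEXT)")
    cur.executemany("INSERT INTO table1 VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])
    cur.executemany("INSERT INTO table2 VALUES (?, ?)", [(1, "x"), (2, "y")])

    # No join condition: Cartesian product of the two tables
    cartesian = cur.execute("SELECT a.*, b.* FROM table1 a, table2 b").fetchall()

    # The WHERE clause filters the combined (a, b) rows down to the matching pairs
    joined = cur.execute(
        "SELECT a.*, b.* FROM table1 a, table2 b WHERE a.id = b.id ORDER BY a.id"
    ).fetchall()
    conn.close()
    return cartesian, joined

cartesian, joined = demo_comma_join()
print(len(cartesian))  # 6
print(joined)          # [(1, 'a', 1, 'x'), (2, 'b', 2, 'y')]
```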
However, I always try to specify my JOINs explicitly (with the JOIN and ON keywords). That makes it abundantly clear to the next developer what you're trying to do. Here's the same JOIN, specified explicitly:
SELECT a.*, b.*
FROM table1 a
INNER JOIN table2 b ON b.id = a.id
Note that now I don't need a WHERE clause. This method also helps you avoid an inadvertent Cartesian product (if you happen to forget your WHERE clause), because the join condition is written explicitly in the ON clause.
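As a quick check that the two spellings are interchangeable, this sketch runs both against the same hypothetical tables in SQLite (via Python's sqlite3 module) and compares the results.

```python
import sqlite3

def run(sql):
    conn = sqlite3.connect(":memory:")
    # Hypothetical sample data
    conn.executescript("""
        CREATE TABLE table1 (id INTEGER, name TEXT);
        CREATE TABLE table2 (id INTEGER, city TEXT);
        INSERT INTO table1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
        INSERT INTO table2 VALUES (1, 'x'), (2, 'y');
    """)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows

# Legacy comma syntax with the join condition in WHERE
implicit = run("SELECT a.*, b.* FROM table1 a, table2 b WHERE a.id = b.id ORDER BY a.id")
# Explicit INNER JOIN with the join condition in ON
explicit = run("SELECT a.*, b.* FROM table1 a INNER JOIN table2 b ON b.id = a.id ORDER BY a.id")
print(implicit == explicit)  # True
```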

Related

Join with Or Condition

Is there a more efficient way to write this? I'm not sure this is the best way to implement this.
select *
from stat.UniqueTeams uTeam
Left Join stat.Matches match
on match.AwayTeam = uTeam.Id or match.HomeTeam = uTeam.id
Using OR in a JOIN condition is bad practice, because MSSQL often can't use indexes efficiently for such a condition.
A better way is to use two SELECTs combined with UNION:
SELECT *
FROM stat.UniqueTeams uTeam
LEFT JOIN stat.Matches match
ON match.AwayTeam = uTeam.Id
UNION
SELECT *
FROM stat.UniqueTeams uTeam
LEFT JOIN stat.Matches match
ON match.HomeTeam = uTeam.id
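Before switching, it may be worth checking that the two forms return the same rows on your data. This sketch runs both in SQLite via Python's sqlite3 module, using made-up data; the `stat.` schema prefix is dropped and the alias `m` replaces `match`, since MATCH is an SQLite keyword. With LEFT JOINs, the UNION version also emits a NULL-filled row for any team that appears in only one role (home or away), so here it returns 5 rows where the OR version returns 3.

```python
import sqlite3

SETUP = """
CREATE TABLE UniqueTeams (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Matches (MatchId INTEGER PRIMARY KEY, HomeTeam INTEGER, AwayTeam INTEGER);
INSERT INTO UniqueTeams VALUES (1, 'Reds'), (2, 'Blues'), (3, 'Greens');
-- one match: team 1 at home against team 2; team 3 has no matches
INSERT INTO Matches VALUES (10, 1, 2);
"""

OR_JOIN = """
SELECT * FROM UniqueTeams uTeam
LEFT JOIN Matches m ON m.AwayTeam = uTeam.Id OR m.HomeTeam = uTeam.Id
"""

UNION_OF_LEFT_JOINS = """
SELECT * FROM UniqueTeams uTeam LEFT JOIN Matches m ON m.AwayTeam = uTeam.Id
UNION
SELECT * FROM UniqueTeams uTeam LEFT JOIN Matches m ON m.HomeTeam = uTeam.Id
"""

def count_rows(sql):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    n = len(conn.execute(sql).fetchall())
    conn.close()
    return n

or_count = count_rows(OR_JOIN)
union_count = count_rows(UNION_OF_LEFT_JOINS)
print(or_count)     # 3
print(union_count)  # 5: teams 1 and 2 each get an extra all-NULL match row
```

If those extra rows matter, one option is to use INNER JOIN in both halves and add the teams without any matches separately; whether that is faster than the OR version depends on the engine and the indexes available.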
Things to note when using LEFT JOIN in a query:
1) First of all, a LEFT JOIN can introduce NULLs, which can be a performance issue because the server engine treats NULLs separately.
2) The table on the nullable side of the join should not be bulky; otherwise the query will be costly to execute (in both performance and resources).
3) Try to join on columns that are already indexed. If you need to join on columns that aren't indexed, it's better to build indexes for them first.
In your case, two columns from the same table have to be left joined to another table. A good approach here is to build a single derived table that exposes the required key in one column, as shown below:
;WITH Match AS
(
    -- Select all required columns and alias the key column as TeamId
    SELECT match1.*, match1.AwayTeam AS TeamId FROM stat.Matches match1
    UNION
    SELECT match2.*, match2.HomeTeam AS TeamId FROM stat.Matches match2
)
SELECT *
FROM stat.UniqueTeams uTeam
OUTER APPLY (
    -- correlated to the current team, so teams without matches still appear (with NULLs)
    SELECT * FROM Match WHERE Match.TeamId = uTeam.Id
) teamMatch
I have used OUTER APPLY, which is similar to LEFT OUTER JOIN but is evaluated differently: the right-hand side works like a table-valued function evaluated for each row of uTeam, which may perform better in your case.
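SQLite has no OUTER APPLY, so this sketch (Python's sqlite3 module, made-up data) substitutes a LEFT JOIN against the same kind of CTE, which for this simple per-team filter should return the same rows. The CTE is renamed TeamMatch because MATCH is an SQLite keyword. It checks that the "one key column" derived table gives the same team/match pairs as the original OR join.

```python
import sqlite3

SETUP = """
CREATE TABLE UniqueTeams (Id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Matches (MatchId INTEGER PRIMARY KEY, HomeTeam INTEGER, AwayTeam INTEGER);
INSERT INTO UniqueTeams VALUES (1, 'Reds'), (2, 'Blues'), (3, 'Greens');
INSERT INTO Matches VALUES (10, 1, 2);
"""

CTE_SQL = """
WITH TeamMatch AS (
    -- each match appears once per participating team, keyed by TeamId
    SELECT m1.*, m1.AwayTeam AS TeamId FROM Matches m1
    UNION
    SELECT m2.*, m2.HomeTeam AS TeamId FROM Matches m2
)
SELECT uTeam.Id, tm.MatchId
FROM UniqueTeams uTeam
LEFT JOIN TeamMatch tm ON tm.TeamId = uTeam.Id
ORDER BY uTeam.Id
"""

OR_SQL = """
SELECT uTeam.Id, m.MatchId
FROM UniqueTeams uTeam
LEFT JOIN Matches m ON m.AwayTeam = uTeam.Id OR m.HomeTeam = uTeam.Id
ORDER BY uTeam.Id
"""

def run(sql):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows

cte_rows = run(CTE_SQL)
or_rows = run(OR_SQL)
print(cte_rows)             # [(1, 10), (2, 10), (3, None)]
print(cte_rows == or_rows)  # True
```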
My answer is not exactly to the point, but I found this question while searching for an "or" condition in an inner join, so it may be useful for the next person searching.
For an inner join, we can use the legacy comma syntax:
select *
from stat.UniqueTeams uTeam, stat.Matches match
where match.AwayTeam = uTeam.Id or match.HomeTeam = uTeam.id
Note: this query can perform badly (conceptually a cross join first, then a filter), but it works with many conditions and is suitable for researching dirty data (for example, t1.id = t2.id or t1.name = t2.name).

Which approach is best to get data from multiple tables?

I have 2 tables, and suppose they contain a large amount of data:
Table1
Id  Name  Address
1   ABC   123ABC
.   ...   ......
.   ...   ......
Table2
Id  Table1Id  Parents_Name  Address
1   1         DDD           Xyz
.   .         ...           ...
.   .         ...           ...
Now, if I want the result in the format below:
Id  Name  Parents_Name
1   ABC   DDD
.   ...   ...
.   ...   ...
which one will be best: a subquery or a join?
Joins and subqueries can both be used to query data from different tables, and may even share the same query plan, but there are many differences between them. Knowing the differences, and when to use a join or a subquery to search data from one or more tables, is key to mastering SQL.
Joins and subqueries are both used to combine data from different tables into a single result. They have many similarities as well as differences.
Subqueries can return either a scalar (single) value or a row set, whereas joins return rows.
A common use for a subquery is to calculate a summary value for use in a query. For instance, we can use a subquery to obtain all products that have a greater-than-average list price.
SELECT ProductID,
Name,
ListPrice,
(SELECT AVG(ListPrice)
FROM Production.Product) AS AvgListPrice
FROM Production.Product
WHERE ListPrice > (SELECT AVG(ListPrice)
FROM Production.Product)
There are two subqueries in this SELECT statement. The first displays the average list price of all products; the second filters out products whose list price is less than or equal to the average.
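Here is the same query shape run in SQLite via Python's sqlite3 module, as a small sketch. The Product table is a hypothetical stand-in for Production.Product (SQLite has no schema prefix of that kind), with three made-up prices averaging 5.0.

```python
import sqlite3

def above_average():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, Name TEXT, ListPrice REAL);
        INSERT INTO Product VALUES (1, 'Bolt', 1.0), (2, 'Chain', 5.0), (3, 'Frame', 9.0);
    """)
    rows = conn.execute("""
        SELECT ProductID, Name, ListPrice,
               (SELECT AVG(ListPrice) FROM Product) AS AvgListPrice  -- scalar subquery as a column
        FROM Product
        WHERE ListPrice > (SELECT AVG(ListPrice) FROM Product)       -- scalar subquery as a filter
        ORDER BY ProductID
    """).fetchall()
    conn.close()
    return rows

result = above_average()
print(result)  # [(3, 'Frame', 9.0, 5.0)]
```

'Chain' is excluded because its price equals the average rather than exceeding it.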
Here the subquery returns a single value, which is then used to filter out products.
Notice how the subqueries are queries in their own right. In this example you could paste a subquery, without the parentheses, into a query window and run it.
Contrast this with a join, whose main purpose is to combine rows from one or more tables based on a match condition. For example, we can use a join to display product names and models.
Select Product.Name,
ProductModel.Name as ModelName
FROM Production.Product
INNER JOIN Production.ProductModel
ON Product.ProductModelID = ProductModel.ProductModelID
In this statement we’re using an INNER JOIN to match rows from both the Product and ProductModel tables. Notice that the column ProductModel.Name is available for use throughout the query.
The combined row set is then available to the SELECT statement to display, filter, or group by its columns.
This is different from the subquery, which returns a result that is used immediately.
Note that a join is an integral part of the SELECT statement. It cannot stand on its own, as a subquery can.
A subquery is used to run a separate query from within the main query. In many cases the returned value is displayed as a column or used in a filter condition such as a WHERE or HAVING clause. When a subquery incorporates a column from the main query, it is said to be correlated. In this way a subquery is somewhat like a join, in that values from two or more tables can be compared.
Joins are used in the FROM clause of the SELECT statement; however, you’ll find subqueries used in most clauses, such as the:
SELECT list – subqueries that return single values are used here.
WHERE clause – depending on the conditional operator, you’ll see single-value or row-based subqueries.
FROM clause – it is typical to see subqueries that return row-based results here.
HAVING clause – in my experience, scalar (single-value) subqueries are used here.
Though joins and subqueries have many differences, they can be used to solve similar problems. In fact, just because you write a SQL statement as a subquery doesn’t mean the DBMS executes it as such.
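To illustrate that both can solve the same problem, this sketch (SQLite via Python's sqlite3 module, hypothetical data, schema prefixes dropped) fetches each product's model name once with the INNER JOIN above and once with a correlated subquery in the SELECT list.

```python
import sqlite3

SETUP = """
CREATE TABLE ProductModel (ProductModelID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, Name TEXT, ProductModelID INTEGER);
INSERT INTO ProductModel VALUES (1, 'Road'), (2, 'Mountain');
INSERT INTO Product VALUES (1, 'Frame R', 1), (2, 'Frame M', 2), (3, 'Fork M', 2);
"""

JOIN_SQL = """
SELECT Product.Name, ProductModel.Name AS ModelName
FROM Product
INNER JOIN ProductModel ON Product.ProductModelID = ProductModel.ProductModelID
ORDER BY Product.ProductID
"""

SUBQUERY_SQL = """
SELECT p.Name,
       (SELECT pm.Name FROM ProductModel pm
        WHERE pm.ProductModelID = p.ProductModelID) AS ModelName  -- correlated: refers to p
FROM Product p
ORDER BY p.ProductID
"""

def run(sql):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows

join_rows = run(JOIN_SQL)
subquery_rows = run(SUBQUERY_SQL)
print(join_rows == subquery_rows)  # True
```

The two only agree here because every product has a model: a product with no matching model would get a NULL ModelName from the subquery but would be dropped entirely by the INNER JOIN.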
You can refer to this link for more details:
http://www.essentialsql.com/what-is-the-difference-between-a-join-and-subquery/
Assuming you want an INNER JOIN:
SELECT t1.Id
,t1.Name
,t2.Parents_Name
FROM table1 t1
INNER JOIN table2 t2
ON t2.Table1Id = t1.Id
But you could also use an OUTER JOIN to always show every row from table1, together with the matching rows from table2 (NULLs where there is no match).
SELECT t1.Id
,t1.Name
,t2.Parents_Name
FROM table1 t1
LEFT OUTER JOIN table2 t2
ON t2.Table1Id = t1.Id
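The difference between the two shows up as soon as a table1 row has no parent row. In this sketch (SQLite via Python's sqlite3 module), the second table1 row ('XYZ') is a hypothetical addition with no match in table2.

```python
import sqlite3

SETUP = """
CREATE TABLE table1 (Id INTEGER PRIMARY KEY, Name TEXT, Address TEXT);
CREATE TABLE table2 (Id INTEGER PRIMARY KEY, Table1Id INTEGER, Parents_Name TEXT, Address TEXT);
INSERT INTO table1 VALUES (1, 'ABC', '123ABC'), (2, 'XYZ', '456XYZ');
-- only table1 row 1 has a matching parent row
INSERT INTO table2 VALUES (1, 1, 'DDD', 'Xyz');
"""

def run(join_kind):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(f"""
        SELECT t1.Id, t1.Name, t2.Parents_Name
        FROM table1 t1
        {join_kind} table2 t2 ON t2.Table1Id = t1.Id
        ORDER BY t1.Id
    """).fetchall()
    conn.close()
    return rows

inner_rows = run("INNER JOIN")
left_rows = run("LEFT OUTER JOIN")
print(inner_rows)  # [(1, 'ABC', 'DDD')]
print(left_rows)   # [(1, 'ABC', 'DDD'), (2, 'XYZ', None)]
```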
You will want to use a join, which will almost certainly be much more efficient. A quick Google search will give you more information than you need about the different types of joins available and how to use them, such as https://blog.codinghorror.com/a-visual-explanation-of-sql-joins/
Please create an index on Table1.Id and on Table2.Table1Id, and use this query:
Select t1.Id,
t1.Name,
t2.Parents_Name
from Table1 t1
inner join
Table2 t2 on t1.id=t2.Table1Id
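A minimal sketch of that advice in SQLite (via Python's sqlite3 module): create one index per join column, run the join, and print the plan SQLite chooses. The index names and sample rows are hypothetical, and the plan text varies between SQLite versions, so it is printed rather than predicted. On SQL Server you would inspect the execution plan instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Id INTEGER, Name TEXT, Address TEXT);
    CREATE TABLE Table2 (Id INTEGER, Table1Id INTEGER, Parents_Name TEXT, Address TEXT);
    INSERT INTO Table1 VALUES (1, 'ABC', '123ABC');
    INSERT INTO Table2 VALUES (1, 1, 'DDD', 'Xyz');
    -- one index per join column (hypothetical names)
    CREATE INDEX ix_table1_id ON Table1 (Id);
    CREATE INDEX ix_table2_table1id ON Table2 (Table1Id);
""")

query = """
SELECT t1.Id, t1.Name, t2.Parents_Name
FROM Table1 t1
INNER JOIN Table2 t2 ON t1.Id = t2.Table1Id
"""
rows = conn.execute(query).fetchall()
# The last field of each EXPLAIN QUERY PLAN row is a human-readable description
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
indexes = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' ORDER BY name")]
conn.close()

print(rows)     # [(1, 'ABC', 'DDD')]
print(indexes)  # ['ix_table1_id', 'ix_table2_table1id']
print(plan)
```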
IMHO, a join should be preferred for query performance. A subquery would require the IN operator, which can be a pain with bulk data. However, it also depends on your indexing, so it's best to test both.

Shortcut for adding table to column name SQL-server 2014

Stupidly simple question, but I just don't know what to google!
If I create a query like this:
Select id, data
from table1
Now I want to join with table2. I can immediately see that the id column is no longer unique and I have to change it to
table1.id
Is there any smart way (like a keyboard-shortcut) to do this, instead of manually adding table1 to every column? Either before I add the Join to secure that all columns will be unique, or after with suggestions based on the different possible tables.
No, there is no helper.
But do note that you can alias the table name:
select x.Col1, y.Col2
from ALongTableName x
inner join AReallyReallyLongTableName y on x.Id = y.OtherId
which can also make queries clearer, and is very much necessary when doing self joins.
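For the self-join case, here is a small sketch in SQLite (via Python's sqlite3 module) with a hypothetical Employee table: the same table appears twice, and only the aliases e and m tell the two copies apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (Id INTEGER PRIMARY KEY, Name TEXT, ManagerId INTEGER);
    INSERT INTO Employee VALUES (1, 'Ann', NULL), (2, 'Bob', 1), (3, 'Cid', 1);
""")
rows = conn.execute("""
    SELECT e.Name AS Employee, m.Name AS Manager
    FROM Employee e
    INNER JOIN Employee m ON e.ManagerId = m.Id  -- same table twice, distinguished only by alias
    ORDER BY e.Id
""").fetchall()
conn.close()
print(rows)  # [('Bob', 'Ann'), ('Cid', 'Ann')]
```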
First of all, you should start using aliases:
SQL aliases are used to give a database table, or a column in a table, a temporary name.
Basically aliases are created to make column names more readable.
This will narrow down your problem and make your code maintenance easier. If that's not enough, I guess you could start using auto-completion tools, such as these:
SQL Complete
SQL Prompt
ApexSQL Complete
These have your desired functionality, however, they do not always work as expected (at least for me).
Oh! You can use a table alias, like this:
SELECT A.ID, A.data
FROM TableA A
INNER JOIN TableB B
ON A.ID = B.ID
You only need the A. or B. prefix when both tables have a column with that name. If only one table has the column, you don't need the prefix, like this:
SELECT A.ID, data -- assuming TableB has no column named data
FROM TableA A
INNER JOIN TableB B
ON A.ID = B.ID
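This sketch (SQLite via Python's sqlite3 module, hypothetical tables) shows the problem from the question and the fix: once both tables have an ID column, the bare name is rejected as ambiguous, while the unique column data can stay unprefixed.

```python
import sqlite3

def try_query(sql):
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE TableA (ID INTEGER, data TEXT);
        CREATE TABLE TableB (ID INTEGER, other TEXT);
        INSERT INTO TableA VALUES (1, 'a');
        INSERT INTO TableB VALUES (1, 'b');
    """)
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as exc:
        return str(exc)  # SQLite reports an "ambiguous column name" error here
    finally:
        conn.close()

unqualified = try_query("SELECT ID, data FROM TableA A INNER JOIN TableB B ON A.ID = B.ID")
qualified = try_query("SELECT A.ID, data FROM TableA A INNER JOIN TableB B ON A.ID = B.ID")
print(unqualified)
print(qualified)  # [(1, 'a')]
```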
Or:
Select A.*, B.ID
FROM TableA A
INNER JOIN TableB B
ON A.ID = B.ID

TSQL query to merge data from multiple tables that may or may not have matching rows?

For example, suppose we're conducting research where students can take up to 10 different tests, and each table in the database stores all the students' responses for one test. The tables are named after each test as: T1, T2, ... , T10. Suppose each table has a primary key column 'Username' that identifies each student. Students may or may not have completed each test, so there may or may not be a record in each table for each student.
What is the correct SQL Query to return all the test data from all tables, with one row per student (one row per username)? I want the simplest query possible that returns the correct results. I would also like to coalesce the Username fields into a single Username field in the final query.
To clarify, I understand that SQL has a major limitation in that it does not support a syntax to select all columns except one or more fields like "select *[^ExcludeColumn1][^ExcludeColumn2]". To avoid specifically naming all columns in the final query, it would be acceptable to leave all the Username columns there, as long as it includes a coalesced Username field at the beginning named something like RowID.
As for the overall query, one option would be to perform a union all on the username column of all ten tables, then select the distinct usernames across all tables, then perform a series of left joins against the list of distinct usernames on all 10 tables. That would result in a very straightforward query where each left join is performed on the same distinct set of usernames, but I want to avoid a separate up-front query for distinct usernames. (Although if that's the best option, let me know). It would look something like this:
select * from
(select distinct coalesce(t1.Username,t2.Username,...,t10.Username) as RowID from t1,t2,t3,t4,t5,t6,t7,t8,t9,t10) distinct_usernames
left join t1 on t1.Username = distinct_usernames.RowID
left join t2 on t2.Username = distinct_usernames.RowID
...
left join t10 on t10.Username = distinct_usernames.RowID
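Although that is short and easy to write, it is incredibly inefficient and would take hours to run on test tables with 5000+ rows each, because the comma-separated FROM list builds the Cartesian product of all ten tables. (It also misses students: since Username is a non-null primary key, COALESCE always picks t1.Username.) With an adjustment, a version that runs in a few seconds is: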
select * from (
select distinct Username as RowID from (
select Username from t1
union all
select Username from t2
union all
...
select Username from t10
) all_usernames) distinct_usernames
left join t1 on t1.Username = distinct_usernames.RowID
left join t2 on t2.Username = distinct_usernames.RowID
...
left join t10 on t10.Username = distinct_usernames.RowID
I think that what I have above might be the most efficient and correct query (takes only a couple seconds to run and returns correct result set), but I also thought perhaps it could be simplified with some kind of full join. The problem is that full joins get confusing with more than two tables, because without pre-determining the usernames, each subsequent table would have to match records against any of the preceding tables, resulting in a query where each additional table has "[previous table count] + 1" conditions on matching the username.
Assuming that Username is unique in each table, your second query would be the way I would try first, with the slight modifications of removing distinct and simply using union (which implies distinct) rather than union all:
select *
from (
select Username from t1
union
select Username from t2
union
-- ...
select Username from t10
) distinct_usernames
left join t1 on t1.Username = distinct_usernames.Username
left join t2 on t2.Username = distinct_usernames.Username
-- ...
left join t10 on t10.Username = distinct_usernames.Username
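Here is the same pattern cut down to three tables and run in SQLite via Python's sqlite3 module, as a sketch with made-up test data. Each student appears exactly once, with NULLs for tests they did not take.

```python
import sqlite3

SETUP = """
CREATE TABLE t1 (Username TEXT PRIMARY KEY, Score1 INTEGER);
CREATE TABLE t2 (Username TEXT PRIMARY KEY, Score2 INTEGER);
CREATE TABLE t3 (Username TEXT PRIMARY KEY, Score3 INTEGER);
INSERT INTO t1 VALUES ('amy', 90), ('bo', 70);
INSERT INTO t2 VALUES ('bo', 80), ('cy', 60);
INSERT INTO t3 VALUES ('dee', 50);
"""

MERGE_SQL = """
SELECT distinct_usernames.Username, t1.Score1, t2.Score2, t3.Score3
FROM (
    SELECT Username FROM t1
    UNION               -- UNION removes duplicates, so no DISTINCT is needed
    SELECT Username FROM t2
    UNION
    SELECT Username FROM t3
) distinct_usernames
LEFT JOIN t1 ON t1.Username = distinct_usernames.Username
LEFT JOIN t2 ON t2.Username = distinct_usernames.Username
LEFT JOIN t3 ON t3.Username = distinct_usernames.Username
ORDER BY distinct_usernames.Username
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
merged = conn.execute(MERGE_SQL).fetchall()
conn.close()
for row in merged:
    print(row)
# ('amy', 90, None, None)
# ('bo', 70, 80, None)
# ('cy', None, 60, None)
# ('dee', None, None, 50)
```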
From there I would make sure that Username is indexed, possibly even using it as the clustered index. I've also had optimization luck in the past by implementing your distinct_usernames as a temp table (possibly indexed, or an indexed view) at the beginning of the proc, but only testing would determine if that were worthwhile.
A full outer join would require a bunch of or conditions or coalesce arguments, though it could be worth a try on just a few tables to see if the performance is there. I can't try to out-guess what your query engine will like best.
Also, getting just the column names that you want could be done with a query to sys.columns or information_schema.columns and using dynamic SQL to build your query as a string and then executing that.
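As a rough sketch of that idea, the code below builds the query text from the catalog. It uses SQLite's PRAGMA table_info as a stand-in for sys.columns / INFORMATION_SCHEMA.COLUMNS and builds the string in Python rather than in T-SQL dynamic SQL; the helper name build_merge_query and the t<table>_<column> aliasing scheme are hypothetical choices. Username appears once (as RowID) and every other column once.

```python
import sqlite3

def build_merge_query(conn, tables, key="Username"):
    """Return a UNION + LEFT JOIN query listing the key once and every other column once."""
    select_list = [f"u.{key} AS RowID"]
    for t in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column
        for row in conn.execute(f"PRAGMA table_info({t})"):
            col = row[1]
            if col != key:
                # prefix with the table name so same-named columns never collide
                select_list.append(f"{t}.{col} AS {t}_{col}")
    union = " UNION ".join(f"SELECT {key} FROM {t}" for t in tables)
    joins = " ".join(f"LEFT JOIN {t} ON {t}.{key} = u.{key}" for t in tables)
    # table and column names are pasted into the SQL text: only use trusted names
    return f"SELECT {', '.join(select_list)} FROM ({union}) u {joins} ORDER BY RowID"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (Username TEXT PRIMARY KEY, Q1 INTEGER, Q2 INTEGER);
    CREATE TABLE t2 (Username TEXT PRIMARY KEY, Q1 INTEGER);
    INSERT INTO t1 VALUES ('amy', 3, 4);
    INSERT INTO t2 VALUES ('bo', 5);
""")
cur = conn.execute(build_merge_query(conn, ["t1", "t2"]))
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
conn.close()
print(columns)  # ['RowID', 't1_Q1', 't1_Q2', 't2_Q1']
print(rows)     # [('amy', 3, 4, None), ('bo', None, None, 5)]
```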

Iterating 1 row at a time with massive amounts of links/joins

Ok, basically what is needed is a way to have row numbers while using a lot of joins and having where clauses using these rownumbers.
such as something like
select ADDRESS.ADDRESS FROM ADDRESS
INNER JOIN WORKHISTORY ON WORKHISTORY.ADDRESSRID=ADDRESS.ADDRESSRID
INNER JOIN PERSON ON PERSON.PERSONRID=WORKHISTORY.PERSONRID
WHERE PERSONRID=<some number> AND WORKHISTORY.ROWNUMBER=1
ROWNUMBER needs to be generated for this query on that one table, though. So if we want to access the second WORKHISTORY record's address, we could just use WORKHISTORY.ROWNUMBER=2, and if, say, two addresses matched, we could cycle through the addresses for one WORKHISTORY record using ADDRESS.ROWNUMBER=1 and ADDRESS.ROWNUMBER=2.
This should be capable of being an automatically generated query. Thus, there could be more than 10 inner joins to get to the relevant table, and we need to be able to cycle through each table's records independently of the other tables.
I'm aware of the RANK and ROW_NUMBER functions, but I don't see how they would work for me because of all the inner joins.
Note: in this example query, ROWNUMBER should be generated automatically! It should never be stored in the actual table.
Can you use a temp table?
I ask because you can write the code like this:
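-- SELECT ... INTO creates #temp; the IDENTITY() function numbers the rows as they are inserted
-- (the order in which the numbers are assigned is not guaranteed)
select a.field1, b.field2, c.field3, identity(int, 1, 1) as TableRownumber
into #temp
from table1 a
join table2 b on a.table1id = b.table1id
join table3 c on b.table2id = c.table2id
select * from #temp where ...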
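An alternative to the temp table (a different technique from the answer's) is to number one table's rows in a derived table with ROW_NUMBER() OVER (PARTITION BY ...) and join that derived table as usual, so the number is generated per query and never stored. This sketch runs it in SQLite via Python's sqlite3 module (window functions need SQLite 3.25 or later). The WORKHISTORYRID ordering column and all data are hypothetical assumptions; you need some column that defines what "first" means.

```python
import sqlite3

SETUP = """
CREATE TABLE PERSON (PERSONRID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE ADDRESS (ADDRESSRID INTEGER PRIMARY KEY, ADDRESS TEXT);
-- WORKHISTORYRID is a hypothetical column used to order each person's history
CREATE TABLE WORKHISTORY (WORKHISTORYRID INTEGER PRIMARY KEY, PERSONRID INTEGER, ADDRESSRID INTEGER);
INSERT INTO PERSON VALUES (7, 'Pat');
INSERT INTO ADDRESS VALUES (100, '1 First St'), (200, '2 Second St');
INSERT INTO WORKHISTORY VALUES (1, 7, 100), (2, 7, 200);
"""

NTH_ADDRESS_SQL = """
SELECT a.ADDRESS
FROM (
    -- number each person's work-history rows on the fly; nothing is stored
    SELECT wh.*,
           ROW_NUMBER() OVER (PARTITION BY wh.PERSONRID ORDER BY wh.WORKHISTORYRID) AS ROWNUMBER
    FROM WORKHISTORY wh
) w
INNER JOIN ADDRESS a ON a.ADDRESSRID = w.ADDRESSRID
INNER JOIN PERSON p ON p.PERSONRID = w.PERSONRID
WHERE p.PERSONRID = ? AND w.ROWNUMBER = ?
"""

def nth_address(person_rid, n):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(NTH_ADDRESS_SQL, (person_rid, n)).fetchall()
    conn.close()
    return [r[0] for r in rows]

print(nth_address(7, 1))  # ['1 First St']
print(nth_address(7, 2))  # ['2 Second St']
print(nth_address(7, 3))  # []
```

The same derived-table wrapper can be applied to any other table in the chain (for example ADDRESS) to cycle through its rows independently.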
