which approach is best to get data from multiple tables - sql-server

I have 2 tables, each holding a large amount of data:
Table1
Id | Name | Address
1 | ABC | 123ABC
. | ... | ......
Table2
Id | Table1Id | Parents_Name | Address
1 | 1 | DDD | Xyz
. | . | ... | ...
Now I want the result in the format below:
Id | Name | Parents_Name
1 | ABC | DDD
. | ... | ...
Which is the better way to get it: a subquery or a join?

Joins and subqueries are both used to query data from different tables and may even share the same query plan, but there are many differences between them. Knowing the differences, and when to use a join or a subquery to search data from one or more tables, is key to mastering SQL.
Both combine data from different tables into a single result. They have similarities as well as differences.
Subqueries can return either a scalar (single) value or a row set, whereas joins always return rows.
A common use for a subquery is to calculate a summary value for use in a query. For instance, we can use a subquery to obtain all products whose list price is greater than the average list price.
SELECT ProductID,
Name,
ListPrice,
(SELECT AVG(ListPrice)
FROM Production.Product) AS AvgListPrice
FROM Production.Product
WHERE ListPrice > (SELECT AVG(ListPrice)
FROM Production.Product)
There are two subqueries in this SELECT statement. The first’s purpose is to display the average list price of all products, the second’s purpose is for filtering out products less than or equal to the average list price.
Here the subquery returns a single value, which is then used to filter out products.
Notice how the subqueries are queries unto themselves. In this example you could paste the subquery, without the parentheses, into a query window and run it.
Contrast this with a join, whose main purpose is to combine rows from one or more tables based on a match condition. For example, we can use a join to display product names and models.
Select Product.Name,
ProductModel.Name as ModelName
FROM Production.product
INNER JOIN Production.ProductModel
ON Product.ProductModelID = ProductModel.ProductModelID
In this statement we’re using an INNER JOIN to match rows from both the Product and ProductModel tables. Notice that the column ProductModel.Name is available for use throughout the query.
The combined row set is then available to the select statement, which can display, filter, or group by its columns.
This is different than the subquery. There the subquery returns a result, which is immediately used.
Note that a join is an integral part of the select statement. It cannot stand on its own as a subquery can.
A subquery is used to run a separate query from within the main query. In many cases the returned value is displayed as a column or used in a filter condition such as where or having clause. When a subquery incorporates a column from the main query it is said to be correlated. In this way a sub query is somewhat like a join in that values from two or more tables can be compared.
Joins are used in the FROM clause of the SELECT statement; however, you’ll find subqueries used in most clauses, such as the:
SELECT List – here subqueries that return single values are used.
WHERE clause– depending on the conditional operator you’ll see single value or row based subqueries.
FROM clause– It is typical to see row based result subqueries used here.
HAVING clause – In my experience scalar (single value) subqueries are used here.
Though joins and subqueries have many differences, they can be used to solve similar problems. In fact just because you write a SQL statement as a subquery doesn’t mean the DBMS executes as such.
You can refer to this link for more details:
http://www.essentialsql.com/what-is-the-difference-between-a-join-and-subquery/

Assuming you want an INNER JOIN:
SELECT t1.Id
,t1.Name
,t2.Parents_Name
FROM table1 t1
INNER JOIN table2 t2
ON t2.Table1Id = t1.Id
But you could also use an OUTER JOIN to always show rows from table1 and show the matching rows from table2.
SELECT t1.Id
,t1.Name
,t2.Parents_Name
FROM table1 t1
LEFT OUTER JOIN table2 t2
ON t2.Table1Id = t1.Id
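The difference between the two joins above is easiest to see on a row that has no match. What follows is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made so the example is runnable anywhere), and the second Table1 row is made-up sample data that is not in the question.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
# Row 2 of Table1 is hypothetical sample data with no match in Table2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Id INTEGER PRIMARY KEY, Name TEXT, Address TEXT);
    CREATE TABLE Table2 (Id INTEGER PRIMARY KEY, Table1Id INTEGER,
                         Parents_Name TEXT, Address TEXT);
    INSERT INTO Table1 VALUES (1, 'ABC', '123ABC'), (2, 'EFG', '456EFG');
    INSERT INTO Table2 VALUES (1, 1, 'DDD', 'Xyz');
""")

# INNER JOIN keeps only Table1 rows that have a matching Table2 row.
inner_rows = conn.execute("""
    SELECT t1.Id, t1.Name, t2.Parents_Name
    FROM Table1 t1
    INNER JOIN Table2 t2 ON t2.Table1Id = t1.Id
    ORDER BY t1.Id
""").fetchall()

# LEFT OUTER JOIN keeps every Table1 row and fills the gaps with NULL.
left_rows = conn.execute("""
    SELECT t1.Id, t1.Name, t2.Parents_Name
    FROM Table1 t1
    LEFT OUTER JOIN Table2 t2 ON t2.Table1Id = t1.Id
    ORDER BY t1.Id
""").fetchall()

print(inner_rows)  # [(1, 'ABC', 'DDD')]
print(left_rows)   # [(1, 'ABC', 'DDD'), (2, 'EFG', None)]
```

The unmatched row shows up only in the outer join, with None (NULL) in place of Parents_Name.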

You will want to use a join, which will almost certainly be much more efficient. A quick Google search will give you more information than you need about the different types of joins available and how to use them, such as https://blog.codinghorror.com/a-visual-explanation-of-sql-joins/

Please create an index on the id of Table1 and Table1Id of Table2 and use this query.
Select t1.Id,
t1.Name,
t2.Parents_Name
from Table1 t1
inner join
Table2 t2 on t1.id=t2.Table1Id

IMHO, a join should be preferred for query performance. A subquery version will need the IN operator, which tends to be painful with bulk data. However, it also depends on your indexing, so it is better to check both.
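To "check both" you first need the two forms side by side. The sketch below is one way to write them, under stated assumptions: Python's sqlite3 stands in for SQL Server, the sample rows and the index name IX_Table2_Table1Id are made up, and each Table1 row has at most one Table2 row (SQL Server raises an error if a scalar subquery returns more than one row).

```python
import sqlite3

# SQLite stands in for SQL Server here (assumption); data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Id INTEGER PRIMARY KEY, Name TEXT, Address TEXT);
    CREATE TABLE Table2 (Id INTEGER PRIMARY KEY, Table1Id INTEGER,
                         Parents_Name TEXT, Address TEXT);
    -- Table1.Id is already indexed by its primary key; index the join column.
    CREATE INDEX IX_Table2_Table1Id ON Table2 (Table1Id);
    INSERT INTO Table1 VALUES (1, 'ABC', '123ABC'), (2, 'EFG', '456EFG'),
                              (3, 'HIJ', '789HIJ');
    INSERT INTO Table2 VALUES (1, 1, 'DDD', 'Xyz'), (2, 2, 'KKK', 'Uvw');
""")

# Join form.
join_rows = conn.execute("""
    SELECT t1.Id, t1.Name, t2.Parents_Name
    FROM Table1 t1
    INNER JOIN Table2 t2 ON t2.Table1Id = t1.Id
    ORDER BY t1.Id
""").fetchall()

# Subquery form: IN filters to matching rows, and a correlated scalar
# subquery fetches Parents_Name for each remaining row.
subquery_rows = conn.execute("""
    SELECT t1.Id, t1.Name,
           (SELECT t2.Parents_Name
            FROM Table2 t2
            WHERE t2.Table1Id = t1.Id) AS Parents_Name
    FROM Table1 t1
    WHERE t1.Id IN (SELECT Table1Id FROM Table2)
    ORDER BY t1.Id
""").fetchall()

print(join_rows == subquery_rows)  # True
```

Both forms return the same rows on this data; which one runs faster on the real tables is something only the actual execution plans can tell you.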

Related

Subquery is returning more than one value? Not getting a result set?

Write a SELECT statement that returns two columns: VendorName and LargestInv
(LargestInv is the correlation name of the subquery)
Subquery portion:
SELECT statement that returns the largest InvoiceTotal from the Invoices table (you will need to perform the JOIN within the subquery in one of the clauses).
Sort the results by LargestInv from largest to smallest.(Subquery Must be in the Select statement)
I have tried this, but my subquery is returning more than one value:
USE AP
SELECT VendorName, (SELECT MAX(InvoiceTotal) FROM Invoices JOIN Vendors
ON Invoices.VendorID = Vendors.VendorID
GROUP BY Invoices.VendorID) AS LargestInv
FROM Vendors
Your issue is scope.
The sub-query shouldn't be joining to the Vendor table if the goal is a correlated sub-query. The "correlated" part comes from joining the results of the inner query (the sub-query) to the outer query.
As written, you're finding VendorID inside the sub-query and the results aren't correlated to the outer query at all. Hence your error message.
SELECT
VendorName,
(SELECT MAX(InvoiceTotal)
FROM Invoices
WHERE Invoices.VendorID = Vendors.VendorID
) AS LargestInv
FROM Vendors
ORDER BY LargestInv DESC;
Edit (extended explanation):
A correlated sub-query isn't designed to pull up a full result set, like the sub-query you were writing at first. It's designed to go over to another table and use a value (or values) from the outer query to bring back a single result, one row at a time.
In this case, using the VendorID from the Vendors table, go over to Invoices, calculate a MAX value "WHERE" the VendorID in Invoices matches the VendorID ON THIS ROW, bring that single value back, then, next row, go back and do that again. And again and again.
It's one way to get the data, but it's not usually efficient. Later, though, you'll learn to use correlated sub-queries in (NOT) EXISTS clauses, and in that context they tend to be extremely efficient. Story for another day, but it's one reason the construct is important to know.
So, your way was good, because it was set based and would tend to be more efficient as a sub-query in the FROM clause, but this way, row by row, is important to understand conceptually.
This is how I would do it.
SELECT Vendors.VendorName, LargestInv.MaxI
FROM Vendors
JOIN (
SELECT VendorName, MAX(InvoiceTotal) as MaxI
FROM Invoices
JOIN Vendors ON Invoices.VendorID = Vendors.VendorID
GROUP BY VendorName
) AS LargestInv ON LargestInv.VendorName = Vendors.VendorName
Now a sub-query that returns more than one row won't give you an error, and you can look at the results.
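The two shapes discussed in this answer, a correlated sub-query in the SELECT list and a grouped sub-query joined in the FROM clause, can be compared on a tiny data set. This is a sketch only: Python's sqlite3 stands in for SQL Server, the vendors and invoice totals are invented, and the derived table groups by VendorID rather than VendorName (a small variation of the query above).

```python
import sqlite3

# SQLite stands in for SQL Server (assumption); all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Vendors (VendorID INTEGER PRIMARY KEY, VendorName TEXT);
    CREATE TABLE Invoices (InvoiceID INTEGER PRIMARY KEY, VendorID INTEGER,
                           InvoiceTotal REAL);
    INSERT INTO Vendors VALUES (1, 'Acme'), (2, 'Bolt'), (3, 'Coil');
    INSERT INTO Invoices VALUES (1, 1, 100.0), (2, 1, 250.0), (3, 2, 75.0);
""")

# Correlated sub-query: evaluated once per Vendors row, returns one value.
correlated = conn.execute("""
    SELECT VendorName,
           (SELECT MAX(InvoiceTotal)
            FROM Invoices
            WHERE Invoices.VendorID = Vendors.VendorID) AS LargestInv
    FROM Vendors
    ORDER BY LargestInv DESC
""").fetchall()

# Derived table: aggregate all invoices once, then join the result.
derived = conn.execute("""
    SELECT Vendors.VendorName, LargestInv.MaxI
    FROM Vendors
    JOIN (SELECT VendorID, MAX(InvoiceTotal) AS MaxI
          FROM Invoices
          GROUP BY VendorID) AS LargestInv
      ON LargestInv.VendorID = Vendors.VendorID
    ORDER BY LargestInv.MaxI DESC
""").fetchall()

print(correlated)  # [('Acme', 250.0), ('Bolt', 75.0), ('Coil', None)]
print(derived)     # [('Acme', 250.0), ('Bolt', 75.0)]
```

Note the one visible difference: the correlated form keeps the vendor with no invoices (with a NULL), while the inner join on the derived table drops it.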

How to compare the results of two separate queries that have a common field in Sql Server?

Maybe it's because it's Friday but I can't seem to get this and it feels like it should be really really easy.
I have one result set (pulls the data from multiple tables) that gives me the following result set:
Room Type | ImageID | Date
The next query (pulled from separate tables than above) result give me :
ImageID | Date | Tagged
I just want to compare the results to see which imageid's are common between the two results, and which fall into each list separately.
I have tried insert the results from each into temp tables and do a join on imageid but sql server does NOT like that. Ideally I would like a solution that allows me to do this without creating temp tables.
I researched using union but it seems that because the two results don't have the same columns I avoided that route.
Thanks!
You can do this a number of different ways. For instance, you can use either an inner join or intersect, with the two sets as derived tables (SQL Server requires an alias on each derived table).
select ImageID from (your_first_query) query1
intersect
select ImageID from (your_second_query) query2
or
select query1.ImageID
from (your_first_query) query1
inner join (your_second_query) query2 on query1.ImageID = query2.ImageID
You don't explain why SQL Server does not like performing a join on ImageID; it shouldn't be a problem. As to your first question, you need to turn your two queries into subqueries and perform a Full Outer Join on them:
Select * from
(Select [Room Type], ImageID, Date ...) T1 Full Outer Join
(Select ImageID, Date, Tagged ...) T2 on T1.ImageId = T2.ImageId
Checking for Null values on each side of the join should give you what you want: a Null T2.ImageId marks an image found only in the first result, a Null T1.ImageId marks one found only in the second, and rows with both filled in are common to both.
SELECT TableA.ImageID
FROM TableA
WHERE TableA.ImageID
IN (SELECT TableB.ImageID FROM TableB)
select q1.ImageID
from (your_first_query) q1
WHERE EXISTS (select 1
from (your_second_query) q2
WHERE q2.ImageID = q1.ImageID)
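The question asks for three lists: the ImageIDs common to both results and the ones that appear in only one of them. One possible way to get all three, sketched here with INTERSECT and EXCEPT, needs no temp tables. The assumptions: Python's sqlite3 stands in for SQL Server, and the tables first_result and second_result (with made-up rows) stand in for the two original queries.

```python
import sqlite3

# SQLite stands in for SQL Server (assumption). The two tables are
# hypothetical stand-ins for the results of the two original queries.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE first_result (RoomType TEXT, ImageID INTEGER);
    CREATE TABLE second_result (ImageID INTEGER, Tagged INTEGER);
    INSERT INTO first_result VALUES ('Suite', 1), ('Suite', 2), ('Single', 3);
    INSERT INTO second_result VALUES (2, 1), (3, 0), (4, 1);
""")

def image_ids(sql):
    """Run a single-column query and return its values as a plain list."""
    return [row[0] for row in conn.execute(sql)]

# In both results.
common = image_ids("""
    SELECT ImageID FROM first_result
    INTERSECT
    SELECT ImageID FROM second_result
    ORDER BY ImageID
""")

# Only in the first result.
only_first = image_ids("""
    SELECT ImageID FROM first_result
    EXCEPT
    SELECT ImageID FROM second_result
    ORDER BY ImageID
""")

# Only in the second result.
only_second = image_ids("""
    SELECT ImageID FROM second_result
    EXCEPT
    SELECT ImageID FROM first_result
    ORDER BY ImageID
""")

print(common, only_first, only_second)  # [2, 3] [1] [4]
```

In SQL Server the same set operators can be applied directly to the two original queries wrapped as derived tables.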

TSQL query to merge data from multiple tables that may or may not have matching rows?

For example, suppose we're conducting research where students can take up to 10 different tests, and each table in the database stores all the students' responses for one test. The tables are named after each test as: T1, T2, ... , T10. Suppose each table has a primary key column 'Username' that identifies each student. Students may or may not have completed each test, so there may or may not be a record in each table for each student.
What is the correct SQL Query to return all the test data from all tables, with one row per student (one row per username)? I want the simplest query possible that returns the correct results. I would also like to coalesce the Username fields into a single Username field in the final query.
To clarify, I understand that SQL has a major limitation in that it does not support a syntax to select all columns except one or more fields like "select *[^ExcludeColumn1][^ExcludeColumn2]". To avoid specifically naming all columns in the final query, it would be acceptable to leave all the Username columns there, as long as it includes a coalesced Username field at the beginning named something like RowID.
As for the overall query, one option would be to perform a union all on the username column of all ten tables, then select the distinct usernames across all tables, then perform a series of left joins against the list of distinct usernames on all 10 tables. That would result in a very straightforward query where each left join is performed on the same distinct set of usernames, but I want to avoid a separate up-front query for distinct usernames. (Although if that's the best option, let me know). It would look something like this:
select * from
(select distinct coalesce(t1.Username,t2.Username,...,t10.Username) as RowID from t1,t2,t3,t4,t5,t6,t7,t8,t9,t10) distinct_usernames
left join t1 on t1.Username = distinct_usernames.RowID
left join t2 on t2.Username = distinct_usernames.RowID
...
left join t10 on t10.Username = distinct_usernames.RowID
Although that is short and easy to write, it is incredibly inefficient and would take hours to run on test tables with 5000+ rows each, so with an adjustment, an equivalent version that runs in a few seconds is:
select * from (
select distinct Username as RowID from (
select Username from t1
union all
select Username from t2
union all
...
select Username from t10
) all_usernames) distinct_usernames
left join t1 on t1.Username = distinct_usernames.RowID
left join t2 on t2.Username = distinct_usernames.RowID
...
left join t10 on t10.Username = distinct_usernames.RowID
I think that what I have above might be the most efficient and correct query (takes only a couple seconds to run and returns correct result set), but I also thought perhaps it could be simplified with some kind of full join. The problem is that full joins get confusing with more than two tables, because without pre-determining the usernames, each subsequent table would have to match records against any of the preceding tables, resulting in a query where each additional table has "[previous table count] + 1" conditions on matching the username.
Assuming that Username is unique in each table, your second query would be the way I would try first, with the slight modifications of removing distinct and simply using union (which implies distinct) rather than union all:
select *
from (
select Username from t1
union
select Username from t2
union
-- ...
select Username from t10
) distinct_usernames
left join t1 on t1.Username = distinct_usernames.Username
left join t2 on t2.Username = distinct_usernames.Username
-- ...
left join t10 on t10.Username = distinct_usernames.Username
From there I would make sure that Username is indexed, possibly even using it as the clustered index. I've also had optimization luck in the past by implementing your distinct_usernames as a temp table (possibly indexed, or an indexed view) at the beginning of the proc, but only testing would determine if that were worthwhile.
A full outer join would require a bunch of or conditions or coalesce arguments, though it could be worth a try on just a few tables to see if the performance is there. I can't try to out-guess what your query engine will like best.
Also, getting just the column names that you want could be done with a query to sys.columns or information_schema.columns and using dynamic SQL to build your query as a string and then executing that.
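To make the union-then-left-join pattern concrete, here is a minimal sketch with three tests instead of ten. It rests on assumptions: Python's sqlite3 stands in for SQL Server, each table has a single made-up Score column, and the student names and scores are invented.

```python
import sqlite3

# SQLite stands in for SQL Server (assumption); three hypothetical test
# tables instead of ten, each keyed by Username.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (Username TEXT PRIMARY KEY, Score INTEGER);
    CREATE TABLE t2 (Username TEXT PRIMARY KEY, Score INTEGER);
    CREATE TABLE t3 (Username TEXT PRIMARY KEY, Score INTEGER);
    INSERT INTO t1 VALUES ('alice', 10), ('bob', 20);
    INSERT INTO t2 VALUES ('bob', 30), ('carol', 40);
    INSERT INTO t3 VALUES ('alice', 50);
""")

# UNION (not UNION ALL) removes duplicate usernames, so every student
# appears exactly once before the left joins attach each test's data.
rows = conn.execute("""
    SELECT u.Username AS RowID,
           t1.Score AS T1Score,
           t2.Score AS T2Score,
           t3.Score AS T3Score
    FROM (
        SELECT Username FROM t1
        UNION
        SELECT Username FROM t2
        UNION
        SELECT Username FROM t3
    ) u
    LEFT JOIN t1 ON t1.Username = u.Username
    LEFT JOIN t2 ON t2.Username = u.Username
    LEFT JOIN t3 ON t3.Username = u.Username
    ORDER BY RowID
""").fetchall()

for row in rows:
    print(row)
# ('alice', 10, None, 50)
# ('bob', 20, 30, None)
# ('carol', None, 40, None)
```

Each student gets one row, with NULL (None) for any test they did not take.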

SQL FROM clause using n>1 tables

If you add more than one table to the FROM clause (in a query), how does this impact the result set? Does it first select from the first table then from the second and then create a union (i.e., only the rowspace is impacted?) or does it actually do something like a join (i.e., extend the column space)? And when you use multiple tables in the FROM clause, does the WHERE clause filter both sub-result-sets?
Specifying two tables in your FROM clause will execute a JOIN. You can then use the WHERE clause to specify your JOIN conditions. If you fail to do this, you will end up with a Cartesian product (every row in the first table indiscriminately joined to every row in the second).
The code will look something like this:
SELECT a.*, b.*
FROM table1 a, table2 b
WHERE a.id = b.id
However, I always try to explicitly specify my JOINs (with JOIN and ON keywords). That makes it abundantly clear (for the next developer) as to what you're trying to do. Here's the same JOIN, but explicitly specified:
SELECT a.*, b.*
FROM table1 a
INNER JOIN table2 b ON b.id = a.id
Note that now I don't need a WHERE clause. This method also helps you avoid generating an inadvertent Cartesian product (if you happen to forget your WHERE clause), because the ON is specified explicitly.
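The effect of a forgotten WHERE clause is easy to demonstrate by counting rows. The sketch below assumes Python's sqlite3 as a stand-in for SQL Server and uses two made-up three-row tables.

```python
import sqlite3

# SQLite stands in for SQL Server (assumption); ids are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY);
    INSERT INTO table1 VALUES (1), (2), (3);
    INSERT INTO table2 VALUES (2), (3), (4);
""")

# Comma-separated tables with no WHERE clause: Cartesian product.
cartesian = conn.execute(
    "SELECT a.id, b.id FROM table1 a, table2 b"
).fetchall()

# Same FROM clause, with the join condition in the WHERE clause.
implicit = conn.execute("""
    SELECT a.id, b.id
    FROM table1 a, table2 b
    WHERE a.id = b.id
    ORDER BY a.id
""").fetchall()

# Explicit JOIN ... ON, which returns the same rows.
explicit = conn.execute("""
    SELECT a.id, b.id
    FROM table1 a
    INNER JOIN table2 b ON b.id = a.id
    ORDER BY a.id
""").fetchall()

print(len(cartesian))  # 9 (3 rows x 3 rows)
print(implicit)        # [(2, 2), (3, 3)]
print(explicit)        # [(2, 2), (3, 3)]
```

So multiple tables in the FROM clause widen the column space like a join, and the WHERE clause then filters the combined rows.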

Iterating 1 row at a time with massive amounts of links/joins

Ok, basically what is needed is a way to have row numbers while using a lot of joins and having where clauses using these rownumbers.
such as something like
select ADDRESS.ADDRESS FROM ADDRESS
INNER JOIN WORKHISTORY ON WORKHISTORY.ADDRESSRID=ADDRESS.ADDRESSRID
INNER JOIN PERSON ON PERSON.PERSONRID=WORKHISTORY.PERSONRID
WHERE PERSONRID=<some number> AND WORKHISTORY.ROWNUMBER=1
ROWNUMBER needs to be generated for this query on that one table, though, so that if we want to access the second WORKHISTORY record's address, we could just use WORKHISTORY.ROWNUMBER=2, and if, say, two addresses matched, we could cycle through the addresses for one WORKHISTORY record using ADDRESS.ROWNUMBER=1 and ADDRESS.ROWNUMBER=2.
This should be capable of being an automatically generated query. Thus, there could be more than 10 inner joins in order to get to the relevant table, and we need to be able to cycle through each table's records independently of the rest of the tables.
I'm aware there are the RANK and ROW_NUMBER functions, but I'm not seeing how they would work for me because of all the inner joins.
note: in this example query, ROWNUMBER should be automatically generated! It should never be stored in the actual table
Can you use a temp table?
I ask because you can write the code like this:
select a.field1, b.field2, c.field3, identity (int, 1,1) as TableRownumber into #temp
from table1 a
join table2 b on a.table1id = b.table1id
join table3 c on b.table2id = c.table2id
select * from #temp where ...
