I am attempting to create a query that pulls information from two other tables; however, I only know which table to pull from based on a column in another table. I'm currently looking into doing this with a stored procedure (i.e. build the query and then run it), but I wanted to know if there is a better way to do this, or if I could accomplish it in a single query.
In terms of the connections, IDs are unique across the entire database, so no two IDs will overlap. However, I do not know which subtable the ID relates to. I am able to find this by pulling in an unrelated table that happens to have the information (call it the Object Table). One of its columns gives me the table name for the information (in my example below, Person). I have drafted a simple example below. Can you see any way I could accomplish this in a single query? Something like this is what I am aiming for, but I am starting to think it's not possible.
SELECT * FROM base_table
LEFT JOIN object ON object.id = base_table.role
LEFT JOIN [object.type] tmp ON tmp.entity_id = base_table.entity_id
Base Table:
id | role | entity_id
---------------------
1  | 101  | 1000

Objects Table:
id  | type
----------
101 | person

Person Table:
entity_id | name | etc.
------------------------
1000      | Bob  | ...
I also expect unions might be a possible solution, but other than just joining all the possible tables and parsing the columns to match up properly (which could be as many as 20 tables), I'd rather not. This solution is also a bit of a nuisance since the columns don't always match up well (e.g. the Person table doesn't have columns similar to the Address table).
I don't think the LEFT JOIN idea is that bad if you just ignore the object type.
Since each ID is unique, you don't need to look at the type at all if you use COALESCE. To use TT's model as an example:
SELECT bt.*,
COALESCE(P.f1, L.f1, C.f1) AS f1,
-- ...,
COALESCE(P.fn, L.fn, C.fn) AS fn
FROM
base_table AS bt
LEFT JOIN Person AS P ON P.entity_id = bt.entity_id
LEFT JOIN [Legal Person] AS L ON L.entity_id = bt.entity_id
LEFT JOIN Counterpart AS C ON C.entity_id = bt.entity_id
Depending on your data size and indexes, this might perform as fast as or faster than TT's example. Remember there is only one SELECT with N joins, while TT's has N SELECTs and 2N joins. It really depends on your data.
If some field (fz) does not show up in all types, then just leave the tables that lack it out of that field's COALESCE.
I think this style might be easier to maintain and understand, and it should be as fast as or faster than TT's code.
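A minimal sketch of the COALESCE approach, assuming SQLite (Python's built-in sqlite3) as a stand-in for SQL Server and made-up sample rows. It shows why the type column can be ignored: because entity_id values never overlap, at most one LEFT JOIN matches per row. The single column f1 stands in for f1…fn, and SQLite's "Legal Person" double-quoted identifier plays the role of T-SQL's [Legal Person].

```python
import sqlite3

# In-memory SQLite stand-in for the SQL Server tables (hypothetical sample data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE base_table (id INTEGER, role INTEGER, entity_id INTEGER);
    CREATE TABLE Person (entity_id INTEGER, f1 TEXT);
    CREATE TABLE "Legal Person" (entity_id INTEGER, f1 TEXT);
    CREATE TABLE Counterpart (entity_id INTEGER, f1 TEXT);

    INSERT INTO base_table VALUES (1, 101, 1000), (2, 102, 2000), (3, 103, 3000);
    INSERT INTO Person VALUES (1000, 'Bob');
    INSERT INTO "Legal Person" VALUES (2000, 'Acme Ltd');
    INSERT INTO Counterpart VALUES (3000, 'Globex');
""")

# Each base row matches at most one detail table, so COALESCE returns
# the value from whichever LEFT JOIN actually found a row.
rows = conn.execute("""
    SELECT bt.id,
           COALESCE(P.f1, L.f1, C.f1) AS f1
    FROM base_table AS bt
    LEFT JOIN Person AS P ON P.entity_id = bt.entity_id
    LEFT JOIN "Legal Person" AS L ON L.entity_id = bt.entity_id
    LEFT JOIN Counterpart AS C ON C.entity_id = bt.entity_id
    ORDER BY bt.id
""").fetchall()

print(rows)  # [(1, 'Bob'), (2, 'Acme Ltd'), (3, 'Globex')]
```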
What you probably want to do is the following: for each possible detail table (i.e. the possible values in [object.type]), write a query that only links with that one detail table and has a WHERE clause to restrict it to the proper entities. Then do a UNION ALL of all those queries.
Say you have Person, Legal Person and Counterpart as possible values in [object.type]. Suppose the detail-tables have the same names. You can write:
SELECT
bt.*,
dt.f1,
-- ...,
dt.fn
FROM
base_table AS bt
INNER JOIN object AS o ON o.id = bt.role
INNER JOIN Person AS dt ON dt.entity_id = bt.entity_id
WHERE
o.type='Person'
UNION ALL
SELECT
bt.*,
dt.f1,
-- ...,
dt.fn
FROM
base_table AS bt
INNER JOIN object AS o ON o.id = bt.role
INNER JOIN [Legal Person] AS dt ON dt.entity_id = bt.entity_id
WHERE
o.type='Legal Person'
UNION ALL
SELECT
bt.*,
dt.f1,
-- ...,
dt.fn
FROM
base_table AS bt
INNER JOIN object AS o ON o.id = bt.role
INNER JOIN Counterpart AS dt ON dt.entity_id = bt.entity_id
WHERE
o.type='Counterpart'
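The same UNION ALL pattern can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions: SQLite stands in for SQL Server, the rows are invented, only two detail tables are used, and f1 stands in for the real detail columns. Each branch joins exactly one detail table and filters on object.type.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE base_table (id INTEGER, role INTEGER, entity_id INTEGER);
    CREATE TABLE object (id INTEGER, type TEXT);
    CREATE TABLE Person (entity_id INTEGER, f1 TEXT);
    CREATE TABLE Counterpart (entity_id INTEGER, f1 TEXT);

    INSERT INTO base_table VALUES (1, 101, 1000), (2, 102, 3000);
    INSERT INTO object VALUES (101, 'Person'), (102, 'Counterpart');
    INSERT INTO Person VALUES (1000, 'Bob');
    INSERT INTO Counterpart VALUES (3000, 'Globex');
""")

# One SELECT per detail table, each restricted by object.type, glued with UNION ALL.
rows = conn.execute("""
    SELECT bt.id, o.type, dt.f1
    FROM base_table AS bt
    INNER JOIN object AS o ON o.id = bt.role
    INNER JOIN Person AS dt ON dt.entity_id = bt.entity_id
    WHERE o.type = 'Person'
    UNION ALL
    SELECT bt.id, o.type, dt.f1
    FROM base_table AS bt
    INNER JOIN object AS o ON o.id = bt.role
    INNER JOIN Counterpart AS dt ON dt.entity_id = bt.entity_id
    WHERE o.type = 'Counterpart'
    ORDER BY 1
""").fetchall()

print(rows)  # [(1, 'Person', 'Bob'), (2, 'Counterpart', 'Globex')]
```

Every branch must return the same number of columns with compatible types, which is the column-matching nuisance the question mentions.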
So I have to do a Cartesian product (or CROSS JOIN) between two tables. One problem is that both tables have a column named 'itemname'. My current code looks as follows:
select *
into #cartesian_temp
from xsale CROSS JOIN xitem
delete from #cartesian_temp where deptname='books' and itemcolor='bamboo'
select * from #cartesian_temp
The error I get is:
Column names in each table must be unique. Column name 'itemname' in table '#cartesian_temp' is specified more than once
Can anyone help me with my problem?
You can add aliases for the columns, like below:
select XS.itemname as saleitemname , XI.itemname as saleitemname2
into #cartesian_temp
from xsale XS CROSS JOIN xitem XI
This is one of the reasons why seasoned SQL pros will ALWAYS advocate giving tables an alias and ALWAYS fully qualifying every column name with that alias. It's not just a cross join problem.
Avoid this:
SELECT *
FROM
person
INNER JOIN
address
ON addressid = address.id -- Person.addressid
Sure, it'll work as long as the column names are all unique (it'll probably cause issues even now, because person will have an id column and so will address), but it might stop working at any point in the future if someone adds columns with clashing names to either table.
Prefer this:
SELECT p.id as personid, a.id as addressid, p.name, a.zipcode
FROM
person p
INNER JOIN
address a
ON p.addressid = a.id
This is fully aliased (both tables have an alias) and we haven't used select *; we've fully qualified every column with the table alias as a prefix, and we've aliased the columns that have the same name in each table (the id columns) so we can tell them apart. No one can add columns to the db and cause this query to stop working.
Aliasing tables helps in another way: it lets us use the same table twice in a query. Suppose a person had a work address and a home address:
SELECT ...
FROM
person p
INNER JOIN
address awork
ON p.workaddressid = awork.id
INNER JOIN
address ahome
ON p.homeaddressid = ahome.id
This is impossible without aliasing. Always give aliases sensible names (not a1, a2).
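To see the two aliases at work, here is a runnable sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The zipcode column follows the earlier example; the rows are invented. The aliases awork and ahome let the single address table play two roles in one query.

```python
import sqlite3

# Hypothetical schema and data for the work/home address example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE address (id INTEGER PRIMARY KEY, zipcode TEXT);
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT,
                         workaddressid INTEGER, homeaddressid INTEGER);
    INSERT INTO address VALUES (1, '10001'), (2, '94105');
    INSERT INTO person VALUES (7, 'Ann', 1, 2);
""")

# Same table joined twice; each alias is matched on a different foreign key.
row = conn.execute("""
    SELECT p.name, awork.zipcode AS workzip, ahome.zipcode AS homezip
    FROM person p
    INNER JOIN address awork ON p.workaddressid = awork.id
    INNER JOIN address ahome ON p.homeaddressid = ahome.id
""").fetchone()

print(row)  # ('Ann', '10001', '94105')
```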
For your case, something like:
SELECT xs.itemname as xsitemname, xi.itemname as xiitemname, ...
FROM
xsale xs
CROSS JOIN
xitem xi
WHERE
xi.itemcolor = 'green'
This will be every green item crossed with every sale item.
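A quick way to check that aliasing fixes the duplicate-column error is to build the temp table from the aliased SELECT. This sketch assumes SQLite via Python's sqlite3 (a TEMP TABLE stands in for SQL Server's #cartesian_temp), and the deptname/itemcolor rows are made up.

```python
import sqlite3

# Hypothetical versions of the two tables; both have an 'itemname' column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE xsale (itemname TEXT, deptname TEXT);
    CREATE TABLE xitem (itemname TEXT, itemcolor TEXT);
    INSERT INTO xsale VALUES ('pen', 'stationery'), ('novel', 'books');
    INSERT INTO xitem VALUES ('lamp', 'green'), ('mat', 'bamboo'), ('cup', 'green');
""")

# Aliasing both itemname columns gives the new table unique column names.
conn.execute("""
    CREATE TEMP TABLE cartesian_temp AS
    SELECT xs.itemname AS xsitemname, xs.deptname,
           xi.itemname AS xiitemname, xi.itemcolor
    FROM xsale xs
    CROSS JOIN xitem xi
    WHERE xi.itemcolor = 'green'
""")

cur = conn.execute("SELECT * FROM cartesian_temp")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(columns)    # ['xsitemname', 'deptname', 'xiitemname', 'itemcolor']
print(len(rows))  # 4 (2 sale rows x 2 green items)
```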
We have been learning SQL Server programming in Database Systems class. The professor goes exceptionally fast and is not very open to asking questions. I did ask him this, but he just advised me to review the code he'd given (which doesn't actually answer the question).
When making a query, what is the difference between using the term JOIN and using the "=" operator? For example, I have the following query:
SELECT VENDOR_NAME, ITEM_NAME, QTY
FROM VENDOR, VENDOR_ORDER, INVENTORY
WHERE VENDOR.VENDOR_ID = VENDOR_ORDER.VENDOR_ID
AND VENDOR_ORDER.INV_ID = INVENTORY.INV_ID
ORDER BY VENDOR_NAME
In class the professor has used the following code:
SELECT DISTINCT CUS_CODE, CUS_LNAME, CUS_FNAME
FROM CUSTOMER JOIN INVOICE USING (CUS_CODE)
JOIN LINE USING (INV_NUMBER)
JOIN PRODUCT USING (P_CODE)
WHERE P_DESCRIPT = 'Claw hammer';
It seems to me that using a JOIN performs the same function as the "=" does in mine. Am I correct, or is there a difference that I am unaware of?
Edit:
Trying to use Inner Join based on things I've found on Google. I ended up with the following.
SELECT VENDOR_NAME, ITEM_NAME, QTY
FROM VENDOR, VENDOR_ORDER, INVENTORY
INNER JOIN VENDOR_ORDER USING (VENDOR_ID)
INNER JOIN INVENTORY USING (INV_ID)
ORDER BY VENDOR_NAME
Now I get the error message: "VENDOR_ID" is not a recognized table hints option. If it is intended as a parameter to a table-valued function or to the CHANGETABLE function, ensure that your database compatibility mode is set to 90.
I'm using SQL Server 2014, so my compatibility level is 120.
The difference between what you are doing (in your first example) and what your professor is doing is that you are creating a set of all possible combinations of the rows in those tables, then narrowing your results to the ones that match the way you want them to. He is creating a set of only the rows that match the way you want them to in the first place.
If your tables were:
Table1
ID1
1
2
3
Table2
ID2
1
2
3
Your query starts with basically a cross join:
Select * from Table1, Table2
ID1 ID2
1 1
2 1
3 1
1 2
2 2
3 2
1 3
2 3
3 3
Then it narrows that result set down by applying WHERE ID1 = ID2:
ID1 ID2
1 1
2 2
3 3
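That is what the comma syntax describes logically. In practice, SQL Server's optimizer generally produces the same plan for both forms of an inner join, so the real cost is readability: as people have mentioned in the comments, the comma style is difficult to read in more complex examples.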
Your professor is building the criteria to relate the two tables into the join itself, so he is effectively skipping the first step. In our example tables, this would be Select * from Table1 join Table2 on ID1 = ID2.
There are several types of joins in SQL, which differ in how they handle cases where a value exists in one of your tables but has no match in the other table. See the traditional Venn diagram explanation at http://www.codeproject.com/Articles/33052/Visual-Representation-of-SQL-Joins.
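The logical steps above can be checked with Python's built-in sqlite3 module, used here as a stand-in for SQL Server. The Table1/Table2 names come from the answer, but the sketch adds an assumed extra row (4) to Table1 so the LEFT JOIN has something unmatched. It shows the comma-plus-WHERE form and the explicit INNER JOIN returning the same rows, and how an outer join differs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ID1 INTEGER);
    CREATE TABLE Table2 (ID2 INTEGER);
    INSERT INTO Table1 VALUES (1), (2), (3), (4);  -- 4 has no partner in Table2
    INSERT INTO Table2 VALUES (1), (2), (3);
""")

def run(sql):
    """Run a query and return its rows sorted, so row order doesn't matter."""
    return sorted(conn.execute(sql).fetchall())

cross_join = run("SELECT * FROM Table1, Table2")                  # every combination
comma_where = run("SELECT * FROM Table1, Table2 WHERE ID1 = ID2")  # combinations, then filter
inner_join = run("SELECT * FROM Table1 INNER JOIN Table2 ON ID1 = ID2")
left_join = run("SELECT * FROM Table1 LEFT JOIN Table2 ON ID1 = ID2")

print(len(cross_join))            # 12 (4 x 3)
print(comma_where == inner_join)  # True
print(left_join)                  # [(1, 1), (2, 2), (3, 3), (4, None)]
```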
Don't worry, it's your professor's issue, not yours. Make sure you give appropriate feedback at the end of the course ;)
Hang in there.
So here is some info:
So the first issue is: your professor should not be teaching you USING because it has limited implementation (it definitely won't work in SQL Server) and IMHO it's a bad idea because you should explicitly list join columns.
Here are some queries that will work in SQL Server; let's build them up bit by bit. I will need to make some assumptions.
First just join vendor to vendor order:
SELECT VENDOR.VENDOR_NAME, VENDOR_ORDER.QTY
FROM VENDOR
INNER JOIN
VENDOR_ORDER
ON VENDOR.VENDOR_ID = VENDOR_ORDER.VENDOR_ID
By using inner join we match these two tables on VENDOR_ID
If you have seven records in VENDOR_ORDER with VENDOR_ID = 7, and one record in VENDOR with that ID, then the result will be seven records, with the data from the VENDOR table repeated seven times.
Now join INVENTORY onto that:
SELECT VENDOR.VENDOR_NAME, INVENTORY.ITEM_NAME, VENDOR_ORDER.QTY
FROM VENDOR
INNER JOIN
VENDOR_ORDER
ON VENDOR.VENDOR_ID = VENDOR_ORDER.VENDOR_ID
INNER JOIN
INVENTORY ON INVENTORY.INV_ID = VENDOR_ORDER.INV_ID
ORDER BY VENDOR.VENDOR_NAME
This 'INNER JOIN' syntax is the modern version (often referred to as SQL-92). Having a comma-separated list after the FROM clause is 'old school'.
Both methods work the same way but the old school way causes ambiguities if you start using outer joins. So get into the habit of doing it the new way.
Lastly, to neaten things up you can use an alias, which means you give each table a shorter name and then use that. I've also added in the inventory ID (INV_ID) so you can get an idea of what's going on:
SELECT V.VENDOR_NAME, I.ITEM_NAME, ORD.INV_ID, ORD.QTY
FROM VENDOR As V
INNER JOIN
VENDOR_ORDER As ORD
ON V.VENDOR_ID = ORD.VENDOR_ID
INNER JOIN
INVENTORY As I ON I.INV_ID = ORD.INV_ID
ORDER BY V.VENDOR_NAME
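Here is a runnable check of the aliased three-table join, with the join column corrected to ORD.VENDOR_ID. It uses SQLite through Python's sqlite3 as a stand-in for SQL Server and assumes a minimal schema built only from the columns named in the question, with invented rows. Because the one vendor has three orders, its name appears three times.

```python
import sqlite3

# Minimal hypothetical schema using only the columns named in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE VENDOR (VENDOR_ID INTEGER, VENDOR_NAME TEXT);
    CREATE TABLE INVENTORY (INV_ID INTEGER, ITEM_NAME TEXT);
    CREATE TABLE VENDOR_ORDER (VENDOR_ID INTEGER, INV_ID INTEGER, QTY INTEGER);
    INSERT INTO VENDOR VALUES (7, 'Acme');
    INSERT INTO INVENTORY VALUES (1, 'Hammer'), (2, 'Saw');
    INSERT INTO VENDOR_ORDER VALUES (7, 1, 10), (7, 2, 5), (7, 1, 3);
""")

rows = conn.execute("""
    SELECT V.VENDOR_NAME, I.ITEM_NAME, ORD.INV_ID, ORD.QTY
    FROM VENDOR AS V
    INNER JOIN VENDOR_ORDER AS ORD ON V.VENDOR_ID = ORD.VENDOR_ID
    INNER JOIN INVENTORY AS I ON I.INV_ID = ORD.INV_ID
    ORDER BY V.VENDOR_NAME, ORD.QTY
""").fetchall()

for row in rows:
    print(row)
# ('Acme', 'Hammer', 1, 3)
# ('Acme', 'Saw', 2, 5)
# ('Acme', 'Hammer', 1, 10)
```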
Maybe it's because it's Friday but I can't seem to get this and it feels like it should be really really easy.
I have one query (which pulls data from multiple tables) that gives me the following result set:
Room Type | ImageID | Date
The next query (pulled from separate tables than above) result give me :
ImageID | Date | Tagged
I just want to compare the results to see which ImageIDs are common to both results, and which fall into each list separately.
I have tried inserting the results from each into temp tables and doing a join on ImageID, but SQL Server does NOT like that. Ideally I would like a solution that lets me do this without creating temp tables.
I researched using UNION, but because the two results don't have the same columns I avoided that route.
Thanks!
You can do this in a number of different ways; for instance, you can use either an inner join or INTERSECT, using the two sets as derived tables (SQL Server requires an alias on each derived table).
select ImageID from (your_first_query) q1
intersect
select ImageID from (your_second_query) q2
or
select query1.ImageID
from (your_first_query) query1
inner join (your_second_query) query2 on query1.ImageID = query2.ImageID
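Both forms can be tried with Python's built-in sqlite3 module as a stand-in for SQL Server. In this sketch the two "queries" are hypothetical single-table SELECTs, and the Date column is renamed taken to sidestep the keyword. The f-strings only paste in these fixed query texts and must never be used with user input.

```python
import sqlite3

# Hypothetical sources for the two result sets.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rooms (room_type TEXT, ImageID INTEGER, taken TEXT);
    CREATE TABLE tags (ImageID INTEGER, taken TEXT, tagged INTEGER);
    INSERT INTO rooms VALUES ('Suite', 1, '2020-01-01'), ('Double', 2, '2020-01-02'),
                             ('Single', 3, '2020-01-03');
    INSERT INTO tags VALUES (2, '2020-01-02', 1), (3, '2020-01-03', 0), (4, '2020-01-04', 1);
""")

first_query = "SELECT room_type, ImageID, taken FROM rooms"
second_query = "SELECT ImageID, taken, tagged FROM tags"

# Each query becomes an aliased derived table; INTERSECT keeps shared ImageIDs.
via_intersect = conn.execute(f"""
    SELECT ImageID FROM ({first_query}) q1
    INTERSECT
    SELECT ImageID FROM ({second_query}) q2
    ORDER BY ImageID
""").fetchall()

# The same result with an inner join between the derived tables.
via_join = conn.execute(f"""
    SELECT query1.ImageID
    FROM ({first_query}) query1
    INNER JOIN ({second_query}) query2 ON query1.ImageID = query2.ImageID
    ORDER BY query1.ImageID
""").fetchall()

print(via_intersect)  # [(2,), (3,)]
print(via_join)       # [(2,), (3,)]
```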
You don't explain why SQL Server does not like performing a join on ImageID; that shouldn't be a problem. As to your first question, you need to turn your two queries into subqueries and perform a full outer join on them:
Select * from
(Select [Room Type], ImageID, [Date] ...) T1 Full Outer Join
(Select ImageID, [Date], Tagged ...) T2 on T1.ImageID = T2.ImageID
Checking for NULLs on each side of the join gives you what you want: rows with both sides filled are common to both results, a NULL T2.ImageID means the ImageID appears only in the first result, and a NULL T1.ImageID means it appears only in the second.
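The NULL analysis can be pushed into SQL with a CASE expression. This sketch uses Python's sqlite3 as a stand-in and assumes toy tables t1/t2. Older SQLite builds lack FULL OUTER JOIN, so it emulates one as a LEFT JOIN plus the unmatched rows of the reverse LEFT JOIN. In SQL Server you would write FULL OUTER JOIN directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (ImageID INTEGER);  -- stands in for the first query
    CREATE TABLE t2 (ImageID INTEGER);  -- stands in for the second query
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (2), (3), (4);
""")

# Emulated FULL OUTER JOIN: all t1 rows (matched or not) plus the t2 rows
# with no partner in t1; the NULL side then tells us where each ID lives.
rows = conn.execute("""
    SELECT COALESCE(id1, id2) AS ImageID,
           CASE WHEN id1 IS NULL THEN 'second only'
                WHEN id2 IS NULL THEN 'first only'
                ELSE 'both' END AS found_in
    FROM (
        SELECT t1.ImageID AS id1, t2.ImageID AS id2
        FROM t1 LEFT JOIN t2 ON t1.ImageID = t2.ImageID
        UNION ALL
        SELECT t1.ImageID, t2.ImageID
        FROM t2 LEFT JOIN t1 ON t1.ImageID = t2.ImageID
        WHERE t1.ImageID IS NULL
    ) AS full_outer
    ORDER BY ImageID
""").fetchall()

print(rows)
# [(1, 'first only'), (2, 'both'), (3, 'both'), (4, 'second only')]
```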
SELECT TableA.ImageID
FROM TableA
WHERE TableA.ImageID
IN (SELECT TableB.ImageID FROM TableB)
select q1.ImageID
from (your_first_query) q1
WHERE EXISTS (select 1
from (your_second_query) q2
WHERE q2.ImageID = q1.ImageID)
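The IN and EXISTS answers are semi-joins: each TableA row is returned at most once, even if TableB contains duplicates, whereas an inner join repeats it. A sketch with Python's sqlite3 as a stand-in, assuming toy data where TableB holds ImageID 3 twice:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (ImageID INTEGER);
    CREATE TABLE TableB (ImageID INTEGER);
    INSERT INTO TableA VALUES (1), (2), (3);
    INSERT INTO TableB VALUES (2), (3), (3), (4);  -- 3 appears twice
""")

in_rows = conn.execute("""
    SELECT TableA.ImageID FROM TableA
    WHERE TableA.ImageID IN (SELECT TableB.ImageID FROM TableB)
    ORDER BY 1
""").fetchall()

exists_rows = conn.execute("""
    SELECT q1.ImageID FROM TableA q1
    WHERE EXISTS (SELECT 1 FROM TableB q2 WHERE q2.ImageID = q1.ImageID)
    ORDER BY 1
""").fetchall()

# For contrast: an inner join emits one row per matching pair.
join_rows = conn.execute("""
    SELECT a.ImageID FROM TableA a
    INNER JOIN TableB b ON b.ImageID = a.ImageID
    ORDER BY 1
""").fetchall()

print(in_rows)      # [(2,), (3,)]
print(exists_rows)  # [(2,), (3,)]
print(join_rows)    # [(2,), (3,), (3,)]
```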
I have the following DB Structure (simplified):
Payments
----------------------
Id | int
InvoiceId | int
Active | bit
Processed | bit
Invoices
----------------------
Id | int
CustomerOrderId | int
CustomerOrders
------------------------------------
Id | int
ApprovalDate | DateTime
ExternalStoreOrderNumber | nvarchar
Each Customer Order has an Invoice and each Invoice can have multiple Payments.
The ExternalStoreOrderNumber is a reference to the order from the external partner store we imported the order from and the ApprovalDate the timestamp when that import happened.
Now we have the problem that we had a wrong import and need to move some payments to other invoices (several hundred, too many to do by hand) according to the following logic:
Search the Invoice of the Order which has the same external number as the current one but starts with 0 instead of the current digit.
To do that I created the following query:
UPDATE DB.dbo.Payments
SET InvoiceId=
(SELECT TOP 1 I.Id FROM DB.dbo.Invoices AS I
WHERE I.CustomerOrderId=
(SELECT TOP 1 O.Id FROM DB.dbo.CustomerOrders AS O
WHERE O.ExternalOrderNumber='0'+SUBSTRING(
(SELECT TOP 1 OO.ExternalOrderNumber FROM DB.dbo.CustomerOrders AS OO
WHERE OO.Id=I.CustomerOrderId), 1, 10000)))
WHERE Id IN (
SELECT P.Id
FROM DB.dbo.Payments AS P
JOIN DB.dbo.Invoices AS I ON I.Id=P.InvoiceId
JOIN DB.dbo.CustomerOrders AS O ON O.Id=I.CustomerOrderId
WHERE P.Active=0 AND P.Processed=0 AND O.ApprovalDate='2012-07-19 00:00:00')
Now I started that query on a test system using the live data (~250,000 rows in each table), and it has now been running for 16 hours. Did I do something completely wrong in the query, or is there a way to speed it up a little?
It is not required to be really fast, as it is a one-time task, but several hours seems long to me. As I want to learn for the (hopefully never happening) next time, I would like some feedback on how to improve it.
You might as well kill the query. Your update subquery is completely uncorrelated to the table being updated. From the looks of it, when it completes, EVERY SINGLE dbo.Payments record matched by your WHERE clause will have the same value.
To break down your query, you might find that the subquery runs fine on its own.
SELECT TOP 1 I.Id FROM DB.dbo.Invoices AS I
WHERE I.CustomerOrderId=
(SELECT TOP 1 O.Id FROM DB.dbo.CustomerOrders AS O
WHERE O.ExternalOrderNumber='0'+SUBSTRING(
(SELECT TOP 1 OO.ExternalOrderNumber FROM DB.dbo.CustomerOrders AS OO
WHERE OO.Id=I.CustomerOrderId), 1, 10000))
That is always a BIG worry.
The next thing is that it is running this row-by-row for every record in the table.
You are also double-dipping into Payments: you update Payments WHERE Id IN (a join that involves Payments itself). You can reference the table being updated in the JOIN clause instead, using this pattern:
UPDATE P
....
FROM DB.dbo.Payments AS P
JOIN DB.dbo.Invoices AS I ON I.Id=P.InvoiceId
JOIN DB.dbo.CustomerOrders AS O ON O.Id=I.CustomerOrderId
WHERE P.Active=0 AND P.Processed=0 AND O.ApprovalDate='2012-07-19 00:00:00'
Moving on, another mistake is to use TOP without ORDER BY. That's asking for random results. If you know there's only one result, you wouldn't even need TOP. In this case, maybe you're ok with randomly choosing one from many possible matches. Since you have three levels of TOP(1) without ORDER BY, you might as well just mash them all up (join) and take a single TOP(1) across all of them. That would make it look like this
SET InvoiceId=
(SELECT TOP 1 I.Id
FROM DB.dbo.Invoices AS I
JOIN DB.dbo.CustomerOrders AS O
ON I.CustomerOrderId=O.Id
JOIN DB.dbo.CustomerOrders AS OO
ON O.ExternalOrderNumber='0'+SUBSTRING(OO.ExternalOrderNumber,1,100)
AND OO.Id=I.CustomerOrderId)
However, as I mentioned very early on, this is not being correlated to the main FROM clause at all. We move the entire search into the main query so that we can make use of JOIN-based set operations rather than row-by-row subqueries.
Before I show the final query (fully commented): I think your SUBSTRING is supposed to implement the logic "starts with 0 instead of the current digit". If I read that correctly, then for an order number '5678' you're looking for '0678', which means SUBSTRING should be using 2,10000 instead of 1,10000.
UPDATE P
SET InvoiceId=II.Id
FROM DB.dbo.Payments AS P
-- invoices for payments
JOIN DB.dbo.Invoices AS I ON I.Id=P.InvoiceId
-- orders for invoices
JOIN DB.dbo.CustomerOrders AS O ON O.Id=I.CustomerOrderId
-- another order with '0' as leading digit
JOIN DB.dbo.CustomerOrders AS OO
ON OO.ExternalOrderNumber='0'+substring(O.ExternalOrderNumber,2,1000)
-- invoices for this other order
JOIN DB.dbo.Invoices AS II ON OO.Id=II.CustomerOrderId
-- conditions for the Payments records
WHERE P.Active=0 AND P.Processed=0 AND O.ApprovalDate='2012-07-19 00:00:00'
It is worth noting that SQL Server allows UPDATE .. FROM .. JOIN, which is less widely supported by other DBMSs, e.g. Oracle. One reason is that a single row in Payments (the update target) could match many candidate II.Id values through all those joins, and you will get an arbitrary one of them.
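The same set-based, correlated logic can be tested on toy data with Python's sqlite3. This is a sketch under assumptions: the rows are invented, and ExternalOrderNumber is used as in the queries. Instead of T-SQL's UPDATE ... FROM (which only arrived in SQLite 3.33), it uses a correlated scalar subquery with ORDER BY ... LIMIT 1, so the choice among several matches is deterministic rather than arbitrary. It also checks the '0' || substr(x, 2, ...) arithmetic for '5678'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CustomerOrders (Id INTEGER PRIMARY KEY, ApprovalDate TEXT,
                                 ExternalOrderNumber TEXT);
    CREATE TABLE Invoices (Id INTEGER PRIMARY KEY, CustomerOrderId INTEGER);
    CREATE TABLE Payments (Id INTEGER PRIMARY KEY, InvoiceId INTEGER,
                           Active INTEGER, Processed INTEGER);
    -- Hypothetical data: order 1 was imported wrongly, order 2 is its '0' twin.
    INSERT INTO CustomerOrders VALUES
        (1, '2012-07-19 00:00:00', '5678'),
        (2, '2012-01-01 00:00:00', '0678'),
        (3, '2012-07-19 00:00:00', '9999');   -- no '0999' twin exists
    INSERT INTO Invoices VALUES (10, 1), (20, 2), (30, 3);
    INSERT INTO Payments VALUES
        (100, 10, 0, 0),   -- should move to invoice 20
        (101, 10, 1, 0),   -- Active = 1, left alone
        (102, 30, 0, 0);   -- no twin order, left alone
""")

# substr(x, 2, ...) replaces the first digit; substr(x, 1, ...) would prepend.
target_numbers = conn.execute(
    "SELECT '0' || substr('5678', 1, 10000), '0' || substr('5678', 2, 10000)"
).fetchone()

conn.execute("""
    UPDATE Payments
    SET InvoiceId = COALESCE((
            SELECT II.Id
            FROM Invoices AS I
            JOIN CustomerOrders AS O ON O.Id = I.CustomerOrderId
            JOIN CustomerOrders AS OO
              ON OO.ExternalOrderNumber = '0' || substr(O.ExternalOrderNumber, 2, 1000)
            JOIN Invoices AS II ON II.CustomerOrderId = OO.Id
            WHERE I.Id = Payments.InvoiceId   -- correlated to the row being updated
            ORDER BY II.Id                    -- deterministic pick if several match
            LIMIT 1
        ), InvoiceId)                         -- keep the old value if no twin exists
    WHERE Active = 0 AND Processed = 0
      AND InvoiceId IN (SELECT I.Id
                        FROM Invoices AS I
                        JOIN CustomerOrders AS O ON O.Id = I.CustomerOrderId
                        WHERE O.ApprovalDate = '2012-07-19 00:00:00')
""")

result = conn.execute("SELECT Id, InvoiceId FROM Payments ORDER BY Id").fetchall()
print(target_numbers)  # ('05678', '0678')
print(result)          # [(100, 20), (101, 10), (102, 30)]
```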
I think something like this will be more efficient, if I understood your query right. As I wrote it by hand and didn't run it, it may have some syntax errors.
UPDATE DB.dbo.Payments
set InvoiceId=(SELECT TOP 1 I.Id FROM DB.dbo.Invoices AS I
inner join DB.dbo.CustomerOrders AS O ON I.CustomerOrderId=O.Id
inner join DB.dbo.CustomerOrders AS OO On OO.Id=I.CustomerOrderId
and O.ExternalOrderNumber='0'+SUBSTRING(OO.ExternalOrderNumber, 1, 10000))
FROM DB.dbo.Payments
JOIN DB.dbo.Invoices AS I ON I.Id=Payments.InvoiceId
JOIN DB.dbo.CustomerOrders AS O ON O.Id=I.CustomerOrderId
-- filters moved after the joins so O is defined before it is used
WHERE Payments.Active=0
AND Payments.Processed=0
AND O.ApprovalDate='2012-07-19 00:00:00'
Try rewriting it using JOINs; this will highlight some of the problems. Would the following do just the same? (The queries are somewhat different, but I guess this is roughly what you're trying to do.)
UPDATE P
SET InvoiceId= I.Id
FROM DB.dbo.Payments AS P
CROSS JOIN DB.dbo.Invoices AS I
INNER JOIN DB.dbo.CustomerOrders AS O
ON I.CustomerOrderId = O.Id
INNER JOIN DB.dbo.CustomerOrders AS OO
ON O.ExternalOrderNumber = '0' + SUBSTRING(OO.ExternalOrderNumber, 1, 10000)
AND OO.Id = I.CustomerOrderId
WHERE P.Active=0 AND P.Processed=0 AND O.ApprovalDate='2012-07-19 00:00:00'
As you see, two problems stand out:
The unconditional join between Payments and Invoices. (Of course, you've cut this down with a TOP 1, but set-wise it's still unconditional.) I'm not really sure whether this is a problem in your query; it will be in mine, though :).
The join condition on a SUBSTRING(..., 1, 10000) expression. Because the column is wrapped in a function, an index on it can't be used for the comparison, which is highly inefficient.
If you need a one-time speedup, take the queries on each table, store the intermediate results in temporary tables, create indexes on those temporary tables, and use the temporary tables to perform the update.
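A sketch of that temp-table approach on toy data, again with Python's sqlite3 standing in for SQL Server (where these would be #temp tables). The names invoice_by_number and payment_target are hypothetical. The lookup key is stored as a plain column so it can be indexed. MIN() picks one target per payment deterministically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CustomerOrders (Id INTEGER PRIMARY KEY, ApprovalDate TEXT,
                                 ExternalOrderNumber TEXT);
    CREATE TABLE Invoices (Id INTEGER PRIMARY KEY, CustomerOrderId INTEGER);
    CREATE TABLE Payments (Id INTEGER PRIMARY KEY, InvoiceId INTEGER,
                           Active INTEGER, Processed INTEGER);
    INSERT INTO CustomerOrders VALUES
        (1, '2012-07-19 00:00:00', '5678'),
        (2, '2012-01-01 00:00:00', '0678'),
        (3, '2012-07-19 00:00:00', '9999');
    INSERT INTO Invoices VALUES (10, 1), (20, 2), (30, 3);
    INSERT INTO Payments VALUES (100, 10, 0, 0), (101, 10, 1, 0), (102, 30, 0, 0);

    -- Step 1: intermediate result keyed by order number, then indexed.
    CREATE TEMP TABLE invoice_by_number AS
        SELECT O.ExternalOrderNumber, I.Id AS InvoiceId
        FROM CustomerOrders AS O
        JOIN Invoices AS I ON I.CustomerOrderId = O.Id;
    CREATE INDEX ix_invoice_by_number ON invoice_by_number (ExternalOrderNumber);

    -- Step 2: one new invoice per payment that needs fixing.
    CREATE TEMP TABLE payment_target AS
        SELECT P.Id AS PaymentId, MIN(T.InvoiceId) AS NewInvoiceId
        FROM Payments AS P
        JOIN Invoices AS I ON I.Id = P.InvoiceId
        JOIN CustomerOrders AS O ON O.Id = I.CustomerOrderId
        JOIN invoice_by_number AS T
          ON T.ExternalOrderNumber = '0' || substr(O.ExternalOrderNumber, 2, 1000)
        WHERE P.Active = 0 AND P.Processed = 0
          AND O.ApprovalDate = '2012-07-19 00:00:00'
        GROUP BY P.Id;
    CREATE UNIQUE INDEX ix_payment_target ON payment_target (PaymentId);

    -- Step 3: the update only touches payments listed in payment_target.
    UPDATE Payments
    SET InvoiceId = (SELECT NewInvoiceId FROM payment_target
                     WHERE PaymentId = Payments.Id)
    WHERE Id IN (SELECT PaymentId FROM payment_target);
""")

result = conn.execute("SELECT Id, InvoiceId FROM Payments ORDER BY Id").fetchall()
print(result)  # [(100, 20), (101, 10), (102, 30)]
```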
I have a lookup table that has a Name and an ID in it. Example:
ID NAME
-----------------------------------------------------------
5499EFC9-925C-4856-A8DC-ACDBB9D0035E CANCELLED
D1E31B18-1A98-4E1A-90DA-E6A3684AD5B0 31PR
The first record indicates an order status. The second indicates a service type.
In a query from an orders table I do the following:
INNER JOIN lut ON order.id = lut.Statusid
This returns the 'cancelled' name from my lookup table. I also need the service type in the same row. It is connected in the order table by orders.serviceid. How would I go about doing this?
Cancelled doesn't connect to 31PR.
Orders connects to both. Orders has 2 fields in it called Servicetypeid and orderstatusid. That is how those 2 connect to the order. I need to return both names in the same order row.
I think many will tell you that having two different pieces of data in the same column violates first normal form. There is a reason why having one lookup table to rule them all is a bad idea. However, you can do something like the following:
Select ..
From [order]
Join lut
On lut.Id = [order].StatusId
Left Join lut As l2
On l2.id = [order].ServiceTypeId
If order.ServiceTypeId (or whatever the column is named) is not nullable, then you can use a Join (inner join) instead.
A lot of info was left out, but here goes:
SELECT orders.id, lut1.Name AS OrderStatus, lut2.Name AS ServiceType
FROM orders
INNER JOIN lut lut1 ON lut1.ID = orders.orderstatusid
INNER JOIN lut lut2 ON lut2.ID = orders.servicetypeid
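Joining the lookup table twice under two aliases can be checked with Python's sqlite3 as a stand-in for SQL Server. This sketch assumes the column names orderstatusid and servicetypeid from the question, reuses the two GUIDs from the example, and adds a hypothetical second order with no service type. The service-type join is a LEFT JOIN, as in the first answer, so that order still appears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lut (ID TEXT PRIMARY KEY, NAME TEXT);
    CREATE TABLE orders (id INTEGER, orderstatusid TEXT, servicetypeid TEXT);
    INSERT INTO lut VALUES
        ('5499EFC9-925C-4856-A8DC-ACDBB9D0035E', 'CANCELLED'),
        ('D1E31B18-1A98-4E1A-90DA-E6A3684AD5B0', '31PR');
    INSERT INTO orders VALUES
        (1, '5499EFC9-925C-4856-A8DC-ACDBB9D0035E', 'D1E31B18-1A98-4E1A-90DA-E6A3684AD5B0'),
        (2, '5499EFC9-925C-4856-A8DC-ACDBB9D0035E', NULL);
""")

# lut1 resolves the status, lut2 resolves the service type, in the same row.
rows = conn.execute("""
    SELECT o.id, lut1.NAME AS OrderStatus, lut2.NAME AS ServiceType
    FROM orders o
    INNER JOIN lut lut1 ON lut1.ID = o.orderstatusid
    LEFT JOIN lut lut2 ON lut2.ID = o.servicetypeid
    ORDER BY o.id
""").fetchall()

print(rows)  # [(1, 'CANCELLED', '31PR'), (2, 'CANCELLED', None)]
```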