Why Inner Join worked as Cross Join in SQL Server? - sql-server

I am trying to join several tables using INNER JOIN.
Here is code
IF OBJECT_ID('tempdb..#tmpRecData') IS NOT NULL
DROP TABLE #tmpRecData
--STEP 1
SELECT DISTINCT
pr.ChainID, pr.StoreID, pr.SupplierID, pr.ProductID,
MAX(CAST(pr.ActiveLastDate AS date)) AS 'Active Date'
--ChainID, SupplierID, StoreID, InvoiceDate, InvoiceNumber, SupplierInvoiceDate, SupplierInvoiceNumber
INTO
#tmpRecData
FROM
dbo.[ProductPrices_Retailer] AS pr
LEFT JOIN
ProductIdentifiers iden ON pr.ProductID = iden.ProductID
AND iden.ProductIdentifierTypeID = 2
WHERE
pr.ChainID = '119121'
AND pr.ActiveLastDate > '12/01/2016'
GROUP BY
pr.ProductID, pr.ProductName, iden.IdentifierValue,
pr.ChainID, pr.StoreID, pr.SupplierID
--STEP 2
SELECT
rec.ChainID, rec.StoreID, rec.SupplierInvoiceNumber,
rec.TransactionTypeID, rec.SupplierID, rec.SaleDateTime,
rec.ProductID, rec.UPC, rec.ProductDescriptionReported,
rec.RawProductIdentifier
FROM
#tmpRecData t
INNER JOIN
dbo.StoreTransactions AS rec WITH (NOLOCK) ON rec.ChainID = T.ChainID
WHERE
rec.ChainID = '119121'
DROP TABLE #tmpRecData
I am getting 4096 (Step1) * 145979 (Step2) = 725077693 rows (725 million)
This is a huge number of records, but I have used INNER JOIN, so why it worked as CROSS JOIN?

CROSS JOIN is very different to INNER JOIN.
INNER JOIN displays only the rows that have a match in both the joined tables..
CROSS JOIN produces a Cartesian product of the tables in the join. The number of rows of the result is the number of the rows in first table multiplied by the number of rows in the second table.

You need to join with store ID in step2 for this to work. It is running chainID for every store , hence too many number of records. If products also need to match, then you need to Join productID as well in step2
IF OBJECT_ID('tempdb..#tmpRecData') IS NOT NULL DROP TABLE #tmpRecData
--STEP 1
SELECT DISTINCT pr.ChainID,pr.StoreID,pr.SupplierID,pr.ProductID, MAX(CAST(pr.ActiveLastDate AS date)) AS 'Active Date'
--ChainID, SupplierID, StoreID, InvoiceDate, InvoiceNumber, SupplierInvoiceDate, SupplierInvoiceNumber
INTO #tmpRecData
FROM dbo.[ProductPrices_Retailer] AS pr
LEFT JOIN ProductIdentifiers iden
ON pr.ProductID=iden.ProductID
AND iden.ProductIdentifierTypeID=2
WHERE pr.ChainID='119121'
AND pr.ActiveLastDate>'12/01/2016'
GROUP BY pr.ProductID,pr.ProductName,iden.IdentifierValue,pr.ChainID,pr.StoreID,pr.SupplierID
--STEP 2
SELECT rec.ChainID,rec.StoreID,rec.SupplierInvoiceNumber,rec.TransactionTypeID,rec.SupplierID,rec.SaleDateTime,
rec.ProductID,rec.UPC,rec.ProductDescriptionReported,rec.RawProductIdentifier
FROM #tmpRecData t
INNER JOIN dbo.StoreTransactions AS rec WITH (NOLOCK)
ON rec.ChainID=T.ChainID and rec.StoreID = T.storeID
WHERE rec.ChainID='119121'
DROP TABLE #tmpRecData

Related

Multiple Nested Inner Joins: not all records are shown

I have difficulty joining two tables that look like the following:
The main table PMEOBJECT which has a unique key named OBJECTID and
has in total 12768 rows.
Then I want to join PMEOBJECTVALIDITY on it which has an n:1 relationship with PMEOBJECT, since it has more rows,
because it saves the changes over time of PMEOBJECT (i.e. when a certain object is not
valid anymore), this one has 12789 rows (meaning only 21 objects
changed over time). However, I only want to have the current last
VALIDFROM date shown in the query. This all works fine.
Then the trouble starts when I want to join PMEOBJECTDIMENSION, which has an
n:1 relationship with PMEOBJECTVALIDITY and has 36737 rows in total.
SELECT
PMEOBJECT.OBJECTID
,PMEOBJECTVALIDITY.VALIDFROM
,PMEOBJECTDIMENSION.DIMENSION2_
FROM PMEOBJECT
LEFT JOIN PMEOBJECTVALIDITY
ON PMEOBJECTVALIDITY.OBJECTID = PMEOBJECT.OBJECTID
AND PMEOBJECTVALIDITY.DATAAREAID = PMEOBJECT.DATAAREAID
INNER JOIN(
SELECT
OBJECTID,
MAX(VALIDFROM) AS NEWFROMDATE,
MAX(VALIDTO) AS NEWTODATE
FROM PMEOBJECTVALIDITY B
GROUP BY OBJECTID
) B
ON PMEOBJECTVALIDITY.OBJECTID = B.OBJECTID
AND PMEOBJECTVALIDITY.VALIDFROM = B.NEWFROMDATE
LEFT JOIN PMEOBJECTDIMENSION
ON PMEOBJECTDIMENSION.OBJECTVALIDITYID = PMEOBJECTVALIDITY.RECID
AND PMEOBJECTDIMENSION.DATAAREAID = PMEOBJECTVALIDITY.DATAAREAID
INNER JOIN(
SELECT
OBJECTVALIDITYID,
MAX(VALIDFROM) AS NEWFROMDATE_2
FROM PMEOBJECTDIMENSION C
GROUP BY OBJECTVALIDITYID
) C
ON PMEOBJECTDIMENSION.OBJECTVALIDITYID = C.OBJECTVALIDITYID
AND PMEOBJECTDIMENSION.VALIDFROM = C.NEWFROMDATE_2
Results in query per step:
SELECT PMEOBJECT: 12768 rows
LEFT JOIN PMEVALIDITY: 12789 rows
INNER JOIN PMEVALIDITY: 12768 rows
LEFT JOIN PMEOBJECTDIMENSION: 36737 rows
INNER JOIN PMEOBJECTDIMENSION: 12729 rows
I want the end result again to have the same 12768 rows, I don't want any ObjectId to be left out.
What am I missing here?
Kind regards,
Igor
Following might help:
from PMEOBJECTDIMENSION onwards:
LEFT JOIN (SELECT PMEOBJECTDIMENSION.OBJECTVALIDITYID, PMEOBJECTDIMENSION.DATAAREAID
FROM PMEOBJECTDIMENSION
INNER JOIN(SELECT OBJECTVALIDITYID, MAX(VALIDFROM) AS NEWFROMDATE_2
FROM PMEOBJECTDIMENSION C
GROUP BY OBJECTVALIDITYID
) C
ON PMEOBJECTDIMENSION.OBJECTVALIDITYID = C.OBJECTVALIDITYID
AND PMEOBJECTDIMENSION.VALIDFROM = C.NEWFROMDATE_2
)X
ON X.OBJECTVALIDITYID = PMEOBJECTVALIDITY.RECID
AND X.DATAAREAID = PMEOBJECTVALIDITY.DATAAREAID
and select the distinct records if duplicates present.
The INNER JOINs are filtering out records- what you want is that the LEFT JOIN table (PMEOBJECTVALIDITY and PMEOBJECTDIMENSION) should only include records that have at least a match on the INNER JOIN queries (alias B and C). You can accomplish this with by nesting the INNER JOIN with the LEFT JOIN, generally done as follows:
SELECT *
FROM A
LEFT JOIN B
INNER JOIN C
ON B.ID = C.BID
ON A.ID = B.AID
Now B is INNER JOINed on C and will only contain records that have a match in C, but will preserve the LEFT JOIN not remove any records from A.
In your case, you can simply move the ON clause from the LEFT JOIN to the end of the following INNER JOIN.
SELECT
PMEOBJECT.OBJECTID
,PMEOBJECTVALIDITY.VALIDFROM
,PMEOBJECTDIMENSION.DIMENSION2_
FROM PMEOBJECT
LEFT JOIN PMEOBJECTVALIDITY
INNER JOIN(
SELECT
OBJECTID,
MAX(VALIDFROM) AS NEWFROMDATE,
MAX(VALIDTO) AS NEWTODATE
FROM PMEOBJECTVALIDITY B
GROUP BY OBJECTID
) B
ON PMEOBJECTVALIDITY.OBJECTID = B.OBJECTID
AND PMEOBJECTVALIDITY.VALIDFROM = B.NEWFROMDATE
ON PMEOBJECTVALIDITY.OBJECTID = PMEOBJECT.OBJECTID
AND PMEOBJECTVALIDITY.DATAAREAID = PMEOBJECT.DATAAREAID --here it is!
LEFT JOIN PMEOBJECTDIMENSION
INNER JOIN(
SELECT
OBJECTVALIDITYID,
MAX(VALIDFROM) AS NEWFROMDATE_2
FROM PMEOBJECTDIMENSION C
GROUP BY OBJECTVALIDITYID
) C
ON PMEOBJECTDIMENSION.OBJECTVALIDITYID = C.OBJECTVALIDITYID
AND PMEOBJECTDIMENSION.VALIDFROM = C.NEWFROMDATE_2
ON PMEOBJECTDIMENSION.OBJECTVALIDITYID = PMEOBJECTVALIDITY.RECID
AND PMEOBJECTDIMENSION.DATAAREAID = PMEOBJECTVALIDITY.DATAAREAID --I'm here

SQL Server Conditional JOIN two tables

I have 3 Tables:
TableA(IDRem, IDPed, IDOP)
TableB(IDOP, IDPed)
TableC(IDPed, InvoiceDate)
I need a select to JOIN the records of TableA and TableC, but there are two possible conditions:
IF IDPed on TableA IS NOT NULL, then join directly to TableC by IDPed
ID IDPed on TableA IS NULL, then join to TableB by IDOp, and then join TableB to TableC by IDPed
So far i try this:
SELECT
TableA.*
,(CASE WHEN TableC.InvoiceDate IS NULL
THEN TableC2.InvoiceDate
ELSE TableC.InvoiceDate
END) AS InvoiceDate
FROM
TableA
LEFT JOIN TableC on TableA.IDPed = TableC.IDPed
LEFT JOIN TableB on TableB.IDOp = TableA.IDOp
INNER JOIN TableC as TableC2 on TableC2.IDPed = TableB.IDPed
The problem with this is that every field o tableA I want to include in the select I need to do a case...when to determine if the origin is tableA or TableA2.
Is there a better way of doing this? Thanks!
For complex joins you are better served in TSQL using cross apply:
When should I use Cross Apply over Inner Join?
select IDRem, IDPed, IDOP from TableA a
cross apply(
select IDOP, IDPed from TableB binner
where a.IDop = binner.IDop
) b
cross apply(
select IDPed, InvoiceDate cinner
where b.IDPed = cinner.IDPed
) c
where ...
Strictly psuedo-code but should give you a start.
You could try doing both JOINs and use UNION ALL
Edit: As pointed out in the comment by Vladimir Baranov, there's no need to check if a.IdPed IS NULL on the first SELECT since if it is, the JOIN would return no rows.
SELECT
a.*,
c.InvoiceDate
FROM TableA a
INNER JOIN TableC c ON c.IdPed = a.IdPed
UNION ALL
SELECT
a.*,
c.InvoiceDate
FROM TableA a
INNER JOIN TableB b ON b.IdOp = a.IdOP
INNER JOIN TableC c ON c.IdPed = b.IdPed
WHERE a.IdPed IS NULL

Applying outer layer that wrapping two select statement

I have two query which has successfully inner join
select t1.countResult, t2.sumResult from (
select
count(column) as countResult
from tableA join tableB
on tableA.id = tableB.id
group by name
)t1 inner join (
select
sum(column) as sumResult
from tableA
join tableB
on tableA.id = tableB.id
group by name
)t2
on t1.name= t2.name
The above query will return me the name and the corresponding number of count and the sum. I need to do a comparison between the count and sum. If the count doesnt match the sum, it will return 0, else 1. And so my idea was implementing another outer layer to wrap them up and use CASE WHEN. However, I've failed to apply an outer layer just to wrap them up? This is what I've tried:
select * from(
select t1.countResult, t2.sumResult from (
select
count(column) as countResult
from tableA join tableB
on tableA.id = tableB.id
group by name
)t1 inner join (
select
sum(column) as sumResult
from tableA
join tableB
on tableA.id = tableB.id
group by name
)t2
on t1.name= t2.name
)
Alright the problem can be solved by simply assigning a name to the outer layer.
select * from(
select t1.countResult, t2.sumResult from (
select
count(column) as countResult
from tableA join tableB
on tableA.id = tableB.id
group by name
)t1 inner join (
select
sum(column) as sumResult
from tableA
join tableB
on tableA.id = tableB.id
group by name
)t2
on t1.name= t2.name
) as whatever //SQL Server need a name to wrap
Hope it will help any newbie like me
Ok, so far you have selected everything your first select has generated (kinda useless, but a start for what you want ;) )
SELECT CASE
WHEN countresult=sumresult THEN 'Equal'
ELSE 'Not'
END
FROM ( --your join select --
)
I don't have any sample data to test this so can just go on your code.
Your queries for t1 & t2 look identical - why don't you just do a sum & count in 1 step?
SELECT COUNT(column) AS countResult
,SUM(column) AS sumResult
FROM tableA INNER JOIN tableB
ON tableA.id = tableB.id
GROUP BY name
Also, as you mention you are a newb - read up on Common Table Expressions in SQL Server.
Before SQL 2005 you had to write these convoluted queries within queries within...
Get into the habit of using CTEs now.

SQL UNION INNER JOIN

Im trying to select from 2 tables that have the same columns, but both tables have an inner join -
select e.ID,
c.FullName,
car.VIN,
car.Registration,
e.Telephone,
e.Mobile,
e.Email,
e.EstimateTotal,
e.LastUpdated,
e.LastUpdateBy from (select id from Estimates UNION ALL select id from PrivateEstimates) e
inner join Customers c on c.ID = e.CustomerID
inner join Cars car on car.ID = e.CarID
where e.Status = 0
The trouble is, it can't find e.CustomerID, e.CarID or e.Status on the inner join? Any ideas?
Your subquery
select id from Estimates
union all
select id from PrivateEstimates
returns only a single id column. Include necessary columns in the subquery if you want to use those columns in JOIN statements

Select rows from table1 which don't have rows in table2

I have 2 tables, OrderTable & OrderDetailTable.
I am trying to select rows from OrderTable that don't have any rows in OrderDetailTable so we can Delete them.
I assume you have an id relation between the 2 tables:
select * from OrderTable
where orderdetails_id not in (select id from OrderDetailTable)
and to delete them
delete from OrderTable
where orderdetails_id not in (select id from OrderDetailTable)
SELECT o.*
FROM OrderTable o
LEFT JOIN OrderDetailTable od ON od.idOrderTable = o.id
WHERE od.id IS NULL;
od.id can be any field from OrderDetailTable that can't be null.
Suppose that OrderTable has a column id, and OrderDetailTable has a column orderTable_id
select * from OrderTable
WHERE id not in (
select ot.id from OrderTable ot inner join OrderDetailTable odt on odt.orderTable_id = ot.id
)

Resources