Duplicates when using inner join in t-sql - sql-server

I know I am missing something. My issue is that I have two temp tables with identical values except for a filter, and I am trying to join them in a stored procedure, but I am getting duplicate values.
Below is the sample code:
SELECT DISTINCT
B.SUBSCRIBER_TAX_ID, B.MEMBER_FIRST_NAME, B.MEMBER_LAST_NAME,
B.BENEFIT_PLAN_NAME AS MEDICAL_PLAN, B.MEMBER_EFF_DATE AS MED_EFF_DATE, B.MEMBER_TERMINATION_DATE AS MED_END_DATE,
P.BENEFIT_PLAN_NAME AS PHARM_PLAN_NAME, P.MEMBER_EFF_DATE AS PHARM_EFF_DATE, P.MEMBER_TERMINATION_DATE AS PHARM_ENDdATE
FROM #BH_MED B
INNER JOIN #BH_PHARM P ON B.MEMBER_HCC_ID = P.MEMBER_HCC_ID
order by b.BENEFIT_PLAN_NAME,P.BENEFIT_PLAN_NAME
I want the results to show only the distinct values abc and def in column 3 and column 6.

Use GROUP BY on every selected column:
SELECT DISTINCT
B.SUBSCRIBER_TAX_ID, B.MEMBER_FIRST_NAME, B.MEMBER_LAST_NAME,
B.BENEFIT_PLAN_NAME AS MEDICAL_PLAN, B.MEMBER_EFF_DATE AS MED_EFF_DATE, B.MEMBER_TERMINATION_DATE AS MED_END_DATE,
P.BENEFIT_PLAN_NAME AS PHARM_PLAN_NAME, P.MEMBER_EFF_DATE AS PHARM_EFF_DATE, P.MEMBER_TERMINATION_DATE AS PHARM_ENDdATE
FROM #BH_MED B
INNER JOIN #BH_PHARM P ON B.MEMBER_HCC_ID = P.MEMBER_HCC_ID
GROUP BY B.SUBSCRIBER_TAX_ID, B.MEMBER_FIRST_NAME, B.MEMBER_LAST_NAME,B.BENEFIT_PLAN_NAME,B.MEMBER_EFF_DATE,B.MEMBER_TERMINATION_DATE,P.BENEFIT_PLAN_NAME,P.MEMBER_EFF_DATE,P.MEMBER_TERMINATION_DATE
order by b.BENEFIT_PLAN_NAME,P.BENEFIT_PLAN_NAME
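As a rough check on the query above: grouping by every selected column returns the same rows as SELECT DISTINCT, so neither removes rows that differ in at least one column. The sketch below uses Python's sqlite3 module (SQLite, not SQL Server) with made-up tables med and pharm standing in for #BH_MED and #BH_PHARM; the table names, columns and data are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE med   (member_id INTEGER, plan TEXT);
    CREATE TABLE pharm (member_id INTEGER, plan TEXT);
    -- Hypothetical data: one member with two medical and two pharmacy rows.
    INSERT INTO med   VALUES (1, 'abc'), (1, 'def');
    INSERT INTO pharm VALUES (1, 'abc'), (1, 'def');
""")

# DISTINCT over the joined columns.
distinct_rows = conn.execute("""
    SELECT DISTINCT m.plan, p.plan
    FROM med m
    INNER JOIN pharm p ON m.member_id = p.member_id
    ORDER BY m.plan, p.plan
""").fetchall()

# GROUP BY over the same columns, no DISTINCT.
grouped_rows = conn.execute("""
    SELECT m.plan, p.plan
    FROM med m
    INNER JOIN pharm p ON m.member_id = p.member_id
    GROUP BY m.plan, p.plan
    ORDER BY m.plan, p.plan
""").fetchall()

# Joining only on the member id pairs every medical row with every
# pharmacy row of that member, and each pair is already a distinct row.
print(distinct_rows)
print(grouped_rows)
```

If every plan combination still shows up, the join condition itself is matching more rows than intended, and that is where to look rather than at DISTINCT or GROUP BY.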

Related

Decorrelate a query using semi join

I am new to query optimization and can't fully understand how to use a semi-join when implementing decorrelation.
Consider the query:
SELECT A, B
FROM r
WHERE r.B < SOME (
SELECT B
FROM s
WHERE s.A = r.A
)
Show how to decorrelate the above query using the multi-set version of the semi-join operation.
You may write your query using an inner join as follows:
SELECT DISTINCT r.A, r.B
FROM r
INNER JOIN s
ON r.A = s.A
WHERE r.B < s.B;
The DISTINCT clause is necessary in this version of your query, because a given record in the r table could potentially join to more than one match in the s table. Your original version cannot introduce such extra rows, because the SOME predicate takes a set of records and always returns a single yes/no answer for each row of r. Note that DISTINCT also collapses rows that were already duplicated in r, which the original query would keep.
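To see the difference between the three forms, here is a small sketch using Python's sqlite3 module. SQLite does not support `< SOME`, so the semi-join is written with EXISTS, which is the usual SQL spelling of a semi-join; the sample data is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE r (A INTEGER, B INTEGER);
    CREATE TABLE s (A INTEGER, B INTEGER);
    -- r holds a duplicate row on purpose; s holds two matches for A = 1.
    INSERT INTO r VALUES (1, 5), (1, 5), (2, 9);
    INSERT INTO s VALUES (1, 7), (1, 8), (2, 3);
""")

# Semi-join: each r row is kept once if at least one matching s row exists.
semi = conn.execute("""
    SELECT r.A, r.B
    FROM r
    WHERE EXISTS (SELECT 1 FROM s WHERE s.A = r.A AND r.B < s.B)
""").fetchall()

# Plain inner join: every r row is repeated once per matching s row.
plain = conn.execute("""
    SELECT r.A, r.B
    FROM r
    INNER JOIN s ON r.A = s.A
    WHERE r.B < s.B
""").fetchall()

# Inner join with DISTINCT: duplicates from s and from r are both removed.
dedup = conn.execute("""
    SELECT DISTINCT r.A, r.B
    FROM r
    INNER JOIN s ON r.A = s.A
    WHERE r.B < s.B
""").fetchall()

print(len(semi), len(plain), len(dedup))
```

With this data the semi-join keeps both copies of the duplicated r row, the plain join returns four rows, and the DISTINCT join returns one. That is why the multi-set semi-join, and not the DISTINCT join, matches the original query when r itself contains duplicates.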

RIGHT\LEFT Join does not provide null values without condition

I have two tables one is the lookup table and the other is the data table. The lookup table has columns named cycleid, cycle. The data table has SID, cycleid, cycle. Below is the structure of the tables.
If you check the data table, a SID may or may not have all the cycles. I want to output the completed as well as the missed cycles for each SID.
I right joined the lookup table and retrieved the missing as well as completed cycles. Below is the query I used.
SELECT TOP 1000 [SID]
,s4.[CYCLE]
,s4.[CYCLEID]
FROM [dbo].[data] s3 RIGHT JOIN
[dbo].[lookup_data] s4 ON s3.CYCLEID = s4.CYCLEID
The query does not show the missed values when I query for all the SIDs. When I query for one specific SID with the query below, I get the correct result, including the missed ones.
SELECT TOP 1000 [SID]
,s4.[CYCLE]
,s4.[CYCLEID]
FROM [dbo].[data] s3 RIGHT JOIN [dbo].[lookup_data] s4
ON s3.CYCLEID = s4.CYCLEID
AND s3.SID = 101002
ORDER BY [SID], s4.[CYCLEID]
As I am supplying this query to Tableau, I cannot provide the SID value in the query. I want to return all the SIDs and do the rest in Tableau.
The expected output that I need is as shown below.
I wrote a cross join query like the one below to achieve my expected output:
SELECT DISTINCT
tab.CYCLEID
,tab.SID
,d.CYCLE
FROM ( SELECT d.SID
,d.[CYCLE]
,e.CYCLEID
FROM ( SELECT e.sid
,e.CYCLE
FROM [db_temp].[dbo].[Sheet3$] e
) d
CROSS JOIN [db_temp].[dbo].[Sheet4$] e
) tab
LEFT OUTER JOIN [db_temp].[dbo].[Sheet3$] d
ON d.CYCLEID = tab.CYCLEID
AND d.SID = tab.SID
ORDER BY tab.SID
,tab.CYCLEID;
However, I cannot use this query for more scenarios, as my data set has nearly 20 to 40 columns and I run into issues when I use the query above.
Is there any way to do this more simply, with only a left or right join? I want the query to return the missing and the completed values for all the SIDs instead of supplying a single SID in the query.
You can create a master table first (combining every SID with every CYCLEID), then left join the data table to it:
;with ctxMaster as (
select distinct d.SID, l.CYCLE, l.CYCLEID
from lookup_data l
cross join data d
)
select d.SID, m.CYCLE, m.CYCLEID
from ctxMaster m
left join data d on m.SID = d.SID and m.CYCLEID = d.CYCLEID
order by m.SID, m.CYCLEID
Or, if you don't want to use a common table expression, the subquery version:
select d.SID, m.CYCLE, m.CYCLEID
from (select distinct d.SID, l.CYCLE, l.CYCLEID
from lookup_data l
cross join data d) m
left join data d on m.SID = d.SID and m.CYCLEID = d.CYCLEID
order by m.SID, m.CYCLEID
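A minimal runnable sketch of the same idea, using Python's sqlite3 module with invented sample rows. The status column is an addition made here for illustration (it is not part of the original answer); it labels each SID/cycle pair depending on whether the left join found a matching data row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lookup_data (CYCLEID INTEGER, CYCLE TEXT);
    CREATE TABLE data (SID INTEGER, CYCLEID INTEGER, CYCLE TEXT);
    -- Hypothetical data: SID 101 skipped cycle 2, SID 102 only did cycle 2.
    INSERT INTO lookup_data VALUES (1, 'C1'), (2, 'C2'), (3, 'C3');
    INSERT INTO data VALUES (101, 1, 'C1'), (101, 3, 'C3'), (102, 2, 'C2');
""")

rows = conn.execute("""
    WITH master AS (
        -- Every SID paired with every cycle from the lookup table.
        SELECT DISTINCT d.SID, l.CYCLEID, l.CYCLE
        FROM lookup_data l
        CROSS JOIN data d
    )
    SELECT m.SID, m.CYCLEID, m.CYCLE,
           CASE WHEN d.SID IS NULL THEN 'missed' ELSE 'completed' END AS status
    FROM master m
    LEFT JOIN data d ON m.SID = d.SID AND m.CYCLEID = d.CYCLEID
    ORDER BY m.SID, m.CYCLEID
""").fetchall()

for row in rows:
    print(row)
```

Selecting m.SID rather than d.SID keeps the SID visible on the missed rows, where d.SID is NULL. Extra columns from the data table can be added to the final SELECT from d without touching the master set.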

Why do I have duplicate records in my JOIN

I am retrieving data from table ProductionReportMetrics where I have column NetRate_QuoteID. Then to that result set I need to get Description column.
And in order to get a Description column, I need to join 3 tables:
NetRate_Quote_Insur_Quote
NetRate_Quote_Insur_Quote_Locat
NetRate_Quote_Insur_Quote_Locat_Liabi
But after that my premium is completely off.
What am I doing wrong here?
SELECT QLL.Description,
QLL.ClassCode,
prm.NetRate_QuoteID,
QL.LocationID,
ISNULL(SUM(premium),0) AS NetWrittenPremium,
MONTH(prm.EffectiveDate) AS EffMonth
FROM ProductionReportMetrics prm
LEFT JOIN NetRate_Quote_Insur_Quote Q
ON prm.NetRate_QuoteID = Q.QuoteID
INNER JOIN NetRate_Quote_Insur_Quote_Locat QL
ON Q.QuoteID = QL.QuoteID
INNER JOIN NetRate_Quote_Insur_Quote_Locat_Liabi QLL
ON QL.LocationID = QLL.LocationID
WHERE YEAR(prm.EffectiveDate) = 2016 AND
CompanyLine = 'Ironshore Insurance Company'
GROUP BY MONTH(prm.EffectiveDate),
QLL.Description,
QLL.ClassCode,
prm.NetRate_QuoteID,
QL.LocationID
I think the problem is in this table:
What Am I missing in this Query?
select
ClassCode,
QLL.Description,
sum(Premium)
from ProductionReportMetrics prm
LEFT JOIN NetRate_Quote_Insur_Quote Q ON prm.NetRate_QuoteID = Q.QuoteID
LEFT JOIN NetRate_Quote_Insur_Quote_Locat QL ON Q.QuoteID = QL.QuoteID
LEFT JOIN
(SELECT * FROM NetRate_Quote_Insur_Quote_Locat_Liabi nqI
JOIN ( SELECT LocationID, MAX(ClassCode)
FROM NetRate_Quote_Insur_Quote_Locat_Liabi GROUP BY LocationID ) nqA
ON nqA.LocationID = nqI.LocationID ) QLL ON QLL.LocationID = QL.LocationID
where Year(prm.EffectiveDate) = 2016 AND CompanyLine = 'Ironshore Insurance Company'
GROUP BY Q.QuoteID,QL.QuoteID,QL.LocationID
Now it says
Msg 8156, Level 16, State 1, Line 14
The column 'LocationID' was specified multiple times for 'QLL'.
It looks like DVT basically hit on the answer. The only reason you would get different amounts (i.e. duplicated rows) as a result of a join is that one of the joined tables does not have a 1:1 relationship with the primary table.
I would suggest you do a quick check against those tables, looking for table counts.
--this should be your baseline count
--(the prm alias is declared so the GROUP BY references resolve)
SELECT COUNT(*)
FROM ProductionReportMetrics prm
GROUP BY MONTH(prm.EffectiveDate),
prm.NetRate_QuoteID
--this will be a check against the first joined table.
SELECT COUNT(*)
FROM NetRate_Quote_Insur_Quote Q
WHERE QuoteID IN
(SELECT prm.NetRate_QuoteID
FROM ProductionReportMetrics prm
GROUP BY MONTH(prm.EffectiveDate),
prm.NetRate_QuoteID)
Basically you will want to do a similar check against each of your joined tables. If any of the joined tables are part of the grouping statement, make sure they are also in the grouping of the count check statement. Also make sure to alter the WHERE clause of the check count statement to use the join clause columns you were using.
Once you find a table that returns more rows than the baseline, you have your answer as to which table is causing the problem. Then you just have to decide how to limit that table down to distinct rows (some type of aggregation).
This advice is really just to show you how to QA this particular query. Break it up into the smallest possible parts. In this case, we know that it is a join that is causing the problem, so take it one join at a time until you find the offender.
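One way to run that kind of check is to count rows per join key in each joined table and look for keys that occur more than once. The sketch below uses Python's sqlite3 module with two invented tables, metrics and locations, standing in for ProductionReportMetrics and NetRate_Quote_Insur_Quote_Locat; all names and values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE metrics (quote_id INTEGER, premium REAL);
    CREATE TABLE locations (quote_id INTEGER, location_id INTEGER);
    INSERT INTO metrics VALUES (1, 100.0), (2, 50.0);
    -- Quote 1 has two locations, so joining on quote_id repeats its premium.
    INSERT INTO locations VALUES (1, 10), (1, 11), (2, 20);
""")

# Premium total before any join.
baseline = conn.execute("SELECT SUM(premium) FROM metrics").fetchone()[0]

# Premium total after the join: rows of quote 1 are counted twice.
joined = conn.execute("""
    SELECT SUM(m.premium)
    FROM metrics m
    INNER JOIN locations l ON m.quote_id = l.quote_id
""").fetchone()[0]

# Join keys that appear more than once in the joined table.
fan_out = conn.execute("""
    SELECT quote_id, COUNT(*) AS rows_per_key
    FROM locations
    GROUP BY quote_id
    HAVING COUNT(*) > 1
""").fetchall()

print(baseline, joined, fan_out)
```

Any key returned by the last query is a place where the join multiplies rows, and therefore where SUM(premium) gets inflated.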

MS SQL Table Joins - Multiple Tables

I am new to MS SQL and am having trouble joining 4 tables within a query.
I am trying to join Orders, Order Lines, Client, and Picked tables to create a query to show quantity ordered and picked for a client. If I comment out the last inner join for Picked, I get the correct results. When I include the inner join for Picked the query returns results but data that should be in the Picked fields is NULL. One order line can have 1 or more Picked lines.
SELECT W_Warehouse, OH.OrderID, OH.RequiredDate, C.Client, OL.LineNbr, OL.QtyOrd, P.QtyPick
FROM Order
INNER JOIN Warehouse on Order.OH_WHS = Warehouse.W_PK
INNER JOIN Client on Order.O_Client = Client.C_PK
INNER JOIN OrderLine on Order.O_PK = OrderLine.OL_PK
INNER JOIN Picked on OrderLine.O_PK = Picked.P_PK
WHERE C.CLIENT = 'WENDYS'
Without knowing the data in the tables it is difficult to answer precisely.
But since you say one order line can have one or more rows in the Picked table, you probably want to aggregate with GROUP BY and SUM().
Maybe this is what you're looking for:
SELECT
W.W_Warehouse,
OH.OrderID,
OH.RequiredDate,
C.Client,
OL.LineNbr,
OL.QtyOrd,
P.QtyPick
FROM
[Order] OH -- Order is a reserved word in T-SQL, so the table name needs brackets
INNER JOIN Warehouse W on OH.OH_WHS = W.W_PK
INNER JOIN Client C on OH.O_Client = C.C_PK
INNER JOIN OrderLine OL on OH.O_PK = OL.OL_PK
CROSS APPLY (
select sum(QtyPick) as QtyPick
from Picked P
where OL.O_PK = P.P_PK
) P
WHERE
C.CLIENT = 'WENDYS'
It calculates the sum of QtyPick separately so it doesn't increase the number of lines in the result.
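The same effect can be sketched outside SQL Server. SQLite has no CROSS APPLY, so the version below swaps in a different technique: it aggregates the picked rows in a derived table first and then LEFT JOINs that result. A LEFT JOIN is used because an aggregate without GROUP BY inside CROSS APPLY always returns one row, so unpicked lines are kept with a NULL sum. Table names, column names and data are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_line (line_id INTEGER, qty_ord INTEGER);
    CREATE TABLE picked (line_id INTEGER, qty_pick INTEGER);
    INSERT INTO order_line VALUES (1, 10), (2, 4);
    -- Line 1 was picked in two batches; line 2 has not been picked yet.
    INSERT INTO picked VALUES (1, 6), (1, 4);
""")

rows = conn.execute("""
    SELECT ol.line_id, ol.qty_ord, p.qty_pick
    FROM order_line ol
    LEFT JOIN (
        -- One row per order line, so the outer join cannot multiply rows.
        SELECT line_id, SUM(qty_pick) AS qty_pick
        FROM picked
        GROUP BY line_id
    ) p ON p.line_id = ol.line_id
    ORDER BY ol.line_id
""").fetchall()

print(rows)
```

Each order line appears exactly once, with the picked quantity summed, and the unpicked line shows None (NULL) instead of disappearing.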

Why does this query work only when I use group by?

This query works:
select p.Nombre as Nombre, c.Nombre as Categoria, s.Nombre as Subcategoria FROM Producto as p
inner join Subcategoria as s ON p.IDSubcategoria = s.ID
inner join Categoria as c on s.IDCategoria = c.ID
group by p.Nombre, c.Nombre, s.Nombre
order by p.Nombre
But when I remove the s.Nombre on the group by statement, I get this error:
Msg 8120, Level 16, State 1, Line 1
Column 'Subcategoria.Nombre' is
invalid in the select list because it
is not contained in either an
aggregate function or the GROUP BY
clause.
Can someone explain to me a little bit what the group by function does and why it allows the query to work?
In the interest of learning! Thanks.
When you state group by p.Nombre, you are specifying that there should be exactly 1 row of output for each distinct p.Nombre. Hence, other fields in the select clause must be aggregated (so that if there are multiple rows with the same p.Nombre, they can be 'collapsed' into one value)
By grouping on p.Nombre, c.Nombre, s.Nombre, you are saying that there should be exactly 1 row of output for each distinct tuple. Hence, it works (because the fields displayed are involved in the grouping clause).
If you use a GROUP BY clause, the SELECT list can contain only:
the fields that you already use in the GROUP BY clause
aggregates (min, max, count, ...) on other fields
A small example:
MyTable
FieldA FieldB
a      1
a      2
b      3
b      5
Query:
select FieldA, FieldB from MyTable group by FieldA
FieldA FieldB
a      ?
b      ?
Which value do you want in FieldB for the group a?
a -> 1, a -> 2, or a -> 3 (1 + 2)?
For the first you need the min(FieldB) aggregate function, for the second max(FieldB), and for the third sum(FieldB).
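The MyTable example can be tried directly. The sketch below loads the same four rows with Python's sqlite3 module and asks for all three aggregates at once. One caveat: SQLite, unlike SQL Server, accepts a bare non-grouped column in the select list and picks a value from some row in the group, so the sketch only shows the aggregated forms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTable (FieldA TEXT, FieldB INTEGER);
    INSERT INTO MyTable VALUES ('a', 1), ('a', 2), ('b', 3), ('b', 5);
""")

# One output row per distinct FieldA; FieldB is collapsed three different ways.
rows = conn.execute("""
    SELECT FieldA, MIN(FieldB), MAX(FieldB), SUM(FieldB)
    FROM MyTable
    GROUP BY FieldA
    ORDER BY FieldA
""").fetchall()

print(rows)
```

Group a collapses to 1, 2 and 3 for min, max and sum, and group b to 3, 5 and 8.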
The GROUP BY clause collapses the rows that have the same values in the columns it lists into a single row. For any other column in your SELECT that is not listed in the GROUP BY clause, the SQL engine needs to be told how to reduce its values to one, by way of an aggregate function such as SUM, MAX or AVG. If you don't specify an aggregate function, the engine raises an error because it doesn't know which value to return.
E.g.
select p.Nombre as Nombre, c.Nombre as Categoria, MAX(s.Nombre) as Subcategoria FROM Producto as p
inner join Subcategoria as s ON p.IDSubcategoria = s.ID
inner join Categoria as c on s.IDCategoria = c.ID
group by p.Nombre, c.Nombre
order by p.Nombre
A GROUP BY clause is only required if you combine aggregate functions like COUNT or MAX with non-aggregated columns in the select list. As a side effect it removes duplicate rows. In your case it is simpler to remove duplicates by adding DISTINCT to the select clause and dropping the GROUP BY clause altogether.
select DISTINCT p.Nombre as Nombre, c.Nombre as Categoria, s.Nombre as Subcategoria FROM Producto as p
inner join Subcategoria as s ON p.IDSubcategoria = s.ID
inner join Categoria as c on s.IDCategoria = c.ID
order by p.Nombre

Resources