Why do I have duplicate records in my JOIN - sql-server

I am retrieving data from table ProductionReportMetrics where I have column NetRate_QuoteID. Then to that result set I need to get Description column.
And in order to get a Description column, I need to join 3 tables:
NetRate_Quote_Insur_Quote
NetRate_Quote_Insur_Quote_Locat
NetRate_Quote_Insur_Quote_Locat_Liabi
But after that my premium is completely off.
What am I doing wrong here?
SELECT QLL.Description,
QLL.ClassCode,
prm.NetRate_QuoteID,
QL.LocationID,
ISNULL(SUM(premium),0) AS NetWrittenPremium,
MONTH(prm.EffectiveDate) AS EffMonth
FROM ProductionReportMetrics prm
LEFT JOIN NetRate_Quote_Insur_Quote Q
ON prm.NetRate_QuoteID = Q.QuoteID
INNER JOIN NetRate_Quote_Insur_Quote_Locat QL
ON Q.QuoteID = QL.QuoteID
INNER JOIN NetRate_Quote_Insur_Quote_Locat_Liabi QLL
ON QL.LocationID = QLL.LocationID
WHERE YEAR(prm.EffectiveDate) = 2016 AND
CompanyLine = 'Ironshore Insurance Company'
GROUP BY MONTH(prm.EffectiveDate),
QLL.Description,
QLL.ClassCode,
prm.NetRate_QuoteID,
QL.LocationID
I think the problem in this table:
What Am I missing in this Query?
select
ClassCode,
QLL.Description,
sum(Premium)
from ProductionReportMetrics prm
LEFT JOIN NetRate_Quote_Insur_Quote Q ON prm.NetRate_QuoteID = Q.QuoteID
LEFT JOIN NetRate_Quote_Insur_Quote_Locat QL ON Q.QuoteID = QL.QuoteID
LEFT JOIN
(SELECT * FROM NetRate_Quote_Insur_Quote_Locat_Liabi nqI
JOIN ( SELECT LocationID, MAX(ClassCode)
FROM NetRate_Quote_Insur_Quote_Locat_Liabi GROUP BY LocationID ) nqA
ON nqA.LocationID = nqI.LocationID ) QLL ON QLL.LocationID = QL.LocationID
where Year(prm.EffectiveDate) = 2016 AND CompanyLine = 'Ironshore Insurance Company'
GROUP BY Q.QuoteID,QL.QuoteID,QL.LocationID
Now it says
Msg 8156, Level 16, State 1, Line 14
The column 'LocationID' was specified multiple times for 'QLL'.

It looks like DVT basically hit on the answer. The only reason you would get different amounts(i.e. duplicated rows) as a result of a join is that one of the joined tables is not a 1:1 relationship with the primary table.
I would suggest you do a quick check against those tables, looking for table counts.
--this should be your baseline count
SELECT COUNT(*)
FROM ProductionReportMetrics
GROUP BY MONTH(prm.EffectiveDate),
prm.NetRate_QuoteID
--this will be a check against the first joined table.
SELECT COUNT(*)
FROM NetRate_Quote_Insur_Quote Q
WHERE QuoteID IN
(SELECT NetRate_QuoteID
FROM ProductionReportMetrics
GROUP BY MONTH(prm.EffectiveDate),
prm.NetRate_QuoteID)
Basically you will want to do a similar check against each of your joined tables. If any of the joined tables are part of the grouping statement, make sure they are also in the grouping of the count check statement. Also make sure to alter the WHERE clause of the check count statement to use the join clause columns you were using.
Once you find a table that returns the incorrect number of rows, you will have your answer as to what table is causing the problem. Then you will just have to decide how to limit that table down to distinct rows(some type of aggregation).
This advice is really just to show you how to QA this particular query. Break it up into the smallest possible parts. In this case, we know that it is a join that is causing the problem, so take it one join at a time until you find the offender.

Related

sql server - How to Get all distinct value in group by column from two table and count from another table for each value

I have 3 tables in that 2 tables are master table and 3rd is transaction table. i need to get count from transaction table for each value in other two table without loosing rows in mater table
i need result like below
Table layout for understanding
This is the code i have tried,
select s.status_name, e.machine_group_name, qty = COALESCE(COUNT(e.id),0)
from tbl_status s
left outer JOIN tbl_transaction as e ON e.status_name = s.status_name
group by e.machine_group_name, s.status_name
This is solution i have figured:
select m.machine_group_name, s.status_name, qty = COUNT(e.id) from
tbl_machine_group as m
cross join tbl_status as s
left outer join tbl_transaction as e on e.status_name = s.status_name
and e.machine_group_name = m.machine_group_name
group by m.machine_group_name, s.status_name
order by machine_group_name
select
MC_Group_Name
,Status_Name
,count(1) as [Count of Transaction]
from
tbl_Transaction tbl_3
left join tbl_Machine_Group tbl_1
on tbl_3.MC_Group_Name = tbl_1.MC_Group_Name
left join tbl_Status tbl_2
on tbl_3.Status_Name = tbl_2.Status_Name
group by
MC_Group_Name
,Status_Name

Combine multiple left joins in 1 query

I have two queries that I would like to combine. One query is left joining columns in the same table, the other query is left joining columns from two different tables. Both queries have the same table, just unsure how to properly set up the query.
1st Query:
SELECT BIZ_GROUP,
ORDER_ID,
STATION,
A.TC_DATE,
WANT_DATE,
TIME_SLOT,
JOB_CODE,
[ADDRESS],
CITY,
A.TECH_ID,
A.PREMISE,
ISNULL(B.LAST_ARRIVED, A.LAST_ARRIVE) AS ARRIVED,
ORDER_CLOSED,
COMP_STATUS,
WORK_STATUS,
REMARKS,
CORRECTION
FROM MET_timecommit A
LEFT JOIN(SELECT premise,
TC_DATE,
TECH_ID,
MIN(last_arrive) AS LAST_ARRIVED
FROM MET_timecommit
WHERE PREMISE IS NOT NULL
GROUP BY premise,
TC_DATE,
TECH_ID) B ON B.TC_DATE = A.TC_DATE
AND B.PREMISE = A.PREMISE
2nd query:
SELECT *
FROM MET_timecommit
LEFT JOIN (SELECT ORDER_ID,
created,
host_creation,
went_to
FROM workload
WHERE went_to >= getdate()-365) C ON C.went_to=MET_timecommit.TC_DATE
AND C.order_id=MET_timecommit.order_id
Evidently I am not used to this forum. You all don't have to be so rude. TDP was able to help me out based on what I provided. All other comments were unnecessary.
This should bring back the rows for both tables B and C for each row of table A:
SELECT A.BIZ_GROUP,
A.ORDER_ID,
A.STATION,
A.TC_DATE,
A.WANT_DATE,
A.TIME_SLOT,
A.JOB_CODE,
A.[ADDRESS],
A.CITY,
A.TECH_ID,
A.PREMISE,
ISNULL(B.LAST_ARRIVED, A.LAST_ARRIVE) AS ARRIVED,
A.ORDER_CLOSED,
A.COMP_STATUS,
A.WORK_STATUS,
A.REMARKS,
A.CORRECTION,
C.*
FROM MET_timecommit A
LEFT JOIN(SELECT premise,
TC_DATE,
TECH_ID,
MIN(last_arrive) AS LAST_ARRIVED
FROM MET_timecommit
WHERE PREMISE IS NOT NULL
GROUP BY premise,
TC_DATE,
TECH_ID) B ON B.TC_DATE = A.TC_DATE
AND B.PREMISE = A.PREMISE
LEFT JOIN (SELECT ORDER_ID,
created,
host_creation,
went_to
FROM workload
WHERE went_to >= getdate()-365) C ON C.went_to=A.MET_timecommit.TC_DATE
AND C.order_id=A.MET_timecommit.order_id

RIGHT\LEFT Join does not provide null values without condition

I have two tables one is the lookup table and the other is the data table. The lookup table has columns named cycleid, cycle. The data table has SID, cycleid, cycle. Below is the structure of the tables.
If you check the data table, the SID may have all the cycles and may not have all the cycles. I want to output the SID completed as well as missed cycles.
I right joined the lookup table and retrieved the missing as well as completed cycles. Below is the query I used.
SELECT TOP 1000 [SID]
,s4.[CYCLE]
,s4.[CYCLEID]
FROM [dbo].[data] s3 RIGHT JOIN
[dbo].[lookup_data] s4 ON s3.CYCLEID = s4.CYCLEID
The query is not displaying me the missed values when I query for all the SID's. When I specifically query for a SID with the below query i am getting the correct result including the missed ones.
SELECT TOP 1000 [SID]
,s4.[CYCLE]
,s4.[CYCLEID]
FROM [dbo].[data] s3 RIGHT JOIN [dbo].[lookup_data] s4
ON s3.CYCLEID = s4.CYCLEID
AND s3.SID = 101002
ORDER BY [SID], s4.[CYCLEID]
As I am supplying this query into tableau I cannot provide the sid value in the query. I want to return all the sid's and from tableau I will be do the rest of the things.
The expected output that i need is as shown below.
I wrote a cross join query like below to acheive my expected output
SELECT DISTINCT
tab.CYCLEID
,tab.SID
,d.CYCLE
FROM ( SELECT d.SID
,d.[CYCLE]
,e.CYCLEID
FROM ( SELECT e.sid
,e.CYCLE
FROM [db_temp].[dbo].[Sheet3$] e
) d
CROSS JOIN [db_temp].[dbo].[Sheet4$] e
) tab
LEFT OUTER JOIN [db_temp].[dbo].[Sheet3$] d
ON d.CYCLEID = tab.CYCLEID
AND d.SID = tab.SID
ORDER BY tab.SID
,tab.CYCLEID;
However I am not able to use this query for more scenarios as my data set have nearly 20 to 40 columns and i am having issues when i use the above one.
Is there any way to do this in a simpler manner with only left or right join itself? I want the query to return all the missing values and the completed values for the all the SID's instead of supplying a single sid in the query.
You can create a master table first (combine all SID and CYCLE ID), then right join with the data table
;with ctxMaster as (
select distinct d.SID, l.CYCLE, l.CYCLEID
from lookup_data l
cross join data d
)
select d.SID, m.CYCLE, m.CYCLEID
from ctxMaster m
left join data d on m.SID = d.SID and m.CYCLEID = d.CYCLEID
order by m.SID, m.CYCLEID
Fiddle
Or if you don't want to use common table expression, subquery version:
select d.SID, m.CYCLE, m.CYCLEID
from (select distinct d.SID, l.CYCLE, l.CYCLEID
from lookup_data l
cross join data d) m
left join data d on m.SID = d.SID and m.CYCLEID = d.CYCLEID
order by m.SID, m.CYCLEID

LEFT JOIN gets heavy as the number of records in the second table increases

I am trying to run a SELECT query using LEFT JOIN. I get a COUNT on my second table ( the table on the right side of LEFT JOIN ). This process becomes slightly heavy as the number of records on the second table goes up. My first and second table have a one-to-many relationship. The second table's CampaignId column is a foreign key to the first table's Id. This is a simplified version of my query:
SELECT a.[Id]
,a.CampaignId
,a.[Inserted] AS 'Date'
,COUNT(b.Id) AS 'Received'
FROM [CampaignRun] AS a
LEFT JOIN [CampaignRecipient] AS b
ON a.Id = b.CampaignRunId
GROUP BY
a.[Id], a.CampaignId,a.[Inserted]
HAVING
a.CampaignId = 637
ORDER BY
a.[Inserted] DESC
The number 637 is an example for one the records only.
Is there a way to make this query run faster?
Use a sub-select to calculate Received:
SELECT a.[Id]
,a.CampaignId
,a.[Inserted] AS 'Date'
, (SELECT COUNT(*) FROM [CampaignRecipient] AS b
WHERE a.Id = b.CampaignRunId ) AS 'Received'
FROM [CampaignRun] AS a
WHERE a.CampaignId = 637
ORDER BY a.[Inserted] DESC
You have unneed HAVING clause here, which you can move to WHERE clause
SELECT a.[Id]
,a.CampaignId
,a.[Inserted] AS 'Date'
,COUNT(b.Id) AS 'Received'
FROM [CampaignRun] AS a
LEFT JOIN [CampaignRecipient] AS b
ON a.Id = b.CampaignRunId
WHERE a.CampaignId = 637
GROUP BY a.[Id], a.CampaignId,a.[Inserted]
ORDER BY a.[Inserted] DESC
Also ensure that you have index on foreign key in [CampaignRecipient] table on CampaignRunId column. It's considered a good practice.

Query Executing Problem

Using SQL 2005: “Taking too much time to execute”
I want to filter the date, the date should not display in holidays, and I am using three tables with Inner Join
When I run the below query, It taking too much time to execute, because I filter the cardeventdate with three table.
Query
SELECT
PERSONID, CardEventDate tmp_cardevent3
WHERE (CardEventDate NOT IN
(SELECT T_CARDEVENT.CARDEVENTDATE
FROM T_PERSON
INNER JOIN T_CARDEVENT ON T_PERSON.PERSONID = T_CARDEVENT.PERSONID
INNER JOIN DUAL_PRO_II_TAS.dbo.T_WORKINOUTTIME ON T_CARDEVENT.CARDEVENTDAY = DUAL_PRO_II_TAS.dbo.T_WORKINOUTTIME.DAYCODE
AND T_PERSON.TACODE = DUAL_PRO_II_TAS.dbo.T_WORKINOUTTIME.TACODE
WHERE (DUAL_PRO_II_TAS.dbo.T_WORKINOUTTIME.HOLIDAY = 'true')
)
)
ORDER BY PERSONID, CardEventDate DESC
For the above mentioned Query, there is any other way to do date filter.
Expecting alternative queries for my query?
I'm pretty sure that it's not the joined tables that is the problem, but rather the "not in" that makes it slow.
Try to use a join instead:
select m.PERSONID, m.CardEventDate
from T_PERSON p
inner join T_CARDEVENT c on p.PERSONID = c.PERSONID
inner join DUAL_PRO_II_TAS.dbo.T_WORKINOUTTIME w
on c.CARDEVENTDAY = w.DAYCODE
and p.TACODE = w.TACODE
and w.HOLIDAY = 'true'
right join tmp_cardevent3 m on m.CardEventDate = c.CardEventDate
where c.CardEventDate is null
order by m.PERSONID, m.CardEventDate desc
(There is a from clause missing from your query, so I don't know what table you are trying to get the data from.)
Edit:
Put tmp_cardevent3 in the correct place.
Have you created indices on all of the columns that you are using to do the joins? In particular, I'd consider indices on PERSONID in T_CARDEVENT, TACODE in both T_PERSON and T_WORKINOUTTIME, and HOLIDAY in T_WORKINOUTTIME.

Resources