Aggregate sum on two columns in the same table - sql-server

I am querying a data warehouse (so I cannot redesign tables), I'll do my best to mimic the scenario in a simple example.
We have 3 main tables for incident, change, and release. These 3 are connected through an intermediate table called intermediate. Here is their structure along with sample data:
Incident Table:
Change Table:
Release Table:
Intermediate Table:
The first 3 tables have exact same structure, but the intermediate table holds connection of these 3 tables pairwise. For example, if Rel1 is connected to Chg1, you have a row in the intermediate table as or . These two rows have no difference and may not co-exist.
QUERY:
I want ALL release records along with number of related incidents and number of related changes. Here is how I achieved this:
WITH SourceTable AS(
SELECT R.ReleaseItem, R.Prop1, R.Prop2 , I.RelOrInc2 as [RelatedIncident] , Null as [RelatedChanges] FROM Release R
LEFT JOIN [Intermediate] I
ON R.ReleaseItem = I.RelOrInc1
WHERE SUBSTRING(I.RelOrInc2,1,3) = 'Inc'
UNION
SELECT R.ReleaseItem, R.Prop1, R.Prop2 , I.RelOrInc1 , Null as [RelatedChanges] FROM Release R
LEFT JOIN [Intermediate] I
ON R.ReleaseItem = I.RelOrInc2
WHERE SUBSTRING(I.RelOrInc1,1,3) = 'Inc'
UNION
SELECT R.ReleaseItem, R.Prop1, R.Prop2 , Null as [RelatedIncident] , I.RelOrInc2 as [RelatedChanges] FROM Release R
LEFT JOIN [Intermediate] I
ON R.ReleaseItem = I.RelOrInc1
WHERE SUBSTRING(I.RelOrInc2,1,3) = 'Chg'
UNION
SELECT R.ReleaseItem, R.Prop1, R.Prop2 , Null as [RelatedIncident] , I.RelOrInc1 as [RelatedChanges] FROM Release R
LEFT JOIN [Intermediate] I
ON R.ReleaseItem = I.RelOrInc2
WHERE SUBSTRING(I.RelOrInc1,1,3) = 'Chg'
)
SELECT REL.* , COUNT(S.RelatedIncident) As [No Of Related Incidents] , COUNT(S.[RelatedChanges]) AS [No of Related Changes] FROM Release REL
LEFT JOIN SourceTable S
ON REL.ReleaseItem = S.ReleaseItem
GROUP BY REL.ReleaseItem, REL.Prop1, REL.Prop2
This query gives me my required result:
But I think my way of handling this query is so naive and not efficient. My data warehouse may contain around millions of records in intermediate table and my approach could be too slow.
Question:
Is there any better way to get such result with better performance?
BTW, I am using MS SQL Server 2012

SELECT
R.ReleaseItem, R.Prop1, R.Prop2,
[No Of Related Incidents] = (SELECT COUNT(*) FROM [Intermediate] i
WHERE (i.RelOrInc1 = r.ReleaseItem AND i.RelOrInc2 LIKE 'Inc%')
OR (i.RelOrInc2 = r.ReleaseItem AND i.RelOrInc1 LIKE 'Inc%')),
[No of Related Changes] = (SELECT COUNT(*) FROM [Intermediate] i
WHERE (i.RelOrInc1 = r.ReleaseItem AND i.RelOrInc2 LIKE 'Chg%')
OR (i.RelOrInc2 = r.ReleaseItem AND i.RelOrInc1 LIKE 'Chg%'))
FROM Release R

I think this will work for you. Can not say anything about performance. You should check.
select r.releaseitem,
r.prop1,
r.prop2,
sum(case when t.relorinc2 like 'inc%' then 1 else 0 end) as incidents,
sum(case when t.relorinc2 like 'chg%' then 1 else 0 end) as changes
from (select relorinc1, relorinc2 from intermediate where relorinc1 like 'rel%'
union all
select relorinc2, relorinc1 from intermediate where relorinc2 like 'rel%')t
join release r on t.relorinc1 = r.releaseitem
group by r.releaseitem, r.prop1, r.prop2

Related

SQL Join with 2 conditions same column but different data names

I need to join 2 tables using 2 columns as identifiers,
Reference and UAP
TABLE 1
Reference
UAP
Week 1
Week 2
Table 2
Reference
UAP
Stock
Here is the problem with data on both tables in UAP column
Table 1 Table 2
UAP1 M1
UAP2 M2
UAP3 M3
UAP4 M4
UAP5 M5
UAP6 M6
UAPP PROTOS
EXT EXTR
the UAPS are the same but name is just different
I have no control over the data i'm getting, it's a IBM DB2 remote server
I tried a local table to join but i want to avoid that 'cause of the performance impact (30+ seconds)
So far my query is this
SELECT
*
FROM OPENQUERY(MACPAC,
'SELECT
P.Referencia,
P.UAP,
P.W01,
P.W02,
S.Stock
FROM AUTO.D805DATPOR.Production AS P
INNER JOIN AUTO.D805DATPOR.Stock S
ON S.Reference = S.Reference
WHERE (P.Reference Not Like ''FS%'')
ORDER BY Reference')
well of course the ideal would be
SELECT
*
FROM OPENQUERY(MACPAC,
'SELECT
P.Referencia,
P.UAP,
P.W01,
P.W02,
S.Stock
FROM AUTO.D805DATPOR.Production AS P
INNER JOIN AUTO.D805DATPOR.Stock S
ON P.Reference = S.Reference AND P.UAP = S.UAP
WHERE (P.Reference Not Like ''FS%'')
ORDER BY Reference')
but it does not work cause different names... Is there a way to make this happen without a local join table that will slow my query by several seconds?
Here is the output of these queries individually
This is the production table
This is the stock table
the output should show me the Stock on the production table by Reference and UAP
EDIT
Sorry for confusion the remote server is IBM DB2. This was an error of my coworker that said it was coming from oracle--
Cheers for correct answer from #Sergey Menshov.
SELECT
Reference,
UAP,
W01,
W02
FROM OPENQUERY(MACPAC,
'SELECT
P.Reference,
P.UAP,
P.W01,
P.W02
FROM AUTO.D805DATPOR.Production AS P
LEFT JOIN AUTO.D805DATPOR.Stock S
ON P.Reference = S.Reference AND
S.UAP =
CASE P.UAP
WHEN ''UAP1'' THEN ''M1''
WHEN ''UAP2'' THEN ''M2''
WHEN ''UAP3'' THEN ''M3''
WHEN ''UAP4'' THEN ''M4''
WHEN ''UAP5'' THEN ''M5''
WHEN ''UAP6'' THEN ''M6''
WHEN ''UAPP'' THEN ''PROTOS''
WHEN ''EXT'' THEN ''EXTR''
END
WHERE (P.Reference Not Like ''FS%'')
ORDER BY Reference DESC')
another method which gives error in UAP1 not a column in table L that might work using dual table for DB2
SELECT
Reference,
UAP,
W01,
W02
FROM OPENQUERY(MACPAC,
'SELECT
P.Reference,
P.UAP,
P.W01,
P.W02
FROM AUTO.D805DATPOR.Production AS P
JOIN
(
SELECT ''UAP1'' AS UAP1, ''M1'' AS UAP2 FROM sysibm.sysdummy1
UNION ALL SELECT ''UAP2'',''M2'' FROM sysibm.sysdummy1
UNION ALL SELECT ''UAP3'',''M3'' FROM sysibm.sysdummy1
UNION ALL SELECT ''UAP4'',''M4'' FROM sysibm.sysdummy1
UNION ALL SELECT ''UAP5'',''M5'' FROM sysibm.sysdummy1
UNION ALL SELECT ''UAP6'',''M6'' FROM sysibm.sysdummy1
UNION ALL SELECT ''UAPP'',''PROTOS'' FROM sysibm.sysdummy1
UNION ALL SELECT ''EXT'',''EXTR'' FROM sysibm.sysdummy1
) L
ON P.UAP = L.UAP1
INNER JOIN AUTO.D805DATPOR.Stock S
ON P.Reference = S.Reference AND S.UAP = L.UAP2
WHERE (P.Reference Not Like ''FS%'')
ORDER BY Reference DESC')
Try to use a link-subquery:
FROM AUTO.D805DATPOR.Production AS P
JOIN
(
SELECT ''UAP1'' AS UAP1,''M1'' AS UAP2 FROM DUAL
UNION ALL SELECT ''UAP2'',''M2'' FROM DUAL
UNION ALL SELECT ''UAP3'',''M3'' FROM DUAL
UNION ALL SELECT ''UAP4'',''M4'' FROM DUAL
UNION ALL SELECT ''UAP5'',''M5'' FROM DUAL
UNION ALL SELECT ''UAP6'',''M6'' FROM DUAL
UNION ALL SELECT ''UAPP'',''PROTOS'' FROM DUAL
UNION ALL SELECT ''EXT'',''EXTR'' FROM DUAL
) L
ON P.UAP=L.UAP1 -- !!!
INNER JOIN AUTO.D805DATPOR.Stock S
ON P.Reference = S.Reference AND S.UAP=L.UAP2 -- !!!
Or you can create a link-table in the Oracle and then use it in your query.
I think it'll be better because you can use it in other queries and insert there a new combinations.
Pseudo code:
CREATE TABLE UAP_LINK(
UAP1 VARCHAR2(20) NOT NULL,
UAP2 VARCHAR2(20) NOT NULL,
PRIMARY KEY(UAP1),
UNIQUE(UAP2)
)
INSERT UAP_LINK VALUES
UAP1, M1
UAP2, M2
UAP3, M3
UAP4, M4
UAP5, M5
UAP6, M6
UAPP, PROTOS
EXT , EXTR
One more variant with CASE:
FROM AUTO.D805DATPOR.Production AS P
INNER JOIN AUTO.D805DATPOR.Stock S
ON P.Reference = S.Reference
AND S.UAP=
CASE P.UAP
WHEN ''UAP1'' THEN ''M1''
WHEN ''UAP2'' THEN ''M2''
WHEN ''UAP3'' THEN ''M3''
WHEN ''UAP4'' THEN ''M4''
WHEN ''UAP5'' THEN ''M5''
WHEN ''UAP6'' THEN ''M6''
WHEN ''UAPP'' THEN ''PROTOS''
WHEN ''EXT'' THEN ''EXTR''
END

Using the results of WITH clause IN where STATEMENT of main query

I am relatively new at SQL so I apologise if this is obvious but I cannot work out how to use the results of the WITH clause query in the where statement of my main query.
My with query pulls the first record for each customer and gives the sale date for that record:
WITH summary AS(
SELECT ed2.customer,ed2.saledate,
ROW_NUMBER()OVER(PARTITION BY ed2.customer
ORDER BY ed2.saledate)AS rk
FROM Filteredxportdocument ed2)
SELECT s.*
FROM summary s
WHERE s.rk=1
I need to use the date in the above query as the starting point and pull all records for each customer for their first 12 months i.e. where the sale date is between ed2.saledate AND ed2.saledate+12 months.
My main query is:
SELECT ed.totalamountincvat, ed.saledate, ed.name AS SaleRef,
ed.customer, ed.customername, comp.numberofemployees,
comp.companyuid
FROM exportdocument AS ed INNER JOIN
FilteredAccount AS comp ON ed.customer = comp.accountid
WHERE (ed.statecode = 0) AND
ed.saledate BETWEEN ed2.saledate AND DATEADD(M,12,ed2.saledate)
I am sure that I need to add the main query into the WITH clause but I cant work out where. Is anyone able to help please
Does this help?
;WITH summary AS(
SELECT ed2.customer,ed2.saledate,
ROW_NUMBER()OVER(PARTITION BY ed2.customer
ORDER BY ed2.saledate)AS rk
FROM Filteredxportdocument ed2)
SELECT ed.totalamountincvat, ed.saledate, ed.name AS SaleRef,
ed.customer, ed.customername, comp.numberofemployees,
comp.companyuid
FROM exportdocument AS ed INNER JOIN
FilteredAccount AS comp ON ed.customer = comp.accountid
OUTER APPLY (SELECT s.* FROM summary s WHERE s.rk=1) ed2
WHERE ed.statecode = 0 AND
ed.saledate BETWEEN ed2.saledate AND DATEADD(M,12,ed2.saledate)
and ed.Customer = ed2.Customer
Results of CTE are not cached or stored, so you can't reuse it.
EDIT:
Based upon your requirement that all the records from CTE should be in final result, this is a new query:
;WITH summary AS(
SELECT ed2.customer,ed2.saledate,
ROW_NUMBER()OVER(PARTITION BY ed2.customer
ORDER BY ed2.saledate)AS rk
FROM Filteredxportdocument ed2)
SELECT
ed.totalamountincvat,
ed.saledate,
ed.name AS SaleRef,
ed.customer,
ed.customername,
comp.numberofemployees,
comp.companyuid
FROM
summary ed2
left join exportdocument ed
on ed.Customer = ed2.Customer
and ed.statecode = 0
AND ed.saledate BETWEEN ed2.saledate AND DATEADD(M,12,ed2.saledate)
INNER JOIN FilteredAccount comp
ON ed.customer = comp.accountid
WHERE
s.rk=1
summary you will be able to use only once. Alternate solution is store summary into temp table and use that as many times as u want.
Something like : Select * into #temp from Summary s where s.rk=1

Join subquery with min

I'm pulling my hair out over a subquery that I'm using to avoid about 100 duplicates (out of about 40k records). The records that are duplicated are showing up because they have 2 dates in h2.datecreated for a valid reason, so I can't just scrub the data.
I'm trying to get only the earliest date to return. The first subquery (that starts with "select distinct address_id", with the MIN) works fine on it's own...no duplicates are returned. So it would seem that the left join (or just plain join...I've tried that too) couldn't possibly see the second h2.datecreated, since it doesn't even show up in the subquery. But when I run the whole query, it's returning 2 values for some ipc.mfgid's, one with the h2.datecreated that I want, and the other one that I don't want.
I know it's got to be something really simple, or something that just isn't possible. It really seems like it should work! This is MSSQL. Thanks!
select distinct ipc.mfgid as IPC, h2.datecreated,
case when ad.Address is null
then ad.buildingname end as Address, cast(trace.name as varchar)
+ '-' + cast(trace.Number as varchar) as ONT,
c.ACCOUNT_Id,
case when h.datecreated is not null then h.datecreated
else h2.datecreated end as Install
from equipmentjoin as ipc
left join historyjoin as h on ipc.id = h.EQUIPMENT_Id
and h.type like 'add'
left join circuitjoin as c on ipc.ADDRESS_Id = c.ADDRESS_Id
and c.GRADE_Code like '%hpna%'
join (select distinct address_id, equipment_id,
min(datecreated) as datecreated, comment
from history where comment like 'MAC: 5%' group by equipment_id, address_id, comment)
as h2 on c.address_id = h2.address_id
left join (select car.id, infport.name, carport.number, car.PCIRCUITGROUP_Id
from circuit as car (NOLOCK)
join port as carport (NOLOCK) on car.id = carport.CIRCUIT_Id
and carport.name like 'lead%'
and car.GRADE_Id = 29
join circuit as inf (NOLOCK) on car.CCIRCUITGROUP_Id = inf.PCIRCUITGROUP_Id
join port as infport (NOLOCK) on inf.id = infport.CIRCUIT_Id
and infport.name like '%olt%' )
as trace on c.ccircuitgroup_id = trace.pcircuitgroup_id
join addressjoin as ad (NOLOCK) on ipc.address_id = ad.id
The typical approach to only getting the lowest row is one of the following. You didn't bother to specify what version of SQL Server you're using, what you want to do with ties, and I have little interest to try to work this into your complex query, so I'll show you an abstract simplification for different versions.
SQL Server 2000
SELECT x.grouping_column, x.min_column, x.other_columns ...
FROM dbo.foo AS x
INNER JOIN
(
SELECT grouping_column, min_column = MIN(min_column)
FROM dbo.foo GROUP BY grouping_column
) AS y
ON x.grouping_column = y.grouping_column
AND x.min_column = y.min_column;
SQL Server 2005+
;WITH x AS
(
SELECT grouping_column, min_column, other_columns,
rn = ROW_NUMBER() OVER (ORDER BY min_column)
FROM dbo.foo
)
SELECT grouping_column, min_column, other_columns
FROM x
WHERE rn = 1;
This subqery:
select distinct address_id, equipment_id,
min(datecreated) as datecreated, comment
from history where comment like 'MAC: 5%' group by equipment_id, address_id, comment
Probably will return multiple rows because the comment is not guaranteed to be the same.
Try this instead:
CROSS APPLY (
SELECT TOP 1 H2.DateCreated, H2.Comment -- H2.Equipment_id wasn't used
FROM History H2
WHERE
H2.Comment LIKE 'MAC: 5%'
AND C.Address_ID = H2.Address_ID
ORDER BY DateCreated
) H2
Switch that to OUTER APPLY in case you want rows that don't have a matching desired history entry.

TSQL - Return recent date

Having issues getting a dataset to return with one date per client in the query.
Requirements:
Must have the recent date of transaction per client list for user
Will need have the capability to run through EXEC
Current Query:
SELECT
c.client_uno
, c.client_code
, c.client_name
, c.open_date
into #AttyClnt
from hbm_client c
join hbm_persnl p on c.resp_empl_uno = p.empl_uno
where p.login = #login
and c.status_code = 'C'
select
ba.payr_client_uno as client_uno
, max(ba.tran_date) as tran_date
from blt_bill_amt ba
left outer join #AttyClnt ac on ba.payr_client_uno = ac.client_uno
where ba.tran_type IN ('RA', 'CR')
group by ba.payr_client_uno
Currently, this query will produce at least 1 row per client with a date, the problem is that there are some clients that will have between 2 and 10 dates associated with them bloating the return table to about 30,000 row instead of an idealistic 246 rows or less.
When i try doing max(tran_uno) to get the most recent transaction number, i get the same result, some have 1 value and others have multiple values.
The bigger picture has 4 other queries being performed doing other parts, i have only included the parts that pertain to the question.
Edit (2011-10-14 # 1:45PM):
select
ba.payr_client_uno as client_uno
, max(ba.row_uno) as row_uno
into #Bills
from blt_bill_amt ba
inner join hbm_matter m on ba.matter_uno = m.matter_uno
inner join hbm_client c on m.client_uno = c.client_uno
inner join hbm_persnl p on c.resp_empl_uno = p.empl_uno
where p.login = #login
and c.status_code = 'C'
and ba.tran_type in ('CR', 'RA')
group by ba.payr_client_uno
order by ba.payr_client_uno
--Obtain list of Transaction Date and Amount for the Transaction
select
b.client_uno
, ba.tran_date
, ba.tc_total_amt
from blt_bill_amt ba
inner join #Bills b on ba.row_uno = b.row_uno
Not quite sure what was going on but seems the Temp Tables were not acting right at all. Ideally i would have 246 rows of data, but with the previous query syntax it would produce from 400-5000 rows of data, obviously duplications on data.
I think you can use ranking to achieve what you want:
WITH ranked AS (
SELECT
client_uno = ba.payr_client_uno,
ba.tran_date,
be.tc_total_amt,
rnk = ROW_NUMBER() OVER (
PARTITION BY ba.payr_client_uno
ORDER BY ba.tran_uno DESC
)
FROM blt_bill_amt ba
INNER JOIN hbm_matter m ON ba.matter_uno = m.matter_uno
INNER JOIN hbm_client c ON m.client_uno = c.client_uno
INNER JOIN hbm_persnl p ON c.resp_empl_uno = p.empl_uno
WHERE p.login = #login
AND c.status_code = 'C'
AND ba.tran_type IN ('CR', 'RA')
)
SELECT
client_uno,
tran_date,
tc_total_amt
FROM ranked
WHERE rnk = 1
ORDER BY client_uno
Useful reading:
Ranking Functions (Transact-SQL)
ROW_NUMBER (Transact-SQL)
WITH common_table_expression (Transact-SQL)
Using Common Table Expressions

SQL Subquery Select Table based on Outer Query

I have a general query that looks like this:
SELECT DISTINCT pb.id, pb.last, pb.first, pb.middle, pb.sex, pb.phone, pb.type,
specialties = substring(
SELECT ('|' + cs.specialty )
FROM CertSpecialty AS cs
INNER JOIN CertSpecialtyIndex AS csi on cs.specialty = csi.specialty
WHERE cs.id = pb.id
ORDER BY cs.sequence_no
FOR XML path(''),2,500)
FROM table AS pb
WHERE etc etc etc
The issue is this:
The "type" column that I'm selecting is an integer - types 1-4.
In the subquery, see where I am querying from the table CertSpecialty right now.
What I actually need to do is, if the type field comes back as a 1 or a 3, that's the table I need to query. But if the row's result is a type 2 or 4 (i.e., an ELSE), I need to be querying the same column in the table CertSpecialtyOther.
So it'd need to look something like this (though this obv doesn't work):
SELECT DISTINCT pb.id, pb.last, pb.first, pb.middle, pb.sex, pb.phone, pb.type,
specialties =
IF type in (1,3)
substring((SELECT ('|' + cs.specialty )
FROM CertSpecialty AS cs
INNER JOIN CertSpecialtyIndex AS csi on cs.specialty = csi.specialty
WHERE cs.id = pb.id
ORDER BY cs.sequence_no
FOR XML path(''),2,500)
ELSE
substring((SELECT ('|' + cs.specialty )
FROM CertSpecialtyOther AS cs
INNER JOIN CertSpecialtyIndex AS csi on cs.specialty = csi.specialty
WHERE cs.id = pb.id
ORDER BY cs.sequence_no
FOR XML path(''),2,500)
end
FROM table AS pb
WHERE etc etc etc
Is this possible? If so, what is the correct syntax? Is there a simpler way to write it where I'm switching which table I query without completely duplicating the subquery?
Also, does anyone have a good resource they could link me for this sort of thing to learn more besides?
Thanks in advance.
Use a CTE.
;WITH cs AS
(
SELECT 'A' SpecialtyCategory, phy_key, specialty
FROM CertSpecialty
UNION ALL
SELECT 'B' SpecialtyCategory, phy_key, specialty
FROM CertSpecialtyOther
)
SELECT csi.id, cs.specialty
FROM cs
INNER JOIN CertSpecialtyIndex AS csi on cs.specialty = csi.specialty
WHERE cs.phy_key = pb.phy_key
AND cs.SpecialtyCategory = (CASE WHEN type in (1,3) THEN 'A' ELSE 'B' END)

Resources