T-SQL: Query to eliminate non-unique results - sql-server

I'm trying to select the create date for the most recent request # for each unique business unit. Because there are more unique create dates than there are business units, I get non-unique business units in my business unit column. I don't need all create dates, just the most recent one.
If you look at the CTE in v2, I want the most recent CreateDate for each value returned from the CTE.
Any assistance is appreciated.
Version 1:
SELECT
SQ.[Business Unit Impacted] [BU],
COUNT(RD.RequestID) [ReqCount],
(SELECT TOP 1 RD.CreateDate) [Create]
FROM
REP_RequestData RD
LEFT JOIN
REP_StandardQuestionResponses SQ ON SQ.RequestDataId = RD.Id
WHERE
RD.ProductID = 'Firewall.Change.Request'
GROUP BY
SQ.[Business Unit Impacted], RD.CreateDate
Version 2:
WITH D AS
(
SELECT
SQ.[Business Unit Impacted] [BU],
COUNT(RD.RequestID) [ReqCount]
FROM
REP_RequestData RD
LEFT JOIN
REP_StandardQuestionResponses SQ ON SQ.RequestDataId = RD.Id
WHERE
RD.ProductID = 'Firewall.Change.Request'
GROUP BY
SQ.[Business Unit Impacted]
)
SELECT
D.BU,
D.ReqCount,
(SELECT TOP 1 RD.CreateDate) [create]
FROM
D
LEFT JOIN
REP_StandardQuestionResponses SQ ON SQ.[Business Unit Impacted] = D.BU
LEFT JOIN
REP_REQUESTDATA RD on SQ.RequestDataId = RD.Id
WHERE
RD.ProductID = 'Firewall.Change.Request'
GROUP BY
D.BU, D.ReqCount, RD.CreateDate

If I understood correctly, you just need the latest date, so it's a matter of grouping only by BU and taking MAX(CreateDate):
SELECT
SQ.[Business Unit Impacted] [BU],
COUNT(RD.RequestID) [ReqCount],
MAX(RD.CreateDate) [Create]
FROM
REP_RequestData RD
LEFT JOIN
REP_StandardQuestionResponses SQ ON SQ.RequestDataId = RD.Id
WHERE
RD.ProductID = 'Firewall.Change.Request'
GROUP BY
SQ.[Business Unit Impacted]
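To see the effect of grouping only by the business unit, here is a minimal sketch against made-up data. The table variable @Requests and its rows are assumptions for illustration only; they stand in for the join of REP_RequestData and REP_StandardQuestionResponses.

```sql
-- Hypothetical sample data, not the real schema.
DECLARE @Requests TABLE (RequestID int, BU varchar(20), CreateDate date);

INSERT INTO @Requests (RequestID, BU, CreateDate) VALUES
    (1, 'Finance', '2016-01-05'),
    (2, 'Finance', '2016-03-10'),
    (3, 'Retail',  '2016-02-01');

-- CreateDate is aggregated instead of grouped, so each BU appears once.
SELECT BU,
       COUNT(RequestID) AS ReqCount,
       MAX(CreateDate)  AS LatestCreate
FROM @Requests
GROUP BY BU
ORDER BY BU;
-- Finance | 2 | 2016-03-10
-- Retail  | 1 | 2016-02-01
```

Adding CreateDate to the GROUP BY, as in Version 1, would split Finance into two rows again.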

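In Version 2, the subselect (select top 1 RD.CreateDate) needs its own FROM clause, a WHERE clause that limits it to the Business Unit returned on the same line, and an ORDER BY to get the latest CreateDate. The Business Unit is stored in REP_StandardQuestionResponses, so the subselect has to join to that table. It would look something like the code below:
(select top 1 RD2.CreateDate from REP_RequestData RD2 join REP_StandardQuestionResponses SQ2 on SQ2.RequestDataId = RD2.Id
where SQ2.[Business Unit Impacted] = D.BU and RD2.ProductID = 'Firewall.Change.Request' order by RD2.CreateDate desc)
That should return the latest CreateDate based on the Business Unit. With the subselect doing the lookup, the outer query no longer needs its own joins to SQ and RD or the GROUP BY.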

Related

Optimise query with count and order by function

I have a problem optimizing this query. I have 3 tables (Products = Catalogo.GTIN, Sales Header = TEDEF.Factura and Sales Detail = TEDEF.Farmacia).
The query tries to find the mode of the column VPRODEXENIGV_FAR. Without the ORDER BY, it executes in less than 3 seconds (the detail table has about 30 million rows).
But when I add the ORDER BY clause, the query takes more than 30 minutes to run.
I want to know how I can optimize this query, or which indexes I need to add.
SELECT *
FROM Catalogo.GTIN G
CROSS APPLY
(SELECT TOP 1
COUNT(FAR.VPRODEXENIGV_FAR) [ROW],
YEAR(FAC2.VFECEMI_FAC) [AÑO],
MONTH(FAC2.VFECEMI_FAC) [MES],
FAR.VCODPROD_FAR_003,
CASE WHEN FAR.VPRODEXENIGV_FAR = 'A' THEN 1 ELSE 0 END AfectoIGV
FROM
TEDEF.Factura FAC2
INNER JOIN
TEDEF.Farmacia FAR ON FAC2.VTDOCPAGO_FAC = FAR.VTDOCPAGO_FAC
AND FAC2.VNDOCPAGO_FAC = FAR.VNDOCPAGO_FAC
WHERE
G.CODIGO = FAR.VCODPROD_FAR_003
GROUP BY
YEAR(FAC2.VFECEMI_FAC),
MONTH(FAC2.VFECEMI_FAC),
FAR.VCODPROD_FAR_003,
FAR.VPRODEXENIGV_FAR
ORDER BY
1 DESC --- <----- THE PROBLEM IS HERE
) GG
Ouch! You have a hugely expensive dependent subquery. It's expensive because SELECT TOP(n) ... ORDER BY col DESC does a whole lot of work to create a result set only to discard all but one row. And it's a dependent subquery, so it's run for every row of Catalogo.GTIN.
It looks like you want to count the resultset rows in the most recent month and year for each Catalogo.GTIN row. So, let's try to refactor your query to do that.
We'll start with a subquery to grab the month-start date of the latest Factura row for each catalog entry.
SELECT CODIGO,
DATEFROMPARTS(YEAR(maxd), MONTH(maxd),1) maxmes
FROM (
SELECT MAX(FAC2.VFECEMI_FAC) maxd,
G.CODIGO
FROM Catalogo.GTIN G
JOIN TEDEF.Farmacia FAR
ON G.CODIGO = FAR.VCODPROD_FAR_003
JOIN TEDEF.Factura FAC2
ON FAC2.VTDOCPAGO_FAC = FAR.VTDOCPAGO_FAC
AND FAC2.VNDOCPAGO_FAC = FAR.VNDOCPAGO_FAC
GROUP BY G.CODIGO
) maxd
It's wise to test this and make sure it works correctly and performs tolerably well. If you test it in SSMS, you can use "Show Actual Execution Plan" and see if it recommends an extra index. This subquery need only be run once, rather than once per G.CODIGO row.
Then we'll use it in your larger query.
SELECT G.CODIGO, -- G.* is not allowed when grouping only by G.CODIGO
COUNT(FAR.VPRODEXENIGV_FAR) [ROW],
YEAR(maxmes.maxmes) [AÑO], -- every remaining row falls in this month
MONTH(maxmes.maxmes) [MES],
FAR.VCODPROD_FAR_003,
CASE WHEN FAR.VPRODEXENIGV_FAR = 'A' THEN 1 ELSE 0 END AfectoIGV
FROM Catalogo.GTIN G
JOIN (
SELECT CODIGO,
DATEFROMPARTS(YEAR(maxd), MONTH(maxd),1) maxmes
FROM (
SELECT MAX(FAC2.VFECEMI_FAC) maxd,
G.CODIGO
FROM Catalogo.GTIN G
JOIN TEDEF.Farmacia FAR
ON G.CODIGO = FAR.VCODPROD_FAR_003
JOIN TEDEF.Factura FAC2
ON FAC2.VTDOCPAGO_FAC = FAR.VTDOCPAGO_FAC
AND FAC2.VNDOCPAGO_FAC = FAR.VNDOCPAGO_FAC
GROUP BY G.CODIGO
) maxd
) maxmes ON G.CODIGO = maxmes.CODIGO
JOIN TEDEF.Farmacia FAR
ON G.CODIGO = FAR.VCODPROD_FAR_003
JOIN TEDEF.Factura FAC2
ON FAC2.VTDOCPAGO_FAC = FAR.VTDOCPAGO_FAC
AND FAC2.VNDOCPAGO_FAC = FAR.VNDOCPAGO_FAC
AND FAC2.VFECEMI_FAC >= maxmes.maxmes
GROUP BY maxmes.maxmes,
G.CODIGO,
FAR.VCODPROD_FAR_003,
FAR.VPRODEXENIGV_FAR
Here is the tricky bit:
DATEFROMPARTS(YEAR(maxd), MONTH(maxd),1) maxmes turns any date maxd into the first day of that month.
And, FAC2.VFECEMI_FAC >= maxmes.maxmes filters out rows before the first day of that month (for that CODIGO). It does so in a sargable way: a way that can exploit an index on FAC2.VFECEMI_FAC.
That is an alternative way to do TOP(1) ORDER BY d DESC. And faster.
It's all about sets of rows. Especially when using GROUP BY, it's performance-helpful to limit the number of rows in each set.
Obviously I cannot debug this.
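For anyone who wants to try the month-start trick in isolation, here is a small sketch. It assumes SQL Server 2012 or later (for DATEFROMPARTS), and the table variable @Sales with its rows is invented for the example; it is not the TEDEF schema.

```sql
-- Hypothetical sample data: one row per sale line.
DECLARE @Sales TABLE (Codigo varchar(10), FechaEmision date);

INSERT INTO @Sales (Codigo, FechaEmision) VALUES
    ('P1', '2016-02-15'),
    ('P1', '2016-03-02'),
    ('P1', '2016-03-28'),
    ('P2', '2016-01-09');

WITH latest AS (
    -- Latest sale date per product, computed once for the whole set.
    SELECT Codigo, MAX(FechaEmision) AS maxd
    FROM @Sales
    GROUP BY Codigo
),
mes AS (
    -- Floor that date to the first day of its month.
    SELECT Codigo, DATEFROMPARTS(YEAR(maxd), MONTH(maxd), 1) AS maxmes
    FROM latest
)
SELECT s.Codigo, m.maxmes, COUNT(*) AS RowsInMonth
FROM @Sales s
JOIN mes m
  ON m.Codigo = s.Codigo
 AND s.FechaEmision >= m.maxmes   -- plain range comparison on the column
GROUP BY s.Codigo, m.maxmes
ORDER BY s.Codigo;
-- P1 | 2016-03-01 | 2
-- P2 | 2016-01-01 | 1
```

The comparison leaves FechaEmision unwrapped by any function, which is what lets an index on the date column be used on a real table.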
It's me again. I finally solved the optimization problem: the query now takes about 20 seconds (with the sort and the count over a table of more than 30 million rows). I hope this helps others, or that the community can optimize it further.
I solved it by doing the sort with ROW_NUMBER instead; that way the server uses my index for the sort and works its magic:
WITH x
AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY GG.COD, GG.[AÑO], GG.[MES] ORDER BY GG.[ROW] DESC) [ID]
FROM Catalogo.GTIN G
CROSS APPLY
(
SELECT COUNT(FAR.VPRODEXENIGV_FAR) [ROW]
, YEAR(FAC2.VFECEMI_FAC) [AÑO]
, MONTH(FAC2.VFECEMI_FAC) [MES]
, FAR.VCODPROD_FAR_003 [COD]
, CASE WHEN FAR.VPRODEXENIGV_FAR = 'A' THEN 1 ELSE 0 END AfectoIGV
FROM TEDEF.Factura FAC2
INNER JOIN TEDEF.Farmacia FAR
ON FAC2.VTDOCPAGO_FAC = FAR.VTDOCPAGO_FAC
AND FAC2.VNDOCPAGO_FAC = FAR.VNDOCPAGO_FAC
WHERE G.CODIGO = FAR.VCODPROD_FAR_003
GROUP BY YEAR(FAC2.VFECEMI_FAC)
, MONTH(FAC2.VFECEMI_FAC)
, FAR.VCODPROD_FAR_003
, FAR.VPRODEXENIGV_FAR
-- ORDER BY 1 DESC --- <---- this is the bad guy, please, don't do that xD
) GG
) SELECT *
FROM x WHERE ID = 1
That way I can sort by the count and calculate the mode of the column FAR.VPRODEXENIGV_FAR.
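The same count-then-rank idea can be tried on a toy data set. This is only a sketch: @Detail and its rows are made up, and the year/month partitioning from the real query is left out to keep it short.

```sql
-- Hypothetical sample data: product code and the flag whose mode we want.
DECLARE @Detail TABLE (Cod varchar(10), Flag char(1));

INSERT INTO @Detail (Cod, Flag) VALUES
    ('P1', 'A'),
    ('P1', 'A'),
    ('P1', 'B'),
    ('P2', 'B');

WITH counts AS (
    -- How often each flag value occurs per product.
    SELECT Cod, Flag, COUNT(*) AS Cnt
    FROM @Detail
    GROUP BY Cod, Flag
),
ranked AS (
    -- Number the flag values per product, most frequent first.
    SELECT Cod, Flag, Cnt,
           ROW_NUMBER() OVER (PARTITION BY Cod ORDER BY Cnt DESC) AS rn
    FROM counts
)
SELECT Cod, Flag AS ModeFlag, Cnt
FROM ranked
WHERE rn = 1
ORDER BY Cod;
-- P1 | A | 2
-- P2 | B | 1
```

If two flag values tie for the highest count, ROW_NUMBER picks one of them arbitrarily; adding a second ORDER BY column would make the choice deterministic.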

IN clause not working within subquery inner join

I am trying to pull a list of the most recent lab values in 2015. All lab values are stored in one table, and I need to both limit the data to 2015 and limit it to certain types of labs, so that the max date doesn't give me the most recent lab regardless of type. Although I use the IN clause, labs of other types are included. I need the last value regardless of which type of lab it is, as long as it's one of the types identified in the IN clause (i.e. I don't need the last value of each type).
select distinct
t2.pat_id
,t2.pat_last_name "PatientLast"
,t2.pat_first_name "PatFirst"
,t2.birth_date
,t1.contact_date "ContactDate"
,t3.name "EncounterType"
,t4.ord_num_value "Numeric Value"
,t4.result_date
from table1 t1
inner join table2 t2 on t1.pat_id = t2.pat_id
inner join table3 t3 on t1.enc_type_c = t3.disp_enc_type_c
inner join table4 t4 on t1.pat_enc_csn_id = t4.pat_enc_csn_id
inner join
(
select
table1.pat_id
,max(table1.contact_date) as LastResult
,table4.component_id
from table1
inner join table4 on table1.pat_enc_csn_id = table4.pat_enc_csn_id
where table4.component_id in ('1526664','1558024','1004','2667', '1230000002','1564041')
and table1.contact_date between '2015-01-01' and '2015-12-31'
group by table1.pat_id, table4.component_id
) enc2 on table1.pat_id = enc2.pat_id
and table1.contact_date = enc2.LastResult
order by table2.pat_last_name, table2.pat_first_name
Your query is a bit hard to follow. But one method is to use row_number(). Something like this:
select t.*
from (select . . .,
row_number() over (partition by pat_id order by contact_date desc) as seqnum
from . . .
where . . .
) t
where seqnum = 1;
You have where conditions in the subquery that are not in the outer query, so it is hard to follow the intended logic. The use of row_number() is much simpler than a subquery, because you don't have to repeat any logic.
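As a rough, self-contained illustration of that pattern, the sketch below puts both filters (lab type and year) inside the subquery and keeps only the newest row per patient. The table variable @Labs, its columns and its rows are assumptions made up for the example, not the tables from the question.

```sql
-- Hypothetical sample data: one row per lab result.
DECLARE @Labs TABLE (
    pat_id       int,
    component_id varchar(10),
    contact_date date,
    result_value decimal(6,2)
);

INSERT INTO @Labs (pat_id, component_id, contact_date, result_value) VALUES
    (1, '1004', '2015-03-01', 5.10),
    (1, '2667', '2015-06-15', 6.20),
    (1, '9999', '2015-11-30', 9.90),   -- lab type not in the IN list
    (2, '1004', '2014-12-31', 4.00),   -- outside 2015
    (2, '1004', '2015-02-02', 4.40);

SELECT t.pat_id, t.component_id, t.contact_date, t.result_value
FROM (
    SELECT l.*,
           ROW_NUMBER() OVER (PARTITION BY l.pat_id
                              ORDER BY l.contact_date DESC) AS seqnum
    FROM @Labs l
    WHERE l.component_id IN ('1004', '2667')
      AND l.contact_date >= '2015-01-01'
      AND l.contact_date <  '2016-01-01'
) t
WHERE t.seqnum = 1
ORDER BY t.pat_id;
-- 1 | 2667 | 2015-06-15 | 6.20
-- 2 | 1004 | 2015-02-02 | 4.40
```

Because the row that survives already carries its own value and type, there is no second join back to the results table that could let other lab types slip in.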

Using the results of WITH clause IN where STATEMENT of main query

I am relatively new at SQL so I apologise if this is obvious but I cannot work out how to use the results of the WITH clause query in the where statement of my main query.
My with query pulls the first record for each customer and gives the sale date for that record:
WITH summary AS(
SELECT ed2.customer,ed2.saledate,
ROW_NUMBER()OVER(PARTITION BY ed2.customer
ORDER BY ed2.saledate)AS rk
FROM Filteredxportdocument ed2)
SELECT s.*
FROM summary s
WHERE s.rk=1
I need to use the date in the above query as the starting point and pull all records for each customer for their first 12 months i.e. where the sale date is between ed2.saledate AND ed2.saledate+12 months.
My main query is:
SELECT ed.totalamountincvat, ed.saledate, ed.name AS SaleRef,
ed.customer, ed.customername, comp.numberofemployees,
comp.companyuid
FROM exportdocument AS ed INNER JOIN
FilteredAccount AS comp ON ed.customer = comp.accountid
WHERE (ed.statecode = 0) AND
ed.saledate BETWEEN ed2.saledate AND DATEADD(M,12,ed2.saledate)
I am sure that I need to add the main query into the WITH clause but I can't work out where. Is anyone able to help, please?
Does this help?
;WITH summary AS(
SELECT ed2.customer,ed2.saledate,
ROW_NUMBER()OVER(PARTITION BY ed2.customer
ORDER BY ed2.saledate)AS rk
FROM Filteredxportdocument ed2)
SELECT ed.totalamountincvat, ed.saledate, ed.name AS SaleRef,
ed.customer, ed.customername, comp.numberofemployees,
comp.companyuid
FROM exportdocument AS ed INNER JOIN
FilteredAccount AS comp ON ed.customer = comp.accountid
OUTER APPLY (SELECT s.* FROM summary s WHERE s.rk=1) ed2
WHERE ed.statecode = 0 AND
ed.saledate BETWEEN ed2.saledate AND DATEADD(M,12,ed2.saledate)
and ed.Customer = ed2.Customer
The results of a CTE are not cached or stored. The CTE is only in scope for the single statement that follows it, so a separate, later query can't reuse it.
EDIT:
Based upon your requirement that all the records from CTE should be in final result, this is a new query:
;WITH summary AS(
SELECT ed2.customer,ed2.saledate,
ROW_NUMBER()OVER(PARTITION BY ed2.customer
ORDER BY ed2.saledate)AS rk
FROM Filteredxportdocument ed2)
SELECT
ed.totalamountincvat,
ed.saledate,
ed.name AS SaleRef,
ed.customer,
ed.customername,
comp.numberofemployees,
comp.companyuid
FROM
summary ed2
left join exportdocument ed
on ed.Customer = ed2.Customer
and ed.statecode = 0
AND ed.saledate BETWEEN ed2.saledate AND DATEADD(M,12,ed2.saledate)
LEFT JOIN FilteredAccount comp -- an INNER JOIN here would drop the CTE rows that have no match in ed
ON ed.customer = comp.accountid
WHERE
ed2.rk=1
You will be able to use summary only once, in the statement that immediately follows it. An alternative is to store summary in a temp table and use that as many times as you want.
Something like: SELECT * INTO #temp FROM summary s WHERE s.rk = 1
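Here is a minimal sketch of the temp-table route, under the assumption that the first-sale rows are needed by more than one later statement. @Sales, #FirstSale and the sample rows are invented for illustration and are not the tables from the question.

```sql
-- Hypothetical sample data: one row per sale.
DECLARE @Sales TABLE (customer int, saledate date, amount money);

INSERT INTO @Sales (customer, saledate, amount) VALUES
    (1, '2015-01-10', 100),
    (1, '2015-06-01', 150),
    (1, '2016-02-01', 200),   -- more than 12 months after the first sale
    (2, '2015-05-05',  50);

-- Materialise the first sale per customer; the CTE itself disappears
-- after this statement, the temp table does not.
WITH summary AS (
    SELECT customer, saledate,
           ROW_NUMBER() OVER (PARTITION BY customer ORDER BY saledate) AS rk
    FROM @Sales
)
SELECT customer, saledate AS firstsale
INTO #FirstSale
FROM summary
WHERE rk = 1;

-- Any later statement in the session can now reference #FirstSale.
SELECT s.customer, s.saledate, s.amount
FROM @Sales s
JOIN #FirstSale f
  ON f.customer = s.customer
 AND s.saledate >= f.firstsale
 AND s.saledate <  DATEADD(MONTH, 12, f.firstsale)
ORDER BY s.customer, s.saledate;
-- 1 | 2015-01-10 | 100.00
-- 1 | 2015-06-01 | 150.00
-- 2 | 2015-05-05 |  50.00

DROP TABLE #FirstSale;
```

The sketch uses a half-open date range rather than BETWEEN, so a sale exactly 12 months after the first one is excluded; switch to BETWEEN if that boundary day should count.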

Join the table valued function in the query

I have one table, vwuser. I want to join this table with the table-valued function fnuserrank(userID), so I need to CROSS APPLY the table-valued function:
SELECT *
FROM vwuser AS a
CROSS APPLY fnuserrank(a.userid)
For each userID it generates multiple records. I only want the last record for each empid that does not have a Rank of Term(inated). How can I do this?
Data:
HistoryID  empid  Rank  MonitorDate
1          A1     E1    2012-8-9
2          A1     E2    2012-9-12
3          A1     Term  2012-10-13
4          A2     E3    2011-10-09
5          A2     TERM  2012-11-9
From this data, the 2nd and 4th records must be selected.
In SQL Server 2005+ you can use this Common Table Expression (CTE) to determine the latest record by MonitorDate that doesn't have a Rank of 'Term':
WITH EmployeeData AS
(
SELECT *
, ROW_NUMBER() OVER (PARTITION BY empId ORDER BY MonitorDate DESC) AS RowNumber
FROM vwuser AS a
CROSS APPLY fnuserrank(a.userid)
WHERE Rank != 'Term'
)
SELECT *
FROM EmployeeData AS ed
WHERE ed.RowNumber = 1;
Note: The statement before this CTE will need to end in a semi-colon. Because of this, I have seen many people write them like ;WITH EmployeeData AS...
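The CTE approach can be checked against the sample data from the question. In this sketch the table variable @History is an assumption that stands in for the output of vwuser CROSS APPLY fnuserrank, and UPPER() is used only so the filter does not depend on the database collation being case-insensitive.

```sql
-- Sample rows copied from the question; @History is a stand-in for the
-- result of vwuser CROSS APPLY fnuserrank(userid).
DECLARE @History TABLE (
    HistoryID   int,
    empid       varchar(5),
    [Rank]      varchar(10),
    MonitorDate date
);

INSERT INTO @History (HistoryID, empid, [Rank], MonitorDate) VALUES
    (1, 'A1', 'E1',   '2012-08-09'),
    (2, 'A1', 'E2',   '2012-09-12'),
    (3, 'A1', 'Term', '2012-10-13'),
    (4, 'A2', 'E3',   '2011-10-09'),
    (5, 'A2', 'TERM', '2012-11-09');

WITH EmployeeData AS (
    -- Drop terminated rows first, then number what is left per employee,
    -- newest first.
    SELECT h.*,
           ROW_NUMBER() OVER (PARTITION BY h.empid
                              ORDER BY h.MonitorDate DESC) AS RowNumber
    FROM @History h
    WHERE UPPER(h.[Rank]) <> 'TERM'
)
SELECT HistoryID, empid, [Rank], MonitorDate
FROM EmployeeData
WHERE RowNumber = 1
ORDER BY HistoryID;
-- 2 | A1 | E2 | 2012-09-12
-- 4 | A2 | E3 | 2011-10-09
```

Those are the 2nd and 4th records the question asks for.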
You'll have to play with this. Having trouble mocking your schema on sqlfiddle.
Select foo.*
from
(
SELECT *
FROM vwuser AS a
CROSS APPLY fnuserrank(a.userid)
where rank != 'TERM'
) foo
left join
(
SELECT *
FROM vwuser AS b
CROSS APPLY fnuserrank(b.userid)
where rank != 'TERM'
) bar
on foo.empId = bar.empId
and foo.MonitorDate < bar.MonitorDate
where bar.empid is null
I always need to test out left outer joins on dates being higher. The way it works is you do a left outer join of the set against itself. Every row EXCEPT one per user has row(s) with a higher monitor date. That one row is the one you want. I usually use an example from my code, but I'm on the wrong laptop. To get it working, you can select foo.*, bar.* and look at the results, spot the row you want, and make the condition correct.
You could also do this, which is easier to remember
select foo.*
from
(
SELECT *
FROM vwuser AS a
CROSS APPLY fnuserrank(a.userid)
) foo
join
(
select empid, max(monitordate) maxdate
FROM vwuser AS b
CROSS APPLY fnuserrank(b.userid)
where rank != 'TERM'
group by empid
) bar
on foo.empid = bar.empid
and foo.monitordate = bar.maxdate
I usually prefer to use set based logic over aggregate functions, but whatever works. You can tweak it also by caching the results of your TVF join into a table variable.
EDIT:
http://www.sqlfiddle.com/#!3/613e4/17 - I mocked up your TVF here. Apparently sqlfiddle didn't like "go".
select foo.*, bar.*
from
(
SELECT f.*
FROM vwuser AS a
join fnuserrank f
on a.empid = f.empid
where rank != 'TERM'
) foo
left join
(
SELECT f1.empid [barempid], f1.monitordate [barmonitordate]
FROM vwuser AS b
join fnuserrank f1
on b.empid = f1.empid
where rank != 'TERM'
) bar
on foo.empId = bar.barempid
and foo.MonitorDate < bar.barmonitordate
where bar.barempid is null

How to Get Aggregate Result Without Changing Number of Results SQL

This is a query that I've been using:
select serial_number, DAQ.qtag_no, min(TOH.start_time), order_number, DAQ.creation_time, qtag_status, ar_code, pro_foundin, pro_category, root_cause, remark from UNIT U
left JOIN WORK_ORDER WO ON WO.order_key = U.order_key
left join TRACKED_OBJECT_HISTORY TOH on TOH.tobj_key = U.unit_key
WHERE U.creation_time > '01/01/2012' AND U.creation_time < '07/20/2012'
AND order_number NOT LIKE '[R]%'
group by serial_number, qtag_no, order_number, DAQ.creation_time, qtag_status, ar_code, pro_foundin, pro_category, root_cause, remark
order by serial_number
Right now I get 3280 results.
In this setup, there are different stations, such as "Assembly," "Diagnostics," etc. My goal for the min(TOH.start_time) column is to return the first start time at the Assembly station, but currently it's returning the first start time at ANY station. However, if I add another WHERE clause to specify the station (TOH.op_name = 'Assembly'), it limits the number of results (down to 2700). I'd like to keep the 3280 results and instead for units not scanned in to the Assembly station, return NULL for min(TOH.start_time) column. I tried using the case function, but that requires me to include TOH.op_name in the group by clause, which I'm not looking for. Thanks!
You should just be able to add your condition to the left join I think, like this:
select serial_number,
DAQ.qtag_no,
min(TOH.start_time),
order_number,
DAQ.creation_time,
qtag_status,
ar_code,
pro_foundin,
pro_category,
root_cause,
remark
from UNIT U
left join WORK_ORDER WO
ON WO.order_key = U.order_key
left join TRACKED_OBJECT_HISTORY TOH
on TOH.tobj_key = U.unit_key
and TOH.op_name = 'Assembly'
WHERE U.creation_time > '01/01/2012' AND U.creation_time < '07/20/2012'
AND order_number NOT LIKE '[R]%'
group by serial_number, qtag_no, order_number, DAQ.creation_time, qtag_status, ar_code, pro_foundin, pro_category, root_cause, remark
order by serial_number
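To show why moving the condition into the join keeps every unit, here is a small sketch with two hypothetical units, one of which was never scanned at Assembly. The table variables and rows are made up. It also shows a second option, a CASE expression inside MIN(), which needs no extra GROUP BY column because the CASE is evaluated inside the aggregate.

```sql
-- Hypothetical sample data, not the real UNIT / TRACKED_OBJECT_HISTORY tables.
DECLARE @Unit TABLE (unit_key int, serial_number varchar(10));
DECLARE @History TABLE (tobj_key int, op_name varchar(20), start_time datetime);

INSERT INTO @Unit (unit_key, serial_number) VALUES
    (1, 'SN1'),
    (2, 'SN2');

INSERT INTO @History (tobj_key, op_name, start_time) VALUES
    (1, 'Diagnostics', '2012-02-01T08:00:00'),
    (1, 'Assembly',    '2012-02-02T09:00:00'),
    (1, 'Assembly',    '2012-02-03T10:00:00'),
    (2, 'Diagnostics', '2012-03-01T08:00:00');   -- SN2 never reached Assembly

-- Option 1: the station filter lives in the ON clause, so units without
-- an Assembly row still come back, with NULL for the start time.
SELECT u.serial_number,
       MIN(h.start_time) AS first_assembly
FROM @Unit u
LEFT JOIN @History h
  ON h.tobj_key = u.unit_key
 AND h.op_name = 'Assembly'
GROUP BY u.serial_number
ORDER BY u.serial_number;
-- SN1 | 2012-02-02 09:00:00.000
-- SN2 | NULL

-- Option 2: join all history rows and filter inside the aggregate.
SELECT u.serial_number,
       MIN(CASE WHEN h.op_name = 'Assembly' THEN h.start_time END) AS first_assembly
FROM @Unit u
LEFT JOIN @History h
  ON h.tobj_key = u.unit_key
GROUP BY u.serial_number
ORDER BY u.serial_number;
-- SN1 | 2012-02-02 09:00:00.000
-- SN2 | NULL
```

Putting TOH.op_name = 'Assembly' in the WHERE clause instead would discard SN2, which is the drop from 3280 to 2700 rows described in the question.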
