We have many complex queries which involve a lot of columns and joins (see the example below) that are implemented as views.
In some cases these queries return duplicate rows which then have to be programmatically removed by the consuming app. Therefore, we would like to enhance the SQL query to eliminate the duplicates, and speed up the retrieval process.
I know that I can use OVER / PARTITION BY logic to do this, but I am not sure of how to modify the queries to obtain a working syntax.
Here is an example:
SELECT
Main.MfgOrder.OrderNumber,
Main.MfgOrder.DesignBOMID,
Main.Design_Plant.PlantID,
Main.MfgOrder_Operation.OrderOpID,
Main.MfgOrder_Operation.DesignOpID,
Main.MfgOrder_Operation.OpSeq,
Main.MfgOrder_Operation.Description,
Main.MfgOrder_Operation.CompletionStatus,
Main.MfgOrder__Shift.OrderShiftID,
Main.MfgOrder__Shift.WorkCenterMachineID,
Main.MfgOrder___Event.OrderEventID,
Main.MfgOrder____Reel.OrderReelID,
Main.MfgOrder____Reel.ReelNumber,
Main.MfgOrder____Reel.Location,
Main.MfgOrder____Reel.Test_Status AS Test_Status_Reel,
Main.MfgOrder____Reel.Test_Disposition AS Test_Disposition_Reel,
Main.MfgOrder____Reel.LabReleased,
Main.MfgOrder____Reel.ShipReelsBypassSet,
Main.MfgOrder_____Length.OrderLengthID,
Main.MfgOrder_____Length.LengthType,
Main.MfgOrder_____Length.LocationOnReel,
Main.MfgOrder_____Length.LocationOnLength,
Main.MfgOrder_____Length.TrialNumber,
Main.MfgOrder_____Length.SampleNumber,
Main.MfgOrder_____Length.PrintNumber,
Main.MfgOrder_____Length.Test_Status AS Test_Status_Length,
Main.MfgOrder_____Length.Test_Category,
Main.MfgOrder_____Length.Test_Disposition AS Test_Disposition_Length,
Main.MfgOrder_____Length.SampleSubmittedBy,
Main.MfgOrder_____Length.SampleSubmittedDate,
Main.MfgOrder_____Length.BypassTesting,
Main.MfgOrder_____Length_OperatorQty.Sample1Destination,
Main.MfgOrder_____Length_OperatorQty.Sample2Destination,
Main.MfgOrder_____Length_OperatorQty.Sample3Destination,
Main.MfgOrder______Component.OrderComponentID,
Main.MfgOrder______Component.DesignComponentID,
Main.MfgOrder______Component.ItemNo,
Main.MfgOrder_______Test.LabTestID,
Main.MfgOrder_______Test.OrderTestID,
Main.MfgOrder_______Test.TestComplete,
Main.MfgOrder_______Test.TestStatus,
Main.MfgOrder________Marker2.OrderMarkerID,
Master.Color.ColorName,
Master.LabTest.ExcludeFromPassFail,
CASE
WHEN Main.Design_Component.Component_Label IS NULL
THEN 'Unknown'
ELSE Main.Design_Component.Component_Label
END AS Component_Label
FROM
Main.MfgOrder
INNER JOIN
Main.Design__BOM ON Main.MfgOrder.DesignBOMID = Main.Design__BOM.DesignBOMID
INNER JOIN
Main.Design_Plant ON Main.Design__BOM.DesignPlantID = Main.Design_Plant.DesignPlantID
INNER JOIN
Main.MfgOrder_Operation ON Main.MfgOrder.OrderNumber = Main.MfgOrder_Operation.OrderNumber
INNER JOIN
Main.MfgOrder__Shift ON Main.MfgOrder_Operation.OrderOpID = Main.MfgOrder__Shift.OrderOpID
INNER JOIN
Main.MfgOrder___Event ON Main.MfgOrder__Shift.OrderShiftID = Main.MfgOrder___Event.OrderShiftID
INNER JOIN
Main.MfgOrder____Reel ON Main.MfgOrder___Event.OrderEventID = Main.MfgOrder____Reel.OrderEventID
INNER JOIN
Main.MfgOrder_____Length ON Main.MfgOrder____Reel.OrderReelID = Main.MfgOrder_____Length.OrderReelID
LEFT OUTER JOIN
Main.MfgOrder______Component ON Main.MfgOrder_____Length.OrderLengthID = Main.MfgOrder______Component.OrderLengthID
LEFT OUTER JOIN
Main.MfgOrder_______Test ON Main.MfgOrder______Component.OrderComponentID = Main.MfgOrder_______Test.OrderComponentID
LEFT OUTER JOIN
Main.MfgOrder________Marker2 ON Main.MfgOrder_______Test.OrderTestID = Main.MfgOrder________Marker2.OrderTestID
LEFT OUTER JOIN
Main.Design_Component ON Main.MfgOrder______Component.DesignComponentID = Main.Design_Component.DesignComponentID
LEFT OUTER JOIN
Master.Color ON Main.MfgOrder______Component.TapeColorID = Master.Color.ColorNumber
LEFT OUTER JOIN
Master.LabTest ON Main.MfgOrder_______Test.LabTestID = Master.LabTest.LabTestID
LEFT OUTER JOIN
Main.MfgOrder_____Length_OperatorQty ON Main.MfgOrder______Component.OrderLengthID = Main.MfgOrder_____Length_OperatorQty.OrderLengthID
you can use row_number as below: Below query will not select duplicate only on OrderNumber, if you need to add other columns you add accordingly
Select * from (
Select
RowN = Row_Number() over( partition by Main.MfgOrder.OrderNumber order by Main.MfgOrder.OrderNumber),
--- All your select columns and all your query with joins
) a
Where a.RowN = 1
Is the entire row duplicated exactly? If so then just add DISTINCT
SELECT DISTINCT
...
FROM
...
If you are getting duplicate rows where most columns the same but some columns are different then GROUP BY the columns that are the same and select MIN(column_name) for the ones that are causing the extra rows to appear.
Related
My tables:
Order is:
PurchaseOrderHead
PurchaseOrder
ReceivingNoteHead
ReceivingNote
I want the output like this
MaterialID, PO.Quantity, RN.Quantity so far
There can be multiple receiving notes for a given purchaseorderhead_id as every ReceivingNoteHead will have a PurchaseOrderHeadID.
My attempt:
select
PurchaseOrder.MaterialID,
sum(distinct PurchaseOrder.Quantity) as "Sum_Quantity",
sum(ReceivingNote.Quantity) as "ReceivingNote_Quantity",
PurchaseOrderHead.id
from
(((dbo.PurchaseOrder
inner join
dbo.PurchaseOrderHead on (PurchaseOrderHead.id = PurchaseOrder.PurchaseOrderHeadID))
left outer join
dbo.ReceivingNoteHead ReceivingNoteHead (ReceivingNoteHead.PurchaseOrderHeadID = PurchaseOrderHead.id))
left outer join
dbo.ReceivingNote on (ReceivingNote.ReceivingNoteHeadID = ReceivingNoteHead.id))
group by
PurchaseOrder.MaterialID,
PurchaseOrderHead.id
having
(PurchaseOrderHead.id = 1004)
But ReceivingNote Quantities are repeated when there's no ReceivingNote MaterialID that matches PurchaseOrder's MaterialID.
This also does not work when theres multiple same MaterialID in either PurchaseOrder or ReceivingNote
I would like to learn whether I need to break the ReceivingNote table into 2 tables because of PurchaseOrderHeadID? And I want to get rid of the sum distinct because it's not the way I want it to be.
Maybe by first aggregating the material purchases in a sub-query.
Then left join that to the materials on the receiving end.
Untested notepad scribble:
SELECT
poMat.MaterialID,
poMat.TotQuantity AS [PurchaseOrder_Quantity],
SUM(rn.Quantity) AS [ReceivingNote_Quantity],
poMat.PurchaseOrderHeadID
FROM
(
SELECT
po.PurchaseOrderHeadID,
po.MaterialID,
SUM(po.Quantity) AS TotQuantity
FROM dbo.PurchaseOrder po
-- Uncomment to filter on the PurchaseOrderHeadID
-- WHERE po.PurchaseOrderHeadID = 1004
GROUP BY
po.PurchaseOrderHeadID,
po.MaterialID
) poMat
LEFT JOIN dbo.ReceivingNoteHead rnH
ON rnH.PurchaseOrderHeadID = poMat.PurchaseOrderHeadID
LEFT JOIN dbo.ReceivingNote rn
ON rn.ReceivingNoteHeadID = rnH.id
AND rn.MaterialID = poMat.MaterialID
GROUP BY
poMat.PurchaseOrderHeadID,
poMat.MaterialID,
poMat.TotQuantity
ORDER BY
poMat.PurchaseOrderHeadID,
poMat.MaterialID;
This however, won't show received materials that don't have a matching purchased material.
You are getting duplicate because the table ReceivingNoteHead does not have the PurchaseOrder.ID in it. Add the column PurchaseOrderID in ReceivingNoteHead and you should be good to go
select
PurchaseOrder.MaterialID,
sum(PurchaseOrder.Quantity) as "Sum_Quantity",
sum(ReceivingNote.Quantity) as "ReceivingNote_Quantity",
PurchaseOrderHead.id
from
dbo.PurchaseOrder
inner join
dbo.PurchaseOrderHead on PurchaseOrderHead.id = PurchaseOrder.PurchaseOrderHeadID
left outer join
dbo.ReceivingNoteHead ReceivingNoteHead ReceivingNoteHead.PurchaseOrderHeadID = PurchaseOrderHead.id *and ReceivingNoteHead.PurchaseOrderID=PurchaseOrder.ID*
left outer join
dbo.ReceivingNote on ReceivingNote.ReceivingNoteHeadID = ReceivingNoteHead.id
group by
PurchaseOrder.MaterialID,
PurchaseOrderHead.id
having
PurchaseOrderHead.id = 1004
This is a stored procedure to select the full details of a wine.
It takes as a parameter a storage location ID.
This builds properly and is similar in structure to many sp's I have written that work,
however it returns no data. Is there something I am missing?
CREATE PROCEDURE [sp_retrieveInventory_Full]
(
#StorageLocationID [int]
)
AS
BEGIN
SELECT [WineName],
[RackID],
[RackColumn],
[RackRow],
[WineTypeID],
[WineVarietyID],
[VintnerName],
[dbo].[Vintner].[Country],
[dbo].[Vintner].[StateProvince],
[dbo].[Vintner].[Region],
[VendorName],
[Vintage],
[PurchaseDate],
[PurchasePrice],
[BottleSizeID],
[ABV],
[DrinkByDate],
[FoodType],
[TastingNotes],
[RatedBy],
[RatingScore]
FROM [dbo].[StorageLocation]
INNER JOIN [dbo].[Wine]
ON [dbo].[StorageLocation].[WineID] = [dbo].[Wine].[WineID]
INNER JOIN [dbo].[Vintner]
ON [dbo].[Vintner].[VintnerID] = [dbo].[Wine].[VintnerID]
INNER JOIN [dbo].[Vendor]
ON [dbo].[Vendor].[VendorID] = [dbo].[Wine].[VendorID]
INNER JOIN [dbo].[Pairing]
ON [dbo].[Pairing].[PairingID] = [dbo].[Wine].[PairingID]
INNER JOIN [dbo].[Rating]
ON [dbo].[Rating].[RatingID] = [dbo].[Wine].[RatingID]
WHERE [dbo].[StorageLocation].[StorageLocationID] = #storageLocationID
END
GO
When a query reruns no result it's because no records exists that satisfy the conditions in it's where clause, or, in case of queries that contains inner joins, at least one of the joins on condition returns no records.
The easiest way to figure out what join is messing up your query is to start removing the joins one by one, but to do that, you have to first change the select clause.
What I like to do is this: Start with all the joins except the last one - see if it returns any records:
SELECT 1
FROM [dbo].[StorageLocation]
INNER JOIN [dbo].[Wine]
ON [dbo].[StorageLocation].[WineID] = [dbo].[Wine].[WineID]
INNER JOIN [dbo].[Vintner]
ON [dbo].[Vintner].[VintnerID] = [dbo].[Wine].[VintnerID]
INNER JOIN [dbo].[Vendor]
ON [dbo].[Vendor].[VendorID] = [dbo].[Wine].[VendorID]
INNER JOIN [dbo].[Pairing]
ON [dbo].[Pairing].[PairingID] = [dbo].[Wine].[PairingID]
-- INNER JOIN [dbo].[Rating]
-- ON [dbo].[Rating].[RatingID] = [dbo].[Wine].[RatingID]
WHERE [dbo].[StorageLocation].[StorageLocationID] = #storageLocationID
If not, comment out the next join and run the query again. Rinse and repeat.
Please note, however, that there might be more than one inner join that prevents the query from returning records.
I am new to MS SQL and am having trouble joining 4 tables within a query.
I am trying to join Orders, Order Lines, Client, and Picked tables to create a query to show quantity ordered and picked for a client. If I comment out the last inner join for Picked, I get the correct results. When I include the inner join for Picked the query returns results but data that should be in the Picked fields is NULL. One order line can have 1 or more Picked lines.
SELECT W_Warehouse, OH.OrderID, OH.RequiredDate, C.Client, OL.LineNbr, OL.QtyOrd, P.QtyPick
FROM Order
INNER JOIN Warehouse on Order.OH_WHS = Warehouse.W_PK
INNER JOIN Client on Order.O_Client = Client.C_PK
INNER JOIN OrderLine on Order.O_PK = OrderLine.OL_PK
INNER JOIN Picked on OrderLine.O_PK = Picked.P_PK
WHERE C.CLIENT = 'WENDYS'
Without knowing the data in the tables it is difficult to answer precisely.
But as you say you have 1+ rows in the Picked table, you probably want to do aggregation with GROUP BY and SUM()
Maybe this is what you're looking for:
SELECT
W.W_Warehouse,
OH.OrderID,
OH.RequiredDate,
C.Client,
OL.LineNbr,
OL.QtyOrd,
P.QtyPick
FROM
Order OH
INNER JOIN Warehouse W on OH.OH_WHS = W.W_PK
INNER JOIN Client C on OH.O_Client = C.C_PK
INNER JOIN OrderLine OL on OH.O_PK = OL.OL_PK
CROSS APPLY (
select sum(QtyPick) as QtyPick
from Picked P
where OL.O_PK = P.P_PK
) P
WHERE
C.CLIENT = 'WENDYS'
It calculates the sum of QtyPick separately so it doesn't increase the number of lines in the result.
I have a complex query to retrieve some results:
EDITED QUERY (added the UNION ALL):
SELECT t.*
FROM (
SELECT
dbo.Intervencao.INT_Processo, analista,
ETS.ETS_Sigla, ATC.ATC_Sigla, PAT.PAT_Sigla, dbo.Assunto.SNT_Peso,
CASE
WHEN ETS.ETS_Sigla = 'PE' AND (PAT.PAT_Sigla = 'LIB' OR PAT.PAT_Sigla = 'LBR') THEN (0.3*SNT_Peso)
WHEN ETS.ETS_Sigla = 'CD' THEN (0.3*SNT_Peso)*0.3
ELSE SNT_Peso
END AS PESOAREA,
CASE
WHEN a.max_TEA_FimTarefa IS NULL THEN a.max_TEA_InicioTarefa
ELSE a.max_TEA_FimTarefa
END AS DATA_INICIO_TERMINO,
ROW_NUMBER() OVER (PARTITION BY ATC.ATC_Sigla, a.SRV_Id ORDER BY TEA_FimTarefa DESC) AS seqnum
FROM dbo.Tarefa AS t
INNER JOIN (
SELECT
MAX(dbo.TarefaEtapaAreaTecnica.TEA_InicioTarefa) AS max_TEA_InicioTarefa,
MAX (dbo.TarefaEtapaAreaTecnica.TEA_FimTarefa) AS max_TEA_FimTarefa,
dbo.Pessoa.PFJ_Descri AS analista, dbo.AreaTecnica.ATC_Id, dbo.Tarefa.SRV_Id
FROM dbo.TarefaEtapaAreaTecnica
LEFT JOIN dbo.Tarefa ON dbo.TarefaEtapaAreaTecnica.TRF_Id = dbo.Tarefa.TRF_Id
LEFT JOIN dbo.AreaTecnica ON dbo.TarefaEtapaAreaTecnica.ATC_Id = dbo.AreaTecnica.ATC_Id
LEFT JOIN dbo.ServicoAreaTecnica ON dbo.TarefaEtapaAreaTecnica.ATC_Id = dbo.ServicoAreaTecnica.ATC_Id
AND dbo.Tarefa.SRV_Id = dbo.ServicoAreaTecnica.SRV_Id
INNER JOIN dbo.Pessoa ON dbo.Pessoa.PFJ_Id = dbo.ServicoAreaTecnica.PFJ_Id_Analista
GROUP BY dbo.AreaTecnica.ATC_Id, dbo.Tarefa.SRV_Id, dbo.Pessoa.PFJ_Descri
) AS a ON t.SRV_Id = a.SRV_Id
INNER JOIN dbo.TarefaEtapaAreaTecnica AS TarefaEtapaAreaTecnica_1 ON
t.TRF_Id = TarefaEtapaAreaTecnica_1.TRF_Id
AND a.ATC_Id = TarefaEtapaAreaTecnica_1.ATC_Id
AND a.max_TEA_InicioTarefa = TarefaEtapaAreaTecnica_1.TEA_InicioTarefa
LEFT JOIN AreaTecnica ATC ON TarefaEtapaAreaTecnica_1.ATC_Id = ATC.ATC_Id
LEFT JOIN Etapa ETS ON TarefaEtapaAreaTecnica_1.ETS_Id = ETS.ETS_Id
LEFT JOIN ParecerTipo PAT ON TarefaEtapaAreaTecnica_1.PAT_Id = PAT.PAT_Id
LEFT OUTER JOIN dbo.Servico ON a.SRV_Id = dbo.Servico.SRV_Id
INNER JOIN dbo.Intervencao ON dbo.Servico.INT_Id = dbo.Intervencao.INT_Id
LEFT JOIN dbo.Assunto ON dbo.Servico.SNT_Id = dbo.Assunto.SNT_Id
) t
The result is following:
It works good, the problem is that I was asked that if when a row is not present on this query, it must contain values from another table (ServicoAreaTecnica), so I got this query for the other table based on crucial information of the first query. So if I UNION ALL I get this:
Query1 +
UNION ALL
SELECT INN.INT_Processo,
PES.PFJ_Descri,
NULL, --ETS.ETS_Sigla,
ART.ATC_Sigla,
NULL ,--PAT.PAT_Sigla,
ASS.SNT_Peso,
NULL, --PESOAREA
NULL, --DATA_INICIO_TERMINO
NULL --seqnum
FROM dbo.ServicoAreaTecnica AS SAT
INNER JOIN dbo.AreaTecnica AS ART ON ART.ATC_Id = SAT.ATC_Id
INNER JOIN dbo.Servico AS SER ON SER.SRV_Id = SAT.SRV_Id
INNER JOIN dbo.Assunto AS ASS ON ASS.SNT_Id = SER.SNT_Id
INNER JOIN dbo.Intervencao AS INN ON INN.INT_Id = SER.INT_Id
INNER JOIN dbo.Pessoa AS PES ON PES.PFJ_Id = SAT.PFJ_Id_Analista
The result is following:
So what I want to do is to remove row number 1 because row number 2 exists on the first query, I think I got it explained better this time. The result should be only row number 1, row number 2 would appear only if query 1 doesn't retrieve a row for that particular INN.INT_Processo.
Thanks!
Ok, there are two ways to reduce your record set. Given that you've already written the code to produce the table with the extra rows, it might be easiest to just add code to reduce that:
Select * from
(Select *
, Row_Number() over
(partition by IntProcesso, Analista order by ISNULL(seqnum, 0) desc) as RN
from MyResults) a
where RN = 1
This will assign row_number 1 to any rows that came from your first query, or to any rows from the second query that do not have matches in the first query, then filter out extra rows.
You could also use outer joins with isnull or coalesce, as others have suggested. Something like this:
Select ISNULL(a.IntProcesso, b.IntProcesso) as IntProcesso
, ISNULL(a.Analista, b.Analista) as Analista
, ISNULL(a.ETSsigla, b.ETSsigla) as ETSsigla
[repeat for the rest of your columns]
from Table1 a
full outer join Table2 b
on a.IntProcesso = b.IntProcesso and a.Analista = b.Analista
Your code is hard to read, because of the lengthy names of everything (and to be honest, the fact that they're in a language I don't speak also makes it a lot harder).
But how about: replacing your INNER JOINs with LEFT JOINs, adding more LEFT JOINs to draw in the alternative tables, and introducing ISNULL clauses for each variable you want in the results?
If you do something like ... Query1 Right Join Query2 On ... that should get only the rows in Query2 that don't appear in Query 1.
I have the following SQL server query:
SELECT G.StyleID,
G.Size, Count(*) As 'Count',
G.PropertyID FROM TBL_Garment G
LEFT OUTER JOIN TBL_EmployeeJob ON TBL_EmployeeJob.ID = G.EmployeeJobID
LEFT OUTER JOIN TBL_DepartmentJob ON TBL_DepartmentJob.ID = TBL_EmployeeJob.DepartmentJobID
LEFT OUTER JOIN TBL_Department ON TBL_Department.ID = TBL_DepartmentJob.DepartmentID
LEFT OUTER JOIN TBL_Division ON TBL_Division.ID = TBL_Department.DivisionID
LEFT OUTER JOIN TBL_Job ON TBL_Job.ID = TBL_DepartmentJob.JobID
WHERE G.EmployeejobiD IS NOT NULL
AND TBL_Division.ID = N'1'
AND TBL_Department.ID = N'1'
AND TBL_Job.ID = N'1'
AND G.PropertyID = 1
GROUP BY G.StyleID, G.Size, G.PropertyID
ORDER BY G.StyleID
and here are the results returned by this query:
Now I need 2 extra columns in this table:
One is the sum of the count by StyleID (As Total), and the other is Count/Total.
I am sure I can get the count/Total on my own, but do not know how to get the Total column, or even if it is possible.
Below is a version of how I would like the table to be:
You can use a windowing function:
sum([Count]) over (partition by [StyleID])
Make sure it's good enough for you performance-wise, though.