Many questions/solutions I've read here describe how to use STRING_AGG by itself and I can get the following to work:
SELECT Offers.Id, STRING_AGG ( Offers2Currencies.CurrencyCode, ', ' ) AS Currencies
FROM Offers INNER JOIN Offers2Currencies ON Offers2Currencies.OfferID = dbo.Offers.ID
WHERE dbo.Offers.BuyerMemberId = '64ad10b9-85a6-4fc4-b9eb-d9f9af164d2b'
GROUP BY dbo.Offers.Id
But I am struggling with how to put that inside a larger query such as:
SELECT
dbo.Offers.ID,
dbo.Offers.UTC,
dbo.Organizations.Code,
dbo.Entities.EntityAbbrev,
dbo.Measurables.Name,
dbo.Offers.Price,
dbo.Offers.SellerMemberId,
dbo.AspNetUsers.UserName
--select STRING_AGG(dbo.Offers2Currencies.CurrencyCode, ', ') Currencies
FROM dbo.Offers
INNER JOIN dbo.AspNetUsers ON dbo.AspNetUsers.Id = dbo.Offers.SellerMemberId
INNER JOIN dbo.MemberCreditRatings ON dbo.AspNetUsers.Id = dbo.MemberCreditRatings.MemberGUID
INNER JOIN dbo.Measurables ON dbo.Offers.MeasurableID = dbo.Measurables.ID
INNER JOIN dbo.Entities ON dbo.Offers.EntityID = dbo.Entities.ID
INNER JOIN dbo.Organizations ON dbo.Measurables.OrganizationID = dbo.Organizations.ID
--INNER JOIN dbo.Offers2Currencies ON dbo.Offers2Currencies.OfferID = dbo.Offers.ID
AND dbo.Entities.OrganizationID = dbo.Organizations.ID
WHERE dbo.Offers.BuyerMemberId = '64ad10b9-85a6-4fc4-b9eb-d9f9af164d2b'
I think I figured it out: put every selected column into the GROUP BY (although they add nothing beyond the first GROUP BY column). See http://sqlfiddle.com/#!18/aaa65/7, and please advise if there is a better or alternative solution.
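One possible alternative, sketched here under assumptions rather than taken from the original schema: aggregate the currencies in a derived table first and join that to the main query, so the outer query needs no GROUP BY at all. The table variables @Offers and @Offers2Currencies and their sample rows are hypothetical stand-ins for dbo.Offers and dbo.Offers2Currencies; STRING_AGG requires SQL Server 2017 or later.

```sql
-- Hypothetical miniature tables standing in for dbo.Offers and dbo.Offers2Currencies.
DECLARE @Offers TABLE (ID int PRIMARY KEY, Price decimal(10, 2));
DECLARE @Offers2Currencies TABLE (OfferID int, CurrencyCode varchar(3));

INSERT INTO @Offers (ID, Price)
VALUES (1, 10.00), (2, 25.50);

INSERT INTO @Offers2Currencies (OfferID, CurrencyCode)
VALUES (1, 'USD'), (1, 'EUR'), (2, 'GBP');

SELECT o.ID,
       o.Price,
       c.Currencies
FROM @Offers AS o
LEFT JOIN (
    -- Aggregate once per offer, so the outer query can select any
    -- other columns without listing them in a GROUP BY.
    SELECT oc.OfferID,
           STRING_AGG(oc.CurrencyCode, ', ')
               WITHIN GROUP (ORDER BY oc.CurrencyCode) AS Currencies
    FROM @Offers2Currencies AS oc
    GROUP BY oc.OfferID
) AS c ON c.OfferID = o.ID;
```

In the larger query, the derived table would be joined on dbo.Offers.ID in the same way, alongside the existing joins.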
We have many complex queries which involve a lot of columns and joins (see the example below) that are implemented as views.
In some cases these queries return duplicate rows which then have to be programmatically removed by the consuming app. Therefore, we would like to enhance the SQL query to eliminate the duplicates, and speed up the retrieval process.
I know that I can use OVER / PARTITION BY logic to do this, but I am not sure of how to modify the queries to obtain a working syntax.
Here is an example:
SELECT
Main.MfgOrder.OrderNumber,
Main.MfgOrder.DesignBOMID,
Main.Design_Plant.PlantID,
Main.MfgOrder_Operation.OrderOpID,
Main.MfgOrder_Operation.DesignOpID,
Main.MfgOrder_Operation.OpSeq,
Main.MfgOrder_Operation.Description,
Main.MfgOrder_Operation.CompletionStatus,
Main.MfgOrder__Shift.OrderShiftID,
Main.MfgOrder__Shift.WorkCenterMachineID,
Main.MfgOrder___Event.OrderEventID,
Main.MfgOrder____Reel.OrderReelID,
Main.MfgOrder____Reel.ReelNumber,
Main.MfgOrder____Reel.Location,
Main.MfgOrder____Reel.Test_Status AS Test_Status_Reel,
Main.MfgOrder____Reel.Test_Disposition AS Test_Disposition_Reel,
Main.MfgOrder____Reel.LabReleased,
Main.MfgOrder____Reel.ShipReelsBypassSet,
Main.MfgOrder_____Length.OrderLengthID,
Main.MfgOrder_____Length.LengthType,
Main.MfgOrder_____Length.LocationOnReel,
Main.MfgOrder_____Length.LocationOnLength,
Main.MfgOrder_____Length.TrialNumber,
Main.MfgOrder_____Length.SampleNumber,
Main.MfgOrder_____Length.PrintNumber,
Main.MfgOrder_____Length.Test_Status AS Test_Status_Length,
Main.MfgOrder_____Length.Test_Category,
Main.MfgOrder_____Length.Test_Disposition AS Test_Disposition_Length,
Main.MfgOrder_____Length.SampleSubmittedBy,
Main.MfgOrder_____Length.SampleSubmittedDate,
Main.MfgOrder_____Length.BypassTesting,
Main.MfgOrder_____Length_OperatorQty.Sample1Destination,
Main.MfgOrder_____Length_OperatorQty.Sample2Destination,
Main.MfgOrder_____Length_OperatorQty.Sample3Destination,
Main.MfgOrder______Component.OrderComponentID,
Main.MfgOrder______Component.DesignComponentID,
Main.MfgOrder______Component.ItemNo,
Main.MfgOrder_______Test.LabTestID,
Main.MfgOrder_______Test.OrderTestID,
Main.MfgOrder_______Test.TestComplete,
Main.MfgOrder_______Test.TestStatus,
Main.MfgOrder________Marker2.OrderMarkerID,
Master.Color.ColorName,
Master.LabTest.ExcludeFromPassFail,
CASE
WHEN Main.Design_Component.Component_Label IS NULL
THEN 'Unknown'
ELSE Main.Design_Component.Component_Label
END AS Component_Label
FROM
Main.MfgOrder
INNER JOIN
Main.Design__BOM ON Main.MfgOrder.DesignBOMID = Main.Design__BOM.DesignBOMID
INNER JOIN
Main.Design_Plant ON Main.Design__BOM.DesignPlantID = Main.Design_Plant.DesignPlantID
INNER JOIN
Main.MfgOrder_Operation ON Main.MfgOrder.OrderNumber = Main.MfgOrder_Operation.OrderNumber
INNER JOIN
Main.MfgOrder__Shift ON Main.MfgOrder_Operation.OrderOpID = Main.MfgOrder__Shift.OrderOpID
INNER JOIN
Main.MfgOrder___Event ON Main.MfgOrder__Shift.OrderShiftID = Main.MfgOrder___Event.OrderShiftID
INNER JOIN
Main.MfgOrder____Reel ON Main.MfgOrder___Event.OrderEventID = Main.MfgOrder____Reel.OrderEventID
INNER JOIN
Main.MfgOrder_____Length ON Main.MfgOrder____Reel.OrderReelID = Main.MfgOrder_____Length.OrderReelID
LEFT OUTER JOIN
Main.MfgOrder______Component ON Main.MfgOrder_____Length.OrderLengthID = Main.MfgOrder______Component.OrderLengthID
LEFT OUTER JOIN
Main.MfgOrder_______Test ON Main.MfgOrder______Component.OrderComponentID = Main.MfgOrder_______Test.OrderComponentID
LEFT OUTER JOIN
Main.MfgOrder________Marker2 ON Main.MfgOrder_______Test.OrderTestID = Main.MfgOrder________Marker2.OrderTestID
LEFT OUTER JOIN
Main.Design_Component ON Main.MfgOrder______Component.DesignComponentID = Main.Design_Component.DesignComponentID
LEFT OUTER JOIN
Master.Color ON Main.MfgOrder______Component.TapeColorID = Master.Color.ColorNumber
LEFT OUTER JOIN
Master.LabTest ON Main.MfgOrder_______Test.LabTestID = Master.LabTest.LabTestID
LEFT OUTER JOIN
Main.MfgOrder_____Length_OperatorQty ON Main.MfgOrder______Component.OrderLengthID = Main.MfgOrder_____Length_OperatorQty.OrderLengthID
You can use ROW_NUMBER as below. This query removes duplicates based on OrderNumber only; if other columns should also define a duplicate, add them to the PARTITION BY accordingly.
Select * from (
Select
RowN = Row_Number() over( partition by Main.MfgOrder.OrderNumber order by Main.MfgOrder.OrderNumber),
--- All your select columns and all your query with joins
) a
Where a.RowN = 1
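A minimal, self-contained sketch of the same pattern follows. The table variable @Orders and its rows are hypothetical. Unlike the outline above, it orders by a different column than the partition column (OrderOpID here, chosen only for illustration), because ordering by the partition column itself leaves the choice of the surviving row arbitrary.

```sql
-- Hypothetical sample rows; the duplicate key here is OrderNumber alone.
DECLARE @Orders TABLE (OrderNumber int, OrderOpID int, Description varchar(50));

INSERT INTO @Orders (OrderNumber, OrderOpID, Description)
VALUES (100, 1, 'Cut'),
       (100, 2, 'Wind'),
       (200, 3, 'Cut');

SELECT a.OrderNumber, a.OrderOpID, a.Description
FROM (
    SELECT o.OrderNumber,
           o.OrderOpID,
           o.Description,
           -- Number the rows within each OrderNumber; ordering by OrderOpID
           -- makes the choice of the "first" row deterministic.
           ROW_NUMBER() OVER (PARTITION BY o.OrderNumber
                              ORDER BY o.OrderOpID) AS RowN
    FROM @Orders AS o
) AS a
WHERE a.RowN = 1;
```

With these sample rows, one row per OrderNumber is kept: the one with the lowest OrderOpID.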
Is the entire row duplicated exactly? If so then just add DISTINCT
SELECT DISTINCT
...
FROM
...
If you are getting duplicate rows where most columns are the same but some columns differ, then GROUP BY the columns that are the same and select MIN(column_name) for the ones that are causing the extra rows to appear.
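As a rough illustration of the GROUP BY / MIN approach, assuming a hypothetical table variable @Rows in which ReelNumber is the only column that differs between otherwise identical rows:

```sql
-- Hypothetical rows: OrderNumber and PlantID repeat, ReelNumber differs.
DECLARE @Rows TABLE (OrderNumber int, PlantID int, ReelNumber varchar(10));

INSERT INTO @Rows (OrderNumber, PlantID, ReelNumber)
VALUES (100, 1, 'R-02'),
       (100, 1, 'R-01'),
       (200, 2, 'R-07');

SELECT r.OrderNumber,
       r.PlantID,
       -- Collapse the differing column to a single value per group.
       MIN(r.ReelNumber) AS ReelNumber
FROM @Rows AS r
GROUP BY r.OrderNumber, r.PlantID;
```

Note that when several columns are collapsed with MIN independently, the values in one output row may come from different source rows; if they must come from the same row, the ROW_NUMBER approach is the safer choice.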
I am trying to use DISTINCT in a SELECT statement but am not getting the desired result. I want each CaseID returned only once, with its most recently posted comment. Below is the query that I am trying to use.
Select Distinct av.CaseID,fr.Rule_Description, av.Date, av.Status, fr.RULE_PRIORITY, ac.User_comments, max(ac.Comment_PostDate),ac.UserID
From tblAlertView av
Join tblAlertComment ac
on av.CaseID = ac.CaseID
Join tblFBLRule fr
on av.RuleID = fr.Rule_ID
Join TBLUSER usr
on ac.UserID = usr.USERID
group by av.CaseID, fr.Rule_Description, av.Date, av.Status, fr.RULE_PRIORITY, ac.User_comments, ac.Comment_PostDate,ac.UserID
Query Result
Remove ac.Comment_PostDate from the GROUP BY clause.
Rather than using JOIN to get to tblAlertComment, if you use CROSS APPLY you can specify to just return the top 1 comment per case:
SELECT av.CaseID,
fr.Rule_Description,
av.Date,
av.Status,
fr.RULE_PRIORITY,
ac.User_comments,
ac.Comment_PostDate,
ac.UserID
FROM tblAlertView AS av
INNER JOIN tblFBLRule AS fr
ON av.RuleID = fr.Rule_ID
CROSS APPLY
( SELECT TOP 1 ac.User_comments, ac.Comment_PostDate, ac.UserID
FROM tblAlertComment AS ac
INNER JOIN tblUser AS usr
ON usr.UserID = ac.UserID
WHERE ac.CaseID = av.CaseID
ORDER BY ac.Comment_PostDate DESC
) AS ac;
I have a complex query to retrieve some results:
EDITED QUERY (added the UNION ALL):
SELECT t.*
FROM (
SELECT
dbo.Intervencao.INT_Processo, analista,
ETS.ETS_Sigla, ATC.ATC_Sigla, PAT.PAT_Sigla, dbo.Assunto.SNT_Peso,
CASE
WHEN ETS.ETS_Sigla = 'PE' AND (PAT.PAT_Sigla = 'LIB' OR PAT.PAT_Sigla = 'LBR') THEN (0.3*SNT_Peso)
WHEN ETS.ETS_Sigla = 'CD' THEN (0.3*SNT_Peso)*0.3
ELSE SNT_Peso
END AS PESOAREA,
CASE
WHEN a.max_TEA_FimTarefa IS NULL THEN a.max_TEA_InicioTarefa
ELSE a.max_TEA_FimTarefa
END AS DATA_INICIO_TERMINO,
ROW_NUMBER() OVER (PARTITION BY ATC.ATC_Sigla, a.SRV_Id ORDER BY TEA_FimTarefa DESC) AS seqnum
FROM dbo.Tarefa AS t
INNER JOIN (
SELECT
MAX(dbo.TarefaEtapaAreaTecnica.TEA_InicioTarefa) AS max_TEA_InicioTarefa,
MAX (dbo.TarefaEtapaAreaTecnica.TEA_FimTarefa) AS max_TEA_FimTarefa,
dbo.Pessoa.PFJ_Descri AS analista, dbo.AreaTecnica.ATC_Id, dbo.Tarefa.SRV_Id
FROM dbo.TarefaEtapaAreaTecnica
LEFT JOIN dbo.Tarefa ON dbo.TarefaEtapaAreaTecnica.TRF_Id = dbo.Tarefa.TRF_Id
LEFT JOIN dbo.AreaTecnica ON dbo.TarefaEtapaAreaTecnica.ATC_Id = dbo.AreaTecnica.ATC_Id
LEFT JOIN dbo.ServicoAreaTecnica ON dbo.TarefaEtapaAreaTecnica.ATC_Id = dbo.ServicoAreaTecnica.ATC_Id
AND dbo.Tarefa.SRV_Id = dbo.ServicoAreaTecnica.SRV_Id
INNER JOIN dbo.Pessoa ON dbo.Pessoa.PFJ_Id = dbo.ServicoAreaTecnica.PFJ_Id_Analista
GROUP BY dbo.AreaTecnica.ATC_Id, dbo.Tarefa.SRV_Id, dbo.Pessoa.PFJ_Descri
) AS a ON t.SRV_Id = a.SRV_Id
INNER JOIN dbo.TarefaEtapaAreaTecnica AS TarefaEtapaAreaTecnica_1 ON
t.TRF_Id = TarefaEtapaAreaTecnica_1.TRF_Id
AND a.ATC_Id = TarefaEtapaAreaTecnica_1.ATC_Id
AND a.max_TEA_InicioTarefa = TarefaEtapaAreaTecnica_1.TEA_InicioTarefa
LEFT JOIN AreaTecnica ATC ON TarefaEtapaAreaTecnica_1.ATC_Id = ATC.ATC_Id
LEFT JOIN Etapa ETS ON TarefaEtapaAreaTecnica_1.ETS_Id = ETS.ETS_Id
LEFT JOIN ParecerTipo PAT ON TarefaEtapaAreaTecnica_1.PAT_Id = PAT.PAT_Id
LEFT OUTER JOIN dbo.Servico ON a.SRV_Id = dbo.Servico.SRV_Id
INNER JOIN dbo.Intervencao ON dbo.Servico.INT_Id = dbo.Intervencao.INT_Id
LEFT JOIN dbo.Assunto ON dbo.Servico.SNT_Id = dbo.Assunto.SNT_Id
) t
The result is following:
It works well. The problem is that I was asked that, when a row is not present in this query's result, the output must contain values from another table (ServicoAreaTecnica) instead. So I wrote a query for that other table based on the key information of the first query. If I UNION ALL them, I get this:
Query1 +
UNION ALL
SELECT INN.INT_Processo,
PES.PFJ_Descri,
NULL, --ETS.ETS_Sigla,
ART.ATC_Sigla,
NULL ,--PAT.PAT_Sigla,
ASS.SNT_Peso,
NULL, --PESOAREA
NULL, --DATA_INICIO_TERMINO
NULL --seqnum
FROM dbo.ServicoAreaTecnica AS SAT
INNER JOIN dbo.AreaTecnica AS ART ON ART.ATC_Id = SAT.ATC_Id
INNER JOIN dbo.Servico AS SER ON SER.SRV_Id = SAT.SRV_Id
INNER JOIN dbo.Assunto AS ASS ON ASS.SNT_Id = SER.SNT_Id
INNER JOIN dbo.Intervencao AS INN ON INN.INT_Id = SER.INT_Id
INNER JOIN dbo.Pessoa AS PES ON PES.PFJ_Id = SAT.PFJ_Id_Analista
The result is following:
So what I want to do is remove row number 1, because row number 2 exists in the first query. I think I have explained it better this time. The result should be only row number 1; row number 2 would appear only if query 1 doesn't retrieve a row for that particular INN.INT_Processo.
Thanks!
Ok, there are two ways to reduce your record set. Given that you've already written the code to produce the table with the extra rows, it might be easiest to just add code to reduce that:
Select * from
(Select *
, Row_Number() over
(partition by INT_Processo, analista order by ISNULL(seqnum, 0) desc) as RN
from MyResults) a -- MyResults stands for the UNION ALL result above
where RN = 1
This will assign row_number 1 to any rows that came from your first query, or to any rows from the second query that do not have matches in the first query, then filter out extra rows.
You could also use outer joins with isnull or coalesce, as others have suggested. Something like this:
Select ISNULL(a.INT_Processo, b.INT_Processo) as INT_Processo
, ISNULL(a.analista, b.analista) as analista
, ISNULL(a.ETS_Sigla, b.ETS_Sigla) as ETS_Sigla
-- [repeat for the rest of your columns]
from Table1 a
full outer join Table2 b
on a.INT_Processo = b.INT_Processo and a.analista = b.analista
Your code is hard to read, because of the lengthy names of everything (and to be honest, the fact that they're in a language I don't speak also makes it a lot harder).
But how about: replacing your INNER JOINs with LEFT JOINs, adding more LEFT JOINs to draw in the alternative tables, and introducing ISNULL clauses for each variable you want in the results?
If you do something like ... Query1 RIGHT JOIN Query2 ON ... and then keep only the rows where the Query1 key IS NULL, that should get only the rows in Query2 that don't appear in Query1.
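The same "only the Query2 rows that are missing from Query1" filter can also be written with NOT EXISTS directly in the second branch of the UNION ALL. The sketch below assumes two hypothetical table variables, @Query1 and @Query2, standing in for the results of the two queries, and matches on INT_Processo only, as the question describes.

```sql
-- Hypothetical stand-ins for the results of the first and second query.
DECLARE @Query1 TABLE (INT_Processo varchar(20), analista varchar(50), seqnum int);
DECLARE @Query2 TABLE (INT_Processo varchar(20), analista varchar(50));

INSERT INTO @Query1 (INT_Processo, analista, seqnum)
VALUES ('P-001', 'Ana', 1);

INSERT INTO @Query2 (INT_Processo, analista)
VALUES ('P-001', 'Ana'),
       ('P-002', 'Rui');

SELECT q1.INT_Processo, q1.analista, q1.seqnum
FROM @Query1 AS q1
UNION ALL
SELECT q2.INT_Processo, q2.analista, NULL
FROM @Query2 AS q2
-- Keep a fallback row only when the first query has no row
-- for the same INT_Processo.
WHERE NOT EXISTS (
    SELECT 1
    FROM @Query1 AS q1
    WHERE q1.INT_Processo = q2.INT_Processo
);
```

With these sample rows, 'P-001' comes from the first query only, and 'P-002' is filled in from the second.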
I have a query that is working for the most part until I had to add the inner select for "Trainers".
As you can see in the code below, I am trying to get all of the trainers for each of the segment ID's.
I am getting an error on the first inner select's WHERE clause, WHERE trn.segmentID = tes.teSegmentID, saying that tes.teSegmentID is not defined.
Is there another way to approach this query in order to get the trainers like I am trying to accomplish?
SELECT *,
(SELECT e2.[FirstName] AS trainerFirst,
e2.[LastName] AS trainerLast
FROM BS_Training_Trainers AS trn
LEFT OUTER JOIN
employeeTable AS e2
ON trn.trainerEmpID = e2.EmpID
WHERE trn.segmentID = tes.teSegmentID
FOR XML PATH ('trainer'), TYPE, ELEMENTS, ROOT ('trainers'))
FROM dbo.BS_TrainingEvents AS a
WHERE a.trainingEventID IN (SELECT tes.trainingEventID
FROM dbo.BS_TrainingEvent_Segments AS tes
INNER JOIN
dbo.BS_TrainingEvent_SegmentDetails AS tesd
ON tesd.segmentID = tes.teSegmentID
INNER JOIN
dbo.BS_LocaleCodes AS locale
ON locale.localeID = tesd.localeID
WHERE locale.location = 'Baltimore');
It seems like you're taking the scenic route towards this:
SELECT a.*,
X.[FirstName],
X.[LastName]
FROM dbo.BS_TrainingEvents AS a
-- The derived table must expose trainingEventID, because the join below uses it.
LEFT OUTER JOIN (SELECT tes.trainingEventID, e2.[FirstName], e2.[LastName], locale.location
FROM dbo.BS_TrainingEvent_Segments AS tes
INNER JOIN dbo.BS_Training_Trainers AS trn ON trn.segmentID = tes.teSegmentID
INNER JOIN dbo.BS_TrainingEvent_SegmentDetails AS tesd ON tesd.segmentID = tes.teSegmentID
INNER JOIN dbo.BS_LocaleCodes AS locale ON locale.localeID = tesd.localeID
LEFT OUTER JOIN employeeTable AS e2 ON trn.trainerEmpID = e2.EmpID) AS X ON a.trainingEventID = X.trainingEventID
WHERE X.location = 'Baltimore';
Not sure if I got all those joins right, it was hard to decode from all the nesting you have going on.
If I have guessed table relationships from their names correctly, the only way to solve this is to reference the same filtering condition twice: first, in the XML generation part, and second in the outer level of the query:
with cte as (
select distinct tes.trainingEventID, tes.teSegmentID
from dbo.BS_TrainingEvent_Segments AS tes
INNER JOIN dbo.BS_TrainingEvent_SegmentDetails AS tesd ON tesd.segmentID = tes.teSegmentID
INNER JOIN dbo.BS_LocaleCodes AS locale ON locale.localeID = tesd.localeID
WHERE locale.location = 'Baltimore'
)
SELECT a.*, (
SELECT e2.[FirstName] AS trainerFirst, e2.[LastName] AS trainerLast
FROM BS_Training_Trainers AS trn
LEFT OUTER JOIN employeeTable AS e2 ON trn.trainerEmpID = e2.EmpID
inner join cte c on trn.segmentID = c.teSegmentID
-- Correlate with the outer event so each event lists only its own trainers.
WHERE c.trainingEventID = a.trainingEventID
FOR XML PATH ('trainer'), TYPE, ELEMENTS, ROOT ('trainers')
)
FROM dbo.BS_TrainingEvents AS a
where exists (select 0 from cte c where c.trainingEventID = a.trainingEventID);
It's difficult to tell whether this is completely correct, of course, but I hope you get the idea.
Oh yes, and if you have an event with multiple Baltimore segments, you will never be able to tell which trainer takes which one. But you can always add more data to the XML to resolve this.
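One way to add that extra data, shown as a sketch only: select the segment ID as an additional element next to each trainer's name. The table variable @Trainers and its rows are hypothetical and replace the real joins to keep the example self-contained.

```sql
-- Hypothetical flattened trainer rows, one per trainer and segment.
DECLARE @Trainers TABLE (segmentID int, trainerFirst varchar(50), trainerLast varchar(50));

INSERT INTO @Trainers (segmentID, trainerFirst, trainerLast)
VALUES (10, 'Ann', 'Lee'),
       (11, 'Bob', 'Ray');

-- Each <trainer> element now carries a <segmentID> child, so trainers
-- of different segments in the same event can be told apart.
SELECT t.segmentID,
       t.trainerFirst,
       t.trainerLast
FROM @Trainers AS t
ORDER BY t.segmentID
FOR XML PATH ('trainer'), TYPE, ELEMENTS, ROOT ('trainers');
```

In the real query, this would amount to adding trn.segmentID (or c.teSegmentID) to the select list of the XML subquery.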
I tried everything but I couldn't overcome this problem.
I have a table-valued function.
When I call this function with
SELECT * FROM Ratings o1
CROSS APPLY dbo.FN_RatingSimilarity(50, 497664, 'Cosine') o2
WHERE o1.trackId = 497664
It takes a while to execute. But when I do this:
SELECT * FROM Ratings o1
CROSS APPLY dbo.FN_RatingSimilarity(50, o1.trackId, 'Cosine') o2
WHERE o1.trackId = 497664
It executes in 32 seconds. I created all the indexes, but it didn't help.
My function by the way:
ALTER FUNCTION [dbo].[FN_RatingSimilarity]
(
@trackId INT,
@nTrackId INT,
@measureType VARCHAR(100)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
SELECT o2.id,
o2.name,
o2.releaseDate,
o2.numberOfRatings,
o2.averageRating,
COUNT(1) as numberOfSharedUsers,
CASE @measureType
WHEN 'Cosine' THEN SUM(o3.score*o4.score)/(0.01+SQRT(SUM(POWER(o3.score,2))) * SQRT(SUM(POWER(o4.score,2))))
WHEN 'AdjustedCosine' THEN SUM((o3.score-o5.averageRating)*(o4.score-o5.averageRating))/(0.01+SQRT(SUM(POWER(o3.score-o5.averageRating, 2)))*SQRT(SUM(POWER(o4.score-o5.averageRating, 2))))
WHEN 'Pearson' THEN SUM((o3.score-o1.averageRating)*(o4.score-o2.averageRating))/(0.01+SQRT(SUM(POWER(o3.score-o1.averageRating, 2)))*SQRT(SUM(POWER(o4.score-o2.averageRating, 2))))
END as similarityRatio
FROM dbo.Tracks o1
INNER JOIN dbo.Tracks o2 ON o2.id != @trackId
INNER JOIN dbo.Ratings o3 ON o3.trackId = o1.id
INNER JOIN dbo.Ratings o4 ON o4.trackId = o2.id AND o4.userId = o3.userId
INNER JOIN dbo.Users o5 ON o5.id = o4.userId
WHERE o1.id = @trackId
AND o2.id = ISNULL(@nTrackId, o2.id)
GROUP BY o2.id,
o2.name,
o2.releaseDate,
o2.numberOfRatings,
o2.averageRating
)
Any help would be appreciated.
Thanks.
Emrah
I believe that your bottleneck is the calculations plus your very expensive inner joins.
The way you are joining is basically creating a cross join: it returns a result set with all the records linked to all other records, except the one whose id is supplied. Then you add to that result set with the other inner joins.
For every inner join, SQL conceptually creates a result set with all the matching rows.
So the first thing your query does is tell SQL to basically do a cross join on the same table. (I am assuming you are still following; the query looks pretty advanced, so I'll take it that you are familiar with advanced SQL syntax and operators.)
Now in the next inner join, you are applying the Ratings table to your newly created huge result set, and only then filtering out the rows that are not in both tables.
So as a start, see if you can do your joins the other way around. (This really depends on your table record counts and record sizes, and the optimizer may reorder inner joins on its own, so compare the execution plans.) Try to have the smallest result sets first and then join onto them.
The second thing you might want to try is to limit your result set even before the joins. Start with a CTE where you filter for o1.id = @trackId. Then select from this CTE, do your joins on the CTE, and filter in your query for o2.id = ISNULL(@nTrackId, o2.id).
I will work on an example, stay tuned...
--
Ok, I added an example, did a quick test and the values returned are the same. Run this through your data and let us know if there is any improvement. (Note, this does not address the INNER JOIN order point discussed, still do play around with that.)
Example:
ALTER FUNCTION [dbo].[FN_RatingSimilarity_NEW]
(
@trackId INT,
@nTrackId INT,
@measureType VARCHAR(100)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
WITH CTE_ALL AS
(
SELECT id,
name,
releaseDate,
numberOfRatings,
averageRating
FROM dbo.Tracks
WHERE id = @trackId
)
SELECT o2.id,
o2.name,
o2.releaseDate,
o2.numberOfRatings,
o2.averageRating,
COUNT(1) as numberOfSharedUsers,
CASE @measureType
WHEN 'Cosine' THEN SUM(o3.score*o4.score)/(0.01+SQRT(SUM(POWER(o3.score,2))) * SQRT(SUM(POWER(o4.score,2))))
WHEN 'AdjustedCosine' THEN SUM((o3.score-o5.averageRating)*(o4.score-o5.averageRating))/(0.01+SQRT(SUM(POWER(o3.score-o5.averageRating, 2)))*SQRT(SUM(POWER(o4.score-o5.averageRating, 2))))
WHEN 'Pearson' THEN SUM((o3.score-o1.averageRating)*(o4.score-o2.averageRating))/(0.01+SQRT(SUM(POWER(o3.score-o1.averageRating, 2)))*SQRT(SUM(POWER(o4.score-o2.averageRating, 2))))
END as similarityRatio
FROM CTE_ALL o1
INNER JOIN dbo.Tracks o2 ON o2.id != @trackId
INNER JOIN dbo.Ratings o3 ON o3.trackId = o1.id
INNER JOIN dbo.Ratings o4 ON o4.trackId = o2.id AND o4.userId = o3.userId
INNER JOIN dbo.Users o5 ON o5.id = o4.userId
WHERE o2.id = ISNULL(@nTrackId, o2.id)
GROUP BY o2.id,
o2.name,
o2.releaseDate,
o2.numberOfRatings,
o2.averageRating
)