Is there a way to improve performance of this query (aggregation) - sql-server

Here is the query:
SELECT sdd.CompanyID
,sdd.ClassID
,sdd.PeriodID, SUM(sdd.Volume) AS VolumeTotal, SUM(sdd.Dollars) AS DollasTotal
,COUNT(LogID) as LogIDCount
FROM (SELECT dp.CompanyID
,ds.ClassID
,fs.PeriodID, fs.LogID, sum(fs.Volume) AS Volume,sum(fs.Dollars) AS Dollars
FROM DW.FactSupplyDataDetail fs WITH (NOLOCK)
JOIN DW.DimPLProvider dp WITH (NOLOCK)
ON fs.PLProviderID = dp.PLProviderID
JOIN DW.DimSupply ds WITH (NOLOCK)
ON fs.SupplyID = ds.SupplyID
WHERE fs.PeriodID between 201901 and 201907
GROUP BY dp.CompanyID
,ds.ClassID
,fs.PeriodID,fs.LogID) sdd
GROUP BY sdd.CompanyID
,sdd.ClassID
,sdd.PeriodID
Here is the execution plan for the query:
https://www.brentozar.com/pastetheplan/?id=rkoxSEjEH
DW.FactSupplyDataDetail has 10590237 records
DW.DimPLProvider has 5071 records
DW.DimSupply has 81001 records
The query returns 1992094 rows

Check that each table has an index whose leading column is the one the query filters or joins on:
Table FactSupplyDataDetail has an index starting with PeriodID
Table DimSupply has an index starting with SupplyID
Table DimPLProvider has an index starting with PLProviderID
"Table TABLE has an index starting with column COLUMN" means that an index (idx_xxx) is defined as:
CREATE INDEX idx_xxx on TABLE (COLUMN, some other columns or empty list);
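Applied to the tables in the question, that check might lead to something like the sketch below. The index names are hypothetical, and the INCLUDE lists are an assumption based only on the columns the posted query reads; existing indexes may already cover some of this.

```sql
-- Hypothetical index names; INCLUDE columns are assumed from the posted query.

-- Leading column matches the WHERE fs.PeriodID BETWEEN ... filter.
CREATE INDEX idx_FactSupplyDataDetail_PeriodID
    ON DW.FactSupplyDataDetail (PeriodID)
    INCLUDE (PLProviderID, SupplyID, LogID, Volume, Dollars);

-- Leading column matches the join fs.SupplyID = ds.SupplyID.
CREATE INDEX idx_DimSupply_SupplyID
    ON DW.DimSupply (SupplyID)
    INCLUDE (ClassID);

-- Leading column matches the join fs.PLProviderID = dp.PLProviderID.
CREATE INDEX idx_DimPLProvider_PLProviderID
    ON DW.DimPLProvider (PLProviderID)
    INCLUDE (CompanyID);
```

Comparing the execution plan before and after is the only reliable way to tell whether any of these actually helps.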

Related

Insert using Insert Into and Inner Join

I'm inserting rows of data from one table's column to another table's column. This is my work done:
Insert into [Inventory](Cost)
Select cast(a.[CCost] as numeric(18,6)) from [InventoryTemp] as a
Inner join [Inventory] as b on a.[ID] = b.[ID]
I have 10000 rows of data in my [Inventory] table (the ID column is filled in), but when the above query was executed, the Cost values were added as new rows 10001 through 20000 instead of filling the existing rows.
Inventory        InventoryTemp
ID    Cost       ID    Cost
1                1     3.12
3                3     9.90
18               18    8.80
The result I want
Inventory
ID    Cost
1     3.12
3     9.90
18    8.80
If I have read your question correctly, I think you are trying to update the values of the cost column in your Inventory table, based on the values in the InventoryTemp table.
Therefore you want to perform an UPDATE command rather than an INSERT.
An example of this would be:
UPDATE
Inventory
SET
Inventory.Cost = InventoryTemp.Cost
FROM
Inventory
INNER JOIN
InventoryTemp
ON
Inventory.ID = InventoryTemp.ID
For more info please see this question: How do I UPDATE from a SELECT in SQL Server?
You need to use UPDATE instead of INSERT:
UPDATE i
SET [Cost] = it.[Cost]
FROM [Inventory] i
INNER JOIN [InventoryTemp] it
ON i.ID = it.ID
Try using UPDATE:
UPDATE b
SET b.Cost = a.Cost
FROM
[InventoryTemp] as a
Inner join [Inventory] as b on a.[ID] = b.[ID]
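The UPDATE answers above can be combined with the CAST from the original INSERT. The sketch below assumes the source column is named CCost, as in the question's query (the sample data labels it Cost), and wraps the change in a transaction so the result can be inspected first.

```sql
-- Sketch only: assumes InventoryTemp.CCost is the source column, as in the question's INSERT.
BEGIN TRANSACTION;

-- Fill Cost on the existing rows instead of appending new ones.
UPDATE i
SET i.Cost = CAST(it.CCost AS numeric(18,6))
FROM Inventory AS i
INNER JOIN InventoryTemp AS it
    ON i.ID = it.ID;

-- Inspect the result before making it permanent.
SELECT ID, Cost
FROM Inventory
ORDER BY ID;

COMMIT TRANSACTION;  -- or ROLLBACK TRANSACTION if the values look wrong
```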

Why do I have duplicate records in my JOIN

I am retrieving data from the table ProductionReportMetrics, which has a column NetRate_QuoteID. I then need to add the Description column to that result set.
And in order to get a Description column, I need to join 3 tables:
NetRate_Quote_Insur_Quote
NetRate_Quote_Insur_Quote_Locat
NetRate_Quote_Insur_Quote_Locat_Liabi
But after that my premium is completely off.
What am I doing wrong here?
SELECT QLL.Description,
QLL.ClassCode,
prm.NetRate_QuoteID,
QL.LocationID,
ISNULL(SUM(premium),0) AS NetWrittenPremium,
MONTH(prm.EffectiveDate) AS EffMonth
FROM ProductionReportMetrics prm
LEFT JOIN NetRate_Quote_Insur_Quote Q
ON prm.NetRate_QuoteID = Q.QuoteID
INNER JOIN NetRate_Quote_Insur_Quote_Locat QL
ON Q.QuoteID = QL.QuoteID
INNER JOIN NetRate_Quote_Insur_Quote_Locat_Liabi QLL
ON QL.LocationID = QLL.LocationID
WHERE YEAR(prm.EffectiveDate) = 2016 AND
CompanyLine = 'Ironshore Insurance Company'
GROUP BY MONTH(prm.EffectiveDate),
QLL.Description,
QLL.ClassCode,
prm.NetRate_QuoteID,
QL.LocationID
I think the problem is in this table:
What Am I missing in this Query?
select
ClassCode,
QLL.Description,
sum(Premium)
from ProductionReportMetrics prm
LEFT JOIN NetRate_Quote_Insur_Quote Q ON prm.NetRate_QuoteID = Q.QuoteID
LEFT JOIN NetRate_Quote_Insur_Quote_Locat QL ON Q.QuoteID = QL.QuoteID
LEFT JOIN
(SELECT * FROM NetRate_Quote_Insur_Quote_Locat_Liabi nqI
JOIN ( SELECT LocationID, MAX(ClassCode)
FROM NetRate_Quote_Insur_Quote_Locat_Liabi GROUP BY LocationID ) nqA
ON nqA.LocationID = nqI.LocationID ) QLL ON QLL.LocationID = QL.LocationID
where Year(prm.EffectiveDate) = 2016 AND CompanyLine = 'Ironshore Insurance Company'
GROUP BY Q.QuoteID,QL.QuoteID,QL.LocationID
Now it says
Msg 8156, Level 16, State 1, Line 14
The column 'LocationID' was specified multiple times for 'QLL'.
It looks like DVT basically hit on the answer. The only reason you would get different amounts (i.e., duplicated rows) as a result of a join is that one of the joined tables does not have a 1:1 relationship with the primary table.
I would suggest you do a quick check against those tables, comparing row counts.
--this should be your baseline count
SELECT COUNT(*)
FROM ProductionReportMetrics prm
GROUP BY MONTH(prm.EffectiveDate),
prm.NetRate_QuoteID
--this will be a check against the first joined table.
SELECT COUNT(*)
FROM NetRate_Quote_Insur_Quote Q
WHERE QuoteID IN
(SELECT prm.NetRate_QuoteID
FROM ProductionReportMetrics prm
GROUP BY MONTH(prm.EffectiveDate),
prm.NetRate_QuoteID)
Basically you will want to do a similar check against each of your joined tables. If any of the joined tables are part of the grouping statement, make sure they are also in the grouping of the count check statement. Also make sure to alter the WHERE clause of the check count statement to use the join clause columns you were using.
Once you find a table that returns the incorrect number of rows, you will have your answer as to which table is causing the problem. Then you will just have to decide how to limit that table down to distinct rows (some type of aggregation).
This advice is really just to show you how to QA this particular query. Break it up into the smallest possible parts. In this case, we know that it is a join that is causing the problem, so take it one join at a time until you find the offender.
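One way to take it "one join at a time" is to look for keys that appear more than once on the many side of each join. This is a sketch using the table and column names from the question; it assumes QuoteID and LocationID are the only join columns involved.

```sql
-- Locations per quote: every row returned here multiplies the premium for that quote.
SELECT QL.QuoteID, COUNT(*) AS LocationRows
FROM NetRate_Quote_Insur_Quote_Locat AS QL
GROUP BY QL.QuoteID
HAVING COUNT(*) > 1;

-- Liability rows per location: the same check for the next join.
SELECT QLL.LocationID, COUNT(*) AS LiabilityRows
FROM NetRate_Quote_Insur_Quote_Locat_Liabi AS QLL
GROUP BY QLL.LocationID
HAVING COUNT(*) > 1;
```

If either query returns rows, that join fans out, and SUM(premium) will count the same premium once per matching row.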

When a SQL Server query returns no rows(NOT null rows) how do you include that in an aggregate function?

I'm writing a query to look for courses that do not have any of their gradable items graded.
In Blackboard, when a user doesn't have a grade at all (e.g. no attempt was ever made), there simply isn't a row in the table gradebook_grade.
So if a course doesn't have any grades at all, the gradebook_grade table does not have any rows referencing the primary key of the Blackboard course_id.
This is what I've used so far:
use bblearn
select cm.course_id
from course_main cm
join gradebook_main gbm on cm.pk1 = gbm.crsmain_pk1
join gradebook_grade gbg on gbm.pk1 = gbg.gradebook_main_pk1
where cm.pk1 = 3947
group by cm.course_id
having count(gbg.pk1) <= 0
The course in question (pk1 3947) is confirmed to not have any grades. So SQL Server says 0 rows affected, naturally.
The thing is, it doesn't select the course_id. I'm guessing HAVING doesn't account for blank/non-existent rows. Is there a way to structure the query to get the course ID when there isn't anything returned? Am I joining on the wrong columns or using WHERE on the wrong column? Any help is appreciated.
Use a left join
select cm.course_id
from course_main cm
left join gradebook_main gbm on cm.pk1 = gbm.crsmain_pk1
left join gradebook_grade gbg on gbm.pk1 = gbg.gradebook_main_pk1
where cm.pk1 = 3947
group by cm.course_id
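As written, the left-join query returns the course whether or not it has grades. To list only courses with no grades, the HAVING clause from the question can be kept. This is a sketch using the same tables; the NOT EXISTS form is an alternative, not something the answer above proposed.

```sql
-- LEFT JOIN keeps the course row even when no grade rows match.
SELECT cm.course_id
FROM course_main AS cm
LEFT JOIN gradebook_main AS gbm ON cm.pk1 = gbm.crsmain_pk1
LEFT JOIN gradebook_grade AS gbg ON gbm.pk1 = gbg.gradebook_main_pk1
WHERE cm.pk1 = 3947
GROUP BY cm.course_id
HAVING COUNT(gbg.pk1) = 0;  -- COUNT(column) ignores the NULLs produced by LEFT JOIN

-- Alternative: ask directly for courses with no grade rows at all.
SELECT cm.course_id
FROM course_main AS cm
WHERE cm.pk1 = 3947
  AND NOT EXISTS (
      SELECT 1
      FROM gradebook_main AS gbm
      INNER JOIN gradebook_grade AS gbg ON gbm.pk1 = gbg.gradebook_main_pk1
      WHERE gbm.crsmain_pk1 = cm.pk1
  );
```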

LEFT JOIN gets heavy as the number of records in the second table increases

I am trying to run a SELECT query using LEFT JOIN. I get a COUNT on my second table (the table on the right side of the LEFT JOIN). This process becomes slightly heavy as the number of records in the second table goes up. My first and second tables have a one-to-many relationship. The second table's CampaignRunId column is a foreign key to the first table's Id. This is a simplified version of my query:
SELECT a.[Id]
,a.CampaignId
,a.[Inserted] AS 'Date'
,COUNT(b.Id) AS 'Received'
FROM [CampaignRun] AS a
LEFT JOIN [CampaignRecipient] AS b
ON a.Id = b.CampaignRunId
GROUP BY
a.[Id], a.CampaignId,a.[Inserted]
HAVING
a.CampaignId = 637
ORDER BY
a.[Inserted] DESC
The number 637 is just an example value for one of the records.
Is there a way to make this query run faster?
Use a sub-select to calculate Received:
SELECT a.[Id]
,a.CampaignId
,a.[Inserted] AS 'Date'
, (SELECT COUNT(*) FROM [CampaignRecipient] AS b
WHERE a.Id = b.CampaignRunId ) AS 'Received'
FROM [CampaignRun] AS a
WHERE a.CampaignId = 637
ORDER BY a.[Inserted] DESC
You have an unneeded HAVING clause here; its condition can be moved to the WHERE clause so rows are filtered before grouping:
SELECT a.[Id]
,a.CampaignId
,a.[Inserted] AS 'Date'
,COUNT(b.Id) AS 'Received'
FROM [CampaignRun] AS a
LEFT JOIN [CampaignRecipient] AS b
ON a.Id = b.CampaignRunId
WHERE a.CampaignId = 637
GROUP BY a.[Id], a.CampaignId,a.[Inserted]
ORDER BY a.[Inserted] DESC
Also ensure that you have an index on the foreign key column CampaignRunId in the [CampaignRecipient] table. Indexing foreign keys is considered good practice.
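A minimal sketch of such an index follows. The index name is hypothetical, and whether an equivalent index already exists should be checked first.

```sql
-- Hypothetical index name; supports the join a.Id = b.CampaignRunId.
CREATE NONCLUSTERED INDEX IX_CampaignRecipient_CampaignRunId
    ON [CampaignRecipient] (CampaignRunId);
```

If Id is the clustered key of [CampaignRecipient], it is carried in every nonclustered index automatically, so COUNT(b.Id) can be answered from this index alone.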

Why would SQL Server choose Clustered Index Scan over Non-Clustered one?

In one of the tables I am querying, a clustered index was created over a key that's not a primary key. (I don't know why.)
However, there's a non-clustered index for the primary key for this table.
In the execution plan, SQL is choosing the clustered index, rather than the non-clustered index for the primary key which I am using in my query.
Is there a reason why SQL would do this? How can I force SQL to choose the non-clustered index instead?
Appending more detail:
The table has many fields and the query contains many joins. Let me abstract it a bit.
The table definition looks like this:
SlowTable
[SlowTable_id] [int] IDENTITY(200000000,1) NOT NULL,
[fk1Field] [int] NULL,
[fk2Field] [int] NULL,
[other1Field] [varchar] NULL,
etc. etc...
and then the indices for this table are:
fk1Field (Clustered)
SlowTable_id (Non-Unique, Non-Clustered)
fk2Field (Non-Unique, Non-Clustered)
... and 14 other Non-Unique, Non-Clustered indices on other fields
Presumably there are lots more queries made against fk1Field which is why they selected this as the basis for the Clustered index.
The query I have uses a view:
SELECT
[field list]
FROM
SourceTable1 S1
INNER JOIN SourceTable2 S2
ON S2.S2_id = S1.S2_id
INNER JOIN SourceTable3 S3
ON S3.S3_id = S2.S3_id
INNER JOIN SlowTable ST
ON ST.SlowTable_id = S1.SlowTable_id
INNER JOIN [many other tables, around 7 more...]
The execution plan is quite big; the relevant nodes show
Hash Match
(Inner Join)
Cost: 9%
with a thick arrow pointing to
Clustered Index Scan (Clustered)
SlowTable.fk1Field
Cost: 77%
I hope this provides enough detail on the issue.
Thanks!
ADDENDUM 2:
Correction to my previous post. The view doesn't have a where clause. It is just a series of inner joins. The execution plan was taken from an Insert statement that uses the View (listed as SLOW_VIEW) in a complex query that looks like the following:
(What this stored procedure does is to do a proportional split of the total amount of some records, based on weights, computed as percentage against a total. This mimics distributing a value from, say, one account, to other accounts.)
INSERT INTO dbo.WDTD(
FieldA,
FieldB,
GWB_id,
C_id,
FieldC,
PG_id,
FieldD,
FieldE,
O_id,
FieldF,
FieldG,
FieldH,
FieldI,
GWBIH_id,
T_id,
JO_id,
PC_id,
PP_id,
FieldJ,
FieldK,
FieldL,
FieldM,
FieldN,
FieldO,
FieldP,
FieldQ,
FieldS)
SELECT DISTINCT
@FieldA FieldA,
GETDATE() FieldB,
@Parameter1 GWB_id,
GWBIH.C_id C_id,
P.FieldT FieldC,
P.PG_id PG_id,
PAM.FieldD FieldD,
PP.FieldU FieldE,
GWBIH.O_id O_id,
CO.FieldF FieldF,
CO.FieldG FieldG,
PSAM.FieldH FieldH,
PSAM.FieldI FieldI,
SOURCE.GWBIH_id GWBIH_id,
' ' T_id,
GWBIH.JO_id JO_id,
SOURCE.PC_id PC_id,
GWB.PP_id,
SOURCE.FieldJ FieldJ,
1 FieldK,
ROUND((SUM(GWBIH.Total) / AGG.Total) * SOURCE.Total, 2) FieldL,
ROUND((SUM(GWBIH.Total) / AGG.Total) * SOURCE.Total, 2) FieldM,
0 FieldN,
' ' FieldO,
ESGM.FieldP_flag FieldP,
SOURCE.FieldQ FieldQ,
'[UNPROCESSED]'
FROM
dbo.Table1 GWBIH
INNER JOIN dbo.Table2 GWBPH
ON GWBPH.GWBP_id = GWBIH.GWBP_id
INNER JOIN dbo.Table3 GWB
ON GWB.GWB_id = GWBPH.GWB_id
INNER JOIN dbo.Table4 P
ON P.P_id = GWBPH.P_id
INNER JOIN dbo.Table5 ESGM
ON ESGM.ET_id = P.ET_id
INNER JOIN dbo.Table6 PAM
ON PAM.PG_id = P.PG_id
INNER JOIN dbo.Table7 O
ON O.dboffcode = GWBIH.O_id
INNER JOIN dbo.Table8 CO
ON
CO.Country_id = O.Country_id
AND CO.Brand_id = O.Brand_id
INNER JOIN dbo.Table9 PSAM
ON PSAM.Office_id = GWBIH.O_id
INNER JOIN dbo.Table10 PCM
ON PCM.PC_id = GWBIH.PC_id
INNER JOIN dbo.Table11 PC
ON PC.PC_id = GWBIH.PC_id
INNER JOIN dbo.Table12 PP
ON PP.PP_id = GWB.PP_id
-- THIS IS THE VIEW THAT CONTAINS THE CLUSTERED INDEX SCAN
INNER JOIN dbo.SLOW_VIEW GL
ON GL.JO_id = GWBIH.JO_id
INNER JOIN (
SELECT
GWBIH.C_id C_id,
GWBPH.GWB_id,
SUM(GWBIH.Total) Total
FROM
dbo.Table1 GWBIH
INNER JOIN dbo.Table2 GWBPH
ON GWBPH.GWBP_id = GWBIH.GWBP_id
INNER JOIN dbo.Table10 PCM
ON PCM.PC_id = GWBIH.PC_id
WHERE
PCM.Split_flag = 0
AND GWBIH.JO_id IS NOT NULL
GROUP BY
GWBIH.C_id,
GWBPH.GWB_id
) AGG
ON AGG.C_id = GWBIH.C_id
AND AGG.GWB_id = GWBPH.GWB_id
INNER JOIN (
SELECT
GWBIH.GWBIH_id GWBIH_id,
GWBIH.C_id C_id,
GWBIH.FieldQ FieldQ,
GWBP.GWB_id GWB_id,
PCM.PC_id PC_id,
CASE
WHEN WT.FieldS IS NOT NULL
THEN WT.FieldS
WHEN WT.FieldS IS NULL
THEN PCMS.FieldT
END FieldJ,
SUM(GWBIH.Total) Total
FROM
dbo.Table1 GWBIH
INNER JOIN dbo.Table2 GWBP
ON GWBP.GWBP_id = GWBIH.GWBP_id
INNER JOIN dbo.Table4 P
ON P.P_id = GWBP.P_id
INNER JOIN dbo.Table10 PCM
ON PCM.PC_id = GWBIH.PC_id
INNER JOIN dbo.Table11 PCMS
ON PCMS.PC_id = PCM.PC_id
LEFT JOIN dbo.WT WT
ON WT.ET_id = P.ET_id
AND WT.PC_id = GWBIH.PC_id
WHERE
PCM.Split_flag = 1
GROUP BY
GWBIH.GWBI_id,
GWBIH.C_id,
GWBIH.FieldQ,
GWBP.GWB_id,
WT.FieldS,
PCM.PC_id,
PCMS.ImportCode
) SOURCE
ON SOURCE.C_id = GWBIH.C_id
AND SOURCE.GWB_id = GWBPH.GWB_id
WHERE
PCM.Split_flag = 0
AND AGG.Total > 0
AND GWBPH.GWB_id = @Parameter1
AND NOT EXISTS (
SELECT *
FROM dbo.WDTD TD
WHERE
TD.C_id = GWBIH.C_id
AND TD.FieldA = GWBPH.GWB_id
AND TD.JO_id = GWBIH.JO_id
AND TD.PC_id = SOURCE.PC_id
AND TD.GWBIH_id = ' ')
GROUP BY
GWBIH.C_id,
P.FieldT,
GWBIH.JO_id,
GWBIH.O_id,
GWBPH.GWB_id,
P.PG_id,
PAM.FieldD,
PP.FieldU,
GWBIH.O_id,
CO.FieldF,
CO.FieldG,
PSAM.FieldH,
PSAM.FieldI,
GWBIH.JO_id,
SOURCE.PC_id,
GWB.PP_id,
SOURCE.FieldJ,
ESGM.FieldP_flag,
SOURCE.GWBIH_id,
SOURCE.FieldQ,
AGG.Total,
SOURCE.Total
ADDENDUM 3: When doing an execution plan on the select statement of the view, I see this:
Hash Match              <====  Bitmap           <------ etc...
(Inner Join)                   (Bitmap Create)
Cost: 0%                       Cost: 0%
    ^
    |
    |
Parallelism                    Clustered Index Scan (Clustered)
(Repartition Streams)   <====  Slow_Table.fk1Field
Cost: 1%                       Cost: 98%
ADDENDUM 4: I think I found the problem. The Clustered Index Scan isn't referring to my clause that references the Primary Key, but rather another clause that needs a field that is, in some way, related to fk1Field above.
Most likely one of:
too many rows to make the index effective
index doesn't fit the ON/WHERE conditions
index isn't covering and SQL Server avoids a key lookup
Edit, after update:
Your indexes are useless because they are all single-column indexes, so SQL Server does a clustered index scan.
You need an index that matches your ON, WHERE, and GROUP BY conditions, with INCLUDE columns for your SELECT list.
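For the abstracted schema in the question, such an index might look like the sketch below. The index name is hypothetical, and the INCLUDE list is a placeholder: it should hold whichever SlowTable columns the view actually selects.

```sql
-- Hypothetical index name. The key column matches the join predicate
-- ST.SlowTable_id = S1.SlowTable_id; fk2Field and other1Field stand in for
-- the columns the view really reads from SlowTable.
CREATE NONCLUSTERED INDEX IX_SlowTable_SlowTable_id_covering
    ON SlowTable (SlowTable_id)
    INCLUDE (fk2Field, other1Field);
```

The clustering key (fk1Field here) is stored in every nonclustered index already, so it does not need to be listed.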
If the query you're executing isn't selecting a small subset of the records, SQL Server may well choose to ignore any "otherwise useful" non-clustered index and just scan the clustered index (in this instance, most likely all rows in the table). The logic is that the I/O required to run the query through the non-clustered index, including the lookups back to the table, outweighs the I/O required for a full scan.
If you can post the schema of your table(s) + a sample query, I'm sure we can offer more information.
Ideally you shouldn't be telling SQL Server to do one or the other; it can pick the best option if you give it a good query. Query hints were created to steer the engine a bit, but you shouldn't have to use them just yet.
Sometimes it is beneficial to cluster the table on something other than the primary key. It is rare, but it can be useful (the clustering controls the data layout, while the primary key ensures correctness).
I can tell you exactly why SQL Server picks the clustered index if you show me your query and schema; otherwise I'd only be guessing at likely causes. The execution plan is also helpful in these cases.
For a non-clustered index to be considered, it has to be meaningful to the query, and if your non-clustered index doesn't cover your query, there's no guarantee that it will be used at all.
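If you still want to see what the non-clustered index would cost, a table hint can force it for a side-by-side comparison of plans. This is a diagnostic sketch only; the index name is hypothetical and the column list is a placeholder from the abstracted schema.

```sql
-- For comparison only: force the non-clustered index, then compare this plan
-- with the unhinted one before deciding anything.
SELECT ST.SlowTable_id, ST.fk2Field
FROM SlowTable AS ST WITH (INDEX(IX_SlowTable_SlowTable_id))  -- hypothetical index name
INNER JOIN SourceTable1 AS S1
    ON ST.SlowTable_id = S1.SlowTable_id;
```

If the hinted plan is more expensive, the optimizer's choice of the clustered index scan was reasonable for this query.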
A clustered index scan is essentially a table scan (on a table that happens to have a clustered index). You really should post your statement to get a better answer. Your WHERE clause may not be sargable (see SARGs, or search arguments), or, if you are selecting many records, SQL Server may scan the table rather than use the index and then have to look up the related columns.
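To illustrate the sargability point, here is a sketch against a hypothetical table dbo.Orders that has an index on OrderDate. None of these names come from the question.

```sql
-- Hypothetical table dbo.Orders with an index on OrderDate.

-- Not sargable: the function wraps the column, so the index cannot be used for a seek.
SELECT OrderID
FROM dbo.Orders
WHERE YEAR(OrderDate) = 2016;

-- Sargable: the bare column is compared with a range, so an index seek is possible.
SELECT OrderID
FROM dbo.Orders
WHERE OrderDate >= '20160101'
  AND OrderDate <  '20170101';
```

Both queries return the same rows; only the second gives the optimizer a predicate it can match directly against the index.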
