Why would SQL Server choose Clustered Index Scan over Non-Clustered one? - sql-server
In one of the tables I am querying, a clustered index was created over a key that's not a primary key. (I don't know why.)
However, there's a non-clustered index for the primary key for this table.
In the execution plan, SQL is choosing the clustered index, rather than the non-clustered index for the primary key which I am using in my query.
Is there a reason why SQL would do this? How can I force SQL to choose the non-clustered index instead?
Appending more detail:
The table has many fields and the query contains many joins. Let me abstract it a bit.
The table definition looks like this:
SlowTable
[SlowTable_id] [int] IDENTITY(200000000,1) NOT NULL,
[fk1Field] [int] NULL,
[fk2Field] [int] NULL,
[other1Field] [varchar] NULL,
etc. etc...
and then the indices for this table are:
fk1Field (Clustered)
SlowTable_id (Non-Unique, Non-Clustered)
fk2Field (Non-Unique, Non-Clustered)
... and 14 other Non-Unique, Non-Clustered indices on other fields
Presumably there are lots more queries made against fk1Field which is why they selected this as the basis for the Clustered index.
The query I have uses a view:
SELECT
[field list]
FROM
SourceTable1 S1
INNER JOIN SourceTable2 S2
ON S2.S2_id = S1.S2_id
INNER JOIN SourceTable3 S3
ON S3.S3_id = S2.S3_id
INNER JOIN SlowTable ST
ON ST.SlowTable_id = S1.SlowTable_id
INNER JOIN [many other tables, around 7 more...]
The execution plan is quite big, with the nodes concerned say
Hash Match
(Inner Join)
Cost: 9%
with a thick arrow pointing to
Clustered Index Scan (Clustered)
SlowTable.fk1Field
Cost: 77%
I hope this provides enough detail on the issue.
Thanks!
ADDENDUM 2:
Correction to my previous post. The view doesn't have a where clause. It is just a series of inner joins. The execution plan was taken from an Insert statement that uses the View (listed as SLOW_VIEW) in a complex query that looks like the following:
(What this stored procedure does is to do a proportional split of the total amount of some records, based on weights, computed as percentage against a total. This mimics distributing a value from, say, one account, to other accounts.)
INSERT INTO dbo.WDTD(
FieldA,
FieldB,
GWB_id,
C_id,
FieldC,
PG_id,
FieldD,
FieldE,
O_id,
FieldF,
FieldG,
FieldH,
FieldI,
GWBIH_id,
T_id,
JO_id,
PC_id,
PP_id,
FieldJ,
FieldK,
FieldL,
FieldM,
FieldN,
FieldO,
FieldP,
FieldQ,
FieldS)
SELECT DISTINCT
@FieldA FieldA,
GETDATE() FieldB,
@Parameter1 GWB_id,
GWBIH.C_id C_id,
P.FieldT FieldC,
P.PG_id PG_id,
PAM.FieldD FieldD,
PP.FieldU FieldE,
GWBIH.O_id O_id,
CO.FieldF FieldF,
CO.FieldG FieldG,
PSAM.FieldH FieldH,
PSAM.FieldI FieldI,
SOURCE.GWBIH_id GWBIH_id,
' ' T_id,
GWBIH.JO_id JO_id,
SOURCE.PC_id PC_id,
GWB.PP_id,
SOURCE.FieldJ FieldJ,
1 FieldK,
ROUND((SUM(GWBIH.Total) / AGG.Total) * SOURCE.Total, 2) FieldL,
ROUND((SUM(GWBIH.Total) / AGG.Total) * SOURCE.Total, 2) FieldM,
0 FieldN,
' ' FieldO,
ESGM.FieldP_flag FieldP,
SOURCE.FieldQ FieldQ,
'[UNPROCESSED]'
FROM
dbo.Table1 GWBIH
INNER JOIN dbo.Table2 GWBPH
ON GWBPH.GWBP_id = GWBIH.GWBP_id
INNER JOIN dbo.Table3 GWB
ON GWB.GWB_id = GWBPH.GWB_id
INNER JOIN dbo.Table4 P
ON P.P_id = GWBPH.P_id
INNER JOIN dbo.Table5 ESGM
ON ESGM.ET_id = P.ET_id
INNER JOIN dbo.Table6 PAM
ON PAM.PG_id = P.PG_id
INNER JOIN dbo.Table7 O
ON O.dboffcode = GWBIH.O_id
INNER JOIN dbo.Table8 CO
ON
CO.Country_id = O.Country_id
AND CO.Brand_id = O.Brand_id
INNER JOIN dbo.Table9 PSAM
ON PSAM.Office_id = GWBIH.O_id
INNER JOIN dbo.Table10 PCM
ON PCM.PC_id = GWBIH.PC_id
INNER JOIN dbo.Table11 PC
ON PC.PC_id = GWBIH.PC_id
INNER JOIN dbo.Table12 PP
ON PP.PP_id = GWB.PP_id
-- THIS IS THE VIEW THAT CONTAINS THE CLUSTERED INDEX SCAN
INNER JOIN dbo.SLOW_VIEW GL
ON GL.JO_id = GWBIH.JO_id
INNER JOIN (
SELECT
GWBIH.C_id C_id,
GWBPH.GWB_id,
SUM(GWBIH.Total) Total
FROM
dbo.Table1 GWBIH
INNER JOIN dbo.Table2 GWBPH
ON GWBPH.GWBP_id = GWBIH.GWBP_id
INNER JOIN dbo.Table10 PCM
ON PCM.PC_id = GWBIH.PC_id
WHERE
PCM.Split_flag = 0
AND GWBIH.JO_id IS NOT NULL
GROUP BY
GWBIH.C_id,
GWBPH.GWB_id
) AGG
ON AGG.C_id = GWBIH.C_id
AND AGG.GWB_id = GWBPH.GWB_id
INNER JOIN (
SELECT
GWBIH.GWBIH_id GWBIH_id,
GWBIH.C_id C_id,
GWBIH.FieldQ FieldQ,
GWBP.GWB_id GWB_id,
PCM.PC_id PC_id,
CASE
WHEN WT.FieldS IS NOT NULL
THEN WT.FieldS
WHEN WT.FieldS IS NULL
THEN PCMS.FieldT
END FieldJ,
SUM(GWBIH.Total) Total
FROM
dbo.Table1 GWBIH
INNER JOIN dbo.Table2 GWBP
ON GWBP.GWBP_id = GWBIH.GWBP_id
INNER JOIN dbo.Table4 P
ON P.P_id = GWBP.P_id
INNER JOIN dbo.Table10 PCM
ON PCM.PC_id = GWBIH.PC_id
INNER JOIN dbo.Table11 PCMS
ON PCMS.PC_id = PCM.PC_id
LEFT JOIN dbo.WT WT
ON WT.ET_id = P.ET_id
AND WT.PC_id = GWBIH.PC_id
WHERE
PCM.Split_flag = 1
GROUP BY
GWBIH.GWBIH_id,
GWBIH.C_id,
GWBIH.FieldQ,
GWBP.GWB_id,
WT.FieldS,
PCM.PC_id,
PCMS.ImportCode
) SOURCE
ON SOURCE.C_id = GWBIH.C_id
AND SOURCE.GWB_id = GWBPH.GWB_id
WHERE
PCM.Split_flag = 0
AND AGG.Total > 0
AND GWBPH.GWB_id = @Parameter1
AND NOT EXISTS (
SELECT *
FROM dbo.WDTD TD
WHERE
TD.C_id = GWBIH.C_id
AND TD.FieldA = GWBPH.GWB_id
AND TD.JO_id = GWBIH.JO_id
AND TD.PC_id = SOURCE.PC_id
AND TD.GWBIH_id = ' ')
GROUP BY
GWBIH.C_id,
P.FieldT,
GWBIH.JO_id,
GWBIH.O_id,
GWBPH.GWB_id,
P.PG_id,
PAM.FieldD,
PP.FieldU,
GWBIH.O_id,
CO.FieldF,
CO.FieldG,
PSAM.FieldH,
PSAM.FieldI,
GWBIH.JO_id,
SOURCE.PC_id,
GWB.PP_id,
SOURCE.FieldJ,
ESGM.FieldP_flag,
SOURCE.GWBIH_id,
SOURCE.FieldQ,
AGG.Total,
SOURCE.Total
ADDENDUM 3: When doing an execution plan on the select statement of the view, I see this:
Hash Match     <====  Bitmap           <------ etc...
(Inner Join)          (Bitmap Create)
Cost: 0%              Cost: 0%
    ^
    |
    |
Parallelism                   Clustered Index Scan (Clustered)
(Repartition Streams)  <====  Slow_Table.fk1Field
Cost: 1%                      Cost: 98%
ADDENDUM 4: I think I found the problem. The Clustered Index Scan isn't referring to my clause that references the Primary Key, but rather another clause that needs a field that is, in some way, related to fk1Field above.
Most likely one of:
too many rows to make the index effective
index doesn't fit the ON/WHERE conditions
index isn't covering and SQL Server avoids a key lookup
Edit, after update:
Your indexes don't help this query because they are all single-column indexes: none of them covers the query, so SQL Server does a clustered index scan.
You need an index that matches your ON, WHERE, and GROUP BY conditions, with INCLUDE columns for your SELECT list.
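As a rough illustration of the kind of index described above, here is a minimal T-SQL sketch. It assumes the join on SlowTable_id is the relevant predicate, that fk2Field and other1Field are the only other SlowTable columns the query reads, and that the table lives in the dbo schema. The index name is made up. The real key and INCLUDE lists would have to come from the actual ON, WHERE, GROUP BY and SELECT clauses.

```sql
-- Hypothetical covering index for the join on SlowTable_id.
-- Key column: the column used in the ON clause.
-- INCLUDE columns: the other SlowTable columns the query reads (assumed here),
-- so the plan does not need a key lookup into the clustered index.
CREATE NONCLUSTERED INDEX IX_SlowTable_SlowTable_id_covering
    ON dbo.SlowTable (SlowTable_id)
    INCLUDE (fk2Field, other1Field);
```

The clustering key (fk1Field here) is carried in every non-clustered index row anyway, so it does not need to be listed in INCLUDE.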
If the query you're executing isn't selecting a small subset of the records, SQL Server may well choose to ignore any "otherwise useful" non-clustered index and just scan through the clustered index (in this instance, most likely reading all rows in the table). The logic is that the I/O required to seek the non-clustered index and then look up the remaining columns row by row outweighs the I/O required for a full scan.
If you can post the schema of your table(s) + a sample query, I'm sure we can offer more information.
Ideally you shouldn't be telling SQL Server which one to use; it can pick the best plan if you give it a good query. Query hints were created to steer the engine a bit, but you shouldn't have to use them just yet.
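For completeness, since the question asks how to force the non-clustered index: a table hint can name the index explicitly. The sketch below reuses the table aliases from the question's view; the index name IX_SlowTable_SlowTable_id, the dbo schema, and the column list are assumptions. It is probably more useful for comparing the two plans than as a permanent fix.

```sql
-- Report logical reads so the hinted and unhinted plans can be compared.
SET STATISTICS IO ON;

SELECT ST.SlowTable_id, ST.fk2Field
FROM dbo.SourceTable1 AS S1
INNER JOIN dbo.SlowTable AS ST
    WITH (INDEX (IX_SlowTable_SlowTable_id))  -- hypothetical index name
    ON ST.SlowTable_id = S1.SlowTable_id;

SET STATISTICS IO OFF;
```

If the hinted version shows more logical reads than the unhinted one, the optimizer's choice of the clustered index scan was the cheaper plan for this query.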
Sometimes it is beneficial to cluster the table on something other than the primary key. It is rare, but it can be useful (the clustering controls the data layout, while the primary key ensures correctness).
I can tell you exactly why SQL Server picks the clustered index if you show me your query and schema; otherwise I'd only be guessing at likely causes. The execution plan is helpful in these cases too.
For a non-clustered index to be considered, it has to be meaningful to the query, and if your non-clustered index doesn't cover your query, there's no guarantee that it will be used at all.
A clustered index scan is essentially a table scan (on a table that happens to have a clustered index). You really should post your statement to get a better answer. Your WHERE clause may not be sargable (see SARGs, "search arguments"), or, if you are selecting many records, SQL Server may prefer to scan the table rather than use the index and then have to look up the related columns.
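To make the sargability point concrete, here is a small sketch using columns from the question's SlowTable. The literal 42 and the dbo schema are made up for illustration. Wrapping the column in a function generally prevents an index seek on it, while comparing the bare column allows one.

```sql
-- Not sargable: the function wraps the column, so an index on fk2Field
-- is typically scanned rather than used for a seek.
SELECT ST.SlowTable_id
FROM dbo.SlowTable AS ST
WHERE ISNULL(ST.fk2Field, 0) = 42;

-- Sargable rewrite that returns the same rows (a NULL never equals 42).
SELECT ST.SlowTable_id
FROM dbo.SlowTable AS ST
WHERE ST.fk2Field = 42;
```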
Related
SQL Server Index over lookup table of distinct values
I am trying to speed up the following SQL Server query: SELECT V.Id, V.Number, V.VisitDate, V.ArrivalTime, V.VisitKindId, VK.Description AS VisitKindDescription, VK.DescriptionAr AS VisitKindDescriptionAr, V.StatusId, V.Note, V.CancelingReason, V.CancelingTime, V.EnterToDoctorRoomTime, V.PatientId, P.Number AS PatientNumber, P.FirstName, P.LastName, P.BirthDate, P.Note AS PatientNotes, V.DoctorId, D.FullName AS DoctorFullName, V.CreatedById, U.FullName AS UserFullName, V.CreationDate, V.VersionNo FROM Patient_Tbl P INNER JOIN Visit_Tbl V ON P.Id = V.PatientId INNER JOIN VisitKind_Tbl VK ON V.VisitKindId = VK.Id INNER JOIN Doctor_Tbl D ON V.DoctorId = D.Id INNER JOIN User_Tbl U ON V.CreatedById = U.Id INNER JOIN VisitStatus_Tbl VS ON V.StatusId = VS.Id WHERE V.StatusId = 2 --patient is in doctor room and we had the following 4 values the VisitStatus_Tbl: (1 -> In Waiting Room, 2 -> In Doctor Room, 3 -> Canceled, 4 -> Completed) and in one moment of time, there is only one record on the Visits table for one person in the doctor's room. The end-user inform me that there is a delay in the use case that depends on the above query. Please help us speed system performance by suggesting the proper index. Thanks,
You do not indicate if you have any indexes on the tables now. I will assume that the 'ID' columns for patient_tbl, etc. are clustered primary keys or just primary keys and have indexes. If not, that is another problem.
Simple rule: start by indexing foreign keys (lookup tables) and WHERE clauses.
CREATE INDEX ix_visit_tbl_statusid ON visit_tbl(statusId)
CREATE INDEX ix_visit_tbl_patientid ON visit_tbl(patientId)
CREATE INDEX ix_visit_tbl_visitkindId ON visit_tbl(visitkindId)
CREATE INDEX ix_visit_tbl_doctorid ON visit_tbl(doctorId)
CREATE INDEX ix_visit_tbl_createdbyid ON visit_tbl(createdbyId)
Now for the comments on how that is too many indexes. It depends ...
WHERE clause gives poor query plan
I'm not sure how best to tune this query and/ or indexes to avoid a blunt FORCE ORDER hint. This main query runs fine, currently returns 0 rows in 0 seconds: SELECT S1.ID, S.LOAD_DATE, s.Deleted,S1.HUB_FORM_ID FROM #TMP S INNER JOIN HUB_FORM H1 ON H1.Form_ID = S.HUB_FORM_BK INNER JOIN HUB_ORG H2 ON H2.Organisation_ID = S.HUB_ORG_BK INNER JOIN HUB_PERSON H3 ON H3.person_id = S.HUB_PERSON_BK INNER JOIN HUB_EVENT H4 ON H4.job_id = S.HUB_EVENT_BK INNER JOIN HUB_WORKFLOW_STEP H5 ON H5.step_id = S.HUB_WORKFLOW_STEP_BK INNER JOIN LNK_FORM_ENTITY S1 ON H1.HUB_FORM_ID = S1.HUB_FORM_ID AND H2.HUB_ORG_ID = S1.HUB_ORG_ID AND H3.HUB_PERSON_ID = S1.HUB_PERSON_ID AND H4.HUB_EVENT_ID = S1.HUB_EVENT_ID AND H5.HUB_WORKFLOW_STEP_ID = S1.HUB_WORKFLOW_STEP_ID INNER JOIN DK_SAT_LNK_FORM_ENTITY S2 ON S1.ID = S2.Parent_ID Adding a WHERE clause on S2.LOAD_DATE_TO makes it run and run (killed off after a minute or two). WHERE S2.LOAD_DATE_TO = '31/12/9999' I'm not sure why that happens as: Without the filter, no rows are returned, so it can make no difference. The index used for the table containing this field in the good plan (with no date filter), already contains that field as the second key field so I'd have thought any additional cost is negligible NB - it doesn't always return 0 rows, but it needs to run (and complete in a reasonable time) whether rows are returned or not. CREATE NONCLUSTERED INDEX [JM_TEST_190221_2] ON [dbo].[DK_SAT_LNK_FORM_ENTITY] ( [Parent_ID] ASC, [LOAD_DATE_TO] ASC ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] GO The live query plan shows it running through millions of rows in the LNK_ and DK_ tables and subsequently joined tables, whereas in the original plan it shows actual number of rows = 56 (56 executions - expected 1 row) on the LNK_ table and 0 actual rows (56 executions) on the DK_ table. 
If I add OPTION (FORCE ORDER) after the WHERE clause, it runs in 0 seconds again, with a different query plan to the original good one. Clearly that resolves the issue in the short term, but I'm wary of using such a blunt instrument given that it may not always be the optimal choice as data changes over time. Edit I have tried updating statistics with FULL SCAN, and rebuilding key indexes but it had no impact. Query plans below - any tips or explanation gratefully received! Original good plan (actual plan): no WHERE clause : https://www.brentozar.com/pastetheplan/?id=HyG3SwTZd Poor plan (from live query plan at point killed off) : https://www.brentozar.com/pastetheplan/?id=rJpBSPpWO Good plan with FORCE ORDER hint : https://www.brentozar.com/pastetheplan/?id=SJqxUvT-d
Clearly, your issue is that HUB_FORM is selective enough to limit the rows down to 0 at the very beginning. But the optimizer does not realize that, and therefore it reverses the order of the joins.
To enforce the order without hammering the rest of the query via FORCE ORDER, we have two options:
Pre-compute the join of #TMP and HUB_FORM into a temp table or table variable. This can often cause a fair bit of extra IO.
A much better option is to persuade the optimizer to compute the join first, but without using explicit hints. This is often best done by putting the join inside a subquery with a SELECT TOP, but you may need to modify this by adding one or two further joins.
SELECT
S1.ID, S.LOAD_DATE, S.Deleted, S1.HUB_FORM_ID
FROM (
-- H1 is not visible outside the subquery, so expose its key under a new name
SELECT TOP (9223372036854775807) S.*, H1.HUB_FORM_ID AS H1_HUB_FORM_ID
FROM #TMP S
INNER JOIN HUB_FORM H1 ON H1.Form_ID = S.HUB_FORM_BK
) S
INNER JOIN HUB_ORG H2 ON H2.Organisation_ID = S.HUB_ORG_BK
INNER JOIN HUB_PERSON H3 ON H3.person_id = S.HUB_PERSON_BK
INNER JOIN HUB_EVENT H4 ON H4.job_id = S.HUB_EVENT_BK
INNER JOIN HUB_WORKFLOW_STEP H5 ON H5.step_id = S.HUB_WORKFLOW_STEP_BK
INNER JOIN LNK_FORM_ENTITY S1
ON S.H1_HUB_FORM_ID = S1.HUB_FORM_ID
AND H2.HUB_ORG_ID = S1.HUB_ORG_ID
AND H3.HUB_PERSON_ID = S1.HUB_PERSON_ID
AND H4.HUB_EVENT_ID = S1.HUB_EVENT_ID
AND H5.HUB_WORKFLOW_STEP_ID = S1.HUB_WORKFLOW_STEP_ID
INNER JOIN DK_SAT_LNK_FORM_ENTITY S2 ON S1.ID = S2.Parent_ID
If that doesn't work, you may be able to persuade it by changing the TOP to a variable, and adding an OPTIMIZE FOR hint at the end:
DECLARE @topRows bigint = 9223372036854775807;
SELECT
S1.ID, S.LOAD_DATE, S.Deleted, S1.HUB_FORM_ID
FROM (
SELECT TOP (@topRows) S.*, H1.HUB_FORM_ID AS H1_HUB_FORM_ID
FROM #TMP S
INNER JOIN HUB_FORM H1 ON H1.Form_ID = S.HUB_FORM_BK
) S
INNER JOIN HUB_ORG H2 ON .........
OPTION (OPTIMIZE FOR (@topRows = 1));
This causes the optimizer to think it will only get 1 row out of the join, but actually allows more rows if that is the case at runtime.
Note that none of this changes the essential semantics of the query.
Is there a way to improve performance of this query (aggregation)
Here is the query: SELECT sdd.CompanyID ,sdd.ClassID ,sdd.PeriodID, SUM(sdd.Volume) AS VolumeTotal, SUM(sdd.Dollars) AS DollasTotal ,COUNT(LogID) as LogIDCount FROM (SELECT dp.CompanyID ,ds.ClassID ,fs.PeriodID, fs.LogID, sum(fs.Volume) AS Volume,sum(fs.Dollars) AS Dollars FROM DW.FactSupplyDataDetail fs WITH (NOLOCK) JOIN DW.DimPLProvider dp WITH (NOLOCK) ON fs.PLProviderID = dp.PLProviderID JOIN DW.DimSupply ds WITH (NOLOCK) ON fs.SupplyID = ds.SupplyID WHERE fs.PeriodID between 201901 and 201907 GROUP BY dp.CompanyID ,ds.ClassID ,fs.PeriodID,fs.LogID) sdd GROUP BY sdd.CompanyID ,sdd.ClassID ,sdd.PeriodID here is the execution plan for the query: https://www.brentozar.com/pastetheplan/?id=rkoxSEjEH DW.FactSupplyDataDetail has 10590237 records DW.DimPLProvider has 5071 records DW.DimSupply has 81001 records result of a query is 1992094
Check that:
Table FactSupplyDataDetail has an index starting with PeriodID
Table DimSupply has an index starting with SupplyID
Table DimPLProvider has an index starting with PLProviderID
"The table TABLE has an index starting with column COLUMN" means that you have an index (idx_xxx) defined as:
CREATE INDEX idx_xxx ON TABLE (COLUMN, some other columns or empty list);
Why is using Table Spool slower than not?
There are two similar SQL statements running in SQL Server. The table TBSFA_DAT_CUST has millions of rows and no constraints (no index and no primary key); the other two have just a few rows and a normal primary key.
s, the slower one:
SELECT A.CUST_ID, C.CUST_NAME, A.xxx --and several specific columns
FROM TBSFA_DAT_ORD_LIST A
JOIN VWSFA_ORG_EMPLOYEE B ON A.EMP_ID = B.EMP_ID
LEFT JOIN TBSFA_DAT_CUST C ON A.CUST_ID = B.CUST_ID
JOIN VWSFA_ORG_EMPLOYEE D ON A.REVIEW_ID = D.EMP_ID
WHERE ISNULL(A.BATCH_ID, '') != ''
(execution plan of the slower one)
f, the faster one:
SELECT *
FROM TBSFA_DAT_ORD_LIST A
JOIN VWSFA_ORG_EMPLOYEE B ON A.EMP_ID = B.EMP_ID
LEFT JOIN TBSFA_DAT_CUST C ON A.CUST_ID = B.CUST_ID
JOIN VWSFA_ORG_EMPLOYEE D ON A.REVIEW_ID = D.EMP_ID
WHERE ISNULL(A.BATCH_ID, '') != ''
(execution plan of the faster one)
f (about 0.6s) is much faster than s (about 4.6s). I also found two ways to make s as fast as f:
1. Add a constraint and primary key on TBSFA_DAT_CUST.CUST_ID;
2. Specify more than 61 columns of table TBSFA_DAT_CUST (80 columns in total).
My question is why the SQL optimizer uses a Table Spool when I specify columns in the SELECT clause rather than '*', and why the plan that uses the Table Spool executes slower.
In the slower query you are limiting your result set to specific columns. Since this is an un-indexed, unconstrained table, the optimizer creates a temporary table from the original table scan containing only the specific columns required. It then runs the nested loop operator against that temporary table. When it knows it is going to need every column of the table (SELECT *), it can run the nested loop operator directly off the table scan, because the result set of the scan will be joined in full to the top table.
Outside of that, your query has a couple of other possible problems:
LEFT JOIN TBSFA_DAT_CUST C ON A.CUST_ID = B.CUST_ID
You aren't joining to anything here: the condition never references C, so you are joining the entire table to every record. Did you mean a.cust_id = c.cust_id, or b.cust_id = c.cust_id, or a.cust_id = c.cust_id and b.cust_id = c.cust_id?
Also, this function in the WHERE clause is pointless and can degrade performance:
WHERE ISNULL(A.BATCH_ID, '') != ''
Change it to:
WHERE A.BATCH_ID is not null and A.Batch_ID <> ''
Why do I have duplicate records in my JOIN
I am retrieving data from table ProductionReportMetrics where I have column NetRate_QuoteID. Then to that result set I need to get Description column. And in order to get a Description column, I need to join 3 tables: NetRate_Quote_Insur_Quote NetRate_Quote_Insur_Quote_Locat NetRate_Quote_Insur_Quote_Locat_Liabi But after that my premium is completely off. What am I doing wrong here? SELECT QLL.Description, QLL.ClassCode, prm.NetRate_QuoteID, QL.LocationID, ISNULL(SUM(premium),0) AS NetWrittenPremium, MONTH(prm.EffectiveDate) AS EffMonth FROM ProductionReportMetrics prm LEFT JOIN NetRate_Quote_Insur_Quote Q ON prm.NetRate_QuoteID = Q.QuoteID INNER JOIN NetRate_Quote_Insur_Quote_Locat QL ON Q.QuoteID = QL.QuoteID INNER JOIN NetRate_Quote_Insur_Quote_Locat_Liabi QLL ON QL.LocationID = QLL.LocationID WHERE YEAR(prm.EffectiveDate) = 2016 AND CompanyLine = 'Ironshore Insurance Company' GROUP BY MONTH(prm.EffectiveDate), QLL.Description, QLL.ClassCode, prm.NetRate_QuoteID, QL.LocationID I think the problem in this table: What Am I missing in this Query? select ClassCode, QLL.Description, sum(Premium) from ProductionReportMetrics prm LEFT JOIN NetRate_Quote_Insur_Quote Q ON prm.NetRate_QuoteID = Q.QuoteID LEFT JOIN NetRate_Quote_Insur_Quote_Locat QL ON Q.QuoteID = QL.QuoteID LEFT JOIN (SELECT * FROM NetRate_Quote_Insur_Quote_Locat_Liabi nqI JOIN ( SELECT LocationID, MAX(ClassCode) FROM NetRate_Quote_Insur_Quote_Locat_Liabi GROUP BY LocationID ) nqA ON nqA.LocationID = nqI.LocationID ) QLL ON QLL.LocationID = QL.LocationID where Year(prm.EffectiveDate) = 2016 AND CompanyLine = 'Ironshore Insurance Company' GROUP BY Q.QuoteID,QL.QuoteID,QL.LocationID Now it says Msg 8156, Level 16, State 1, Line 14 The column 'LocationID' was specified multiple times for 'QLL'.
It looks like DVT basically hit on the answer. The only reason you would get different amounts (i.e. duplicated rows) as a result of a join is that one of the joined tables does not have a 1:1 relationship with the primary table. I would suggest you do a quick check against those tables, looking at row counts.
--this should be your baseline count
SELECT COUNT(*)
FROM ProductionReportMetrics prm
GROUP BY MONTH(prm.EffectiveDate), prm.NetRate_QuoteID
--this will be a check against the first joined table.
SELECT COUNT(*)
FROM NetRate_Quote_Insur_Quote Q
WHERE QuoteID IN (SELECT prm.NetRate_QuoteID
FROM ProductionReportMetrics prm
GROUP BY MONTH(prm.EffectiveDate), prm.NetRate_QuoteID)
Basically you will want to do a similar check against each of your joined tables. If any of the joined tables are part of the grouping statement, make sure they are also in the grouping of the count check statement. Also make sure to alter the WHERE clause of the count check statement to use the join clause columns you were using.
Once you find a table that returns the incorrect number of rows, you will have your answer as to which table is causing the problem. Then you will just have to decide how to limit that table down to distinct rows (some type of aggregation).
This advice is really just to show you how to QA this particular query. Break it up into the smallest possible parts. In this case, we know that it is a join that is causing the problem, so take it one join at a time until you find the offender.