SQL Server Index over lookup table of distinct values - sql-server

I am trying to speed up the following SQL Server query:
SELECT
V.Id, V.Number, V.VisitDate, V.ArrivalTime, V.VisitKindId, VK.Description AS
VisitKindDescription,
VK.DescriptionAr AS VisitKindDescriptionAr, V.StatusId, V.Note, V.CancelingReason,
V.CancelingTime, V.EnterToDoctorRoomTime,
V.PatientId, P.Number AS PatientNumber, P.FirstName, P.LastName, P.BirthDate, P.Note AS
PatientNotes, V.DoctorId, D.FullName AS DoctorFullName, V.CreatedById,
U.FullName AS UserFullName, V.CreationDate, V.VersionNo
FROM
Patient_Tbl P INNER JOIN
Visit_Tbl V ON P.Id = V.PatientId INNER JOIN
VisitKind_Tbl VK ON V.VisitKindId = VK.Id INNER JOIN
Doctor_Tbl D ON V.DoctorId = D.Id INNER JOIN
User_Tbl U ON V.CreatedById = U.Id INNER JOIN
VisitStatus_Tbl VS ON V.StatusId = VS.Id
WHERE V.StatusId = 2 --patient is in doctor room
and we had the following 4 values the VisitStatus_Tbl:
(1 -> In Waiting Room, 2 -> In Doctor Room, 3 -> Canceled, 4 -> Completed)
and in one moment of time, there is only one record on the Visits table for one person in the doctor's room.
The end-user inform me that there is a delay in the use case that depends on the above query.
Please help us speed system performance by suggesting the proper index.
Thanks,

You do not indicate if you have any indexes on the tables now. I will assume that the 'ID' columns for patient_tbl, etc are clustered primary keys or just primary keys and have indexes. If not, that is another problem.
Simple rule: start with index foreign keys (lookup tables) and WHERE clauses.
CREATE INDEX ix_visit_tbl_statusid ON visit_tbl(statusId)
CREATE INDEX ix_visit_tbl_patientid ON visit_tbl(patientId)
CREATE INDEX ix_visit_tbl_visitkindId ON visit_tbl(visitkindId)
CREATE INDEX ix_visit_tbl_doctorid ON visit_tbl(doctorId)
CREATE INDEX ix_visit_tbl_createdbyid ON visit_tbl(createdbyId)
Now for the comments on how that is too many indexes. It depends ...

Related

Why is using Table Spool slower than not?

There are two similiar sqls running in sql server,in which the table TBSFA_DAT_CUST has millons rows and no constraint(no index and primary key),
the other two has just a few rows and normal primary key:
s for slower one:
SELECT A.CUST_ID, C.CUST_NAME, A.xxx --and several specific columns
FROM TBSFA_DAT_ORD_LIST A JOIN VWSFA_ORG_EMPLOYEE B ON A.EMP_ID = B.EMP_ID
LEFT JOIN TBSFA_DAT_CUST C ON A.CUST_ID = B.CUST_ID
JOIN VWSFA_ORG_EMPLOYEE D ON A.REVIEW_ID = D.EMP_ID
WHERE ISNULL(A.BATCH_ID, '') != ''
execution plan of slower one
f for faster one:
SELECT *
FROM TBSFA_DAT_ORD_LIST A JOIN VWSFA_ORG_EMPLOYEE B ON A.EMP_ID = B.EMP_ID
LEFT JOIN TBSFA_DAT_CUST C ON A.CUST_ID = B.CUST_ID
JOIN VWSFA_ORG_EMPLOYEE D ON A.REVIEW_ID = D.EMP_ID
WHERE ISNULL(A.BATCH_ID, '') != ''
execution plan of faster one
f(above 0.6s) is much faster than s(above 4.6s).
Otherwise,I found two ways to make s fast as f:
1.Add constaint and primary key in table TBSFA_DAT_CUST.CUST_ID;
2.Specific more than 61 columns of table TBSFA_DAT_CUST(totally 80 columns).
My question is why sql optimizer uses Table Spool when I specific columns in SELECT clause rather than '*',and why is using Table Spool one executes slower?
My question is about sql-servertable-spool
In the slower query you are limiting your result set to specific columns. Since this is an un-indexed un constrained table the optimizer is creating a temporary table from the original table scan with only the specific columns required. It is then running through the nested loop operator on the temporary table. When it knows its going to need every column on the table (Select *) it can run the nested loop operator directly off the table scan because the result set of the scan will be joined in full to the top table.
Outside of that your query has a couple other possible problems:
LEFT JOIN TBSFA_DAT_CUST C ON A.CUST_ID = B.CUST_ID
you aren't joining to anything here, you are joining the entire table to every record. Did mean a.cust_id = c.cust_id or b.cust_id = c.cust_id or a.cust_id = c.cust_id and b.cust_id = c.cust_id?
Also, this function in the where clause is pointless and can degrade performance:
WHERE ISNULL(A.BATCH_ID, '') != ''
change it to:
WHERE A.BATCH_ID is not null and A.Batch_ID <> ''

RIGHT\LEFT Join does not provide null values without condition

I have two tables one is the lookup table and the other is the data table. The lookup table has columns named cycleid, cycle. The data table has SID, cycleid, cycle. Below is the structure of the tables.
If you check the data table, the SID may have all the cycles and may not have all the cycles. I want to output the SID completed as well as missed cycles.
I right joined the lookup table and retrieved the missing as well as completed cycles. Below is the query I used.
SELECT TOP 1000 [SID]
,s4.[CYCLE]
,s4.[CYCLEID]
FROM [dbo].[data] s3 RIGHT JOIN
[dbo].[lookup_data] s4 ON s3.CYCLEID = s4.CYCLEID
The query is not displaying me the missed values when I query for all the SID's. When I specifically query for a SID with the below query i am getting the correct result including the missed ones.
SELECT TOP 1000 [SID]
,s4.[CYCLE]
,s4.[CYCLEID]
FROM [dbo].[data] s3 RIGHT JOIN [dbo].[lookup_data] s4
ON s3.CYCLEID = s4.CYCLEID
AND s3.SID = 101002
ORDER BY [SID], s4.[CYCLEID]
As I am supplying this query into tableau I cannot provide the sid value in the query. I want to return all the sid's and from tableau I will be do the rest of the things.
The expected output that i need is as shown below.
I wrote a cross join query like below to acheive my expected output
SELECT DISTINCT
tab.CYCLEID
,tab.SID
,d.CYCLE
FROM ( SELECT d.SID
,d.[CYCLE]
,e.CYCLEID
FROM ( SELECT e.sid
,e.CYCLE
FROM [db_temp].[dbo].[Sheet3$] e
) d
CROSS JOIN [db_temp].[dbo].[Sheet4$] e
) tab
LEFT OUTER JOIN [db_temp].[dbo].[Sheet3$] d
ON d.CYCLEID = tab.CYCLEID
AND d.SID = tab.SID
ORDER BY tab.SID
,tab.CYCLEID;
However I am not able to use this query for more scenarios as my data set have nearly 20 to 40 columns and i am having issues when i use the above one.
Is there any way to do this in a simpler manner with only left or right join itself? I want the query to return all the missing values and the completed values for the all the SID's instead of supplying a single sid in the query.
You can create a master table first (combine all SID and CYCLE ID), then right join with the data table
;with ctxMaster as (
select distinct d.SID, l.CYCLE, l.CYCLEID
from lookup_data l
cross join data d
)
select d.SID, m.CYCLE, m.CYCLEID
from ctxMaster m
left join data d on m.SID = d.SID and m.CYCLEID = d.CYCLEID
order by m.SID, m.CYCLEID
Fiddle
Or if you don't want to use common table expression, subquery version:
select d.SID, m.CYCLE, m.CYCLEID
from (select distinct d.SID, l.CYCLE, l.CYCLEID
from lookup_data l
cross join data d) m
left join data d on m.SID = d.SID and m.CYCLEID = d.CYCLEID
order by m.SID, m.CYCLEID

Why do I have duplicate records in my JOIN

I am retrieving data from table ProductionReportMetrics where I have column NetRate_QuoteID. Then to that result set I need to get Description column.
And in order to get a Description column, I need to join 3 tables:
NetRate_Quote_Insur_Quote
NetRate_Quote_Insur_Quote_Locat
NetRate_Quote_Insur_Quote_Locat_Liabi
But after that my premium is completely off.
What am I doing wrong here?
SELECT QLL.Description,
QLL.ClassCode,
prm.NetRate_QuoteID,
QL.LocationID,
ISNULL(SUM(premium),0) AS NetWrittenPremium,
MONTH(prm.EffectiveDate) AS EffMonth
FROM ProductionReportMetrics prm
LEFT JOIN NetRate_Quote_Insur_Quote Q
ON prm.NetRate_QuoteID = Q.QuoteID
INNER JOIN NetRate_Quote_Insur_Quote_Locat QL
ON Q.QuoteID = QL.QuoteID
INNER JOIN NetRate_Quote_Insur_Quote_Locat_Liabi QLL
ON QL.LocationID = QLL.LocationID
WHERE YEAR(prm.EffectiveDate) = 2016 AND
CompanyLine = 'Ironshore Insurance Company'
GROUP BY MONTH(prm.EffectiveDate),
QLL.Description,
QLL.ClassCode,
prm.NetRate_QuoteID,
QL.LocationID
I think the problem in this table:
What Am I missing in this Query?
select
ClassCode,
QLL.Description,
sum(Premium)
from ProductionReportMetrics prm
LEFT JOIN NetRate_Quote_Insur_Quote Q ON prm.NetRate_QuoteID = Q.QuoteID
LEFT JOIN NetRate_Quote_Insur_Quote_Locat QL ON Q.QuoteID = QL.QuoteID
LEFT JOIN
(SELECT * FROM NetRate_Quote_Insur_Quote_Locat_Liabi nqI
JOIN ( SELECT LocationID, MAX(ClassCode)
FROM NetRate_Quote_Insur_Quote_Locat_Liabi GROUP BY LocationID ) nqA
ON nqA.LocationID = nqI.LocationID ) QLL ON QLL.LocationID = QL.LocationID
where Year(prm.EffectiveDate) = 2016 AND CompanyLine = 'Ironshore Insurance Company'
GROUP BY Q.QuoteID,QL.QuoteID,QL.LocationID
Now it says
Msg 8156, Level 16, State 1, Line 14
The column 'LocationID' was specified multiple times for 'QLL'.
It looks like DVT basically hit on the answer. The only reason you would get different amounts(i.e. duplicated rows) as a result of a join is that one of the joined tables is not a 1:1 relationship with the primary table.
I would suggest you do a quick check against those tables, looking for table counts.
--this should be your baseline count
SELECT COUNT(*)
FROM ProductionReportMetrics
GROUP BY MONTH(prm.EffectiveDate),
prm.NetRate_QuoteID
--this will be a check against the first joined table.
SELECT COUNT(*)
FROM NetRate_Quote_Insur_Quote Q
WHERE QuoteID IN
(SELECT NetRate_QuoteID
FROM ProductionReportMetrics
GROUP BY MONTH(prm.EffectiveDate),
prm.NetRate_QuoteID)
Basically you will want to do a similar check against each of your joined tables. If any of the joined tables are part of the grouping statement, make sure they are also in the grouping of the count check statement. Also make sure to alter the WHERE clause of the check count statement to use the join clause columns you were using.
Once you find a table that returns the incorrect number of rows, you will have your answer as to what table is causing the problem. Then you will just have to decide how to limit that table down to distinct rows(some type of aggregation).
This advice is really just to show you how to QA this particular query. Break it up into the smallest possible parts. In this case, we know that it is a join that is causing the problem, so take it one join at a time until you find the offender.

When a SQL Server query returns no rows(NOT null rows) how do you include that in an aggregate function?

I'm writing a query to look for courses that do not have any of its gradable items graded.
In Blackboard when a user doesn't have a grade at all(e.g. no attempt was ever made) there simply isn't a row in the table gradebook_grade
So if a course doesn't have any grades at all the gradebook_grade table does not have any rows referencing the primary key of the Blackboard course_id
This is what I've used so far:
use bblearn
select cm.course_id
from course_main cm
join gradebook_main gbm on cm.pk1 = gbm.crsmain_pk1
join gradebook_grade gbg on gbm.pk1 = gbg.gradebook_main_pk1
where cm.pk1 = 3947
group by cm.course_id
having count(gbg.pk1) <= 0
The course in question(pk1 3947) is confirmed to not have any grades. So SQL Server says 0 rows affected, naturally.
The thing is, it doesn't select the course_id. I'm guessing having doesn't account for blank/non-existent rows. Is there a way to structure the query to get the course ID when there isn't anything returned? Am I joining on the wrong columns or using where on the wrong column? Any help is appreciated.
Use a left join
select cm.course_id
from course_main cm
left join gradebook_main gbm on cm.pk1 = gbm.crsmain_pk1
left join gradebook_grade gbg on gbm.pk1 = gbg.gradebook_main_pk1
where cm.pk1 = 3947
group by cm.course_id

Why would SQL Server choose Clustered Index Scan over Non-Clustered one?

In one of the tables I am querying, a clustered index was created over a key that's not a primary key. (I don't know why.)
However, there's a non-clustered index for the primary key for this table.
In the execution plan, SQL is choosing the clustered index, rather than the non-clustered index for the primary key which I am using in my query.
Is there a reason why SQL would do this? How can I force SQL to choose the non-clustered index instead?
Appending more detail:
The table has many fields and the query contains many joins. Let me abstract it a bit.
The table definition looks like this:
SlowTable
[SlowTable_id] [int] IDENTITY(200000000,1) NOT NULL,
[fk1Field] [int] NULL,
[fk2Field] [int] NULL,
[other1Field] [varchar] NULL,
etc. etc...
and then the indices for this table are:
fk1Field (Clustered)
SlowTable_id (Non-Unique, Non-Clustered)
fk2Field (Non-Unique, Non-Clustered)
... and 14 other Non-Unique, Non-Clustered indices on other fields
Presumably there are lots more queries made against fk1Field which is why they selected this as the basis for the Clustered index.
The query I have uses a view:
SELECT
[field list]
FROM
SourceTable1 S1
INNER JOIN SourceTable2 S2
ON S2.S2_id = S1.S2_id
INNER JOIN SourceTable3 S3
ON S3.S3_id = S2.S3_id
INNER JOIN SlowTable ST
ON ST.SlowTable_id = S1.SlowTable_id
INNER JOIN [many other tables, around 7 more...]
The execution plan is quite big, with the nodes concerned say
Hash Match
(Inner Join)
Cost: 9%
with a thick arrow pointing to
Clustered Index Scan (Clustered)
SlowTable.fk1Field
Cost: 77%
I hope this provides enough detail on the issue.
Thanks!
ADDENDUM 2:
Correction to my previous post. The view doesn't have a where clause. It is just a series of inner joins. The execution plan was taken from an Insert statement that uses the View (listed as SLOW_VIEW) in a complex query that looks like the following:
(What this stored procedure does is to do a proportional split of the total amount of some records, based on weights, computed as percentage against a total. This mimics distributing a value from, say, one account, to other accounts.)
INSERT INTO dbo.WDTD(
FieldA,
FieldB,
GWB_id,
C_id,
FieldC,
PG_id,
FieldD,
FieldE,
O_id,
FieldF,
FieldG,
FieldH,
FieldI,
GWBIH_id,
T_id,
JO_id,
PC_id,
PP_id,
FieldJ,
FieldK,
FieldL,
FieldM,
FieldN,
FieldO,
FieldP,
FieldQ,
FieldS)
SELECT DISTINCT
#FieldA FieldA,
GETDATE() FieldB,
#Parameter1 GWB_id,
GWBIH.C_id C_id,
P.FieldT FieldC,
P.PG_id PG_id,
PAM.FieldD FieldD,
PP.FieldU FieldE,
GWBIH.O_id O_id,
CO.FieldF FieldF,
CO.FieldG FieldG,
PSAM.FieldH FieldH,
PSAM.FieldI FieldI,
SOURCE.GWBIH_id GWBIH_id,
' ' T_id,
GWBIH.JO_id JO_id,
SOURCE.PC_id PC_id,
GWB.PP_id,
SOURCE.FieldJ FieldJ,
1 FieldK,
ROUND((SUM(GWBIH.Total) / AGG.Total) * SOURCE.Total, 2) FieldL,
ROUND((SUM(GWBIH.Total) / AGG.Total) * SOURCE.Total, 2) FieldM,
0 FieldN,
' ' FieldO,
ESGM.FieldP_flag FieldP,
SOURCE.FieldQ FieldQ,
'[UNPROCESSED]'
FROM
dbo.Table1 GWBIH
INNER JOIN dbo.Table2 GWBPH
ON GWBPH.GWBP_id = GWBIH.GWBP_id
INNER JOIN dbo.Table3 GWB
ON GWB.GWB_id = GWBPH.GWB_id
INNER JOIN dbo.Table4 P
ON P.P_id = GWBPH.P_id
INNER JOIN dbo.Table5 ESGM
ON ESGM.ET_id = P.ET_id
INNER JOIN dbo.Table6 PAM
ON PAM.PG_id = P.PG_id
INNER JOIN dbo.Table7 O
ON O.dboffcode = GWBIH.O_id
INNER JOIN dbo.Table8 CO
ON
CO.Country_id = O.Country_id
AND CO.Brand_id = O.Brand_id
INNER JOIN dbo.Table9 PSAM
ON PSAM.Office_id = GWBIH.O_id
INNER JOIN dbo.Table10 PCM
ON PCM.PC_id = GWBIH.PC_id
INNER JOIN dbo.Table11 PC
ON PC.PC_id = GWBIH.PC_id
INNER JOIN dbo.Table12 PP
ON PP.PP_id = GWB.PP_id
-- THIS IS THE VIEW THAT CONTAINS THE CLUSTERED INDEX SCAN
INNER JOIN dbo.SLOW_VIEW GL
ON GL.JO_id = GWBIH.JO_id
INNER JOIN (
SELECT
GWBIH.C_id C_id,
GWBPH.GWB_id,
SUM(GWBIH.Total) Total
FROM
dbo.Table1 GWBIH
INNER JOIN dbo.Table2 GWBPH
ON GWBPH.GWBP_id = GWBIH.GWBP_id
INNER JOIN dbo.Table10 PCM
ON PCM.PC_id = GWBIH.PC_id
WHERE
PCM.Split_flag = 0
AND GWBIH.JO_id IS NOT NULL
GROUP BY
GWBIH.C_id,
GWBPH.GWB_id
) AGG
ON AGG.C_id = GWBIH.C_id
AND AGG.GWB_id = GWBPH.GWB_id
INNER JOIN (
SELECT
GWBIH.GWBIH_id GWBIH_id,
GWBIH.C_id C_id,
GWBIH.FieldQ FieldQ,
GWBP.GWB_id GWB_id,
PCM.PC_id PC_id,
CASE
WHEN WT.FieldS IS NOT NULL
THEN WT.FieldS
WHEN WT.FieldS IS NULL
THEN PCMS.FieldT
END FieldJ,
SUM(GWBIH.Total) Total
FROM
dbo.Table1 GWBIH
INNER JOIN dbo.Table2 GWBP
ON GWBP.GWBP_id = GWBIH.GWBP_id
INNER JOIN dbo.Table4 P
ON P.P_id = GWBP.P_id
INNER JOIN dbo.Table10 PCM
ON PCM.PC_id = GWBIH.PC_id
INNER JOIN dbo.Table11 PCMS
ON PCMS.PC_id = PCM.PC_id
LEFT JOIN dbo.WT WT
ON WT.ET_id = P.ET_id
AND WT.PC_id = GWBIH.PC_id
WHERE
PCM.Split_flag = 1
GROUP BY
GWBIH.GWBI_id,
GWBIH.C_id,
GWBIH.FieldQ,
GWBP.GWB_id,
WT.FieldS,
PCM.PC_id,
PCMS.ImportCode
) SOURCE
ON SOURCE.C_id = GWBIH.C_id
AND SOURCE.GWB_id = GWBPH.GWB_id
WHERE
PCM.Split_flag = 0
AND AGG.Total > 0
AND GWBPH.GWB_id = #Parameter1
AND NOT EXISTS (
SELECT *
FROM dbo.WDTD
WHERE
TD.C_id = GWBIH.C_id
AND TD.FieldA = GWBPH.GWB_id
AND TD.JO_id = GWBIH.JO_id
AND TD.PC_id = SOURCE.PC_id
AND TD.GWBIH_id = ' ')
GROUP BY
GWBIH.C_id,
P.FieldT,
GWBIH.JO_id,
GWBIH.O_id,
GWBPH.GWB_id,
P.PG_id,
PAM.FieldD,
PP.FieldU,
GWBIH.O_id,
CO.FieldF,
CO.FieldG,
PSAM.FieldH,
PSAM.FieldI,
GWBIH.JO_id,
SOURCE.PC_id,
GWB.PP_id,
SOURCE.FieldJ,
ESGM.FieldP_flag,
SOURCE.GWBIH_id,
SOURCE.FieldQ,
AGG.Total,
SOURCE.Total
ADDENDUM 3: When doing an execution plan on the select statement of the view, I see this:
Hash Match <==== Bitmap <------ etc...
(Inner Join) (Bitmap Create)
Cost: 0% Cost: 0%
^
|
|
Parallelism Clustered Index Scan (Clustered)
(Repartition Streams) <==== Slow_Table.fk1Field
Cost: 1% Cost: 98%
ADDENDUM 4: I think I found the problem. The Clustered Index Scan isn't referring to my clause that references the Primary Key, but rather another clause that needs a field that is, in some way, related to fk1Field above.
Most likely one of:
too many rows to make the index effective
index doesn't fit the ON/WHERE conditions
index isn't covering and SQL Server avoids a key lookup
Edit, after update:
Your indexes are useless because they are all single column indexes, so it does a clustered index scan.
You need an index that matches your ON, WHERE, GROUP BY conditions with INCLUDES for your SELECT list.
If the query you're executing isn't selecting a small subset of the records, SQL Server may well choose to ignore any "otherwise useful" non-clustered index and just scan through the clustered index (in this instance, most likely all rows in the table) - the logic being that the amount of I/O required to perform the query vs. the non-clustered index outweights that required for a full scan.
If you can post the schema of your table(s) + a sample query, I'm sure we can offer more information.
Ideally you shouldn't be telling SQL Server to do either or, it can pick the best, if you give it a good query. Query hints was created to steer the engine a bit, but you shouldn't have to use this just yet.
Sometimes it is beneficial to cluster the table differently that the primary key, is rare, but it can be useful (the clustering controls the data layout while the primary key ensures correctness).
I can tell you exactly why SQL Server picks the clustered index if you show me your query and schema otherwise I'd only be guessing on likely causes and execution plan is helpful in these cases.
For a non-clustered index to be considered it has to be meaningful to the query and if you non-clustered index doesn't cover your query, there's no guaratee that it will be used at all.
A clustered index scan is essentially a table scan (on a table that happens to have a clustered index). You really should post your statement to get a better answer. Your where clause may not be searchable (see sargs), or if you are selecting many records, sql server may scan the table rather than use the index and later have to look up related columns.

Resources