Below query execution takes 45 secs. How to improve the speed?
SELECT
u.i_UserID,
u.vch_LoginName,
ea.vch_EmailAddress,
u.vch_DisplayName,
'Current' As 'vch_RecordStatus'
FROM
tblUser u
LEFT JOIN [User].dbo.tblEmailAddress ea
ON u.i_EmailAddressID = ea.i_EmailAddressID
WHERE
IsNull(u.vch_LoginName, '') Like '%'
AND u.vch_DisplayName LIKE 'kala' -- 14 secs
UNION ALL -- 29 secs
SELECT
DISTINCT l.i_UserID
,l.vch_OldLoginName as vch_LoginName
,l.vch_OldEmailAddress as vch_EmailAddress
,u.vch_DisplayName as vch_DisplayName,
'Old' As 'vch_RecordStatus'
FROM tblUserStatusLog l
INNER JOIN tblUser u on u.i_UserID = l.i_UserID
WHERE
l.vch_OldLoginName Like '%'
AND u.vch_DisplayName LIKE 'kala' -- 28 secs
ORDER BY
u.vch_LoginName
First of all, why are you using LIKE to compare strings? LIKE is used to compare partially strings, you should use = operator to search for exact match.
I also don't understand the first condition on both of the where clauses, are you trying to check they are not empty strings and not null? better of use IS NOT NULL and <> operator:
SELECT
u.i_UserID,
u.vch_LoginName,
ea.vch_EmailAddress,
u.vch_DisplayName,
'Current' As 'vch_RecordStatus'
FROM
tblUser u
LEFT JOIN [User].dbo.tblEmailAddress ea ON u.i_EmailAddressID = ea.i_EmailAddressID
WHERE
u.vch_LoginName is not null and u.vch_loginname <> ''
AND u.vch_DisplayName = 'kala'
UNION ALL
SELECT
DISTINCT l.i_UserID
,l.vch_OldLoginName as vch_LoginName
,l.vch_OldEmailAddress as vch_EmailAddress
,u.vch_DisplayName as vch_DisplayName,
'Old' As 'vch_RecordStatus'
FROM tblUserStatusLog l
INNER JOIN tblUser u on u.i_UserID = l.i_UserID
WHERE
l.vch_OldLoginName <> ''
AND u.vch_DisplayName = 'kala'
ORDER BY
u.vch_LoginName
Other then that, I assume you can significantly improve the performance by adding the proper indexes to the tables, but we will need your table structures for that to see what already exists .
But you should have indexes on i_EmailAddressID on first two tables, i_userId on the second ones , and an index on vch_OldLoginName,vch_displayName might also improve the run.
You'd want to take a look at indexing. When you run this query with the actual execution plan switched on do you get any missing query hints? Make sure all indexes are optimised for this query and you'll get much better performance.
You're including the full path for tblEmailAddress, is this stored in another database called 'User'? If so, you'll want to reduce the amount of data you're pulling from the linked server. Again, an index should help you here.
You also don't want to be doing calculations in your where clauses. Check out the SARGability of queries. https://en.wikipedia.org/wiki/Sargable This will tell you why your LIKE statements are a bad idea too.
Related
Update: I will get query plan as soon as I can.
We had a poor performing query that took 4 minutes for a particular organization. After the usual recompiling the stored proc and updating statistics didn't help, we re-wrote the if Exists(...) to a select count(*)... and the stored procedure when from 4 minutes to 70 milliseconds. What is the problem with the conditional that makes a 70 ms query take 4 minutes? See the examples
These all take 4+ minutes:
if (
SELECT COUNT(*)
FROM ObservationOrganism omo
JOIN Observation om ON om.ObservationID = omo.ObservationMicID
JOIN Organism o ON o.OrganismID = omo.OrganismID
JOIN ObservationMicDrug omd ON omd.ObservationOrganismID = omo.ObservationOrganismID
JOIN SIRN srn ON srn.SIRNID = omd.SIRNID
JOIN OrganismDrug od ON od.OrganismDrugID = omd.OrganismDrugID
WHERE
om.StatusCode IN ('F', 'C')
AND o.OrganismGroupID <> -1
AND od.OrganismDrugGroupID <> -1
AND (om.LabType <> 'screen' OR om.LabType IS NULL)) > 0
print 'records';
-
IF (EXISTS(
SELECT *
FROM ObservationOrganism omo
JOIN Observation om ON om.ObservationID = omo.ObservationMicID
JOIN Organism o ON o.OrganismID = omo.OrganismID
JOIN ObservationMicDrug omd ON omd.ObservationOrganismID = omo.ObservationOrganismID
JOIN SIRN srn ON srn.SIRNID = omd.SIRNID
JOIN OrganismDrug od ON od.OrganismDrugID = omd.OrganismDrugID
WHERE
om.StatusCode IN ('F', 'C')
AND o.OrganismGroupID <> -1
AND od.OrganismDrugGroupID <> -1
AND (om.LabType <> 'screen' OR om.LabType IS NULL))
print 'records'
This all take 70 milliseconds:
Declare #recordCount INT;
SELECT #recordCount = COUNT(*)
FROM ObservationOrganism omo
JOIN Observation om ON om.ObservationID = omo.ObservationMicID
JOIN Organism o ON o.OrganismID = omo.OrganismID
JOIN ObservationMicDrug omd ON omd.ObservationOrganismID = omo.ObservationOrganismID
JOIN SIRN srn ON srn.SIRNID = omd.SIRNID
JOIN OrganismDrug od ON od.OrganismDrugID = omd.OrganismDrugID
WHERE
om.StatusCode IN ('F', 'C')
AND o.OrganismGroupID <> -1
AND od.OrganismDrugGroupID <> -1
AND (om.LabType <> 'screen' OR om.LabType IS NULL);
IF(#recordCount > 0)
print 'records';
It doesn't make sense to me why moving the exact same Count(*) query into an if statement causes such degradation or why 'Exists' is slower than Count. I even tried the exists() in a select CASE WHEN Exists() and it is still 4+ minutes.
Given that my previous answer was mentioned, I'll try to explain again because these things are pretty tricky. So yes, I think you're seeing the same problem as the other question. Namely a row goal issue.
So to try and explain what's causing this I'll start with the three types of joins that are at the disposal of the engine (and pretty much categorically): Loop Joins, Merge Joins, Hash Joins. Loop joins are what they sound like, a nested loop over both sets of data. Merge Joins take two sorted lists and move through them in lock-step. And Hash joins throw everything in the smaller set into a filing cabinet and then look for items in the larger set once the filing cabinet has been filled.
So performance wise, loop joins require pretty much no set up and if you're only looking for a small amount of data they're really optimal. Merge are the best of the best as far as join performance for any data size, but require data to be already sorted (which is rare). Hash Joins require a fair amount of setup but allow large data sets to be joined quickly.
Now we get to your query and the difference between COUNT(*) and EXISTS/TOP 1. So the behavior you're seeing is that the optimizer thinks that rows of this query are really likely (you can confirm this by planning the query without grouping and seeing how many records it thinks it will get in the last step). In particular it probably thinks that for some table in that query, every record in that table will appear in the output.
"Eureka!" it says, "if every row in this table ends up in the output, to find if one exists I can do the really cheap start-up loop join throughout because even though it's slow for large data sets, I only need one row." But then it doesn't find that row. And doesn't find it again. And now it's iterating through a vast set of data using the least efficient means at its disposal for weeding through large sets of data.
By comparison, if you ask for the full count of data, it has to find every record by definition. It sees a vast set of data and picks the choices that are best for iterating through that entire set of data instead of just a tiny sliver of it.
If, on the other hand, it really was correct and the records were very well correlated it would have found your record with the smallest possible amount of server resources and maximized its overall throughput.
I have a report with 3 sub reports and several queries to optimize in each. The first has several OR clauses in the WHERE branch and the OR's are filtering through IN options which are pulling sub-queries.
I say this mostly from reading this SO post. Specifically LBushkin's second point.
I'm not the greatest at TSQL but I know enough to think this is very inefficient. I think I need to do two things.
I know I need to add indexes to the tables involved.
I think the query can be greatly enhanced.
So it seems that my first step would be to improve the query. From there I can look at what columns and tables are involved and thus determine the indexes.
At this point I haven't posted table schemas as I'm looking more for options / considerations such as using a cte to replace all the IN sub-queries.
If needed I will definitely post whatever would be helpful such as physical reads etc.
SELECT DISTINCT
auth.icm_authorizationid,
auth.icm_documentnumber
FROM
Filteredicm_servicecost AS servicecost
INNER JOIN Filteredicm_authorization AS auth ON
auth.icm_authorizationid = servicecost.icm_authorizationid
INNER JOIN Filteredicm_service AS service ON
service.icm_serviceid = servicecost.icm_serviceid
INNER JOIN Filteredicm_case AS cases ON
service.icm_caseid = cases.icm_caseid
WHERE
(cases.icm_caseid IN
(SELECT icm_caseid FROM Filteredicm_case AS CRMAF_Filteredicm_case))
OR (service.icm_serviceid IN
(SELECT icm_serviceid FROM Filteredicm_service AS CRMAF_Filteredicm_service))
OR (servicecost.icm_servicecostid IN
(SELECT icm_servicecostid FROM Filteredicm_servicecost AS CRMAF_Filteredicm_servicecost))
OR (auth.icm_authorizationid IN
(SELECT icm_authorizationid FROM Filteredicm_authorization AS CRMAF_Filteredicm_authorization))
EXISTS is usually much faster than IN as the query engine is able to optimize it better.
Try this:
WHERE EXISTS (SELECT 1 FROM FROM Filteredicm_case WHERE icm_caseid = cases.icm_caseid)
OR EXISTS (SELECT 1 FROM Filteredicm_service WHERE icm_serviceid = service.icm_serviceid)
OR EXISTS (SELECT 1 FROM Filteredicm_servicecost WHERE icm_servicecostid = servicecost.icm_servicecostid)
OR EXISTS (SELECT 1 FROM Filteredicm_authorization WHERE icm_authorizationid = auth.icm_authorizationid)
Furthermore, an index on Filteredicm_case.icm_caseid, an index on Filteredicm_service.icm_serviceid, an index on Filteredicm_servicecost.icm_servicecostid, and an index on Filteredicm_authorization.icm_authorizationid will increase performance of this query. They look like they should be keys already, however, so I suspect that indices already exist.
However, unless I'm misreading, there's no way this WHERE clause will ever evaluate to anything other than true.
The clause you wrote says WHERE cases.icm_caseid IN (SELECT icm_caseid FROM Filteredicm_case AS CRMAF_Filteredicm_case). However, cases is an alias to Filteredicm_case. That's the equivalent of WHERE Filteredicm_case.icm_caseid IN (SELECT icm_caseid FROM Filteredicm_case AS CRMAF_Filteredicm_case). That will be true as long as Filteredicm_case.icm_caseid isn't NULL.
The same error in logic exists for the remaining portions in the WHERE clause:
(service.icm_serviceid IN (SELECT icm_serviceid FROM Filteredicm_service AS CRMAF_Filteredicm_service))
service is an alias for Filteredicm_service. This is always true as long as icm_serviceid is not null
(servicecost.icm_servicecostid IN (SELECT icm_servicecostid FROM Filteredicm_servicecost AS CRMAF_Filteredicm_servicecost))
servicecost is an alias for Filteredicm_servicecost. This is always true as long as icm_servicecostid is not null.
(auth.icm_authorizationid IN (SELECT icm_authorizationid FROM Filteredicm_authorization AS CRMAF_Filteredicm_authorization))
auth is an alias for Filteredicm_authorization. This is always true as long as icm_authorizationid is not null.
I don't understand what you're trying to accomplish.
I am currently running into some performance issues when running a query which joins multiple tables. The main table has 170 million records, so it is pretty big.
What I encounter is that when I run the query with a top 1000 clause, the results are instantaneous. However, when I increase that to top 8000 the query easily runs for 15 minutes (and then I kill it). Through trial and error I found that the tipping point is with Top 7934 (works like a charm) and Top 7935 (Runs for ever)
Does someone recognise this behaviour and sees what I am doing wrong? Maybe my Query is faulty in some respects.
Thanks in advance
SELECT top 7934 h.DocIDBeg
,h.[Updated By]
,h.Action
,h.Type
,h.Details
,h.[Update Date]
,h.[Updated Field Name]
,i.Name AS 'Value Set To'
,COALESCE(i.Name,'') + COALESCE(h.NewValue, '') As 'Value Set To'
,h.OldValue
FROM
(SELECT g.DocIDBeg
,g.[Updated By]
,g.Action
,g.Type
,g.Details
,g.[Update Date]
,CAST(g.details as XML).value('auditElement[1]/field[1]/#name','nvarchar(max)') as 'Updated Field Name'
,CAST(g.details as XML).value('(/auditElement//field/setChoice/node())[1]','nvarchar(max)') as 'value'
,CAST(g.details as XML).value('(/auditElement//field/newValue/node())[1]','nvarchar(max)') as 'NewValue'
,CAST(g.details as XML).value('(/auditElement//field/oldValue/node())[1]','nvarchar(max)') as 'OldValue'
FROM(
SELECT a.ArtifactID
,f.DocIDBeg
,b.FullName AS 'Updated By'
,c.Action
,e.ArtifactType AS 'Type'
,a.Details
,a.TimeStamp AS 'Update Date'
FROM [EDDS1015272].[EDDSDBO].[AuditRecord] a
LEFT JOIN [EDDS1015272].[EDDSDBO].AuditUser b
ON a.UserID = b.UserID
LEFT JOIN [EDDS1015272].[EDDSDBO].AuditAction c
ON a.Action = c.AuditActionID
LEFT JOIN [EDDS1015272].[EDDSDBO].[Artifact] d
ON a.ArtifactID = d.ArtifactID
LEFT JOIN [EDDS1015272].[EDDSDBO].[ArtifactType] e
ON d.ArtifactTypeID = e.ArtifactTypeID
INNER JOIN [EDDS1015272].[EDDSDBO].[Document] f
ON a.ArtifactID = f.ArtifactID
) g
) h
LEFT JOIN [EDDS1015272].[EDDSDBO].[Code] i
ON h.value = i.ArtifactID
I used to work with data warehouses a lot and encountered similar problems quite often. The root cause is obviously in memory usage like it was already mentioned here. I don't think that rewriting your query will help a lot if you really need to query all 170 million records and I don't think that it is OK for you to wait for more memory resources.
So here is just a simple workaround from me:
Try to split your query. For example, first query all data you need from AuditRecord record table joined to AuditUser table and store the result in another(temporary table for example) table. Then join this new table with Artifact table and so on. In this case this steps will require less memory one by one then running the whole query and have it hung out. So in the long run you will have not a query but a scrip which will be easy to track as you can print out some statuses in the console and which will do his job unlike the query which never ends
Also make sure that you really need to query all this data at once, because I can think of no use cases why you need it, but still if it is an application then you should implement paging, if it is some export functionality then maybe there is a timeline you can use to batch data. For example to export data on a daily basis and query only the data from yersterday. In this case you will come up with an incremental export.
"Through trial and error I found that the tipping point is with Top 7934 (works like a charm) and Top 7935 (Runs for ever)"
This sounds very much like a spill. Adam Mechanic does a nice demo of the internals of this in the video below. Basically the top forces a sort which requires memory. If the memory grant is not big enough to complete the operation, some of it gets done on disk.
https://www.youtube.com/watch?v=j5YGdIk3DXw
Go to 1:03:50 to see Adam demo a spill. In his query, 668,935 rows do not spill but 668,936 rows do and the query time more than doubles.
Watch the whole session if you have time. Very good for performance tuning!
Could also be the tipping point, as #Remus suggested, but it's all guessing without knowing the actual plan.
i think the subselects are forcing the server to fetch all before the filter can be applied
this will couse more memory usage (xlm fields) and make it hard to use a decent qry plan
as to the strange top behavior: top has a big influence on qry plan generation.
it is possible that the 7935 is a cutoff point for 1 optimal plan and that sql server will choose a different path when it needs to fetch more.
or it could go back to the memory and run out of mem on 7935
update:
i reworked your qry to eliminate the nested selects, i'm not saying its now going to be that mutch faster but it eliminates some fields that werent used and it should be easyer to understand and optimize based on the qry plan.
since we don't now the exact size of each table and we can hardly run the qry to test its impossible to give you the best answer. but i could try some tips:
1 step would be to check if you need all the left joins and turn them into inner if it is not needed ex: AuditUser, an AuditRecord could always have a user?
an other thing you could try is to put the data of preferably the smaller tables in a tmp table and join the bigger tables to that tmp table, possible eliminating a lot of records to join
if possible you could denormalize a bit and for example put the username in the auditrecord 2 so you would eliminate the join on AuditUser alltogether
but it is up to wat you need wat you can/are allowed to and the data/server
SELECT top 7934 f.DocIDBeg
,b.FullName AS 'Updated By'
,c.Action
,e.ArtifactType AS 'Type'
,a.Details
,a.TimeStamp AS 'Update Date'
,CAST(a.Details as XML).value('auditElement[1]/field[1]/#name','nvarchar(max)') as 'Updated Field Name'
,i.Name AS 'Value Set To'
,COALESCE(i.Name,'') + COALESCE(CAST(a.Details as XML).value('(/auditElement//field/newValue/node())[1]','nvarchar(max)') as 'NewValue', '') As 'Value Set To'
,CAST(a.Details as XML).value('(/auditElement//field/oldValue/node())[1]','nvarchar(max)') as 'OldValue'
FROM [EDDS1015272].[EDDSDBO].[AuditRecord] a
LEFT JOIN [EDDS1015272].[EDDSDBO].AuditUser b
ON a.UserID = b.UserID
LEFT JOIN [EDDS1015272].[EDDSDBO].AuditAction c
ON a.Action = c.AuditActionID
LEFT JOIN [EDDS1015272].[EDDSDBO].[Artifact] d
ON a.ArtifactID = d.ArtifactID
LEFT JOIN [EDDS1015272].[EDDSDBO].[ArtifactType] e
ON d.ArtifactTypeID = e.ArtifactTypeID
INNER JOIN [EDDS1015272].[EDDSDBO].[Document] f
ON a.ArtifactID = f.ArtifactID
LEFT JOIN [EDDS1015272].[EDDSDBO].[Code] i
ON CAST(a.details as XML).value('(/auditElement//field/setChoice/node())[1]','nvarchar(max)') = i.ArtifactID
During query optimization I encounted a strange behaviour of sql server (Sql Server 2008 R2 Enterprise). I created several indexes on tables, as well as some indexed views. I have two queries, for example:
select top 10 N0."Oid",N1."ObjectType",N1."OptimisticLockField" from ((("dbo"."Issue" N0
inner join "dbo"."Article" N1 on (N0."Oid" = N1."Oid"))
inner join "dbo"."ProductLink" N2 on (N1."ProductLink" = N2."Oid"))
inner join "dbo"."Technology" N3 on (N2."Technology" = N3."Oid"))
where (N1."GCRecord" is null and (N0."IsPrivate" = 0) and ((N0."HasMarkedAnswers" = 0) or N0."HasMarkedAnswers" is null) and (N3."Name" = N'Discussions'))
order by N1."ModifiedOn" desc
and
select top 30 N0."Oid",N1."ObjectType",N1."OptimisticLockField" from ((("dbo"."Issue" N0
inner join "dbo"."Article" N1 on (N0."Oid" = N1."Oid"))
inner join "dbo"."ProductLink" N2 on (N1."ProductLink" = N2."Oid"))
inner join "dbo"."Technology" N3 on (N2."Technology" = N3."Oid"))
where (N1."GCRecord" is null and (N0."IsPrivate" = 0) and ((N0."HasMarkedAnswers" = 0) or N0."HasMarkedAnswers" is null) and (N3."Name" = N'Discussions'))
order by N1."ModifiedOn" desc
both queries are the same, except first starts with select top 10 and second with select top 30. Both queries returns the same result set - 6 rows. But the second query is 5 times faster then the first one! I looked at the actual execution plans for both queries, and of course, they differs. Second query uses indexed view, and performs great, and the first query denies to use it, using indexes on tables instead. I repeat myself - both queries are the same, to the same table, at the same server, they differs only by number in "top" part.
I tried to force optimizer to use indexed view in the first query by updating statistics, destroing indexes it used and so on. No matter how I try actual execution do not use indexed view for the first query and always use it for the second one.
I am really intrested in the reasons causing such behavior. Any suggestions?
Update I am not sure that it can help without decribing corresponding indexes and view, but this is actual execution plan diagramms:
for select top 19:
for select top 18:
another confusing fact is that for the select top 19 query sometimes indexed view is used, sometimes not
The only thing I can think of is perhaps the optimizer in the first query concluded that the specifying criteria is not selective enough for the "better" execution plan to be used.
If you are still investigating this see if TOP 60, 90, 100, ... produces the second execution plan and performs well. You could also tinker with it to see what the threshold is for the optimizer to select the second plan in this case.
Also try the queries without the order by statement to see if that is affecting the selection of the query plan (check the index on that field, etc)
Beyond that, you said you can't use index hints so perhaps a re-write where you select top X from your Article table (N1) with a bunch of exists statements in your where clause would provide better performance for you.
Don't ask for what, but i need two tables from one SQL query.
Like this...
Select Abc, Dgf from A
and result are two tables
abc
1
1
1
dgf
2
2
2
More details?
Ok lets try.
Now i have sp like this:
SELECT a.* from ActivityView as a with (nolock)
where a.WorkplaceGuid = #WorkplaceGuid
SELECT b.* from ActivityView as a with (nolock)
left join PersonView as b with (nolock) on a.PersonGuid=b.PersonGuid where a.WorkplaceGuid = #WorkplaceGuid
It's cool. But execution time about 22 seconds. I do this because in my programm i have classes that automaticly get data from records set. Class Activity and class Person. That why i can't make it in one recordset. Program didn't parse it.
You can write a stored procedure that has two SELECTs.
SELECT Abc FROM A AS AbcTable;
SELECT Dgf FROM A AS DfgTable;
Depending on your specific requirements, I would consider just submitting two separate queries. I don't see any advantage to combining them.
SQL Server supports legacy COMPUTE BY clause which acts almost like GROUP BY but returns multiple resultsets (the resultsets constituting each group followed by the resultsets with the aggregates):
WITH q AS
(
SELECT 1 AS id
UNION ALL
SELECT 2 AS id
)
SELECT *
FROM q
ORDER BY
id
COMPUTE COUNT(id)
BY id
This, however, is obsolete and is to be removed in the future releases.
Those don't seem to be excessively complicated queries (although select * should in general not be used in production and never when you are doing a join as it needlessly wastes resources sending the value of the joined field twice). Therefore if it is taking 22 seconds, then either you are returning a huge amount of data or you don't have proper indexing.
Have you looked at the execution plans to see what is causing the slowness?