Need help improving query performance - sql-server

I need help with improving the performance of the following SQL query. The database design of this application is based on OLD mainframe entity designs. All the query does is return a list of clients based on some search criteria:
@advisers: only returns clients which were captured by this adviser.
@outlets: just ignore this one.
@searchText: (firstname, surname, suburb, policy number) any combination of those.
What I'm doing is creating a table variable, then querying all the tables involved to build my own dataset, and then inserting that dataset into an easily understandable table (@Clients).
This query takes 20 seconds to execute and currently only returns 7 rows!
A screenshot of all table record counts can be found here: Table Record Count
Any ideas where I can start to optimize this query?
ALTER PROCEDURE [dbo].[spOP_SearchDashboard]
@advisers varchar(1000),
@outlets varchar(1000),
@searchText varchar(1000)
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON;
-- Set the prefixes to search for (firstname, surname, suburb, policy number)
DECLARE @splitSearchText varchar(1000)
SET @splitSearchText = REPLACE(@searchText, ' ', ',')
DECLARE @AdvisersListing TABLE
(
adviser varchar(200)
)
DECLARE @SearchParts TABLE
(
prefix varchar(200)
)
DECLARE @OutletListing TABLE
(
outlet varchar(200)
)
INSERT INTO @AdvisersListing(adviser)
SELECT part as adviser FROM SplitString (@advisers, ',')
INSERT INTO @SearchParts(prefix)
SELECT part as prefix FROM SplitString (@splitSearchText, ',')
INSERT INTO @OutletListing(outlet)
SELECT part as outlet FROM SplitString (@outlets, ',')
DECLARE @Clients TABLE
(
source varchar(2),
adviserId bigint,
integratedId varchar(50),
rfClientId bigint,
ifClientId uniqueidentifier,
title varchar(30),
firstname varchar(100),
surname varchar(100),
address1 varchar(500),
address2 varchar(500),
suburb varchar(100),
state varchar(100),
postcode varchar(100),
policyNumber varchar(100),
lastAccess datetime,
deleted bit
)
INSERT INTO @Clients
SELECT
source, adviserId, integratedId, rfClientId, ifClientId, title,
firstname, surname, address1, address2, suburb, state, postcode,
policyNumber, max(lastAccess) as lastAccess, deleted
FROM
(SELECT DISTINCT
'RF' as Source,
advRel.SourceEntityId as adviserId,
cast(pe.entityId as varchar(50)) AS IntegratedID,
pe.entityId AS rfClientId,
cast(ifClient.Id as uniqueidentifier) as ifClientID,
ISNULL(p.title, '') AS title,
ISNULL(p.firstname, '') AS firstname,
ISNULL(p.surname, '') AS surname,
ISNULL(ct.address1, '') AS address1,
ISNULL(ct.address2, '') AS address2,
ISNULL(ct.suburb, '') AS suburb,
ISNULL(ct.state, '') AS state,
ISNULL(ct.postcode, '') AS postcode,
ISNULL(contract.policyNumber,'') AS policyNumber,
coalesce(pp.LastAccess, d_portfolio.dateCreated, pd.dateCreated) AS lastAccess,
ISNULL(client.deleted, 0) as deleted
FROM
tbOP_Entity pe
INNER JOIN tbOP_EntityRelationship advRel ON pe.EntityId = advRel.TargetEntityId
AND advRel.RelationshipId = 39
LEFT OUTER JOIN tbOP_Data pd ON pe.EntityId = pd.entityId
LEFT OUTER JOIN tbOP__Person p ON pd.DataId = p.DataId
LEFT OUTER JOIN tbOP_EntityRelationship ctr ON pe.EntityId = ctr.SourceEntityId
AND ctr.RelationshipId = 79
LEFT OUTER JOIN tbOP_Data ctd ON ctr.TargetEntityId = ctd.entityId
LEFT OUTER JOIN tbOP__Contact ct ON ctd.DataId = ct.DataId
LEFT OUTER JOIN tbOP_EntityRelationship ppr ON pe.EntityId = ppr.SourceEntityId
AND ppr.RelationshipID = 113
LEFT OUTER JOIN tbOP_Data ppd ON ppr.TargetEntityId = ppd.EntityId
LEFT OUTER JOIN tbOP__Portfolio pp ON ppd.DataId = pp.DataId
LEFT OUTER JOIN tbOP_EntityRelationship er_policy ON ppd.EntityId = er_policy.SourceEntityId
AND er_policy.RelationshipId = 3
LEFT OUTER JOIN tbOP_EntityRelationship er_contract ON er_policy.TargetEntityId = er_contract.SourceEntityId AND er_contract.RelationshipId = 119
LEFT OUTER JOIN tbOP_Data d_contract ON er_contract.TargetEntityId = d_contract.EntityId
LEFT OUTER JOIN tbOP__Contract contract ON d_contract.DataId = contract.DataId
LEFT JOIN tbOP_Data d_portfolio ON ppd.EntityId = d_portfolio.EntityId
LEFT JOIN tbOP__Portfolio pt ON d_portfolio.DataId = pt.DataId
LEFT JOIN tbIF_Clients ifClient on pe.entityId = ifClient.RFClientId
LEFT JOIN tbOP__Client client on client.DataId = pd.DataId
where
p.surname <> ''
AND (advRel.SourceEntityId IN (select adviser from @AdvisersListing)
OR
pp.outlet COLLATE SQL_Latin1_General_CP1_CI_AS in (select outlet from @OutletListing)
)
) as RFClients
group by
source, adviserId, integratedId, rfClientId, ifClientId, title,
firstname, surname, address1, address2, suburb, state, postcode,
policyNumber, deleted
SELECT * FROM @Clients --THIS ONLY RETURNS 10 RECORDS WITH MY CURRENT DATASET
END

Clarifying questions
What is the MAIN piece of data that you are querying on - advisers, search-text, outlets?
It feels like your criteria allow users to search in many different ways. A sproc will always use exactly the SAME plan for every question you ask of it. You get better performance by using several sprocs - each tuned for a specific search scenario (i.e. I bet you could write something blazingly fast for querying just by policy-number).
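For example, a thin dispatching proc could route each scenario to its own tuned proc. A sketch - both proc names here are invented:
-- hypothetical dispatcher: an all-digit search string is treated as a
-- policy number and routed to a proc tuned for exactly that lookup
IF @searchText <> '' AND @searchText NOT LIKE '%[^0-9]%'
    EXEC dbo.spOP_SearchByPolicyNumber @policyNumber = @searchText;
ELSE
    EXEC dbo.spOP_SearchDashboardGeneral @advisers, @outlets, @searchText;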
If you can separate your search-text into INDIVIDUAL parameters then you may be able to:
Search for adviser relationships matching your supplied list - store in temp table (or table variable).
IF ANY surnames have been specified then delete all records from temp which aren't for people with your supplied names.
Repeat for other criteria lists - all the time reducing your temp records.
THEN join to the outer-join stuff and return the results (a rough sketch of this whittle-down approach follows below).
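Roughly, steps 1 and 2 might look like this - @Surnames here is a hypothetical table variable holding whichever search parts you have identified as surnames:
-- step 1: seed candidates from the adviser relationships
SELECT advRel.TargetEntityId AS rfClientId
INTO #candidates
FROM tbOP_EntityRelationship advRel
JOIN @AdvisersListing adv
    ON adv.adviser = advRel.SourceEntityId
WHERE advRel.RelationshipId = 39;

-- step 2: if any surnames were specified, delete candidates that don't match
IF EXISTS (SELECT 1 FROM @Surnames)
    DELETE c
    FROM #candidates c
    WHERE NOT EXISTS (SELECT 1
                      FROM tbOP_Data pd
                      JOIN tbOP__Person p ON p.DataId = pd.DataId
                      JOIN @Surnames s ON p.surname = s.surname
                      WHERE pd.entityId = c.rfClientId);
-- repeat the same DELETE shape for suburb, policy number, etc.,
-- then outer-join #candidates to the remaining tables for the display columns.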
In your notes you say that outlets can be ignored. If this is true then taking them out would simplify your query. The "or" clause in your example means that SQL-Server needs to find ALL relationships for ALL portfolios before it can realistically get down to the business of filtering the results that you actually want.
Breaking the query up
Most of your query consists of outer joins that are not involved in filtering. Try moving these joins into a separate select (i.e. AFTER you have applied all of your criteria). When SQL-Server sees lots of tables it switches off some of its possible optimisations. So your first step (assuming that you always specify advisers) is just:
SELECT advRel.SourceEntityId as adviserId,
advRel.TargetEntityId AS rfClientId
INTO #temp1
FROM @AdvisersListing advisers
INNER JOIN tbOP_EntityRelationship advRel
ON advRel.SourceEntityId = advisers.adviser
AND advRel.RelationshipId = 39;
The link to tbOP_Entity (aliased as "pe") does not look like it is needed for its data. So you should be able to replace all references to "pe.EntityId" with "advRel.TargetEntityId".
The DISTINCT clause and the GROUP BY are probably trying to achieve the same thing - and both of them are really expensive. Normally you find ONE of these used when a previous developer has not been able to get their results right. Get rid of them - check your results - if you get duplicates then try to filter the duplicates out. You may need ONE of them if you have temporal data - you definitely don't need both.
Indexes
Make sure that the @AdvisersListing.adviser column is the same datatype as SourceEntityId and that SourceEntityId is indexed. If the column has a different datatype then SQL-Server won't want to use the index (so you would want to change the datatype on @AdvisersListing).
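For example, assuming SourceEntityId really is a bigint (the adviserId column in the @Clients table above suggests it is), the declaration would become:
DECLARE @AdvisersListing TABLE
(
adviser bigint PRIMARY KEY -- the PRIMARY KEY also gives the table variable an index to seek on
)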
The tbOP_EntityRelationship table sounds like it should have an index something like:
CREATE UNIQUE INDEX advRel_idx1 ON tbOP_EntityRelationship (SourceEntityId,
RelationshipId, TargetEntityId);
If this exists then SQL-Server should be able to get everything it needs by ONLY going to the index pages (rather than to the table pages). This is known as a "covering" index.
There should be a slightly different index on tbOP_Data (assuming it has a clustered primary key on DataId):
CREATE INDEX tbOP_Data_idx1 ON tbOP_Data (entityId) INCLUDE (dateCreated);
SQL-Server will store the keys from the table's clustered index (which I assume will be DataId) along with the value of "dateCreated" in the index leaf pages. So again we have a "covering" index.
Most of the other tables (tbOP__Client, etc) should have indexes on DataId.
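Something along these lines - the index names are made up, so adjust to your own convention:
CREATE INDEX tbOP__Client_idx1 ON tbOP__Client (DataId);
CREATE INDEX tbOP__Contact_idx1 ON tbOP__Contact (DataId);
CREATE INDEX tbOP__Person_idx1 ON tbOP__Person (DataId);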
Query plan
Unfortunately I couldn't see the explain-plan picture (our firewall ate it). However, one useful tip is to hover your mouse over some of the join lines. It tells you how many records will be accessed.
Watch out for full table scans. If SQL-Server needs to use them then it's pretty much given up on your indexes.
Database structure
It's been designed as a transactional database. The level of normalization (and all of the EntityRelationship-this and Data-that) is really painful for reporting. You really need to consider having a separate reporting database that unravels some of this information into a more usable structure.
If you are running reports directly against your production database then I would expect a bunch of locking problems and resource contention.
Hope this has been useful - it's the first time I've posted here. It has been ages since I last tuned a query in my current company (they have a bunch of stern-faced DBAs for sorting this sort of thing out).

Looking at your execution plan... 97% of the cost of your query is in processing the DISTINCT clause. I'm not sure it is even necessary since you are taking all that data and doing a group by on it anyway. You might want to take it out and see how that affects the plan.

That kind of query is just going to take time; with that many joins and that many temp tables, there's just nothing easy or efficient about it. One trick I have been using is copying the parameters into local variables. It might not be an all-out solution, but if it shaves a few seconds, it's worth it.
DECLARE @Xadvisers varchar(1000)
DECLARE @Xoutlets varchar(1000)
DECLARE @XsearchText varchar(1000)
SET @Xadvisers = @advisers
SET @Xoutlets = @outlets
SET @XsearchText = @searchText
Believe me, I have tested it thoroughly, and it helps with complicated scripts. It seems to be because SQL Server can't sniff the value of a local variable the way it sniffs a parameter, so the cached plan isn't tailored to (or skewed by) one particular input. Good luck!

Related

SQL Server: Selecting Value That was met in the Where Criteria

My query selects records where given item types were ordered and I would like it to return a column that has the value for the criteria which a given record has met.
For example (since the above explanation is probably confusing):
DECLARE @Item1 VARCHAR(20) = 'Red Shoes',
@Item2 VARCHAR(20) = 'Brown Belt',
@Item3 VARCHAR(20) = 'Blue Shoes',
@Item4 VARCHAR(20) = 'Black Belt'
SELECT DISTINCT ord.Order_number,
ord.Item_number,
ord.Item_type,
ord.Item_desc,
link.Item_number AS linked_item_number
FROM Ordertbl ord
LEFT JOIN Item_tables link
ON link.item_number = ord.item_number
WHERE link.Item_number IN (@Item1,@Item2,@Item3,@Item4) AND
ord.Item_number NOT IN (@Item1,@Item2,@Item3,@Item4)
Desired Outcome: All items that were ordered whenever Item1,2,3, or 4 were ordered and, for each record, a field that depicts what item (1,2,3, or 4) was the source for that record being returned.
Using multiple Union queries with where criteria set to a single item provides the desired outcome if I set the linked_item_number field to the queried item, but that method is less than ideal because, at times, large numbers of items may be queried.
Edited: I've updated my answer a bit and expanded on a few areas using my best guesses, but hopefully they help illuminate the points I'm making.
Using NOT IN in a WHERE clause is really bad for performance. It would be better to convert your filters into tables, and then JOIN them to your order table.
But before we get to that, let's make a few DDL assumptions here that will help keep things clear. Let's say you have two tables like the following:
CREATE TABLE Ordertbl
(
Order_number INT
,Item_number INT
--you might have more columns in your table
)
CREATE TABLE Item_tables
(
Item_number INT
,Item_type INT
,Item_desc VARCHAR(20)
--again, you might have more columns in your table
)
I'm also going to assume that the details about an item are in Item_tables and not in Ordertbl, because that makes the most sense for a database design to me.
My original answer had the following block of text next:
In this scenario, you'd need two additional tables, one for the list of Items in where Item_number in (@Item1,@Item2,@Item3,@Item4), which would have the corresponding Item_numbers. The other table would be the list of Subjs in Item_number Not in (@Subj1,@Subj2,@Subj3,@Subj4,@Subj5,@Subj6,@Subj7), again including the Item_number for those Subj records.
The question has been updated so that the WHERE clause is different than it was when I wrote the original version of my answer. The design pattern still applies here, even if the variables being used are different.
So let's create a temp table to hold all of our "triggering" items, and then populate it.
DECLARE @Item1 VARCHAR(20) = 'Red Shoes',
@Item2 VARCHAR(20) = 'Brown Belt',
@Item3 VARCHAR(20) = 'Blue Shoes',
@Item4 VARCHAR(20) = 'Black Belt'
CREATE TABLE #TriggeringItems
(
Item_desc VARCHAR(20)
)
INSERT INTO #TriggeringItems
(
Item_desc
)
SELECT @Item1
UNION
SELECT @Item2
UNION
SELECT @Item3
UNION
SELECT @Item4
If you had more filter variables to add, you could keep UNIONing them onto the INSERT.
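As an aside, if you're on SQL Server 2008 or later, a VALUES row constructor does the same thing in one statement:
INSERT INTO #TriggeringItems (Item_desc)
VALUES (@Item1), (@Item2), (@Item3), (@Item4);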
So now we have our temp table and we can filter our query. Great!
...right?
Well, not quite. Our input parameters are descriptions, and our foreign key is an INT, which means we'll have to do a few extra joins to get our key values into the query. But the general idea is that you'd use an INNER JOIN to replace WHERE ... IN ..., and a LEFT JOIN to replace the WHERE ... NOT IN ... (adding a WHERE X IS NULL clause, where X is the key of the LEFT JOINed table)
If you didn't care about getting the triggering items back in your SELECT, then you could just go ahead with replacing the WHERE ... IN ... with an INNER JOIN, and that would be the end of it. But let's say you only wanted the list of items that WEREN'T the triggering items. For that, you would need to join Ordertbl to itself, to get the list of Order_numbers with triggering items within them. Then you could INNER JOIN one side to the temp table, and LEFT JOIN the other half. I know my explanation might be hard to follow, so let me show you what I mean in code:
SELECT DISTINCT onums.Order_number,
orditems.Item_number,
orditems.Item_type,
orditems.Item_desc,
tinums.Item_number AS linked_item_number
FROM #TriggeringItems ti
INNER JOIN Item_tables tinums ON ti.Item_desc = tinums.Item_desc
INNER JOIN Ordertbl onums ON tinums.Item_number = onums.Item_number
INNER JOIN Ordertbl ord ON onums.Order_number = ord.Order_number
INNER JOIN Item_tables orditems ON ord.Item_number = orditems.Item_number
LEFT JOIN #TriggeringItems excl ON orditems.Item_desc = excl.Item_desc
WHERE excl.Item_desc IS NULL
onums is our list of order numbers, but ord is where we're going to pull our items from. But we only want to return items that aren't triggers, so we LEFT JOIN our temp table at the end and add the WHERE excl.Item_desc IS NULL.

SQL Server & SSMS 2012 - Move a value from one column to a new one to ensure only one row

This is a problem that has troubled me several times in the past, and I have always wondered if a solution is possible.
I have a query using several tables; one of the values is a mobile phone number.
I have name, address etc.... I also have income information in the table, which is used for a summary in Excel.
The problem occurs when a contact has more than one mobile number; as you know, this creates extra rows with the majority of the data duplicated, including the income.
Question: is it possible for the query to identify whether the contact has more than one number and, if so, create a new column with the 2nd mobile number?
Effectively returning the contact's information on one row by creating new columns.
My SQL is intermediate and I cannot think of a solution, so I thought I would ask.
Many thanks
I am pretty sure this isn't the best possible solution, since we don't have information on how many records you have in your dataset and I didn't have enough time, but here is an idea of how you can solve your original problem of two different numbers for one customer.
declare @t table (id int
,firstName varchar(20)
,lastName varchar(20)
,phoneNumber varchar(20)
,income money)
insert into @t values
(1,'John','Doe','1234567',50)
,(1,'John','Doe','6789856',50)
,(2,'Mike','Smith','5687456',150)
,(3,'Stela','Hodhson','3334445',500)
,(4,'Nick','Slotter','5556667',550)
,(4,'Nick','Slotter','8889991',550)
,(5,'Abraham','Lincoln','4578912',52)
,(6,'Ronald','Regan','6987456',587)
,(7,'Thomas','Jefferson','8745612',300);
with a as(
select id
,max(phoneNumber) maxPhone
from @t group by id
),
b as(
select id
,min(phoneNumber) minPhone
from @t group by id
)
SELECT distinct t.id
,t.firstName
,t.lastName
,t.income
,a.maxPhone as phoneNumber1
,case when b.minPhone = a.maxPhone then ''
else b.minphone end as phoneNumber2
from @t t
inner join a a on a.id = t.id
inner join b b on b.id = t.id
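For what it's worth, a more general pattern (sketched against the same @t sample data) uses ROW_NUMBER with conditional aggregation - it extends to a third or fourth number by adding more CASE branches:
;with numbered as (
select id, firstName, lastName, income, phoneNumber,
       ROW_NUMBER() over (partition by id order by phoneNumber) as rn
from @t
)
select id, firstName, lastName, income,
       max(case when rn = 1 then phoneNumber end) as phoneNumber1,
       isnull(max(case when rn = 2 then phoneNumber end), '') as phoneNumber2
from numbered
group by id, firstName, lastName, income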

Should subselects in FROM clauses be optimized into derived tables in SQL Server?

Query 1:
SELECT au_lname, au_fname, title
FROM (SELECT au_lname, au_fname, au_id FROM pubs.dbo.authors WHERE state = 'CA') as a
JOIN pubs.dbo.titleauthor ta on a.au_id=ta.au_id
JOIN pubs.dbo.titles t on ta.title_id = t.title_id
Query 2:
DECLARE @DATASET TABLE ( au_lname VARCHAR(40), au_fname VARCHAR(20), au_id VARCHAR(11) );
INSERT @DATASET SELECT au_lname, au_fname, au_id FROM pubs.dbo.authors WHERE state = 'CA';
SELECT au_lname, au_fname, title
FROM @DATASET as a
JOIN pubs.dbo.titleauthor ta on a.au_id=ta.au_id
JOIN pubs.dbo.titles t on ta.title_id = t.title_id
My assumption is that these two queries are not very different from each other, from the standpoint of performance. Is Query 2 ever an improvement over Query 1?
As a side note, I know that the subquery in this example doesn't even need to be there. This is a simplified version -- the actual query I'm dealing with is much more complicated.
If you have more than a couple rows of data, query 2 is in all likelihood worse than query 1.
You are SELECTing the data twice - once into the table variable and once again to return it
Table variables generally do not perform great - they have no stats or indexes and contrary to popular belief still write to tempdb and the log
If it were me I would rewrite the first query without a derived table, which is really unnecessary:
SELECT
au_lname
,au_fname
,title
FROM
pubs.dbo.authors a
INNER JOIN
pubs.dbo.titleauthor ta
on a.au_id=ta.au_id
INNER JOIN
pubs.dbo.titles t
on ta.title_id = t.title_id
WHERE
a.state = 'CA'
Note
For more info than you ever wanted on table variables and temp tables, see this epic answer from Martin Smith.
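As an aside, if you genuinely do need to materialise the intermediate rows (say, to reuse them several times), a temp table is usually a better vehicle than a table variable, because it gets statistics and can be indexed. A sketch, assuming the standard pubs column types:
CREATE TABLE #DATASET
(
au_lname VARCHAR(40),
au_fname VARCHAR(20),
au_id VARCHAR(11)
);
CREATE CLUSTERED INDEX ix_dataset ON #DATASET (au_id);
INSERT INTO #DATASET
SELECT au_lname, au_fname, au_id FROM pubs.dbo.authors WHERE state = 'CA';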
The only thing Query 2 buys you is the ability to store additional data in the table variable (for instance, if your procedure generates extra meta data to support its purpose). What it costs you is the ability to use any indexes on the physical table.
No. Let the optimiser do its thing. By introducing your own temporary tables/variables you are taking away options that would otherwise be available. By telling it exactly what you want (i.e. a normal query) it can do its best. By trying to second guess it, you're making it break down into usually unnecessary steps.

is there anyway to cache data that can be used in a SQL server db trigger

I have an orders table that has a userID column.
I have a user table that has id, name.
I would like to have a database trigger that shows the insert, update or delete by name.
So I wind up having to do this join between these two tables in every single db trigger. I would think it would be better if I could run one query upfront to map users to ids and then reuse that "lookup" in my triggers. Is this possible?
DECLARE @oldId int
DECLARE @newId int
DECLARE @oldName VARCHAR(100)
DECLARE @newName VARCHAR(100)
-- NB: this pattern assumes a single-row modification; Inserted/Deleted can hold many rows
SELECT @oldId = (SELECT user_id FROM Deleted)
SELECT @newId = (SELECT user_id FROM Inserted)
SELECT @oldName = (SELECT name FROM users where id = @oldId)
SELECT @newName = (SELECT name FROM users where id = @newId)
INSERT INTO History(id, . . .
Good news, you are already using a cache! Your SELECT name FROM users WHERE id = @id is going to fetch the name from the buffer pool's cached pages. Believe you me, you won't be able to construct a better tuned, higher-scale and faster cache than that.
Result caching may make sense in the client, where one can avoid the roundtrip to the database altogether. Or it may be valuable to cache some complex and long running query result. But inside a stored proc/trigger there is absolutely no value in caching a simple index lookup result.
How about you turn on Change Data Capture, and then get rid of all this code?
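For reference, enabling CDC (SQL Server 2008 onwards) is roughly this - the schema and table names are guesses from your description:
-- enable CDC at the database level (requires sysadmin)
EXEC sys.sp_cdc_enable_db;
-- then track the orders table
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'orders',
    @role_name     = NULL;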
Edited to add the rest:
Actually, if you're considering the possibility of a scalar function to fetch the username, then don't. That's really bad because of the problems of scalar functions being procedural. You'd be better off with something like:
INSERT dbo.History (id, ...)
SELECT i.id, ...
FROM inserted i
JOIN deleted d ON d.id = i.id -- only matches for UPDATEs; an INSERT-only trigger would drop this join
JOIN dbo.users u ON u.user_id = i.user_id;
As user_id is unique, and you have a FK whenever it's used, it shouldn't be a major problem. But yes, you need to repeat this logic in every trigger. If you don't want to repeat the logic, then use Change Data Capture in SQL 2008.

T-SQL filtering on dynamic name-value pairs

I'll describe what I am trying to achieve:
I am passing down to an SP an XML with name-value pairs that I put into a table variable, let's say @nameValuePairs.
I need to retrieve a list of IDs for expressions (a table) with those exact match of name-value pairs (attributes, another table) associated.
This is my schema:
Expressions table --> (expressionId, attributeId)
Attributes table --> (attributeId, attributeName, attributeValue)
After trying complicated stuff with dynamic SQL and evil cursors (which work, but are painfully slow), this is what I've got now:
--do the magic plz!
-- retrieve number of name-value pairs
DECLARE @noOfAttributes int
SELECT @noOfAttributes = COUNT(*) FROM @nameValuePairs
select distinct
e.expressionId, a.attributeName, a.attributeValue
into
#temp
from
expressions e
join
attributes a
on
e.attributeId = a.attributeId
join --> this join does the filtering
@nameValuePairs nvp
on
a.attributeName = nvp.name and a.attributeValue = nvp.value
group by
e.expressionId, a.attributeName, a.attributeValue
-- now select the IDs I need
-- since I did a select distinct above if the number of matches
-- for a given ID is the same as noOfAttributes then BINGO!
select distinct
expressionId
from
#temp
group by expressionId
having count(*) = @noOfAttributes
Can people please review and see if they can spot any problems? Is there a better way of doing this?
Any help appreciated!
I believe that this would satisfy the requirement you're trying to meet. I'm not sure how much prettier it is, but it should work and wouldn't require a temp table:
DECLARE @noOfAttributes int
SELECT @noOfAttributes = COUNT(*) FROM @nameValuePairs
SELECT e.expressionid
FROM expression e
LEFT JOIN (
SELECT attributeid
FROM attributes a
JOIN @nameValuePairs nvp ON nvp.name = a.Name AND nvp.Value = a.value
) t ON t.attributeid = e.attributeid
GROUP BY e.expressionid
HAVING SUM(CASE WHEN t.attributeid IS NULL THEN (@noOfAttributes + 1) ELSE 1 END) = @noOfAttributes
EDIT: After doing some more evaluation, I found an issue where certain expressions would be included that shouldn't have been. I've modified my query to take that into account.
One error I see is that you have no table with an alias of b, yet you are using: a.attributeId = b.attributeId.
Try fixing that and see if it works, unless I am missing something.
EDIT: I think you just fixed this in your edit, but is it supposed to be a.attributeId = e.attributeId?
This is not a bad approach, depending on the sizes and indexes of the tables, including @nameValuePairs. If these row counts are high or it otherwise becomes slow, you may do better to put @nameValuePairs into a temp table instead, add appropriate indexes, and use a single query instead of two separate ones.
I do notice that you are putting columns into #temp that you are not using; it would be faster to exclude them (though it would mean duplicate rows in #temp). Also, your second query has both a "distinct" and a "group by" on the same columns. You don't need both, so I would drop the "distinct" (it probably won't affect performance, because the optimizer has already figured this out).
Finally, #temp would probably be faster with a clustered non-unique index on expressionId (I am assuming that this is SQL 2005). You could add it after the SELECT..INTO, but it is usually as fast or faster to add it before you load. This would require you to CREATE #temp first, add the clustered index, and then use INSERT..SELECT to load it instead.
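That load pattern would look something like this (a sketch - the column types are guesses):
CREATE TABLE #temp
(
expressionId   int          NOT NULL,
attributeName  varchar(100) NOT NULL,
attributeValue varchar(100) NOT NULL
);
CREATE CLUSTERED INDEX temp_cdx ON #temp (expressionId);

INSERT INTO #temp (expressionId, attributeName, attributeValue)
SELECT DISTINCT e.expressionId, a.attributeName, a.attributeValue
FROM expressions e
JOIN attributes a
    ON e.attributeId = a.attributeId
JOIN @nameValuePairs nvp
    ON a.attributeName = nvp.name
   AND a.attributeValue = nvp.value;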
I'll add an example of merging the queries in a minute... OK, here's one way to merge them into a single query (this should be 2000-compatible also):
-- retrieve number of name-value pairs
DECLARE @noOfAttributes int
SELECT @noOfAttributes = COUNT(*) FROM @nameValuePairs
-- now select the IDs I need
-- since I did a select distinct above if the number of matches
-- for a given ID is the same as noOfAttributes then BINGO!
select
expressionId
from
(
select distinct
e.expressionId, a.attributeName, a.attributeValue
from
expressions e
join
attributes a
on
e.attributeId = a.attributeId
join --> this join does the filtering
@nameValuePairs nvp
on
a.attributeName = nvp.name and a.attributeValue = nvp.value
) as Temp
group by expressionId
having count(*) = @noOfAttributes
