Slow two-table query in SQL Server - sql-server

The application I work on generates an SQL query somewhat like this:
SELECT
VISIT_VIEW.VISIT_ID, VISIT_VIEW.PATIENT_ID, VISIT_VIEW.MRN_ID,
VISIT_VIEW.BILL_NO, INSURANCE.INS_PAYOR
FROM
VISIT_VIEW
LEFT JOIN
INSURANCE ON VISIT_VIEW.visit_id = INSURANCE._fk_visit
WHERE
VISIT_VIEW.VISIT_ID IN (1002, 1003, 1005, 1006, 1007, 1008, 1010, 1011, <...>, 1193, 1194, 1195, 1196, 1197, 1198, 1199)
The <...> represents a long list of ids. The size of the list depends on the results of a previous query, and in turn on the parameters selected to generate that query.
The list of IDs can be anywhere from 100 items long to above 2000.
The INSURANCE table is large, over 9 million rows. The visit table is also large, but not quite as large.
As the number of IDs goes up, there is a fairly sharp increase in duration, from less than a second to over 15 minutes. The increase starts somewhere around 175 IDs.
If the parameters used to generate the query are changed so that the INS_PAYOR column is not selected, and thus there is no left join, the query runs in less than a second, even with over 2000 items in the list of IDs.
The execution plan shows that 97% of the query time is spent on a clustered index seek on the INSURANCE table.
How can I rework this query to get the same results with a less horrific delay?
Do remember that the SQL is being generated by code, not by hand. It is generated from a list of fields (with knowledge of which field belongs to which table) and a list of IDs in the primary table to check. I do have access to the code that does the query generation, and can change it provided that the ultimate results of the query are exactly the same.
Thank you

The <...> represents a long list of ids. The size of the list depends on the results of a previous query
Don't do that.
Do this:
SELECT <...>
FROM VISIT_VIEW
INNER JOIN (
<previous query goes here>
) t ON VISIT_VIEW.VISIT_ID = t.<ID?>
LEFT JOIN INSURANCE ON VISIT_VIEW.visit_id = INSURANCE._fk_visit

See if you get any improvements using the following...
IF OBJECT_ID('tempdb..#VisitList', 'U') IS NOT NULL
DROP TABLE #VisitList;
CREATE TABLE #VisitList (
VISIT_ID INT NOT NULL PRIMARY KEY
);
INSERT #VisitList (VISIT_ID) VALUES (1002),(1003),(1005),(1006),(1007),(1008),(1010),(1011),(<...>),(1193),(1194),(1195),(1196),(1197),(1198),(1199);
SELECT
vv.VISIT_ID,
vv.PATIENT_ID,
vv.MRN_ID,
vv.BILL_NO,
ix.INS_PAYOR
FROM
VISIT_VIEW vv
JOIN #VisitList vl
ON vv.VISIT_ID = vl.VISIT_ID
CROSS APPLY (
SELECT TOP 1
i.INS_PAYOR
FROM
INSURANCE i
WHERE
vv.visit_id=i._fk_visit
) ix;
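One caveat, offered as a hedged side note: CROSS APPLY with TOP 1 drops visits that have no matching INSURANCE row and returns at most one payor per visit, whereas the original LEFT JOIN keeps unmatched visits (and can return several rows when a visit has more than one insurance row). If the unmatched visits must still appear so the results match the original exactly, OUTER APPLY is the closer equivalent; this is just a sketch of the same query with that one change:
SELECT
vv.VISIT_ID,
vv.PATIENT_ID,
vv.MRN_ID,
vv.BILL_NO,
ix.INS_PAYOR
FROM
VISIT_VIEW vv
JOIN #VisitList vl
ON vv.VISIT_ID = vl.VISIT_ID
OUTER APPLY (
SELECT TOP 1
i.INS_PAYOR
FROM
INSURANCE i
WHERE
vv.visit_id = i._fk_visit
) ix;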

Related

SQL query runs into a timeout on a sparse dataset

For sync purposes, I am trying to get a subset of the existing objects in a table.
The table has two fields, [Group] and Member, which are both stringified Guids.
All rows together may be too large to fit into a datatable; I already encountered an OutOfMemory exception. But I have to check that everything I need right now is in the datatable. So I take the Guids I want to check (they come in chunks of 1000), and query only for the related objects.
So, instead of filling my datatable once with all
SELECT * FROM Group_Membership
I am running the following SQL query against my SQL database to get related objects for one thousand Guids at a time:
SELECT *
FROM Group_Membership
WHERE
[Group] IN (@Guid0, @Guid1, @Guid2, @Guid3, @Guid4, @Guid5, ..., @Guid999)
The table in question now contains a total of 142 entries, and the query already times out (CommandTimeout = 30 seconds). On other tables, which are not as sparsely populated, similar queries don't time out.
Could someone shed some light on the logic of SQL Server and whether/how I could hint it into the right direction?
I already tried to add a nonclustered index on the column Group, but it didn't help.
I'm not sure that WHERE ... IN can make full use of an index on [Group], if it can use it at all. However, if you had a second table containing the GUID values, and that column were indexed, a join might perform very fast.
Create a temporary table for the GUIDs and populate it:
CREATE TABLE #Guids (
Guid varchar(255)
)
INSERT INTO #Guids (Guid)
VALUES
(@Guid0), (@Guid1), (@Guid2), (@Guid3), (@Guid4), ...
CREATE INDEX Idx_Guid ON #Guids (Guid);
Now try rephrasing your current query using a join instead of a WHERE IN (...):
SELECT *
FROM Group_Membership t1
INNER JOIN #Guids t2
ON t1.[Group] = t2.Guid;
As a disclaimer, if this doesn't improve the performance, it could be because your table has low cardinality. In such a case, an index might not be very effective.

SQL Server FullText Search with Weighted Columns from Previous One Column

In the database on which I am attempting to create a full-text search, I need to construct a table whose column names come from one column of an earlier table. In my current attempt, the full-text indexing is done on the first table, Data, the phrase search is run there, and then the second table with the search results is built.
The schema for the database is
**Players**
Id
PlayerName
Blacklisted
...
**Details**
Id
Name -> FirstName, LastName, Team, Substitute, ...
...
**Data**
Id
DetailId
PlayerId
Content
DetailId in the table Data relates to Id in Details, and PlayerId relates to Id in Players. If there are 1k rows in Players and 20 rows in Details, then there are 20k rows in Data.
WITH RankedPlayers AS
(
SELECT PlayerID, SUM(KT.[RANK]) AS Rnk
FROM Data c
INNER JOIN FREETEXTTABLE(dbo.Data, Content, '"Some phrase like team name and player name"')
AS KT ON c.DataID = KT.[KEY]
GROUP BY c.PlayerID
)
…
Then a table is made by selecting the rows in one column. Similar to a pivot.
…
SELECT rc.Rnk,
c.PlayerID,
PlayerName,
TeamID,
…
(SELECT Content FROM dbo.Data data WHERE DetailID = 1 AND data.PlayerID = c.PlayerID) AS [TeamName],
…
FROM dbo.Players c
JOIN RankedPlayers rc ON c.PlayerID = rc.PlayerID
ORDER BY rc.Rnk DESC
I can return a ranked table with this implementation; the aim, however, is to produce results from weighted columns, so that, say, the column PlayerName contributes more to the rank than TeamName.
I have tried making a schema bound view with a pivot, but then I cannot index it because of the pivot. I have tried making a view of that view, but it seems the metadata is inherited, plus that feels like a clunky method.
I then tried to do it as a straight query using sub queries in the select statement, but cannot due to indexing not liking sub queries.
I then tried to join multiple times, again the index on the view doesn't like self-referencing joins.
How to do this?
I have come across this article http://developmentnow.com/2006/08/07/weighted-columns-in-sql-server-2005-full-text-search/ , and other articles here on weighted columns, however nothing as far as I can find addresses weighting columns when the columns were initially row data.
A simple solution that works really well: put the weights for the relevant detail IDs in another table, left join that table to the table the full-text search is applied to, and multiply the rank by the weight. Continue as previously implemented.
In code that comes out as:
DECLARE @Weight TABLE
(
DetailID INT,
[Weight] FLOAT
);
INSERT INTO @Weight VALUES
(1, 0.80),
(2, 0.80),
(3, 0.50);
WITH RankedPlayers AS
(
SELECT PlayerID, SUM(KT.[RANK] * ISNULL(cw.[Weight], 0.10)) AS Rnk
FROM Data c
INNER JOIN FREETEXTTABLE(dbo.Data, Content, 'Karl Kognition C404') AS KT ON c.DataID = KT.[KEY]
LEFT JOIN @Weight cw ON c.DetailID = cw.DetailID
GROUP BY c.PlayerID
)
SELECT rc.Rnk,
...
I'm using a table variable here as a proof of concept. I am considering adding a Weight column to the Details table to avoid an unnecessary table and left join.
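A hedged sketch of that variant; the default weight value and the constraint name are my own choices, not from the original schema:
-- Add a weight to each detail row; rows not weighted explicitly fall back to the default.
ALTER TABLE dbo.Details ADD [Weight] FLOAT NOT NULL CONSTRAINT DF_Details_Weight DEFAULT 0.10;
WITH RankedPlayers AS
(
SELECT c.PlayerID, SUM(KT.[RANK] * d.[Weight]) AS Rnk
FROM Data c
INNER JOIN FREETEXTTABLE(dbo.Data, Content, 'Karl Kognition C404') AS KT ON c.DataID = KT.[KEY]
INNER JOIN dbo.Details d ON c.DetailID = d.Id
GROUP BY c.PlayerID
)
-- the final SELECT joining Players to RankedPlayers stays the same as above.
SELECT rc.Rnk,
...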

Performance Strategy for growing data

I know that performance tuning is something that needs to be done specifically for each environment. But I have put in maximum effort to make my question clear, to see if I am missing something among the possible improvements.
I have a table [TestExecutions] in SQL Server 2005. It has around 0.2 million records as of today. It is expected to grow to 5 million in a couple of months.
CREATE TABLE [dbo].[TestExecutions]
(
[TestExecutionID] [int] IDENTITY(1,1) NOT NULL,
[OrderID] [int] NOT NULL,
[LineItemID] [int] NOT NULL,
[Manifest] [char](7) NOT NULL,
[RowCompanyCD] [char](4) NOT NULL,
[RowReferenceID] [int] NOT NULL,
[RowReferenceValue] [char](3) NOT NULL,
[ExecutedTime] [datetime] NOT NULL
)
CREATE INDEX [IX_TestExecutions_OrderID]
ON [dbo].[TestExecutions] ([OrderID])
INCLUDE ([LineItemID], [Manifest], [RowCompanyCD], [RowReferenceID])
I have the following two queries for the same purpose (Query 2 and Query 3). For 100 records in #OrdersForRC, Query 2 performs better (39% vs 47%), whereas with 10000 records in #OrdersForRC, Query 3 performs better (53% vs 33%), as per the execution plans.
In the initial few months of use, the #OrdersForRC table will have close to 100 records. It will gradually increase to 2500 records over a couple of months.
Of the following two approaches, which one is better for such an incrementally growing scenario? Or is there any strategy to make one approach work better than the other even as the data grows?
Note: In Plan 2, the first query uses a Hash Match.
References
query optimizer operator choice - nested loops vs hash match (or merge)
Execution Plan Basics — Hash Match Confusion
Test Query
CREATE TABLE #OrdersForRC
(
OrderID INT
)
INSERT INTO #OrdersForRC
--SELECT DISTINCT TOP 100 OrderID FROM [TestExecutions]
SELECT DISTINCT TOP 5000 OrderID FROM LWManifestReceiptExecutions
--QUERY 2:
SELECT H.OrderID,H.LineItemID,H.Manifest,H.RowCompanyCD,H.RowReferenceID
FROM dbo.[TestExecutions] (NOLOCK) H
INNER JOIN #OrdersForRC R
ON R.OrderID = H.OrderID
--QUERY 3:
SELECT H.OrderID,H.LineItemID,H.Manifest,H.RowCompanyCD,H.RowReferenceID
FROM dbo.[TestExecutions] (NOLOCK) H
WHERE OrderID IN (SELECT OrderID FROM #OrdersForRC)
DROP TABLE #OrdersForRC
Plan 1
Plan 2
As commented above, you have not specified the table definition of LWManifestReceiptExecutions or how many rows it has.
You are selecting TOP N rows without an ORDER BY. Do you want TOP N random IDs, or the rows in a specific order, or does the order not matter to you?
If the order does matter, you can create an index on the column you need in the ORDER BY.
If OrderID is unique in the [dbo].[TestExecutions] table, you should mark the index as unique: drop and recreate it as UNIQUE.
DROP INDEX [IX_TestExecutions_OrderID] ON [dbo].[TestExecutions]
CREATE UNIQUE INDEX [IX_TestExecutions_OrderID]
ON [dbo].[TestExecutions] ([OrderID])
INCLUDE ([LineItemID], [Manifest], [RowCompanyCD], [RowReferenceID])
You mentioned that the data keeps growing and will reach millions of rows in a couple of months.
No need to worry: SQL Server can easily handle these queries with a properly built schema and indexes.
When this data model starts hurting, you could look at other options, but not now;
I have seen people handling billions of rows in SQL Server.
I can see that you are comparing the queries on the basis of query cost and coming to the conclusion that
the query with the higher percentage is more expensive.
That is not always the case. Query cost is based on the aggregate subtree cost of all iterators in the query plan,
and the total estimated cost of an iterator is a simple sum of its I/O and CPU components.
The cost values represent expected execution times (in seconds) on a particular reference hardware configuration,
but with modern hardware these costs may be irrelevant.
Now coming to your queries:
you have expressed two queries to get the result, but they are not identical.
Plan 1, Query 1
Expressed with a JOIN.
The optimizer chooses a nested loops join, which is a good choice for this particular scenario:
for every OrderID key in the table #OrdersForRC it seeks the value in dbo.[TestExecutions],
until all rows are matched.
Plan 1, Query 2
Expressed with IN.
The optimizer does the same thing as for Query 1, but there is an extra Distinct Sort (Sort and Stream Aggregate).
The reasoning behind it is that you have expressed the query with IN and the table #OrdersForRC can contain duplicate rows;
the sort is necessary just to eliminate those duplicates.
Plan 2, Query 1
Expressed with a JOIN.
Now that the table #OrdersForRC has 1000 rows, the optimizer chooses a hash join over a loop join,
because a loop join over 1000 rows costs more than a hash join, and the rows are unordered
and can contain NULLs as well, so a hash join is the perfect strategy here.
Plan 2, Query 2
Expressed with IN.
The optimizer has chosen a Distinct Sort for the same reason as in Plan 1, Query 2, and then a Merge Join,
because the rows are now sorted on the ID column for both tables.
If you just mark the temp table column as NOT NULL and UNIQUE, it is more likely you will get the same execution plan for both the IN and the JOIN versions.
CREATE TABLE #OrdersForRC
(OrderID INT NOT NULL UNIQUE)
Execution plan

SQL Select statement loop on one column in different table

Good day guys, would you help me with my SQL query? I have a web project which I called INQUIRY; the good thing is that I can keep a log of the search terms that users enter into my inquiry search box.
This is the table of keywords that have been searched in INQUIRY:
This is the code:
Insert into #temptable
Select CaseNo from tblcrew
where Lastname like '%FABIANA%'
and firstname like '%MARLON%'
Insert into #temptable
Select CaseNo from tblcrew
where Lastname like '%DE JOAN%'
and firstname like '%ROLANDO%'
Insert into #temptable
Select CaseNo from tblcrew
where Lastname like '%ROSAS%'
and firstname like '%FRANCASIO%'
I want to repeat my query for every row in the keyword table and save the result of each query into a temporary table. Is there a way to do that without typing all of the keyword values by hand?
Please, anyone, help me. Thanks!
All you need to do is join the two tables together, without typing any values.
Insert into #temptable
Select c.CaseNo
from tblcrew c
inner join tblKeyword k
on c.Lastname like '%'+k.Lastname+'%'
and c.firstname like '%'+k.firstname +'%'
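This assumes #temptable already exists, as in the question. If it doesn't, a minimal sketch (the CaseNo data type is an assumption):
IF OBJECT_ID('tempdb..#temptable', 'U') IS NOT NULL
DROP TABLE #temptable;
CREATE TABLE #temptable (CaseNo INT NOT NULL);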
I usually start with the Adventure Works database for examples like this. I will be talking about exact matches that leverage an index seek, inexact matches that leverage an index scan, and full-text indexing, with which you can do an inexact match that still results in a seek.
The Person.Person table has both last and first names, like your example. I keep just the primary key on the business id and create one index on (last name, first name).
--
-- Just PK & One index for test
--
-- Sample database
use [AdventureWorks2012];
go
-- add the index
CREATE NONCLUSTERED INDEX [IX_Person_LastName_FirstName] ON [Person].[Person]
(
[LastName] ASC,
[FirstName] ASC
);
go
Run with wildcards for an inexact match, or with plain text for an exact match. I randomly picked two names from the Person.Person table.
--
-- Run for match type
--
-- Sample database
use [AdventureWorks2012];
go
-- remove temp table
drop table #inquiry;
go
-- A table with first, last name combos to search
create table #inquiry
(
first_name varchar(50),
last_name varchar(50)
);
go
-- Add two person.person names
insert into #inquiry values
('%Cristian%', '%Petculescu%'),
('%John%', '%Kane%');
/*
('Cristian', 'Petculescu'),
('John', 'Kane');
*/
go
-- Show search values
select * from #inquiry;
go
The next step when examining run times is to clear the procedure cache and memory buffers. You do not want existing plans or cached data skewing the numbers.
-- Remove clean buffers & clear plan cache
CHECKPOINT
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
GO
-- Show time & i/o
SET STATISTICS TIME ON
SET STATISTICS IO ON
GO
The first SQL statement does an inner join between the temporary search-values table and Person.Person.
-- Exact match
select *
from
[Person].[Person] p join #inquiry i
on p.FirstName = i.first_name and p.LastName = i.last_name
The statistics and run times.
Table 'Person'. Scan count 2, logical reads 16, physical reads 8, CPU time = 0 ms, elapsed time = 29 ms.
The resulting query plan does a table scan of the #inquiry table and an index seek of the index on last and first name. It is a nice, simple plan.
Let's retry this with an inexact match using wildcards and the LIKE operator.
-- In-Exact match
select *
from
[Person].[Person] p join #inquiry i
on p.FirstName like i.first_name and p.LastName like i.last_name
The statistics and run times.
Table 'Person'. Scan count 2, logical reads 219, CPU time = 32 ms, elapsed time = 58 ms.
The resulting query plan is a lot more complicated. We are still doing a table scan of #inquiry since it does not have an index. However, there are a lot of nested joins going on to use the index with a partial match.
We added three more operators to the query, and the execution time is twice that of the exact match.
In short, inexact matches with the LIKE operator will be more expensive.
If you are searching hundreds of thousands of records, use a FULL TEXT INDEX (FTI). I wrote two articles on this topic.
http://craftydba.com/?p=1421
http://craftydba.com/?p=1629
Every night, you will have to have a process that updates the FTI with any changes. After that one hit, you can use the CONTAINS() operator to leverage the index in fuzzy matches.
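A minimal sketch of what that could look like against tblcrew; the catalog name and the assumption that a unique, single-column index (here called PK_tblcrew) already exists on CaseNo are mine, not from the question. Also note that CONTAINS() does word and prefix matching, not arbitrary '%substring%' matching:
CREATE FULLTEXT CATALOG ftCrewCatalog;
CREATE FULLTEXT INDEX ON dbo.tblcrew (Lastname, firstname)
KEY INDEX PK_tblcrew ON ftCrewCatalog;
-- CONTAINS() needs a literal or a variable, so search one keyword pair per call:
DECLARE @last nvarchar(100), @first nvarchar(100);
SET @last = N'"FABIANA"';
SET @first = N'"MARLON"';
SELECT CaseNo
FROM dbo.tblcrew
WHERE CONTAINS(Lastname, @last)
AND CONTAINS(firstname, @first);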
I hope I explained the differences. I have seen continued confusion on this topic and I wanted to put something out on Stack Overflow that I could reference.
Best of luck Juan.

SQL LIKE with parameter : huge performance hit

I have this query :
select top 100 id, email, amount from view_orders
where email LIKE '%test%' order by created_at desc
It takes less than a second to run.
Now I want to parameterize it :
declare @m nvarchar(200)
set @m = '%test%'
SELECT TOP 100 id, email, amount FROM view_orders
WHERE email LIKE @m ORDER BY created_at DESC
After 5 minutes, it's still running. With any other kind of test on the parameter (if I replace the LIKE with =), it goes back to the first query's level of performance.
I am using SQL Server 2008 R2.
I have tried OPTION (RECOMPILE); it drops to 6 seconds, but that's still much slower (the non-parameterized query is instantaneous). As it's a query that I expect will run often, this is an issue.
The table's column is indexed, but the view is not; I don't know if that makes a difference.
The view joins 5 tables : one with 3,154,333 rows (users), one with 1,536,111 rows (orders), and 3 with a few dozen rows at most (order type, etc). The search is done on the "user" table (with 3M rows).
Hard-coded values :
Parameters :
Update
I have run the queries using SET STATISTICS IO ON. Here are the results (sorry, I don't know how to read them):
Hard-coded values:
Table 'currency'. Scan count 1, logical reads 201.
Table 'order_status'. Scan count 0, logical reads 200.
Table 'payment'. Scan count 1, logical reads 100.
Table 'gift'. Scan count 202, logical reads 404.
Table 'order'. Scan count 95, logical reads 683.
Table 'user'. Scan count 1, logical reads 7956.
Parameters :
Table 'currency'. scan count 1, logical reads 201.
Table 'order_status'. scan count 1, logical reads 201.
Table 'payment'. scan count 1, logical reads 100.
Table 'gift'. scan count 202, logical reads 404.
Table 'user'. scan count 0, logical reads 4353067.
Table 'order'. scan count 1, logical reads 4357031.
Update 2
I have since seen a "force index usage" hint :
SELECT TOP 100 id, email, amount
FROM view_orders with (nolock, index=ix_email)
WHERE email LIKE @m
ORDER BY created_at DESC
Not sure it would work though, I don't work at this place anymore.
It could be a parameter sniffing problem.
Better indexes or a full text search are the way to go but you might be able to get a workable compromise.
Try:
SELECT TOP 100 A, B, C FROM myview WHERE A LIKE '%' + @a + '%'
OPTION (OPTIMIZE FOR (@a = 'testvalue'));
(like Sean Coetzee suggests, I wouldn't pass in the wildcard in the parameter)
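Mapped onto the original query, that might look something like the following; treat it as a sketch, since the representative value to optimize for is an assumption:
DECLARE @m nvarchar(200);
SET @m = 'test';
SELECT TOP 100 id, email, amount
FROM view_orders
WHERE email LIKE '%' + @m + '%'
ORDER BY created_at DESC
OPTION (OPTIMIZE FOR (@m = 'test'));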
You will definitely win when you add an index to the A column.
Sometimes an index suggestion can be obtained from SQL Server Management Studio: paste your query and press the Display Estimated Execution Plan button.
CREATE INDEX index_name ON myview (A);
CREATE INDEX index_name ON myview (B);
CREATE INDEX index_name ON myview (C);
declare @a nvarchar(200)
set @a = '%testvalue%'
SELECT TOP 100 A, B, C FROM myview WHERE A LIKE @a
What happens if you try:
set @a = 'test'
select top 100 A, B, C
from myview
where A like '%' + @a + '%'
I've tried a test on some dummy data and it looks like it may be faster.
The estimated execution plan for the parameterized version is clearly not right. I don't believe I've ever seen a query show a 100% estimated cost twice, as the cost is supposed to total 100%. It's also interesting that it believes it needs to start with orders when you're clearly filtering on something in the user table.
I'd rebuild your statistics on all of the tables that are referenced in the view.
update statistics <tablename> with resample
Do one of these for each table involved.
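For example, using the table names that appear in the STATISTICS IO output above (the dbo schema is an assumption):
UPDATE STATISTICS dbo.[user] WITH RESAMPLE;
UPDATE STATISTICS dbo.[order] WITH RESAMPLE;
UPDATE STATISTICS dbo.[order_status] WITH RESAMPLE;
UPDATE STATISTICS dbo.[payment] WITH RESAMPLE;
UPDATE STATISTICS dbo.[gift] WITH RESAMPLE;
UPDATE STATISTICS dbo.[currency] WITH RESAMPLE;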
You can try running the SQL directly (copy-paste the view body into a query), both parameterized and not, to see whether it's the view SQL that is having issues.
At the end of the day, even when you get this fixed, it's really only a stopgap. You have 3 million users, and every time you run the query SQL Server has to go through all 3 million records (the 75% scan in your top query) to find all the possible matches. The more users you get, the slower the query gets. Non-full-text indexes can't be used for a LIKE query with a wildcard at the front.
In this case you can think of a SQL index like a book index. Can you use a book index with part of a word to find anything quickly? Nope, you've got to scan the whole index to figure out all the possibilities.
You should really consider a full-text index on your view.
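A hedged sketch of what that could involve, assuming view_orders can be turned into a schema-bound indexed view and that id is unique (the index, catalog, and column names are assumptions). Note the trade-off: CONTAINS() does word and prefix matching, not the arbitrary substring matching of LIKE '%test%':
-- The view must be schema-bound and have a unique clustered index
-- before a full-text index can be created on it.
CREATE UNIQUE CLUSTERED INDEX IX_view_orders_id ON dbo.view_orders (id);
CREATE FULLTEXT CATALOG ftOrders;
CREATE FULLTEXT INDEX ON dbo.view_orders (email)
KEY INDEX IX_view_orders_id ON ftOrders;
-- Prefix search via the full-text index instead of a leading-wildcard LIKE:
SELECT TOP 100 id, email, amount
FROM dbo.view_orders
WHERE CONTAINS(email, N'"test*"')
ORDER BY created_at DESC;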
