Slow searching large SQL table on multiple columns - sql-server

I am looking for better performance when searching a large (>200,000 rows) SQL Server table on multiple columns. The current code generates a query something like
SELECT Person._pk
FROM dbo.R_Person as Person
WHERE Person._pk > 0
AND Person.first_name LIKE 'jane%'
AND Person.last_name LIKE 'morgan%'
AND Person._pk IN (
SELECT _pk FROM dbo.R_PersonView12
)
When only one name is searched on, this returns promptly, but with searches on both first and last name (often needed to find the correct person, as there will be too many matches on either alone) the runtime becomes unacceptably high. Can anyone suggest a different way to construct this query to improve performance here?

It's always best to look at the query plan. But you could certainly try this instead:
SELECT Person._pk
FROM dbo.R_Person as Person
WHERE Person._pk > 0
AND Person.first_name LIKE 'jane%'
AND Person.last_name LIKE 'morgan%'
AND EXISTS (
SELECT 1 FROM dbo.R_PersonView12 V
WHERE V._pk = Person._pk
)
It also looks like you have a view here, which could also be the problem. Post the DDL.
Also what columns have indexes?
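If the name columns turn out not to be indexed, a composite index is one thing to try. The following is only a sketch: it assumes dbo.R_Person has no suitable index on these columns yet, and the index name is made up for illustration.

```sql
-- Hypothetical index name; assumes first_name and last_name are not already indexed.
-- Both LIKE patterns in the question are prefix searches (no leading wildcard),
-- so SQL Server can seek on the last_name prefix and check first_name
-- against the rows in that range.
CREATE NONCLUSTERED INDEX IX_R_Person_last_name_first_name
    ON dbo.R_Person (last_name, first_name);
```

Whether the optimizer actually uses it depends on the view and the rest of the plan, so compare the execution plan before and after.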

You have to add a computed column, NameSurname, to the Person table and create an index on it.
ALTER TABLE dbo.R_Person ADD NameSurname AS first_name + ' ' + last_name
SELECT Person._pk
FROM dbo.R_Person as Person
WHERE Person._pk > 0
AND Person.NameSurname LIKE 'jane morgan%'
AND EXISTS (
SELECT 1 FROM dbo.R_PersonView12 V
WHERE V._pk = Person._pk
)

As mentioned by ElectricLlama, views tend to get nasty at times, and we still have no idea whether this is a simple view or a nested view with multiple views embedded inside it.
When I am stuck with this kind of problem, I look at the code of the view, embed its definition in my problematic query, and then check the execution plan. In my experience, the optimizer does not always work well with nested views.

Related

'Multiple' values for a variable

A bit of background. There are multiple tables from multiple databases that have the same schema. When I query to select all columns for a given master code (in the tables, the master code is in the column called CATMASTRCAT), the same code returns multiple rows, and the only thing they have in common is the CATMASTRCAT column. This works for a single master code (in the script below, if I set the variable to 031325-002-70, it shows multiple rows with different organizations and otherwise the same data, which is the desired result).
The question is: is there a way to pass multiple master codes as input in the variable? I'm planning to create this as a stored procedure.
This is my SQL script:
DECLARE @ProductNumber AS VARCHAR(1000)
SET @ProductNumber = ('031325-002-70')
SELECT ITEMS
,ORGANIZATION
FROM [EU].[dbo].[SOMETHING14]
WHERE ITEMS IN (@ProductNumber)
UNION
SELECT ITEMS
,ORGANIZATION
FROM [EU].[dbo].[SOMETHING12]
WHERE ITEMS IN (@ProductNumber)
UNION
SELECT ITEMS
,ORGANIZATION
FROM [EU].[dbo].[SOMETHING11]
WHERE ITEMS IN (@ProductNumber)
Feel free to clarify any other needed data. I'm fairly new to SQL, just self-learning. You can also lecture me about the wrong code haha and how to do this better.
Thanks!
P.S. Attached the picture of query result
Yes, the best way to do this is to use a table-valued parameter and then change the WHERE clause to say
WHERE catmastrcat IN (SELECT catmastrcat FROM @tablevaluename)
or you could use an inner join, which might be faster depending on indexes and other issues. The code for that would look like this
JOIN @tablevaluename tv ON AJF_CATMASTER.catmastrcat = tv.catmastrcat
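To make the table-valued parameter idea concrete, here is a rough sketch applied to the script from the question. The type name, procedure name, column length, and the second product number are all made up for illustration; only the table and column names come from the question.

```sql
-- Hypothetical table type holding the list of product numbers.
CREATE TYPE dbo.ProductNumberList AS TABLE (
    ProductNumber VARCHAR(50) NOT NULL PRIMARY KEY
);
GO

-- Hypothetical procedure name. Table-valued parameters must be declared READONLY.
CREATE PROCEDURE dbo.GetItemsByProductNumbers
    @ProductNumbers dbo.ProductNumberList READONLY
AS
BEGIN
    SET NOCOUNT ON;

    SELECT ITEMS, ORGANIZATION
    FROM [EU].[dbo].[SOMETHING14]
    WHERE ITEMS IN (SELECT ProductNumber FROM @ProductNumbers)
    UNION
    SELECT ITEMS, ORGANIZATION
    FROM [EU].[dbo].[SOMETHING12]
    WHERE ITEMS IN (SELECT ProductNumber FROM @ProductNumbers)
    UNION
    SELECT ITEMS, ORGANIZATION
    FROM [EU].[dbo].[SOMETHING11]
    WHERE ITEMS IN (SELECT ProductNumber FROM @ProductNumbers);
END;
GO

-- Usage: fill a variable of the table type and pass it to the procedure.
-- The second value is invented purely to show multiple inputs.
DECLARE @list dbo.ProductNumberList;
INSERT INTO @list (ProductNumber)
VALUES ('031325-002-70'), ('031325-002-71');

EXEC dbo.GetItemsByProductNumbers @ProductNumbers = @list;
```

Table-valued parameters require SQL Server 2008 or later.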

How to force reasonable execution plan for query with LIKE statement?

When creating ad-hoc queries to look for information in a table I have run into this issue over and over.
Let's say I have a table with a million records with fields id - int, createddatetime - timestamp, category - varchar(50) and content - varchar(max). I want to find all records in the last day that have a certain string in the content field. If I create a query like this...
select *
from table
where createddatetime > '2018-1-31'
and content like '%something%'
it may complete in a second because in the last day there may only be 100 records, so the LIKE clause is only operating on a small number of records.
However if I add one more item to the where clause...
select *
from table
where createddatetime > '2018-1-31'
and content like '%something%'
and category = 'testing'
then it could take many minutes to complete while locking up the table.
It appears to change from applying all the straightforward WHERE clause items first and then the LIKE on the limited set of records, to applying the LIKE clause first. There are even times when there are multiple LIKE conditions and adding one more causes the query to go from a split second to minutes.
The only solution I've found is to generate an intermediate table (maybe temp tables would work), insert records based on the basic WHERE clause items, and then run a separate query to filter by one or more LIKE conditions. I've tried various JOIN and CTE approaches, which usually bring no improvement. Alternatively, CHARINDEX also appears to work, though it is difficult to use when converting the logic of multiple LIKE conditions.
Is there any hint or something that can be placed in the query to tell SQL Server to wait until records are filtered by the basic WHERE clause items before filtering by the LIKE?
I actually just tried this approach and it had the same issue...
select *
from (
select *, charindex('something', content) as found
from bounce
where createddatetime > '2018-1-31'
) t
where found > 0
While the subquery independently returns in a couple of seconds, the overall query just never returns. Why is this so bad?
Not fancy, but I've had better luck with temp tables than nested select statements... It will isolate the first data set, and then you can select just from that. If you're looking for quick and dirty, which usually serves my purposes for ad-hoc, this may help. If this is a permanent stored proc, the indexing suggestions may serve you better in the long run.
select *
into #like
from table
where createddatetime > '2018-1-31'
and content like '%something%'
select *
from #like
where category = 'testing'

Optimize SQL in MS SQL Server that returns more than 90% of records in the table

I have the below sql
SELECT Cast(Format(Sum(COALESCE(InstalledSubtotal, 0)), 'F') AS MONEY) AS TotalSoldNet,
BP.BoundProjectId AS ProjectId
FROM BoundProducts BP
WHERE ( BP.IsDeleted IS NULL
OR BP.IsDeleted = 0 )
GROUP BY BP.BoundProjectId
I already have an index on the BoundProducts table on these columns, in this order: (BoundProjectId, IsDeleted).
Currently this query takes around 2-3 seconds to return the result. I am trying to reduce it to zero seconds.
This query returns 25077 rows as of now.
Please suggest any ideas to improve the query.
Looking at this from a slightly different point of view, I think your OR condition is hurting your query. Why not rewrite it like this?
SELECT CAST(FORMAT(SUM(COALESCE(BP.InstalledSubtotal, 0)), 'F') AS MONEY) AS TotalSoldNet
, BP.BoundProjectId AS ProjectId
FROM (
SELECT BP.BoundProjectId, BP.InstalledSubtotal
FROM dbo.BoundProducts AS BP
WHERE BP.IsDeleted IS NULL
UNION ALL
SELECT BP.BoundProjectId, BP.InstalledSubtotal
FROM dbo.BoundProducts AS BP
WHERE BP.IsDeleted = 0
) AS BP
GROUP BY BP.BoundProjectId;
I've had better experience with UNION ALL rather than OR.
I think it should work totally the same. On top of that, I'd create this index:
CREATE NONCLUSTERED INDEX idx_BoundProducts_IsDeleted_BoundProjectId_iInstalledSubTotal
ON dbo.BoundProducts (IsDeleted, BoundProjectId)
INCLUDE (InstalledSubTotal);
It should cover your query conditions and allow an index seek. I know it's not usually a good idea to index bit fields, but it's worth trying.
P.S. Why not default your IsDeleted column value to 0 and make it NOT NULL? With that in place, a simple check WHERE IsDeleted = 0 would be enough, which should help your query too.
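The P.S. above could be sketched roughly as follows. This assumes IsDeleted is a BIT column with no existing default, and that NULL currently means "not deleted"; the constraint name is made up.

```sql
-- Backfill existing NULLs so the column can be made NOT NULL.
UPDATE dbo.BoundProducts
SET IsDeleted = 0
WHERE IsDeleted IS NULL;

-- Hypothetical constraint name; assumes the column has no default yet.
ALTER TABLE dbo.BoundProducts
    ADD CONSTRAINT DF_BoundProducts_IsDeleted DEFAULT (0) FOR IsDeleted;

-- Note: if an index references IsDeleted (the question mentions one on
-- (BoundProjectId, IsDeleted)), it may have to be dropped before this
-- statement and recreated afterwards.
ALTER TABLE dbo.BoundProducts
    ALTER COLUMN IsDeleted BIT NOT NULL;
```

After that, the filter in the original query can be reduced to WHERE BP.IsDeleted = 0.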
If you really want to try an index seek, it should be possible using the FORCESEEK hint, but I don't think it's going to make it any faster.
The options I suggested last time are still valid: remove FORMAT and/or create an indexed view.
You should also test if the problem is the query itself or just displaying the results after that, for example trying it with "select ... into #tmp". If that's fast, then the problem is not the query.
The index name in the screenshot is not the same as in create table statement, but I assume that's just a name you changed for the question. If the scan is happening to another index, then you should include that too.

MAX keyword taking a lot of time to select a value from a column

Well, I have a table with 40,000,000+ records, and when I try to execute a simple query, it takes ~3 min to finish. Since I am using the same query in my C# solution, which needs to execute it 100+ times, the overall performance of the solution is badly hit.
This is the query that I am using in a proc
DECLARE @Id bigint
SELECT @Id = MAX(ExecutionID) from ExecutionLog where TestID=50881
select @Id
Any help to improve the performance would be great. Thanks.
What indexes do you have on the table? It sounds like you don't have anything even close to useful for this particular query, so I'd suggest trying to do:
CREATE INDEX IX_ExecutionLog_TestID ON ExecutionLog (TestID, ExecutionID)
...at the very least. Your query is filtering by TestID, so this needs to be the primary column in the composite index: if you have no indexes on TestID, then SQL Server will resort to scanning the entire table in order to find rows where TestID = 50881.
It may help to think of indexes on SQL tables in the same way as those you'd find in the back of a big book, which are hierarchical and multi-level. If you were looking for something, you'd look under 'T' for TestID, and there'd be a sub-heading under TestID for ExecutionID. Without an index entry for TestID, you'd have to read through the entire book looking for TestID, then see if there's a mention of ExecutionID with it. This is effectively what SQL Server has to do.
If you don't have any indexes, then you'll find it useful to review all the queries that hit the table, create indexes to support them, and ensure that one of those indexes is a clustered index (rather than non-clustered).
Try to re-work everything into something that works in a set based manner.
So, for instance, you could write a select statement like this:
;With OrderedLogs as (
Select ExecutionID,TestID,
ROW_NUMBER() OVER (PARTITION BY TestID ORDER By ExecutionID desc) as rn
from ExecutionLog
)
select * from OrderedLogs where rn = 1 and TestID in (50881, 50882, 50883)
This would then find the maximum ExecutionID for 3 different tests simultaneously.
You might need to store that result in a table variable/temp table, but hopefully, instead, you can continue building up a larger, single, query, that processes all of the results in parallel.
This is the sort of processing that SQL is meant to be good at - don't cripple the system by iterating through the TestIDs in your code.
If you need to pass many test IDs into a stored procedure for this sort of query, look at Table Valued Parameters.
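As a rough illustration of combining a table-valued parameter with a set-based query, the sketch below returns the maximum ExecutionID for every test ID passed in. It uses a plain GROUP BY with MAX rather than the ROW_NUMBER approach above. The type and procedure names are hypothetical, and it assumes TestID is an INT and ExecutionLog lives in the dbo schema.

```sql
-- Hypothetical table type for the list of test IDs (assumes TestID is INT).
CREATE TYPE dbo.TestIdList AS TABLE (
    TestID INT NOT NULL PRIMARY KEY
);
GO

-- Hypothetical procedure: one call returns the latest ExecutionID per test.
CREATE PROCEDURE dbo.GetMaxExecutionIds
    @TestIds dbo.TestIdList READONLY  -- table-valued parameters must be READONLY
AS
BEGIN
    SET NOCOUNT ON;

    SELECT el.TestID, MAX(el.ExecutionID) AS MaxExecutionID
    FROM dbo.ExecutionLog AS el
    JOIN @TestIds AS t
        ON t.TestID = el.TestID
    GROUP BY el.TestID;
END;
GO

-- Usage: one round trip instead of one call per test ID.
DECLARE @ids dbo.TestIdList;
INSERT INTO @ids (TestID) VALUES (50881), (50882), (50883);

EXEC dbo.GetMaxExecutionIds @TestIds = @ids;
```

With an index on (TestID, ExecutionID) as suggested earlier, each test ID should need only a small seek.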

Need help with Sql Server Full Text Search problem

I have a Full Text Catalog on a single table, with three fields defined :-
TABLE: Animals
Fields: Name, Breed, LatinName.
Now, the Catalog seems to be working perfectly.
eg.
CREATE FUNCTION AnimalSearch
(
@Name NVARCHAR(200)
) RETURNS TABLE AS
RETURN
(
SELECT KEY_TBL.[Key] as Name,
KEY_TBL.RANK as Relevance
FROM CONTAINSTABLE(Animals, Name, @Name) AS KEY_TBL
)
Now, when I run this, I get the following results :-
Name = ma (no results)
Name = mat (no results)
Name = matt (1 result - correct).
SELECT * FROM [dbo].[AnimalSearch]('ma')
Is this the correct way to use this? I've also tried replacing CONTAINSTABLE with FREETEXTTABLE .. same thing .. no results.
Any ideas, anyone?
Edit
I understand that this could be achieved in a stored proc. I was hoping to do this as a Table-Valued Function, so I could use it in some Linq2Sql. If that would perform really badly, then please say so.
Not sure it's a good idea. Table valued functions do not store statistics, so performance may suffer.
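On the missing results themselves: CONTAINSTABLE matches whole words unless the search condition is written as a prefix term, that is, the word followed by an asterisk, all inside double quotes. That would explain why 'ma' and 'mat' return nothing while 'matt' matches. A sketch, assuming the full-text index on Animals.Name described in the question:

```sql
-- Prefix term: double quotes around the word plus a trailing asterisk.
SELECT KEY_TBL.[KEY] AS Name,
       KEY_TBL.RANK  AS Relevance
FROM CONTAINSTABLE(Animals, Name, '"ma*"') AS KEY_TBL;

-- Through the function from the question, the caller can pass the
-- prefix term as the argument:
SELECT * FROM [dbo].[AnimalSearch]('"ma*"');
```

Without the double quotes, the asterisk is not treated as a prefix wildcard.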
