SQL Server Full-Text Search vs IN clause - sql-server

I am currently using an IN clause on a varchar field. Would using CONTAINS from Full-Text Search (FTS) improve performance?
For example:
SELECT * FROM Orders WHERE City IN ('London', 'New York')
vs
SELECT * FROM Orders WHERE CONTAINS(City, '"London" OR "New York"')
Thanks in advance.

Table Definition
CREATE TABLE Orders(ID INT PRIMARY KEY NOT NULL IDENTITY(1,1),City VARCHAR(100))
GO
INSERT INTO Orders
VALUES ('London'),('Newyork'),('Paris'),('Manchester')
,('Liverpool'),('Sheffield'),('Bolton')
GO
Create a full-text index on the City column, using ID as the key.
(I used SSMS to create the FTS index.)
Queries
-- Query 1
Select * from Orders
where City IN ('London' , 'NewYork')
GO
-- Query 2
Select * from Orders where
Contains (City, '"London" or "NewYork"')
GO
Execution plans for both queries
In the execution plans, the query that used FTS cost about three times as much as the query that used the IN operator.
Having said that, when you need to find language-specific terms in SQL Server, FTS is the way to go, for example when searching for inflectional forms, synonyms, and much more.

Related

Ignore Dash (-) from Full Text Search (FREETEXTTABLE) search column in SQL Server

I use CONTAINSTABLE for my search algorithm. I want the search to ignore dashes in the column value. For example, if the column contains '12345-67', it should be found when searching for '1234567', as in the query below.
SELECT *
FROM table1 AS FT_Table
INNER JOIN CONTAINSTABLE(table2, columnname, '1234567') AS Key_Table ON FT_Table.ID = Key_Table.[Key]
Is there any way to ignore dash (-) while searching with string that doesn't contain a dash (-)?
I did some digging and spent a few hours on this :)
Unfortunately, there is no way to do it. It looks like SQL Server FTS populates the index by breaking text into words not only at whitespace but also at special characters (-, {, ( and so on).
It does not index the joined-up word, and as far as I understand there is no way to supply population rules to cover this need (that is, telling the population service: if the word contains "-", replace it with "").
Here is an example to clarify the situation.
First, create the table, the full-text catalog, and the full-text index, and insert a sample row.
CREATE TABLE [dbo].[SampleTextData]
(
[Id] int identity(1,1) not null,
[Text] varchar(max) not null,
CONSTRAINT [PK_SampleTextData] PRIMARY KEY CLUSTERED
(
[Id] ASC
)
);
CREATE FULLTEXT CATALOG ftCatalog AS DEFAULT;
CREATE FULLTEXT INDEX ON SampleTextData
(Text)
KEY INDEX PK_SampleTextData
ON ftCatalog;
INSERT INTO [SampleTextData] VALUES ('samp-le text');
Then run some sample queries:
select * from containstable(SampleTextData,Text,'samp-le') --Success
select * from containstable(SampleTextData,Text,'samp') --Success
select * from containstable(SampleTextData,Text,'le') --Success
select * from containstable(SampleTextData,Text,'sample') -- Fail
All of these queries succeed except the one for 'sample'. To investigate, execute this query:
SELECT display_term, column_id, document_count
FROM sys.dm_fts_index_keywords (DB_ID('YourDatabase'), OBJECT_ID('SampleTextData'))
Output:
display_term  column_id  document_count
le            2          1
samp          2          1
samp-le       2          1
text          2          1
END OF FILE   2          1
This query returns the terms that the FTS population service stored in the index. As you can see, the indexed terms include 'le', 'samp', and 'samp-le', but not 'sample'. That is why the query for 'sample' fails.
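To make the reasoning concrete, here is a toy model in Python of the behaviour observed above. It is only a sketch: `toy_word_breaker` is a hypothetical helper that mimics the output of sys.dm_fts_index_keywords for this one sample row, and it is not how SQL Server's word breaker is actually implemented.

```python
import re

def toy_word_breaker(text):
    """Toy model: index each whitespace-separated token whole,
    plus the pieces obtained by splitting it at special characters."""
    terms = set()
    for token in text.lower().split():
        terms.add(token)  # e.g. 'samp-le'
        # Split at anything that is not a letter or digit: 'samp', 'le'
        terms.update(p for p in re.split(r"[^0-9a-z]+", token) if p)
    return terms

index_terms = toy_word_breaker("samp-le text")

# A lookup succeeds only if the search term is literally one of the indexed terms.
for query in ("samp-le", "samp", "le", "sample"):
    print(query, "found" if query in index_terms else "not found")
```

In this model 'sample' is reported as not found for the same reason as in the real index: the joined-up word was never stored as a term.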

SQL query runs into a timeout on a sparse dataset

For sync purposes, I am trying to get a subset of the existing objects in a table.
The table has two fields, [Group] and Member, which are both stringified Guids.
All rows together may be too large to fit into a datatable; I already encountered an OutOfMemory exception. But I have to check that everything I need right now is in the datatable. So I take the Guids I want to check (they come in chunks of 1000), and query only for the related objects.
So, instead of filling my datatable once with all
SELECT * FROM Group_Membership
I am running the following SQL query against my SQL database to get related objects for one thousand Guids at a time:
SELECT *
FROM Group_Membership
WHERE
[Group] IN (@Guid0, @Guid1, @Guid2, @Guid3, @Guid4, @Guid5, ..., @Guid999)
The table in question now contains a total of 142 entries, and the query already times out (CommandTimeout = 30 seconds). On other tables, which are not as sparsely populated, similar queries don't time out.
Could someone shed some light on the logic of SQL Server and whether/how I could hint it into the right direction?
I already tried to add a nonclustered index on the column Group, but it didn't help.
I'm not sure that WHERE IN with this many values will make full use of an index on [Group], if it uses one at all. However, if you had a second table containing the GUID values, and that column had an index, then a join might perform very fast.
Create a temporary table for the GUIDs and populate it:
CREATE TABLE #Guids (
Guid varchar(255)
);
-- One row per GUID: each value goes in its own parenthesised row
INSERT INTO #Guids (Guid)
VALUES
(@Guid0), (@Guid1), (@Guid2), (@Guid3), (@Guid4), ...
CREATE INDEX Idx_Guid ON #Guids (Guid);
Now try rephrasing your current query using a join instead of a WHERE IN (...):
SELECT *
FROM Group_Membership t1
INNER JOIN #Guids t2
ON t1.[Group] = t2.Guid;
As a disclaimer, if this doesn't improve the performance, it could be because your table has low cardinality. In such a case, an index might not be very effective.
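The temp-table-plus-join pattern can be tried end to end with a small Python sketch. Note the assumptions: it uses the standard-library sqlite3 module as a stand-in for SQL Server (so the temp table is called Guids rather than #Guids), the data is randomly generated, and it says nothing about timings on the real server; it only shows the shape of the approach.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Group_Membership ("Group" TEXT, Member TEXT)')
conn.execute('CREATE INDEX IX_Group ON Group_Membership ("Group")')

# Five groups with three members each (made-up sample data).
all_groups = [str(uuid.uuid4()) for _ in range(5)]
conn.executemany(
    "INSERT INTO Group_Membership VALUES (?, ?)",
    [(g, str(uuid.uuid4())) for g in all_groups for _ in range(3)],
)

# The chunk of GUIDs to check: two existing groups and one unknown GUID.
wanted = all_groups[:2] + [str(uuid.uuid4())]

# Load the chunk into an indexed temp table, one row per GUID.
conn.execute("CREATE TEMP TABLE Guids (Guid TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO Guids VALUES (?)", [(g,) for g in wanted])

# Join instead of WHERE [Group] IN (...a thousand parameters...).
rows = conn.execute(
    'SELECT t1."Group", t1.Member FROM Group_Membership t1 '
    'INNER JOIN Guids t2 ON t1."Group" = t2.Guid'
).fetchall()
print(len(rows))  # 2 matching groups x 3 members each
```

Sending the GUIDs as rows also keeps the query text identical from chunk to chunk, regardless of how many GUIDs are in it.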

Tuning Select statement to obtain faster results

I have benefited from this website for a long time now. This is my first question on the site. It is regarding performance tuning a reporting query. Here it goes.
SELECT Count(b1.primkey)
from tableA b1 --WITH (NOLOCK)
join tableA b2 --WITH (NOLOCK)
on b1.email = b2.email
and DateDiff(day, b2.BookedDate , b1.BookedDate) > 1
tableA has around 7 million rows. Email is a varchar(100) field. Bookeddate is a datetime field. primkey is a primary key column that is an int.
The purpose of this query is to count the entries that have the same email id but came in more than one day later. The query takes about 45 minutes to run, and I really want to reduce its execution time.
Since this is for reporting, I tried the WITH (NOLOCK) option to improve the read time, in vain. I have a columnstore index on tableA, and I know the SQL optimizer uses it because I can see it in the execution plan. I am using SQL Server 2012.
Can someone tell me which would be better in such a case: a nonclustered index on email, or a nonclustered columnstore index on tableA?
Please help me.
Your query is relatively complex. You are essentially joining two tables that have 7 million records each on a column that is not unique.
How about the following query instead:
select Email
from TableA
group by Email
having MAX(BookedDate) > MIN(BookedDate) + 1
Also make sure you have an index with Email and BookedDate.
Hope this helps.
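For illustration, the GROUP BY / HAVING rewrite can be exercised on a few rows with Python's sqlite3 module. This is a sketch under assumptions: SQLite stands in for SQL Server, the sample rows are invented, and julianday() replaces the T-SQL datetime arithmetic (BookedDate + 1), which SQLite does not support on text dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TableA (primkey INTEGER PRIMARY KEY, Email TEXT, BookedDate TEXT)"
)
# Index on Email and BookedDate, as the answer recommends.
conn.execute("CREATE INDEX idx_email ON TableA (Email, BookedDate)")
conn.executemany(
    "INSERT INTO TableA (Email, BookedDate) VALUES (?, ?)",
    [
        ("a@example.com", "2014-01-01 09:00:00"),
        ("a@example.com", "2014-01-05 09:00:00"),  # 4 days after the first booking
        ("b@example.com", "2014-01-01 09:00:00"),
        ("b@example.com", "2014-01-01 18:00:00"),  # same day, 9 hours later
        ("c@example.com", "2014-01-03 12:00:00"),  # only one booking
    ],
)

# One pass over the table: an email qualifies when its latest booking
# is more than one day after its earliest booking.
emails = [
    row[0]
    for row in conn.execute(
        "SELECT Email FROM TableA GROUP BY Email "
        "HAVING julianday(MAX(BookedDate)) > julianday(MIN(BookedDate)) + 1 "
        "ORDER BY Email"
    )
]
print(emails)  # ['a@example.com']
```

Keep in mind that this returns one row per email address rather than the count of matching row pairs that the original self-join produces, so the two queries answer slightly different questions.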
You have three options here:
1. Create a clustered index on the email field, at least for the larger table. But I suppose there are other queries running on these tables, and the clustered index is needed on other fields.
2. Move emails to another table, and store email ids in TableA and TableB; a join on an int field would be much faster than on varchar fields.
3. Create indexes on the email fields with BookedDate as an included column (no need to include primkey; you can count on another field, or use count(*)). Code: create index idx_email on TableA (Email) include (BookedDate)
I think that third option is the one you should go with. There's not much work to be done, and there will be great performance gain. The only problem is that index on varchar field will take a lot of space and impact insert/update operations; but you said that this is a reporting db, so I think you can allow that.

MAX keyword taking a lot of time to select a value from a column

I have a table with 40,000,000+ records, and when I execute a simple query against it, it takes about 3 minutes to finish. Since my C# solution runs the same query more than 100 times, the overall performance of the solution takes a serious hit.
This is the query that I am using in a proc
DECLARE @Id bigint
SELECT @Id = MAX(ExecutionID) from ExecutionLog where TestID=50881
select @Id
Any help to improve the performance would be great. Thanks.
What indexes do you have on the table? It sounds like you don't have anything even close to useful for this particular query, so I'd suggest trying to do:
CREATE INDEX IX_ExecutionLog_TestID ON ExecutionLog (TestID, ExecutionID)
...at the very least. Your query is filtering by TestID, so this needs to be the primary column in the composite index: if you have no indexes on TestID, then SQL Server will resort to scanning the entire table in order to find rows where TestID = 50881.
It may help to think of indexes on SQL tables in the same way as those you'd find in the back of a big book, which are hierarchical and multi-level. If you were looking for something, you'd manually look under 'T' for TestID, and then there'd be a sub-heading under TestID for ExecutionID. Without an index entry for TestID, you'd have to read through the entire book looking for TestID, then see if there's a mention of ExecutionID with it. This is effectively what SQL Server has to do.
If you don't have any indexes, then you'll find it useful to review all the queries that hit the table, create indexes that support them, and ensure that one of those indexes is a clustered index (rather than non-clustered).
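The effect of the composite index can be seen in a small Python sketch. Assumptions: the standard-library sqlite3 module stands in for SQL Server, the table holds a few thousand invented rows instead of 40 million, and SQLite's EXPLAIN QUERY PLAN is used in place of a SQL Server execution plan, so the plan text will look different on the real server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ExecutionLog (ExecutionID INTEGER, TestID INTEGER)")
# Spread 3000 executions across three test IDs (50880, 50881, 50882).
conn.executemany(
    "INSERT INTO ExecutionLog VALUES (?, ?)",
    [(i, 50880 + i % 3) for i in range(1, 3001)],
)
# TestID leads because the query filters on it; ExecutionID follows so that
# the maximum can be read from the index without touching the table.
conn.execute(
    "CREATE INDEX IX_ExecutionLog_TestID ON ExecutionLog (TestID, ExecutionID)"
)

query = "SELECT MAX(ExecutionID) FROM ExecutionLog WHERE TestID = ?"
max_id = conn.execute(query, (50881,)).fetchone()[0]

# The fourth column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = " ".join(
    row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query, (50881,))
)
print(max_id)  # 2998
print(plan)    # should mention IX_ExecutionLog_TestID rather than a full table scan
```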
Try to re-work everything into something that works in a set based manner.
So, for instance, you could write a select statement like this:
;With OrderedLogs as (
Select ExecutionID,TestID,
ROW_NUMBER() OVER (PARTITION BY TestID ORDER By ExecutionID desc) as rn
from ExecutionLog
)
select * from OrderedLogs where rn = 1 and TestID in (50881, 50882, 50883)
This would then find the maximum ExecutionID for 3 different tests simultaneously.
You might need to store that result in a table variable/temp table, but hopefully, instead, you can continue building up a larger, single, query, that processes all of the results in parallel.
This is the sort of processing that SQL is meant to be good at - don't cripple the system by iterating through the TestIDs in your code.
If you need to pass many test IDs into a stored procedure for this sort of query, look at Table Valued Parameters.

When to use with clause in sql

Can anybody tell me when to use the WITH clause?
The WITH keyword is used to create a temporary named result set. These are called Common Table Expressions.
A very basic, self-explanatory example:
WITH Administrators (Name, Surname)
AS
(
SELECT Name, Surname FROM Users WHERE AccessRights = 'Admin'
)
SELECT * FROM Administrators
For further reading and more examples, I suggest starting out with the following MSDN article:
Common Table Expressions by John Papa
In SQL Server, WITH is also used for table hints, and you sometimes need one to force a query to use a particular index. This is often necessary in spatial queries, where it can reduce query time from 1 minute to a few seconds.
select * from MyTable with(index(MySpatialIndex)) where...
