SQL Server Fulltext search not finding my rows - sql-server

I have a SQL Server table and I'm trying to make sense of fulltext searching :-)
I have set up a fulltext catalog and a fulltext index on a table Entry, which contains among other columns a VARCHAR(20) column called VPN-ID.
There are about 200'000 rows in that table, and the VPN-ID column has values such as:
VPN-000-359-90
VPN-000-363-85
VPN-000-362-07
VPN-000-362-91
VPN-000-355-55
VPN-000-368-36
VPN-000-356-90
Now I'm trying to find rows in that table with a fulltext enabled search.
When I do
SELECT (list of columns)
FROM dbo.Entry
WHERE CONTAINS(*, 'VPN-000-362-07')
everything's fine and dandy and my rows are returned.
When I start searching with a wildcard like this:
SELECT (list of columns)
FROM dbo.Entry
WHERE CONTAINS(*, 'VPN-000-362-%')
I am getting results and everything seems fine.
HOWEVER: when I searching like this:
SELECT (list of columns)
FROM dbo.Entry
WHERE CONTAINS(*, 'VPN-000-36%')
suddenly I get no results back at all..... even though there are clearly rows that match that search criteria...
Any ideas why?? What other "surprises" might fulltext search have in store for me? :-)
Update: to create my fulltext catalog I used:
CREATE FULLTEXT CATALOG MyCatalog WITH ACCENT_SENSITIVITY = OFF
and to create the fulltext index on my table, I used
CREATE FULLTEXT INDEX
ON dbo.Entry(list of columns)
KEY INDEX PK_Entry
I tried to avoid any "oddball" options as much a I could.
Update #2: after a bit more investigation, it appears as if SQL Server Fulltext search somehow interprets my dashes inside the strings as separators....
While this query returns nothing:
SELECT (list of columns)
FROM dbo.Entry
WHERE CONTAINS(*, '"VPN-000-362*"')
this one does (splitting up the search term on the dashes):
SELECT (list of columns)
FROM dbo.Entry
WHERE CONTAINS(*, ' "VPN" AND "000" AND "362*"')
OK - seems a bit odd that a dash appears to result in a splitting up that somehow doesn't work.....

which Language for Word Breaker do you use? Have you tried Neutral?
EDIT:
in adition you should use WHERE CONTAINS([Column], '"text*"'). See MSDN for more information on Prefix Searches:
C. Using CONTAINS with
The following example returns all
product names with at least one word
starting with the prefix chain in the
Name column.
USE AdventureWorks2008R2;
GO
SELECT Name
FROM Production.Product
WHERE CONTAINS(Name, ' "Chain*" ');
GO
btw ... similar question here and here

Just wondering, but why don't you just do this:
SELECT (list of columns)
FROM dbo.Entry
WHERE [VPN-ID] LIKE 'VPN-000-36%'
It seems to me that fulltext search is not the right tool for the job. Just use a normal index on that column.

Related

Remove double quotes from fts keywords?

I need indexing JSON values with fulltext index.
In Oracle:
CREATE INDEX index_name ON tab_name (json_col_name)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS ('section group CTXSYS.JSON_SECTION_GROUP SYNC (ON COMMIT)');
In SQL Server:
CREATE FULLTEXT INDEX ON tab_name (json_col_name)
KEY INDEX primary_key_name
ON ft_cat_name
[other options...];
Both of indexes are created successfully. But when I do some query with those indexes, I take some troubles in SQL.
When I try to find the reason, I found that's caused by word-breaker.
Word-breaker keeps unnecessary characters (double quote, colon) of all FIELDS and VALUES in json text as INDEX KEYWORDS.
SELECT * FROM sys.dm_fts_index_keywords (DB_ID('db_name'), OBJECT_ID('tab_name'))
Is there anybody faced this problem? And how to resolve?
I want to know how to config word-breakers to remove unnecessary characters from keywords when populate indexes in SQL.
I've tried indexing these columns with neutral language (LCID = 0) and all unnecessary symbols in JSON string are removed from the keywords.
CREATE FULLTEXT INDEX ON tab_name (json_col_name language 0)
KEY INDEX primary_key_name
ON ft_cat_name
[other options...];

Fulltext Contains does not return rows

I have a row in a table that contains "DS012345" in a column called description
When I use this query:
Select * from Tablename where Contains(Description, ' "*012345*" ')
This query returns no result.
I have created the unique index, fulltext catalog, I have turned off the Stop Words using the Object Explorer. Still do not know why it does not return that row.
Any suggestion or cause for this?
Thanksl.
Why not just use LIKE instead to do a search.
Select * from Tablename where Description LIKE '%012345%'
Just does a search where 012345 appears anywhere within the description column.
Stop words is the number that it starts to seek for a word in your database..
Fulltext should be used to get the exact word, if you just want a part of the word you should use LIKE %...%.

Full text search asterisk returns wrong result

I have a Ship table with FTS index, which was created as:
CREATE FULLTEXT INDEX ON Ship
(
Name
)
KEY INDEX PK_Ship_Id
ON MyCatalog
WITH CHANGE_TRACKING AUTO, STOPLIST OFF;
And when I run query bellow:
select Name From Ship where CONTAINS(Name, N'"n*"');
I get wrong result, for instance "Vitamin D3 1000 Iu".
But I want get only rows where name field has any word starts with 'n' char.
FTS engine has strange 'feature', when you try find somethings as CONTAINS(Name, N'"n*"'), it searches all numbers because it keeps numbers as NN.
The best decision which was founded is in these two cases(CONTAINS(Name, N'"n*"'), CONTAINS(Name, N'"nn*"')) use "like" search.

LIKE vs CONTAINS on SQL Server

Which one of the following queries is faster (LIKE vs CONTAINS)?
SELECT * FROM table WHERE Column LIKE '%test%';
or
SELECT * FROM table WHERE Contains(Column, "test");
The second (assuming you means CONTAINS, and actually put it in a valid query) should be faster, because it can use some form of index (in this case, a full text index). Of course, this form of query is only available if the column is in a full text index. If it isn't, then only the first form is available.
The first query, using LIKE, will be unable to use an index, since it starts with a wildcard, so will always require a full table scan.
The CONTAINS query should be:
SELECT * FROM table WHERE CONTAINS(Column, 'test');
Having run both queries on a SQL Server 2012 instance, I can confirm the first query was fastest in my case.
The query with the LIKE keyword showed a clustered index scan.
The CONTAINS also had a clustered index scan with additional operators for the full text match and a merge join.
I think that CONTAINS took longer and used Merge because you had a dash("-") in your query adventure-works.com.
The dash is a break word so the CONTAINS searched the full-text index for adventure and than it searched for works.com and merged the results.
Also try changing from this:
SELECT * FROM table WHERE Contains(Column, "test") > 0;
To this:
SELECT * FROM table WHERE Contains(Column, '"*test*"') > 0;
The former will find records with values like "this is a test" and "a test-case is the plan".
The latter will also find records with values like "i am testing this" and "this is the greatest".
I didn't understand actually what is going on with "Contains" keyword. I set a full text index on a column. I run some queries on the table.
Like returns 450.518 rows but contains not and like's result is correct
SELECT COL FROM TBL WHERE COL LIKE '%41%' --450.518 rows
SELECT COL FROM TBL WHERE CONTAINS(COL,N'41') ---40 rows
SELECT COL FROM TBL WHERE CONTAINS(COL,N'"*41*"') -- 220.364 rows

SQL Server Full Text Search Escape Characters?

I am doing a MS SQL Server Full Text Search query. I need to escape special characters so I can search on a specific term that contains special characters. Is there a built-in function to escape a full text search string? If not, how would you do it?
Bad news: there's no way. Good news: you don't need it (as it won't help anyway).
I've faced similar issue on one of my projects. My understanding is that while building full-text index, SQL Server treats all special characters as word delimiters and hence:
Your word with such a character is represented as two (or more) words in full-text index.
These character(s) are stripped away and don't appear in an index.
Consider we have the following table with a corresponding full-text index for it (which is skipped):
CREATE TABLE [dbo].[ActicleTable]
(
[Id] int identity(1,1) not null primary key,
[ActicleBody] varchar(max) not null
);
Consider later we add rows to the table:
INSERT INTO [ActicleTable] values ('digitally improvements folders')
INSERT INTO [ActicleTable] values ('digital"ly improve{ments} fold(ers)')
Try searching:
SELECT * FROM [ArticleTable] WHERE CONTAINS(*, 'digitally')
SELECT * FROM [ArticleTable] WHERE CONTAINS(*, 'improvements')
SELECT * FROM [ArticleTable] WHERE CONTAINS(*, 'folders')
and
SELECT * FROM [ArticleTable] WHERE CONTAINS(*, 'digital')
SELECT * FROM [ArticleTable] WHERE CONTAINS(*, 'improve')
SELECT * FROM [ArticleTable] WHERE CONTAINS(*, 'fold')
First group of conditions will match first row (and not the second) while the second group will match second row only.
Unfortunately I could not find a link to MSDN (or something) where such behaviour is clearly stated. But I've found an official article that tells how to convert quotation marks for full-text search queries, which is [implicitly] aligned with the above described algorithm.

Resources