How to search for similar words in SQL Server - sql-server

I am using CONTAINS and FREETEXT in a SQL query to search for text in large text fields.
I noticed that the search only returns results when a word matches exactly. What if I want to search for similar words?
For example, when I type Carlo, nothing is returned if the stored value is Carlos (with an S).
Below is a simple query similar to the one I use:
SELECT P.*
FROM MyTable AS P
WHERE(CONTAINS(P.*, 'Carlo') OR freetext(P.*, 'Carlo'))
How can I make the search return words similar to Carlo, such as Carlos, Carla, etc., without hurting performance?

Try this:
SELECT P.*
FROM MyTable AS P
WHERE CONTAINS(P.*, 'FORMSOF(INFLECTIONAL, "Carlo")')
For reference, you can check the documentation.
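If the inflectional forms do not cover a name such as Carlos, a prefix term is another option that CONTAINS supports. The following is a minimal sketch against the same MyTable from the question; it assumes the full-text index already exists. The prefix has to be wrapped in double quotes with the asterisk before the closing quote.

```sql
-- Prefix term: matches any indexed word that starts with "Carlo" (Carlo, Carlos, ...).
SELECT P.*
FROM MyTable AS P
WHERE CONTAINS(P.*, '"Carlo*"');

-- "Carla" does not start with "Carlo", so it needs a shorter prefix.
SELECT P.*
FROM MyTable AS P
WHERE CONTAINS(P.*, '"Carl*"');
```

A shorter prefix matches more words, so the second query will also return unrelated terms such as Carlton.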

Related

Optimising keyword retrieval using SQL Server's Full-Text Search

I have a SQL Server Full-Text Search indexing two columns in one of my tables.
I am pulling out suggested keywords for a web front end based on the user's input, so that entering a fragment like 'ban' yields words such as banana, banish, urban, husband, etc. The user then clicks one of these words to confirm their choice, or types further letters to narrow down the search.
The index holds the following number of keywords in total, as shown by this query:
SELECT COUNT(*) FROM sys.dm_fts_index_keywords ( DB_ID(), OBJECT_ID('Search'))
217,998
So, when querying the keywords I have a query like below:
SELECT TOP 10 *, display_term, document_count
FROM sys.dm_fts_index_keywords ( DB_ID(), OBJECT_ID('Search'))
WHERE column_id=5
AND keyword != 0xFF
AND display_term like '%ban%'
AND display_term NOT LIKE 'nn%'
However, this currently takes about 30 seconds to run, which is far too slow to be of any use.
So, as a workaround, I have created my own keywords table. Whenever I add content to my full-text indexed table, I run the query below to find out which keywords will be indexed:
SELECT display_term AS Term, COUNT(display_term) AS [Count]
FROM sys.dm_fts_parser('"There are many types of fruit, including apples, bananas and cherries." ', 1033, 0, 0)
WHERE display_term NOT LIKE 'nn%'
AND special_term NOT IN ('Noise Word', 'End of Sentence')
GROUP BY display_term
I then take these words and store them into my own keywords table, for later use by the web front end described above. This is much quicker.
However, I can't help feeling that I shouldn't need to create a workaround and that finding keywords is something that many people would need to be doing.
I have searched for other methods, tables, or other functionality contained within SQL Server, but all to no avail.
I have also looked into indexing the sys.dm_fts_index_keywords table. However, searching for the word "indexing" is problematic due to the nature of subject matter.
Does anyone have another method that is quick to execute, and hopefully also requires less programmatic intervention?
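The workaround described above could be sketched roughly as follows. The table dbo.SearchKeyword, its columns, and the variable names are hypothetical and do not come from the question. The sketch also assumes the caller has the permissions that sys.dm_fts_parser requires and that LCID 1033 with the system stoplist matches the index settings.

```sql
-- Hypothetical keywords table; name and column sizes are assumptions.
CREATE TABLE dbo.SearchKeyword (
    Term     nvarchar(450) NOT NULL PRIMARY KEY,
    DocCount int           NOT NULL
);
GO

-- Text of the row that was just added to the full-text indexed table.
DECLARE @content nvarchar(3000) =
    N'There are many types of fruit, including apples, bananas and cherries.';

-- Wrap the text in double quotes so the parser treats it as one phrase;
-- embedded double quotes are replaced so they cannot end the phrase early.
DECLARE @query nvarchar(4000) = N'"' + REPLACE(@content, N'"', N' ') + N'"';

-- Add new terms, or bump the counter for terms that are already known.
MERGE dbo.SearchKeyword AS target
USING (
    SELECT display_term AS Term
    FROM sys.dm_fts_parser(@query, 1033, 0, 0)
    WHERE display_term NOT LIKE 'nn%'
      AND special_term NOT IN ('Noise Word', 'End of Sentence')
    GROUP BY display_term
) AS source
ON target.Term = source.Term
WHEN MATCHED THEN
    UPDATE SET DocCount = target.DocCount + 1
WHEN NOT MATCHED THEN
    INSERT (Term, DocCount) VALUES (source.Term, 1);
GO

-- Suggestion lookup used by the front end.
SELECT TOP 10 Term, DocCount
FROM dbo.SearchKeyword
WHERE Term LIKE '%ban%'
ORDER BY DocCount DESC;
```

The lookup still uses a leading wildcard, so it scans the keywords table, but that table is an ordinary user table and is likely far cheaper to scan than the dynamic management function.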

Finding on fulltext search troubles

I'm working with full-text indexing in SQL Server, and I have the following problem:
In my indexed column there are words and numeric codes that I need to find, for example 069-8987.15.
Users will search for the literal code, a partially stripped form, or the code without any special characters: 069-8987.15, 069-898715 or 069898715.
Only the first form works:
SELECT [Key], [Rank]
FROM CONTAINSTABLE(dbo.History, Report, '*069-8987.15*')
If I try the other two forms, nothing is returned.
How can I fix this? What do I need to do so that all three search forms return the data?

SQL Server Free Text Search with multiple search words inside a stored procedure

I am trying to do a free-text search. The search string is sent to a stored procedure, which runs the free-text search and returns the results.
If I search for red flag, I want only the rows that match both red and flag.
Below is the query I use:
select * from customer where FREETEXT (*, '"RED" and "flag"')
This doesn't give me the desired result, whereas this one does:
select * from customer where FREETEXT (*, 'RED') AND FREETEXT (*, 'FLAG')
My problem is that, since this runs inside a stored procedure, I cannot construct the WHERE clause of the second query. I thought both queries should return the same result. Am I doing something wrong here?
You need to use CONTAINS instead of FREETEXT:
select * from customer where CONTAINS(*, '"RED" and "flag"')
CONTAINS supports boolean syntax. FREETEXT does not -- it is more of a natural language type of search.
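Inside the stored procedure, the boolean condition could be assembled from the incoming search string. The following is only a sketch: the procedure name and parameter are hypothetical, and STRING_SPLIT and STRING_AGG assume SQL Server 2017 or later.

```sql
-- Hypothetical procedure: requires every space-separated word to match.
CREATE PROCEDURE dbo.SearchCustomerAllWords
    @search nvarchar(200)
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @condition nvarchar(4000);

    -- 'red flag' becomes '"red" AND "flag"'.
    -- Double quotes inside words are removed so they cannot break the condition.
    SELECT @condition =
        STRING_AGG(N'"' + REPLACE(value, N'"', N'') + N'"', N' AND ')
    FROM STRING_SPLIT(@search, N' ')
    WHERE REPLACE(value, N'"', N'') <> N'';

    -- CONTAINS raises an error for a NULL or empty condition, so stop early.
    IF @condition IS NULL
        RETURN;

    SELECT *
    FROM customer
    WHERE CONTAINS(*, @condition);
END
```

Calling EXEC dbo.SearchCustomerAllWords @search = N'red flag' would then run the same CONTAINS condition as the query above.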

SQL Server Full Text Search Leading Wildcard

After taking a look at this SO question and doing my own research, it appears that you cannot have a leading wildcard while using full text search.
So, in the simplest example, suppose I have a table with one column, like below:
TABLE1
coin
coinage
undercoin
select COLUMN1 from TABLE1 where COLUMN1 LIKE '%coin%'
Would get me the results I want.
How can I get the exact same results with FULL TEXT SEARCH enabled on the column?
The following two queries return the exact same data, which is not exactly what I want.
SELECT COLUMN1 FROM TABLE1 WHERE CONTAINS(COLUMN1, '"coin*"')
SELECT COLUMN1 FROM TABLE1 WHERE CONTAINS(COLUMN1, '"*coin*"')
Full-text search works by finding words or stems of words, so it does not find "coin" inside "undercoin". What you are after is the ability to search by suffix, and full-text search does not do this natively. There are some hacky workarounds, like creating a reverse index and searching on "nioc".
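The reverse-index workaround might look something like the sketch below. The column name COLUMN1_REVERSED and its length are assumptions, and the sketch presumes TABLE1 already has a full-text index, as in the question. The new column has to be kept in sync with COLUMN1 by the application or a trigger, which is not shown here.

```sql
-- Extra column holding the reversed text (hypothetical name and size).
ALTER TABLE TABLE1 ADD COLUMN1_REVERSED nvarchar(400) NULL;
GO

-- 'undercoin' is stored as 'niocrednu', so a suffix becomes a prefix.
UPDATE TABLE1 SET COLUMN1_REVERSED = REVERSE(COLUMN1);
GO

-- Add the reversed column to the existing full-text index on TABLE1.
ALTER FULLTEXT INDEX ON TABLE1 ADD (COLUMN1_REVERSED);
GO

-- Words starting with "coin" come from the original column,
-- words ending in "coin" come from the reversed column.
SELECT COLUMN1
FROM TABLE1
WHERE CONTAINS(COLUMN1, '"coin*"')
   OR CONTAINS(COLUMN1_REVERSED, '"nioc*"');
```

This covers coin, coinage and undercoin once the index has finished populating, but not a word with "coin" strictly in the middle, such as undercoinage; for that, LIKE '%coin%' remains the only direct match.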

How do you boost term relevance in Sql Server Full Text Search like you can in Lucene?

I'm doing a typical full text search using containstable using 'ISABOUT(term1,term2,term3)' and although it supports term weighting that's not what I need. I need the ability to boost the relevancy of terms contained in certain portions of text. For example, it is customary for metatags or page title to be weighted differently than body text when searching web pages. Although I'm not dealing with web pages I do seek the same functionality. In Lucene it's called Document Field Level Boosting. How would one natively do this in Sql Server Full Text Search?
This is just a thought -- is it possible to isolate the part you need boosting and then add the two together? I haven't had time to put it together properly, but let's say you have a 'document' column and a computed 'header' column, you could do something like this:
-- @term holds the search text, e.g. a stored procedure parameter
with compoundResults([KEY], [RANK]) as
(
    select
        a.[key],
        a.[rank] * 0.7 + b.[rank] * 0.3
    from FREETEXTTABLE(dbo.Docs, document, @term) a
    inner join FREETEXTTABLE(dbo.Docs, header, @term) b
        on a.[Key] = b.[Key]
)
select * from dbo.Docs c
LEFT OUTER JOIN compoundResults d
    ON c.TermId = d.[KEY]
So this example uses freetexttable and not containstable, but the thing to note is that there is a CTE which selects a weighted rank, taking seven tenths from the document body and three tenths from the header.
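One limitation of the inner join is that a row matching in only one of the two columns drops out of the weighted result. A variant that keeps such rows could be sketched as below; the sample value of @term is an assumption, and the 0.7 and 0.3 weights are simply carried over from the example above.

```sql
DECLARE @term nvarchar(200) = N'example search text';  -- assumed input

SELECT c.*, r.WeightedRank
FROM dbo.Docs AS c
INNER JOIN (
    SELECT
        COALESCE(a.[KEY], b.[KEY]) AS [KEY],
        -- A missing match in one column contributes a rank of 0.
        COALESCE(a.[RANK], 0) * 0.7 + COALESCE(b.[RANK], 0) * 0.3 AS WeightedRank
    FROM FREETEXTTABLE(dbo.Docs, document, @term) AS a
    FULL OUTER JOIN FREETEXTTABLE(dbo.Docs, header, @term) AS b
        ON a.[KEY] = b.[KEY]
) AS r
    ON c.TermId = r.[KEY]
ORDER BY r.WeightedRank DESC;
```

RANK values are only meant to order rows within one full-text query, so combining ranks from two separate queries is a heuristic and the weights will probably need tuning against real data.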
The native functionality you're looking for doesn't exist in SQL Server FTS.
What does your data look like? Would it work to extend the keyword patterns in some way, so that they match the corresponding parts of the document? Something like:
ISABOUT("title ~ keyword ~ title" WEIGHT(0.8), "keyword" WEIGHT(0.2))
