MS SQL CONTAINS with Ampersand Search

I have a FULLTEXT column called 'name', which may contain multiple words per row. Using CONTAINS, I can search for a variable number of words, and it works, except when one of the words has an ampersand in it.
Example: name = 'Bob Brown AB&CD'
This works:
CONTAINS(name, '"*Bo*" AND "*Br*"')
These do not (any search of the word with the ampersand):
CONTAINS(name, '"*AB*"')
CONTAINS(name,'"*AB&CD*"')
CONTAINS(name,'"*&*"')
I realize why the last search wouldn't work, since CONTAINS only searches from the beginning of words.
Is the ampersand being processed by FULLTEXT as a word-break? If so, that might explain why a search including the ampersand would fail. But it wouldn't explain why
CONTAINS(name, '"*AB*"')
CONTAINS(name, '"*CD*"') (assuming FULLTEXT sees "CD" as a different word)
would fail.
How can I change the search so that it returns the row if I search for "AB&CD" or just "AB" or just "CD"?
P.S. Dynamic SQL is not an option, so I cannot concatenate a variable number of LIKEs.

I found my answer, apparently. I set
STOPLIST = OFF
on my FULLTEXT index, and I can now search by "AB" and "AB&CD" successfully.
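To see how the word breaker actually splits a value like this, and which tokens the stoplist drops, sys.dm_fts_parser can help. This is a minimal sketch, assuming the column is indexed with English (LCID 1033) and that you have the sysadmin rights the function requires; dbo.MyTable is a hypothetical table name, not one taken from the question.

```sql
-- Tokens produced with the system stoplist applied (third argument 0).
-- Rows whose special_term is 'Noise Word' are stopwords and are not indexed.
SELECT display_term, special_term, occurrence
FROM sys.dm_fts_parser(N'"Bob Brown AB&CD"', 1033, 0, 0);

-- The same text with no stoplist (third argument NULL), for comparison.
SELECT display_term, special_term, occurrence
FROM sys.dm_fts_parser(N'"Bob Brown AB&CD"', 1033, NULL, 0);

-- The fix described above, written out in full (dbo.MyTable is hypothetical).
ALTER FULLTEXT INDEX ON dbo.MyTable SET STOPLIST = OFF;
```

Comparing the two result sets should show whether a search term is missing from the index because of the word breaker or because of the stoplist.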

Related

SQL Contains() with a single word does not return all expected rows

I am running a simple SQL contains search and the result does not include all expected rows.
All I need is a search that works in the same way as LIKE %word%.
SELECT *
FROM [MyTable] where contains(Name, N'walmart')
With this SELECT, the rows returned seem to contain only names where "walmart" has a space before or after it, or some other delimiter such as a period ("walmart.com"). No problem here.
But one row was not returned and its value is "mywalmart". Why does this row fail to return with the contains search? If I use name LIKE '%walmart%' it works just fine.
What do I need to change in the CONTAINS search to make it work?
CONTAINS doesn't work the way you think; for your purposes, LIKE is best.
See the docs on this:
CONTAINS can search for:
A word or phrase.
The prefix of a word or phrase.
A word near another word.
A word inflectionally generated from another (for example, the word drive is the inflectional stem of drives, drove, driving, and driven).
A word that is a synonym of another word using a thesaurus (for example, the word "metal" can have synonyms such as "aluminum" and "steel").
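The difference can be sketched with the table and column from the question. Note that a prefix term only anchors at the start of a word, so none of the CONTAINS forms below should find "mywalmart"; this is an illustration, not output from the asker's data.

```sql
-- Whole-word match: finds 'walmart' and 'walmart.com', but not 'mywalmart'.
SELECT * FROM [MyTable] WHERE CONTAINS(Name, N'walmart');

-- Prefix match: finds words that BEGIN with 'walmart' (for example 'walmarts').
-- 'mywalmart' still does not qualify, because the word starts with 'my'.
SELECT * FROM [MyTable] WHERE CONTAINS(Name, N'"walmart*"');

-- Substring match: finds 'mywalmart', at the cost of not using the full-text index.
SELECT * FROM [MyTable] WHERE Name LIKE N'%walmart%';
```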

Multi-word CONTAINS full-text search only working partially in SQL Server

I'm using SQL Server 2012 and have created a full-text index for the NAME column in the COMPANY table. All the searches I've tested are of the following format (with a variable number of words to search), matching by beginnings of words in any order:
select id, name from company where contains(name, '"ka*" AND "de*"')
The problem is that there are cases where this query doesn't return any results even though it should be a perfect match. For example, when the company name is "ka de we oy", the example above returns a match, but '"ka*" AND "de*" AND "we*"' does not, and neither does searching with all four 'words'.
There are also other cases where, strangely enough, the search does not return results even with exact words. This seems related to very short (two-letter) words. There are also some issues with searching with many (6+) words.
Is there some explicit restriction on the number of words in a single query or on how short they can be? How can I fix or work around this?
Edit: it seems that certain common English words are entirely excluded from the index (like 'we' in the example). This is an issue because it's a requirement that a few of the common words must be searchable. Is there any way to change which words are not indexed, or to change the 'language' of the indexing so that a different set of common words is left out?
Apparently this is simply a case of defining the correct stopwords / stoplist:
https://msdn.microsoft.com/en-us/library/ms142551.aspx
https://msdn.microsoft.com/en-us/library/cc280405.aspx
Or setting the full-text index language for the column to the actual language so that English words don't cause issues.
Edit: actually it was easiest to simply disable the stoplist for the table entirely:
ALTER FULLTEXT INDEX ON company SET STOPLIST = OFF
Hopefully this helps someone else
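If turning the stoplist off entirely is too broad, the custom-stoplist route from the links above might look roughly like this. A sketch only: CompanyStoplist is a hypothetical name, and it assumes the NAME column is indexed with English (LCID 1033), since the language in the DROP clause has to match the language the stopword is registered under.

```sql
-- Check which of the search words are system stopwords for English.
SELECT stopword
FROM sys.fulltext_system_stopwords
WHERE language_id = 1033
  AND stopword IN (N'ka', N'de', N'we', N'oy');
GO
-- Start from a copy of the system stoplist (CompanyStoplist is hypothetical).
CREATE FULLTEXT STOPLIST CompanyStoplist FROM SYSTEM STOPLIST;
GO
-- Remove the words that must stay searchable.
ALTER FULLTEXT STOPLIST CompanyStoplist DROP 'we' LANGUAGE 'English';
GO
-- Attach the custom stoplist to the existing full-text index.
ALTER FULLTEXT INDEX ON company SET STOPLIST = CompanyStoplist;
```

This keeps the rest of the common words filtered out while making the required ones searchable.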

Matching words in close proximity

I have a table in SQL 2012 that I'm performing a full text search on.
One of the records has, as a part of a larger string, the text 'Trying out your system'.
The problem is that, if I search for two words in the target string which are too close together, I don't get a match.
select * from mytable where contains(*,'trying') -- match
select * from mytable where contains(*,'trying and out') -- no match
select * from mytable where contains(*,'trying and your') -- no match
select * from mytable where contains(*,'trying and system') -- match
I'm aware that I can search for an exact string by enclosing the search pattern in double quotes, however that's not really what I'm after.
Any suggestions how I can make all of the above search terms match?
Thanks.
This sounds like an issue with stopwords (common words like "the", "your", etc. that are usually filtered out of the full text index, thus you cannot search on them).
To prevent this from happening, you can modify your full text index so that it does not use a stoplist (in other words, every single word will be indexed and thus searchable).
ALTER FULLTEXT INDEX ON MyTable SET STOPLIST = OFF
Be sure to rebuild the full text catalog afterwards.
But only do this if you really need the ability to search on common words. Typically this is not necessary. Also, doing so may slow down your full text searches.
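A rough way to rebuild and then confirm that the former stopwords made it into the index is sketched below. MyCatalog and dbo.MyTable are hypothetical names; substitute the catalog and table that actually hold your index.

```sql
-- Rebuild the catalog that holds the index (MyCatalog is a hypothetical name).
ALTER FULLTEXT CATALOG MyCatalog REBUILD;
GO
-- Population runs in the background; 0 means the catalog is idle again.
SELECT FULLTEXTCATALOGPROPERTY(N'MyCatalog', 'PopulateStatus') AS populate_status;
GO
-- Once idle, check whether the former stopwords are now indexed.
SELECT display_term, document_count
FROM sys.dm_fts_index_keywords(DB_ID(), OBJECT_ID(N'dbo.MyTable'))
WHERE display_term IN (N'out', N'your');
```

If the last query returns rows for "out" and "your", searches such as 'trying and out' should start matching.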

Full text search - Contains plus wildcard and single quote

I have a table with a name field that contains this value:
Test O'neill 123
If I use
SELECT *
FROM table F
WHERE CONTAINS ( F.*, '"Test O''neill 123"' )
it works fine, but if I add a wildcard * I get no results.
SELECT *
FROM table f
WHERE CONTAINS ( F.*, '"Test O''neill 123*"' )
Why is this?
I am using a parser for my search terms, and it adds the wildcard *.
I checked some sites about escaping the ', but I haven't found anything that addresses this.
Thanks in advance.
The problem is due to the combination of 1) using the Neutral language, 2) a stoplist on your full-text index, and 3) unexpected behavior when using a wildcard in a search that includes stopwords.
The Neutral language doesn't cover all of the nuances of the English language, so at index time it considers "O'neill" to be two separate words, "O" and "neill". Your stoplist then treats "O" as a stopword, so this "word" is not added to the index; only "neill" is.
At search time, the search engine typically ignores stopwords in multi-word phrases. For example, searching for Contains(*, '"we x people"') will match the text "...we the people...", because "x" and "the" are both stopwords and thus automatically "match" each other. (I use the term "match" loosely: the search engine is not matching the stopwords; rather, it knows that "people" is one word away from "we".)
So you might expect the wildcard search Contains(*, '"we the people*"') to also find its match, but it does not when a stoplist is in use. If the stopword "the" were not in the search phrase, or if "the" were not considered a stopword, the search would work fine. I can't really explain this behavior, but I suspect it has something to do with the way the word positions are computed. I also suspect it is not the intended behavior.
So back to your case: Contains(*, '"Test O''neill 123"') will find a match, but the wildcard search Contains(*, '"Test O''neill 123*"') does not. (You can even simplify the search to Contains(*, '"O''neill*"') and you'll see that it still does not find a match.) The combination of the stopword "O" with a wildcard runs into the problem I explained in the last paragraph. This is the crux of the problem stated in your question.
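You can check the tokenization yourself with sys.dm_fts_parser, which needs sysadmin rights. The sketch below assumes LCID 0 for Neutral and 1033 for English, and uses the system stoplist (third argument 0); if your index uses a custom stoplist, pass its stoplist_id instead.

```sql
-- Neutral word breaker: shows how "O'neill" is split and which
-- tokens are flagged as 'Noise Word' in the special_term column.
SELECT display_term, special_term, occurrence
FROM sys.dm_fts_parser(N'"Test O''neill 123"', 0, 0, 0);

-- English word breaker, for comparison.
SELECT display_term, special_term, occurrence
FROM sys.dm_fts_parser(N'"Test O''neill 123"', 1033, 0, 0);
```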
Solutions ranging from most-effective to least-effective-but-possibly-more-practical-for-your-case:
1) Change the language on your full text index to English and re-index. This will cause O'neill to be treated as 1 word and thus you'll avoid the weird wildcard behavior that I explained. You can change the language in the full text index properties via SQL Server Management Studio or by dropping each column from the index and adding it back with the new language, as follows:
ALTER FULLTEXT INDEX ON MyTable DROP (Column1)
GO
ALTER FULLTEXT INDEX ON MyTable ADD (Column1 LANGUAGE [English])
-- repeat for each column in the index
2) If you need to keep using the Neutral language, consider removing O from your stoplist and re-index.
ALTER FULLTEXT STOPLIST MyStoplist DROP 'o' LANGUAGE 'Neutral';
3) Or don't use a stoplist if you don't need one.
ALTER FULLTEXT INDEX ON MyTable SET STOPLIST = OFF
4) If none of the above solutions are practical, consider removing stopwords from the search phrase, or at least the O' prefix in surnames.
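Before picking one of these, it may help to confirm which language and stoplist the index currently uses. A sketch against the catalog views, with dbo.MyTable as a hypothetical table name:

```sql
-- One row per full-text indexed column, with its language and the stoplist
-- attached to the index. stoplist_name is NULL when the index uses the
-- system stoplist or no stoplist; check stoplist_id to tell which.
SELECT OBJECT_NAME(i.object_id)           AS table_name,
       COL_NAME(c.object_id, c.column_id) AS column_name,
       l.name                             AS language_name,
       i.stoplist_id,
       s.name                             AS stoplist_name
FROM sys.fulltext_indexes AS i
JOIN sys.fulltext_index_columns AS c
  ON c.object_id = i.object_id
LEFT JOIN sys.fulltext_languages AS l
  ON l.lcid = c.language_id
LEFT JOIN sys.fulltext_stoplists AS s
  ON s.stoplist_id = i.stoplist_id
WHERE i.object_id = OBJECT_ID(N'dbo.MyTable');
```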

How can I permit the Full-Text Search to search for conjunction?

When I run a search like "People in England", the full-text search engine ignores the whole search and returns 0 results. I think this is because it separates the words ("People", "in" and "England") and ignores the word "in" because it could return too many results.
I don't want the exact phrase ("People in England"); I'd like to find texts that contain the words People, in and England.
You want the "in" keyword (like OR, AND, ...) to be treated as a plain word in the search criteria, right?
You must configure the stopwords for your full-text search; here is a link about it:
https://dba.stackexchange.com/questions/44032/searching-for-keywords-in-fulltext-indexes-using-the-contains-function
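Besides changing the stoplist, SQL Server also has a server-level option, 'transform noise words', that lets a boolean query ignore stopwords instead of returning zero rows. The sketch below assumes a hypothetical table dbo.Articles with a full-text indexed column Body; note that the setting is instance-wide, so it affects every database on the server.

```sql
-- Let full-text queries ignore stopwords in boolean expressions.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
GO
EXEC sp_configure 'transform noise words', 1;
RECONFIGURE;
GO
-- With the option on, the stopword "in" no longer forces an empty result
-- (dbo.Articles and Body are hypothetical names).
SELECT *
FROM dbo.Articles
WHERE CONTAINS(Body, N'"People" AND "in" AND "England"');
```

With this approach "in" is skipped rather than searched for. If the word itself has to be matched, it must be removed from the stoplist (or the stoplist turned off) and the index repopulated.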

Resources