Parsing search queries for SQL 2008 FTS

We want to use SQL Server 2008 Full-Text Search, but we keep running into problems handling the search query.
If the user types in blue dog, the query simply fails unless we wrap the search terms in double quotes, but that turns them into a phrase instead of separate keywords.
I want results where blue or dog is included, but that means replacing spaces with ORs and so on. Unfortunately, there seem to be far too many combinations a user might type.
Are there any libraries out there (for .NET) that can already parse a search string into something full-text search understands?
We'd like a Google-like syntax :)
Thanks

I was looking for the FREETEXT predicate but was using CONTAINS instead, my bad. FREETEXT is giving me the results I wanted.
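If CONTAINS is still needed (for example, to keep explicit OR semantics), the search string has to be turned into a valid search condition first. The following is only a minimal sketch of that idea, not a full Google-like parser: the function name `to_contains_condition` is made up for illustration, and it assumes that keeping only word characters and quoting each term is acceptable for your data.

```python
import re

def to_contains_condition(user_input: str) -> str:
    """Turn free-form user input into an OR-joined CONTAINS search condition."""
    # Keep only runs of word characters, so stray quotes or operators
    # typed by the user cannot break the search condition.
    terms = re.findall(r"\w+", user_input)
    # Quote each term and join with OR: a row matches if any keyword is present.
    return " OR ".join(f'"{term}"' for term in terms)

print(to_contains_condition("blue dog"))  # "blue" OR "dog"
```

The result should be passed to the query as a parameter rather than concatenated into the SQL text. An empty result (no usable terms) needs to be handled by the caller, since an empty search condition is not valid.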

Related

Querytype=Full and searching for stop words returns no results

When using Azure Cognitive Search, we use the full query syntax. When searching for something like "the document", we create a query like this (this is a simplified example):
(Title:the OR Contents:the) AND (Title:document OR Contents:document)
(we need to split up the query for unrelated reasons)
The problem is that "the" could be a stop word in the language we are searching in (we search in several languages), causing the entire query to fail. We would like to be able to ignore stop words when generating queries like this, or have the search engine simply return true for the clauses that search for a stop word.
I figure the latter is not possible (or is it?). Might there be a way to query the stop words for specific language analyzers so we can exclude the stop words ourselves? Or is there a way to alter our query so that it handles stop words better?

If you want to strip stop words from your search query, the only thing I can think of is calling the analyzer with the search query and checking the returned tokens.
In this example you would call the en.microsoft analyzer with the search query "the document".
The tokens returned only contain "document", so you know "the" is considered a stop word by the analyzer. But when searching multiple languages you might need to call multiple analyzers and strip stop words for all those languages.
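The token check described above could look roughly like the sketch below. It does not call the service: it assumes the analyzer response has already been fetched and that each token carries `token`, `startOffset`, and `endOffset` fields. The helper name `strip_stop_words` and the sample token list are hypothetical.

```python
def strip_stop_words(query: str, tokens: list) -> str:
    """Keep only the words of `query` that the analyzer turned into a token."""
    kept = []
    pos = 0
    for word in query.split():
        # Locate this word in the original string to get its character span.
        start = query.index(word, pos)
        end = start + len(word)
        pos = end
        # The word survives if at least one token overlaps its span.
        if any(t["startOffset"] < end and t["endOffset"] > start for t in tokens):
            kept.append(word)
    return " ".join(kept)

# Hypothetical analyzer output for the query "the document".
sample_tokens = [{"token": "document", "startOffset": 4, "endOffset": 12}]
print(strip_stop_words("the document", sample_tokens))  # document
```

Matching by offsets rather than by token text is a deliberate choice here: some analyzers stem or otherwise rewrite tokens, so the token text may not equal the word the user typed.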

Azure Search: wildcard queries do not work with Japanese/Chinese characters

I used icu_tokenizer in a custom analyzer to create a search index for Japanese words. The index was created successfully. I am using icu_tokenizer because it works better for Asian languages than the default Azure Search tokenizer.
Now, when I query for a string such as 赤城, I see multiple search results (131 in total) from the index. But when I use a wildcard search with the same word, for example 赤城* (adding * at the end of the word) or /赤城.*/ (a regex search query), I see 0 search results. The weird part is that * seems to work with a single Japanese character: 赤* gives me the same number of search results as 赤. But as soon as I use more than one Japanese character, wildcard queries with * stop working and return 0 search results. I am testing all of these queries in Search explorer in the Azure portal using queryType=full (Lucene query syntax).
In my application, search terms are normally used for prefix search, so we append * to the end of the search string to fetch results, but it looks like these Lucene wildcard queries just do not work with Japanese characters. Any idea how I can make these prefix queries (using a wildcard * at the end of the search string) work when the search strings are in Japanese?
Any quick help will be much appreciated!

I tested with my installation now and I can confirm that wildcards only work with Japanese content when you use a Japanese analyzer.
In my example, I set up one index with a property Body that does not have a specific analyzer defined. Then I set up another index where Body uses the ja.microsoft language analyzer. The content in both indexes is identical. I then searched for 自動車 (automobile) with a trailing wildcard.
自動車* returns multiple hits from the index that uses the Japanese analyzer. No hits are returned from the index without a specific analyzer defined.
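As a rough illustration of the second setup, the field definition and the prefix query might be expressed as below. This is a sketch only: the field attributes other than the `ja.microsoft` analyzer name are assumptions about the index, and `prefix_query` is a made-up helper that does not escape Lucene special characters.

```python
import json

# Assumed definition of the Body field in the index that uses a language analyzer.
body_field = {
    "name": "Body",
    "type": "Edm.String",
    "searchable": True,
    "analyzer": "ja.microsoft",
}

def prefix_query(term: str) -> dict:
    """Build search parameters for a prefix search using the full Lucene syntax."""
    return {"search": f"{term}*", "queryType": "full"}

print(json.dumps(body_field, ensure_ascii=False))
print(prefix_query("自動車"))
```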

Sorry for the late reply.
Have you tried using one of the Japanese language analyzers, for example ja.microsoft?
Also, if you want to use prefix search, you can try experimenting with the suggester feature which is designed to be efficient for this scenario.

Azure search contains word not working as expected

I am new to Azure Search. I am trying to use "contains" logic in my search query. I looked it up and found out that I need to add something like the following to my search query:
&queryType=full&search=/.*_search.*/
where _search is the string I want to search for. With this, the "contains" logic works fine. For example, when I search for sweep, I get well sweep-cmu in the results.
But when I search for well sweep-cmu, I get zero results. Why? And how can I improve my query to get results when I enter either partial or full strings?

If you want an exact match for the search query, surround the query with double quotes.
For example: "well sweep-cmu"
This will return all documents which contain the exact phrase.
Since you've just started to play with Azure Search you might find this article particularly interesting. It explains how the full text search works in Azure Search.
https://learn.microsoft.com/en-us/azure/search/search-lucene-query-architecture
In order to get results for partial terms, you should use wildcard expressions in your search queries. The above article explains this in detail.
PS: Some wildcard queries can be very expensive and hence slow.
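One way to combine both suggestions is sketched below. It rests on an assumption drawn from the linked article: regex and wildcard expressions are matched against individual indexed terms, so a value like well sweep-cmu, which is likely indexed as several separate terms, cannot match a single regex that spans all of them. The helper `build_search` is hypothetical; it lowercases the terms on the assumption that the indexed terms are lowercase.

```python
import re

def build_search(user_input: str) -> str:
    """Combine an exact-phrase clause with per-term 'contains' regex clauses."""
    # Split on anything that is not a letter or digit, roughly the way a
    # standard analyzer would split "well sweep-cmu" into three terms.
    terms = re.findall(r"[^\W_]+", user_input.lower())
    if not terms:
        return ""
    # Exact phrase: drop embedded double quotes so the phrase stays well-formed.
    phrase = '"' + user_input.replace('"', " ").strip() + '"'
    # One regex clause per term; every term must appear somewhere in the document.
    contains = " AND ".join(f"/.*{term}.*/" for term in terms)
    return f"{phrase} OR ({contains})"

print(build_search("well sweep-cmu"))
```

The result is meant to be sent as the search parameter together with queryType=full. Leading-wildcard regex clauses like these are among the expensive queries mentioned above, so this may be slow on large indexes.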

SQL Server, Full text search word breakers

According to the SQL Server documentation for Full-Text Search (and, sadly, confirmed in production), when searching with the English language the system matches exact phrases while ignoring punctuation between words.
Books Online says:
Punctuation is ignored. Therefore, CONTAINS(testing, "computer failure") matches a row with the value, "Where is my computer? Failure to find it would be expensive."
Is there a word breaker for English that doesn't ignore punctuation, so that rows like the one in their example would not be returned?

That is a limitation of FTS, or arguably one of its strengths. FTS is meant for fast searching, including searches where you don't know the exact string.
If you need an exact match that does not ignore punctuation, you have to use a LIKE search rather than FTS.
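A variation on this idea, which is not what the answer itself proposes, is to keep FTS for retrieving candidate rows and then apply a punctuation-sensitive check on the client. The sketch below shows only that check; `contains_exact_phrase` is a made-up helper, and it assumes that "exact" means the words are separated by whitespace alone.

```python
import re

def contains_exact_phrase(text: str, phrase: str) -> bool:
    """True only if the words of `phrase` occur consecutively, separated by whitespace alone."""
    words = [re.escape(word) for word in phrase.split()]
    if not words:
        return False
    # Only whitespace may sit between the words, so "computer? Failure" is rejected.
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

row = "Where is my computer? Failure to find it would be expensive."
print(contains_exact_phrase(row, "computer failure"))                             # False
print(contains_exact_phrase("A computer failure occurred.", "computer failure"))  # True
```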

SQL Server CONTAINS and highlighting the matches

CONTAINS() with FORMSOF() is great for trying to capture the user's intent while searching, but is there any way to highlight the matches?
If I search for "said", it might return texts containing "says", "say", "spoke", etc. Is there a way I can highlight the match in the results, or surround the match with underscores? So I might get:
She _says_ yes.
I _say_ my name.
We _spoke_ for hours but he didn't _say_ much.
I've considered an after-the-fact (client-side) regex solution that would essentially remove common word endings like (e|ed|es|s|ing) and then look for my results with all those options (so bakes would become bak, and then I'd search for bak[a-z]?(s|d|es|ed|ing)). That works okay for words like that, but there are a whole lot of cases where the past tense doesn't follow that formula, like speak vs. spoke and spake.

There are two SQL Server functions that can help you with this:
The SOUNDEX function helps you compare similar-sounding words,
and the DIFFERENCE function helps you evaluate the difference between them.
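Neither function marks the match in the text, so the highlighting step still has to happen somewhere. The sketch below is one client-side possibility and not part of the answer above: it assumes you already have a list of word forms to highlight (the `forms` list here is invented for illustration; one possible source, to be verified for your setup, is the term expansion reported by sys.dm_fts_parser), and `highlight` is a hypothetical helper.

```python
import re

def highlight(text: str, forms: list, marker: str = "_") -> str:
    """Surround every whole-word occurrence of any form with `marker`."""
    if not forms:
        return text
    # Try longer forms first so "says" is preferred over "say".
    ordered = sorted(forms, key=len, reverse=True)
    alternatives = "|".join(re.escape(form) for form in ordered)
    pattern = re.compile(rf"\b({alternatives})\b", flags=re.IGNORECASE)
    return pattern.sub(lambda m: f"{marker}{m.group(0)}{marker}", text)

# Hypothetical list of forms for the search term.
forms = ["say", "says", "said", "saying", "speak", "spoke", "spoken"]
print(highlight("We spoke for hours but he didn't say much.", forms))
# We _spoke_ for hours but he didn't _say_ much.
```

Because the forms are supplied as data instead of being derived by stripping endings, irregular pairs such as speak and spoke are handled the same way as regular ones.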