Questions about SQL Server Full-Text Search Syntax

I'm trying to work out some of the finer details of the Full-Text Search syntax. I have the basics down but have run up against the following questions.
Near the top of this page, it shows cases where double quotes are required and includes the example "oatmeal". Why does this need double quotes? Is this a typo?
Near the bottom of the same page, it states that any instances of AND, AND NOT, OR, or OR NOT should be wrapped in quotes in some cases. Why? Aren't these stop words anyway?
This page provides a number of examples that include quoted and unquoted search terms. What is the difference between CONTAINS(Column, 'term') and CONTAINS(Column, '"term"')?

1) Full-Text Search uses double quotes as a text delimiter, and it's good practice to use them, especially if you are likely to have more complex search terms. For example, you can put single words or phrases in double quotes:
"Oatmeal"
"Hot Oatmeal"
2) I think the point of the "AND, AND NOT, OR, OR NOT" section is that you should find the boolean operators and wrap the content on either side of them in quotes.
e.g. OatMeal or Hot Oatmeal
translates to
'"Oatmeal" or "hot oatmeal"'
rather than
'oatmeal "or" hot oatmeal'
3) It should be contains(column, 'terms'); containstable would use the three-argument form containstable(table, column, 'terms'). I have done some testing on a database here with 30k documents in it: contains(,'"dentist"') and contains(,'dentist') returned the same 652 rows, and containstable returned them with the same ranking.
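The wrapping described in point 2 can be sketched in a few lines of Python. This is only an illustration of the idea, not anything SQL Server provides: the function name quote_terms is hypothetical, it only recognizes AND, AND NOT and OR when they are surrounded by whitespace, and it assumes that simply dropping any double quotes the user typed is acceptable.

```python
import re

# Boolean operators, matched only when surrounded by whitespace.
# "and not" is listed first so it is not split into "and" + "not".
_BOOLEAN = re.compile(r"\s+(and\s+not|and|or)\s+", re.IGNORECASE)


def quote_terms(user_query: str) -> str:
    """Wrap the text on either side of each boolean operator in double quotes."""
    # With one capturing group, re.split returns text and operators alternately:
    # even indexes are search text, odd indexes are the operators.
    parts = _BOOLEAN.split(user_query.strip())
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1:
            # Normalize the operator, e.g. "and   not" -> "AND NOT".
            out.append(" ".join(part.upper().split()))
        else:
            # Assumption: user-typed double quotes are dropped, not escaped.
            cleaned = part.replace('"', "").strip()
            out.append(f'"{cleaned}"')
    return " ".join(out)


print(quote_terms("OatMeal or Hot Oatmeal"))
```

The resulting string would then be passed as the search condition, ideally as a query parameter rather than concatenated into the SQL text.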

Related

Synonym Maps in Azure Search, synonym phrases

I'm trying to use synonym maps in Azure Search and I'm running into a problem. I want to have several words and phrases map into a single search query.
In other words, when I search for any of:
product 123, product0123, product 0123
I want the search to return results for the query phrase:
product123.
After reading the tutorial, it all seemed pretty straightforward.
I'm using the .NET Azure.Search SDK 5.0, so I've done the following:
var synonymMap = new SynonymMap
{
    Name = "test-map",
    Format = SynonymMapFormat.Solr,
    Synonyms = "product 123, product0123, product 0123=>product123\n"
};
_searchClient.SynonymMaps.CreateOrUpdate(synonymMap);
and I use the map on one of the search fields:
index.Fields.First(x => x.Name == "Title").SynonymMaps = new[] {"test-map"};
So far so good. Now if I search for product0123, I get results for product123, as I would expect. But if I search for the phrase product 123 or product 0123, I get a bunch of irrelevant results. It's almost as if the synonym maps do not work with multi-word items.
So I guess my question is: am I using synonym maps incorrectly, or do these maps only work with single-word synonyms?
Are the phrases, product 123 or product 0123, in double quotes? The phrases are required to be in double quotes ("product 123"). Double quotes are the operators for phrase search, and in the case of synonyms they ensure that the terms in the phrase are analyzed and matched against the rules in the synonym map as a phrase. Without them, the query parser separates the unquoted phrase into individual terms and tries synonym matching on each term. The query becomes product OR 123 in that case.
This documentation explains how queries are parsed (stage 1) and analyzed (stage 2). The application of synonyms is done in the second stage.
To answer your second question in the comment: unfortunately, double quotes are required to match multi-word synonyms. However, as an application developer, you have full control over what gets passed to the search service. For example, given a query product 123 from the user, you can rewrite the query under the hood to improve precision and recall before it gets passed to the search service. Phrase or proximity searches can be used to improve precision, and wildcard-style searches (such as fuzzy or prefix searches) can be used to improve recall. You would rewrite the query product 123 to something like "product 123"~10 product 123, and synonyms will apply to the phrased part of the query.
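The client-side rewrite suggested above can be sketched as follows. This is a minimal illustration in Python rather than the .NET SDK used in the question. The function name rewrite_for_synonyms and the default slop of 10 (taken from the example above) are assumptions, and the proximity operator is assumed to need the full Lucene query syntax to be enabled on the request.

```python
def rewrite_for_synonyms(user_query: str, slop: int = 10) -> str:
    """Turn a multi-word query into '"phrase"~slop phrase'.

    The quoted part lets the synonym map see the words as one phrase;
    the unquoted copy keeps the original any-word behaviour for recall.
    """
    # Assumption: quotes typed by the user are dropped before rewriting.
    terms = user_query.replace('"', " ").split()
    if len(terms) < 2:
        # Single terms already match single-word synonym rules as-is.
        return " ".join(terms)
    phrase = " ".join(terms)
    return f'"{phrase}"~{slop} {phrase}'


print(rewrite_for_synonyms("product 123"))
```

The rewritten string is what would be sent as the search text in place of the raw user input.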
Nate

Compound word search engine design

We have a search function using SQL Server's Full-Text Search. It is an any word search and works very well.
However, quotation marks around compound terms don't work with Full-Text Search.
So, currently a search for "peanut butter" returns peanut butter first, then peanuts and butter, etc.
We want the system to recognize certain compound terms and exclude all else.
So a search for: coffee ethiopian ground - would still perform an any word search.
However, a search for: ground coffee - would recognize the compound term and return only exact matches for "ground coffee".
Is the only way to do this to build your own dictionary of compound terms? Are there any other options?
Thanks, Jon
As long as you use CONTAINS or CONTAINSTABLE, SQL Server should honor your double quotes and return only rows that match the compound term as a phrase.
I suspect you are using FREETEXT or FREETEXTTABLE, which perform more of a natural-language search and ignore double quotes.
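If you do go the dictionary route from the question, the application could decide between a phrase search and an any-word search before calling CONTAINS. The following Python sketch shows one way that might look. The function name build_search_condition and the compounds set are hypothetical, and it only treats the query as a compound when the whole query matches a dictionary entry.

```python
def build_search_condition(user_query: str, compounds: set[str]) -> str:
    """Build a CONTAINS search condition from free-form user input."""
    # Assumption: user-typed double quotes are dropped and case is ignored.
    words = user_query.replace('"', " ").lower().split()
    normalized = " ".join(words)
    if normalized in compounds:
        # Known compound term: search for it as an exact phrase.
        return f'"{normalized}"'
    # Otherwise fall back to an any-word search.
    return " OR ".join(f'"{word}"' for word in words)


known_compounds = {"ground coffee", "peanut butter"}
print(build_search_condition("ground coffee", known_compounds))
print(build_search_condition("coffee ethiopian ground", known_compounds))
```

An empty query produces an empty string here, so the caller would need to handle that case before running the search.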

plus sign in solr query is not correctly treated

All:
I am new to Solr. While playing with the Solr example after importing some random documents, I used a search query in q like:
fund+report
There is no space between fund and +, and I thought it would search for the single word "fund+report", which rarely occurs in the documents, but a lot of results are returned. The query URL is:
http://localhost:8983/solr/collection1/select?q=fund%2Breport&fl=id+filename+%5Bexplain%5D&wt=xml&indent=true
I think Solr treats my query just like:
fund report
or
fund OR report
Could anyone tell me why Solr treats my query like that? And how can I make Solr treat fund+report as a single word?
Thanks
The HTTP call will simply translate the + to a space. If you need an actual + sign, you need to use the URL-encoded value for +, which is %2B. If you are looking for the phrase fund report, put double quotes around the phrase, e.g. "fund report". These should also be URL-encoded (as %22).
Keep in mind that if you're using stemming then a search for "fund report" will find results for "funds reports", "funding reports", etc. But maybe that is what you want.
So after all is said and done, your URL might look like the following:
http://localhost:8983/solr/collection1/select?q=%22fund%20report%22&fl=id,filename,%5Bexplain%5D&wt=xml&indent=true
Note that the fields listed for the fl parameter should be comma-delimited. I am not sure why you have the square brackets around the explain field.
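The encoding rules mentioned above can be checked with Python's standard urllib.parse module. This is a small sketch; the helper names decode_query_value and encode_phrase are made up for the example.

```python
from urllib.parse import quote, unquote_plus


def decode_query_value(raw: str) -> str:
    """Decode a query-string value the way a form-style HTTP decoder does."""
    # unquote_plus turns "+" into a space and "%XX" into the matching character.
    return unquote_plus(raw)


def encode_phrase(phrase: str) -> str:
    """Wrap a phrase in double quotes and percent-encode it for use in a URL."""
    # safe="" forces every reserved character to be encoded.
    return quote(f'"{phrase}"', safe="")


print(decode_query_value("fund+report"))    # a bare + arrives as a space
print(decode_query_value("fund%2Breport"))  # %2B arrives as a literal +
print(encode_phrase("fund report"))
```

If the server receives a literal + and both words still match separately, the field's tokenizer may be splitting on the + character, which is a separate matter from URL encoding.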

Solr and searching phrases with double quotes

I have an ecommerce site where I am implementing Solr (using the Solarium library) and there are product names and descriptions that contain double quotes (usually standing for inches). Before I started to grasp the analyzer and tokenizer portion of Solr, I simply assigned the datatype of text_en_splitting to fields that would contain this data. If someone searches for the phrase - blue 1" binder - the double quote is removed and the first 10 results returned are not necessarily binders. The results seem to match the word blue and the number 1 (they aren't binders).
Looking through the analysis of the query in Solr admin, I see the double quotes are being removed by the WordDelimiterFilterFactory. I like WordDelimiterFilterFactory for other reasons (like dealing with the phrase post-it note), so I'm trying to find a happy medium here. Is there a better way to both index and query fields that contain double quotes that should be kept in place when performing searches (because they actually mean something)?
What I ended up doing was adding a replacement filter before the word delimiter and used the word inch.
<filter class="solr.PatternReplaceFilterFactory" pattern='(\d)"' replacement='$1 inch' replace="all"/>
Solr query parsers (such as DisMax) call
SolrPluginUtils.stripUnbalancedQuotes(userQuery)
to remove unbalanced quotes. Balanced quotes are reserved for phrase queries.
So to keep a lone quote in the query, you would really have to write your own query parser.
You may also consider replacing the quotes with a unit word (inch, in this case) on the front end, before the query reaches Solr.
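A front-end replacement of that kind could mirror the pattern used in the filter above. The sketch below is in Python only for illustration (the site in the question uses the Solarium library), and the function name spell_out_inches is hypothetical.

```python
import re

# Same idea as the filter pattern above: a digit immediately followed by a double quote.
_INCH = re.compile(r'(\d)"')


def spell_out_inches(text: str) -> str:
    """Replace a double quote that directly follows a digit with ' inch'."""
    return _INCH.sub(r"\1 inch", text)


print(spell_out_inches('blue 1" binder'))
```

One caveat is that a real phrase query ending in a digit, such as "size 1", would also be rewritten, so the front end would need to decide how to treat balanced quotes first.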

difference between q=word1 word2 and q="word1 word2" in Solr/Lucene

Can someone please tell me what is the difference between:
q=word1 word2
and
q="word1 word2"
I'm trying to match a keyword "word1 word2" (yes, my keyword can have whitespaces) that is analyzed with KeywordTokenizerFactory and it seems it only works when I add the quotes in the query.
By the way I use Solr extended Dismax, don't know if this matters.
The syntax is then:
q="some text"&qf=KeywordField&qf=FrenchtextField
Edit:
The problem I have with quotes is that I have another field that contains full text (the analysis is basic and close to FrenchAnalyzer, including a lowercase filter).
I have the text 'HelloWorld' indexed, and I can find it with q=helloWoRLD but not with q="helloWoRLD": this unit test has been broken since I added quotes to all my queries. I don't understand the difference between q=helloWoRLD and q="helloWoRLD", since it would still be a one-term search, right?
Lucene query syntax uses spaces to separate terms, so you are performing a search for "word1" in the field "q" and for "word2" with no specified field (I'm not sure how Lucene behaves when no field is specified).
If you want to search for the string "word1 word2" (consecutive words) in the field q, then you will have to use quotes, i.e. q="word1 word2"
If you want to search for records which contain both of these words (non-consecutive), then you can search for "q=word1 AND q=word2"
I don't quite follow your hello world problem so can't comment on that. Hope this helps
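The effect of whitespace splitting on a keyword-style field can be illustrated with a toy model in Python. This is not how Solr or Lucene is implemented. It only assumes that the query is split on whitespace outside double quotes before any field analysis happens, and that a keyword field matches only when a whole clause equals the whole indexed value. All names are hypothetical.

```python
import re

# A clause is either a double-quoted phrase or a run of non-whitespace characters.
_CLAUSE = re.compile(r'"([^"]*)"|(\S+)')


def split_clauses(query: str) -> list[str]:
    """Split a query on whitespace, keeping quoted phrases together."""
    return [phrase or word for phrase, word in _CLAUSE.findall(query)]


def keyword_field_matches(indexed_value: str, query: str) -> bool:
    """A keyword field holds one token, so a clause must equal it exactly."""
    return any(clause == indexed_value for clause in split_clauses(query))


print(split_clauses('word1 word2'))    # two separate clauses
print(split_clauses('"word1 word2"'))  # one clause containing a space
print(keyword_field_matches("word1 word2", 'word1 word2'))
print(keyword_field_matches("word1 word2", '"word1 word2"'))
```

In this model the unquoted query never produces a clause that contains the space, which would be consistent with the keyword field matching only when the quotes are present.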
