Fuzzy search not working with dismax query parser - solr

There is a field in my schema, 'fullText', which is of the 'text_en' type and is multivalued. The term 'tests' is in the fullText field in one document.
In Solr, when I search for the word 'test' using the standard lucene parser with a maximum edit distance of 1, it returns the document. The query I am using is:
http://localhost:8983/solr/simple/select?q=fullText:test~1&wt=xml&indent=true
But when I do the same using dismax, it does not return the document. The queries I have tried are:
http://localhost:8983/solr/simple/select?q=test&wt=xml&indent=true&defType=dismax&qf=fullText~1
http://localhost:8983/solr/simple/select?q=test~1&wt=xml&indent=true&defType=dismax&qf=fullText

DisMax, by design, does not support all Lucene query syntax in its query parameter. From the documentation:
This query parser supports an extremely simplified subset of the Lucene QueryParser syntax. Quotes can be used to group phrases, and +/- can be used to denote mandatory and optional clauses ... but all other Lucene query parser special characters are escaped to simplify the user experience.
Fuzzy queries are one of the things that are not supported. There is a request to add it to the qf parameter, if you'd care to take a look, but it has not been implemented.
One good solution would be to switch to the edismax query parser instead. Its query parameter supports the full Lucene query parser syntax:
http://localhost:8983/solr/simple/select?q=test~1&defType=edismax&qf=fullText
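As a rough illustration of the request above, here is a minimal Python sketch that assembles the same URL. The helper name `edismax_fuzzy_url` is hypothetical, and the host, core and field name are simply copied from the question; the point is only that the fuzzy operator goes into `q` while `qf` stays a plain field list.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint copied from the question; adjust host and core name for your setup.
SOLR_SELECT = "http://localhost:8983/solr/simple/select"


def edismax_fuzzy_url(term, max_edits=1, field="fullText"):
    """Build a select URL that sends a fuzzy term through edismax (hypothetical helper)."""
    params = {
        "q": f"{term}~{max_edits}",  # the fuzzy operator belongs in q, not in qf
        "defType": "edismax",
        "qf": field,                 # qf only lists the fields (and optional boosts)
    }
    return f"{SOLR_SELECT}?{urlencode(params)}"


url = edismax_fuzzy_url("test")
print(url)

# Round-trip the query string to confirm what Solr would receive as q.
print(parse_qs(urlsplit(url).query)["q"])
```

Building the query string with `urlencode` also takes care of escaping if the search term contains spaces or other reserved characters.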

Related

When to use edismax over dismax?

Solr supports both the dismax and the edismax query parser. How do we decide which one to use? When should we use edismax over dismax?
The reference guide documents this extensively.
In addition to supporting all the DisMax query parser parameters, Extended DisMax:
supports the full Lucene query parser syntax.
supports queries such as AND, OR, NOT, -, and +.
treats "and" and "or" as "AND" and "OR" in Lucene syntax mode.
respects the 'magic field' names _val_ and _query_. These are not real fields in the schema, but if used they help do special things (like a function query in the case of _val_ or a nested query in the case of _query_). If _val_ is used in a term or phrase query, the value is parsed as a function.
includes improved smart partial escaping in the case of syntax errors; fielded queries, +/-, and phrase queries are still supported in this mode.
improves proximity boosting by using word shingles; you do not need the query to match all words in the document before proximity boosting is applied.
includes advanced stopword handling: stopwords are not required in the mandatory part of the query but are still used in the proximity boosting part. If a query consists of all stopwords, such as "to be or not to be", then all words are required.
includes improved boost function: in Extended DisMax, the boost function is a multiplier rather than an addend, improving your boost results; the additive boost functions of DisMax (bf and bq) are also supported.
supports pure negative nested queries: queries such as +foo (-foo) will match all documents.
lets you specify which fields the end user is allowed to query, and to disallow direct fielded searches.
Whether these features are important to you depends on your use case, but in most cases there is no reason to use dismax over edismax: edismax is more flexible and fixes a few issues with dismax that have crept up over the years. Unless you have a very specific reason, go with edismax.
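To make two of the listed features concrete, the following Python sketch builds a parameter set that only makes sense with edismax: the multiplicative `boost` parameter and the `uf` parameter that restricts which fields users may address directly. The helper name `edismax_params` and the field names `title`, `content` and `popularity` are assumptions made for illustration; they do not come from the reference guide excerpt.

```python
from urllib.parse import urlencode


def edismax_params(user_query):
    """Request parameters relying on edismax-only features (hypothetical helper)."""
    return {
        "defType": "edismax",
        "q": user_query,
        # Hypothetical fields; replace with fields from your own schema.
        "qf": "title^2 content",
        # Multiplied into the relevance score; the dismax bf parameter would
        # add its function value to the score instead.
        "boost": "log(sum(popularity,1))",
        # Only these fields may be searched with explicit field:value syntax.
        "uf": "title content",
    }


params = edismax_params("title:solr AND lucene")
print(urlencode(params))
```

With plain dismax the same `q` would not be interpreted as a fielded boolean query, which is the practical difference the list above describes.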

Solr: the dismax query parser doesn't support AND/OR, so why am I getting results?

According to the Solr documentation, the DisMax query parser doesn't support AND or OR in queries. However, if I run either of the following queries:
http://xx.xx.Xx.xx:yyyy/solr/select?q=Pakistan%20OR%20India&wt=json&indent=true&defType=dismax
http://xx.xx.Xx.xx:yyyy/solr/select?q=Pakistan&wt=json&start=0&rows=20&indent=true&fl=content,url,title&fq=(title:[''+TO+*]+AND+url:[''+TO+*]+AND+content:[''+TO+*])&fq=group:ur_blogs&defType=dismax
I get results.
My question is: does dismax not support AND or OR only in the 'q' parameter, or in the entire query?
As per the description at https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser:
"The DisMax query parser supports an extremely simplified subset of the Lucene QueryParser syntax. As in Lucene, quotes can be used to group phrases, and +/- can be used to denote mandatory and optional clauses. All other Lucene query parser special characters (except AND and OR) are escaped to simplify the user experience."

Solr Dismax and Edismax request gives different results for the same query

I have a query that contains optional ("should"), mandatory, and prohibited clauses. The following two queries return different results, but shouldn't they be the same?
+_query_:"{!type=dismax mm='2<2 3<3 5<4 7<51%' qf='normalizedField'} opt1 opt2 +mandatory -prohibited"
VS
+_query_:"{!type=edismax mm='2<2 3<3 5<4 7<51%' qf='normalizedField'} opt1 opt2 +mandatory -prohibited"
With Minimum "Should" Match parameter:
mm: "2<2 3<3 5<4 7<51%"
Any ideas? Thanks
Update:
There is a document in the Solr index:
{
...
"normalizedField":"opt1 opt3 mandatory"
...
}
Searching with the dismax query:
+_query_:"{!type=dismax mm='2<2 3<3 5<4 7<51%' qf='normalizedField'} opt1 opt2 +mandatory -prohibited"
"parsedquery_toString":"+(((normalizedField:opt1) (normalizedField:opt2) +(normalizedField:mandatory) -(normalizedField:prohibited))~2) ()"
returns an empty result (as expected),
BUT
searching with the edismax query:
+_query_:"{!type=edismax mm='2<2 3<3 5<4 7<51%' qf='normalizedField'} opt1 opt2 +mandatory -prohibited"
"parsedquery_toString": "+((normalizedField:opt1) (normalizedField:opt2) +(normalizedField:mandatory) -(normalizedField:prohibited))"
returns this document. Why?
It seems I found the solution: I was using Solr 5.2, which has a known issue (https://issues.apache.org/jira/browse/SOLR-2649). After upgrading to version 5.5.1 the issue is resolved, and edismax works the same as dismax (for my example).
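To see where the ~2 in the dismax parsed query comes from, it can help to evaluate the mm expression by hand. The sketch below is an independent reading of the documented mm rules (a conditional "k<v" applies v only when there are more than k optional clauses, and percentages are rounded down); it is not Solr's own code, and the function name `min_should_match` is hypothetical.

```python
import re


def min_should_match(optional_clauses, spec):
    """Number of optional clauses that must match under an mm spec (sketch)."""
    result = optional_clauses  # default: every optional clause is required
    spec = re.sub(r"\s*<\s*", "<", spec.strip())
    if "<" in spec:
        # Conditional form, evaluated left to right with ascending bounds.
        for part in spec.split():
            bound, value = part.split("<", 1)
            if optional_clauses <= int(bound):
                return result
            result = min_should_match(optional_clauses, value)
        return result
    if spec.endswith("%"):
        percent = int(spec[:-1])
        amount = optional_clauses * abs(percent) // 100  # rounded down
        # A negative percentage means "all but this share may be missing".
        result = amount if percent >= 0 else optional_clauses - amount
    else:
        value = int(spec)
        # A negative integer means "all but this many may be missing".
        result = value if value >= 0 else optional_clauses + value
    return max(0, min(result, optional_clauses))


MM = "2<2 3<3 5<4 7<51%"
for n in (1, 2, 3, 4, 6, 8):
    print(n, min_should_match(n, MM))
```

The query in this question has two optional clauses (opt1 and opt2), so under this reading both must match. The indexed document contains only opt1, which is why an empty result is the expected outcome once mm is honoured.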
edismax and dismax are not identical (there wouldn't be any reason for introducing edismax in that case). edismax extends the syntax set and magic of dismax, by introducing several new features:
supports the full Lucene query parser syntax.
supports queries such as AND, OR, NOT, -, and +.
treats "and" and "or" as "AND" and "OR" in Lucene syntax mode.
respects the 'magic field' names _val_ and _query_. These are not real fields in the schema, but if used they help do special things (like a function query in the case of _val_ or a nested query in the case of _query_). If _val_ is used in a term or phrase query, the value is parsed as a function.
includes improved smart partial escaping in the case of syntax errors; fielded queries, +/-, and phrase queries are still supported in this mode.
improves proximity boosting by using word shingles; you do not need the query to match all words in the document before proximity boosting is applied.
includes advanced stopword handling: stopwords are not required in the mandatory part of the query but are still used in the proximity boosting part. If a query consists of all stopwords, such as "to be or not to be", then all words are required.
includes improved boost function: in Extended DisMax, the boost function is a multiplier rather than an addend, improving your boost results; the additive boost functions of DisMax (bf and bq) are also supported.
supports pure negative nested queries: queries such as +foo (-foo) will match all documents.
lets you specify which fields the end user is allowed to query, and to disallow direct fielded searches.
I've bolded the ones that easily might affect scoring, while features such as "pure negative nested queries" will change which documents are included. The same can occur because of support of the full lucene query parser syntax.
The easiest way to actually find out what's happening is to use the debugQuery feature of Solr, so you can see the scores and exactly what the dismax and edismax query is expanded to.
... and if dismax works for you, you can just use that.
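A small Python sketch of that debugging approach follows. It builds one request per parser with debugQuery enabled and shows how the expanded query could be read from a decoded JSON response. The endpoint and the helper names `debug_url` and `parsed_query` are assumptions; the sample response is hand-written from the parsed query quoted in the question, not fetched from a server.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/simple/select"  # assumed endpoint


def debug_url(parser, user_query, qf, mm=None):
    """Build a request that asks Solr to explain how it parsed the query."""
    params = {
        "defType": parser,
        "q": user_query,
        "qf": qf,
        "debugQuery": "true",
        "wt": "json",
    }
    if mm is not None:
        params["mm"] = mm
    return f"{SOLR_SELECT}?{urlencode(params)}"


def parsed_query(response):
    """Return the expanded query from a decoded JSON response, or None."""
    return response.get("debug", {}).get("parsedquery_toString")


for parser in ("dismax", "edismax"):
    print(debug_url(parser, "opt1 opt2 +mandatory -prohibited",
                    "normalizedField", mm="2<2 3<3 5<4 7<51%"))

# Hand-written sample mirroring the edismax output shown in the question.
sample = {"debug": {"parsedquery_toString":
                    "+((normalizedField:opt1) (normalizedField:opt2) "
                    "+(normalizedField:mandatory) -(normalizedField:prohibited))"}}
print(parsed_query(sample))
```

Comparing the two parsed strings side by side makes differences such as a missing ~2 visible at a glance.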

What is the difference between dismax and EdisMax?

I would like to know the difference between DisMax and EDisMax.
Is there a useful reference on this? I would also like to know which queries DisMax fails to return results for while EDisMax succeeds.
EDisMax has some extra query parameters, such as the boost, ps, and pf2 parameters. But apart from these parameters, how is EDisMax better than DisMax? How do the two parsers process queries differently, and what factors make EDisMax do better than DisMax?
Some queries fail to return results in DisMax, but EDisMax returns results for them.
I searched for the difference between DisMax and EDisMax and found only that EDisMax has additional parameters, but I am looking for a more technical explanation that I can present to others.
http://ip:8983/solr/C73/select/?defType=edismax&q=ipod OR video&fl=filename, score&hl=true&hl.fl=content contentenstem filename&hl.zetaContentField=content
For the above query EDisMax produces about 238 results, but DisMax produces 0 results.
So how do the two parsers handle this query differently? What makes EDisMax produce results? That is what I would like to know.
Because DisMax had a lot of limitations, the EDisMax query parser was added.
Check out SOLR-1553.
To start with (as in the documentation):
The extended dismax parser was based on the original Solr dismax parser.
Supports full lucene query syntax in the absence of syntax errors
supports "and"/"or" to mean "AND"/"OR" in lucene syntax mode
When there are syntax errors, improved smart partial escaping of special characters is done to prevent them... in this mode, fielded queries, +/-, and phrase queries are still supported.
Improved proximity boosting via word bigrams... this prevents the problem of needing 100% of the words in the document to get any boost, as well as having all of the words in a single field.
advanced stopword handling... stopwords are not required in the mandatory part of the query but are still used (if indexed) in the proximity boosting part. If a query consists of all stopwords (e.g. to be or not to be) then all will be required.
Supports the "boost" parameter.. like the dismax bf param, but multiplies the function query instead of adding it in
Supports pure negative nested queries... so a query like +foo (-foo) will match all documents
However, you will find a lot of associated JIRA issues that improve the query parsing capability and add support for more features.
Reading through the JIRA issues can be really insightful :)
In general, EDisMax is an extended version of DisMax. You can find a good description of both parsers and their differences in the following links.
DisMax Query Parser
Extended DisMax Query Parser

Fielded searches with Solr ExtendedDisMax Query Parser

I'm having a problem using the Solr ExtendedDisMax query parser with queries that contain fielded searches inside more complex expressions.
The case is the following.
If I send to SOLR an edismax request (defType=edismax) with parameters
qf=field1^10
q=field2:ciao
debugQuery=on (for debug purposes)
Solr parses the query as I expect; the debug part of the response tells me that
[parsedquery_toString] => +field2:ciao
But if I make the expression only a bit more complex, for example by putting the condition in parentheses:
1. qf=field1^10
2. q=(field2:ciao)
I get
[parsedquery_toString] => +(((field1:field2:^2.0) (field1:ciao^2.0))~2)
where Solr does not seem to recognize the field syntax.
I have not found any mention of this behavior in the documentation, which instead says that
This parser supports full Lucene QueryParser syntax including boolean operators 'AND', 'OR', 'NOT', '+' and '-', fielded search, term boosting, fuzzy...
This problem is really annoying because I would like to run complex boolean and fielded queries even with the edismax parser.
Do you know a way to work around this?
EDIT: The Solr version is 3.6
If you are using Solr 3.6, there is a known issue with eDisMax and fielded searches that was introduced in that release. The workaround is to precede the field name with a space.
So change your query to the following:
qf=field1^10
q=( field2:ciao)
Please see "eDismax: A fielded query wrapped by parens is not recognized" for more details.
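If queries are assembled programmatically, the workaround can be applied before the request is sent. The following Python sketch inserts a space after any opening parenthesis that is directly followed by something that looks like a field prefix. The helper name `pad_fielded_parens` is hypothetical and the pattern is deliberately simple: it does not distinguish parentheses inside quoted phrases, so treat it as a starting point rather than a complete query rewriter.

```python
import re

# An opening parenthesis immediately followed by "name:" (a fielded term).
_PAREN_BEFORE_FIELD = re.compile(r"\((?=\w+:)")


def pad_fielded_parens(query):
    """Insert a space between '(' and a field name, per the Solr 3.6 workaround."""
    return _PAREN_BEFORE_FIELD.sub("( ", query)


print(pad_fielded_parens("(field2:ciao)"))
print(pad_fielded_parens("((field1:a) OR (field2:b))"))
```

Queries that already contain the space are left unchanged, so the helper can be applied unconditionally.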

Resources