I would like to know the difference between DisMax and EDisMax.
Is there a useful reference on this? I would also like to know which queries fail to return results with DisMax but do return results with EDisMax.
EDisMax has some additional query parameters, such as boost, ps, and pf2. Apart from these parameters, how is EDisMax better than DisMax? How do the two parsers process queries differently, and what makes EDisMax do better?
Some queries return no results with DisMax but do return results with EDisMax.
I googled the difference between DisMax and EDisMax. As far as I could find, the only difference is the extra parameters available in EDisMax, but I am looking for a more technical explanation that I can present to others.
http://ip:8983/solr/C73/select/?defType=edismax&q=ipod OR video&fl=filename, score&hl=true&hl.fl=content contentenstem filename&hl.zetaContentField=content
For the query above, EDisMax returns about 238 results, but DisMax returns 0.
So how do the two parsers handle this query differently, and what makes EDisMax return results? That is what I would like to know.
Because DisMax had a lot of limitations, the EDisMax query parser was added.
Check out SOLR-1553.
To start with (as in the documentation):
The extended dismax parser was based on the original Solr dismax parser.
Supports full lucene query syntax in the absence of syntax errors
supports "and"/"or" to mean "AND"/"OR" in lucene syntax mode
When there are syntax errors, improved smart partial escaping of special characters is done to prevent them... in this mode, fielded queries, +/-, and phrase queries are still supported.
Improved proximity boosting via word bigrams... this prevents the problem of needing 100% of the words in the document to get any boost, as well as having all of the words in a single field.
advanced stopword handling... stopwords are not required in the mandatory part of the query but are still used (if indexed) in the proximity boosting part. If a query consists of all stopwords (e.g. to be or not to be) then all will be required.
Supports the "boost" parameter.. like the dismax bf param, but multiplies the function query instead of adding it in
Supports pure negative nested queries... so a query like +foo (-foo) will match all documents
However, there are a lot of associated JIRA issues that improve the query parsing capability and add support for more features.
Reading through the JIRA issues can be really insightful :)
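The "word bigrams" idea behind the improved proximity boosting can be illustrated with a small sketch. This is only an illustration of the concept, not Solr's implementation: the function name `word_bigrams` and the sample query are made up for this example. Each pair of adjacent query words becomes its own phrase, so a document can earn a proximity boost for matching just one pair instead of the whole query.

```python
from typing import List


def word_bigrams(query: str) -> List[str]:
    """Return every pair of adjacent words in the query as a quoted phrase.

    Hypothetical helper for illustration only; Solr builds these phrase
    clauses internally when pf2 is set.
    """
    words = query.split()
    # zip(words, words[1:]) pairs each word with its right-hand neighbour.
    return ['"{} {}"'.format(a, b) for a, b in zip(words, words[1:])]


if __name__ == "__main__":
    # A three-word query yields two bigram phrases.
    for phrase in word_bigrams("apple ipod video"):
        print(phrase)
```

With the original dismax pf parameter, only the full phrase built from all query words would be considered for the boost.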
In general, EDisMax is an extended version of DisMax. You can find a good description of both parsers, and of the differences between them, in the following links.
DisMax Query Parser
Extended DisMax Query Parser
Related
Solr supports the dismax and edismax query parsers. How do we decide when to use dismax and when to use edismax? When should we use edismax over dismax?
The reference guide documents this extensively.
In addition to supporting all the DisMax query parser parameters,
Extended Dismax:
supports the full Lucene query parser syntax.
supports queries such as AND, OR, NOT, -, and +.
treats "and" and "or" as "AND" and "OR" in Lucene syntax mode.
respects the 'magic field' names _val_ and _query_. These are not a
real fields in the Schema, but if used it helps do special things
(like a function query in the case of _val_ or a nested query in the
case of _query_). If _val_ is used in a term or phrase query, the
value is parsed as a function.
includes improved smart partial escaping in the case of syntax errors;
fielded queries, +/-, and phrase queries are still supported in this
mode.
improves proximity boosting by using word shingles; you do not need
the query to match all words in the document before proximity boosting
is applied.
includes advanced stopword handling: stopwords are not required in the
mandatory part of the query but are still used in the proximity
boosting part. If a query consists of all stopwords, such as "to be or
not to be", then all words are required.
includes improved boost function: in Extended DisMax, the boost
function is a multiplier rather than an addend, improving your boost
results; the additive boost functions of DisMax (bf and bq) are also
supported.
supports pure negative nested queries: queries such as +foo (-foo)
will match all documents.
lets you specify which fields the end user is allowed to query, and to
disallow direct fielded searches.
Whether these features are important to you depends on your own use case, but in most cases there is no reason to use dismax over edismax: edismax is more flexible and fixes a few issues with dismax that have crept up over the years. Unless you have a very specific reason, go with edismax.
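To make the "multiplier rather than an addend" point concrete, here is a toy sketch. It assumes a deliberately simplified scoring model in which the relevance score and the function value are plain numbers; real Solr scoring involves more than this, and the function names are invented for the example.

```python
def additive_score(relevance: float, function_value: float) -> float:
    """Toy model of a bf-style boost: the function value is added to the score."""
    return relevance + function_value


def multiplicative_score(relevance: float, function_value: float) -> float:
    """Toy model of the edismax boost parameter: the score is multiplied."""
    return relevance * function_value


if __name__ == "__main__":
    function_value = 1.5  # e.g. the output of some popularity function (assumed)
    for relevance in (0.5, 2.0):
        print(
            relevance,
            additive_score(relevance, function_value),
            multiplicative_score(relevance, function_value),
        )
```

In this toy model the additive boost moves the weak match (0.5) much closer to the strong one (2.0), while the multiplicative boost keeps their ratio at 1:4, which is one reason a multiplier is often easier to tune.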
According to the Solr documentation, the DisMax query parser doesn't support AND or OR in queries. However, if I run either of the following queries:
http://xx.xx.Xx.xx:yyyy/solr/select?q=Pakistan%20OR%20India&wt=json&indent=true&defType=dismax
http://xx.xx.Xx.xx:yyyy/solr/select?q=Pakistan&wt=json&start=0&rows=20&indent=true&fl=content,url,title&fq=(title:[''+TO+*]+AND+url:[''+TO+*]+AND+content:[''+TO+*])&fq=group:ur_blogs&defType=dismax
I get results.
My question is: does dismax not support AND or OR in the 'q' parameter only, or in the entire query?
As per the description provided at https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser
"The DisMax query parser supports an extremely simplified subset of the Lucene QueryParser syntax. As in Lucene, quotes can be used to group phrases, and +/- can be used to denote mandatory and optional clauses. All other Lucene query parser special characters (except AND and OR) are escaped to simplify the user experience."
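On the "q or the entire query" part of the question: as far as I understand, defType only selects the parser for the main q parameter, while each fq is parsed separately (by the standard Lucene parser unless told otherwise), which would explain why AND works inside fq. The sketch below builds the second request with the parameters kept apart so that this is easier to see. The host, port, and helper name are placeholders for illustration, since the original URL is masked.

```python
from urllib.parse import urlencode


def build_select_url(base_url, params):
    """Build a /select URL; params is a list of (name, value) pairs so that
    a parameter such as fq can be repeated."""
    return "{}/select?{}".format(base_url, urlencode(params))


if __name__ == "__main__":
    params = [
        ("q", "Pakistan"),        # parsed by the parser named in defType
        ("defType", "dismax"),
        # Filter queries: parsed on their own, not by dismax (assumption
        # stated in the text above).
        ("fq", "(title:['' TO *] AND url:['' TO *] AND content:['' TO *])"),
        ("fq", "group:ur_blogs"),
        ("wt", "json"),
    ]
    # Placeholder base URL, not taken from the original post.
    print(build_select_url("http://localhost:8983/solr", params))
```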
There is a query that contains optional ("should"), mandatory, and prohibited clauses. The following two queries return different results, but shouldn't they be the same?
+_query_:"{!type=dismax mm='2<2 3<3 5<4 7<51%' qf='normalizedField'} opt1 opt2 +mandatory -prohibited"
VS
+_query_:"{!type=edismax mm='2<2 3<3 5<4 7<51%' qf='normalizedField'} opt1 opt2 +mandatory -prohibited"
With Minimum "Should" Match parameter:
mm: "2<2 3<3 5<4 7<51%"
Any ideas? Thanks
Updated
There is a document in the Solr index:
{
...
"normalizedField":"opt1 opt3 mandatory"
...
}
searching with dismax query:
+_query_:"{!type=dismax mm='2<2 3<3 5<4 7<51%' qf='normalizedField'} opt1 opt2 +mandatory -prohibited"
"parsedquery_toString":"+(((normalizedField:opt1) (normalizedField:opt2) +(normalizedField:mandatory) -(normalizedField:prohibited))~2) ()"
returns an empty result (as expected)
BUT
searching with edismax query:
+_query_:"{!type=edismax mm='2<2 3<3 5<4 7<51%' qf='normalizedField'} opt1 opt2 +mandatory -prohibited"
"parsedquery_toString": "+((normalizedField:opt1) (normalizedField:opt2) +(normalizedField:mandatory) -(normalizedField:prohibited))"
returns this document. WHY?
It seems I found the solution. I was using Solr 5.2, which has a known issue (https://issues.apache.org/jira/browse/SOLR-2649). After upgrading to version 5.5.1 the issue is resolved, and edismax works the same as dismax (for my example).
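For reference, the mm expression used above can be evaluated by hand. The following is a sketch based on my reading of the documented Minimum "Should" Match rules, not Solr's own code; the function name is invented, and it assumes there are no spaces around "<". The example query has two optional clauses (opt1 and opt2), so the expression yields 2, which matches the ~2 in the dismax parsed query. The document contains only opt1, so it should not match.

```python
def min_should_match(optional_clauses: int, spec: str) -> int:
    """Number of optional clauses that must match for a given mm spec.

    Sketch of the documented rules; rounding in edge cases may differ
    from what Solr actually does.
    """
    result = optional_clauses
    spec = spec.strip()
    if "<" in spec:
        # Conditional parts such as "3<3": the right-hand side applies
        # only when there are MORE than 3 optional clauses.
        for part in spec.split():
            bound, sub_spec = part.split("<", 1)
            if optional_clauses <= int(bound):
                return result
            result = min_should_match(optional_clauses, sub_spec)
        return result
    if spec.endswith("%"):
        calc = optional_clauses * int(spec[:-1]) / 100
    else:
        calc = int(spec)
    # Negative values mean "all but this many"; int() truncates toward zero.
    result = optional_clauses + int(calc) if calc < 0 else int(calc)
    return max(result, 0)


if __name__ == "__main__":
    spec = "2<2 3<3 5<4 7<51%"
    for n in (1, 2, 3, 4, 6, 8):
        print(n, min_should_match(n, spec))
```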
edismax and dismax are not identical (there wouldn't be any reason for introducing edismax in that case). edismax extends the syntax set and magic of dismax, by introducing several new features:
supports the full Lucene query parser syntax.
supports queries such as AND, OR, NOT, -, and +.
treats "and" and "or" as "AND" and "OR" in Lucene syntax mode.
respects the 'magic field' names _val_ and _query_. These are not a real fields in the Schema, but if used it helps do special things (like a function query in the case of _val_ or a nested query in the case of _query_). If _val_ is used in a term or phrase query, the value is parsed as a function.
includes improved smart partial escaping in the case of syntax errors; fielded queries, +/-, and phrase queries are still supported in this mode.
improves proximity boosting by using word shingles; you do not need the query to match all words in the document before proximity boosting is applied.
includes advanced stopword handling: stopwords are not required in the mandatory part of the query but are still used in the proximity boosting part. If a query consists of all stopwords, such as "to be or not to be", then all words are required.
includes improved boost function: in Extended DisMax, the boost function is a multiplier rather than an addend, improving your boost results; the additive boost functions of DisMax (bf and bq) are also supported.
supports pure negative nested queries: queries such as +foo (-foo) will match all documents.
lets you specify which fields the end user is allowed to query, and to disallow direct fielded searches.
I've bolded the ones that easily might affect scoring, while features such as "pure negative nested queries" will change which documents are included. The same can occur because of support of the full lucene query parser syntax.
The easiest way to actually find out what's happening is to use the debugQuery feature of Solr, so you can see the scores and exactly what the dismax and edismax query is expanded to.
.. and if dismax works, you can just use that.
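A minimal sketch of that comparison: build the same request twice, once per parser, with debugQuery=true, and then compare the parsed query in the debug section of each response. The base URL, core name, and field name below are placeholders rather than values from the question.

```python
from urllib.parse import urlencode


def debug_urls(base_url, query, qf):
    """Return one debug-enabled /select URL per parser for the same query."""
    urls = {}
    for parser in ("dismax", "edismax"):
        params = {
            "q": query,
            "defType": parser,
            "qf": qf,
            "debugQuery": "true",  # asks Solr to include parsing/scoring details
            "wt": "json",
        }
        urls[parser] = "{}/select?{}".format(base_url, urlencode(params))
    return urls


if __name__ == "__main__":
    # Placeholder core and field names, for illustration only.
    for parser, url in debug_urls(
        "http://localhost:8983/solr/mycore", "ipod OR video", "content"
    ).items():
        print(parser, url)
```

Opening both URLs and comparing the parsed query strings shows how each parser expanded the input.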
There is a field in my schema 'fullText' which is of the 'text_en' type, and multivalued. The term 'tests' is in the fullText field in one document.
In Solr, when I search for the word 'test' using the standard Lucene parser with a fuzzy edit distance of 1, it returns the document. The query I am using is:
http://localhost:8983/solr/simple/select?q=fullText:test~1&wt=xml&indent=true
But when I do the same using dismax, it does not return the document. The queries I have tried are:
http://localhost:8983/solr/simple/select?q=test&wt=xml&indent=true&defType=dismax&qf=fullText~1
http://localhost:8983/solr/simple/select?q=test~1&wt=xml&indent=true&defType=dismax&qf=fullText
DisMax, by design, does not support all Lucene query syntax in its query parameter. From the documentation:
This query parser supports an extremely simplified subset of the Lucene QueryParser syntax. Quotes can be used to group phrases, and +/- can be used to denote mandatory and optional clauses ... but all other Lucene query parser special characters are escaped to simplify the user experience.
Fuzzy queries are one of the things that are not supported. There is a request to add it to the qf parameter, if you'd care to take a look, but it has not been implemented.
One good solution would be to switch to the edismax query parser instead. Its query parameter supports the full Lucene query parser syntax:
http://localhost:8983/solr/simple/select?q=test~1&defType=edismax&qf=fullText
I have a disjunctive Lucene query like this:
(clause_1 OR clause_2 OR ... OR clause_N) and I want to add an additive boost query on top of it, just like the bq parameter in DisMax parser.
I tried {!edismax qf='' bq='my_boost_query'}(clause_1 OR clause_2 OR ... OR clause_N) but it returned zero results. (I assume this might be caused by the empty qf parameter.)
Is it possible to do this without rewriting my query in DisMax format? Perhaps with some special syntax of the Lucene query parser, like _val_ and such? Or maybe the DisMax wrapper does the job and I am just missing something in the query syntax above?