Solr Query very slow with extensive qf

I'm operating a production Solr server with more than 700,000 datasets in it. I'm using the dismax query parser with the following settings:
mm = 2<-1 5<80%
tie = 0.1
qf = title^4 text title_bg^4 text_bg title_hr^4 text_hr title_cs^4 text_cs title_da^4 text_da title_nl^4 text_nl title_et^4 text_et title_fi^4 text_fi title_fr^4 text_fr title_de^4 text_de title_el^4 text_el title_hu^4 text_hu title_ga^4 text_ga title_it^4 text_it title_lv^4 text_lv title_lt^4 text_lt title_mt^4 text_mt title_pl^4 text_pl title_pt^4 text_pt title_ro^4 text_ro title_sk^4 text_sk title_sl^4 text_sl title_es^4 text_es title_sv^4 text_sv name^4 tags^2 groups^2
The qf value is this long because some fields are stored in multiple languages, and for this particular query I want to search in all of them. But the query is very slow: it takes about 12 seconds to get a response, even though the server hardware is more than sufficient. I noticed that the length of the qf value and the response time are connected: when I strip down qf, the response time gets much better. Is this the expected behavior? Should qf not be too big? Is there a way to tweak the performance for this case?

This sounds like a good use case for query reranking.
Use a simpler query first (for example, removing all the title* fields from qf might still give good results), and then use the full, complex qf you have now for the reranking step.
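A minimal sketch of that two-phase setup, assuming a Solr version that ships the ReRank query parser (rq={!rerank ...}, Solr 4.9 or later). It only builds the request parameters and does not talk to Solr. The parameter names rqq and fullqf are arbitrary names chosen for this example and are pulled in through Solr's $param dereferencing; reRankDocs=200 and reRankWeight=3 are placeholder values to tune.

```python
from urllib.parse import urlencode

# Language suffixes taken from the qf in the question.
LANGS = ["bg", "hr", "cs", "da", "nl", "et", "fi", "fr", "de", "el", "hu", "ga",
         "it", "lv", "lt", "mt", "pl", "pt", "ro", "sk", "sl", "es", "sv"]
EXTRA = ["name^4", "tags^2", "groups^2"]


def full_qf():
    """Rebuild the original, expensive qf: title^4 text plus every language variant."""
    fields = ["title^4", "text"]
    for lang in LANGS:
        fields += [f"title_{lang}^4", f"text_{lang}"]
    return " ".join(fields + EXTRA)


def cheap_qf():
    """First-pass qf: the same list with every title* field dropped."""
    return " ".join(f for f in full_qf().split() if not f.startswith("title"))


def rerank_params(user_query, rerank_docs=200, rerank_weight=3):
    return {
        "defType": "dismax",
        "q": user_query,
        "qf": cheap_qf(),  # used to find and rank all matching documents
        "mm": "2<-1 5<80%",
        "tie": "0.1",
        # Re-score only the top rerank_docs hits with the full field list.
        "rq": f"{{!rerank reRankQuery=$rqq reRankDocs={rerank_docs} "
              f"reRankWeight={rerank_weight}}}",
        # v=$q reuses the user's text; local params override request params,
        # so mm and tie should fall back to the top-level values above.
        "rqq": "{!dismax qf=$fullqf v=$q}",
        "fullqf": full_qf(),
    }


print(urlencode(rerank_params("climate data")))
```

The design choice is that the cheap qf decides which documents match at all, while the expensive field list is only evaluated for the top reRankDocs hits.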

Related

How to utilize a Lucene Query class (CommonTermsQuery) with SolrJ

I want to use Lucene's CommonTermsQuery class for a query executed with SolrJ, so how do I utilize Lucene's Query classes? And what are the differences between those classes and what appears to be Solr's query parsers?
Solr currently doesn't include a query parser that uses CommonTermsQuery, but you can add your own query parsers to Solr by compiling your own .jar, loading it with a <lib> directive in solrconfig.xml, and registering the parser there with a <queryParser> element.
There's an existing example of a QParserPlugin for Solr that uses CommonTermsQuery available as a gist, so that's probably a good place to start for a custom plugin. You select the custom query parser through the standard local params syntax, {!parsername}, at the start of a query. Since SolrJ is just the client talking to a Solr server, the plugin itself has to be implemented and loaded on the server (or, if you're running in SolrCloud / cluster mode, on all servers).
A query parser takes free-form text (which is what Solr is great at) and converts it into a set of Query objects for Lucene to execute; together they represent the query the way the parser understood what the user wanted to express.
There are several differences between Solr's query parser and Lucene's query parser, but most people use the edismax or dismax query parser these days (some of these features may also have made their way into the Lucene query parser over time; I'm not sure):
Differences in the Solr query parser include the following (these are from an older page on the Solr Wiki; I'm not sure whether a more recent version is available, but since the Solr and Lucene code bases were merged into a single tree and synchronized, I'd guess fewer new differences have been introduced than when they were separate projects):
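To illustrate the client side only: once a parser plugin is registered on the server, a request selects it by name, either inline with local params at the start of q or through defType. The name commonterms below is a hypothetical registration name, not something Solr provides; from SolrJ you would pass the same q string to a SolrQuery.

```python
from urllib.parse import urlencode

# Hypothetical name under which the custom QParserPlugin was registered on the server.
PARSER_NAME = "commonterms"


def inline_parser_params(user_text, parser=PARSER_NAME):
    """Select the parser with local params at the start of q."""
    return {"q": f"{{!{parser}}}{user_text}"}


def deftype_parser_params(user_text, parser=PARSER_NAME):
    """Alternative: make the custom parser the request's default parser."""
    return {"defType": parser, "q": user_text}


print(urlencode(inline_parser_params("the quick fox")))
```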
Range queries [a TO z], prefix queries a*, and wildcard queries a*b are constant-scoring (all matching documents get an equal score). The scoring factors tf, idf, index boost, and coord are not used. There is no limitation on the number of terms that match (as there was in past versions of Lucene).
Lucene 2.1 has also switched to use ConstantScoreRangeQuery for its range queries.
A * may be used for either or both endpoints to specify an open-ended range query.
field:[* TO 100] finds all field values less than or equal to 100
field:[100 TO *] finds all field values greater than or equal to 100
field:[* TO *] matches all documents with the field
Pure negative queries (all clauses prohibited) are allowed.
-inStock:false finds all documents where inStock is not false
-field:[* TO *] finds all documents without a value for field
A hook into FunctionQuery syntax. Quotes will be necessary to encapsulate the function when it includes parentheses.
Example: _val_:myfield
Example: _val_:"recip(rord(myfield),1,2,3)"
Nested query support for any type of query parser (via QParserPlugin).
Quotes will often be necessary to encapsulate the nested query if it contains reserved characters.
Example: _query_:"{!dismax qf=myfield}how now brown cow"
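When these examples are sent as a plain HTTP GET, characters such as [, *, ", { and ! have to be percent-encoded. A small sketch using only Python's standard library; the core name mycore in the URL is a placeholder, and the field names are the placeholders from the examples above.

```python
from urllib.parse import urlencode

# Query strings from the list above (field names are placeholders).
EXAMPLES = [
    "field:[* TO 100]",
    "-inStock:false",
    "-field:[* TO *]",
    '_val_:"recip(rord(myfield),1,2,3)"',
    '_query_:"{!dismax qf=myfield}how now brown cow"',
]


def select_url(q, base="http://localhost:8983/solr/mycore/select"):
    # urlencode percent-encodes reserved characters and turns spaces into '+'.
    return f"{base}?{urlencode({'q': q})}"


for q in EXAMPLES:
    print(select_url(q))
```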

In Solr, how can we use terms external to the search query to bias result ordering?

We're working on a plan to identify content tags our users are interested in. So, for instance, we may determine that User X consumes content tagged with "kermit" and "piggy" more often than other tags. These are their "favored tags."
When the users search, we'd like to favor/bias documents that contain these terms.
This means we can't boost the documents at index time, because every user will have different favored tags. Additionally, they may not be searching for the favored tags themselves. They may search for "gonzo," and so we absolutely want to give them documents with "gonzo," but we want to boost documents that also contain "kermit" or "piggy."
These favored tags are not used to actually query the index, but rather are used to bias the result ordering. The favored tags become something of a tie-breaker -- all else being equal, documents containing these terms will rank higher.
This is new/planned development, so we can use whatever version and parser stack is optimal to solve this problem.
Solution in SolrNet
The question was correctly answered below, but here's the code for SolrNet just in case someone else is using it.
var localParams = new LocalParams();
localParams.Add("bq", "kermit^10000"); //numeric value is the degree of boost
var solr = ServiceLocator.Current.GetInstance<ISolrOperations<MySolrDocumentClass>>();
solr.Query(new SolrQuery("whatever") + localParams);
You didn't specify which query parser you're using, but if you are using the Dismax or Extended Dismax query parser, the bq argument should do exactly what you're looking for. bq adds search criteria to a search solely for the purpose of affecting the relevancy, but not to limit the result set.
From the Dismax documentation:
The bq (Boost Query) Parameter
The bq parameter specifies an additional, optional, query clause that
will be added to the user's main query to influence the score. For
example, if you wanted to add a relevancy boost for recent documents:
q=cheese
bq=date:[NOW/DAY-1YEAR TO NOW/DAY]
You can specify multiple bq parameters. If you want your query to be
parsed as separate clauses with separate boosts, use multiple bq
parameters.
In this case, you may want to add &bq=kermit&bq=piggy to the end of your Solr query. If you aren't using one of these query parsers, this need may be exactly the motivation you need to switch.
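A sketch of building such a request per user, assuming the favored tags live in a field named tags (a hypothetical name; the answer above uses bare terms, which go to the default field) and an arbitrary boost of 10. Each favored tag becomes its own bq parameter, so each is parsed as a separate optional clause that affects ranking without restricting the result set.

```python
from urllib.parse import urlencode


def favored_tag_params(user_query, favored_tags, field="tags", boost=10):
    """Main query from the user, plus one bq clause per favored tag."""
    # A list of pairs (not a dict) so that bq can repeat.
    params = [("defType", "edismax"), ("q", user_query)]
    for tag in favored_tags:
        # Quote the tag so multi-word tags stay one phrase clause;
        # assumes tags contain no double quotes.
        params.append(("bq", f'{field}:"{tag}"^{boost}'))
    return params


qs = urlencode(favored_tag_params("gonzo", ["kermit", "piggy"]))
print(qs)
```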

What is the difference between dismax and EdisMax?

I'd like to know the difference between DisMax and EDisMax.
Is there any useful reference about that? I would also like to know which queries DisMax fails to return results for while EDisMax does return results.
EDisMax has some extra query parameters, such as boost, ps and pf2, but apart from these parameters, how is EDisMax better than DisMax, and how do the two process queries differently? What makes EDisMax do better than DisMax?
Some queries return no results with DisMax, but EDisMax returns results for them.
I googled the difference between DisMax and EDisMax. All I found is that the difference lies in the parameters EDisMax supports, but I'm looking for a more technical explanation that I can present to others.
http://ip:8983/solr/C73/select/?defType=edismax&q=ipod OR video&fl=filename, score&hl=true&hl.fl=content contentenstem filename&hl.zetaContentField=content
For the query above, EDisMax returns about 238 results, but DisMax returns 0.
So what is the difference in how the two parsers handle this query? What makes EDisMax return results? That's what I'd like to know.
Because DisMax had a lot of limitations, the EDisMax query parser was added.
Check out SOLR-1553.
To start with, from the documentation:
The extended dismax parser was based on the original Solr dismax parser.
Supports full lucene query syntax in the absence of syntax errors
supports "and"/"or" to mean "AND"/"OR" in lucene syntax mode
When there are syntax errors, improved smart partial escaping of special characters is done to prevent them... in this mode, fielded queries, +/-, and phrase queries are still supported.
Improved proximity boosting via word bigrams... this prevents the problem of needing 100% of the words in the document to get any boost, as well as having all of the words in a single field.
advanced stopword handling... stopwords are not required in the mandatory part of the query but are still used (if indexed) in the proximity boosting part. If a query consists of all stopwords (e.g. to be or not to be) then all will be required.
Supports the "boost" parameter.. like the dismax bf param, but multiplies the function query instead of adding it in
Supports pure negative nested queries... so a query like +foo (-foo) will match all documents
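To make the boost item concrete, a sketch of the same recency boost expressed for each parser: dismax adds the function through bf, while edismax can multiply it in through boost. The field name pub_date and the qf value are placeholders; the recip(ms(NOW/HOUR,...),3.16e-11,1,1) expression is the usual date-boost form.

```python
RECENCY = "recip(ms(NOW/HOUR,pub_date),3.16e-11,1,1)"


def recency_params(user_query, parser):
    params = {"defType": parser, "q": user_query, "qf": "title^2 text"}
    if parser == "dismax":
        params["bf"] = RECENCY     # added to the relevance score
    elif parser == "edismax":
        params["boost"] = RECENCY  # multiplied with the relevance score
    else:
        raise ValueError(f"unsupported parser: {parser}")
    return params


print(recency_params("ipod", "edismax"))
```

edismax also accepts bf; boost is the extra, multiplicative option it adds.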
Beyond that, there are a lot of associated JIRA issues that improve the query parsing capability and add support for more features.
Reading through those JIRA issues can be really insightful :)
In general, EDisMax is an extended version of DisMax. You can find good descriptions of both parsers, and of their differences, at the following links:
DisMax Query Parser
Extended DisMax Query Parser

Is it possible to boost mlt queries in solr?

Specifically if I'm doing a query using the solr mlt handler (http://wiki.apache.org/solr/MoreLikeThisHandler) and stream.body to supply the source doc is there any way to boost result documents based on document age?
I already know how to do that for a regular query using dismax (http://wiki.apache.org/solr/FunctionQuery#Date_Boosting) but I can't quite figure out the magic incantation to do it for the mlt handler.
It looks like the mlt handler is written to handle one of two cases:
q=[typical query goodness which can include date boosting]
stream.body=[url]
If q is present, stream.body is ignored and vice-versa, so unfortunately I don't think you'll be able to do what you want in a single call without patching the MoreLikeThisHandler.
BUT: if you need this in a hurry, you can do it with two queries:
1. Run your same MLT query solely to retrieve the interesting terms and their boosts (e.g. with mlt.interestingTerms=details&mlt.boost=true&rows=0).
2. Using the interesting terms and boosts from (1), run a standard (non-MLT) Solr query with the date-boosting function you want.
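A sketch of how the two requests might be assembled. It assumes the interesting terms from step 1 have already been parsed into (field:term, boost) pairs, since the exact response layout depends on your Solr version and response writer; pub_date is a placeholder field name, and the date boost uses the {!boost} local params form.

```python
def step1_params(source_text):
    """MLT request (sent to your MLT handler) used only to collect interesting terms."""
    return {
        "stream.body": source_text,
        "mlt.interestingTerms": "details",
        "mlt.boost": "true",
        "rows": "0",
    }


def step2_params(interesting_terms, date_field="pub_date"):
    """Standard (non-MLT) query built from the terms, with a date boost on top."""
    # With the standard parser's default OR operator, each term is an optional clause.
    # Assumes terms need no escaping; real terms may contain Lucene special characters.
    qq = " ".join(f"{term}^{boost:g}" for term, boost in interesting_terms)
    return {
        "q": "{!boost b=$dateboost v=$qq}",
        "qq": qq,
        "dateboost": f"recip(ms(NOW/HOUR,{date_field}),3.16e-11,1,1)",
    }


print(step2_params([("tags:kermit", 2.5), ("title:frog", 1.0)]))
```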

What is the proper way to boost items with newer dates?

I have a MoreLikeThis query that I would like to update to return newer documents first. According to the documentation, I need to add recip(ms(NOW,mydatefield),3.16e-11,1,1) to my query.
But when I add it to either the mlt.qf or the bf parameter, the results stay exactly the same.
This is my query:
/solr/mlt?
q=id:cms.article.137861
&defType=edismax
&rows=3
&indent=on
&mlt.fl=series_id,tags,title,text
&mlt.qf=show_id text^1.1 title^1.1 tags^90
&wt=json
&fl=url,title,tags,django_id,content_type_id
&bf=recip(ms(NOW,pub_date),3.16e-11,1,1)
This is taken from the Solr wiki (it's down, but I have it cached).
I think this is what you are looking for:
How can I boost the score of newer documents
Do an explicit sort by date (relevancy scores are ignored)
Use an index-time boost that is larger for newer documents
Use a FunctionQuery to influence the score based on a date field.
In Solr 1.3, use something of the form recip(rord(myfield),1,1000,1000)
In Solr 1.4, use something of the form recip(ms(NOW,mydatefield),3.16e-11,1,1)
http://lucene.apache.org/solr/api/org/apache/solr/search/function/ReciprocalFloatFunction.html http://lucene.apache.org/solr/api/org/apache/solr/search/BoostQParserPlugin.html
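The recip function computes a / (m * x + b). With x = ms(NOW, date) in milliseconds and m = 3.16e-11, which is roughly 1 / (milliseconds in a year), recip(ms(NOW,mydatefield),3.16e-11,1,1) gives 1.0 for a brand-new document and about 0.5 for a document one year old. A quick check of that arithmetic in plain Python (nothing Solr-specific):

```python
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000  # about 3.156e10 ms


def recip(x, m, a, b):
    """Same formula as Solr's recip(x,m,a,b): a / (m*x + b)."""
    return a / (m * x + b)


for years in (0, 0.5, 1, 2, 5):
    age_ms = years * MS_PER_YEAR
    print(f"{years:>4} years old -> {recip(age_ms, 3.16e-11, 1, 1):.3f}")
```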
A full example of a query for "ipod" with the score boosted higher the newer the product is:
http://localhost:8983/solr/select?q={!boost b=recip(ms(NOW,manufacturedate_dt),3.16e-11,1,1)}ipod
One can simplify the implementation by decomposing the query into multiple arguments:
http://localhost:8983/solr/select?q={!boost b=$dateboost v=$qq}&dateboost=recip(ms(NOW,manufacturedate_dt),3.16e-11,1,1)&qq=ipod
Now the main "q" argument as well as the "dateboost" argument may be specified as defaults in a search handler in solrconfig.xml, and clients would only need to pass "qq", the user query.
To boost another query type such as a dismax query, the value of the boost query is a full sub-query and hence can use the {!querytype} syntax. Alternately, the defType param can be used in the boost local params to set the default type to dismax. The other dismax parameters may be set as top level parameters.
http://localhost:8983/solr/select?q={!boost b=$dateboost v=$qq defType=dismax}&dateboost=recip(ms(NOW,manufacturedate_dt),3.16e-11,1,1)&qf=text&pf=text&qq=ipod
Consider using reduced precision to prevent excessive memory consumption. You would instead use recip(ms(NOW/HOUR,mydatefield),3.16e-11,1,1). See this thread for more information.
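The decomposed form is easy to build programmatically. A sketch that produces the dismax variant above with the reduced-precision NOW/HOUR form applied; manufacturedate_dt, qf=text, pf=text and the localhost URL are taken from the example and are placeholders for your own setup.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/select"


def date_boosted_url(user_query, date_field="manufacturedate_dt"):
    params = {
        # Fixed part: could equally be a handler default in solrconfig.xml.
        "q": "{!boost b=$dateboost v=$qq defType=dismax}",
        # NOW/HOUR rounds the current time down to the hour (reduced precision).
        "dateboost": f"recip(ms(NOW/HOUR,{date_field}),3.16e-11,1,1)",
        "qf": "text",
        "pf": "text",
        # Only this part comes from the user.
        "qq": user_query,
    }
    return f"{BASE}?{urlencode(params)}"


print(date_boosted_url("ipod"))
```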
Apparently your date field is not a TrieDate field.
