Is it possible to run a dismax query without computing scores? I only need this for some tests, to measure how much score computation affects Solr search performance.
For now I have a dismax query like this:
{
  limit : 10,
  params:{
    defType:"dismax",
    q:"${query}",
    q.op:"${operator}",
    qf:"${fields}",
    indent:"off"
  }
}
Is there an easy way to achieve this? Maybe I should use a filter query, but how can I specify the operator and pass the user's query phrase as is, without any processing, in fq?
You can switch to a different query parser through local params, so something like:
q=*:*&fq={!type=dismax qf='${fields}'}${query}
.. should work. If these are not local variables that you expand yourself, the syntax might be slightly different.
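A minimal sketch of how such a request could be assembled, assuming plain HTTP parameters rather than the JSON request body from the question. The parameter names filter_qf and user_q are hypothetical; they only exist so the local params can reference the field list and the raw user input through $name dereferencing instead of embedding them in the fq string. q.op is sent as a top-level parameter here; whether the dismax parser in your Solr version honors it for the filter (rather than mm) is worth verifying.

```python
from urllib.parse import urlencode


def build_filter_only_params(user_query, fields, operator="AND", rows=10):
    """Return Solr request parameters that match via dismax inside fq.

    The main query is *:*, so documents are selected by the filter and
    the dismax relevance score is not what orders the results.
    """
    return [
        ("q", "*:*"),
        # filter_qf and user_q are hypothetical parameter names (assumption).
        ("fq", "{!dismax qf=$filter_qf v=$user_q}"),
        ("filter_qf", fields),
        ("user_q", user_query),
        ("q.op", operator),
        ("rows", str(rows)),
        ("indent", "off"),
    ]


def build_filter_only_query_string(user_query, fields, operator="AND", rows=10):
    """URL-encode the parameters for use after /select? in a request URL."""
    return urlencode(build_filter_only_params(user_query, fields, operator, rows))


if __name__ == "__main__":
    print(build_filter_only_query_string("red shoes", "title text"))
```

Timing this request against the original scored dismax request gives a rough comparison. Keep in mind that filter queries are cached by Solr, so repeated identical test queries may look faster than a cold run.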
Is there a way to apply a boost inside a SpanOrQuery? (Solr 7.1, lucene of same version.)
Example of query structure generated by edismax query parser (inside the usual +BooleanQuery / DisjunctionMaxQuery for multiple fields):
SpanNearQuery
  List of SpanOrQuery
    List of SpanTermQuery or SpanNearQuery <-- I want to boost those terms
I want to apply a boost to each clause inside the SpanOrQuery. So I tried (via custom query parser extending edismax):
SpanNearQuery
  List of SpanOrQuery
    List of SpanBoostQuery, each wrapping:
      a SpanTermQuery or a SpanNearQuery.
The query executes successfully, but the boost seems to be ignored.
Here is the use case:
The input is a sentence, treated as a phrase query (quoted, with slop) to evaluate proximity. I am using the edismax query parser and a SynonymGraphFilter, and I want to apply a different boost to each synonym, so the boost information is attached to each term in the synonym file (e.g. 0.7_foo). For multi-term synonyms (expanded version), edismax generates a SpanQuery, which looks to me like a graph search. So far so good.
Where it fails is when I insert a SpanBoostQuery to wrap each clause inside the SpanOrQuery (via a custom query parser extending edismax, extracting boosts from the term text). The query still returns results, but the boost seems to be simply ignored.
Is that a misuse? A bug? Any advice on how I can fix it or work around it, please?
To get similar behavior, I replaced the SpanQuery with a list of PhraseQueries, one for each possible combination of terms in the SpanQuery. This can result in a lot of phrases, performance seems to suffer badly, and we lose a lot of information.
Thanks a lot!
Edit:
The tree above describes the query generated by my custom query parser (I checked the resulting Query by debugging it). For anyone who wants to know how this is implemented: I first let edismax return a parsed query, then walk the tree recursively, rebuilding a new Query, until I reach a leaf (SpanTermQuery), where I apply code like the following to wrap the term in a boost (SpanBoostQuery is provided by Lucene).
if (initialQuery instanceof SpanTermQuery) {
    SpanTermQuery q = (SpanTermQuery) initialQuery;
    // parse and extract term/boost from q.getTerm(),
    // e.g. "0.7_foo" -> {term: foo, boost: 0.7}
    q = new SpanTermQuery(term);
    if (boost >= 0 && boost != 1) {
        return new SpanBoostQuery(q, boost);
    } else {
        return q;
    }
}
I am aware it is not an optimal solution, but so far it works except for the missing boost. Advice is welcome, but that would be a separate topic, and not as important to me as my question above :)
I'm operating a production Solr server with more than 700,000 datasets in it. I'm using the dismax query mode with the following settings:
mm = 2<-1 5<80%
tie = 0.1
qf = title^4 text title_bg^4 text_bg title_hr^4 text_hr title_cs^4 text_cs title_da^4 text_da title_nl^4 text_nl title_et^4 text_et title_fi^4 text_fi title_fr^4 text_fr title_de^4 text_de title_el^4 text_el title_hu^4 text_hu title_ga^4 text_ga title_it^4 text_it title_lv^4 text_lv title_lt^4 text_lt title_mt^4 text_mt title_pl^4 text_pl title_pt^4 text_pt title_ro^4 text_ro title_sk^4 text_sk title_sl^4 text_sl title_es^4 text_es title_sv^4 text_sv name^4 tags^2 groups^2
The qf value is very long because some fields are stored in multiple languages, and for this particular query I want to search in all languages. But the query is very slow: it takes about 12 seconds to get a response. The server hardware is more than sufficient. I noticed that the length of the qf value and the response time are connected. When I strip down qf, the response time gets much better. Is this the expected behavior? Should qf not be too big? Is there a way to tune the performance for this case?
This sounds like a good use case for query re-ranking.
You use a simpler query first (for example, removing all the title* fields from qf might still give good results) and then use the full, complex qf you have now for the re-ranking step.
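As a rough sketch of that two-step setup, the parameters could be assembled as below. It assumes Solr's rerank query parser ({!rerank} with reRankQuery, reRankDocs and reRankWeight); the parameter names rqq, full_qf and user_q, as well as the values 200 and 2.0, are illustrative choices and not recommendations from the answer.

```python
from urllib.parse import urlencode


def build_rerank_params(user_query, cheap_qf, full_qf,
                        rerank_docs=200, rerank_weight=2.0):
    """Return parameters for a cheap first pass plus a re-ranking pass.

    The first pass searches only cheap_qf. The top rerank_docs hits are
    then re-scored with a dismax query over full_qf.
    """
    rq = (f"{{!rerank reRankQuery=$rqq reRankDocs={rerank_docs} "
          f"reRankWeight={rerank_weight}}}")
    return [
        ("defType", "dismax"),
        ("q", user_query),
        ("qf", cheap_qf),
        ("rq", rq),
        # rqq, full_qf and user_q are hypothetical parameter names (assumption).
        ("rqq", "{!dismax qf=$full_qf v=$user_q}"),
        ("full_qf", full_qf),
        ("user_q", user_query),
    ]


if __name__ == "__main__":
    params = build_rerank_params(
        "water quality",
        cheap_qf="text text_de text_fr name^4 tags^2 groups^2",
        full_qf="title^4 text title_de^4 text_de title_fr^4 text_fr name^4",
    )
    print(urlencode(params))
```

The trade-off is that re-ranking only reorders documents the first pass already found, so a document that matches only in a field dropped from the cheap qf will not appear at all.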
We're working on a plan to identify content tags our users are interested in. So, for instance, we may determine that User X consumes content tagged with "kermit" and "piggy" more often than other tags. These are their "favored tags."
When the users search, we'd like to favor/bias documents that contain these terms.
This means we can't boost the documents at index time, because every user will have different favored tags. Additionally, they may not be searching for the favored tags themselves. They may search for "gonzo," and so we absolutely want to give them documents with "gonzo," but we want to boost documents that also contain "kermit" or "piggy."
These favored tags are not used to actually query the index, but rather are used to bias the result ordering. The favored tags become something of a tie-breaker -- all else being equal, documents containing these terms will rank higher.
This is new/planned development, so we can use whatever version and parser stack is optimal to solve this problem.
Solution in SolrNet
The question was correctly answered below, but here's the code for SolrNet just in case someone else is using it.
var localParams = new LocalParams();
localParams.Add("bq", "kermit^10000"); //numeric value is the degree of boost
var solr = ServiceLocator.Current.GetInstance<ISolrOperations<MySolrDocumentClass>>();
solr.Query(new SolrQuery("whatever") + localParams);
You didn't specify which query parser you're using, but if you are using the Dismax or Extended Dismax query parser, the bq argument should do exactly what you're looking for. bq adds search criteria to a search solely for the purpose of affecting the relevancy, but not to limit the result set.
From the Dismax documentation:
The bq (Boost Query) Parameter
The bq parameter specifies an additional, optional, query clause that
will be added to the user's main query to influence the score. For
example, if you wanted to add a relevancy boost for recent documents:
q=cheese
bq=date:[NOW/DAY-1YEAR TO NOW/DAY]
You can specify multiple bq parameters. If you want your query to be
parsed as separate clauses with separate boosts, use multiple bq
parameters.
In this case, you may want to add &bq=kermit&bq=piggy to the end of your Solr query. If you aren't using one of these query parsers, this need may be exactly the motivation you need to switch.
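For illustration, here is a sketch of building one bq clause per favored tag. The field name tags, the qf value and the boost of 2.0 are assumptions made for the example; the answer above uses bare terms such as bq=kermit, which are matched against the parser's default fields instead of a specific field.

```python
from urllib.parse import urlencode


def quote_term(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_favored_tag_params(user_query, favored_tags, qf="title text",
                             tag_field="tags", boost=2.0):
    """Return edismax parameters with one bq clause per favored tag.

    The bq clauses only influence scoring; they do not restrict which
    documents match the user's query.
    """
    params = [("defType", "edismax"), ("q", user_query), ("qf", qf)]
    for tag in favored_tags:
        # tag_field is an assumed field name, not taken from the question.
        params.append(("bq", f"{tag_field}:{quote_term(tag)}^{boost}"))
    return params


if __name__ == "__main__":
    print(urlencode(build_favored_tag_params("gonzo", ["kermit", "piggy"])))
```

Because bq adds to the score, a small boost keeps the favored tags close to a tie-breaker, while a very large one lets them dominate the ordering.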
Currently I have a query like this:
q=mysearchparameters
It works fine, and I think it searches for this keyword in all fields. Now I want to retrieve data based only on specific fields, like this:
q=name:'somename'+specialization:'somespecialization'
Is it possible to query like this? I am getting some unexpected data for my second query.
You can have multiple queries, ANDed together, like this:
q=name:somename AND specialization:somespecialization
or ORed together like this:
q=name:somename OR specialization:somespecialization
Or you can use filter queries to AND them together:
q=*:*&fq=name:somename&fq=specialization:somespecialization
I won't get into queries versus filter queries as it is covered better elsewhere:
SOLR filter-query vs main-query
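The two variants above can be sketched as small helpers. This is only an illustration: the function names are made up for the example, and values are wrapped in double quotes because the standard query parser treats double quotes, not single quotes, as phrase delimiters. Also note that an unencoded + in a URL is decoded as a space, which may be one reason the query in the question behaved unexpectedly.

```python
from urllib.parse import urlencode


def field_clause(field, value):
    """Return field:"value" with backslashes and double quotes escaped."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def as_main_query(criteria, operator="AND"):
    """Combine all criteria into one q value joined by AND or OR."""
    return f" {operator} ".join(field_clause(f, v) for f, v in criteria.items())


def as_filter_queries(criteria):
    """Return q=*:* plus one fq per criterion (filters are always ANDed)."""
    return [("q", "*:*")] + [("fq", field_clause(f, v)) for f, v in criteria.items()]


if __name__ == "__main__":
    criteria = {"name": "somename", "specialization": "somespecialization"}
    print(urlencode({"q": as_main_query(criteria)}))
    print(urlencode(as_filter_queries(criteria)))
```

Letting urlencode handle the encoding avoids hand-written + and quote characters in the URL.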
To perform a multi-criteria request, you would be better off doing:
q=*:*
fq= name:*somename*
fq= specialization:*specializstr*
HTTP request example: http://localhost:8983/solr/datav6/select?q=*%3A*&fq=data%3A*carlos*%5E5&fq=entity%3Aemployee&wt=json&indent=true
It's safer on the results, faster on execution, and consumes less RAM.
enjoy and give me some feedback please! :)
I have a MoreLikeThis query that I would like to update so that it returns newer documents first. According to the documentation, I would need to add recip(ms(NOW,mydatefield),3.16e-11,1,1) to my query.
But when I try to add it to either the mlt.qf or the bf parameter, the results stay exactly the same.
This is my query:
/solr/mlt?
q=id:cms.article.137861
&defType=edismax
&rows=3
&indent=on
&mlt.fl=series_id,tags,title,text
&mlt.qf=show_id text^1.1 title^1.1 tags^90
&wt=json
&fl=url,title,tags,django_id,content_type_id
&bf=recip(ms(NOW,pub_date),3.16e-11,1,1)
This is taken from the Solr wiki (it's down, but I have it cached).
I think this is what you are looking for.
How can I boost the score of newer documents
Do an explicit sort by date (relevancy scores are ignored)
Use an index-time boost that is larger for newer documents
Use a FunctionQuery to influence the score based on a date field.
In Solr 1.3, use something of the form recip(rord(myfield),1,1000,1000)
In Solr 1.4, use something of the form recip(ms(NOW,mydatefield),3.16e-11,1,1)
http://lucene.apache.org/solr/api/org/apache/solr/search/function/ReciprocalFloatFunction.html http://lucene.apache.org/solr/api/org/apache/solr/search/BoostQParserPlugin.html
A full example of a query for "ipod" with the score boosted higher the newer the product is:
http://localhost:8983/solr/select?q={!boost b=recip(ms(NOW,manufacturedate_dt),3.16e-11,1,1)}ipod
One can simplify the implementation by decomposing the query into multiple arguments:
http://localhost:8983/solr/select?q={!boost b=$dateboost v=$qq}&dateboost=recip(ms(NOW,manufacturedate_dt),3.16e-11,1,1)&qq=ipod
Now the main "q" argument as well as the "dateboost" argument may be specified as defaults in a search handler in solrconfig.xml, and clients would only need to pass "qq", the user query.
To boost another query type such as a dismax query, the value of the boost query is a full sub-query and hence can use the {!querytype} syntax. Alternately, the defType param can be used in the boost local params to set the default type to dismax. The other dismax parameters may be set as top level parameters.
http://localhost:8983/solr/select?q={!boost b=$dateboost v=$qq defType=dismax}&dateboost=recip(ms(NOW,manufacturedate_dt),3.16e-11,1,1)&qf=text&pf=text&qq=ipod
Consider using reduced precision to prevent excessive memory consumption. You would instead use recip(ms(NOW/HOUR,mydatefield),3.16e-11,1,1). See this thread for more information.
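To see what the function in the excerpt does numerically, recall that Solr's recip(x,m,a,b) evaluates to a / (m*x + b). With m = 3.16e-11, which is roughly one divided by the number of milliseconds in a year, a = 1 and b = 1, the boost is 1 for a brand-new document and about one half for a one-year-old document. The sketch below reproduces that arithmetic outside Solr; the helper names are made up for the example.

```python
# Milliseconds in an average year (365.25 days); used only for illustration.
MS_PER_YEAR = 365.25 * 24 * 3600 * 1000


def recip(x, m, a, b):
    """Same formula as Solr's recip function: a / (m*x + b)."""
    return a / (m * x + b)


def date_boost(age_ms):
    """Boost produced by recip(ms(NOW,datefield),3.16e-11,1,1) for a given age."""
    return recip(age_ms, 3.16e-11, 1, 1)


if __name__ == "__main__":
    for years in (0, 1, 2, 10):
        print(years, round(date_boost(years * MS_PER_YEAR), 3))
```

Since the boost changes very slowly over an hour, rounding with NOW/HOUR as suggested above has a negligible effect on the resulting values.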
Apparently your date field is not a TrieDate.