Solr/Lucene contextual SynonymFilter - solr

I created a version of the SynonymFilterFactory that loads the synonym configuration from the database and refreshes it every X seconds, as the customer required.
Now I need something similar: different synonyms depending on the value of a query parameter. I will use the fq parameter for this, because every user only searches documents that match a categorization term in a field.
So I need to create the SynonymFilter with a different configuration based on the fq parameter, e.g. if fq=A I use one set of synonyms, and if fq=B another set.
How can I read the fq parameter in the create method of a FilterFactory? Where can I look for this query parameter?

I would also be satisfied with access to the original GET values of the request inside the filter factory.

Related

Synonyms in Solr Query Results

Whenever I query a string in Solr, I want to get the synonyms of the field value, if any exist, as part of the query result. Is it possible to do that in Solr?
There is no direct way to fetch the synonyms used in the search results. You can get close by adding the debugQuery=true parameter and looking at the parsedquery value in the response, which shows how Solr parsed your query, but it is not straightforward. For example, if you search for "tv" on a text field that uses synonyms, you will get something like this:
$ curl "localhost:8983/solr/your-core/select?q=tv&debugQuery=true"
{
...
"parsedquery":"SynonymQuery(Synonym(_text_:television _text_:televisions _text_:tv _text_:tvs))",
...
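As a rough sketch of how an application might pull the expanded terms out of that debug string, the snippet below uses a regular expression on the parsedquery value shown above. The function name terms_from_parsedquery is hypothetical, and the pattern assumes the simple field:term layout of this example; other query types print differently, so treat it as a starting point only.

```python
import re


def terms_from_parsedquery(parsedquery):
    """Return the distinct terms of all field:term pairs, in order of appearance."""
    seen = []
    # Each pair looks like "_text_:television"; stop the term at whitespace or a parenthesis.
    for _field, term in re.findall(r"(\w+):([^\s()]+)", parsedquery):
        if term not in seen:
            seen.append(term)
    return seen


example = "SynonymQuery(Synonym(_text_:television _text_:televisions _text_:tv _text_:tvs))"
terms = terms_from_parsedquery(example)
print(terms)
```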
Another approach would be to load the synonyms.txt file that Solr uses into your application and do the mapping yourself. Again, this is not straightforward.
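A minimal sketch of that second approach, assuming the file uses the usual Solr synonym syntax (comma-separated equivalent terms, explicit mappings with =>, and # for comments). The helper name parse_synonyms and the sample entries are made up for illustration; the sketch ignores escaping, case handling and the factory's expand option.

```python
def parse_synonyms(lines):
    """Build a dict mapping each term to the list of terms it expands to."""
    mapping = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            # Explicit mapping: every term on the left is replaced by the terms on the right.
            left, right = line.split("=>", 1)
            targets = [t.strip() for t in right.split(",") if t.strip()]
            for source in (s.strip() for s in left.split(",")):
                if source:
                    mapping.setdefault(source, []).extend(targets)
        else:
            # Equivalent terms: every term expands to the whole group (itself included).
            group = [t.strip() for t in line.split(",") if t.strip()]
            for term in group:
                mapping.setdefault(term, []).extend(group)
    return mapping


# Hypothetical sample content; in practice pass the lines of the real synonyms.txt.
sample = [
    "# sample entries",
    "tv, television, televisions, tvs",
    "i-pod, i pod => ipod",
]
synonyms = parse_synonyms(sample)
print(synonyms["tv"])
print(synonyms["i pod"])
```

With such a dict in hand, the application can look up each field value from the search results and attach its synonyms to the response it builds.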

Filter on fields only if present on a document

Is it possible to filter a document by the provided value only if the document has the field?
For context,
I have document types A,B,C that have the field.
I also have document types D and E that don't.
I could define a query such that the filter only applies to the first subset, but I might later add a new document type to the first set which will invalidate this filter.
You'll have to combine the query with a match against all documents except those that have a value in the field:
myfield:foobar OR (*:* NOT myfield:*)
This should do what you want. That being said, I'd probably wait to introduce these additional clauses until I actually saw that they were needed, since they make every query more expensive and may never turn out to be necessary. But that's up to your judgement.
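To make the logic of that filter concrete, here is a small sketch that builds the filter string and mimics its effect on a few in-memory documents. The helper names and the sample documents are invented for illustration, and the value is not escaped for Lucene special characters.

```python
def optional_field_filter(field, value):
    """Build a filter that matches the value or any document lacking the field."""
    return f"{field}:{value} OR (*:* NOT {field}:*)"


def passes(doc, field, value):
    """Mimic the filter: keep docs without the field, or with the matching value."""
    return field not in doc or doc[field] == value


fq = optional_field_filter("myfield", "foobar")
print(fq)

# Hypothetical documents: two types carry the field, one does not.
docs = [
    {"id": "a1", "myfield": "foobar"},
    {"id": "b1", "myfield": "other"},
    {"id": "d1"},
]
kept = [doc["id"] for doc in docs if passes(doc, "myfield", "foobar")]
print(kept)
```

A document type added later passes automatically if it lacks the field, and is filtered by value if it has it.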

Solr DisMax query equivalent

I am trying to set up the elevate handler in Solr 3.5.0 and I need the dismax equivalent of the query below, which defines different boost values on the same field based on the match type (an exact match gets 200, whereas a wildcard match gets 100).
q=name:(foo*^100.0 OR foo^200.0)
This is one way to solve this problem.
Keep a text field with only WhitespaceTokenizer (and maybe LowerCaseFilter, depending on your case-sensitivity needs). Use this field for the exact match. Let's call this field name_ws.
Instead of using a wildcard query on name_ws, use a text-type copy field with EdgeNGramTokenizer in its analyzer chain, which will output tokens like:
food -> f, fo, foo, food
Let's call this field name_edge.
Then you can issue this dismax query:
q=foo&defType=dismax&qf=name_ws^200+name_edge^100
(Add debugQuery=on to verify if the scoring works the way you want.)
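The edge n-gram idea can be sketched in a few lines. This is only an illustration of the token output described above, not Solr's tokenizer; the function name and its min_gram/max_gram parameters are assumptions chosen to resemble the usual configuration options.

```python
def edge_ngrams(token, min_gram=1, max_gram=None):
    """Return the prefixes of token with lengths from min_gram up to max_gram."""
    if max_gram is None:
        max_gram = len(token)
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


indexed = edge_ngrams("food")
print(indexed)

# A plain term query for "foo" now matches the indexed prefix, so no wildcard is needed.
print("foo" in indexed)
```

Because the prefixes are produced at index time, the query side can stay a simple term match, which is what lets dismax apply a separate boost per field.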

Solr: where to store additional information?

I want to provide additional information for each indexed document at index time, and access this information in the same analyzer at query time to compare against it.
In theory, it would be great to write this value into some field of the document and search that field at query time as well.
For example, I have an animals database and I want to find all documents that contain the word 'dog' three times (just an example). For my "animals" field I can set up a custom BaseTokenFilterFactory that produces a custom TokenFilter, which simply counts all 'dog' words and stores the number somewhere. Where can I store this value so that I can access it at search time?
Your example sounds like something better handled by a custom Similarity or a function query in Solr than by a custom analyzer.
For example, if you are using Solr 4.0, you can use the function termfreq(field,term) to order by the number of times dog appears, or you can use it as a filter like so:
fq={!frange l=3 u=100000}termfreq(animals,"dog")
This will filter all documents whose animals field doesn't have at least 3 occurrences of the word dog.
The advantage of this method is that you don't affect the scoring of the documents; you only filter them.
The ability to filter by function has existed since Solr 1.4, so even if you are using an earlier version of Solr (1.4 or later) you can easily write the "termfreq" function query yourself.
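To illustrate what the frange filter over termfreq computes, here is a small in-memory sketch. It assumes whitespace tokenization and inclusive bounds, and the sample documents are invented; it mimics the behaviour rather than calling Solr.

```python
def termfreq(text, term):
    """Count occurrences of term in a whitespace-tokenized text."""
    return text.split().count(term)


def frange(value, lower, upper):
    """Return True when lower <= value <= upper (both bounds inclusive)."""
    return lower <= value <= upper


# Hypothetical contents of the "animals" field for three documents.
animals = {
    "doc1": "dog cat dog bird dog",
    "doc2": "dog cat",
    "doc3": "dog dog dog dog",
}
kept = [
    doc_id
    for doc_id, text in animals.items()
    if frange(termfreq(text, "dog"), 3, 100000)
]
print(kept)
```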

Adding date boosting to complex SOLR queries

I currently have a SOLR query which uses the query (q), query fields (qf) and phrase fields (pf) to retrieve the results I want. An example is:
/solr/select
?q=superbowl
&qf=title^3+headline^2+intro+fulltext
&pf=title^3+headline^2+intro+fulltext
&fl=id,title,ts_modified,score
&debugQuery=true
The idea is that the title and headline of the "main item" give the best indication of what the result is "about", but the intro and fulltext provide some input too. For example, imagine a collection of links, where the collection itself has metadata (what it's a collection of), but each link has its own data (title of the link, synopsis, etc). If we search for "superbowl", the most relevant results are the ones with "superbowl" in the collection metadata, and the least relevant results are those with "superbowl" in just the synopsis of one of the links... but they're all valid results.
What I'm trying to do is add a boost to the relevancy score so that the most recent results float towards the top, but retaining title,headline,intro,fulltext as part of the formula. A recent result with the search string in the collection metadata would be more relevant than one with it only in the links metadata... but that "links only" recent result might be more relevant than a very old result with the search string in the collection metadata. (I hope that's somewhat clear).
The problem is that I can't figure out how to combine the boost function documented on the SOLR site with the use of the qf/pf fields. Specifically...
From the SOLR site, something like the following works to boost the results by date:
/solr/select
?q={!boost%20b=$dateboost%20v=$qq}
&dateboost=ord(ts_modified)
&qq=superbowl
&fl=ts_modified,score
&debugQuery=true
However, I can't figure out how to combine that query with the use of qf and pf. Any suggestions would be more than welcome.
Thanks to danben's response, I was able to come up with the following:
/solr/select
?q={!boost%20b=$dateboost%20v=$qq%20defType=dismax}
&dateboost=ord(ts_modified)
&qq=superbowl
&qf=title^3+headline^2+intro^2+fulltext
&pf=title^3+headline^2+intro^2+fulltext
&fl=ts_modified,score
&debugQuery=true
It looks like the actual problems I was having were:
I left spaces in the q param instead of escaping them (%20) when copy/pasting
I didn't include the defType=dismax in my q param, so that it would pay attention to the qf/pf parameters
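One way to avoid the escaping problem is to let a library build the query string. The sketch below uses Python's urllib.parse with the parameter values from the query above; the /solr/select path is taken from the example and no request is actually sent.

```python
from urllib.parse import quote, urlencode

# Parameters from the working query; spaces are written literally and escaped below.
params = [
    ("q", "{!boost b=$dateboost v=$qq defType=dismax}"),
    ("dateboost", "ord(ts_modified)"),
    ("qq", "superbowl"),
    ("qf", "title^3 headline^2 intro^2 fulltext"),
    ("pf", "title^3 headline^2 intro^2 fulltext"),
    ("debugQuery", "true"),
]

# quote_via=quote encodes spaces as %20 instead of the default "+".
query_string = urlencode(params, quote_via=quote)
url = "/solr/select?" + query_string
print(url)
```

The local params in q keep defType=dismax, so the qf and pf values are still applied to the wrapped query.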
Check out http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
This is based on the ms function, which returns the difference in milliseconds between two timestamps / dates, and ReciprocalFloatFunction which increases as the value passed decreases.
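The shape of that boost can be checked numerically. In the sketch below, recip(x,m,a,b) is computed as a/(m*x+b), and the constants m=3.16e-11, a=1, b=1 are assumed defaults of the kind used in such date-boost examples (3.16e10 is roughly the number of milliseconds in a year). The function names mirror the Solr functions but this is plain Python, not Solr.

```python
MS_PER_YEAR = 3.16e10  # approximate number of milliseconds in one year


def recip(x, m, a, b):
    """Reciprocal function: a / (m * x + b)."""
    return a / (m * x + b)


def date_boost(age_ms, m=3.16e-11, a=1.0, b=1.0):
    """Boost for a document whose timestamp is age_ms milliseconds old."""
    return recip(abs(age_ms), m, a, b)


for years in (0, 1, 5, 10):
    print(years, round(date_boost(years * MS_PER_YEAR), 3))
```

With these constants a brand-new document gets a boost of 1, a one-year-old document about half of that, and the value keeps shrinking as the document ages.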
Since you are using the DisMaxRequestHandler, you may need to specify your query using the bq/bf parameters. From http://lucene.apache.org/solr/api/org/apache/solr/handler/DisMaxRequestHandler.html:
bq - (Boost Query) a raw lucene query that will be included in the users query to influence the score. If this is a BooleanQuery with a default boost (1.0f), then the individual clauses will be added directly to the main query. Otherwise, the query will be included as is. This param can be specified multiple times, and the boosts are additive. NOTE: the behaviour listed above is only in effect if a single bq parameter is specified. Hence you can disable it by specifying an additional, blank, bq parameter.
bf - (Boost Functions) functions (with optional boosts) that will be included in the users query to influence the score. Format is: "funcA(arg1,arg2)^1.2 funcB(arg3,arg4)^2.2". NOTE: Whitespace is not allowed in the function arguments. This param can be specified multiple times, and the functions are additive.
Here is a nice article about Date-boosting Solr search results:
http://www.metaltoad.com/blog/date-boosting-solr-drupal-search-results
In Drupal, this can be achieved with the following code, using the Apachesolr module:
/**
 * Implements hook_apachesolr_query_alter().
 */
// The function prefix (hook_search here) must match your module's machine name.
function hook_search_apachesolr_query_alter(DrupalSolrQueryInterface $query) {
  // Add a boost function that favours documents with a recent dm_field_date.
  $query->addParam('bf', array('freshness' =>
    'recip(abs(ms(NOW/HOUR,dm_field_date)),3.16e-11,1,.1)'
  ));
}
