Need to check for a range in a Solr if() function

We are building a Solr index for a knowledge base and I have some problems implementing boosting.
First of all: we want multiplicative boosting, not additive.
Second: the more hits a document has, the more it should be boosted, but only up to a certain degree.
We first thought about a function like boost=sum(div(hits,10000),1), but that would push certain
documents too much.
So we tried something like this
(besides some others, but those all work and only these give me an error):
&boost=if(hits,[0+TO+100],1)
&boost=if(hits,[101+TO+250],1.25)
&boost=if(hits,[250+TO+100000],1.5)
Error is:
org.apache.solr.search.SyntaxError: Expected identifier at pos 8 str='if(hits,[101 TO 250],1.25)'
So the obvious reason is the range inside the if() function: if I replace it with a single value, everything works, but that does not really help me.
So my question is: is it not possible to combine an if() function with a range of values to match?
I know I could try a million different ways to solve this, but we would actually be glad to have it in roughly this form, as the boost values for the different ranges could then stay configurable, and it is easy to get this syntax working with the framework we use to access Solr.
However, if there is no way to get this running, I am of course open to alternative solutions.
Thanks a lot,
Markus

You can use bq (Boost Query) as follows:
&bq=hits:[0 TO 100]^1.0
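As a sketch, the per-range bq parameters could be generated programmatically so the weights stay configurable (the field name hits and the range/weight pairs are illustrative):

```python
# Sketch: generate one bq (boost query) parameter per configurable range.
# The field name "hits" and the range/weight pairs are illustrative.
ranges = [((0, 100), 1.0), ((101, 250), 1.25), ((251, 100000), 1.5)]

bq_params = [f"hits:[{lo} TO {hi}]^{w}" for (lo, hi), w in ranges]
# Each entry is then sent as its own &bq=... request parameter.
```

Note that bq contributes additively to the score rather than multiplying it, so it covers the range idea but not the multiplicative requirement from the question.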

So, to clean this up:
It is not possible to use a range inside an if() function.
But we found a way with the map() function which does pretty much what we wanted to achieve with
the if-range attempt:
&boost=map(hits, 0, 100, 1, map(hits,101, 250, 1.25, map(hits,250, 10000, 1.5)))
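The nesting above can be generated from a configurable list of ranges. This sketch builds an equivalent expression with two small adjustments: non-overlapping bounds, and an explicit default of 1 on the innermost map() (a four-argument map() returns the raw field value outside the range, which is usually not what you want as a boost):

```python
# Sketch: build the nested map() boost expression from configurable ranges.
# map(field, min, max, target, default) yields `target` when the field value
# falls inside [min, max], otherwise `default` (here: the next nested map).
def build_boost(field, ranges, fallback="1"):
    expr = fallback
    for lo, hi, factor in reversed(ranges):
        expr = f"map({field},{lo},{hi},{factor},{expr})"
    return expr

boost = build_boost("hits", [(0, 100, 1), (101, 250, 1.25), (251, 10000, 1.5)])
```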

Related

If possible, what is the Solr query syntax to filter by doc size?

Solr 4.3.0
I want to find the larger size documents.
I'm trying to build some test data for testing memory usage, but I keep getting the smaller documents. If I could add a doc-size clause to my query, it would help me find more suitable documents.
I'm not aware of this possibility; most likely there is no support for it.
One possible approach: add the size of the document in a separate field during indexing, which you can later filter on.
Another option is to use the TermVectorComponent, which can return term vectors for matched documents and give some idea of "how big" a document is. Not easy and simple, though.
A third option (kudos to MatsLindh for the idea): sort by the norm() function on a specific field. There are some limitations:
You need to use the classic similarity
The field you're sorting on must have norms enabled
Example of the sort parameter: sort=norm(field_name) desc
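Putting that together, a request might look like the sketch below (host, core, and field names are placeholders). One caveat worth checking: with the classic length norm, roughly 1/sqrt(numTerms), larger documents have smaller norms, so ascending order may be what surfaces the big documents first.

```python
from urllib.parse import urlencode

# Sketch: sort results by the index-time norm of a field.
# Core name "mycore" and field "body_text" are placeholders.
params = urlencode({
    "q": "*:*",
    "fl": "id",
    "sort": "norm(body_text) asc",  # classic norm shrinks as the field grows
    "rows": "10",
})
url = "http://localhost:8983/solr/mycore/select?" + params
```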

Solr Distance Filter from a Radius Field

Hi, I am very new to Solr queries (like, a few hours), so please excuse me if this is a naive question, but is there a way to set the radius of the geo filter from a field?
{!geofilt pt=35.3459327,-97.4705935 sfield=locs_field_location$latlon d=fs_radius}
Or can I do a subquery to return the value of that field, fs_field_job_search_radius, and place it in there? I can return the value in the field list, so I was hoping it could go in there in some way.
This is similar to "Filtering by distance vs. field value in Solr", but I do not know if he got it working or where I would need to start to write a function as was suggested. Also, this is on a Solr server I do not control (it is managed by my hosting company), so I do not know whether I can even create functions. Thanks.
I took a workaround, but I believe I accomplished what I was trying to do:
fq={!frange l=0 h=12742}sub(radius_field,geodist(field,point))
The 12742 is the diameter of the earth in km, as I still needed a hard upper bound there (I doubt most people are searching in space). So basically we subtract the distance from radius_field to find out whether the document is in range:
radius_field - distance
If the result is a positive number, it is within range; if it is negative, it is not. Please let me know if I screwed up my logic. Thanks.
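The per-document decision that frange makes can be sketched locally like this (illustrative Python, not Solr code):

```python
# Sketch of the logic behind fq={!frange l=0 h=12742}sub(radius_field,geodist(...)):
# frange keeps documents whose function value falls inside [l, h], so a document
# survives when its stored radius minus the geo distance is non-negative
# (and, trivially, no more than the earth's diameter).
EARTH_DIAMETER_KM = 12742

def within_own_radius(radius_km, distance_km):
    value = radius_km - distance_km
    return 0 <= value <= EARTH_DIAMETER_KM
```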

How to get around Solr's spellcheck maxEdit limit of 2?

I would like to provide results for words that are severely misspelled. Do you have any suggestions on how I can do that in Solr 5? The built-in solr.DirectSolrSpellChecker doesn't seem to be very flexible.
Thanks for any help you can provide.
You may want to consider instead an analyzer stack that creates a phonetic mapping or other transformation that reduces the spelling to a more general representation, for example DoubleMetaphone. There are many different options, depending on the possible reasons the words are being misspelled.
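A minimal sketch of such an analyzer in schema.xml, assuming the stock DoubleMetaphoneFilterFactory (field and type names are placeholders):

```xml
<!-- Sketch: phonetic field type; matches on sound rather than exact spelling. -->
<fieldType name="text_phonetic" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- inject="false": index only the phonetic codes, not the original tokens -->
    <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
  </analyzer>
</fieldType>

<field name="title_phonetic" type="text_phonetic" indexed="true" stored="false"/>
<copyField source="title" dest="title_phonetic"/>
```

Querying against title_phonetic then matches words that sound alike even when the spelling is more than two edits away.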

Custom SOLR-sorting that is aware of its neighbours

For a SOLR search, I want to treat some results differently (where the field "is_promoted" is set to "1") to give them a better ranking. After the "normal" query is performed, the order of the results should be rearranged so that approximately 30 % of the results in a given range (say, the first 100 results) should be "promoted results". The ordering of the results should otherwise be preserved.
I thought it would be a good idea to solve this with a custom SOLR plugin, so I tried writing a SearchComponent. But it seems you can't change the ordering of search results after they have passed through the QueryComponent (since they are cached)?
One could write some kind of custom sort function (or a function query?), but the challenge is that the algorithm needs to know about the score/ordering of the surrounding results; a simple increase in the score won't do the trick.
Any suggestions on how this should be implemented?
Just answered this question on the Solr users list. The RankQuery feature in Solr 4.9 is designed to solve this type of problem. You can read about RankQueries here: http://heliosearch.org/solrs-new-rankquery-feature/

Solr - Nearest Match - Does this functionality exist?

Can Solr give you a nearest match when comparing "fingerprint"-type data stored in the Solr datastore? For example:
eJyFk0uyJSEIBbcEyEeWAwj7X8JzfDvKnuTAJIojWACwGB4QeM
HWCw0vLHlB8IWeF6hf4PNC2QunX3inWvDCO9WsF7heGHrhvYV3qvPEu-
87s9ELLi_8J9VzknReEH1h-BOKRULBwyZiEulgQZZr5a6OS8tqCo00cd
p86ymhoxZrbtQdgUxQvX5sIlF_2gUGQUDbM_ZoC28DDkpKNCHVkKCgpd
OHf-wweX9adQycnWtUoDjABumQwbJOXSZNur08Ew4ra8lxnMNuveIem6
LVLQKsIRLAe4gbj5Uxl96RpdOQ_Noz7f5pObz3_WqvEytYVsa6P707Jz
j4Oa7BVgpbKX5tS_qntcB9G--1tc7ZDU1HamuDI6q07vNpQTFx22avyR
Can it find this record if it is presented with something extremely similar? And can it provide a confidence score?
One straightforward approach could be to use a fuzzy search and pick the first hit (by score). You then need to check whether the hit is a good match or not; maybe by testing you could find some good rules of thumb.
But I'm not sure whether performance would be an issue with such long tokens. Use Lucene 4.0, where fuzzy performance is much improved.
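A minimal sketch of the fuzzy-search idea (the field name "fingerprint" and the value prefix are hypothetical; ~2 allows up to two character edits, and the returned score can serve as a rough confidence signal):

```python
# Sketch: fuzzy query against a hypothetical "fingerprint" field.
# "~2" permits up to two character edits on the term.
params = {
    "q": "fingerprint:eJyFk0uyJSEIBbcE~2",
    "fl": "id,score",  # request the score as a crude confidence measure
    "rows": "1",       # take only the best hit
}
```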
You may try experimenting with the NGram filter factory. Pick a min/max gram size that is consistent with a matching/similar fingerprint.
If you keep a tight range between minGramSize and maxGramSize, you can match documents with a similar fingerprint without having to iterate over false positives.
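A sketch of such a field type in schema.xml (type name and gram sizes are illustrative and would need tuning against the actual fingerprints):

```xml
<!-- Sketch: n-gram field type for near-match fingerprint search. -->
<fieldType name="text_fingerprint" class="solr.TextField">
  <analyzer>
    <!-- keep the whole fingerprint as one token, then split it into n-grams -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="6" maxGramSize="8"/>
  </analyzer>
</fieldType>
```

Two fingerprints that differ in only a few characters still share most of their n-grams, so the more similar the stored record, the higher its score.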
