Percentile calculated by JSON facet query is not always the same - Solr

For 1000+ records satisfying the search criteria in the filter query, my Solr collection returns a different percentile value every time. I've been using the same filter query, with a JSON facet query that computes the percentile inside a query facet.
Sample Query :
`
json.facet = {
time: "sum(time)",
users: "sum(numofusers)",
queryfacet: {
q: "time: [0 TO 50000}",
type: query,
facet: {
timepercentile: "percentile(time, 95)"
}
}
}
`

The percentile function is an approximation and is not an exact value. It's documented under the stats functions:
percentiles
A list of percentile values based on cut-off points specified by the parameter value, such as 1,99,99.9. These values are an approximation, using the t-digest algorithm. This statistic is computed for numeric field types and is not computed by default.
The percentile function in the JSON Facet API uses the same method:
Percentile estimates via t-digest algorithm. When sorting by this metric, the first percentile listed is used as the sort value.
You can read more about the t-digest algorithm on the GitHub repository.
Since these values are estimates, I'm guessing there's some minor variance in which elements get sampled; it might also depend on the structure of your index (number of nodes, when they get updated, when commits get issued, etc.).

Related

SOLR facet fields order by descending value

I am using SOLR 6.5.1 with facet filters.
My query has:
facet.limit=-1 --> to generate all possible facet values
facet.sort=index --> to order facet values not by number of occurrences but by the value itself
For instance, one facet has integers as values (in particular, the field contains years). So the values are (occurrences in brackets):
2010 (438)
2011 (547)
...
2017 (367)
The facet is correctly ordered by value, but in ascending order (2010-->2017). How can I obtain the reverse order (2017-->2010)?
Thanks
UMG
You won't be able to specify the sort direction with the simple facet API (the old one used directly in the URL). But since you're retrieving all the possible facets, you can reverse the direction in your client side controller before outputting the values. Exactly how you do that depends on which language you're using.
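As a minimal sketch of the client-side reversal, assuming the client is written in Python and that the facet comes back in Solr's default flat list form (value, count, value, count, ...). The helper name and the sample data are made up for illustration:

```python
def reverse_flat_facet(flat):
    """Turn a flat [value, count, value, count, ...] list into
    (value, count) pairs, with the last pair first."""
    pairs = list(zip(flat[0::2], flat[1::2]))
    pairs.reverse()
    return pairs


# Hypothetical excerpt of facet_counts.facet_fields for a year field,
# already sorted ascending by facet.sort=index.
year_facet = ["2010", 438, "2011", 547, "2017", 367]

for value, count in reverse_flat_facet(year_facet):
    print(f"{value} ({count})")
```

Because facet.limit=-1 already returns every value, reversing the list on the client gives the same result as a descending index sort would.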
But if you'd switch over to the more modern JSON-based facet API, you can specify the sort order directly on each level of the facet:
"sort":"index desc"
Specifies how to sort the buckets produced. “count” specifies document count, “index” sorts by the index (natural) order of the bucket value. One can also sort by any facet function / statistic that occurs in the bucket. The default is “count desc”. This parameter may also be specified in JSON like sort:{count:desc}. The sort order may either be “asc” or “desc”
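A rough sketch of what such a request could look like when built from Python. The field name year and the helper function are assumptions for illustration; limit: -1 plays the role of facet.limit=-1 from the question:

```python
import json


def terms_facet_by_value_desc(field, limit=-1):
    """Build a JSON Facet API terms facet sorted by bucket value, descending."""
    return {
        field: {
            "type": "terms",
            "field": field,
            "limit": limit,        # -1 asks for all buckets
            "sort": "index desc",  # sort by value instead of by count
        }
    }


# "year" is a hypothetical field name; the parameters would be sent to
# the collection's select or query handler.
params = {
    "q": "*:*",
    "rows": 0,
    "json.facet": json.dumps(terms_facet_by_value_desc("year")),
}
print(params["json.facet"])
```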

Sort or filter results by function query defined in a field

I have a Solr 6.2 instance running, and I'm exploring its advantages and limitations. One limitation I've run into seems to be that you can't sort or filter the data based off of a field function query.
.../solr/collection/select?q=*:*&fl=*,total:sum(v1,v2)&fq=total:[10 TO *]
Solr responds with an error stating that the total field does not exist. Indeed, the field is not defined in my schema because it's not a stored part of the dataset - it's calculated at query time. They call it a pseudo field. I haven't been able to find an example in the documentation or a solution online. So, is there a way around this?
.../solr/collection/select?q=*:*&fl=*,total:sum(v1,v2)&fq={!frange l=10} sum(v1,v2)
I have the very same problem as you.
I want to filter on a particular value of the division of two fields.
I tried to use [0.3 TO *] like you.
You can also use an upper bound for your range if you need one.
http://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-4.6.pdf
"l" is for lower bound.
"u" is for upper bound.
fq={!frange l=0 u=2.2} sum(user_ranking,editor_ranking)
Maybe this works for you?
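The braces, spaces and commas in an frange filter need URL encoding when the request is built by hand. A small sketch of how that might be done in Python; the helper name frange_filter is made up, and the field names come from the examples above:

```python
from urllib.parse import parse_qs, urlencode


def frange_filter(function, lower=None, upper=None):
    """Build an {!frange} filter string with optional l/u bounds."""
    bounds = []
    if lower is not None:
        bounds.append(f"l={lower}")
    if upper is not None:
        bounds.append(f"u={upper}")
    return "{!frange " + " ".join(bounds) + "}" + function


# urlencode percent-encodes the special characters in each value.
query_string = urlencode({
    "q": "*:*",
    "fl": "*,total:sum(v1,v2)",
    "fq": frange_filter("sum(v1,v2)", lower=10),
})
print(query_string)

# Decoding the query string again shows the filter Solr would receive.
print(parse_qs(query_string)["fq"][0])
```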
You can do this. Instead of total, try sum.
You can find more info here: https://wiki.apache.org/solr/FunctionQuery#What_is_a_Function.3F
An example from the Solr wiki:
Example Function Queries
To give you a better understanding of how function queries can be used in Solr, suppose an index stores the dimensions in meters x,y,z of some hypothetical boxes with arbitrary names stored in field boxname. Suppose we want to search for box matching name findbox but ranked according to volumes of boxes. The query parameters would be:
q=boxname:findbox _val_:"product(x,y,z)"
This query will rank the results based on volumes. In order to get the computed volume, you will need to request the score, which will contain the resultant volume:
&fl=*, score
Suppose that you also have a field storing the weight of the box as weight. To sort by the density of the box and return the value of the density in score, you would submit the following query:
http://localhost:8983/solr/collection_name/select?q=boxname:findbox _val_:"div(weight,product(x,y,z))"&fl=boxname x y z weight score
You can read more about it here: https://cwiki.apache.org/confluence/display/solr/Function+Queries
Try this
solr/collection/select?q=*:* _val_:"sum(v1,v2)"&fl=* score&fq={!frange l=10 }sum(v1,v2)

Solr: ranking of results when querying multiple shards

If I'm querying across two shards and the first shard returns 10 rows and the second returns 100 rows, how is the combined result set ranked? Will results from the first shard (the one with the fewest results) appear first?
When each shard returns results for a given query, they are sorted by the similarity score of each document. The similarity score is a relative measure of how well the document matches the search query.
The results from the different shards are then merged by similarity score and presented to the user or application. The similarity scores are calculated within each shard before the merge happens.
You can add the parameters shards.info=true and fl=*,score to the query and inspect the result. Compare the maxScore returned by each shard with the score of each document, and you will see how the results are merged.
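To illustrate the merge step, here is a simplified model in Python. It is not Solr's actual implementation, only a sketch that assumes each shard hands back a list already sorted by score descending; the document ids and scores are invented:

```python
import heapq
from itertools import islice


def merge_shard_results(shard_results, rows=10):
    """Merge per-shard lists (each sorted by score descending) and keep
    the top `rows` documents overall."""
    merged = heapq.merge(
        *shard_results,
        key=lambda doc: doc["score"],
        reverse=True,  # inputs are sorted from highest to lowest score
    )
    return list(islice(merged, rows))


# Hypothetical results: the first shard matched fewer documents.
shard_a = [{"id": "a1", "score": 2.1}, {"id": "a2", "score": 0.4}]
shard_b = [
    {"id": "b1", "score": 3.0},
    {"id": "b2", "score": 1.5},
    {"id": "b3", "score": 0.9},
]

for doc in merge_shard_results([shard_a, shard_b], rows=4):
    print(doc["id"], doc["score"])
```

In this model the position of a document depends only on its score, not on which shard it came from or on how many rows that shard returned.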

Solr facet sum instead of count

I'm new to Solr and I'm interested in implementing a special facet.
Sample documents:
{ hostname: google.com, time_spent: 100 }
{ hostname: facebook.com, time_spent: 10 }
{ hostname: google.com, time_spent: 30 }
{ hostname: reddit.com, time_spent: 20 }
...
I would like to return a facet with the following structure:
{ google.com: 130, reddit.com: 20, facebook.com: 10 }
Although solr return values are much more verbose than this, the important point is how the "counts" for the facets are the sum of the time_spent values for the documents rather than the actual count of the documents matching the facet.
Idea #1:
I could use a pivot:
q=*:*
&facet=true
&facet.pivot=hostname,time_spent
However, this returns the counts of all the unique time spent values for every unique hostname. I could sum this up in my application manually, but this seems wasteful.
Idea #2
I could use the stats module:
q=*:*
&stats=true
&stats.field=time_spent
&stats.facet=hostname
However, this has two issues. First, the returned results contain all the hostnames. This is really problematic as my dataset has over 1m hostnames. Further, the returned results are unsorted - I need to render the hostnames in order of descending total time spent.
Your help with this would be really appreciated!
Thanks!
With Solr >=5.1, this is possible:
Facet Sorting
The default sort for a field or terms facet is by bucket count descending. We can optionally sort ascending or descending by any facet function that appears in each bucket. For example, if we wanted to find the top buckets by average price, then we would add sort:"x desc" to the previous facet request:
$ curl http://localhost:8983/solr/query -d 'q=*:*&
json.facet={
categories:{
type : terms,
field : cat,
sort : "x desc", // can also use sort:{x:desc}
facet:{
x : "avg(price)",
y : "sum(price)"
}
}
}
'
See Yonik's Blog: http://yonik.com/solr-facet-functions/
For your use case this would be:
json.facet={
hostname_time:{
type: terms,
field: hostname,
sort: "time_total desc",
facet:{
time_total: "sum(time_spent)"
}
}
}
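The JSON Facet API returns each bucket with its val, count and any requested statistics. A short sketch of collapsing that into the compact structure from the question; the sample response is hand-written to match the example documents, and the helper name is made up:

```python
def totals_by_bucket(response, facet_name, stat_name):
    """Map each bucket value to one statistic, keeping Solr's bucket order."""
    buckets = response["facets"][facet_name]["buckets"]
    return {bucket["val"]: bucket[stat_name] for bucket in buckets}


# Hand-written sample shaped like a JSON Facet API response.
sample_response = {
    "facets": {
        "count": 4,
        "hostname_time": {
            "buckets": [
                {"val": "google.com", "count": 2, "time_total": 130.0},
                {"val": "reddit.com", "count": 1, "time_total": 20.0},
                {"val": "facebook.com", "count": 1, "time_total": 10.0},
            ]
        },
    }
}

print(totals_by_bucket(sample_response, "hostname_time", "time_total"))
```

Since the buckets arrive already sorted by time_total desc and Python dicts keep insertion order, no extra sorting is needed on the client.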
Calling sum() in nested facets worked for us only in 6.3.0.
I believe what you are looking for is an aggregation component, but be aware that Solr is a full-text search engine and not a database.
So the answer to your question is: go with idea #1. Otherwise, you should use Elasticsearch, MongoDB or even Redis, which are equipped with such aggregation components.

Solr - get range facets inferior to a threshold

I'm using Solr 4.3. I have built a range facet for the field price, for which I gave f.price.facet.range.start, f.price.facet.range.end and f.price.facet.range.gap, but I can't figure out how to compute the facet for values below or above a certain value.
Maybe I don't know the exact syntax: f.price.facet.range.other.before=1000000.
According to the documentation on Facet Range Other, this will only work for values that fall within the range being computed. So for your example, if 1000000 is not within your current range start/end values, you will not get a result from the range.other.before parameter. However, you can still get the facet for this price by including it as a separate facet.query request.
For your example, you would include the following parameter:
facet.query=price:[* TO 1000000]
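Put together with the range facet, the request parameters might be assembled like this in Python. The start, end and gap values below are placeholders, not values from the question, and the helper name is made up:

```python
from urllib.parse import urlencode


def price_facet_params(start, end, gap, threshold):
    """Range facet on price plus a separate facet.query for everything
    up to `threshold`."""
    return [
        ("q", "*:*"),
        ("rows", 0),
        ("facet", "true"),
        ("facet.range", "price"),
        ("f.price.facet.range.start", start),
        ("f.price.facet.range.end", end),
        ("f.price.facet.range.gap", gap),
        ("facet.query", f"price:[* TO {threshold}]"),
    ]


# Placeholder range values; only the threshold comes from the question.
params = price_facet_params(0, 500000, 50000, 1000000)
print(urlencode(params))
```

The count for the extra query comes back under facet_queries in the response, keyed by the query string itself.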
