Grouped records with aggregate fields - solr

I'm running an instance of Solr 6.2. One of the use cases I'm exploring is to return records grouped by a field, including summed columns (facets) and sorted by those columns. I realize Solr is not meant to be utilized as a relational database, but is this possible?
Using the JSON API, I send the following data payload to the query endpoint of my Solr instance:
{
  query: "*:*",
  filter: ["status:1", "date:[2016-10-11T00:00:00Z-7DAYS/DAY TO 2016-10-11T00:00:00Z]"],
  limit: 10,
  params: {
    group: true,
    group.field: name,
    group.facet: true
  },
  facet: {
    funcs: {
      type: terms,
      field: name,
      sort: { sum_v1: desc },
      limit: 10,
      facet: {
        sum_v1: "sum(v1)",
        sum_v2: "sum(v2)",
        sum_v3: "sum(v3)"
      }
    }
  }
}
This returns 10 records at a time in both the groups key and facets key of the response JSON. However, the sorted facet buckets do not match up with the grouped records. How can I get the facet counts with the relevant groups?
The only workaround I can come up with is to query for the grouped records first, then run a second query using the IDs from the first one to get the facet counts. The downside is that I'd lose the ability to sort or filter by any of the facet counts.
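One way to keep the sort on the summed values is to reverse that workaround: run the facet query first (sorted by sum_v1), then fetch only the groups whose name appears in the returned buckets and pair them up on the client. The following is a minimal Python sketch under those assumptions. The helper names are hypothetical, the sample data only imitates the usual `facets.funcs.buckets` and `grouped.name.groups` response sections, and the HTTP calls themselves are left out.

```python
def quote_term(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_group_filter(field, buckets):
    """Build a filter such as name:("a" OR "b") from the facet buckets."""
    if not buckets:
        raise ValueError("no buckets to filter on")
    terms = " OR ".join(quote_term(b["val"]) for b in buckets)
    return f"{field}:({terms})"


def order_groups_by_buckets(groups, buckets):
    """Pair each facet bucket with its group, keeping the facet sort order."""
    by_value = {g["groupValue"]: g for g in groups}
    return [
        {"bucket": b, "group": by_value[b["val"]]}
        for b in buckets
        if b["val"] in by_value
    ]


# Hypothetical data: buckets from the first (facet) query, already sorted
# by sum_v1 descending, and groups from the second (grouped) query.
buckets = [
    {"val": "beta", "count": 3, "sum_v1": 90.0},
    {"val": "alpha", "count": 5, "sum_v1": 40.0},
]
groups = [
    {"groupValue": "alpha", "doclist": {"numFound": 5, "docs": [{"id": "1"}]}},
    {"groupValue": "beta", "doclist": {"numFound": 3, "docs": [{"id": "7"}]}},
]

group_filter = build_group_filter("name", buckets)  # add to "filter" in query 2
merged = order_groups_by_buckets(groups, buckets)
```

The second query would carry `group_filter` as an extra filter, so the groups it returns are exactly the ones the sorted facet selected. This costs two round trips, but sorting by any of the sums stays on the Solr side.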

Related

How to limit facet request to get a certain number of rows

I have a Solr storage with a huge number of documents. Here's an example of my document structure:
{
"country":"USA",
"company":"Corsair",
"product":"RM650X 650W",
"price":"140",
"on_stock":"yes"
},
I'd like to make a facet request to Solr data to receive a certain number of rows (e.g. 200).
Here's a desired result:
The problem is I can't limit the data properly.
In Solr documentation it says that "facet.limit parameter specifies the maximum number of constraint counts (essentially, the number of facets for a field that are returned) that should be returned for the facet fields. This parameter can be specified on a per-field basis to apply a distinct limit to each field with the syntax of f.<fieldname>.facet.limit "
And here comes the tricky part.
I tried to use a limit of 200 for the first column (Country / Region). Here's my request:
country:{
  type: terms,
  field: country,
  limit: 200, # Limit's here
  facet:{
    company:{
      type: terms,
      field: company,
      limit: -1,
      facet:{
        product:{
          type: terms,
          field: product,
          limit: -1
        }
      }
    }
  }
}
This query returns 200 results for a country facet, but since every country has a different number of nested companies and every company has a different number of nested products, I get thousands of rows of data.
Then I tried to use a limit of 200 for the last column (Product). Here's my request:
country:{
  type: terms,
  field: country,
  limit: -1,
  facet:{
    company:{
      type: terms,
      field: company,
      limit: -1,
      facet:{
        product:{
          type: terms,
          field: product,
          limit: 200 # Limit's here
        }
      }
    }
  }
}
This query returns up to 200 products for every company within every country. In other words, the limit is local to each nested category, not global. And again I get thousands of rows of data.
Is it possible to achieve my goal in Solr?
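Because each limit applies per parent bucket, one option is to cap the total number of rows on the client after the response arrives. The sketch below is only an illustration of that idea: it assumes the nested response mirrors the request above (`country` → `company` → `product` buckets), the function names and sample values are made up, and Solr still computes every bucket, so this trims the output rather than the work done by the server.

```python
from itertools import islice


def iter_rows(facets):
    """Yield (country, company, product, count) rows in bucket order."""
    for country in facets.get("country", {}).get("buckets", []):
        for company in country.get("company", {}).get("buckets", []):
            for product in company.get("product", {}).get("buckets", []):
                yield (country["val"], company["val"],
                       product["val"], product["count"])


def first_rows(facets, limit):
    """Return at most `limit` flattened rows."""
    return list(islice(iter_rows(facets), limit))


# Hypothetical "facets" section of a response to the nested request.
facets = {
    "country": {"buckets": [
        {"val": "USA", "count": 7, "company": {"buckets": [
            {"val": "Corsair", "count": 6, "product": {"buckets": [
                {"val": "RM650X 650W", "count": 4},
                {"val": "RM750X 750W", "count": 2},
            ]}},
            {"val": "EVGA", "count": 1, "product": {"buckets": [
                {"val": "G3 650W", "count": 1},
            ]}},
        ]}},
        {"val": "Germany", "count": 5, "company": {"buckets": [
            {"val": "be quiet!", "count": 5, "product": {"buckets": [
                {"val": "Pure Power", "count": 5},
            ]}},
        ]}},
    ]}
}

rows = first_rows(facets, 3)
```

With `limit=200` the same call would return the first 200 rows in the order Solr sorted the buckets.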

Determine terms for facet fields in solr

I'm using facet.field and have run into a problem. My store has base products and variants, and when I use facet.field the counts include both base products and variants:
Category:
Chairs(30) <- this is count of base products and variants
Tables(20) <- this is count of base products and variants
I want to restrict facet.field so that the facet returns counts for variants only. Every product has a field such as "productType":"baseProduct" or "productType":"variantProduct", and I want to use that field.
Any ideas on how to express this in a query?
You can use facet.pivot to get distinct counts for each type:
&facet.pivot=productType,category
You can also use the JSON Facet API to do two separate facets:
{
base: {
type: terms,
field: category,
domain: { filter: "productType:baseProduct" }
},
variant: {
type: terms,
field: category,
domain: { filter : "productType:variantProduct" }
}
}
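For illustration, the two-facet request above could be assembled and read back from Python roughly as follows. This is a sketch, not a definitive client: the function names and the sample counts are hypothetical, the field names come from the question, and sending the parameters (for example with an HTTP library of your choice) is left out.

```python
import json


def build_params(category_field="category", type_field="productType"):
    """Build query parameters carrying one terms facet per product type."""
    facet = {
        name: {
            "type": "terms",
            "field": category_field,
            "domain": {"filter": f"{type_field}:{value}"},
        }
        for name, value in (("base", "baseProduct"),
                            ("variant", "variantProduct"))
    }
    # json.dumps emits strict JSON, which the lenient Solr parser also accepts.
    return {"q": "*:*", "rows": 0, "json.facet": json.dumps(facet)}


def bucket_counts(response, facet_name):
    """Map each bucket value of one facet to its count."""
    buckets = response.get("facets", {}).get(facet_name, {}).get("buckets", [])
    return {b["val"]: b["count"] for b in buckets}


params = build_params()

# Hypothetical response fragment with made-up counts.
response = {"facets": {"count": 50, "variant": {"buckets": [
    {"val": "Chairs", "count": 18},
    {"val": "Tables", "count": 12},
]}}}
variant_counts = bucket_counts(response, "variant")
```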

Solr facet with additional metadata

Is it possible to use additional metadata fields when using Solr facets? I would like to aggregate one attribute by counting its values and display the related group as an additional metadata field.
http://localhost:8983/solr/gitIndex/select?indent=on&q=*:*&rows=0&wt=json&
json.facet={
Repository_s: {
type: terms,
field: Repository_s,
limit: 10,
facet: {
x:"count()"
}
}
}
The result should look like this:
...
"facets":{
"count":1354013,
"<name of attribute>":{
"buckets":[{
"val":"<value of attribute>",
"count":173997,
"<metadata_field>":<value of metadata_field>},
...
A solution is to use facet pivots - it'll get you any values in a secondary field under each facet, and if the value is unique for the set of documents, it'll just be a single value.
The reference guide has the syntax for non-JSON facets.
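Since the question already uses the JSON Facet API, a nested terms facet can play a role similar to a pivot there. The sketch below assumes a hypothetical metadata field called `Group_s` (the question does not name it) and that each repository maps to a single group; the helper names and sample values are made up for illustration.

```python
import json


def build_json_facet(attribute_field, metadata_field, limit=10):
    """Terms facet on the attribute with a one-bucket sub-facet for metadata."""
    return json.dumps({
        attribute_field: {
            "type": "terms",
            "field": attribute_field,
            "limit": limit,
            "facet": {
                metadata_field: {
                    "type": "terms",
                    "field": metadata_field,
                    "limit": 1,
                }
            },
        }
    })


def attach_metadata(facets, attribute_field, metadata_field):
    """Flatten each bucket to val, count and the top metadata value (or None)."""
    rows = []
    for bucket in facets.get(attribute_field, {}).get("buckets", []):
        sub = bucket.get(metadata_field, {}).get("buckets", [])
        rows.append({
            "val": bucket["val"],
            "count": bucket["count"],
            metadata_field: sub[0]["val"] if sub else None,
        })
    return rows


json_facet = build_json_facet("Repository_s", "Group_s")

# Hypothetical "facets" section of the response.
facets = {"count": 30, "Repository_s": {"buckets": [
    {"val": "repo-a", "count": 20,
     "Group_s": {"buckets": [{"val": "platform", "count": 20}]}},
    {"val": "repo-b", "count": 10, "Group_s": {"buckets": []}},
]}}
rows = attach_metadata(facets, "Repository_s", "Group_s")
```

If a repository could belong to several groups, the `limit: 1` sub-facet would only report the most frequent one, so the limit would need raising in that case.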

Solr facet sum instead of count

I'm new to Solr and I'm interested in implementing a special facet.
Sample documents:
{ hostname: google.com, time_spent: 100 }
{ hostname: facebook.com, time_spent: 10 }
{ hostname: google.com, time_spent: 30 }
{ hostname: reddit.com, time_spent: 20 }
...
I would like to return a facet with the following structure:
{ google.com: 130, reddit.com: 20, facebook.com: 10 }
Although Solr's return values are much more verbose than this, the important point is that the "counts" for the facets are the sums of the time_spent values for the matching documents rather than the number of documents matching the facet.
Idea #1:
I could use a pivot:
q=*:*
&facet=true
&facet.pivot=hostname,time_spent
However, this returns the counts of all the unique time spent values for every unique hostname. I could sum this up in my application manually, but this seems wasteful.
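For reference, the manual summing that Idea #1 would require looks roughly like this in Python. It is a sketch that assumes the usual pivot response layout (a list of hostname entries, each with a nested `pivot` list of time_spent values and counts); the function name is hypothetical and the data is the sample from above.

```python
def sum_time_by_hostname(pivot):
    """Sum value * count per hostname and sort by total, descending."""
    totals = {}
    for host in pivot:
        totals[host["value"]] = sum(
            entry["value"] * entry["count"] for entry in host.get("pivot", [])
        )
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))


# Shaped like facet_counts.facet_pivot["hostname,time_spent"].
pivot = [
    {"field": "hostname", "value": "google.com", "count": 2, "pivot": [
        {"field": "time_spent", "value": 100, "count": 1},
        {"field": "time_spent", "value": 30, "count": 1},
    ]},
    {"field": "hostname", "value": "facebook.com", "count": 1, "pivot": [
        {"field": "time_spent", "value": 10, "count": 1},
    ]},
    {"field": "hostname", "value": "reddit.com", "count": 1, "pivot": [
        {"field": "time_spent", "value": 20, "count": 1},
    ]},
]

totals = sum_time_by_hostname(pivot)
```

This works, but every distinct time_spent value per hostname has to travel over the wire first, which is the waste described above.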
Idea #2
I could use the stats module:
q=*:*
&stats=true
&stats.field=time_spent
&stats.facet=hostname
However, this has two issues. First, the returned results contain all the hostnames. This is really problematic as my dataset has over 1m hostnames. Further, the returned results are unsorted - I need to render the hostnames in order of descending total time spent.
Your help with this would be really appreciated!
Thanks!
With Solr >=5.1, this is possible:
Facet Sorting
The default sort for a field or terms facet is by bucket count
descending. We can optionally sort ascending or descending by any
facet function that appears in each bucket. For example, if we wanted
to find the top buckets by average price, then we would add sort:"x
desc" to the previous facet request:
$ curl http://localhost:8983/solr/query -d 'q=*:*&
json.facet={
categories:{
type : terms,
field : cat,
sort : "x desc", // can also use sort:{x:desc}
facet:{
x : "avg(price)",
y : "sum(price)"
}
}
}
'
See Yonik's Blog: http://yonik.com/solr-facet-functions/
For your use case this would be:
json.facet={
hostname_time:{
type: terms,
field: hostname,
sort: "time_total desc",
facet:{
time_total: "sum(time_spent)"
}
}
}
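As a rough illustration, the request and the compact result the question asks for could be handled from Python like this. The function names are hypothetical, the `limit` entry is an assumption added here to keep the bucket list short for a dataset with many hostnames, and the response fragment is made up from the sample documents.

```python
import json


def build_sum_facet_params(limit=10):
    """Parameters for a terms facet sorted by summed time_spent."""
    facet = {
        "hostname_time": {
            "type": "terms",
            "field": "hostname",
            "limit": limit,  # assumption: only the top buckets are needed
            "sort": "time_total desc",
            "facet": {"time_total": "sum(time_spent)"},
        }
    }
    return {"q": "*:*", "rows": 0, "json.facet": json.dumps(facet)}


def totals_from_buckets(response, facet_name="hostname_time",
                        stat="time_total"):
    """Map hostname to its summed value, preserving the bucket order."""
    buckets = response.get("facets", {}).get(facet_name, {}).get("buckets", [])
    return {b["val"]: b[stat] for b in buckets}


params = build_sum_facet_params()

# Hypothetical response fragment for the sample documents.
response = {"facets": {"count": 4, "hostname_time": {"buckets": [
    {"val": "google.com", "count": 2, "time_total": 130.0},
    {"val": "reddit.com", "count": 1, "time_total": 20.0},
    {"val": "facebook.com", "count": 1, "time_total": 10.0},
]}}}
totals = totals_from_buckets(response)
```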
Calling sum() in nested facets worked for us only in 6.3.0.
I believe what you are looking for is an aggregation component, but be aware that Solr is a full-text search engine, not a database.
So the answer to your question is to go with idea #1. Otherwise, you would be better served by Elasticsearch, MongoDB, or even Redis, which are equipped with such aggregation components.

SOLR sort by IN Query

I was wondering if it is possible to sort results by the order in which the documents were requested from SOLR. I am running an IN-based query and would just like SOLR to return the documents in the order that I ask for them.
IN (4,2,3,1) should return documents ordered 4,2,3,1.
Thanks.
You need sorting in Solr to order them by a field.
I assume that "IN-based query" means something like: fetch docs whose fieldx has a value in (val1, val2). You can define a field as a multi-valued field and facet on that field. A facet query is an "is in" search out of the box (so to speak), and it can do more sophisticated searches too.
Edited on OP's query:
Updating a document with a multi-valued field in JSON here. See the line
"my_multivalued_field": [ "aaa", "bbb" ] /* use an array for a multi-valued field */
As for doing a facet query, check this.
You need to do one or more fq statements:
&fq=field1:[400 TO 500]
&fq=field2:(johnson OR thompson)
Also do read up on the fact (in link above) that you need to facet on stored rather than indexed fields.
You can easily apply sorting with QueryOptions and field sort (ExtraParams property - I am sorting by savedate field, descending):
var results = _solr.Query(textQuery,
new QueryOptions
{
Highlight = new HighlightingParameters
{
Fields = new[] { "*" },
},
ExtraParams = new Dictionary<string, string>
{
{"fq", dateQuery},
{"sort", "savedate desc"}
}
});
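Neither a field sort nor a filter reproduces the order of the requested IDs by itself, so a simple fallback is to fetch the documents with an OR query and restore the requested order on the client. The Python sketch below assumes the key field is called `id` and that the list of IDs is small enough to fit in one query; both helper names are hypothetical.

```python
def build_id_query(field, ids):
    """Build an OR query such as id:(4 OR 2 OR 3 OR 1)."""
    return f"{field}:({' OR '.join(str(i) for i in ids)})"


def reorder(docs, ids, field="id"):
    """Return the docs that were asked for, in the order they were asked for."""
    position = {str(value): index for index, value in enumerate(ids)}
    found = [d for d in docs if str(d[field]) in position]
    return sorted(found, key=lambda d: position[str(d[field])])


requested = [4, 2, 3, 1]
query = build_id_query("id", requested)

# Hypothetical docs as they might come back, in an unrelated order.
docs = [{"id": "1"}, {"id": "2"}, {"id": "3"}, {"id": "4"}]
ordered = reorder(docs, requested)
```

IDs are compared as strings here so that numeric IDs in the request still match string IDs in the response.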
