SOLR - group by field and then get distinct value by another field - database

I'm using Apache Solr to search records. My index has the fields category and sub-category, among others.
I want to group by category and then get the distinct list of sub-categories within each group. Is that possible in Apache Solr?
If so, how can I do it?
Thanks in advance.

You can do that with a pivot facet:
facet=on&facet.pivot=category,subcategory
This will give you a facet with all the sub categories for each category.
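A minimal Python sketch of how the pivot response could be turned into a category-to-sub-categories map. The helper names and the sample response are made up for illustration; the response layout assumed here is the `facet_counts.facet_pivot` section Solr returns for `wt=json`.

```python
from urllib.parse import urlencode


def pivot_query(category_field="category", sub_field="subcategory"):
    """Build the query string for a two-level pivot facet."""
    params = {
        "q": "*:*",
        "rows": 0,  # only the facet section is needed, not the documents
        "facet": "on",
        "facet.pivot": f"{category_field},{sub_field}",
        "wt": "json",
    }
    return urlencode(params)


def distinct_subcategories(response, pivot_key="category,subcategory"):
    """Map each top-level pivot value to the values nested under it."""
    pivots = response["facet_counts"]["facet_pivot"][pivot_key]
    return {
        entry["value"]: [child["value"] for child in entry.get("pivot", [])]
        for entry in pivots
    }


# Hand-written sample in the shape of a pivot response (not real data).
sample = {
    "facet_counts": {
        "facet_pivot": {
            "category,subcategory": [
                {
                    "field": "category", "value": "furniture", "count": 3,
                    "pivot": [
                        {"field": "subcategory", "value": "chairs", "count": 2},
                        {"field": "subcategory", "value": "tables", "count": 1},
                    ],
                },
                {
                    "field": "category", "value": "lighting", "count": 1,
                    "pivot": [
                        {"field": "subcategory", "value": "lamps", "count": 1},
                    ],
                },
            ]
        }
    }
}

print(pivot_query())
print(distinct_subcategories(sample))
```

The query string would be appended to the collection's select URL, and the parsed JSON body passed to `distinct_subcategories`.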
You can also use the JSON Facet API. Example adapted from its documentation page:
{
  top_categories: {
    type: terms,
    field: category,
    limit: 5,
    facet: {
      top_subcategories: {
        type: terms,
        field: subcategory,
        limit: 20
      }
    }
  }
}
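As a rough Python sketch, the nested facet above could be built as a dictionary and its response read back as follows. The function names and the sample data are illustrative assumptions; the bucket layout (`val`, `count`, nested facet name) follows the JSON Facet API response format.

```python
import json


def nested_terms_facet(outer_field, inner_field, outer_limit=5, inner_limit=20):
    """Build the json.facet structure: terms on outer_field, sub-facet on inner_field."""
    return {
        "top_categories": {
            "type": "terms",
            "field": outer_field,
            "limit": outer_limit,
            "facet": {
                "top_subcategories": {
                    "type": "terms",
                    "field": inner_field,
                    "limit": inner_limit,
                }
            },
        }
    }


def subcategories_by_category(facets):
    """Read the 'facets' section of a response into {category: [subcategories]}."""
    result = {}
    for bucket in facets.get("top_categories", {}).get("buckets", []):
        inner = bucket.get("top_subcategories", {}).get("buckets", [])
        result[bucket["val"]] = [b["val"] for b in inner]
    return result


# Hand-written sample of the 'facets' section (not real data).
sample_facets = {
    "count": 4,
    "top_categories": {
        "buckets": [
            {
                "val": "furniture", "count": 3,
                "top_subcategories": {
                    "buckets": [
                        {"val": "chairs", "count": 2},
                        {"val": "tables", "count": 1},
                    ]
                },
            }
        ]
    },
}

# The serialized facet goes into the json.facet request parameter.
print(json.dumps(nested_terms_facet("category", "subcategory")))
print(subcategories_by_category(sample_facets))
```

Note that the limits cap how many categories and sub-categories come back, so a truly complete distinct list needs limits large enough for the data.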

Related

Determine terms for facet fields in solr

I'm using facet.field and have run into a problem. My store has base products and variants, and when I use facet.field the counts include both base products and variants:
Category:
Chairs(30) <- this is count of base products and variants
Tables(20) <- this is count of base products and variants
I want to restrict facet.field so that the facet returns the count of variants only. Every product has a field such as "productType":"baseProduct" or "productType":"variantProduct",
and I want to use that field for the restriction.
Any ideas on how to express this in a query?
You can use facet.pivot to get distinct counts for each type:
&facet.pivot=productType,category
You can also use the JSON Facet API to do two separate facets:
{
  base: {
    type: terms,
    field: category,
    domain: { filter: "productType:baseProduct" }
  },
  variant: {
    type: terms,
    field: category,
    domain: { filter: "productType:variantProduct" }
  }
}
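A small Python sketch of the same idea, assuming the request is sent as a JSON body to the query endpoint. The helper names are hypothetical and the sample counts are invented to match the totals in the question (30 chairs, 20 tables).

```python
import json


def product_type_facets(field="category", type_field="productType"):
    """One terms facet per product type, each restricted by a domain filter."""
    labels = {"base": "baseProduct", "variant": "variantProduct"}
    return {
        label: {
            "type": "terms",
            "field": field,
            "domain": {"filter": f"{type_field}:{value}"},
        }
        for label, value in labels.items()
    }


def counts_for(facets, label):
    """Read one of the two facets into {category: count}."""
    buckets = facets.get(label, {}).get("buckets", [])
    return {b["val"]: b["count"] for b in buckets}


# JSON request body: match everything, return no documents, only facets.
request_body = json.dumps(
    {"query": "*:*", "limit": 0, "facet": product_type_facets()}
)

# Hand-written sample of the 'facets' section (invented numbers).
sample_facets = {
    "count": 50,
    "base": {"buckets": [{"val": "Chairs", "count": 10},
                         {"val": "Tables", "count": 8}]},
    "variant": {"buckets": [{"val": "Chairs", "count": 20},
                            {"val": "Tables", "count": 12}]},
}

print(request_body)
print(counts_for(sample_facets, "variant"))
```

If only variant counts are needed, the `base` entry can simply be dropped from the facet dictionary.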

Solr Distinct on multi-valued field

I have Solr documents with a multi-valued field and need the distinct values from it. I have to filter by a different field, but my result doesn't have to include anything other than the distinct categories.
Documents:
{CountryCode: 'US', Product:'A', Categories:[1,2,3]},
{CountryCode: 'US', Product:'B', Categories:[1,3,77,88]},
{CountryCode: 'JP', Product:'B', Categories:[1,2]},
{CountryCode: 'JP', Product:'B', Categories:[444,555]}
Filter for only CountryCode = 'US'
Result:
{[1,2,3,77,88]}
I tried field collapsing/grouping, but it doesn't work on multi-valued fields.
I tried terms (thanks to a suggestion by Persimmonium), but it doesn't restrict the result to only the 'US' categories. The fact that terms reports how many times each category occurs is a bonus, but not required in this case.
Any suggestions?
Edited after your comment.
One way to achieve this is with:
an fq to get the set of docs you are interested in,
then a facet on Categories, setting 'limit' high enough to get all values.
A fancier way might be using Streaming Expressions, but faceting is simpler.
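The fq-plus-facet approach could look roughly like this in Python. Two details are assumptions beyond the answer: `facet.limit=-1` is used as "no limit" instead of a large number, and `facet.mincount=1` is added so that terms occurring only in filtered-out documents are not returned with a count of 0. The parsing assumes the default flat list of alternating values and counts.

```python
from urllib.parse import urlencode


def distinct_values_query(filter_query, facet_field):
    """Query string: filter the documents, then facet on the multi-valued field."""
    params = [
        ("q", "*:*"),
        ("fq", filter_query),
        ("rows", 0),             # documents themselves are not needed
        ("facet", "on"),
        ("facet.field", facet_field),
        ("facet.limit", -1),     # assumption: -1 means "return all terms"
        ("facet.mincount", 1),   # assumption: skip terms with zero matches
        ("wt", "json"),
    ]
    return urlencode(params)


def distinct_values(response, facet_field):
    """Take every other element (the values) from the flat facet list."""
    flat = response["facet_counts"]["facet_fields"][facet_field]
    return flat[0::2]


# Hand-written sample for the 'US' documents in the question.
sample = {
    "facet_counts": {
        "facet_fields": {
            "Categories": ["1", 2, "3", 2, "2", 1, "77", 1, "88", 1]
        }
    }
}

print(distinct_values_query("CountryCode:US", "Categories"))
print(sorted(int(v) for v in distinct_values(sample, "Categories")))
```

The counts are discarded here since the question only needs the distinct values.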

Grouped records with aggregate fields

I'm running an instance of Solr 6.2. One of the use cases I'm exploring is to return records grouped by a field, including summed columns (facets) and sorted by those columns. I realize Solr is not meant to be utilized as a relational database, but is this possible?
Using the JSON API, I send the following data payload to the query endpoint of my Solr instance:
{
  query: "*:*",
  filter: ["status:1", "date:[2016-10-11T00:00:00Z-7DAYS/DAY TO 2016-10-11T00:00:00Z]"],
  limit: 10,
  params: {
    group: true,
    group.field: name,
    group.facet: true
  },
  facet: {
    funcs: {
      type: terms,
      field: name,
      sort: { sum_v1: desc },
      limit: 10,
      facet: {
        sum_v1: "sum(v1)",
        sum_v2: "sum(v2)",
        sum_v3: "sum(v3)"
      }
    }
  }
}
This returns 10 records at a time in both the groups key and facets key of the response JSON. However, the sorted facet buckets do not match up with the grouped records. How can I get the facet counts with the relevant groups?
The only workaround I can come up with is to query for the grouped records first, then run a second query using the IDs from the first to get the facet counts. The downside is that I'd lose the ability to sort or filter by any of the facet counts.

Solr facet with additional metadata

Is it possible to use additional metadata fields when using Solr facets? I would like to aggregate one attribute by counting its values and display the related group as an additional metadata field.
http://localhost:8983/solr/gitIndex/select?indent=on&q=*:*&rows=0&wt=json&
json.facet={
  Repository_s: {
    type: terms,
    field: Repository_s,
    limit: 10,
    facet: {
      x: "count()"
    }
  }
}
The result should look like this:
...
"facets":{
"count":1354013,
"<name of attribute>":{
"buckets":[{
"val":"<value of attribute>",
"count":173997,
"<metadata_field>":<value of metadata_field>},
...
A solution is to use facet pivots - it'll get you any values in a secondary field under each facet, and if the value is unique for the set of documents, it'll just be a single value.
The reference guide has the syntax for non-json facets.

Count of Facets in Solr?

Is there a way to get the count of different facets in solr?
This is an example facet:
Camera: 20
Computer: 80
Monitor: 40
Laptop: 120
Tablet: 30
What I need to get here is "5", as in 5 different electronic items. Of course I can count the facet entries myself, but some facets have many entries, and fetching and counting them really slows things down.
You need to apply the Solr patch SOLR-2242 to get a distinct facet count.
Solr 5.1 will include a "JSON Facet API" with a unique() function, e.g. "unique(state)".
http://yonik.com/solr-facet-functions/
Not an exact answer to this question, but if facet distinct count doesn't work for anyone, you can get that information using group.ngroups:
group = true
group.field = [field you are faceting on]
group.ngroups = true
The disadvantage of this approach is that if you want ungrouped results, you will have to run the query a second time.
Of course the patch is recommended, but the cheesy way to do this for a simple facet, as you have here, is to take the length of the facet list divided by 2, since the list alternates between values and counts. For example, since you are faceting on type:
facetCount = facet_counts.facet_fields.type.length/2
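The same shortcut as a short Python sketch, using the numbers from the question as a hand-written sample. It assumes the default flat response layout and a facet limit high enough that no entries are cut off; otherwise the result undercounts.

```python
def distinct_facet_count(response, field):
    """Count distinct facet values from the flat [value, count, ...] list."""
    flat = response["facet_counts"]["facet_fields"][field]
    return len(flat) // 2  # each value is followed by its count


# Sample built from the facet shown in the question.
sample = {
    "facet_counts": {
        "facet_fields": {
            "type": ["Camera", 20, "Computer", 80, "Monitor", 40,
                     "Laptop", 120, "Tablet", 30]
        }
    }
}

print(distinct_facet_count(sample, "type"))  # prints 5
```

This still transfers every facet entry, so it does not address the cost of fetching large facets.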
Use this GET request:
http://localhost:8983/solr/core_name/select?json.facet={x:"unique(electronic_items)"}&q=*:*&rows=0
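A hedged Python sketch of building that request and reading the result. The helper names are hypothetical, and the sample response is hand-written in the shape the JSON Facet API uses for a statistic named `x`.

```python
import json
from urllib.parse import urlencode


def unique_count_url(base_url, field, name="x"):
    """Build a select URL that asks only for unique(field) via json.facet."""
    params = {
        "q": "*:*",
        "rows": 0,  # no documents, only the facet statistic
        "json.facet": json.dumps({name: f"unique({field})"}),
    }
    return f"{base_url}/select?{urlencode(params)}"


def read_unique_count(response, name="x"):
    """Pull the statistic out of the 'facets' section of the response."""
    return response["facets"][name]


# Hand-written sample response (invented numbers).
sample = {"facets": {"count": 290, "x": 5}}

print(unique_count_url("http://localhost:8983/solr/core_name", "electronic_items"))
print(read_unique_count(sample))  # prints 5
```

Unlike counting the facet entries on the client, this returns a single number, so the response stays small regardless of how many distinct values exist.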
