Solr facet query filtering - solr

I'm trying to build a facet query on the manufacturer field when the search term is "lens", but I want to exclude every manufacturer that has no lenses.
For example, I need the following output, but want to eliminate "Kodak" since there are no lenses from that manufacturer:
"facet_fields": {
"manu" : [
"Canon USA": 25,
"Olympus": 21,
"Sony": 12,
"Panasonic": 9,
"Nikon": 4,
"Kodak":0
],
http://localhost/solr/collection1/select?q=lens&rows=0&wt=json&indent=true&facet=true&facet.query=lens&facet.field=manu
does not yield the correct result

You can use facet.mincount to retrieve only the facet values whose count is at or above a certain threshold. It is 0 by default.
facet.mincount=1
You can also supply the value on a per-field basis if you're doing multiple facets in a single request, f.manu.facet.mincount=1.
Additionally, there should be no need to do a facet.query when you're already performing the same query as the actual query. The facet.query is useful if you want to do arbitrary queries for a facet, within the same document set already returned by your query.
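As a rough sketch of the request described above, the following Python builds the query string with facet.mincount=1 and without the redundant facet.query. The host and collection name are copied from the question's URL, and the function name is just an illustration. Nothing is sent over the network.

```python
from urllib.parse import urlencode

# Base URL copied from the question; adjust host and collection for your setup.
BASE_URL = "http://localhost/solr/collection1/select"

def build_facet_url(term, field="manu", mincount=1):
    """Return a select URL that facets on `field` and hides low-count values."""
    params = [
        ("q", term),
        ("rows", 0),
        ("wt", "json"),
        ("indent", "true"),
        ("facet", "true"),
        ("facet.field", field),
        ("facet.mincount", mincount),  # drop facet values counted below this
    ]
    return BASE_URL + "?" + urlencode(params)

print(build_facet_url("lens"))
```

With mincount=1, a manufacturer such as "Kodak" with a count of 0 should no longer appear in facet_fields.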

Related

Can Solr JSON Facet API stat functions be combined with a domain filter to limit the results of the function?

My use case is to make a query to Solr, and to extract counts of unique terms for certain fields within the result set. The trick is that within my counts, I need to limit the output to only terms that match a certain input string--without adjusting the main Solr query. E.g., "Solr, give me results for 'War and Peace', and give me the first ten facets on author where the author field has 'doge' in it, and give me a count of all unique author values in the result set where the author field has 'doge' in it."
The Solr JSON Facet API allows me to facet using stat functions; in this case, I'm interested in using the unique() function to get the counts I need. So, e.g.,
{
"author_count": "unique(author)"
}
...tells me the total number of unique values for 'author' in the result set. This is good.
I can limit the output of a facet using the domain change option, like so:
{
"author_facet": {
"type": "terms",
"field": "author",
"mincount": 1,
"limit": 10,
"offset": 0,
"domain": {
"filter": "author:doge"
}
}
}
This is also good.
The problem I'm having is that when I send both of these choices, the result of the unique() call (in author_count) is a count of all unique author values in the base result set, regardless of whether the author contains 'doge'. The author_facet results do correctly limit the output to only authors with 'doge' in them. But I need to also apply that limit to the results of the unique() function.
I cannot alter the base query, because it represents user input that is independent of the facet filtering input. E.g, the user will have searched for "War and Peace," and now want to see only those facets where the author is 'doge', with a count of the total authors matching 'doge'.
If it is meaningful to the answer, I am running Solr 9.0.0.
Is there a way to apply domain filtering to Solr stat functions in the JSON Facet API, such as unique()?
EDIT: To clarify: The number of authors with 'doge' may be very large, and so would exceed the number of actual facets that should be returned. I'm limiting the facet response to 100, but there could be 978 authors with 'doge'. I want to inform the user of that 978 count while only returning the top 100.
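One possibility, not verified against Solr 9.0.0, is to move the stat function inside a facet that carries the same restriction. The JSON Facet API has a query facet type that can hold nested sub-facets, and terms facets accept a numBuckets option that reports the total number of buckets. The sketch below only builds the request body. The function name and the "authors_filtered" key are hypothetical, and the filter string is the one from the question.

```python
import json

def build_filtered_author_facets(filter_query, limit=100):
    """Build a JSON Facet body whose counts share one filter (a sketch)."""
    return {
        "author_facet": {
            "type": "terms",
            "field": "author",
            "mincount": 1,
            "limit": limit,
            "offset": 0,
            # Ask Solr to also report how many buckets exist in total.
            "numBuckets": True,
            "domain": {"filter": filter_query},
        },
        # Hypothetical key: a query facet narrows the documents first,
        # then unique() is computed only over that narrowed set.
        "authors_filtered": {
            "type": "query",
            "q": filter_query,
            "facet": {"author_count": "unique(author)"},
        },
    }

print(json.dumps(build_filtered_author_facets("author:doge"), indent=2))
```

Note that the filter narrows documents, not terms, so if author is multi-valued the counts may include co-authors that do not contain 'doge'.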

Solr - Constrain More Like This Results Only

Let's say I have a particular sweater with code:blue-sweater that is color:blue. I would like to find similar products using the description field, with the constraint that similar products are not blue (-color:blue).
From the Solr wiki:
If you want to filter the similar results given by MoreLikeThis you
have to use the MoreLikeThisHandler. It will consider the similar
document result set as the main one so will apply the specified
filters (fq) on it. If you use the MoreLikeThisComponent and apply
query filters it will be applied to the result set returned by the
main query (QueryComponent) and not to the one returned by the
MoreLikeThisComponent.
These are the params I'm using; the qt param sets the request handler as MoreLikeThis:
{
q: "code:\"blue-sweater\"",
qt: "mlt",
mlt: "true",
fl: "description,brand,gender,price",
mlt.boost: "true",
mlt.fl: "description",
fq: "-color:\"blue\"",
rows: "6",
mlt.mintf: "0",
mlt.mindf: "0"
}
The issue is that I can only specify the filter query (fq) param once, which sets fq both for the initial query (code:"blue-sweater") and for the MoreLikeThis results.
Since the filter of -color:blue excludes my initial query (the blue sweater), I am left with no MoreLikeThis results. How do I get around this?
If the only products in the core are color:blue, I still want to return them, but they should be at the bottom of possible results.
Edit
I did some digging around, and it seems that the only way to boost a MoreLikeThis query is with mlt.qf:
Query fields and their boosts using the same format as that used by
the DisMax Query Parser. These fields must also be specified in
mlt.fl. (source)
I have tried to do a regular query with the DisMax parser with a value constraint (like in_stock:[* TO 10]), but the constraint on the field value gets ignored entirely. You can only do plain boosts on a field (color^2).
So it seems that this is a limitation of MoreLikeThis relying on the DisMax parser instead of the EdisMax parser.

Does solr support overlapping range facets?

I have not found any example of overlapping range-based facets.
Does Solr even support overlapping range facets? For example, something like [0-10], [5-15], [10-20].
Well, a facet is a filter, so if you add multiple, separate range filters you're essentially saying "filter for values 0-10 and filter for values 5-15". So only the values in the range 5-10 satisfy both of those filters, and that's all you'll get. If you want results that satisfy any of the ranges, you could join them into a single filter query (fq) parameter with an OR operator, e.g.
fq=count:[0 TO 10] OR count:[5 TO 15]
and that's the same as filtering on count:[0 TO 15]. It just depends on what kind of functionality you expect from overlapping ranges.
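The range arithmetic in this answer can be checked with a small sketch. It treats each range as a closed interval (lo, hi), as in Solr's [lo TO hi] syntax. The function names are made up for illustration and are not part of Solr.

```python
def intersect(a, b):
    """Values that satisfy both closed ranges (separate filters, ANDed)."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None

def union_if_overlapping(a, b):
    """Values that satisfy either range, when the two ranges overlap (OR)."""
    if intersect(a, b) is None:
        return None  # disjoint ranges cannot be merged into one interval
    return (min(a[0], b[0]), max(a[1], b[1]))

print(intersect((0, 10), (5, 15)))             # (5, 10)
print(union_if_overlapping((0, 10), (5, 15)))  # (0, 15)
```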

Solr facet counts for specific field values

Solr creates multi-select facet counts for me as described here:
https://web.archive.org/web/20131202095639/http://wiki.apache.org/solr/SimpleFacetParameters#Multi-Select_Faceting_and_LocalParams
I also have various predefined searches that allow a user to browse the catalog. Here is one such example and its query parameters:
q=*:*
fq={!tag=g}genre:western
facet=on
facet.field={!ex=g}genre
facet.mincount=1
facet.limit=50
With this search I get up to 50 genre values in the facet list. I then go through and mark which values were selected by the user; western in this case. This works well except when western is pushed out of the top 50. So I manually add it to the list to make a total of 51. This way the user can see that it is indeed selected. The problem is I have to leave the count for western blank because I don't know it.
Is there a way to get counts for specific facet values such as western in this case? Or another approach to solve this issue?
I am using Solr 4.7.0.
Solr allows you to create a query-based facet count by using the facet.query parameter. When creating a filter query (fq) that's based on a facet field value, I now create a corresponding facet query:
facet.query={!ex=g}genre:western
and add it to the rest of my parameters:
q=*:*
fq={!tag=g}genre:western
facet=on
facet.field={!ex=g}genre
facet.query={!ex=g}genre:western
facet.mincount=1
facet.limit=50
The facet_queries object will now be populated in the Solr response:
{
...
"facet_counts": {
"facet_queries": {
"{!ex=g}genre:western": 7
},
...
},
...
}
Regardless of what is returned in the facet_fields object, I'm now guaranteed to have a facet count for genre:western. With some parsing, facet field counts can be extracted from the facet queries.
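A minimal sketch of the parsing step mentioned above, assuming every facet query is a simple field:value term with an optional local-params prefix such as {!ex=g}. The function name is hypothetical. More complex queries (ranges, boolean expressions, values containing colons) would need a more careful parser.

```python
import re

# Matches an optional leading local-params block such as "{!ex=g}".
LOCAL_PARAMS = re.compile(r"^\{![^}]*\}")

def counts_from_facet_queries(facet_queries):
    """Turn {"{!ex=g}genre:western": 7} into {"genre": {"western": 7}}."""
    result = {}
    for key, count in facet_queries.items():
        field, sep, value = LOCAL_PARAMS.sub("", key).partition(":")
        if not sep:
            continue  # not a simple field:value query, so skip it
        result.setdefault(field, {})[value] = count
    return result

response = {"facet_counts": {"facet_queries": {"{!ex=g}genre:western": 7}}}
print(counts_from_facet_queries(response["facet_counts"]["facet_queries"]))
```

The resulting count can then be merged into the list built from facet_fields whenever the selected value was pushed out of the top 50.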

Restrict multi field facet calculation to subset of possible values

I have a non-trivial Solr query, which already involves a filter query and facet calculations over multiple fields. One of the facet fields is a multi-valued integer field that is used to store categories. There are many possible categories and new ones are created dynamically, so using multiple fields is not an option.
What I want to do is restrict facet calculation over this field to a certain set of integers (= categories). For example, I want to calculate facets for this field, but only taking categories 3, 7, 9 and 15 into account. All other values in that field should be ignored.
How do I do that? Is there some built-in functionality that can be used to solve this? Or do I have to write a custom search component?
The facet.prefix parameter can be set separately for each field named in facet.field by using the per-field form f.field_name.facet.prefix.
I don't know about any way to define the facet base that should be different from the result, but one can use the facet.query to explicitly define each facet filter, e.g.:
facet.query={!key=3}category:3&facet.query={!key=7}category:7&facet.query={!key=9}category:9&facet.query={!key=15}category:15
Given the solr schema/data from this gist, the results will have something like this:
"facet_counts": {
"facet_queries": {
"3": 1,
"7": 1,
"9": 0,
"15": 0
},
"facet_fields": {
"category": [
"2",
2,
"1",
1,
"3",
1,
"7",
1,
"8",
1
]
},
"facet_dates": {},
"facet_ranges": {}
}
Thus giving the needed facet result.
I have some doubts about performance here (especially when there are more than 4 categories and the initial query returns a lot of results), so it is better to do some benchmarking before using this in production.
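For a longer category list, the facet.query parameters shown above could be generated rather than typed by hand. This is only a sketch: the function name is hypothetical, and the field name "category" is taken from the example output.

```python
from urllib.parse import urlencode

def category_facet_queries(categories, field="category"):
    """One keyed facet.query per wanted category, e.g. {!key=3}category:3."""
    return [("facet.query", "{!key=%d}%s:%d" % (c, field, c)) for c in categories]

params = category_facet_queries([3, 7, 9, 15])

# Human-readable form, matching the parameters written out above.
print("&".join(f"{name}={value}" for name, value in params))

# URL-encoded form, ready to append to the rest of the request.
print(urlencode(params))
```

Each generated facet.query is evaluated separately, so the performance concern above grows with the length of the list.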
Not exactly the answer to my own question, but the solution we are using now: the numbers I want to filter on form distinct groups. So we can prefix the id with a group id like this:
1.3
1.8
1.9
2.4
2.5
2.11
...
Having the data like this in Solr, we can use facet prefixes to facet only over a single group: http://wiki.apache.org/solr/SimpleFacetParameters#facet.prefix
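A small sketch of the prefix approach, assuming the prefixed ids are indexed as strings in a field called "category" (the field name is an assumption, as is the function name). It builds the per-field facet.prefix parameter and shows locally which of the sample values the prefix would keep.

```python
from urllib.parse import urlencode

def prefix_facet_params(field, group_id):
    """Facet on `field`, restricted to values that start with "<group_id>."."""
    return [
        ("facet", "true"),
        ("facet.field", field),
        # The trailing dot keeps group 1 from also matching groups 10, 11, ...
        (f"f.{field}.facet.prefix", f"{group_id}."),
    ]

values = ["1.3", "1.8", "1.9", "2.4", "2.5", "2.11"]

print(urlencode(prefix_facet_params("category", 1)))
print([v for v in values if v.startswith("1.")])  # ['1.3', '1.8', '1.9']
```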
