Restrict multi field facet calculation to subset of possible values - solr

I have a non-trivial Solr query, which already involves a filter query and facet calculations over multiple fields. One of the facet fields is a multi-valued integer field that is used to store categories. There are many possible categories and new ones are created dynamically, so using multiple fields is not an option.
What I want to do, is to restrict facet calculation over this field to a certain set of integers (= categories). So for example I want to calculate facets of this field, but only taking categories 3,7,9 and 15 into account. All other values in that field should be ignored.
How do I do that? Is there some built-in functionality which can be used to solve this? Or do I have to write a custom search component?

The facet.prefix parameter can be defined separately for each field specified by the facet.field parameter. You can do it by adding a per-field parameter like this: f.field_name.facet.prefix.

I don't know about any way to define the facet base that should be different from the result, but one can use the facet.query to explicitly define each facet filter, e.g.:
facet.query={!key=3}category:3&facet.query={!key=7}category:7&facet.query={!key=9}category:9&facet.query={!key=15}category:15
Given the solr schema/data from this gist, the results will have something like this:
"facet_counts": {
"facet_queries": {
"3": 1,
"7": 1,
"9": 0,
"15": 0
},
"facet_fields": {
"category": [
"2",
2,
"1",
1,
"3",
1,
"7",
1,
"8",
1
]
},
"facet_dates": {},
"facet_ranges": {}
}
Thus giving the needed facet result.
I have some doubts about performance here (especially when there are more than 4 categories and the initial query returns a lot of results), so it is better to do some benchmarking before using this in production.
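The facet.query string above can be generated rather than typed by hand. This is a minimal sketch, assuming the field is called category as in the example; the helper name category_facet_params and the q=*:* main query are made up for illustration.

```python
from urllib.parse import urlencode


def category_facet_params(field, values):
    """Return one ("facet.query", ...) pair per value, keyed by the value itself."""
    # {!key=N} is the local-params syntax used in the answer to label each count.
    return [("facet.query", f"{{!key={v}}}{field}:{v}") for v in values]


# Hypothetical main query; replace with the real q and fq parameters.
params = [("q", "*:*"), ("facet", "true")]
params += category_facet_params("category", [3, 7, 9, 15])

# urlencode accepts a list of pairs, so the repeated facet.query key is preserved.
query_string = urlencode(params)
print(query_string)
```

Each category becomes its own facet query, so the cost grows with the number of categories, which is the performance concern mentioned above.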

Not exactly the answer to my own question, but the solution we are using now: The numbers I want to filter on form distinct groups. So we can prefix the id with a group id like this:
1.3
1.8
1.9
2.4
2.5
2.11
...
Having the data like this in Solr, we can use facet prefixes to facet only over a single group: http://wiki.apache.org/solr/SimpleFacetParameters#facet.prefix
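A rough sketch of that scheme, assuming the prefixed values are indexed in a string field named category (the field name and both helper names are illustrative, not from our actual setup):

```python
def prefixed_category(group_id, category_id):
    """Build the value stored in the index, e.g. group 1, category 3 -> "1.3"."""
    return f"{group_id}.{category_id}"


def group_facet_params(field, group_id):
    """Request parameters that facet only over values of one group."""
    return {
        "facet": "true",
        "facet.field": field,
        # The trailing dot keeps group 1 from also matching groups 10, 11, ...
        f"f.{field}.facet.prefix": f"{group_id}.",
    }


print(prefixed_category(1, 3))
print(group_facet_params("category", 1))
```

The returned facet keys still carry the group prefix, so the client has to strip it before displaying the category id.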

Related

Number of fields returned by Solr Suggester

By default, the Solr Suggester component returns 3 fields for each of the suggestions:
{
"term": "electronics and computer1",
"weight": 2199,
"payload": ""
}
Is there a way to extend the number of fields returned for each of the suggestions? I would like to have, for example, additional fields here which I've added to the index (e.g. the ID of an index record).
You can always stuff several pieces of information into the single payload field, separated by some character (like |). Simple, but it works.
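On the client side, packing and unpacking such a payload could look roughly like this. It is only a sketch: the function names are invented here, and it assumes the separator never occurs inside a value.

```python
SEPARATOR = "|"


def pack_payload(*values):
    """Join several values into one payload string before indexing."""
    parts = [str(v) for v in values]
    if any(SEPARATOR in part for part in parts):
        # Assumption: values never contain the separator; fail loudly if one does.
        raise ValueError("value contains the separator character")
    return SEPARATOR.join(parts)


def unpack_payload(payload):
    """Split the payload returned by the suggester back into its parts."""
    return payload.split(SEPARATOR) if payload else []


packed = pack_payload(42, "electronics")
print(packed)
print(unpack_payload(packed))
```

Everything comes back as a string, so numeric ids have to be converted again after unpacking.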

How to filter an array in Azure Search

I have following Data in my Index,
{
"name" : "The 100",
"lists" : [
"2c8540ee-85df-4f1a-b35f-00124e1d3c4a;Bellamy",
"2c8540ee-85df-4f1a-b35f-00155c40f11c;Pike",
"2c8540ee-85df-4f1a-b35f-00155c02e581;Clark"
]
}
I have to get all the documents where the lists has Pike in it.
A query on the full string works with any, but I couldn't get a contains-style match to work.
$filter=lists/any(t: t eq '2c8540ee-85df-4f1a-b35f-00155c40f11c;Pike')
However, I am not sure how to search with only Pike.
$filter=lists/any(t: t eq 'Pike')
I guess eq matches against the full text of each value. Is there any way to make this query work with the given data structure?
Currently the field lists is only filterable, not searchable.
The eq operator looks for exact, case-sensitive matches. That's why it doesn't match 'Pike'. You need to structure your index such that terms like 'Pike' can be easily found. You can accomplish this in one of two ways:
Separate the GUIDs from the names when you index documents. So instead of indexing "2c8540ee-85df-4f1a-b35f-00155c40f11c;Pike" as a single string, you could index them as separate strings in the same array, or perhaps in two different collection fields (one for GUIDs and one for names) if you need to correlate them by position.
If the field is searchable, you can use the new search.ismatch function in your filter. Assuming the field is using the standard analyzer, full-text search will word-break on the semicolons, so you should be able to search just for "Pike" and get a match. The syntax would look like this: $filter=search.ismatch('Pike', 'lists') (If looking for "Pike" is all your filter does, you can just use the search and searchFields parameters to the Search API instead of $filter.) If the "lists" field is not already searchable, you will need to either add a new field and re-index the "lists" values, or re-create your index from scratch with the new field definition.
Update
There is a new approach to solve this type of problem that's available in API versions 2019-05-06 and above. You can now use complex types to represent structured data, including in collections. For the original example, you could structure the data like this:
{
"name" : "The 100",
"lists" : [
{ "id": "2c8540ee-85df-4f1a-b35f-00124e1d3c4a", "name": "Bellamy" },
{ "id": "2c8540ee-85df-4f1a-b35f-00155c40f11c", "name": "Pike" },
{ "id": "2c8540ee-85df-4f1a-b35f-00155c02e581", "name": "Clark" }
]
}
And then directly query for the name sub-field like this:
$filter=lists/any(l: l/name eq 'Pike')
See the Azure Search documentation on complex types for details.
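If the existing documents already hold "GUID;name" strings, reshaping them for the complex-type layout and building the filter could be sketched as follows. The helper names are hypothetical; the only rule relied on is that OData string literals escape a single quote by doubling it.

```python
def split_list_entry(entry):
    """Turn "GUID;name" into the {"id": ..., "name": ...} shape shown above."""
    guid, _, name = entry.partition(";")
    return {"id": guid, "name": name}


def name_filter(name):
    """Build the $filter expression for an exact match on the name sub-field."""
    escaped = name.replace("'", "''")  # OData escapes ' as ''
    return f"lists/any(l: l/name eq '{escaped}')"


doc = {
    "name": "The 100",
    "lists": ["2c8540ee-85df-4f1a-b35f-00155c40f11c;Pike"],
}
doc["lists"] = [split_list_entry(e) for e in doc["lists"]]
print(doc)
print(name_filter("Pike"))
```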

Solr facet query filtering

I'm trying to build a facet query on the manufacturer field when the search term is "LENS", but I want to eliminate all those manufacturers that have no lens.
For example, I need the following output but want to eliminate "Kodak", since there is no lens from that manufacturer:
"facet_fields": {
"manu" : [
"Canon USA": 25,
"Olympus": 21,
"Sony": 12,
"Panasonic": 9,
"Nikon": 4,
"Kodak":0
],
http://localhost/solr/collection1/select?q=lens&rows=0&wt=json&indent=true&facet=true&facet.query=lens&facet.field=manu
does not yield the correct result
You can use facet.mincount to only retrieve facet keys whose count is at or above a certain threshold. This is 0 by default.
facet.mincount=1
You can also supply the value on a per-field basis if you're doing multiple facets in a single request, f.manu.facet.mincount=1.
Additionally, there should be no need to do a facet.query when you're already performing the same query as the actual query. The facet.query is useful if you want to do arbitrary queries for a facet, within the same document set already returned by your query.
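Put together, the request from the question could be rebuilt along these lines. This is just a sketch that reuses the URL and field name from the question; the function name lens_facet_url is made up.

```python
from urllib.parse import urlencode


def lens_facet_url(base_url, term, field, mincount=1):
    """Build a select URL that facets on one field and hides zero-count values."""
    params = {
        "q": term,
        "rows": 0,
        "wt": "json",
        "facet": "true",
        "facet.field": field,
        # Per-field form; plain facet.mincount would apply to every facet field.
        f"f.{field}.facet.mincount": mincount,
    }
    return f"{base_url}?{urlencode(params)}"


url = lens_facet_url("http://localhost/solr/collection1/select", "lens", "manu")
print(url)
```

Note that the redundant facet.query parameter is simply left out.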

Solr query that only matches when 2 values are together with no other values on a multivalued field

A query like this:
&q=im_field_teams:(4667 AND 4675)
Will return results like this:
docs: [
{
im_field_teams: [
4675,
4667
]
},
Which is great! However of course I also get values like this:
{
im_field_teams: [
4660,
4702,
4675,
4667,
4684,
]
}
Which is exactly what I'm trying to avoid.
I want an "ONLY AND" or AND AND NOT * or something like that. Matching documents that include only a pair of values (no more, no less) in a multivalued field.
(For what it's worth, this seems to work but feels really wrong:)
&q=im_field_teams:(4667 AND 4675) -im_field_teams:[0 TO 4666] -im_field_teams:[4668 TO 4674] -im_field_teams:[4676 TO *]
There is no direct way of doing this, but there is a workaround with simpler query syntax (see below). The query with negation can get tricky when the ranges overlap or when there are multiple AND clauses.
At index time, add another field holding the number of values in the multivalued field. For example, if a document has (4660, 4702, 4675, 4667, 4684), this field will hold 5. At query time, add a clause on this field that matches the number of values you are searching for. In this case, if you are searching for q=im_field_teams:(4667 AND 4675), add the clause AND im_field_teams_size:2. This ensures that all the terms are matched and no others.
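The two halves of this approach (computing the size at index time, adding the clause at query time) might be sketched like this. The helper names are illustrative, and counting distinct values rather than raw array length is an assumption made here so that duplicate entries do not break the match.

```python
FIELD = "im_field_teams"
SIZE_FIELD = "im_field_teams_size"


def with_team_count(doc):
    """Return a copy of the document with the size field filled in (index time)."""
    enriched = dict(doc)
    # Assumption: count distinct values, so [4667, 4667, 4675] still counts as 2.
    enriched[SIZE_FIELD] = len(set(doc.get(FIELD, [])))
    return enriched


def exact_set_query(values):
    """Build a q value matching documents that hold exactly these values (query time)."""
    unique = sorted(set(values))
    terms = " AND ".join(str(v) for v in unique)
    return f"{FIELD}:({terms}) AND {SIZE_FIELD}:{len(unique)}"


print(with_team_count({FIELD: [4675, 4667]}))
print(exact_set_query([4667, 4675]))
```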

Solr Faceting on Multiple Concatenated Fields

I need a way to get facets on two combined field names. To show you what I mean, take a look at the query as it is now:
{
"responseHeader":{
"status":0,
"QTime":16,
"params":{
"facet":"true",
"indent":"true",
"q":"productId:(1 OR 2 OR 3 OR 4)",
"facet.field":["productMetaType",
"productId"],
"rows":"10"}},
"response":{"numFound":4,"start":0,"docs":[
{
"productId":1,
"productMetaType":"PRIMARY_PHOTO",
"url":"1_PRIM.JPG"},
{
"productId":1,
"productMetaType":"OTHER_PHOTO",
"url":"1_1.JPG"},
{
"productId":1,
"productMetaType":"OTHER_PHOTO",
"url":"1_2.JPG"},
{
"productId":2,
"productMetaType":"OTHER_PHOTO",
"url":"2_1.JPG"}]
},
"facet_counts":{
"facet_queries":{},
"facet_fields":{
"productMetaType":[
"PRIMARY_PHOTO",1,
"OTHER_PHOTO",3],
"productId":[
"1",3,
"2",1]},
"facet_dates":{},
"facet_ranges":{}
}
}
I get two facet fields, productMetaType and productId. What I need to do is somehow combine those fields so I get data back something like this:
1_PRIMARY_PHOTO, 1,
1_OTHER_PHOTO, 2,
2_PRIMARY_PHOTO, 0,
2_OTHER_PHOTO, 1
Does the pivot functionality do this? Unfortunately, we're running Solr 3.1, so pivot isn't available, but if that is the only way to do this, I might have some ammo for upgrading.
The only other thing I could think of was some how concatenating the field names. I am new to Solr and don't know what is possible. Any advice or assistance is appreciated. Thank you for your time.
Yes, pivot faceting would do the trick, but as you observed, this feature is only available in Solr trunk.
Your idea to combine both fields would work too. Actually, if your fields have a limited number of values, the easiest and most flexible way to do this would be to use facet queries:
productId:1 AND productMetaType:PRIMARY_PHOTO
productId:2 AND productMetaType:OTHER_PHOTO
productId:1 AND productMetaType:OTHER_PHOTO
productId:2 AND productMetaType:PRIMARY_PHOTO
Otherwise, just create a new field of string type in your Solr schema.xml and rebuild your index, adding your documents as before but with this new field populated (you can generate it however you wish; using '_' as a separator between the two field values would work perfectly).
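Both options can be generated from the known values instead of being written out by hand. The following is a minimal sketch using the field names from the question; the function names are made up for illustration.

```python
from itertools import product


def combined_value(product_id, meta_type, sep="_"):
    """Value for the extra string field, e.g. (1, "PRIMARY_PHOTO") -> "1_PRIMARY_PHOTO"."""
    return f"{product_id}{sep}{meta_type}"


def pair_facet_queries(product_ids, meta_types):
    """One facet.query per (productId, productMetaType) combination."""
    return [
        f"productId:{pid} AND productMetaType:{mt}"
        for pid, mt in product(product_ids, meta_types)
    ]


print(combined_value(1, "PRIMARY_PHOTO"))
for fq in pair_facet_queries([1, 2], ["PRIMARY_PHOTO", "OTHER_PHOTO"]):
    print(fq)
```

The facet-query variant returns a count for every combination, including zero counts such as 2_PRIMARY_PHOTO, while faceting on the combined field only lists values that actually occur in the index.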

Resources