Sort Solr facet results from lowest to highest facet count - solr

This is my facet query:
http://localhost:8983/solr/test_words/select?q=*:*&facet=true&facet.field=keyword
When searching, I get results ordered by facet count from highest to lowest.
Example: {"and": 10, "to": 9, "also": 8}
But instead I want the results ordered by facet count from lowest to highest.
Example: {"tamil": 1, "english": 2, "french": 3}
I also tried
http://localhost:8983/solr/test_words/select?q=*:*&facet=true&facet.field=keyword&facet.sort=count
which is not giving the expected results. Please help me with this!

The "old" facet interface doesn't support sorting by asc as far I know - it's always sorted from most common term to the least common one.
The JSON facet API does however support asc and desc for sorting:
sort - Specifies how to sort the buckets produced.
"count" specifies document count; "index" sorts by the index (natural) order of the bucket value. One can also sort by any facet function / statistic that occurs in the bucket. The default is "count desc". This parameter may also be specified in JSON like sort:{count:desc}. The sort order may either be "asc" or "desc".
"facet": {
keywords: {
"type": "terms",
"field": "keyword",
"sort": "count asc",
"limit": 5
}
}
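For reference, a minimal sketch of the full request (the test_words core and keyword field are taken from the question; json.facet is the request parameter that carries the JSON facet definition):
curl 'http://localhost:8983/solr/test_words/select' \
  --data-urlencode 'q=*:*' \
  --data-urlencode 'json.facet={"keywords":{"type":"terms","field":"keyword","sort":"count asc","limit":5}}'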

Related

Solr faceting on a Query Function result

Is it possible to produce Solr facets for a field which is the result of a Query Function?
I have an index of products with a price field for each store they are available in:
{
  "id": "p1",
  "name_s": "Product 1",
  "description_s": "The first product",
  "price_l1_d": 19.99,
  "price_l2_d": 20.00,
  "price_l3_d": 20.99,
  "price_l4_d": 19.99,
  "price_l5_d": 25.00,
  "price_l6_d": 18.00
},
{
  "id": "p2",
  "name_s": "Product 2",
  "description_s": "The second product",
  "price_l1_d": 12.99,
  "price_l2_d": 15.00,
  "price_l3_d": 13.49,
  "price_l4_d": 14.00,
  "price_l5_d": 12.50,
  "price_l6_d": 16.00
}
and I need my query to return the cheapest price in the customer's 3 closest stores.
I know I can return this value using fl=min(price_l2_d, price_l4_d, price_l6_d), and I can even sort on it, but is it possible to return a "Price" facet based on this value for each document? Ideally I'd like to be able to show all products whose minimum price (in my 3 stores) is between 0-5, 5-10, 10-15, 15-20 etc., and to filter on this.
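For reference, the part that already works looks roughly like this (the products core name is a placeholder):
http://localhost:8983/solr/products/select?q=*:*&fl=id,name_s,min_price:min(price_l2_d,price_l4_d,price_l6_d)&sort=min(price_l2_d,price_l4_d,price_l6_d)%20asc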
I've tried using min(price_l2_d, price_l4_d, price_l6_d) as facet.field but I receive an undefined field error. Is there a better way?
I cannot produce this value at index time because the closest 3 stores could be any combination of three price fields (in this example there are 6, but there are likely to be over 200).
While not THE solution, I have found A solution which should work. Unfortunately it's not possible to create a traditional facet for price ranges as you would with a single integer attribute, but a two-point slider is possible.
Using the JSON facet API (as suggested by a comment on the original question) and the following:
{
  "max": "max(min(price_l2_d, price_l4_d, price_l6_d))",
  "min": "min(min(price_l2_d, price_l4_d, price_l6_d))"
}
I can return the boundaries of the slider: the smallest minimum price across the three stores and the biggest minimum price.
The values on this slider can then be applied using the {!frange} function as follows:
fq={!frange l=0 u=20}min(price_l2_d, price_l4_d, price_l6_d)
where l is the lower bound and u is the upper bound
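Putting both pieces together, a full request could look roughly like this (a sketch; the products core name and the 0-20 slider bounds are placeholders):
curl 'http://localhost:8983/solr/products/select' \
  --data-urlencode 'q=*:*' \
  --data-urlencode 'fq={!frange l=0 u=20}min(price_l2_d, price_l4_d, price_l6_d)' \
  --data-urlencode 'json.facet={"max":"max(min(price_l2_d, price_l4_d, price_l6_d))","min":"min(min(price_l2_d, price_l4_d, price_l6_d))"}'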
Hopefully this helps anyone else looking for an answer to this.

Can Solr JSON Facet API stat functions be combined with a domain filter to limit the results of the function?

My use case is to make a query to Solr, and to extract counts of unique terms for certain fields within the result set. The trick is that within my counts, I need to limit the output to only terms that match a certain input string--without adjusting the main Solr query. E.g., "Solr, give me results for 'War and Peace', and give me the first ten facets on author where the author field has 'doge' in it, and give me a count of all unique author values in the result set where the author field has 'doge' in it."
The Solr JSON Facet API allows me to facet using stat functions; in this case, I'm interested in using the unique() function to get the counts I need. So, e.g.,
{
  "author_count": "unique(author)"
}
...tells me the total number of unique values for 'author' in the result set. This is good.
I can limit the output of a facet using the domain change option, like so:
{
  "author_facet": {
    "type": "terms",
    "field": "author",
    "mincount": 1,
    "limit": 10,
    "offset": 0,
    "domain": {
      "filter": "author:doge"
    }
  }
}
This is also good.
The problem I'm having is that when I send both of these choices, the result of the unique() call (in author_count) is a count of all unique author values in the base result set, regardless of whether the author contains 'doge'. The author_facet results do correctly limit the output to only authors with 'doge' in them. But I need to also apply that limit to the results of the unique() function.
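For reference, the combined json.facet body I am sending looks roughly like this:
{
  "author_count": "unique(author)",
  "author_facet": {
    "type": "terms",
    "field": "author",
    "mincount": 1,
    "limit": 10,
    "offset": 0,
    "domain": {
      "filter": "author:doge"
    }
  }
}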
I cannot alter the base query, because it represents user input that is independent of the facet filtering input. E.g., the user will have searched for "War and Peace," and now wants to see only those facets where the author is 'doge', with a count of the total authors matching 'doge'.
If it is meaningful to the answer, I am running Solr 9.0.0.
Is there a way to apply domain filtering to Solr stat functions in the JSON Facet API, such as unique()?
EDIT: To clarify: The number of authors with 'doge' may be very large, and so would exceed the number of actual facets that should be returned. I'm limiting the facet response to 100, but there could be 978 authors with 'doge'. I want to inform the user of that 978 count while only returning the top 100.

Create mongodb index based on gt and lt

I have a collection with many documents, called "products",
and I want to improve performance by creating an index for it.
The problem is that I don't know how indexes work, so I don't know whether an index will be helpful or not.
My most frequently used query is on the fields "storeId" and "salesDate".
storeId is just a string, so I think it is good to create an index on it.
But the tricky one is salesDate: it is an object with two fields, from and to, like this:
product {
  ...someFields,
  storeId: string,
  salesDate: {
    from: Number, // date-time as a numeric timestamp
    to: Number    // date-time as a numeric timestamp
  }
}
My query is based on $gt and $lt, for example:
product.find({
  storeId: "blah",
  "salesDate.from": { $gt: 1423151234, $lt: 15123123123 }
})
OR
product.find({
  storeId: "blah",
  "salesDate.from": { $gt: 1423151234 },
  "salesDate.to": { $lt: 15123123123 }
})
What is the proper index for this case?
For your specific use case, I would recommend creating an index on only the from key (together with storeId) and using $gte and $lte in your find query.
The reason is that the fewer keys you index (in cases where multiple-key queries can be avoided), the better.
Make sure that you follow the order below for both the index creation and the find operation.
Index Command and Order:
db.product.createIndex({
  "storeId": 1,
  "salesDate.from": -1 // change -1 to 1 if you want the date key to be indexed in ascending order
})
Find Command:
db.product.find({
  "storeId": "blah",
  "salesDate.from": { $gt: 1423151234, $lt: 15123123123 }
})
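To verify that the index is actually used, one option (a sketch, assuming the index above has been created) is to inspect the query plan; the winning plan should contain an IXSCAN stage on this index:
db.product.find({
  "storeId": "blah",
  "salesDate.from": { $gt: 1423151234, $lt: 15123123123 }
}).explain("executionStats") // look for "IXSCAN" and the index name in winningPlan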

How to properly optimize a query in mongodb?

The following query takes a long time in a collection of millions of records; indexes are set on the _id, cpe_id and state fields. As I understand it, the $in operator becomes expensive because of the long value list and the large collection: the complexity is O(N * log M), where N is the length of the $in array and M is the number of elements in the collection. Are there any options to somehow improve the performance of the query?
db.collection.aggregate([
  { $match: {
      cpe_id: { $in: ["e389439e-bd04-f3fb-c512-00193b0c4385", "d389439e-bd04-f3fb-c512-00193b13d00c", ....] }
  } },
  { $sort: { state: 1, _id: 1 } },
  { $skip: 0 },
  { $limit: 100 },
])
The $in operator can be effectively serviced by an index on the queried field, i.e. {cpe_id: 1}.
In terms of performance, it will need to scan one region of the index for each value in the provided array. I would expect that part to scale linearly with the array size.
The sort can be accomplished using an index as well, but MongoDB can use only one index to perform the sort, and it must be the same index used to select the documents.
If there is no single index that can be used for both, it will first find all matching documents, load them into memory, sort them, and only then apply the skip and limit.
If there is an index on {cpe_id: 1, state: 1, _id: 1} or {state: 1, _id: 1, cpe_id: 1} there are several optimizations the query planner can use:
documents are selected using the index, so no non-matching documents are loaded
since the values in the index are already sorted in the desired order, it can omit the in-memory sort
without the blocking sort, the execution can halt after (skip + limit) documents have been found.
You can use the db.collection.explain shell helper or explain command with the "allPlansExecution" option to see which indexes were considered, and how each performed.
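A minimal sketch of that suggestion, using the compound index mentioned above and the two cpe_id values from the question:
// create the compound index that supports both the $in match and the sort
db.collection.createIndex({ cpe_id: 1, state: 1, _id: 1 })

// inspect which plans were considered and how each performed
db.collection.explain("allPlansExecution").aggregate([
  { $match: { cpe_id: { $in: ["e389439e-bd04-f3fb-c512-00193b0c4385", "d389439e-bd04-f3fb-c512-00193b13d00c"] } } },
  { $sort: { state: 1, _id: 1 } },
  { $skip: 0 },
  { $limit: 100 }
])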

Restrict multi field facet calculation to subset of possible values

I have a non-trivial SOLR query which already involves a filter query and facet calculations over multiple fields. One of the facet fields is a multi-valued integer field that is used to store categories. There are many possible categories and new ones are created dynamically, so using multiple fields is not an option.
What I want to do is restrict facet calculation over this field to a certain set of integers (= categories). For example, I want to calculate facets of this field, but only taking categories 3, 7, 9 and 15 into account. All other values in that field should be ignored.
How do I do that? Is there some built-in functionality which can be used to solve this? Or do I have to write a custom search component?
The facet.prefix parameter can be defined for each field specified by the facet.field parameter – you can do it by adding a per-field parameter like this: f.field_name.facet.prefix.
I don't know of any way to define a facet base that differs from the result set, but one can use facet.query to explicitly define each facet filter, e.g.:
facet.query={!key=3}category:3&facet.query={!key=7}category:7&facet.query={!key=9}category:9&facet.query={!key=15}category:15
Given the Solr schema/data from this gist, the results will contain something like this:
"facet_counts": {
"facet_queries": {
"3": 1,
"7": 1,
"9": 0,
"15": 0
},
"facet_fields": {
"category": [
"2",
2,
"1",
1,
"3",
1,
"7",
1,
"8",
1
]
},
"facet_dates": {},
"facet_ranges": {}
}
Thus giving the needed facet result.
I have some doubts about performance here (especially when there are more than 4 categories and when the initial query returns a lot of results), so it is better to do some benchmarking before using this in production.
Not exactly the answer to my own question, but the solution we are using now: the numbers I want to filter on form distinct groups, so we can prefix each id with a group id like this:
1.3
1.8
1.9
2.4
2.5
2.11
...
Having the data like this in SOLR, we can use faceted prefixes to facet over only a single group: http://wiki.apache.org/solr/SimpleFacetParameters#facet.prefix
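As a sketch (the core name and the category field name are placeholders), such a request could look like this, faceting only over group 1:
http://localhost:8983/solr/collection/select?q=*:*&facet=true&facet.field=category&facet.prefix=1.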
