Solr: Fetching results with a minimum from each category

I am using Solr 4.4.0. The search is performed on products, each of which has a category field. I want to retrieve the top n products. But if some category has fewer than m products among the top n, I want to retrieve more products only for those categories.
E.g., I have 4 categories a, b, c, d, with n=20 and m=5. Now let's say the top 20 (=n) have the following category distribution (a:6, b:4, c:6, d:4). Categories b and d have fewer than m (=5) products, so I would like to fetch one more product (with the next highest score) for each of these categories.
Is there a way I can do this using Solr?

Did you try to solve this with field collapsing (result grouping)?
You enable it with group=true and use group.field=category; group.limit lets you set the number of documents returned for each group. Then you need to be a bit careful about how the groups are sorted; I think it was by the first doc in the group...
But I guess you can achieve what you are looking for fairly easily.
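A minimal sketch of how the grouping suggestion could be combined with a plain top-n query, assuming you run two requests (one ordinary, one grouped) and merge them on the client. The helper names and the document shape (dicts with id, category, score) are assumptions for illustration, not part of Solr.

```python
from urllib.parse import urlencode


def grouped_query_params(query, m):
    """Build the query string for a grouped request: up to m docs per category."""
    return urlencode({
        "q": query,
        "group": "true",            # enable result grouping
        "group.field": "category",  # one group per category value
        "group.limit": m,           # documents returned per group
        "wt": "json",
    })


def merge_top_n_with_minimum(top_n, per_category_top_m):
    """Top up the plain top-n list so every category shows its top-m docs.

    top_n: docs from the ordinary query, best score first.
    per_category_top_m: category -> that category's top-m docs (grouped query).
    Docs are assumed to be dicts with 'id', 'category' and 'score' keys.
    """
    seen = {doc["id"] for doc in top_n}
    merged = list(top_n)
    for docs in per_category_top_m.values():
        for doc in docs:
            if doc["id"] not in seen:  # only categories short of m add docs
                seen.add(doc["id"])
                merged.append(doc)
    merged.sort(key=lambda d: d["score"], reverse=True)
    return merged
```

Categories that already have m or more documents in the top n contribute nothing new, because their top m documents are already in the list; only the under-represented categories get extra documents.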

Related

In MongoDB/React what is the best practice for filtering data?

I came here today with a theoretical question. (Hint: it will be long and tough, but to fully understand the problem I think I have to write down every important detail. If you read it to the end, huge thanks to you; you're not the hero we deserved but the hero we needed.)
Story time: I'm currently building an online shop from scratch. It follows the same principles as eBay: users can create advertisements for their used products. The problem is that I want to create a filtering feature.
What is my MongoDB data structure?
My page has products with different attributes; by this I mean that the products have varying categories and values. To illustrate, here is an example:
Product A:
Creator:User1
Category:Car
Type:BMW
Color:Red
Product B:
Creator:UserB
Category:Electronics
Type:Phone
Producer:Apple
To make it more complex, each user can define at most 3 extra categories and values for each product. So, for example, if User1 adds 2 new categories, the final product will be:
Product A:
Creator:User1
Category:Car
Type:BMW
Color:Red
Number of seats:4
Fuel type: Gasoline
Because of this, when a user adds a new product there will be two types of categories: the static ones, which are predefined by me (Category, Type, Color in the car's case), and the dynamic ones, which the user adds (Number of seats, Fuel type, or anything else).
Overall: my final data structure in MongoDB is not static, since there are some user-added categories. Because of this I have a Product collection, and each document looks like the example above.
How are the items shown?
I have a main page. When I populate it I make a call with $skip and a $limit set to 8, so the first time I query only 8 products. If a user clicks the Load More button, it loads another 8 products, and so on.
FINALLY: My actual question ...
So at this point I guess you understand everything related to the business logic, so it's time for my question: I want to filter these dynamic products, but I don't know what the best practice for it is.
My idea:
First, create a MongoDB collection named Categories. Each main category will be a document in it, and we will store its static and dynamic categories and values.
ex:
category:car
predefined:[{type:[BMW,Mustang,Ferrari]},{color:[red,green,blue]}]
userdefined:[{number of seats:[2,4,5,6]},{fuel type:[Gasoline,Air,Diesel]}]
We load the values on the main page. If a user clicks a specific value, e.g. BMW, we set a limit of 8, go through our Product collection, and get the 8 items which have Type:BMW. If they select another option, e.g. color:Red, we go through the collection again, but now with two criteria: Type:BMW and color:Red.
Idea2: Create a Category collection again with this structure
categoryType:predefined
mainCategory:Car
categoryName:Type
BMW:[prodA, prodC,prodD]
Ferrari:[prodD,prodE]
...values:products which contains
categoryType:userdefined
mainCategory:Car
categoryName:Number of seats
4:[prodA, prodD],
5:[prodE]
If a user selects BMW from the Type category, we load the products from the BMW field [prodA,prodC,prodD]. If the user then selects the Number of seats category with a value of 4, we load [prodA, prodD], and on the webpage we filter against our current products so that only [prodA,prodD] remains. Then we use findById on the current list to fetch the specific products.
I think that these are not the best options from any perspective, but I am really confused.
What do you think: how should I structure my categories/products to get efficient read/write/update complexity?
Anyway, thank you for reading this, and if you made it this far I'm curious about your ideas. Have a nice day.
UPDATE:
The filtering functionality
To avoid any confusion, this is my filtering idea: when a user selects a main category, for example Car or Electronics, I want to show only the relevant filtering categories and options. The filtering categories in Car's case are Type and Color.
I want these filtering categories to have pre-populated options. By this I mean that if a filtering category is Type, and there are 2 products which have Type:BMW and Type:Ferrari, I want to show these values as options for filtering. I don't want to hardcode these options; for example, I don't want a hardcoded Type:Lamborghini when I have no products with type Lamborghini.
In the end, if a user clicks Type:BMW, I will filter all of my products based on that criterion.
My filtering side menu will look like this:
Type: BMW,Ferrari (these values exist in my database)
Color:Red,Black,Grey,Yellow
And for user-added categories I will build a search bar. If a user selects a user-defined category, I want to add it to the filtering categories, so the overall result would look like this:
Type: BMW,Ferrari (these values exist in my database)
Color:Red,Black,Grey,Yellow
Number of seats:4,6,7 (number of seats category is added by user, 4,6,7 are the existing values to this category)
You could structure your data as one generic Products collection, containing both
Product A:
Creator:User1
Category:Car
Type:BMW
Color:Red
Product B:
Creator:UserB
Category:Electronics
Type:Phone
Producer:Apple
rows. Whenever you show the filter component, you can select the available categories by using an Aggregate (https://stackoverflow.com/a/43570730/1859959)
This would generate search boxes like "Creator", "Category", "Type", "Color", "Producer".
The data itself would be as generic as possible.
When the user wants to add a new product, they start from a template, like "Car" or "Electronics". The Templates collection gives them the initial fields, which should be included. So it would be like:
{Car: [{type:[BMW,Mustang,Ferrari]},{color:[red,green,blue]}],
Electronics: ... }
Selecting a Car would generate the "type" and "color" input boxes. Saving the form would insert the new row into Products.
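As a rough illustration of what the filter component needs from that aggregate, here is an in-memory sketch that collects, for one main category, every field and the distinct values that actually occur. It runs on plain dicts rather than against MongoDB; the function name and the choice to skip Creator and Category are assumptions, and a real implementation would more likely do the equivalent server-side.

```python
from collections import defaultdict


def available_filters(products, category, skip=("Creator", "Category")):
    """Return {field: sorted distinct values} for products in one main category.

    Only values that actually occur in the data are returned, so no option is
    shown for which there is no product. `skip` lists fields that should not
    become filters (an assumption made for this sketch).
    """
    options = defaultdict(set)
    for product in products:
        if product.get("Category") != category:
            continue
        for field, value in product.items():
            if field not in skip:
                options[field].add(value)
    # Sort by string form so mixed value types (e.g. 4 and "4") do not clash.
    return {field: sorted(values, key=str) for field, values in options.items()}
```

User-added fields such as "Number of seats" show up automatically, because nothing in the function depends on a fixed list of field names.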

Create a ranking mechanism like Algolia

I'm creating a platform to search products based on certain filters. Let's say I have filters A, B and C and a database of around 20,000 products. What I'm trying to build is a way to compute a ranking score and query the list of products ordered by that score. If a product matches filters A, B and C, it has a score of 3 and therefore appears first in the list, along with all other products with the same score. Then come the products with a score of 2, then 1, then 0.
For now I'm using Algolia with optional_filters, but I was wondering whether there is a way to build this directly on a relational database like Supabase. Our app is based on Next.js + Supabase.
Any ideas?
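To make the scoring rule described above concrete, here is a small in-memory sketch: each product gets one point per filter it matches, and the list is ordered by that score, highest first. The function name and the dict-based product shape are assumptions for illustration; this is not how Algolia or Supabase implement it.

```python
def rank_products(products, filters):
    """Order products by how many of the given filters they match.

    products: list of dicts (field -> value).
    filters: dict of field -> wanted value, e.g. {"A": 1, "B": 1, "C": 1}.
    Returns (score, product) pairs, highest score first; products with the
    same score keep their original relative order.
    """
    def score(product):
        return sum(1 for field, wanted in filters.items()
                   if product.get(field) == wanted)

    scored = [(score(product), product) for product in products]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored
```

The same idea should carry over to a relational database by adding up one match per filter in the query and ordering by that sum, though that query is not shown here.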

Will duplicated documents impact search results?

For example, we have an index in which the same document can be repeated, differing in only one field.
Index: ChannelID, ProductID, ProductName and ProductDescription
We may have the same product on different ChannelIDs. So, if we have 100 ChannelIDs, we will have the same product (document) 100 times if the product is available on all channels.
When doing a search, will this repetition of documents (same product name and description) affect the quality of the results?
Thanks.
Depending on your search, similar documents would all show up in the search results. For example, in your ‘100 different channel ids but same product’ example, if one searches by product description (assuming the same product gets the same description), either all 100 documents for that product would be returned (if the search matched) or none of them would.

How to grab filter elements from a large search result when just using smaller pages in the application?

I have a search query that returns a bunch of records, but we're using paging, so we only return subsets of 10, 25, or 50 records at a time. Basically, the stored procedure goes along these lines:
WITH search_results AS
(
    SELECT model, brandname, msrp,
           ROW_NUMBER() OVER (ORDER BY @sortExpression) AS rowNumber
    FROM models
    WHERE ...criteria....
)
SELECT * FROM search_results
WHERE rowNumber BETWEEN ((@pageNumber-1)*@pageSize)+1 AND ((@pageNumber-1)*@pageSize)+@pageSize
When I use small pages my sproc comes back very quickly, usually in under a second. However, sometimes our users will enter criteria that return a few thousand, potentially up to ten thousand, records. They'll page through and grab just a few at a time, but the full search result is large. The sproc runs quickly when my page size is small, but when I increase it, it takes a few seconds, which is too long.
This is all fine, since I am using smaller pages. The problem is that part of our solution is a filter. This filter lists all of the brands, categories, and 4 price ranges for the full search results. So when they click filter, it takes the lowest price and the highest, breaks the span into 4 equal-sized ranges, and shows them on the form with checkboxes. The user can then check the price ranges, brands, and categories they want to filter on. This re-submits the search with the new criteria.
I'm not sure how to return a full set of brands, categories, and highest/lowest price without running the main query (the one in the WITH) twice. Does it make sense to dump all of that into a temporary table and then return multiple recordsets to my business object: the results, the brand list, the category list, and then the MIN and MAX prices? Is there a pattern for returning filter information for search results like this?
The answer is: no, there's no pattern; and maybe, for the temp table. Try putting the raw big result in a temp table and using it to return multiple record sets. Test it and see if it works better. Doing so, you are (in general) using more memory and less CPU. In the tuning business there are sometimes trade-offs where you can exchange memory, I/O, and CPU use to speed things up.
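A minimal sketch of the temp-table idea, using Python's built-in sqlite3 module instead of SQL Server so that it can be run anywhere. The criteria (msrp <= ?), the fixed sort order and the function names are assumptions made up for this example; the point is only that the expensive search runs once and the page, the brand list and the price bounds are all read from the temp table.

```python
import sqlite3


def search_with_filters(conn, max_msrp, page_number, page_size):
    """Run the search once into a temp table, then read several result sets."""
    cur = conn.cursor()
    cur.execute("DROP TABLE IF EXISTS temp.search_results")
    cur.execute(
        "CREATE TEMP TABLE search_results (model TEXT, brandname TEXT, msrp REAL)"
    )
    # The expensive query runs only here (hypothetical criteria: msrp <= ?).
    cur.execute(
        "INSERT INTO search_results "
        "SELECT model, brandname, msrp FROM models WHERE msrp <= ?",
        (max_msrp,),
    )
    # Result set 1: the requested page.
    offset = (page_number - 1) * page_size
    page = cur.execute(
        "SELECT model, brandname, msrp FROM search_results "
        "ORDER BY msrp, model LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()
    # Result set 2: every brand in the full result, for the filter form.
    brands = [row[0] for row in cur.execute(
        "SELECT DISTINCT brandname FROM search_results ORDER BY brandname"
    )]
    # Result set 3: lowest and highest price of the full result.
    low, high = cur.execute(
        "SELECT MIN(msrp), MAX(msrp) FROM search_results"
    ).fetchone()
    return page, brands, (low, high)


def price_ranges(low, high, parts=4):
    """Split [low, high] into equal-sized ranges for the price checkboxes."""
    step = (high - low) / parts
    return [(low + i * step, low + (i + 1) * step) for i in range(parts)]
```

Whether this beats running the search twice depends on the size of the result and on the server, so it is worth measuring both, as suggested above.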

Solr - Sort search result based on facet count

I have product information in Solr, and each product belongs to a category. I would like to sort the product search results based on the facet count for Category. So if 100 products matching the criteria are under the Electronics category and 50 are under Books, I would like to sort (or boost) the results so that I see the 100 Electronics products first and then the 50 Books.
Is it possible with one query?
Thanks.
I don't think this is possible; faceting does not influence search results.
