How do I get the first and last document per SOLR facet, sorted by some field? - solr

I have documents with multiple facets. I have different views on the website I'm creating to view the facet stats.
As well as showing the facet stats, I would like to show example documents from each facet - specifically, the first and last documents ordered by another field.
For example, properties for sale, I want to see the first and last (based on price) for each facet (the facet can be street, area, city, post code etc).
I can solve this by calling SOLR multiple times for each facet, but it seems like something that should be built in and if so, it would reduce roundtrips a LOT. (it would mean probably 2 SOLR calls per page instead of 30 or possibly more)

Instead of faceting, you can look into
https://wiki.apache.org/solr/FieldCollapsing
Then you need to do only two queries with group.sort ASC or DESC on the field by which you want to sort.

Related

Solr: ordering by a multi valued field

I need to create a new collection on my Solr 6.1.0 cluster where every row is a content and every content can belong to one or many categories, which are specified in a multivalued field categories.
In my web app the user can search by categories, and if wanted it can even group results by category. If it wants to order by category, what about the contents which belong to more than one category?
In this case, the search results page should show the same content more times in different categories. I don't want the web application to filter and order results because in this case, it should ask Solr for every row (I know this is not advised for bad performance), so is there a way to let Solr make this? For example, repeating the same content in two categories if a flag is enabled or if I am asking Solr to sort by category?
Until now I bypassed the problem cloning one record for every category and specifying the category ID in a single int field. But this is not optimized, because in this case my index is much bigger than it could be, and every content metadata a part of category is just the same for every content, and because of this I would like to have 1 content = 1 Solr record.

Solr faceted search building widgets

We want to build a faceted search within our application. For example, if we have quantity field whose values range from 1-20 for 2000 records. We need to allow the user to filter by those values.
To, accomplish this we are planning to extract the quantity field sort, eliminate duplicate records and build a widget on the left hand side of the screen, so the user can select what we need.
Is there a way to get this faceted criteria from Solr or any better way to implement it.
This is what Solr calls a Facet, and is enabled using facet=true
&facet=true&facet.field=quantity
.. will give you a facet entry back in the response, containing a count for each unique value in the quantity field. When the user clicks a quantity link, apply a fq for that particular quantity value, such as fq=quantity:4.
You can use facet.sort to determine if the facet should be sorted by hits (most popular quantity first) or alphabetical.
Multi-Select Facets and Local Params might also be useful, if you want to still show the original counts while allowing the user to drill down into the selection when applying an fq with the selected quantity as a criteria.

How to approach a hierarchical search in Solr?

I'm trying to understand how to approach search requirements I have.
The first one is a normal product search that I know Solr can handle appropriately, where you search for a term and Solr returns relevant documents.
The second one is a search for products within a certain category. I have a hierarchical structure in my database that consists in a category with many subcategories and those have products.
The thing is, when some very specific words are searched for, the first approach shouldn't be used, instead a search for a category should be done and only products within that category should be returned, which for me is a very basic SQL query (select * from products where categoryId = 1000).
Does Solr should or can be used in the second case? If so, what is the normal approach to use?
Besides what #Mysterion proposed of filter queries you should take a look at Solr Facets which gives you very powerful catogory-like searching.
You also might want to consider multivalue field for categoryParentIds which will contain the parent categories that the product is in and thus combined with filter query and or facets will get your parent category searching.
Yes, you could use similar approach in Solr, by attributing your products with categoryId and later, while searching add filter query similiar to SQL, categoryId:10000
For more info about filter query, take a look here - http://wiki.apache.org/solr/CommonQueryParameters#fq

Solr: one of each in all categories

I have product index at solr, product has category field and I need to select one product (better would be random) from each category, how query would look like?
if you are looking for sql group by feature,
with solr 3.3 on-wards,
it has the similar feature called FieldCollapsing
Field Collapsing collapses a group of results with the same field value down to a single (or fixed number) of entries. For example, most search engines such as Google collapse on site so only one or two entries are shown, along with a link to click to see more results from that site. Field collapsing can also be used to suppress duplicate documents.

Index document "linked" to multiple users

Hi I want to index a Solr Document and tag the document with multiple associated users. I want to enable searches like "give me the documents assocaited with userid 1000,1003...9300 containing the word X. More people will be added to the document during the lifetime of the document. I want to potentially associate thousands of users to one document. There is no need to show the associated users in the results, just for search, will indexing of userid or username be more performant and scalable. What field type would be more performant and scalable, appending to a text field, a multivalued field or any other approach?
I believe that using the userid (as an integer) would be the most performant. (At least from my experience so far). Also, using a multivalued field will allow you to use a filter query on the userid field to help improve the query response time.

Resources