Facet groups in Solr - is this possible in one query - solr

I'm working with facets in Solr and I have the concept of facet groups that each contain a number of facets.
Say I have a structure like this
Product Type
- Chairs (50)
- Tables (20)
- Mirrors (5)
Color
- Yellow (5)
- Black (50)
- Red (10)
- Orange (10)
I have an OR relationship between facets within a facet group and an AND relationship between the groups.
So if I choose Chairs as a facet I get 50 products. Using the standard faceting in Solr (and assuming that each product can have exactly one product type and one color) it will now give:
Product Type
- Chairs (50)
- Tables (0)
- Mirrors (0)
Color
- Yellow (5)
- Black (30)
- Red (5)
- Orange (10)
However, what I really want is for the facet counts within Product Type to stay the same, since that would reflect what would happen if one of the other values were chosen.
Can this be done with Solr in one query?

This can be implemented by tagging filters and then excluding them when computing the facet.
From the referenced page:
To implement a multi-select facet for doctype, a GUI may want to still display the other doctype values and their associated counts, as if the doctype:pdf constraint had not yet been applied. Example:
=== Document Type ===
[ ] Word (42)
[x] PDF (96)
[ ] Excel(11)
[ ] HTML (63)
To return counts for doctype values that are currently not selected, tag filters that directly constrain doctype, and exclude those filters when faceting on doctype.
q=mainquery&fq=status:public&fq={!tag=dt}doctype:pdf&facet=on&facet.field={!ex=dt}doctype
Filter exclusion is supported for all types of facets. Both the tag and ex local params may specify multiple values by separating them with commas.
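Applied to the Product Type / Color example from the question, the request could be assembled roughly as in the sketch below. The field names product_type and color, the tag names pt and col, and the helper build_params are assumptions made for illustration; only the {!tag=...} and {!ex=...} local params come from the quoted documentation.

```python
from urllib.parse import urlencode

# Hypothetical mapping of Solr field name -> tag used for filter exclusion.
TAGS = {"product_type": "pt", "color": "col"}

def build_params(selections, query="*:*"):
    """Build Solr params for multi-select faceting.

    selections maps a field name to the list of chosen values, e.g.
    {"product_type": ["Chairs"]}. Values within one field are ORed;
    separate fq parameters are ANDed by Solr.
    """
    params = [("q", query), ("facet", "on")]
    for field, tag in TAGS.items():
        chosen = selections.get(field, [])
        if chosen:
            ored = " OR ".join('"%s"' % value for value in chosen)
            # Tag the filter so the matching facet can ignore it.
            params.append(("fq", "{!tag=%s}%s:(%s)" % (tag, field, ored)))
        # Exclude the field's own filter when counting its facet values.
        params.append(("facet.field", "{!ex=%s}%s" % (tag, field)))
    return params

params = build_params({"product_type": ["Chairs"]})
print(urlencode(params))
```

With Chairs selected, the Product Type counts are computed as if the pt filter were absent, while the Color counts still respect it.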

Related

Splitting data, indexing and querying

I have a table in the DB with 2 columns, id and detail.
The id column has unique ids and the detail column has data like below -
A 20% B 30% C 50%
B 50% D 50%
X 10% A 40% Z 50%
I can do nothing about the way it is in the DB.
I want to let my users search using the following queries -
A < 20. - meaning all documents where A is less than 20%.
B > 30, X > 5%. - meaning all documents where B is greater than 30 and X is greater than 5.
I am unable to figure out the combination of tokenizer, filter to get this going.
What I have done is find the total number of unique types (A, B, C, ...) and create that many fields in the Solr schema: typeCode1 for A, typeCode2 for B, etc., with the corresponding values in typeValue1, typeValue2, etc. If A is not available for a document, then typeCode1 is null and so is its typeValue1 field. I also have a mapping table in the DB where I look up the type entered by the user to get the corresponding Solr field, and then search on it.
EDIT - Adding a few more details
The data from the DB is fetched. Let us say it is A 20% D 30% C 50%.
Then I split on the basis of %<space> (String.split("% ")). So I have 3 rows in my array.
Then I check the mapping of the type in the DB to find out which Solr field name corresponds to which type.
Once I have the field then I submit A to typeCode1 and 20 to typeValue1, D to typeCode4 and 30 to typeValue4 and so on.
Currently the total number of unique types I have is 45, however, it can increase and my current approach is not scalable.
One possible solution is to add a dynamic field for each typeCode, such as A_code with 20 as the value. That will allow you to use the field as you'd use any field in Solr, and query it using intervals, above/under, do faceting on the field etc.
<dynamicField name="*_code" type="int" indexed="true" stored="true" />
The only "real" downside is that your cache size will grow, since you'll get one internal cache per field. This cache will be sized according to the total number of documents in the index. For a small index like the one you describe and with only 45 different field names that shouldn't be an issue.
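A minimal sketch of how the detail column could be mapped onto such dynamic fields and queried is shown below. The helper names parse_detail and range_clause and the document id are made up for illustration, and the mixed inclusive/exclusive range brackets assume a Solr version that accepts them (4.0 or later, as far as I know).

```python
import re

def parse_detail(detail):
    """Turn 'A 20% B 30% C 50%' into {'A_code': 20, 'B_code': 30, 'C_code': 50}."""
    return {name + "_code": int(value)
            for name, value in re.findall(r"(\w+)\s+(\d+)%", detail)}

def range_clause(type_code, op, value):
    """Translate user input such as 'A < 20' into a Solr range clause."""
    field = type_code + "_code"
    if op == "<":
        return "%s:[* TO %d}" % (field, value)  # exclusive upper bound
    if op == ">":
        return "%s:{%d TO *]" % (field, value)  # exclusive lower bound
    raise ValueError("unsupported operator: " + op)

# Document to index: one dynamic *_code field per type present in the row.
doc = {"id": "1", **parse_detail("A 20% B 30% C 50%")}

# Query "B > 30, X > 5": both conditions must hold.
fq = " AND ".join([range_clause("B", ">", 30), range_clause("X", ">", 5)])
print(doc)
print(fq)
```

New types need no schema change and no mapping table, because the field name is derived from the type itself.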

Solr hierarchical facets: how to get all 2nd-level values for the top N 1st-level values

I have a pair of multi-valued index fields, author and author_norm, and I've created a hierarchical facet field for them using the pattern described at https://wiki.apache.org/solr/HierarchicalFaceting#Indexed_Terms. The facet values look like this:
0/Blow, J
1/Blow, J/Blow, Joe
1/Blow, J/Blow, Joseph
1/Blow, J/Blow, Jennifer
0/Smith, M
1/Smith, M/Smith, Michelle
1/Smith, M/Smith, Michael
1/Smith, M/Smith, Mike
Authors are associated to article records, and in most cases an article will have many authors. This means that for a Solr query that returns 100+ articles, there will potentially be 1000+ authors represented.
My problem is that when I go to display this hierarchy to the user, due to my facet.limit and facet.mincount being set to sane values, I do not have the complete set of 2nd-level values, i.e., the 2nd level of my hierarchy will be cut-off at a certain point. I will have something like this:
Blow, J (30)
Blow, Joe (17)
Blow, Joseph (9)
Smith, M (22)
Smith, Michelle (14)
Smith, Michael (6)
I would like to also have the "Blow, Jennifer (4)" and "Smith, Mike (2)" entries in this list, but they're not returned in the response because mincount cutoff is 5. So I end up with a confusing display (17 + 9 != 30, etc).
One option would be to put a little "(more)" link at the bottom of every 2nd-level list and fetch the full set via ajax. I'm not crazy about this solution because it's asking users to work/click more than they really should have to, and also because I can't control the length of the initial 2nd-level listing; sometimes it'll be 3 names + "(more)", sometimes 2 or even 1. That's just ugly.
I could set mincount=1 and limit=-1 for just my hierarchical facet field, but that would be nuts because for a large query (100k hits) I would be fetching 100k+ values I don't need. I only need the full set of 2nd-level values for the top N 1st-level values.
So unless someone has a better suggestion, I'm assuming I need to do some kind of follow-up query. So after all that, here's what I'm really asking: is there a way to fetch these 2nd-level values in a single follow-up query? Given an initial Solr response, how can I get all the 2nd-level permutations of just the top N 1st-level values of my hierarchy?
Thanks!
PS, I'm using Solr 4.0.
You can override facet parameters such as the limit per field, for any level of a pivot:
facet.pivot=fieldA,fieldB&f.fieldA.facet.limit=3&f.fieldB.facet.limit=-1
Problems start when both fields are the same (facet.pivot=fieldA,fieldA); in that case I would probably create a copy of fieldA as fieldB.
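As a rough sketch of the pivot approach, the request parameters and the handling of the nested response could look like this. It uses the per-field f.<field>.facet.limit spelling; the helper names are made up, and the sample response is hand-written for illustration (reusing the counts from the question), not real Solr output.

```python
def pivot_params(parent, child, top_n):
    """Params for a two-level pivot: top N parents, all children of each."""
    return {
        "facet": "on",
        "facet.pivot": "%s,%s" % (parent, child),
        "f.%s.facet.limit" % parent: top_n,  # only the top N first-level values
        "f.%s.facet.limit" % child: -1,      # every second-level value below them
        "facet.pivot.mincount": 1,
    }

def flatten_pivot(facet_pivot, key):
    """Yield (parent_value, parent_count, [(child_value, child_count), ...])."""
    for node in facet_pivot.get(key, []):
        children = [(c["value"], c["count"]) for c in node.get("pivot", [])]
        yield node["value"], node["count"], children

# Hand-written sample shaped like the facet_pivot section of a response.
sample = {"fieldA,fieldB": [
    {"field": "fieldA", "value": "Blow, J", "count": 30, "pivot": [
        {"field": "fieldB", "value": "Blow, Joe", "count": 17},
        {"field": "fieldB", "value": "Blow, Joseph", "count": 9},
        {"field": "fieldB", "value": "Blow, Jennifer", "count": 4},
    ]},
]}

rows = list(flatten_pivot(sample, "fieldA,fieldB"))
for parent, count, children in rows:
    print(parent, count, children)
```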

Field collapsing / Grouping - How to make SOLR return intersection of 2 resultset groups?

I have 2 fields for grouping; these 2 fields can have different keywords stored in them.
Ex:
Field1: CD, book, e-book
Field2: repo1, repo2, repo3, repo4
Now I want to group on the combinations CD/repo1, book/repo2, e-book/repo3, e-book/repo4, CD/repo4 rather than grouping on field1 and field2 separately, i.e. I need to group based on 2 grouped results (the intersection between the grouped results). Is there a way I can make Solr return group results for all combinations?
Thanks.
BB
I don't think you can get the intersection of grouped results at query time.
The other solution would be to write the combination into a single field at index time and group on that field, which would give you the results you want.
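A minimal sketch of that index-time combination follows. It assumes both fields are single-valued per document; the combined field name field1_field2, the "/" separator, and the helper add_group_key are choices made for illustration.

```python
def add_group_key(doc, fields=("field1", "field2"), target="field1_field2"):
    """Return a copy of doc with the combined grouping field added."""
    combined = dict(doc)
    combined[target] = "/".join(str(doc[name]) for name in fields)
    return combined

# Prepare the document before sending it to Solr.
doc = add_group_key({"id": "42", "field1": "CD", "field2": "repo1"})

# At query time, group on the combined field only.
group_params = {"group": "true", "group.field": "field1_field2"}
print(doc["field1_field2"])
```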

Solr Search Facets: How do i make them count products and NOT product varieties

The shop I'm working on sells clothing. Each item of clothing comes in multiple varieties. For example, Shirt A might come in: Red Large, Red Medium, Blue Large, Blue Medium, White Large, and White Medium.
At first I had added each variety as a solr doc. So for the above product I added 6 solr docs, each with the same Product ID. I got solr to group the results by Product ID and everything worked perfectly.
However the facet counts were all variety counts and not product counts. So for example .. just limiting it to the one product above - (if that were the only product in the system say).. the facet counts would show:
Red (2)
Blue (2)
White (2)
Which was correct, there were 2 documents added for each color. But really what i want to see is this:
Red (1)
Blue (1)
White (1)
As there is only 1 product for each color.
So now I'm thinking that in order to do that I need to make each Solr document a product.
In that case I would add the product, add the field "color" 3 times (one red, one blue, one white), and add the field "size" 3 times as well. But now Solr doesn't really know which size goes with which color. Maybe I only have white in small.
What is the correct way to go about this to make the facet counts as they should be?
It turns out I could do this using grouping (field collapsing), documented here:
http://wiki.apache.org/solr/FieldCollapsing#Request_Parameters
Specifically, I added these parameters to the query:
group=true
group.field=product_id
group.limit=100
group.facet=true
group.ngroups=true
group.facet is the one that really makes the facets work with the groups the way I wanted.
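Put together with ordinary facet parameters, the request might be assembled as in this sketch. The q value and the facet fields color and size are assumptions based on the question; the grouping parameters are the ones listed above.

```python
from urllib.parse import urlencode

params = [
    ("q", "*:*"),
    ("group", "true"),
    ("group.field", "product_id"),  # one group per product
    ("group.limit", 100),           # up to 100 varieties returned per product
    ("group.facet", "true"),        # facet counts refer to groups, not documents
    ("group.ngroups", "true"),      # include the number of groups in the response
    ("facet", "true"),
    ("facet.field", "color"),
    ("facet.field", "size"),
]
query_string = urlencode(params)
print(query_string)
```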
I think that you have 2 options.
Option 1:
Once you get the list of facet values (Red, Blue & White in the given example), then fire the original query again with each facet value as a filter. For example, if the original query was q=xyz&group.field=ProductID then fire q=xyz&group.field=ProductID&group.ngroups=true&fq=color:Red. The ngroups value in the response will give you the required count for Red. Similarly, fire a separate query for Blue and White.
Option 2:
Create a separate field called Product_Color which includes both the ProductID and the color. For example, if a product has the ID ABC123 and the color Red, then Product_Color will be ABC123_Red. Now, to get the facets for color, fire a separate query which groups by Product_Color instead of ProductID and you will get the required facets with the correct values. Remember to set group.truncate=true for this to work.
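Option 1 could be sketched as follows: one parameter list per facet value, plus a helper that reads ngroups from each grouped response. The helper names are made up, group=true is added to the base query as an assumption, and the sample response is hand-written for illustration.

```python
def per_value_queries(base_params, facet_field, values):
    """Return {facet value: parameter list} with one follow-up query per value."""
    queries = {}
    for value in values:
        queries[value] = list(base_params) + [
            ("group.ngroups", "true"),
            ("fq", '%s:"%s"' % (facet_field, value)),
        ]
    return queries

def group_count(response, group_field):
    """Read the number of groups (products) from a grouped Solr response."""
    return response["grouped"][group_field]["ngroups"]

base = [("q", "xyz"), ("group", "true"), ("group.field", "ProductID")]
queries = per_value_queries(base, "color", ["Red", "Blue", "White"])

# Hand-written sample of the relevant part of one response.
sample_response = {"grouped": {"ProductID": {"matches": 2, "ngroups": 1, "groups": []}}}
print(queries["Red"])
print(group_count(sample_response, "ProductID"))
```

The cost is one extra request per facet value, so this only suits facets with a handful of values.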
You can try looking into Facet Pivot, which would allow you to keep a single document per product and get a tree-like facet with proper counts and filtering.

Grouping results and keeping facet counts consistent

Using Solr 3.3
Key  Store       Item Name  Description        Category          Price
=========================================================================
1    Store Name  Xbox 360   Nice game machine  Electronic Games  199.99
2    Store Name  Xbox 360   Nice game machine  Electronic Games  199.99
3    Store Name  Xbox 360   Nice game machine  Electronic Games  249.99
I have data similar to the table above loaded into Solr. Item Name, Description, Category, and Price are searchable.
Expected result
Facet Field
Category
Electronic(1)
Games(1)
**Store Name**
XBox 360 Nice game machine priced from 199.99 - 249.99
What query parameters can I send to Solr to receive the results above? Basically I want to group by Store, ItemName, and Description, and get the min and max price.
I also want to keep paging consistent with the main group (StoreName). The paging should be based on the Store Name group, so if 20 stores were found, I should be able to page through them correctly.
Please suggest
If you are using Solr 4.0, the new "Grouping" feature (which replaces FieldCollapsing) fixes this issue when you add the parameter "group.facet=true".
So to group your fields you would add the following parameters to your search request:
group=true // Enables grouping
group.facet=true // Facet counts to be number of groups instead of documents
group.field=Store // Groups results by the field "Store"
group.ngroups=true // Tells Solr to return the number of groups found
The number of groups found is what you would show to the user and use for paging, instead of the normal total count, which is the number of matching documents rather than the number of groups.
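Paging over groups could then be computed as in the sketch below. The helper names are made up and the sample response is hand-written for illustration; it assumes that start and rows apply to groups when grouping is enabled.

```python
import math

def paging_info(response, group_field, rows):
    """Derive paging figures from the grouped section of a Solr response."""
    grouped = response["grouped"][group_field]
    return {
        "groups": grouped["ngroups"],     # what to show and page on
        "documents": grouped["matches"],  # matching documents, not groups
        "pages": math.ceil(grouped["ngroups"] / rows),
    }

def page_params(page, rows):
    """start/rows for a 1-based page number; with grouping these count groups."""
    return {"start": (page - 1) * rows, "rows": rows}

# Hand-written sample: 20 stores found across 57 matching documents.
sample = {"grouped": {"Store": {"matches": 57, "ngroups": 20, "groups": []}}}
info = paging_info(sample, "Store", rows=10)
print(info, page_params(2, 10))
```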
Have you looked into field collapsing? It is new in Solr 3.3.
http://wiki.apache.org/solr/FieldCollapsing
What I did was create another field that combines the required fields into a single stored field. Now I group only on that field and I get the correct count. Problem solved.
