Solr facets: how to exclude only the main query? - solr

My Solr core "catalog" has these fields:
"category":yyyyy
"s_brand":zzzzz
"visibilty":hhhhh
"color":wwww
The starting point, which I call the "main query", is:
q=category:computer
So I only want computers in my results.
Then I facet on s_brand:
facet.field=s_brand&facet=on&q=category:computer
Here I get the list of computers and an "s_brand" facet with constraints like:
s_brand :
ACER (5)
APPLE(10)
HP(4)
BANG&OLUFSEN(0)
ADIDAS(0)
REEBOK(0)
This list includes every brand, not only the brands that match the main query category:computer.
So I set facet.mincount=1 on the s_brand facet:
s_brand :
ACER (5)
APPLE(10)
HP(4)
That's good. BUT!
Now I add another facet, color:
s_brand :
ACER (5)
APPLE(11)
HP(4)
color:
PINK(7)
BLUE(13)
Then I add a filter query on the brand (still with the main query category:computer):
fq=s_brand:ACER
and I get:
s_brand:
ACER(5)
color:
PINK(2)
BLUE(3)
I want to keep the other constraints visible (for UX reasons), so I exclude the filter from the facet:
fq={!tag=dt}s_brand:ACER&facet=true&facet.field={!ex=dt}s_brand
Results:
s_brand :
ACER (5)
APPLE(11)
HP(4)
color:
PINK(7)
BLUE(13)
So I get all the facet values that match my main query, but the counts don't reflect my filter query.
How can I get facets that:
1. respect my main query (only category computer),
2. return every facet value that matches this main query,
3. but return the counts after the filter query?
Like this:
s_brand :
ACER (5)
APPLE(0)
HP(0)
color:
PINK(2)
BLUE(3)
Edit: my current solution as of 11/10/2019:
Two facets on the same field: one with the exclusion, one without. Then I merge the results in PHP. I don't like it, but it works.
I found this: https://lucene.apache.org/solr/guide/7_5/json-facet-api.html
It covers buckets and sub-facets (nested facets).
The results are quite unusual and need heavy post-processing afterwards, but it is really powerful and seems to be the right way to go.
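The merge step described in the edit above could look roughly like the following. This is only a sketch in Python rather than PHP; the function name merge_facet_counts is made up for illustration, and it assumes the two facets have already been read out of the Solr response into plain dictionaries (for example by requesting the field twice under different keys, once with {!ex=dt} and once without).

```python
def merge_facet_counts(all_terms, filtered_counts):
    """Keep every term of the excluded facet, but take the count from the
    filtered facet; terms that the filter removed get a count of 0."""
    return {term: filtered_counts.get(term, 0) for term in all_terms}


# Facet computed with {!ex=dt}: every brand that matches the main query.
unfiltered = {"ACER": 5, "APPLE": 11, "HP": 4}
# Same field without the exclusion: counts after fq=s_brand:ACER.
filtered = {"ACER": 5}

merged = merge_facet_counts(unfiltered, filtered)
print(merged)  # {'ACER': 5, 'APPLE': 0, 'HP': 0}
```

The excluded facet only supplies the list of terms; its counts are discarded, so the same helper works for the color facet as well.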

Related

Solr faceting on a Query Function result

Is it possible to produce solr facets for a field which is the result of Query Function?
I have an index of products with a price field for each store they are available in:
{
"id" : "p1",
"name_s" : "Product 1",
"description_s" : "The first product",
"price_l1_d" : 19.99,
"price_l2_d" : 20.00,
"price_l3_d" : 20.99,
"price_l4_d" : 19.99,
"price_l5_d" : 25.00,
"price_l6_d" : 18.00
},
{
"id" : "p2",
"name_s" : "Product 2",
"description_s" : "The second product",
"price_l1_d" : 12.99,
"price_l2_d" : 15.00,
"price_l3_d" : 13.49,
"price_l4_d" : 14.00,
"price_l5_d" : 12.50,
"price_l6_d" : 16.00
}
and I need my query to return the cheapest price in the customer's 3 closest stores.
I know I can return this value using fl=min(price_l2_d, price_l4_d, price_l6_d) and I can even sort on this but is it possible to return a "Price" facet based on this value for each document? Ideally I'd like to be able to show all products whose minimum price (in my 3 stores) is between 0-5, 5-10, 10-15, 15-20 etc etc and filter on this.
I've tried using min(price_l2_d, price_l4_d, price_l6_d) as facet.field but I receive an undefined field error. Is there a better way?
I cannot produce this value at index time because the closest 3 stores could be any combination of three price fields (in this example there are 6, but there are likely to be over 200).
While not THE solution, I have found A solution which should work. Unfortunately it's not possible to create a traditional facet for price ranges as you would with a single integer attribute, but a two-point slider is possible.
Using the JSON facet API (as suggested by a comment on the original question) and the following:
{
"max" : "max(min(price_l2_d, price_l4_d, price_l6_d))",
"min" : "min(min(price_l2_d, price_l4_d, price_l6_d))"
}
I can return the boundaries of the slider with the smallest minimum price at the three stores and the biggest minimum price.
The values on this slider can then be applied using the {!frange} function as follows:
fq={!frange l=0 u=20}min(price_l2_d, price_l4_d, price_l6_d)
where l is the lower bound and u is the upper bound
Hopefully this helps anyone else looking for an answer to this.
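Since the three closest stores change per customer, the function expression and the filter have to be built at query time. A minimal sketch of that in Python, assuming the price_l<N>_d naming from the question; the helper names are hypothetical, and the resulting strings still need URL encoding before being sent to Solr.

```python
def min_price_expr(store_ids):
    """Build min(price_lA_d, price_lB_d, ...) for the given store ids."""
    fields = ", ".join(f"price_l{store}_d" for store in store_ids)
    return f"min({fields})"


def price_range_filter(store_ids, lower, upper):
    """Build the {!frange} filter query for the slider selection."""
    return f"{{!frange l={lower} u={upper}}}{min_price_expr(store_ids)}"


def slider_bounds_facet(store_ids):
    """Build the json.facet body that returns the slider boundaries."""
    expr = min_price_expr(store_ids)
    return {"min": f"min({expr})", "max": f"max({expr})"}


stores = [2, 4, 6]
print(price_range_filter(stores, 0, 20))
# {!frange l=0 u=20}min(price_l2_d, price_l4_d, price_l6_d)
print(slider_bounds_facet(stores))
```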

Solr facet sum instead of count

I'm new to Solr and I'm interested in implementing a special facet.
Sample documents:
{ hostname: google.com, time_spent: 100 }
{ hostname: facebook.com, time_spent: 10 }
{ hostname: google.com, time_spent: 30 }
{ hostname: reddit.com, time_spent: 20 }
...
I would like to return a facet with the following structure:
{ google.com: 130, reddit.com: 20, facebook.com: 10 }
Although solr return values are much more verbose than this, the important point is how the "counts" for the facets are the sum of the time_spent values for the documents rather than the actual count of the documents matching the facet.
Idea #1:
I could use a pivot:
q=*:*
&facet=true
&facet.pivot=hostname,time_spent
However, this returns the counts of all the unique time spent values for every unique hostname. I could sum this up in my application manually, but this seems wasteful.
Idea #2
I could use the stats module:
q=*:*
&stats=true
&stats.field=time_spent
&stats.facet=hostname
However, this has two issues. First, the returned results contain all the hostnames. This is really problematic as my dataset has over 1m hostnames. Further, the returned results are unsorted - I need to render the hostnames in order of descending total time spent.
Your help with this would be really appreciated!
Thanks!
With Solr >=5.1, this is possible:
Facet Sorting
The default sort for a field or terms facet is by bucket count
descending. We can optionally sort ascending or descending by any
facet function that appears in each bucket. For example, if we wanted
to find the top buckets by average price, then we would add sort:"x
desc" to the previous facet request:
$ curl http://localhost:8983/solr/query -d 'q=*:*&
json.facet={
categories:{
type : terms,
field : cat,
sort : "x desc", // can also use sort:{x:desc}
facet:{
x : "avg(price)",
y : "sum(price)"
}
}
}
'
See Yonik's Blog: http://yonik.com/solr-facet-functions/
For your use case this would be:
json.facet={
hostname_time:{
type: terms,
field: hostname,
sort: "time_total desc",
facet:{
time_total: "sum(time_spent)"
}
}
}
Calling sum() in nested facets worked for us only in 6.3.0.
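To get from that request to the compact { hostname: total } structure asked for in the question, a client could build the json.facet parameter and flatten the returned buckets. The sketch below is an assumption-laden illustration: the helper names are hypothetical, the limit value is arbitrary, and sample_response is hand-written in the shape the JSON Facet API uses (facets, then the facet name, then buckets), not captured from a real server.

```python
import json


def sum_facet_request(group_field, sum_field, limit=10):
    """Terms facet on group_field, sorted by the summed sum_field."""
    return {
        "hostname_time": {
            "type": "terms",
            "field": group_field,
            "limit": limit,  # only the top buckets, not all hostnames
            "sort": "time_total desc",
            "facet": {"time_total": f"sum({sum_field})"},
        }
    }


def totals_from_response(response):
    """Flatten the facet buckets into {value: summed total}."""
    buckets = response["facets"]["hostname_time"]["buckets"]
    return {bucket["val"]: bucket["time_total"] for bucket in buckets}


# Request parameters; rows=0 because only the facet is of interest.
params = {
    "q": "*:*",
    "rows": 0,
    "json.facet": json.dumps(sum_facet_request("hostname", "time_spent")),
}

# Hand-written example response for the four sample documents.
sample_response = {
    "facets": {
        "count": 4,
        "hostname_time": {
            "buckets": [
                {"val": "google.com", "count": 2, "time_total": 130.0},
                {"val": "reddit.com", "count": 1, "time_total": 20.0},
                {"val": "facebook.com", "count": 1, "time_total": 10.0},
            ]
        },
    }
}

print(totals_from_response(sample_response))
```

Because the sort happens inside Solr, the limit keeps the response small even with more than a million distinct hostnames.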
I believe what you are looking for is an aggregation component, but be aware that Solr is a full-text search engine and not a database.
So the answer to your question is: go with idea #1. Otherwise you should use Elasticsearch, MongoDB, or even Redis, which are equipped with such aggregation components.

Retrieving distinct documents from Solr

I've had a hard time explaining and finding what I need, so please put yourself in my shoes for a moment.
My requirement comes from a relational database background. I may be using Solr to do something it wasn't designed to do, or maybe it can do what I need; I still need to confirm that. Hopefully you can assist me.
After indexing numerous documents into Solr, I need to retrieve distinct documents based on a filter. Just think of it as retrieving distinct rows while also applying a WHERE condition.
For example, in a relational database, I may have the following columns
(Country)  (City)     (Whatever)
Egypt      Cairo      Hospitals
Egypt      Alex       Schools
Egypt      Mansoura   Hospitals
Egypt      Cairo      Schools
If I perform this query: SELECT DISTINCT Country, City FROM mytable
I should get the following rows
(Country)  (City)
Egypt      Alex
Egypt      Mansoura
Egypt      Cairo
Now after indexing the original table (SELECT * FROM mytable), how can I achieve the SAME output from Solr ? How can I retrieve documents by saying that I need these documents to be distinct based on some fields ? I will also need to apply a not null filter for a specific field.
I don't need statistics of any kind, I only need to get the documents.
I hope I was clear enough. Thank you for your time.
This would be achievable with field collapsing by grouping on multiple fields, but unfortunately only one field is supported right now. There is an open issue; check it out.
Did you try faceting?
You should do something like this:
http://localhost:8983/solr/select/?q=*:*&facet=on&facet.field=city&facet.field=country
It will return all the distinct cities and countries along with their counts.
The Solr wiki has more details if you want to learn more about faceting.
I hope this helps.
Another good solution available from Solr 4 is based on Pivot (Decision Tree) Faceting.
Try with:
/solr/collection1/select?q=*:*&facet=true&facet.pivot=Country,City
This should return:
"facet_counts" : {
"facet_queries" : {},
"facet_fields" : {},
"facet_dates" : {},
"facet_ranges" : {},
"facet_pivot" : {
"Country,City" : [ {
"field" : "Country",
"value" : "Egypt",
"count" : 4,
"pivot" : [ {
"field" : "City",
"value" : "Cairo",
"count" : 2
}, {
"field" : "City",
"value" : "Alex",
"count" : 1
}, {
"field" : "City",
"value" : "Mansoura",
"count" : 1
} ]
} ]
}
}
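Turning the pivot response into the DISTINCT Country, City rows from the question is then a small client-side step. A rough sketch in Python, assuming the response has been parsed into a dictionary shaped like the output above; the function name distinct_pairs is made up for this example.

```python
def distinct_pairs(facet_counts, pivot_key="Country,City"):
    """Flatten a two-level pivot facet into (outer, inner) value pairs."""
    pairs = []
    for outer in facet_counts["facet_pivot"][pivot_key]:
        # Each outer bucket carries its nested buckets under "pivot".
        for inner in outer.get("pivot", []):
            pairs.append((outer["value"], inner["value"]))
    return pairs


# The facet_counts section shown above, trimmed to the pivot part.
facet_counts = {
    "facet_pivot": {
        "Country,City": [
            {
                "field": "Country",
                "value": "Egypt",
                "count": 4,
                "pivot": [
                    {"field": "City", "value": "Cairo", "count": 2},
                    {"field": "City", "value": "Alex", "count": 1},
                    {"field": "City", "value": "Mansoura", "count": 1},
                ],
            }
        ]
    }
}

print(distinct_pairs(facet_counts))
# [('Egypt', 'Cairo'), ('Egypt', 'Alex'), ('Egypt', 'Mansoura')]
```

Any fq added to the request acts as the WHERE condition, since pivot facets are computed over the filtered result set.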

Treat two facets as the same value

Assume a list of books with an Author field. How might one facet on the Author field, but treat the values "Stephen King" and "Richard Bachman" as the same? So that these results:
Hemmingway: 8
Stephen King: 10
Edgar Allan Poe: 20
Richard Bachman: 5
Would be displayed as:
Hemmingway: 8
Stephen King: 15
Edgar Allan Poe: 20
Note that it is unimportant if the facet title is "Stephen King", "Richard Bachman", or something else. It is only important that they are faceted together.
Note that a query-time solution is needed. Unfortunately the schema cannot be changed for this index, it is a general-purpose index and if every user could make his own schema 'tweak' it would get out of hand.
You can achieve that by combining facet fields with facet queries.
Add these to your query:
&facet=true
&facet.field=author
&facet.query=author:("Stephen King" OR "Richard Bachman")
Facets returned will look like this:
facet_counts: {
  facet_queries: {
    "author:(\"Stephen King\" OR \"Richard Bachman\")" : 15
  },
  facet_fields: {
    author: {
      "Hemmingway" : 8,
      "Stephen King" : 10,
      "Edgar Allan Poe" : 20,
      "Richard Bachman" : 5
    }
  }
}
You can also give the facet query an alias with the key local parameter. Change this
&facet.query=author:("Stephen King" OR "Richard Bachman")
to
&facet.query={!key="Stephen King"}author:("Stephen King" OR "Richard Bachman")
and the facet query output will be:
facet_queries: {
  "Stephen King" : 15
}
I'm not sure whether Solr can merge the two output sections (facet_queries and facet_fields) itself, but doing that in any client should be straightforward.
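The client-side merge mentioned above could look roughly like this in Python. It is a sketch under assumptions: the aliases mapping and the function name merge_aliases are made up for illustration, and the author counts are the ones from the question, already read out of facet_fields into a dictionary.

```python
def merge_aliases(counts, aliases):
    """Sum the facet counts of values that should be treated as one.

    aliases maps an alternative value to the name it is displayed under.
    """
    merged = {}
    for value, count in counts.items():
        canonical = aliases.get(value, value)
        merged[canonical] = merged.get(canonical, 0) + count
    return merged


author_counts = {
    "Hemmingway": 8,
    "Stephen King": 10,
    "Edgar Allan Poe": 20,
    "Richard Bachman": 5,
}
aliases = {"Richard Bachman": "Stephen King"}

print(merge_aliases(author_counts, aliases))
# {'Hemmingway': 8, 'Stephen King': 15, 'Edgar Allan Poe': 20}
```

Summing the two counts double-counts any book that lists both names in a multi-valued author field; the count returned by a facet query does not have that problem, so it can be used to overwrite the summed value when it is available.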
You need an analysis chain that converts the strings. I think SynonymFilter will do this for you if you apply it at index time and at query time. You would need to make sure the synonym mapping goes one way only.
I assume you do not need the whole list of facets, just the top n authors. If this is the case, you can do it in a post-processing step.
You know your synonyms, so if you request a slightly higher facet.limit (let's say 2*n), you just have to filter out the synonyms from the result set. If you end up with fewer than n results, repeat the previous step with a higher limit (in the worst case you need one or more extra requests, depending on the number of synonyms).
For example: ...&facet=true&facet.field=author&facet.limit=100&facet.mincount=1
This one has nothing to do with Solr, but considering all the restrictions it might just cut it.
Best regards,

Parameter bq modify facet counts using grouping

I am using solr trunk to search some documents and group them by their category, but I have to group them first by another field. More specifically I am using this schema:
component_id: string
category: string
name: text
And I have two documents:
component_id = register1, category = category1, name='foo bar'
component_id = register1, category = category2, name='foo bar zoo'
My query is (only relevant parameters):
q={!edismax qf=name}(foo bar)&group.field=component_id&group.truncate=true&facet.field=category&bq=category:category1^2
And the facet results are:
'category':
'category1', 1
'category2',1
BUT when I change the bq parameter, for example to bq=category:category1^20,
the facet results change:
'category':
'category1', 1
'category2', 0
Is that possible? Is it a bug? If I set group.truncate=false everything is fine for this example, but it fails for the rest of my queries.
Thanks & regards
Answering my own question.
group.truncate is the correct option when your data is uniform or when your groups contain similar objects, but it has problems when mixing data from different categories.
With group.truncate=true: |A ∪ B| ≠ |A| + |B| - |A ∩ B|
Everything is OK with the bq parameter.

Resources