I am using solr trunk to search some documents and group them by their category, but I have to group them first by another field. More specifically I am using this schema:
component_id: string
category: string
name: text
And I have two documents:
component_id = register1, category = category1, name='foo bar'
component_id = register1, category = category2, name='foo bar zoo'
My query is (only relevant parameters):
{!edismax qf=name}(foo bar)&group.field=component_id&group.truncate=true&facet.field=category&bq=category:category1^2
And the facet results are:
'category':
'category1', 1
'category2', 1
But when I change the bq parameter, for example to bq=category:category1^20, the facet results change:
'category':
'category1', 1
'category2', 0
Is that possible? Is it a bug? If I set group.truncate=false everything is fine for this example, but it fails for the rest of the queries.
Thanks & regards
Answering my own question.
group.truncate is the correct option when your data is uniform or when your groups contain similar objects, but it has problems when mixing data from different categories.
With group.truncate=true, |A ∪ B| ≠ |A| + |B| - |A ∩ B|
Everything is OK with the bq parameter.
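To illustrate why the counts can shift, here is a minimal Python sketch. It assumes group.truncate=true bases facet counts only on the most relevant document of each group, and it uses made-up scores (in Solr they would come from the edismax query plus bq). The function name facet_counts is hypothetical, not a Solr API.

```python
from collections import Counter

# Hypothetical scores; the real ones depend on the query and the bq boost.
docs = [
    {"component_id": "register1", "category": "category1", "score": 2.0},
    {"component_id": "register1", "category": "category2", "score": 1.5},
]


def facet_counts(docs, truncate):
    """Count categories, optionally keeping only the best document per group."""
    if truncate:
        best = {}
        for doc in docs:
            current = best.get(doc["component_id"])
            if current is None or doc["score"] > current["score"]:
                best[doc["component_id"]] = doc
        docs = list(best.values())
    return Counter(doc["category"] for doc in docs)


print(facet_counts(docs, truncate=True))
print(facet_counts(docs, truncate=False))
```

Under that assumption, both documents share one component_id, so only the higher-scoring one contributes to the facet when truncation is on.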
In a collection there are several different categories of documents. I want the highest ranked search results to be the documents from categories where, for the specific query, there are fewest matching documents.
Concrete example
Let the categories be "foo", "bar", and "baz". If I were to search for "Fred", faceted by category, I would get back the following counts:
foo: 17
bar: 1
baz: 201312
I want to construct a search and/or configure the index such that the one match from the "bar" category would be top of the search results, the 17 "foo" matches would be next, and finally the "baz" matches.
One way I think I could do this would be first to do a faceted search to get the count of matching documents in each category, and then do a second search with boosts based on the category counts - something along the lines of bq=category:bar^10000&bq=category:foo^100; the boosts of 10000 and 100 would obviously be derived from the facet counts and inserted into the query.
I would like to know if something roughly equivalent to this could be achieved in a more efficient way using only a single query, i.e. avoiding the need for a pre-query to fetch the facet counts.
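The two-step approach described above could be sketched as follows in Python. This only covers the second step (turning facet counts into bq values). The inverse-count boost formula and the helper names boosts_from_counts and bq_params are assumptions for illustration; the question does not specify how the boosts are derived.

```python
def boosts_from_counts(counts):
    """Map each category to a boost that grows as its match count shrinks."""
    largest = max(counts.values())
    # Assumed formula: the rarest category gets the largest boost.
    return {cat: largest / n for cat, n in counts.items() if n > 0}


def bq_params(field, boosts):
    """Build one bq value per category, rarest category first."""
    ordered = sorted(boosts.items(), key=lambda kv: kv[1], reverse=True)
    return [f"{field}:{cat}^{boost:.2f}" for cat, boost in ordered]


# Facet counts from the pre-query in the example above.
counts = {"foo": 17, "bar": 1, "baz": 201312}
params = bq_params("category", boosts_from_counts(counts))
for p in params:
    print("bq=" + p)
```

Because bq adds to the score rather than replacing it, whether these boosts fully dominate the text relevance would have to be checked against real scores.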
Here is a simplified example.
I have two fields: title and keywords.
I am using edismax with the following parameter
qf: title + keywords^2
Previously, it was working fine. I have about 15M records indexed in solr. All records have non-empty title. Most records HAD non-empty keywords.
But recently, we decided to remove keywords for most records. As a result, we currently only have 1 record (out of 15M records) that has non-empty keywords.
Unfortunately, as a result of that, the keywords^2 boost specified in qf does not seem to work any more.
For that record, we have title, say, "good store", and keywords, say, "pants clothing". Now, if I search for 'good store pants', the solr matching score is exactly the same regardless of whether I use qf: title or qf: title keywords^2.5. (Again, I think it worked before when most records have non-empty keywords since the solr matching scores are different for the above comparison.)
Answering my own question.
There is only one record that has non-empty keywords. Based on the IDF formula used by Solr, the IDF value for that field is then smaller than 1 (0.2876821 in the output below, with docFreq=1 and docCount=1). Therefore, boosting it by ^2 does not help at all.
So, I think the "solution" is to add more records with non-empty keywords. Of course, this is not a real solution.
See the following output from debugQuery.
0.84748024 = weight(keywords:good in 4161) [], result of:
  0.84748024 = score(doc=4161,freq=1.0 = termFreq=1.0
), product of:
    3.0 = boost
    0.2876821 = idf(docFreq=1, docCount=1)
    0.9819638 = tfNorm, computed from:
      1.0 = termFreq=1.0
      1.2 = parameter k1
      0.75 = parameter b
      5.0 = avgFieldLength
      5.2244897 = fieldLength
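The numbers in this output can be reproduced by hand. The Python sketch below assumes the BM25 variant that produces this explain layout (an idf of ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) and a tfNorm that includes the (k1 + 1) factor). The function names are mine, not Lucene's.

```python
import math


def bm25_idf(doc_freq, doc_count):
    """BM25 idf as assumed above; small when the field has few documents."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf_norm(freq, k1, b, field_length, avg_field_length):
    """BM25 term-frequency normalisation with length normalisation."""
    length_norm = 1 - b + b * field_length / avg_field_length
    return freq * (k1 + 1) / (freq + k1 * length_norm)


# Values copied from the debugQuery output.
idf = bm25_idf(doc_freq=1, doc_count=1)
tf_norm = bm25_tf_norm(1.0, 1.2, 0.75, 5.2244897, 5.0)
score = 3.0 * idf * tf_norm

print(idf, tf_norm, score)
```

With docFreq=1 and docCount=1 the idf is ln(4/3), which is why the field contributes so little even with a boost of 3.0.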
I have a Solr index with a multi-valued field, let's call it mvfield. It can contain arbitrary values, even though currently it is a finite set of values.
I want to find documents which contain only certain values in this field. Example:
doc1: mvfield = [a,b,c,d,e,f]
doc2: mvfield = [e,f]
doc3: mvfield = [f]
doc4: mvfield = [a,b,c,e,f]
doc5: mvfield = [e]
I want to create a query which returns documents which contain only e or f in mvfield, so in this example it should give doc2, doc3 and doc5.
I found a crude workaround using ranges:
-mvfield:[* TO e} AND -mvfield:{e TO f} AND -mvfield:{f TO *]
but it seems very fragile. Is there any better way to do this?
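For reference, the range workaround above can be generated for any set of allowed values. This Python sketch only builds the query string; the helper name only_values_query is hypothetical. It assumes a non-empty list of plain tokens that need no escaping, and that Python's string ordering matches the ordering Solr uses for the field.

```python
def only_values_query(field, allowed):
    """Exclude every range that lies before, between, or after the allowed values."""
    values = sorted(set(allowed))
    clauses = [f"-{field}:[* TO {values[0]}}}"]
    for low, high in zip(values, values[1:]):
        clauses.append(f"-{field}:{{{low} TO {high}}}")
    clauses.append(f"-{field}:{{{values[-1]} TO *]")
    return " AND ".join(clauses)


print(only_values_query("mvfield", ["e", "f"]))
```

Note that a query made only of exclusions would also match documents that have no value in the field at all.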
Why not just use a filter query?
&fq=mvfield:e and so on.
I am using Haystack and Solr, and I am trying to implement faceted search on one field for multiple values. For example, I am faceting on the "author" field.
john 3
kevin 2
sam 2
I want to facet on "john" OR "sam". How can I format the URL for it?
http://localhost:8000/search/?q=*&selected_facets=author_exact:john +OR+ selected_facets=author_exact:sam
If you want to limit the resulting set of documents to those containing either john or sam, use a fq:
fq=author:sam OR author:john
If you want to only generate facets on certain values or queries, use facet.query:
facet.query=author:sam OR author:john
You will have to use OR with narrow() in your view/form (the exact implementation depends on which view/form you are using).
Since getting the list of selected_facets simply involves:
self.request.GET.getlist('selected_facets')
How you wish to implement that in your url is solely up to you:
You could use some kind of separator and then split the values apart:
localhost:8000/search/?q=*&selected_facets=author_exact:john|sam
    for x in selected_facets:
        field_name, value = x.split(':', 1)
        if "|" not in value:
            continue
        # split the value part, not the whole "field:value" string
        values = value.split('|')
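You could also repeat the parameter:
localhost:8000/search/?q=*&selected_facets=author_exact:john&selected_facets=author_exact:sam
    from collections import defaultdict

    # defaultdict avoids a KeyError on the first append for each field
    facet_dict = defaultdict(list)
    for x in selected_facets:
        field_name, value = x.split(':', 1)
        facet_dict[field_name].append(value)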
Then in haystack:
sqs.narrow('author_exact:(john OR sam)')
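Putting the two URL styles together, a helper along these lines could turn the selected_facets list into the strings passed to narrow(). This is a minimal sketch in plain Python: the function name narrow_queries is hypothetical, and it assumes the values need no escaping.

```python
from collections import defaultdict


def narrow_queries(selected_facets):
    """Group values per field and build one 'field:(a OR b)' string per field."""
    grouped = defaultdict(list)
    for facet in selected_facets:
        if ":" not in facet:
            continue  # skip malformed entries
        field_name, value = facet.split(":", 1)
        # Accept both repeated parameters and the "|" separator style.
        for part in value.split("|"):
            if part and part not in grouped[field_name]:
                grouped[field_name].append(part)
    return [
        f"{field}:({' OR '.join(values)})"
        for field, values in grouped.items()
        if values
    ]


print(narrow_queries(["author_exact:john", "author_exact:sam"]))
# In a view, each string would then be applied with sqs = sqs.narrow(query).
```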
So basically there are no strict rules/standards for how to implement multiple values in the url for faceting.
I was wondering if it is possible to sort by the order in which you request documents from Solr. I am running an IN-based query and would just like Solr to return them in the order that I ask for.
In (4,2,3,1) should return me documents ordered 4,2,3,1.
Thanks.
You need sorting in Solr to order them by a field.
I assume that "In based query" means something like: fetch docs whose fieldx has values in (val1,val2). You can define the field as a multi-valued field and facet on that field. A facet query is an 'is in' search out of the box (so to speak), and it can do more sophisticated searches too.
Edited on OP's query:
Updating a document with a multi-valued field in JSON here. See the line
"my_multivalued_field": [ "aaa", "bbb" ] /* use an array for a multi-valued field */
As for doing a facet query, check this.
You need to do one or more fq statements:
&fq=field1:[400 TO 500]
&fq=field2:(johnson OR thompson)
Also do read up on the fact (in the link above) that faceting works on indexed rather than stored values.
You can easily apply sorting with QueryOptions and field sort (ExtraParams property - I am sorting by savedate field, descending):
var results = _solr.Query(textQuery,
    new QueryOptions
    {
        Highlight = new HighlightingParameters
        {
            Fields = new[] { "*" },
        },
        ExtraParams = new Dictionary<string, string>
        {
            {"fq", dateQuery},
            {"sort", "savedate desc"}
        }
    });
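For the original requirement (In (4,2,3,1) returned as 4,2,3,1), a field sort like the one above will not reproduce an arbitrary order. One option is to reorder the results on the client after the query returns. The Python sketch below assumes each returned document is a dict with an "id" field; the function name order_as_requested is hypothetical.

```python
def order_as_requested(docs, requested_ids, id_field="id"):
    """Return docs in the order their ids appear in requested_ids."""
    position = {doc_id: i for i, doc_id in enumerate(requested_ids)}
    # Documents whose id was not requested are dropped.
    found = [doc for doc in docs if doc[id_field] in position]
    return sorted(found, key=lambda doc: position[doc[id_field]])


# Whatever order the search returned:
returned = [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}]
ordered = order_as_requested(returned, [4, 2, 3, 1])
print([doc["id"] for doc in ordered])
```

This only works when the full result set fits in one response; with paging, the ordering would have to happen inside the query instead.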