How can I search a multi-value field for values exclusively - solr

I have a Solr index with a multi-valued field, let's call it mvfield. It can contain arbitrary values, even though currently it is a finite set of values.
I want to find documents which contain only certain values in this field. Example:
doc1: mvfield = [a,b,c,d,e,f]
doc2: mvfield = [e,f]
doc3: mvfield = [f]
doc4: mvfield = [a,b,c,e,f]
doc5: mvfield = [e]
I want to create a query which returns documents which contain only e or f in mvfield, so in this example it should give doc2, doc3 and doc5.
I found a crude workaround using ranges:
-mvfield:[* TO e} AND -mvfield:{e TO f} AND -mvfield:{f TO *]
but it seems very fragile. Is there any better way to do this?

Why not just use a filter query?
&fq=mvfield:e and so on.
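The range workaround from the question can be generated for any set of allowed values rather than written by hand. The following is a minimal sketch, not a definitive implementation: the helper name `exclusive_values_query` is hypothetical, and it assumes the field is a plain string field whose index order matches Python's string ordering and whose values need no query escaping.

```python
def exclusive_values_query(field, allowed):
    """Build a query that excludes every value outside `allowed`.

    Hypothetical helper: excludes the ranges below, between, and above
    the sorted allowed values, as in the hand-written workaround.
    """
    values = sorted(set(allowed))
    if not values:
        raise ValueError("allowed must contain at least one value")
    # Everything strictly below the smallest allowed value.
    clauses = [f"-{field}:[* TO {values[0]}}}"]
    # Everything strictly between neighbouring allowed values.
    for low, high in zip(values, values[1:]):
        clauses.append(f"-{field}:{{{low} TO {high}}}")
    # Everything strictly above the largest allowed value.
    clauses.append(f"-{field}:{{{values[-1]} TO *]")
    return " AND ".join(clauses)


query = exclusive_values_query("mvfield", ["f", "e"])
```

Because the query is purely negative, documents with no value at all in mvfield should also match. Adding a positive clause such as mvfield:[* TO *] is one way to rule those out.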

Related

How to search multiple words in one field on solr?

I have a field in solr of type list of texts.
field1:{"key1:val1,key2:val2,key3:val3", "key1:val1,key2:val2"}
I want to form a query such that when I search for key1:val1 and key3:val3, I get only the results that contain both strings, i.e. key1:val1 and key3:val3.
How shall I form the query?
If these are values in a multivalued field, you can't - directly. You'll have to use something like highlighting to tell you where Solr matched it.
There is no way to tell Solr "I only want the value that matched inside this set of values".
If this is a necessary way to query your index, index the values as separate documents in a separate collection instead. In that case you'd have two documents, one with field1:"key1:val1,key2:val2,key3:val3" and one with field1:"key1:val1,key2:val2".
You can use AND with fq.
Like:
fq=key1:val1 AND key3:val3
With this filter query you will get only records where key1 = val1 AND key3 = val3.

Solr what is the difference between query using q and df?

I just did two things.
q -> iphone
df -> brand
and
q -> brand:iphone
Both return the same result.
The first one looks for the string iphone in the brand field. The second one returns documents whose brand field has the value iphone.
What is the purpose of df field?
There really isn't any difference - but to show WHEN it would be different, you'll have to consider the case when you query a different field than the one provided in df.
q=model:foo&df=brand
This would lead to foo being matched against values in the field model, while brand is ignored. If the person writing the query however didn't specify a field, brand would be searched.
Most of the time you'd want to use the edismax or dismax query type (defType=edismax) to be able to create more suitable rules for which fields to query and the weight between the fields, and to handle how most people use a search field:
defType=edismax&q=foo&qf=brand^10 model
.. would search the fields brand and model for foo, and give a tenfold increase in score if the hit is in the brand field compared to the model field. Just q=foo&qf=brand would replicate your first query, and since edismax also supports parts of the lucene syntax, q=brand:foo&qf=model should also work.
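To make the two styles concrete, here is a small sketch that builds the request parameters with Python's standard library. The helper name `build_select_query` is an assumption for illustration; only the parameter names (q, df, defType, qf) come from the discussion above.

```python
from urllib.parse import urlencode, parse_qs


def build_select_query(q, qf=None, df=None):
    """Return a URL-encoded parameter string for a Solr query.

    Hypothetical helper: switches to edismax when qf is given, otherwise
    keeps the default parser and passes an optional default field (df).
    """
    params = {"q": q}
    if qf:
        params["defType"] = "edismax"
        params["qf"] = qf
    elif df:
        params["df"] = df
    return urlencode(params)


# Default parser: an unqualified term is looked up in the df field.
plain = build_select_query("iphone", df="brand")
# edismax: search brand and model, with brand boosted tenfold.
boosted = build_select_query("foo", qf="brand^10 model")
```

Encoding through urlencode matters here because the boost character ^ and the space between the qf fields are not safe to send raw in a URL.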

Faceted on multiple values of the same field in haystack

I am using Haystack and SOLR. And I am trying to implement faceting search on one field for multiple values. For example, I am faceting on "author" field.
john 3
kevin 2
sam 2
I want to facet on "john" OR "sam". How can I format the URL for it?
http://localhost:8000/search/?q=*&selected_facets=author_exact:john +OR+ selected_facets=author_exact:sam
If you want to limit the resulting set of documents to those containing either john or sam, use a fq:
fq=author:sam OR author:john
If you want to only generate facets on certain values or queries, use facet.query:
facet.query=author:sam OR author:john
You will have to use OR with narrow() in your view/form (the exact implementation depends on which view/form you are using).
Since getting the list of selected_facets simply involves:
self.request.GET.getlist('selected_facets')
How you wish to implement that in your url is solely up to you:
you could do it with some kind of separator then you split them apart:
localhost:8000/search/?q=*&selected_facets=author_exact:john|sam
`for x in selected_facets:
    field_name, value = x.split(':', 1)
    if "|" not in value:
        continue
    # split the value part, not the whole "field:value" string
    values = value.split('|')`
you could also do it this way:
localhost:8000/search/?q=*&selected_facets=author_exact:john&selected_facets=author_exact:sam
facet_dict = dict()
for x in selected_facets:
    field_name, value = x.split(':', 1)
    # setdefault avoids a KeyError the first time a field is seen
    facet_dict.setdefault(field_name, []).append(value)
Then in haystack:
sqs.narrow('author_exact:(john OR sam)')
So basically there are no strict rules/standards for how to implement multiple values in the url for faceting.
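Putting the two URL styles together, a minimal sketch of turning the selected_facets list into strings for narrow() could look like the following. The helper name `build_narrow_queries` is hypothetical, and the sketch assumes the facet values contain no characters that need escaping in a Solr query.

```python
from collections import defaultdict


def build_narrow_queries(selected_facets):
    """Group 'field:value' strings by field and OR the values together."""
    grouped = defaultdict(list)
    for facet in selected_facets:
        if ":" not in facet:
            continue  # skip malformed entries
        field_name, value = facet.split(":", 1)
        # Accept the 'john|sam' separator form as well as repeated parameters.
        for part in value.split("|"):
            if part and part not in grouped[field_name]:
                grouped[field_name].append(part)
    return [
        f"{field}:({' OR '.join(values)})"
        for field, values in grouped.items()
        if values
    ]


repeated = build_narrow_queries(["author_exact:john", "author_exact:sam"])
separated = build_narrow_queries(["author_exact:john|sam"])
```

Each returned string can then be passed to sqs.narrow() in turn, with the input list coming from self.request.GET.getlist('selected_facets').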

filter query equals to given lucene query

My Lucene query is : +((+MinimumPrice:[1000.0 TO 10000.0]) | (+MaximumPrice:[1000.0 TO 10000.0]))
its equivalent filter query: fq=MinimumPrice:[1000 TO 10000] OR MaximumPrice:[1000 TO 10000]
But I want it in the form of fq=MinimumPrice:Parameters&fq=MaximumPrice:Parameters
You can replace Parameters by any kind of range, but results count should be equal.
I assume your problem is that the result count differs when you write it in this form:
fq=MinimumPrice:Parameters&fq=MaximumPrice:Parameters
That is to be expected: separate fq (filter query) parameters are combined with AND, so
fq=MinimumPrice:Parameters&fq=MaximumPrice:Parameters
actually translates into
fq=MinimumPrice:[1000 TO 10000] AND MaximumPrice:[1000 TO 10000]
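The difference in result counts can be reproduced with plain set operations. The sketch below uses four made-up documents (the ids and prices are hypothetical) and treats each filter as the set of documents it matches: one fq with OR is a union, while two separate fq parameters are an intersection.

```python
# Hypothetical documents, invented only to illustrate the set logic.
docs = {
    "p1": {"MinimumPrice": 500.0, "MaximumPrice": 2000.0},
    "p2": {"MinimumPrice": 1500.0, "MaximumPrice": 20000.0},
    "p3": {"MinimumPrice": 3000.0, "MaximumPrice": 8000.0},
    "p4": {"MinimumPrice": 100.0, "MaximumPrice": 900.0},
}


def in_range(field, low, high):
    """Return the ids of documents whose field lies in [low, high]."""
    return {doc_id for doc_id, doc in docs.items() if low <= doc[field] <= high}


min_hits = in_range("MinimumPrice", 1000.0, 10000.0)
max_hits = in_range("MaximumPrice", 1000.0, 10000.0)

# fq=MinimumPrice:[...] OR MaximumPrice:[...]  -> union of both filters
single_fq_with_or = min_hits | max_hits
# fq=MinimumPrice:[...]&fq=MaximumPrice:[...]  -> intersection of both filters
two_separate_fqs = min_hits & max_hits
```

So the OR has to stay inside a single fq. Splitting it into two fq parameters can only return the same documents or fewer.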

Parameter bq modify facet counts using grouping

I am using solr trunk to search some documents and group them by their category, but I have to group them first by another field. More specifically I am using this schema:
component_id: string
category: string
name: text
And I have two documents:
component_id = register1, category = category1, name='foo bar'
component_id = register1, category = category2, name='foo bar zoo'
My query is (only relevant parameters):
{!edismax qf=name}(foo bar)&group.field=component_id&group.truncate=true&facet.field=category&bq=category:category1^2
And the facet results are:
'category':
'category1', 1
'category2', 1
BUT, when I change the bq parameter, for example : bq=category:category1^20
The facet results have changed:
'category':
'category1', 1
'category2', 0
Is that possible? Is it a bug? If I set group.truncate=false everything is fine for this example, but it fails for the rest of the queries.
Thanks & regards
Answering my own question:
group.truncate is the correct option when your data is uniform or when your groups contain similar objects, but it causes problems when a group mixes data from different categories.
With group.truncate=true: |A ∪ B| ≠ |A| + |B| − |A ∩ B|
The bq parameter itself works correctly.
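The effect can be imitated in a few lines. This sketch assumes that group.truncate=true bases facet counts only on the most relevant document of each group, and it uses made-up scores to stand in for a strong bq boost on category1. The helper name `facet_counts` is hypothetical.

```python
from collections import Counter

# The two documents from the question; the scores are hypothetical and
# only stand in for the ranking produced by a strong bq boost.
docs = [
    {"component_id": "register1", "category": "category1", "score": 9.0},
    {"component_id": "register1", "category": "category2", "score": 4.0},
]


def facet_counts(documents, truncate):
    """Count categories over all matches, or only over each group's top document."""
    if truncate:
        top_per_group = {}
        for doc in documents:
            best = top_per_group.get(doc["component_id"])
            if best is None or doc["score"] > best["score"]:
                top_per_group[doc["component_id"]] = doc
        documents = list(top_per_group.values())
    return Counter(doc["category"] for doc in documents)


full = facet_counts(docs, truncate=False)
truncated = facet_counts(docs, truncate=True)
```

Both documents share one component_id, so only the top-ranked one contributes to the truncated counts. Anything that changes the ranking inside a group, bq included, can therefore change which category is counted.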
