Faceting on multiple values of the same field in Haystack

I am using Haystack and Solr, and I am trying to implement faceted search on one field for multiple values. For example, I am faceting on the "author" field:
john 3
kevin 2
sam 2
I want to facet on "john" OR "sam". How can I format the URL for that? This is what I tried:
http://localhost:8000/search/?q=*&selected_facets=author_exact:john +OR+ selected_facets=author_exact:sam

If you want to limit the result set to documents containing either john or sam, use a filter query (fq):
fq=author:sam OR author:john
If you only want to generate facet counts for certain values or queries, use facet.query (this returns a single count for the documents matching either name):
facet.query=author:sam OR author:john
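To make the difference between the two parameters concrete, here is a small sketch that builds both Solr requests with the standard library. The core URL and the helper name build_params are hypothetical; only fq, facet, facet.field and facet.query are real Solr parameters.

```python
from urllib.parse import urlencode

# Hypothetical core URL; adjust host, port and core name for your setup.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"


def build_params(restrict_results: bool) -> dict:
    """Hypothetical helper returning Solr select parameters for either approach."""
    params = {"q": "*:*", "facet": "true", "facet.field": "author"}
    if restrict_results:
        # fq limits the result set (and therefore the facet counts) to john OR sam.
        params["fq"] = "author:sam OR author:john"
    else:
        # facet.query leaves results alone and adds one extra count for john OR sam.
        params["facet.query"] = "author:sam OR author:john"
    return params


url = SOLR_SELECT + "?" + urlencode(build_params(restrict_results=True))
print(url)
```

With fq, the per-author counts under facet.field only cover john and sam; with facet.query, all authors keep their counts and the combined count is reported separately.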

You will have to use OR with narrow() in your view/form (the exact implementation depends on which view/form you are using).
Getting the list of selected_facets simply involves:
selected_facets = self.request.GET.getlist('selected_facets')
How you encode multiple values in the URL is up to you.
You could use a separator and then split the values apart:
localhost:8000/search/?q=*&selected_facets=author_exact:john|sam
for x in selected_facets:
    field_name, value = x.split(':', 1)
    # 'john|sam' -> ['john', 'sam']; a single value gives a one-item list
    values = value.split('|')
You could also repeat the parameter:
localhost:8000/search/?q=*&selected_facets=author_exact:john&selected_facets=author_exact:sam
from collections import defaultdict
facet_dict = defaultdict(list)  # avoids a KeyError on the first value of each field
for x in selected_facets:
    field_name, value = x.split(':', 1)
    facet_dict[field_name].append(value)
Then, in Haystack:
sqs.narrow('author_exact:(john OR sam)')
So there are no strict rules or standards for encoding multiple facet values in the URL.
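Both URL styles can be reduced to the same narrow() argument. The sketch below groups the selected_facets strings by field and builds one OR query per field; narrow_queries is a hypothetical helper, and parse_qs only stands in for request.GET.getlist so the example runs without Django. It assumes the values need no escaping; in a real view, each value should go through sqs.query.clean() first, as Haystack's own faceted form does.

```python
from collections import defaultdict
from urllib.parse import parse_qs


def narrow_queries(selected_facets):
    """Hypothetical helper: group 'field:value' strings into one OR query per field.

    Accepts both URL styles: pipe-separated values ('author_exact:john|sam')
    and repeated selected_facets parameters.
    """
    grouped = defaultdict(list)
    for facet in selected_facets:
        if ":" not in facet:
            continue  # ignore malformed entries
        field_name, value = facet.split(":", 1)
        grouped[field_name].extend(v for v in value.split("|") if v)
    return ["%s:(%s)" % (field, " OR ".join(values))
            for field, values in grouped.items() if values]


# parse_qs stands in for request.GET.getlist('selected_facets') here.
piped = parse_qs("q=*&selected_facets=author_exact:john|sam")["selected_facets"]
repeated = parse_qs("q=*&selected_facets=author_exact:john"
                    "&selected_facets=author_exact:sam")["selected_facets"]
print(narrow_queries(piped))     # ['author_exact:(john OR sam)']
print(narrow_queries(repeated))  # ['author_exact:(john OR sam)']
```

Each returned string can then be applied in turn with sqs = sqs.narrow(query), so different fields are ANDed while values within one field are ORed.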

Related

How to search for multiple words in one field in Solr?

I have a Solr field whose type is a list of texts.
field1:{"key1:val1,key2:val2,key3:val3", "key1:val1,key2:val2"}
I want to form a query such that when I search for key1:val1 and key3:val3, I get the result that has both strings, i.e. key1:val1 and key3:val3.
How shall I form the query?

If these are values in a multivalued field, you can't, at least not directly. You'll have to use something like highlighting to tell you where Solr matched.
There is no way to tell Solr "I only want the value that matched inside this set of values".
If this is a necessary way to query your index, index the values as separate documents instead, in a separate collection. In that case you'd have two documents: one with field1:"key1:val1,key2:val2,key3:val3" and one with field1:"key1:val1,key2:val2".

You can use AND with fq.
For example:
fq=key1:val1 AND key3:val3
With this filter query you will get only records where key1 = val1 AND key3 = val3.
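Both answers assume a different index layout from the one in the question: the first suggests one document per value, and the second filters on key1 and key3 as if they were fields. The sketch below shows that restructuring in plain Python, assuming each value is a comma-separated list of key:value pairs. The helper names explode and matches_all, and the parent_id field, are hypothetical; matches_all is only a local stand-in for fq=key1:val1 AND key3:val3.

```python
def explode(doc_id, values):
    """Hypothetical helper: turn each 'k:v,k:v' value into its own document.

    Every key becomes a field, and parent_id points back to the original
    document so hits can be mapped back to it.
    """
    docs = []
    for i, value in enumerate(values):
        doc = {"id": "%s-%d" % (doc_id, i), "parent_id": doc_id}
        for pair in value.split(","):
            key, _, val = pair.partition(":")
            doc[key.strip()] = val.strip()
        docs.append(doc)
    return docs


def matches_all(doc, **conditions):
    """Local stand-in for an AND filter query against one document."""
    return all(doc.get(k) == v for k, v in conditions.items())


docs = explode("doc1", ["key1:val1,key2:val2,key3:val3", "key1:val1,key2:val2"])
hits = [d["id"] for d in docs if matches_all(d, key1="val1", key3="val3")]
print(hits)  # ['doc1-0']
```

Only the first value's document carries both key1:val1 and key3:val3, which is the "value that matched" the question asks for.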

Solr: what is the difference between querying with q and with df?

I tried two queries:
q -> iphone
df -> brand
and
q -> brand:iphone
Both return the same result.
The first one looks for the string iphone in the brand field. The second one returns documents whose brand field value is iphone.
What is the purpose of df field?

There really isn't any difference in this case. To see WHEN it would differ, consider a query against a different field than the one given in df:
q=model:foo&df=brand
This would lead to foo being matched against values in the field model, while brand is ignored. If the person writing the query however didn't specify a field, brand would be searched.
Most of the time you'd want to use the edismax or dismax query type (defType=edismax) to be able to create more suitable rules for which fields to query and the weight between the fields, and to handle how most people use a search field:
defType=edismax&q=foo&qf=brand^10 model
This would search the fields brand and model for foo, and give a tenfold boost to hits in the brand field compared to hits in the model field. Just q=foo&qf=brand would replicate your first query, and since edismax also supports parts of the Lucene syntax, q=brand:foo&qf=model should also work.
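The three parameter sets discussed above can be written out side by side. This sketch only encodes them as query strings; the param_sets name is illustrative, and which core or select URL you send them to is up to you.

```python
from urllib.parse import urlencode

param_sets = {
    # Unfielded term: searched in the default field named by df.
    "df_only": {"q": "iphone", "df": "brand"},
    # Fielded term: the query names model itself, so df (brand) is not used.
    "fielded": {"q": "model:foo", "df": "brand"},
    # edismax: search brand and model, boosting brand matches by a factor of 10.
    "edismax": {"defType": "edismax", "q": "foo", "qf": "brand^10 model"},
}

for name, params in param_sets.items():
    print(name, urlencode(params))
```

Note that the boost ^10 is URL-encoded as %5E10, which is easy to miss when building these URLs by hand.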

How to boost AND in a solr query?

Suppose a user enters a two-word search. Since the default boolean operator is OR, all entries containing either or both words appear.
What I want to know is whether entries that specifically satisfy the AND condition can be boosted.
With multiple words, can some words be made to imply specific search constraints, or to boost certain parameters when they are present? For example, if the input is "with x and y without z", can I make Solr interpret it as (x AND y) AND (NOT z)? Or at least boost the entries that partially or fully meet that requirement?
EDIT:
I have tried using boost with edismax as shown here:
$query = $client->createSelect(); //create search query
$query->setQuery('memberType:'.$searchQuery.' firstName:'.$searchQuery.' gender:'.$searchQuery); //fields to be searched and the search query
$edismax = $query->getEDisMax();
$edismax->setQueryFields('firstName memberType^3 gender^2'); //boost fields
$query->setStart($start)->setRows($rows); //vary $start and $rows to change the starting point and the number of rows returned
$query->setFields(array('id', 'firstName', 'lastName', 'eid', 'gender', 'memberType')); //set return fields
//$query->addSort('id', $query::SORT_ASC); //sort field and customisations
$resultSet = $client->select($query);
When I search for a name with a particular member type, like "sanjay candidate", I expect the results in this order: entries matching both sanjay and candidate, then all users who are candidates, then all users named sanjay. Instead I get entries matching both, then everyone named sanjay, and then all candidates.
I can't figure out what the issue might be, or whether I can apply more customized boosting.

If you are using eDismax, you have a whole collection of boosting options: phrase boosts, bigram boosts, a separate boost query, and so on. Read through the wiki page and experiment. You should not need any custom coding for this scenario.
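As a starting point for that experimentation, here is a sketch of an edismax request using the field names and weights from the question. pf, pf2, bq and mm are real edismax parameters, but the values are illustrative and untuned, and the bq clause in particular is a hypothetical example. It also assumes the raw user input goes into q without field prefixes, leaving the choice of fields to qf.

```python
from urllib.parse import urlencode

search = "sanjay candidate"

params = {
    "defType": "edismax",
    "q": search,                              # raw user input, no field prefixes
    "qf": "firstName memberType^3 gender^2",  # fields and weights from the question
    "mm": "1",                                # at least one term must match; "100%" would require all
    "pf": "firstName memberType",             # extra score when the whole input occurs as a phrase in a field
    "pf2": "firstName memberType",            # extra score for adjacent word pairs (bigrams)
    "bq": "memberType:candidate^5",           # hypothetical boost query, tune or remove
}

encoded = urlencode(params)
print(encoded)
```

Documents matching both terms already score higher under OR; pf, pf2 and bq add score on top of that rather than filtering anything out, so they change the ordering without shrinking the result set.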

Solr: how do I index barcodes?

I have a document that contains the following data:
car {
id: guid
name: string
sku: list<barcode>
}
The barcodes don't follow a pattern. They can look like either of the following:
ABCD-EF34GD-JOHN
ABCD-C08-YUVF
I want to index my documents so that searching for:
1. ABCD will return both.
2. AB will return both.
3. JO will return ABCD-EF34GD-JOHN, but not a car whose name is john.
4. If the ID (which is indexed) contains "ABCD", the document is not returned (the user doesn't see it).
So far I have defined car and sku as text_en.
But I don't get bullets 2 and 3.
Is there a better way to define the sku attribute?
My query is:
http://....:8983/solr/vault/select?q=ABCD&qf=Name+SKU&defType=edismax
Thanks.

What you are trying to do here is actually a prefix (wildcard) search on the tokens separated by the dash ("-").
An easy (but slow) way is to add an asterisk (*) to the end of the word in your query, like this:
http://....:8983/solr/vault/select?q=AB*&qf=Name+SKU&defType=edismax
Another option is to change the field type you index with and add an edge n-gram filter (EdgeNGramFilterFactory). With this filter in the field's analyzer, each word is indexed as a set of prefix tokens, from the minimum to the maximum gram size. For example, with a minimum size of 2: ABCD => AB, ABC, ABCD
So it will find what you are looking for and the search will be very fast, but the index will be much bigger and indexing time will also increase notably.
You can find more info here:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.EdgeNGramFilterFactory
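To see which terms such an analysis chain would put in the index, here is a plain-Python emulation. It assumes a tokenizer that splits on dashes, a lowercase filter, and an edge n-gram filter with minGramSize=2 and maxGramSize=15; those sizes and the helper names edge_ngrams and index_terms are illustrative choices, not Solr defaults or APIs.

```python
import re


def edge_ngrams(token, min_gram=2, max_gram=15):
    """Prefixes of token from min_gram up to max_gram characters (edge n-gram style)."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_terms(sku):
    """Emulate the assumed chain: split on dashes, lowercase, edge-n-gram each part."""
    terms = []
    for part in re.split(r"-+", sku.lower()):
        if part:
            terms.extend(edge_ngrams(part))
    return terms


print(edge_ngrams("ABCD"))  # ['AB', 'ABC', 'ABCD']
print(index_terms("ABCD-EF34GD-JOHN"))
```

Because jo is indexed only for the sku field, requirement 3 holds as long as the name field does not use the same n-gram filter. The edge n-gram filter is usually applied only in the index-time analyzer, so the query "AB" is matched as a whole term rather than being split into grams itself.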

What are the advantages of the multivalued option in Solr?

What are the advantages of the multivalued field option in Solr?
I have a field with comma-separated keywords.
I can do two things:
make a non-multivalued text field
make a multivalued text field which contains each keyword
I can query in both cases, so what's the advantage of multivalued over non-multivalued?

Advantages of multivalued fields: you don't need to change the document design. If a document contains multiple values in one field, Solr/Lucene can handle that field as-is.
Another advantage: multiple values can describe a document more precisely (think of the tags of a blog post).
Advantages of non-multivalued fields: you can use features that require a single term (word) per field, such as spell checking. It is also a benefit for clustering (Carrot2) and grouping, which generally work better on non-multivalued fields (Solr result grouping requires a single-valued field).

Querying a multivalued field gives you exactly what you want.
Example: doc1 has the keyword 'abc' and doc2 has the keyword 'abcd'. A query for the keyword 'abc' should match only doc1.
With the non-multivalued approach, both documents would match, because you'd have to use LIKE-style (substring) matching.
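The abc/abcd example can be mimicked in plain Python as an analogy: a list stands in for a multivalued field (exact match per value), and a comma-joined string stands in for the single field queried with LIKE-style substring matching. This is not how Solr evaluates queries internally; the docs dict and helper names are illustrative.

```python
docs = {
    "doc1": ["abc"],
    "doc2": ["abcd"],
}


def match_multivalued(keyword):
    # Each keyword is its own value, so the comparison is exact per value.
    return [d for d, kws in docs.items() if keyword in kws]


def match_like(keyword):
    # One comma-joined string searched with a substring (LIKE '%abc%') test.
    return [d for d, kws in docs.items() if keyword in ",".join(kws)]


print(match_multivalued("abc"))  # ['doc1']
print(match_like("abc"))         # ['doc1', 'doc2']
```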

Multivalued fields can be very handy. Say you have many fields and want to search several of them, but not all. You can create a multivalued field that includes all the fields you want to search (for example, by populating it with copyField) and then search that one field.
For example, say some fields hold string values and others hold numbers, and you want to search all the string values found in a document. You can create a multivalued field for all the string values and search in it.
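As a rough illustration of what such a catch-all field ends up containing, here is a plain-Python emulation of copyField rules (<copyField source="..." dest="..."/>) writing into one multivalued field. The helper copy_fields, the field names and the destination all_strings are hypothetical; in Solr the copying happens at index time according to the schema.

```python
def copy_fields(doc, sources, dest):
    """Hypothetical emulation of copyField rules copying into a multivalued field.

    sources lists the (string) fields to copy; dest is the catch-all field.
    Fields not listed, such as numeric ones, are left out.
    """
    out = dict(doc)
    values = []
    for field in sources:
        value = doc.get(field)
        if value is None:
            continue
        # A multivalued source contributes each of its values.
        values.extend(value if isinstance(value, list) else [value])
    out[dest] = values
    return out


doc = {"id": "1", "title": "Red car", "color": "red", "price": 100, "tags": ["fast", "used"]}
indexed = copy_fields(doc, ["title", "color", "tags"], "all_strings")
print(indexed["all_strings"])  # ['Red car', 'red', 'fast', 'used']
```

A single query against all_strings then covers title, color and tags at once, while the numeric price field stays out of it.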