How to avoid breaking a phrase into words when faceting - solr

All:
I'm pretty new to Solr faceted search. When I facet on a field whose values are phrases, how can I have each value treated as a whole phrase rather than as separate words, so that only one facet entry is returned per value? For example,
if I have documents with a field like:
{ "category": "baby boy"}, {"category": "clothes"}
And the result returned will look like:
["baby", 0, "boy", 0, "clothes", 1]
I wonder why it works like this and how to change it so that it returns what I described above, like:
["baby boy", 1, "clothes", 1]
Thanks

The field you use for faceting should be defined in schema.xml as a string (type="string") in order for the facet to use the whole text. Otherwise it will divide it according to the way it has been tokenized.
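
To see why the field type matters, here is a minimal Python sketch that imitates facet counting over two documents. It is not Solr code: `facet_counts` is a hypothetical helper, the whitespace split is only a rough stand-in for a tokenized text field, and the one-element list stands in for a string field.

```python
from collections import Counter

def facet_counts(docs, field, analyze):
    """Count documents per indexed term, roughly as a field facet would."""
    counts = Counter()
    for doc in docs:
        # A document is counted once for each distinct term it produces.
        for term in set(analyze(doc[field])):
            counts[term] += 1
    return dict(counts)

docs = [{"category": "baby boy"}, {"category": "clothes"}]

# Tokenized text field, approximated by lowercasing and splitting on whitespace.
tokenized = facet_counts(docs, "category", lambda value: value.lower().split())

# String field: the whole value is the single indexed term.
whole = facet_counts(docs, "category", lambda value: [value])

print(tokenized)
print(whole)
```

The tokenized variant produces one facet entry per word, while the string variant keeps "baby boy" as a single entry.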

The faceting field should be indexed as a whole, so check in your schema.xml how you are tokenizing this field.
You should use the Keyword Tokenizer, which indexes the entire text field as a single token.
Ref https://cwiki.apache.org/confluence/display/solr/Tokenizers#Tokenizers-KeywordTokenizer

Related

SOLR: facet.field is working for each word in a field differently, how to apply facet.field for whole field sentence?

I added the "MerchantName" field to facet.field and got the result below:
"facet_fields":{
"MerchantName":[
"amazon",133281,
"factory",99566,
"club",99566,
"fashion",4905,
"swish",4905,
"store",1001,
"swank",1001,
"the",1001
]
}
In the above array, "club factory", "swish fashion" and "the swank store" are each the value of a single field, but as you can see, every word in them is returned as a separate facet entry.
So how to apply facet query on the whole field which returns an array with whole field value?
The MerchantName field is used for faceting, so it should be defined in schema.xml as a string (type="string") in order for the facet to use the whole text.
Because you are using a text-based field with the text_general field type, each MerchantName value is split into multiple tokens, and the facet is divided according to the way the value has been tokenized.
You can also add docValues="true" to the MerchantName field; DocValues will then be used automatically any time the field is used for sorting, faceting or function queries.
DocValues are a special way of recording field values internally that is more efficient for some purposes, such as sorting and faceting, than traditional indexing.
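
Once the field is a string field, a facet request over it could be built as in the following sketch. The host, port, and core name in `SOLR_SELECT_URL` are placeholders assumed for illustration, and `build_facet_url` is a hypothetical helper; only the `q`, `rows`, `facet`, and `facet.field` parameters are Solr's own.

```python
from urllib.parse import urlencode

# Assumed location of the select handler; adjust host, port, and core name.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"

def build_facet_url(field, base_url=SOLR_SELECT_URL):
    """Build a URL that asks for facet counts on one field and no documents."""
    params = {
        "q": "*:*",            # match all documents
        "rows": 0,             # only the facet counts are needed
        "facet": "true",
        "facet.field": field,
    }
    return base_url + "?" + urlencode(params)

facet_url = build_facet_url("MerchantName")
print(facet_url)
```

Note that documents indexed before the field type was changed keep their old tokens until they are reindexed.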

SOLR - Searching record based on SOLR field in passed string

I have a CSV string field say "field1" in SOLR which can have value similar to 1,5,7
Now, I want to get this record if I pass values:
1,5,6,7
OR
1,5,7,10
OR
1,5,7
Basically any of these inputs should return me this record from SOLR.
Is there any way to achieve this? I am open to a schema change if it helps.
The Standard Tokenizer (used in text fields like text_general) will not split on a comma between digits if there is no space next to it.
That means that "1,2,3" will be indexed as a single token ("1,2,3"), but "1, 2, 3" will be indexed as three tokens ("1", "2", "3").
If you can make sure there is a space after each comma, both in the value that you index and in the value that you use in your search query, you might be able to achieve what you want by defining your field as text_general.
You can use the Analysis Screen in Solr to see how your value will be indexed and searched and see if any of the built-in field types gives you what you want.
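
The behaviour described above can be imitated with a rough Python sketch. This is not the Standard Tokenizer: `tokenize` and `matches` are hypothetical helpers that only reproduce the two cases from this answer (digits joined by commas stay together, a comma followed by a space separates tokens), and `matches` assumes that the query terms are combined with OR.

```python
def tokenize(value):
    """Split on whitespace and drop the commas left at token edges."""
    tokens = []
    for piece in value.split():
        piece = piece.strip(",")
        if piece:
            tokens.append(piece)
    return tokens

def matches(indexed_value, query_value):
    """True when the query shares at least one token with the indexed value."""
    return bool(set(tokenize(indexed_value)) & set(tokenize(query_value)))

print(tokenize("1,5,7"))                  # one token
print(tokenize("1, 5, 7"))                # three tokens
print(matches("1, 5, 7", "1, 5, 6, 7"))   # shared tokens
print(matches("1,5,7", "1,5,6,7"))        # no shared token
```

With OR semantics a query that shares only a single value with the record would match as well, which may be broader than intended.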

Precise expression or word solr match

I am looking for a way to match a very specific expression or word in my Solr collection.
Here is an example :
I want the query to return me :
"Paris"
And not : "Paris is great"
And not : "I like Paris"
Thanks :)
If you only want exact matches, make sure the field type is defined as string. A string field will not do any tokenization or use any filters, and will only generate hits when the query is exactly the same as the value indexed.
You need to use KeywordTokenizer.
This tokenizer treats the entire text field as a single token.
https://lucene.apache.org/solr/guide/6_6/tokenizers.html#Tokenizers-KeywordTokenizer

Solr - EDisMax pf/pf2/pf3 Fields Vs ShingleFilterFactory

Is there any benefit in using the ShingleFilterFactory with the EDisMax pf/pf2/pf3 fields? For example, if I have an untokenized index (via KeywordTokenizerFactory) with the following values:
1. "jeans"
2. "skinny jeans"
3. "red blazer"
And I query for "red skinny jeans", I would like to match only "jeans" and "skinny jeans". With EDisMax, using the pf fields should implicitly generate shingles, so is there any benefit in plugging in the ShingleFilterFactory as well?
Thanks
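
For readers unfamiliar with the term, the sketch below shows what shingles (word n-grams) of the query in this question look like, and which of them equal one of the untokenized index values. It is plain Python written for illustration, with hypothetical names; it is neither the implementation of ShingleFilterFactory nor a statement about how EDisMax handles pf/pf2/pf3.

```python
def shingles(tokens, min_size=2, max_size=3):
    """Return the word n-grams of min_size..max_size tokens, joined by a space."""
    result = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                result.append(" ".join(tokens[start:start + size]))
    return result

indexed_values = {"jeans", "skinny jeans", "red blazer"}
query_tokens = "red skinny jeans".split()

query_shingles = shingles(query_tokens)
# Single words plus shingles are the candidates for an exact match.
candidates = query_tokens + query_shingles
hits = [c for c in candidates if c in indexed_values]

print(query_shingles)
print(hits)
```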

How does Solr process the query string when using edismax qf parameter and specify field in query

All:
[UPDATE]
After reading the debug explain output, it seems that qf is applied only to
the keywords that do not specify a field.
===================================================================
When I was learning to use the edismax query parser, the documentation said the qf parameter is:
Query Fields: specifies the fields in the index on which to perform
the query. If absent, defaults to df.
And its purpose is to generate all combinations of the fields with the query terms.
However, if the query (the q parameter) already specifies a field, I wonder what happens when I specify a different field in qf?
For example:
q=title:epic
defType=edismax
qf=content
Could anyone explain how Solr interprets this query?
Thanks
When you specify qf, it means you want Solr to search for whatever is in the "q" parameter in these "qf" fields. So your first and third lines contradict each other:
q=title:epic
defType=edismax
qf=content
If you want to search for any document where the content field contains anything matching your search terms, put these search terms as tokens in "q", separated by +OR+,
like this:
q=I+OR+like+OR+books+OR+and+OR+games
defType=edismax
qf=content
With q=title:epic you have already set the query field to title, so the qf parameter should not be set to "content"; in this case you will get no results. Either leave the qf parameter empty or set it to "title".
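
To check how Solr actually interprets such a request, as the update in the question did, the query can be sent with debugQuery=true, and the debug section of the response then shows the parsed query. The sketch below only builds the URL; the host, port, and core name are placeholders assumed for illustration, and `build_edismax_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed location of the select handler; adjust host, port, and core name.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"

def build_edismax_url(q, qf, base_url=SOLR_SELECT_URL):
    """Build an edismax request that also asks Solr for debug output."""
    params = {
        "q": q,
        "defType": "edismax",
        "qf": qf,
        "debugQuery": "true",  # include the parsed query in the response
    }
    return base_url + "?" + urlencode(params)

debug_url = build_edismax_url("title:epic", "content")
print(debug_url)
```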
