Getting unexpected output in SOLR facet.pivot grouping - solr

I am trying to group fields in Solr using the facet.pivot option. It works as expected when there are no special characters in the data. If the data contains special characters, the output is split into several values.
Below is the URL I am using with facet.pivot. Here, escalation_dl is an email address, which contains special characters:
/select?facet=true&facet.limit=-1&facet.pivot=job_name,escalation_dl&q=*:*
Actual Output:
"field":"job_name",
"value":"test_job1",
"count":1,
"pivot":[{
"field":"escalation_dl",
"value":"test",
"count":1},
{
"field":"escalation_dl",
"value":"gmail.com",
"count":1}]}
Expected Output
"field":"job_name",
"value":"test_job1",
"count":1,
"pivot":[{
"field":"escalation_dl",
"value":"test#gmail.com",
"count":1}]}

This is because the field you're faceting on has a field type with a Tokenizer and filters attached (such as the default text_general field). Use a string field for any field you want to facet on, as that will keep the values intact as you expect.
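To make the effect concrete, here is a minimal sketch in plain Python (not Solr code) of why an analyzed field yields two pivot buckets while a string field yields one. The `tokenize` function is only a crude stand-in for a tokenizer and lowercase filter, and `pivot_counts` is a hypothetical helper written for this illustration.

```python
import re
from collections import Counter


def tokenize(value):
    """Crude stand-in for an analyzer: split on anything that is not
    a letter, digit or dot, then lowercase each token."""
    return [t.lower() for t in re.split(r"[^A-Za-z0-9.]+", value) if t]


def pivot_counts(docs, outer, inner, analyzed):
    """Count inner-field terms per outer-field value, like a two-level pivot.

    analyzed=True mimics a tokenized text field; analyzed=False mimics a
    string field, where the whole value is a single term.
    """
    result = {}
    for doc in docs:
        terms = tokenize(doc[inner]) if analyzed else [doc[inner]]
        # Each distinct term counts once per document.
        result.setdefault(doc[outer], Counter()).update(set(terms))
    return result


docs = [{"job_name": "test_job1", "escalation_dl": "test#gmail.com"}]

tokenized = pivot_counts(docs, "job_name", "escalation_dl", analyzed=True)
intact = pivot_counts(docs, "job_name", "escalation_dl", analyzed=False)

print(dict(tokenized["test_job1"]))
print(dict(intact["test_job1"]))
```

With the tokenized variant the single document contributes one count to each token; with the intact variant it contributes one count to the whole address.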

Related

SOLR: facet.field is working for each word in a field differently, how to apply facet.field for whole field sentence?

In facet.field, I have added "MerchantName" field, so I got result as below
"facet_fields":{
"MerchantName":[
"amazon",133281,
"factory",99566,
"club",99566,
"fashion",4905,
"swish",4905,
"store",1001,
"swank",1001,
"the",1001
]
}
In the above array, "club factory", "swish fashion" and "the swank store" are each the value of a single field, but as you can see, every word is treated as a separate facet value.
So how do I apply a facet query on the whole field, so that it returns an array with the whole field values?
MerchantName is the field used for faceting. It should be defined in schema.xml as a string (type="string") so that the facet uses the whole field value.
Because you are using a text-based field with the text_general field type, each value is split into multiple tokens, and the facet counts follow that tokenization.
You can also add docValues="true" to the MerchantName field; DocValues will then be used automatically whenever the field is used for sorting, faceting or function queries.
DocValues are a way of recording field values internally that is more efficient than traditional indexing for some purposes, such as sorting and faceting.
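If the collection uses a managed schema rather than a hand-edited schema.xml, the same change could be expressed as a Schema API command. The sketch below only builds the JSON body; it assumes the field is single-valued and stored, and the helper name `string_facet_field` is made up for this example. The body would be POSTed to the collection's /schema endpoint, and existing documents would need to be reindexed afterwards.

```python
import json


def string_facet_field(name, multi_valued=False):
    """Build a Schema API 'replace-field' command that redefines an
    existing field as an untokenized string field with docValues."""
    return {
        "replace-field": {
            "name": name,
            "type": "string",       # StrField: value is kept as one term
            "indexed": True,
            "stored": True,         # assumption: the value should be returned
            "docValues": True,      # used for faceting and sorting
            "multiValued": multi_valued,
        }
    }


payload = string_facet_field("MerchantName")
print(json.dumps(payload, indent=2))
```

In a hand-edited schema.xml the equivalent is setting type="string" and docValues="true" on the field definition, as described above.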

Solr Query using Facet missing the special characters and its showing in split values

I have added documents to Solr using the Solr Java client API.
Consider 2 fields,
field1 | field2
aaa#test.com value1
I was able to index the documents successfully.
When I ran a query in the Solr admin UI, I could see 1 record with the values above.
I then enabled faceting in the admin UI: I checked the facet checkbox, set facet.field = owner, and clicked Execute Query.
The result came back with the value split, as shown below:
"facet_counts":{
"facet_queries":{},
"facet_fields":{
"owner":[
"com",1,
"test",1,
"aaa",1]},
"facet_ranges":{},
"facet_intervals":{},
"facet_heatmaps":{}}}
As you can see in the result above, the string was split. How can I get it as a single value, like this?
aaa#test.com , 1
Please help me with this.
The facets are generated from the tokens for the field. If you're using a text based field with a tokenizer attached, the value will be split into multiple tokens.
To get the behavior you want, use a string field and reindex your content to that field. Use a copyField instruction if you still want to be able to search with partial content against the field, and facet on the new field instead.
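As a rough sketch of that setup, the snippet below builds a Schema API body that adds a string field and a copyField rule, plus the query string that facets on the new field. The destination name `owner_facet` is hypothetical, and the field attributes are assumptions; adapt them to your schema. Nothing is sent to Solr here.

```python
import json
from urllib.parse import urlencode

SOURCE_FIELD = "owner"        # existing tokenized field, still used for search
FACET_FIELD = "owner_facet"   # hypothetical name for the new string field

# Two Schema API commands in one body: define the string field, then
# copy every incoming value of the source field into it at index time.
schema_commands = {
    "add-field": {
        "name": FACET_FIELD,
        "type": "string",
        "indexed": True,
        "stored": False,      # assumption: only needed for faceting
        "docValues": True,
    },
    "add-copy-field": {"source": SOURCE_FIELD, "dest": FACET_FIELD},
}

# After reindexing, facet on the string field instead of the text field.
facet_query = urlencode({"q": "*:*", "facet": "true", "facet.field": FACET_FIELD})

print(json.dumps(schema_commands, indent=2))
print("/select?" + facet_query)
```

Searches can keep targeting the original field, while facet counts come from the untokenized copy.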

Solr Text field and String field - different search behaviour

I am working on Solr 4+.
I have several fields in my Solr schema with different Solr field types.
Does searching on a text field differ from searching on a string field?
I ask because I am trying to search on a string field (which is a copy field of a few facet fields), and it does not work as expected. The destination string field is both indexed and stored.
However, when I change the destination field to a text field (indexed only), it works fine.
Can you suggest why this happens? What exactly is the difference between text and string fields in Solr with respect to searches?
TextFields usually have a tokenizer and text analysis attached, meaning that the indexed content is broken into separate tokens where there is no need for an exact match - each word / token can be matched separately to decide if the whole document should be included in the response.
StrFields cannot have any tokenization or analysis / filters applied, and will only give results for exact matches. If you need a StrField with analysis or filters applied, you can implement this using a TextField and a KeywordTokenizer.
The comment on the text_general type in the example schema describes it as: A general text field that has reasonable, generic cross-language defaults: it tokenizes with StandardTokenizer, removes stop words from case-insensitive "stopwords.txt" (empty by default), and down cases. At query time only, it also applies synonyms.
The StrField type is not analyzed, but indexed/stored verbatim.
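The difference in search behaviour can be sketched in plain Python. This is an illustration only, not how Solr matches internally: `analyze_text` is a simplified stand-in for tokenizing and lowercasing, and requiring every query term to match is a simplification chosen for this example.

```python
def analyze_text(value):
    """Simplified analysis: lowercase and split on whitespace."""
    return value.lower().split()


def text_field_matches(indexed_value, query):
    """Text field: the query is analyzed too, and each term is compared
    against the indexed tokens (all terms required in this sketch)."""
    tokens = set(analyze_text(indexed_value))
    return all(term in tokens for term in analyze_text(query))


def string_field_matches(indexed_value, query):
    """String field: the value is one verbatim term, so only an exact,
    case-sensitive match on the whole value succeeds."""
    return indexed_value == query


value = "Club Factory"

print(text_field_matches(value, "factory"))         # partial, lowercase query
print(string_field_matches(value, "factory"))       # same query on a string field
print(string_field_matches(value, "Club Factory"))  # exact value
```

This is why a query for part of the value finds documents through the text field but not through the string copy of it.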

Solrnet facet returning spaces

I'm using Solrnet to return search results and am also requesting the facets, in particular categories which is a multi-valued field.
The problem I'm coming up against is that the category "house products" is being returned as two separate facets because of the space.
Is there a way of ensuring this is returned as a single facet value, or should I be escaping the value when it is added to the index?
Thanks in advance
Al
If separate tokens are generated for house products, then text analysis is being applied to the field.
Text fields are not recommended for faceting.
You won't get the desired behavior, because text fields are tokenized and filtered, which produces the multiple tokens you see in the facets returned in the response.
Use a copy field to copy the value to a String field so that you can facet on it without splitting the words.
From SolrFacetingOverview:
Because faceting fields are often specified to serve two purposes,
human-readable text and drill-down query value, they are frequently
indexed differently from fields used for searching and sorting:
They are often not tokenized into separate words
They are often not mapped into lower case
Human-readable punctuation is often not removed (other than double-quotes)
There is often no need to store them, since stored values would look much like indexed values and the faceting mechanism is used for value retrieval.
Try using String fields; they are good enough for faceting and add no analysis overhead.
Faceting works on tokens, so if a field is tokenized into several words, the facet is split as well.
I suggest you create another field of type string that is used only for faceting.

Solr comma separated field - facet search

I have a field in my Solr index that holds comma-separated values like "area1,area2,area3,area4". Some documents have just a single value, like "area6".
Now I want to run a facet search over all these values.
Example (This is what i want):
area1:10
area2:4297
area3:54
area4:65
area6:87
This is what I get:
area1,area2,area3,area4: 7462
area6: 87
Does Solr offer a solution for this problem, or must I separate the different values on my own?
While indexing, you need to split the data into tokens on the comma. You can use the PatternTokenizerFactory tokenizer with , as the pattern; it splits your text wherever it finds a ,.
The field in your schema.xml should be multivalued.
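The effect of splitting on the comma can be sketched in plain Python. This only imitates what a pattern-based tokenizer would do at index time; the sample values and the helper names are invented for the illustration.

```python
import re
from collections import Counter


def split_on_pattern(value, pattern=","):
    """Split a raw field value wherever the regex pattern matches,
    dropping empty pieces."""
    return [part for part in re.split(pattern, value) if part]


def facet_counts(values, split):
    """Count documents per term. split=False treats each raw value as one
    term (string field); split=True counts each comma-separated piece."""
    counts = Counter()
    for value in values:
        terms = split_on_pattern(value) if split else [value]
        counts.update(set(terms))  # one count per document per term
    return counts


values = ["area1,area2,area3,area4", "area6", "area1,area6"]

unsplit = facet_counts(values, split=False)
per_area = facet_counts(values, split=True)

print(dict(unsplit))
print(dict(per_area))
```

The same split could also be done in the indexing client, sending each area as a separate value of a multivalued field.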
