Wrong facet results with special characters in facet field - solr

I have implemented Solr search and faceting for e-commerce stores, and I am facing a weird issue with facet results when a filter is applied. It happens only when there is a special character (i.e. a bracket) in the facet field; otherwise everything works fine.
I have implemented this using SolrNet. I checked by running raw queries against Solr directly and found that the issue seems to be in Solr itself and not related to SolrNet.
Example:
I have a number of products and filters like the following:
RAM (GB)
2 GB
4 GB
8 GB
Memory (GB)
4 GB
8 GB
16 GB
Each of the facet options has some products in it, so the issue is not about facet.mincount, and I have applied the tagging properly as well.
Now, one of these facets works fine while the other one doesn't seem to work with the bracket in the facet field.
Here is my schema where I define facet fields.
<dynamicField name="f_*" type="string" indexed="true" stored="true" multiValued="true" required="false" />
<dynamicField name="pa_*" type="string" indexed="true" stored="true" multiValued="true" required="false" />
Faceting works fine when I query a field starting with pa_, but not with f_.
The query I am sending to Solr:
../select?indent=on&wt=json&facet.field={!ex%3Dpa_RAM(GB)}pa_RAM(GB)&fq={!tag%3Dpa_RAM\(GB\)}pa_RAM\(GB\):2%2BGB&q=CategoryID:(1+OR+2+OR+3+OR+4)&start=0&rows=10&defType=edismax&facet.mincount=1&facet=true&spellcheck.collate=true
Image 1
This works fine as expected.
Another query:
../select?indent=on&wt=json&facet.field={!ex%3Df_Memory(GB)}f_Memory(GB)&fq={!tag%3Df_Memory\(GB\)}f_Memory\(GB\):4%2BGB&q=CategoryID:(1+OR+2+OR+3+OR+4)&start=0&rows=10&defType=edismax&facet.mincount=1&facet=true&spellcheck.collate=true
Gives following result:
Image 2
This doesn't work. However, if I remove the special characters from the query and the indexed data, it works fine.
Moreover, the only facet option returned is the selected one on which I added the filter tag; all other facet options are not returned by Solr.
I am unable to figure out why this happens and how to fix it.
Any clue / idea would be great!
Please refer to this query and the images (it's not the right way or a perfect solution):
../select?indent=on&wt=json&facet.field={!ex%3Df_Memory(GB)}f_Memory(GB)&fq={!tag%3Df_Memory(GB)}f_Memory\(GB\):4%2BGB&q=CategoryID:(1+OR+2+OR+3+OR+4)&start=0&rows=10&defType=edismax&facet.mincount=1&facet=true&spellcheck.collate=true&fl=Id,Name,f_Memory(GB)
Reference link: Local Parameters for Faceting
Please help me!

Special characters in Solr queries (the q and fq parameters) must be escaped if you need to search for them literally; otherwise the query parser assumes their special meaning (see "Escaping special characters" in the Solr documentation).
In your example the + character is not escaped in the fq:
{!tag=f_Memory\(GB\)}f_Memory\(GB\):4+GB
Those escaping rules do not apply to local parameters, i.e. everything between {! and }.
In your example you escaped ( and ) in the tag label. This way the label defined as {!tag=f_Memory\(GB\)} in the filter is different from the one referenced as {!ex=f_Memory(GB)} in the facet field, so the filter is not excluded during faceting and only matching documents are used to build the facets.
You should write the filter as:
{!tag=f_Memory(GB)}f_Memory\(GB\):4\+GB
and the facet field as:
{!ex=f_Memory(GB)}f_Memory(GB)
to obtain what you're looking for.
Example of full correct request:
../select?indent=on&wt=json&facet.field={!ex%3Df_Memory(GB)}f_Memory(GB)&fq={!tag%3Df_Memory(GB)}f_Memory\(GB\):4\%2BGB&q=CategoryID:(1+OR+2+OR+3+OR+4)&start=0&rows=10&defType=edismax&facet.mincount=1&facet=true&spellcheck.collate=true
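To make the escaping rule concrete, here is a minimal sketch of building such a request programmatically. The question uses SolrNet, so this Python snippet is only illustrative; the helper names are made up, but the escaping it applies follows the rule above: escape the query part, never the tag/ex label.
import re
from urllib.parse import urlencode

def escape_solr(text):
    # Backslash-escape characters special to the Solr query parser
    # (+ - ! ( ) { } [ ] ^ " ~ * ? : \ / & | and spaces).
    return re.sub(r'([+\-!(){}\[\]^"~*?:\\/ &|])', r'\\\1', text)

def facet_params(field, value):
    # The {!tag=...} / {!ex=...} labels are local parameters: no escaping,
    # and they must match each other character for character.
    return {
        "q": "*:*",
        "facet": "true",
        "facet.mincount": "1",
        "fq": "{!tag=%s}%s:%s" % (field, escape_solr(field), escape_solr(value)),
        "facet.field": "{!ex=%s}%s" % (field, field),
    }

params = facet_params("f_Memory(GB)", "4+GB")
# fq          -> {!tag=f_Memory(GB)}f_Memory\(GB\):4\+GB
# facet.field -> {!ex=f_Memory(GB)}f_Memory(GB)
print(urlencode(params))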
A simple, real example I tested locally:
This is the data in the core:
Request:
http://localhost:8983/solr/test/select?q=*%3A*&fl=id%2Cf_*%2Cpa_*&wt=json&indent=true
Response:
{
  "responseHeader": {
    "status": 0,
    "QTime": 1,
    "params": {
      "q": "*:*",
      "indent": "true",
      "fl": "id,f_*,pa_*",
      "wt": "json",
      "_": "1474529614808"
    }
  },
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      {
        "id": "1",
        "f_Memory(GB)": [
          "4+GB"
        ],
        "pa_RAM(GB)": [
          "2+GB",
          "4GB",
          "8GB"
        ]
      },
      {
        "id": "2",
        "f_Memory(GB)": [
          "8+GB"
        ],
        "pa_RAM(GB)": [
          "4GB"
        ]
      }
    ]
  }
}
Working faceting:
Request:
http://localhost:8983/solr/test/select?q=*%3A*&fq=%7B!tag%3Df_Memory(GB)%7Df_Memory%5C(GB%5C)%3A4%5C%2BGB&fl=id%2Cf_*%2Cpa_*&wt=json&indent=true&facet=true&facet.field=%7B!ex%3Df_Memory(GB)%7Df_Memory(GB)
Response:
{
  "responseHeader": {
    "status": 0,
    "QTime": 2,
    "params": {
      "q": "*:*",
      "facet.field": "{!ex=f_Memory(GB)}f_Memory(GB)",
      "indent": "true",
      "fl": "id,f_*,pa_*",
      "fq": "{!tag=f_Memory(GB)}f_Memory\\(GB\\):4\\+GB",
      "wt": "json",
      "facet": "true",
      "_": "1474530054207"
    }
  },
  "response": {
    "numFound": 1,
    "start": 0,
    "docs": [
      {
        "id": "1",
        "f_Memory(GB)": [
          "4+GB"
        ],
        "pa_RAM(GB)": [
          "2+GB",
          "4GB",
          "8GB"
        ]
      }
    ]
  },
  "facet_counts": {
    "facet_queries": {},
    "facet_fields": {
      "f_Memory(GB)": [
        "4+GB",
        1,
        "8+GB",
        1
      ]
    },
    "facet_dates": {},
    "facet_ranges": {},
    "facet_intervals": {},
    "facet_heatmaps": {}
  }
}

Related

pivot facet query does not show nested result

I want to use a pivot facet query with Solr to get counts of documents by specific 'type' in each 'region'. I run the following query:
http://localhost:8983/solr/alfresco/select?facet.pivot=ns:region,ns:type&facet=true&indent=on&q=TYPE:ns\:caseFile&rows=0&start=0&wt=json
I expect to see the number of documents of each 'type' within each 'region', but I get 'region' counts only:
....
"_pivot_mappings_": {
"ns:region,ns:type": "text#s__lt#{http://xxx.eu/model/1.0}region,text#s__lt#{http://xxx.eu/model/1.0}type"
},
"facet.pivot": "ns:region,ns:type",
...
"facet_counts": {
"facet_intervals": {},
"facet_pivot": {
"ns:region,ns:type": [
{
"field": "ns:region",
"count": 479,
"value": "{en}hk"
},
{
"field": "ns:region",
"count": 120,
"value": "{en}gk"
},
{
"field": "ns:region",
"count": 5,
"value": "{en}oc"
},
{
"field": "ns:region",
"count": 2,
"value": "{en}dep"
},
]
},
"facet_queries": {},
"facet_fields": {},
"facet_heatmaps": {},
"facet_ranges": {}
},
Pivot facets are documented to produce the results I expect, but I was unable to get nested counts as shown here.
Are there any limitations in the document model or the index itself that prevent the results I expect? Or is the query wrong? Is there anything I can check?
try "group",but must be single value
The problem is that the fields I wanted to use for faceting are simply not "facetable". It is not that they are not visible in the Solr document; that's fine, because the fields are not stored in the index.
What I learned so far: in order to be facetable, a field should not be tokenized and it needs to have docValues="true" in the Solr schema. With these changes, faceting started to work as expected. Since the Solr schema is built automatically by Alfresco, there is a "facetable" marker for Alfresco properties (fields). Once it is turned on, fields that were previously indexed and tokenized become Indexed, DocValues, Omit Norms, Omit Term Frequencies & Positions. Problem solved.
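Outside Alfresco (which generates these definitions itself once the "facetable" marker is set), a hedged sketch of what the equivalent manual change looks like through the Solr Schema API; the core URL and field name below are made up for illustration only:
import requests

# Hypothetical core URL and field name; Alfresco normally manages the schema itself.
SOLR = "http://localhost:8983/solr/alfresco"

field_def = {
    "add-field": {
        "name": "ns_type_facet",
        "type": "string",     # non-tokenized field type
        "indexed": True,
        "stored": False,
        "docValues": True,    # needed for faceting to work efficiently
    }
}

print(requests.post(SOLR + "/schema", json=field_def).json())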

Showing facet result even though not a single result found for solrQuery

I'm working with Solr 8.0.0 and I am facing a problem when using the exclude tag.
My Solr query looks like the one below:
http://localhost:8984/solr/HappyDemo202/select?q=*:*&
rows=6&
start=0&
wt=json&
fq={!tag=CATFACET}cat:((desktops))&
fq={!tag=TAGFACET}tag:((cool))&
fq={!tag=Price}Price:[1200 TO 1245]&
json.facet={CatFacet:{type:terms,field:cat,domain:{excludeTags:CATFACET},limit:-1,sort:{count:desc}},TagsFacet:{ type:terms,field:tag,domain:{excludeTags:TAGFACET},limit:-1,sort:{count:desc}}}
The output of the query looks like this:
{ "responseHeader": {
"status": 0,
"QTime": 0,
"params": {
"q": "*:*",
"json.facet": "{CatFacet:{type:terms,field:cat,domain:{excludeTags:CATFACET},limit:-1,sort:{count:desc}},TagsFacet:{ type:terms,field:tag,domain:{excludeTags:TAGFACET},limit:-1,sort:{count:desc}}}",
"start": "0",
"fq": [
"{!tag=CATFACET}cat:((desktops))",
"{!tag=TAGFACET}tag:((cool))",
"{!tag=Price}Price:[1200 TO 1245]"
],
"rows": "6",
"wt": "json"
}
}, "response": {
"numFound": 0,
"start": 0,
"docs": [] },
"facets": {
"count": 0,
"CatFacet": {
"buckets": []
},
"TagsFacet": {
"buckets": [
{
"val": "new",
"count": 1
},
{
"val": "new1",
"count": 1
}
]
} } }
When you check the output of the query, CatFacet is not showing any facet results because numFound is 0, but TagsFacet is showing two facet results, new and new1. I don't know what is going wrong; TagsFacet must not show the two facet results if numFound is 0.
Can you please suggest what's going wrong? Any help will be appreciated.
You're explicitly asking for the fq with that tag to be excluded for the tag facet ({excludeTags:TAGFACET}), meaning that if you didn't have that fq there, there would be results, and you're asking for a count computed without that filter.
If you want the facet to only count those documents being returned, drop the excludeTags value for all those facets that should only be returned for documents that are included in the result set.
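As a rough sketch of that suggestion (Python over plain HTTP, with the core URL taken from the question), here is the same request with the excludeTags domains dropped, so each facet only counts documents that match every fq:
import json
import requests

SOLR = "http://localhost:8984/solr/HappyDemo202/select"

# No "domain": {"excludeTags": ...} here, so buckets are computed only over
# documents matching all filters; with numFound == 0 the buckets stay empty.
json_facet = {
    "CatFacet": {"type": "terms", "field": "cat", "limit": -1, "sort": {"count": "desc"}},
    "TagsFacet": {"type": "terms", "field": "tag", "limit": -1, "sort": {"count": "desc"}},
}

params = {
    "q": "*:*",
    "rows": 6,
    "fq": [
        "{!tag=CATFACET}cat:((desktops))",
        "{!tag=TAGFACET}tag:((cool))",
        "{!tag=Price}Price:[1200 TO 1245]",
    ],
    "json.facet": json.dumps(json_facet),
}

print(requests.get(SOLR, params=params).json().get("facets"))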

Azure Search - Additional Stop Words

When creating an index definition in Azure Search, is there any way to add additional stop words just for that index? For example, if you are indexing street names, one would like to strip out Road, Close, Avenue, etc.
And if one makes the field non-searchable, i.e. the whole thing is indexed as one term, then what happens to something like Birken Court Road? Would the term being indexed be Birken Court?
Many thanks
You can define an additional set of stopwords using a custom analyzer.
For example,
{
  "name": "myindex",
  "fields": [
    {
      "name": "id",
      "type": "Edm.String",
      "key": true,
      "searchable": false
    },
    {
      "name": "text",
      "type": "Edm.String",
      "searchable": true,
      "analyzer": "my_analyzer"
    }
  ],
  "analyzers": [
    {
      "name": "my_analyzer",
      "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
      "tokenizer": "standard_v2",
      "tokenFilters": [
        "lowercase",
        "english_stopwords",
        "my_stopwords"
      ]
    }
  ],
  "tokenFilters": [
    {
      "name": "english_stopwords",
      "@odata.type": "#Microsoft.Azure.Search.StopwordsTokenFilter",
      "stopwordsList": "english"
    },
    {
      "name": "my_stopwords",
      "@odata.type": "#Microsoft.Azure.Search.StopwordsTokenFilter",
      "stopwords": ["road", "avenue"]
    }
  ]
}
In this index definition I'm setting a custom analyzer on the text field that uses the standard tokenizer, the lowercase token filter, and two stopwords token filters: one for standard English stopwords and one for the additional set of stopwords. You can test the behavior of your custom analyzer with the Analyze API, for example:
request:
{
  "text": "going up the road",
  "analyzer": "my_analyzer"
}
response:
{
  "tokens": [
    {
      "token": "going",
      "startOffset": 0,
      "endOffset": 5,
      "position": 0
    },
    {
      "token": "up",
      "startOffset": 6,
      "endOffset": 8,
      "position": 1
    }
  ]
}
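For completeness, a small sketch of issuing that Analyze call over REST from Python; the service name, admin key, and api-version below are placeholders and would need to match your own service:
import requests

SERVICE = "https://<your-service>.search.windows.net"   # placeholder
INDEX = "myindex"
API_KEY = "<admin-key>"                                  # placeholder
API_VERSION = "2016-09-01"                               # assumed; use the version your service supports

resp = requests.post(
    SERVICE + "/indexes/" + INDEX + "/analyze",
    params={"api-version": API_VERSION},
    headers={"api-key": API_KEY, "Content-Type": "application/json"},
    json={"text": "going up the road", "analyzer": "my_analyzer"},
)
print([t["token"] for t in resp.json()["tokens"]])       # expect ['going', 'up']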
Analyzers are not applied to non-searchable fields; therefore the stopword in your example would not be removed. To learn more about query and document processing, see How full text search works in Azure Search.

Elasticsearch not returning hits for multi-valued field

I am using Elasticsearch with no modifications whatsoever. This means the mappings, norms, and analyzed/not_analyzed settings are all the default config. I have a very small data set of two items for experimentation purposes. The items have several fields, but I query only on one, which is a multi-valued/array-of-strings field. The doc looks like this:
{
  "_index": "index_profile",
  "_type": "items",
  "_id": "ega",
  "_version": 1,
  "found": true,
  "_source": {
    "clicked": [
      "ega"
    ],
    "profile_topics": [
      "Twitter",
      "Entertainment",
      "ESPN",
      "Comedy",
      "University of Rhode Island",
      "Humor",
      "Basketball",
      "Sports",
      "Movies",
      "SnapChat",
      "Celebrities",
      "Rite Aid",
      "Education",
      "Television",
      "Country Music",
      "Seattle",
      "Beer",
      "Hip Hop",
      "Actors",
      "David Cameron",
      ... // other topics
    ],
    "id": "ega"
  }
}
A sample query is:
GET /index_profile/items/_search
{
  "size": 10,
  "query": {
    "bool": {
      "should": [
        {
          "terms": {
            "profile_topics": [
              "Basketball"
            ]
          }
        }
      ]
    }
  }
}
Again, there are only two items and the one listed should match the query because the profile_topics field matches the "Basketball" term. The other item does not match. I only get a result if I ask for clicked = ega in the should.
With Solr I would probably specify that the fields are multi-valued string arrays and are to have no norms and no analyzer, so that profile_topics values are not stemmed or tokenized, since each value should be treated as a single token (even with the spaces). Not sure this would solve the problem, but it is how I treat similar data in Solr.
I assume I have run afoul of some norm/analyzer/TF-IDF issue; if so, how do I solve this so that even with two items the query will return ega? If possible I'd like to solve this index- or type-wide rather than per field.
Basketball (with a capital B) in a terms query will not be analyzed. This means this is exactly the form that will be looked up in the Elasticsearch index.
You say you have the defaults. If so, indexing Basketball under the profile_topics field means that the actual term in the index will be basketball (with a lowercase b), which is the result of the standard analyzer. So either you set profile_topics as not_analyzed, or you search for basketball and not Basketball.
Read this about terms.
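As a quick illustration of the second option, here is the same query shape as in the question but with the lowercase term the standard analyzer actually indexed (a Python sketch, assuming a local node on the default port):
import requests

query = {
    "size": 10,
    "query": {"bool": {"should": [{"terms": {"profile_topics": ["basketball"]}}]}},
}
resp = requests.post("http://localhost:9200/index_profile/items/_search", json=query)
print(resp.json()["hits"]["total"])   # should now count the matching doc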
Regarding setting all the fields to not_analyzed, you could do that with a dynamic template. With a template you can also do what Logstash does: define a .raw subfield for each string field, where only this subfield is not_analyzed. The original/parent field still holds the analyzed version of the same text; maybe you will use the analyzed field in the future.
Take a look at this dynamic template. It's the one Logstash is using.
More specifically:
{
  "template": "your_indices_name-*",
  "mappings": {
    "_default_": {
      "_all": {
        "enabled": true,
        "omit_norms": true
      },
      "dynamic_templates": [
        {
          "string_fields": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "analyzed",
              "omit_norms": true,
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed"
                }
              }
            }
          }
        }
      ]
    }
  }
}
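Once a template like this matches your index (note its index pattern) and the data is reindexed, the not_analyzed subfield can be queried with the exact original casing. A hedged sketch in Python, reusing the query from the question:
import requests

# profile_topics.raw holds the untouched values, so the exact term matches.
query = {
    "size": 10,
    "query": {"bool": {"should": [{"terms": {"profile_topics.raw": ["Basketball"]}}]}},
}
resp = requests.post("http://localhost:9200/index_profile/items/_search", json=query)
print(resp.json()["hits"]["total"])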

Elasticsearch dynamic mapping compared to Solr dynamic field

In Solr I can define a dynamic field and tie it to a particular data type. In the following example all fields in an indexed document ending with "dt" will be indexed as a long.
<dynamicField name="*_dt" stored="true" indexed="true" type="long" multiValued="true"/>
In ElasticSearch, knowing the name of the field, I can use the "properties" sub-node in "mappings" to index a field to a particular type.
"properties": {
"msh_datetimeofmessage_hl7_dt": {
"type": "date",
"format": "YYYYMMddHHmmss"
},
I tried the following and attempted using a template, unsuccessfully.
"properties": {
"*_dt": {
"type": "date",
"format": "YYYYMMddHHmmss"
},
Does ElasticSearch provide the same functionality as Solr as described above?
Thanks in advance.
I think you may be looking for functionality provided by dynamic templates. Unless I am mistaken, your mapping would look something like this (mostly borrowed from the linked page).
PUT /my_index
{
  "mappings": {
    "my_type": {
      "dynamic_templates": [
        {
          "my_date_template": {
            "match": "*_dt",
            "mapping": {
              "type": "date",
              "format": "YYYYMMDDHHmmss"
            }
          }
        }
      ]
    }
  }
}
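A short sketch of exercising that mapping from Python; the cluster address, document, and field value are made up, and the field name is borrowed from the question:
import requests

BASE = "http://localhost:9200"   # hypothetical local cluster

# Index a document whose field name matches the *_dt pattern.
requests.put(BASE + "/my_index/my_type/1",
             json={"msh_datetimeofmessage_hl7_dt": "20160922120000"})

# The dynamic template should map it as a date with format YYYYMMDDHHmmss.
print(requests.get(BASE + "/my_index/_mapping").json())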
