pivot facet query does not show nested result

I want to use a pivot facet query with Solr to get document counts by 'type' within each 'region'. I run the following query:
http://localhost:8983/solr/alfresco/select?facet.pivot=ns:region,ns:type&facet=true&indent=on&q=TYPE:ns\:caseFile&rows=0&start=0&wt=json
I expect to see the number of documents of each 'type' within each 'region', but I get the 'region' counts only:
....
"_pivot_mappings_": {
"ns:region,ns:type": "text#s__lt#{http://xxx.eu/model/1.0}region,text#s__lt#{http://xxx.eu/model/1.0}type"
},
"facet.pivot": "ns:region,ns:type",
...
"facet_counts": {
"facet_intervals": {},
"facet_pivot": {
"ns:region,ns:type": [
{
"field": "ns:region",
"count": 479,
"value": "{en}hk"
},
{
"field": "ns:region",
"count": 120,
"value": "{en}gk"
},
{
"field": "ns:region",
"count": 5,
"value": "{en}oc"
},
{
"field": "ns:region",
"count": 2,
"value": "{en}dep"
},
]
},
"facet_queries": {},
"facet_fields": {},
"facet_heatmaps": {},
"facet_ranges": {}
},
Pivot facets are documented to produce the results I expect, but I was unable to get nested counts like the ones shown here.
Are there any limitations in the document model or the index itself that prevent me from getting the results I expect? Or is the query wrong? Is there anything I can check?

Try "group", but the field must be single-valued.

The problem is that the fields I wanted to use for faceting are simply not "facetable". The issue is not that they are missing from the returned Solr document; that is fine, because the fields are not stored in the index.
What I have learned so far: to be facetable, a field should not be tokenized and it needs docValues="true" in the Solr schema. With these changes, faceting started to work as expected. Since the Solr schema is built automatically by Alfresco, there is a "facetable" marker for Alfresco properties (fields). Once it is turned on, fields that were previously Indexed and Tokenized become Indexed, DocValues, Omit Norms, Omit Term Frequencies & Positions. Problem solved.
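As for "is there anything I can check": one quick check is whether any entry under facet_pivot carries a nested "pivot" list at all. Below is a minimal sketch in Python, assuming the usual pivot response shape in which each entry has "field", "value", "count" and, when nesting works, a "pivot" list of child entries. The sample data and the type values in it are made up for illustration.

```python
def has_nested_counts(entries):
    """Return True if at least one pivot entry carries child entries."""
    return any(entry.get("pivot") for entry in entries)


def flatten_pivot(entries, path=()):
    """Turn nested pivot entries into (value_path, count) rows."""
    rows = []
    for entry in entries:
        current = path + (entry["value"],)
        children = entry.get("pivot")
        if children:
            # Descend into the next pivot level (e.g. region -> type).
            rows.extend(flatten_pivot(children, current))
        else:
            rows.append((current, entry["count"]))
    return rows


# Hypothetical fragment of facet_counts.facet_pivot["ns:region,ns:type"].
sample = [
    {"field": "ns:region", "value": "hk", "count": 479, "pivot": [
        {"field": "ns:type", "value": "contract", "count": 400},
        {"field": "ns:type", "value": "invoice", "count": 79},
    ]},
    {"field": "ns:region", "value": "oc", "count": 5},
]

for value_path, count in flatten_pivot(sample):
    print(" / ".join(value_path), count)
```

If has_nested_counts returns False for every pivot key, as in the response in the question, the second field is probably not facetable in the index.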

Sorting the results from solr query

Hello, I am trying to sort the results of my query alphabetically. The data that is returned looks like this:
"FacetFilters": [
{
"Id": 0,
"Name": "small",
"ResultCount": 47,
"IsSelected": false,
"Hide": false
},
{
"Id": 0,
"Name": "n/a",
"ResultCount": 1,
"IsSelected": false,
"Hide": false
},
{
"Id": 0,
"Name": "medium",
"ResultCount": 79,
"IsSelected": false,
"Hide": false
},
{
"Id": 0,
"Name": "large",
"ResultCount": 4,
"IsSelected": false,
"Hide": false
}
]
I was able to work around this issue post-query by reversing the list with FacetFilters.Reverse();, but I would prefer to get the results in the correct order directly from the query. Could someone please tell me the best way to go about this? Thank you. For the record, I am using the SolrNet package for .NET.

You can't sort facets in descending order with the old Facet API (which is what SolrNet uses). Until SolrNet supports the JSON Facet API natively, you'll have to add it yourself.
See "How to implement JSON facet API in SolrNet" for how to do the first part, then see "Order Facet Fields by Descending Value" for how to sort a facet in descending order by using the JSON Facet API instead.
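For orientation, here is a rough sketch of what such a JSON Facet API request body could look like, built in Python rather than SolrNet. The field name "size" and the helper name are assumptions for illustration. The body would be POSTed to the collection's select (or query) handler.

```python
import json


def terms_facet_request(query, field, sort="index desc", limit=-1):
    """Build a JSON request body with one terms facet on `field`."""
    return {
        "query": query,
        "limit": 0,  # no documents, only facet buckets
        "facet": {
            field: {
                "type": "terms",
                "field": field,
                "sort": sort,    # "index desc" = reverse alphabetical by term
                "limit": limit,  # -1 asks for all buckets
            }
        },
    }


request = terms_facet_request("*:*", "size")  # "size" is a hypothetical field
body = json.dumps(request, indent=2)
print(body)
```

The part of that body that matters here is the facet's sort entry: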
"sort":"index desc"

Azure Search - Additional Stop Words

When creating an index definition in Azure Search, is there any way to add additional stop words just for that index? For example, if you are indexing street names, you would want to strip out Road, Close, Avenue, etc.
And if one makes the field non-searchable, i.e. the whole value is indexed as one term, what happens to something like Birken Court Road? Would the indexed term be Birken Court?
Many thanks

You can define an additional set of stopwords using a custom analyzer.
For example,
{
"name":"myindex",
"fields":[
{
"name":"id",
"type":"Edm.String",
"key":true,
"searchable":false
},
{
"name":"text",
"type":"Edm.String",
"searchable":true,
"analyzer":"my_analyzer"
}
],
"analyzers":[
{
"name":"my_analyzer",
"@odata.type":"#Microsoft.Azure.Search.CustomAnalyzer",
"tokenizer":"standard_v2",
"tokenFilters":[
"lowercase",
"english_stopwords",
"my_stopwords"
]
}
],
"tokenFilters":[
{
"name":"english_stopwords",
"@odata.type":"#Microsoft.Azure.Search.StopwordsTokenFilter",
"stopwordsList":"english"
},
{
"name":"my_stopwords",
"@odata.type":"#Microsoft.Azure.Search.StopwordsTokenFilter",
"stopwords": ["road", "avenue"]
}
]
}
In this index definition I'm setting a custom analyzer on the text field that uses the standard tokenizer, the lowercase token filter and two stopwords token filters: one for standard English stopwords and one for the additional set of stopwords. You can test the behavior of your custom analyzer with the Analyze API, for example:
request:
{
"text":"going up the road",
"analyzer": "my_analyzer"
}
response:
{
"tokens": [
{
"token": "going",
"startOffset": 0,
"endOffset": 5,
"position": 0
},
{
"token": "up",
"startOffset": 6,
"endOffset": 8,
"position": 1
}
]
}
Analyzers are not applied to non-searchable fields, therefore the stopword in your example would not be removed. To learn more about query and document processing see: How full text search works in Azure Search.
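To make the difference concrete, here is a small Python simulation of the analysis chain above. It is only an approximation for illustration, not Azure Search code: the regex tokenizer stands in for the standard tokenizer, and the English stopword set is a short illustrative list that is not necessarily the one Azure uses.

```python
import re

# Illustrative English stopwords (an assumption, not Azure's actual list).
ENGLISH_STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
}
# The additional stopwords from the "my_stopwords" token filter.
MY_STOPWORDS = {"road", "avenue"}


def my_analyzer(text):
    """Tokenize, lowercase, then drop both sets of stopwords."""
    tokens = [token.lower() for token in re.findall(r"\w+", text)]
    return [t for t in tokens
            if t not in ENGLISH_STOPWORDS and t not in MY_STOPWORDS]


def index_terms(value, searchable):
    """Searchable fields are analyzed; other fields keep the value whole."""
    return my_analyzer(value) if searchable else [value]


print(my_analyzer("going up the road"))
print(index_terms("Birken Court Road", searchable=True))
print(index_terms("Birken Court Road", searchable=False))
```

With searchable set to False the value stays a single untouched term, so "Road" remains part of it.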

Wrong facet results with special characters in facet field

I have implemented Solr search and faceting for e-commerce stores, and I am facing a weird issue with the facet filter results. It happens only when there is a special character (e.g. a bracket) in the facet field; otherwise everything works fine.
I implemented this using SolrNet. I ran raw queries against Solr directly and found that the issue appears to be in Solr itself and not related to SolrNet.
Example:
I have a number of products and filters like the following:
RAM (GB)
2 GB
4 GB
8 GB
Memory (GB)
4 GB
8 GB
16 GB
Each facet option has some products in it, so the issue is not about facet.mincount. I have applied the tagging properly as well.
Now, one of these facets works fine while the other one doesn't seem to work with a bracket in the facet field.
Here is the schema where I define the facet fields:
<dynamicField name="f_*" type="string" indexed="true" stored="true" multiValued="true" required="false" />
<dynamicField name="pa_*" type="string" indexed="true" stored="true" multiValued="true" required="false" />
Faceting works fine when I query a field starting with pa_, but not one starting with f_.
The query I am sending to Solr:
../select?indent=on&wt=json&facet.field={!ex%3Dpa_RAM(GB)}pa_RAM(GB)&fq={!tag%3Dpa_RAM\(GB\)}pa_RAM\(GB\):2%2BGB&q=CategoryID:(1+OR+2+OR+3+OR+4)&start=0&rows=10&defType=edismax&facet.mincount=1&facet=true&spellcheck.collate=true
Image1
This works fine as expected.
Another query:
../select?indent=on&wt=json&facet.field={!ex%3Df_Memory(GB)}f_Memory(GB)&fq={!tag%3Df_Memory\(GB\)}f_Memory\(GB\):4%2BGB&q=CategoryID:(1+OR+2+OR+3+OR+4)&start=0&rows=10&defType=edismax&facet.mincount=1&facet=true&spellcheck.collate=true
Gives following result:
Image 2
This doesn't work. However, if I remove the special characters from the query and the indexed data, it works fine.
Moreover, the only facet option returned is the selected one, on which I added the filter tag. None of the other facet options are returned by Solr.
I am unable to figure out why this happens and how to fix it.
Any clue or idea would be great!
Please refer to this query and the images (it is not the right way or a perfect solution):
../select?indent=on&wt=json&facet.field={!ex%3Df_Memory(GB)}f_Memory(GB)&fq={!tag%3Df_Memory(GB)}f_Memory\(GB\):4%2BGB&q=CategoryID:(1+OR+2+OR+3+OR+4)&start=0&rows=10&defType=edismax&facet.mincount=1&facet=true&spellcheck.collate=true&fl=Id,Name,f_Memory(GB)
Reference link: Local Parameters for Faceting
Please help me!

Special characters in Solr queries (q and fq parameters) must be escaped if you need to search for them literally; otherwise the query parser assumes their special meaning. (See "Escaping special characters" in the Solr documentation.)
In the example, the + character is not escaped in fq:
{!tag=f_Memory\(GB\)}f_Memory\(GB\):4+GB
Those escaping rules do not apply to local parameters, i.e. everything between {! and }.
In the example you escaped ( and ) in the tag label. As a result, the label defined as {!tag=f_Memory\(GB\)} in the filter differs from the one referenced as {!ex=f_Memory(GB)} in the facet field, so the filter is not excluded during faceting and only matching documents are used to build the facets.
You should write filter as:
{!tag=f_Memory(GB)}f_Memory\(GB\):4\+GB
and facet as
{!ex=f_Memory(GB)}f_Memory(GB)
to obtain what you're looking for.
Example of full correct request:
../select?indent=on&wt=json&facet.field={!ex%3Df_Memory(GB)}f_Memory(GB)&fq={!tag%3Df_Memory(GB)}f_Memory\(GB\):4\%2BGB&q=CategoryID:(1+OR+2+OR+3+OR+4)&start=0&rows=10&defType=edismax&facet.mincount=1&facet=true&spellcheck.collate=true
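The same rule can be sketched in Python: escape the field name and value in the query part of fq, but leave the tag and ex labels inside the local parameters untouched. This is only a sketch; the helper names are made up, and the character set is modeled on the query parser's special characters rather than taken from any client library.

```python
from urllib.parse import urlencode

# Characters the standard query parser treats as syntax.
SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&;/')


def escape_query_chars(text):
    """Backslash-escape query syntax characters in a field name or term."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


def tagged_filter(field, value):
    """fq with an unescaped tag label and an escaped field:value query."""
    label = "{!tag=" + field + "}"
    return label + escape_query_chars(field) + ":" + escape_query_chars(value)


def excluding_facet(field):
    """facet.field that excludes the filter tagged with the same label."""
    return "{!ex=" + field + "}" + field


field = "f_Memory(GB)"
fq = tagged_filter(field, "4+GB")
facet_field = excluding_facet(field)

# urlencode takes care of percent-encoding for the URL itself.
query_string = urlencode({
    "q": "*:*",
    "fq": fq,
    "facet": "true",
    "facet.field": facet_field,
})
print(fq)
print(facet_field)
print(query_string)
```

Keeping query escaping and URL encoding as two separate steps makes it easier to see which backslashes belong to Solr and which percent signs belong to HTTP.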
Here is a simple real example that I tested locally.
This is the data in the core:
Request:
http://localhost:8983/solr/test/select?q=*%3A*&fl=id%2Cf_*%2Cpa_*&wt=json&indent=true
Response:
{
"responseHeader": {
"status": 0,
"QTime": 1,
"params": {
"q": "*:*",
"indent": "true",
"fl": "id,f_*,pa_*",
"wt": "json",
"_": "1474529614808"
}
},
"response": {
"numFound": 2,
"start": 0,
"docs": [
{
"id": "1",
"f_Memory(GB)": [
"4+GB"
],
"pa_RAM(GB)": [
"2+GB",
"4GB",
"8GB"
]
},
{
"id": "2",
"f_Memory(GB)": [
"8+GB"
],
"pa_RAM(GB)": [
"4GB"
]
}
]
}
}
Working faceting:
Request:
http://localhost:8983/solr/test/select?q=*%3A*&fq=%7B!tag%3Df_Memory(GB)%7Df_Memory%5C(GB%5C)%3A4%5C%2BGB&fl=id%2Cf_*%2Cpa_*&wt=json&indent=true&facet=true&facet.field=%7B!ex%3Df_Memory(GB)%7Df_Memory(GB)
Response:
{
"responseHeader": {
"status": 0,
"QTime": 2,
"params": {
"q": "*:*",
"facet.field": "{!ex=f_Memory(GB)}f_Memory(GB)",
"indent": "true",
"fl": "id,f_*,pa_*",
"fq": "{!tag=f_Memory(GB)}f_Memory\\(GB\\):4\\+GB",
"wt": "json",
"facet": "true",
"_": "1474530054207"
}
},
"response": {
"numFound": 1,
"start": 0,
"docs": [
{
"id": "1",
"f_Memory(GB)": [
"4+GB"
],
"pa_RAM(GB)": [
"2+GB",
"4GB",
"8GB"
]
}
]
},
"facet_counts": {
"facet_queries": {},
"facet_fields": {
"f_Memory(GB)": [
"4+GB",
1,
"8+GB",
1
]
},
"facet_dates": {},
"facet_ranges": {},
"facet_intervals": {},
"facet_heatmaps": {}
}
}

Elasticsearch not returning hits for multi-valued field

I am using Elasticsearch with no modifications whatsoever. This means the mappings, norms, and analyzed/not_analyzed settings are all default config. I have a very small data set of two items for experimentation purposes. The items have several fields, but I query only on one, which is a multi-valued field (an array of strings). The doc looks like this:
{
"_index": "index_profile",
"_type": "items",
"_id": "ega",
"_version": 1,
"found": true,
"_source": {
"clicked": [
"ega"
],
"profile_topics": [
"Twitter",
"Entertainment",
"ESPN",
"Comedy",
"University of Rhode Island",
"Humor",
"Basketball",
"Sports",
"Movies",
"SnapChat",
"Celebrities",
"Rite Aid",
"Education",
"Television",
"Country Music",
"Seattle",
"Beer",
"Hip Hop",
"Actors",
"David Cameron",
... // other topics
],
"id": "ega"
}
}
A sample query is:
GET /index_profile/items/_search
{
"size": 10,
"query": {
"bool": {
"should": [{
"terms": {
"profile_topics": [
"Basketball"
]
}
}]
}
}
}
Again, there are only two items, and the one listed should match the query because the profile_topics field contains the "Basketball" term. The other item does not match. I only get a result if I ask for clicked = ega in the should clause.
With Solr I would probably specify that the fields are multi-valued string arrays with no norms and no analyzer, so that profile_topics values are not stemmed or tokenized, since each value should be treated as a single token (spaces included). I am not sure this would solve the problem, but it is how I treat similar data in Solr.
I assume I have run afoul of some norm/analyzer/TF-IDF issue. If so, how do I solve this so that even with two items the query will return ega? If possible, I'd like to solve this index-wide or type-wide rather than per field.

Basketball (with a capital B) in a terms query will not be analyzed. This means it is looked up in the Elasticsearch index exactly as written.
You say you have the defaults. If so, indexing Basketball under the profile_topics field means that the actual term in the index will be basketball (with a lowercase b), which is the result of the standard analyzer. So either set profile_topics as not_analyzed, or search for basketball and not Basketball.
Read this about terms.
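The mismatch can be reproduced with a toy inverted index in Python. This is only a simulation under simplifying assumptions: the regex-plus-lowercase function is a rough stand-in for the standard analyzer, and the function names are made up.

```python
import re


def standard_analyze(text):
    """Rough stand-in for the standard analyzer: split into words, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def build_index(docs, field, analyzed=True):
    """Map each indexed term to the set of document ids containing it."""
    index = {}
    for doc_id, source in docs.items():
        for value in source[field]:
            terms = standard_analyze(value) if analyzed else [value]
            for term in terms:
                index.setdefault(term, set()).add(doc_id)
    return index


def terms_query(index, values):
    """Like a terms query: the values are looked up as-is, without analysis."""
    hits = set()
    for value in values:
        hits |= index.get(value, set())
    return hits


docs = {"ega": {"profile_topics": ["Basketball", "Hip Hop"]}}
analyzed_index = build_index(docs, "profile_topics")
raw_index = build_index(docs, "profile_topics", analyzed=False)

print(terms_query(analyzed_index, ["Basketball"]))  # no match: index holds "basketball"
print(terms_query(analyzed_index, ["basketball"]))
print(terms_query(raw_index, ["Hip Hop"]))
```

In the analyzed index "Hip Hop" is also split into hip and hop, which is why a not_analyzed field is the better fit when whole values should act as single tokens.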
Regarding setting all the fields to not_analyzed, you could do that with a dynamic template. With a template you can also do what Logstash does: define a .raw subfield for each string field, where only this subfield is not_analyzed. The original (parent) field still holds the analyzed version of the same text, in case you want to use the analyzed field in the future.
Take a look at this dynamic template. It's the one Logstash is using.
More specifically:
{
"template": "your_indices_name-*",
"mappings": {
"_default_": {
"_all": {
"enabled": true,
"omit_norms": true
},
"dynamic_templates": [
{
"string_fields": {
"match": "*",
"match_mapping_type": "string",
"mapping": {
"type": "string",
"index": "analyzed",
"omit_norms": true,
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
]
}
}
}
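With a template like this in place, the query from the question would target the not_analyzed subfield instead of the parent field. A minimal sketch of the request body in Python, assuming the subfield is named raw as in the template above and that the data has been reindexed into an index created after the template was added (templates only apply when an index is created). The helper name is made up.

```python
import json


def exact_topics_query(topics, field="profile_topics", size=10):
    """Build a terms query against the not_analyzed .raw subfield."""
    return {
        "size": size,
        "query": {
            "bool": {
                "should": [
                    # Exact, case-sensitive match on whole values.
                    {"terms": {field + ".raw": list(topics)}}
                ]
            }
        },
    }


query = exact_topics_query(["Basketball", "Hip Hop"])
print(json.dumps(query, indent=2))
```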

Multiple search filtering is not working in cloudant, why?

Here I have quoted my code for multiple search filtering. I could not find the mistake in it. Please show me the correct code to make it work.
Employee document:
{
"_id": "527c8d9327c6f27f17df0d2e17000530",
"_rev": "24-276a8dc913559901897fd601d2f9654f",
"proj_role": "TeamMember",
"work_total_experience": "3",
"personal": {
"languages_known": [
"English","Telugu"
]},
"skills": [
{
"skill_set": "Webservices Framework",
"skill_exp": 1,
"skill_certified": "yes",
"skill_rating": 3
},
{
"skill_set": "Microsoft",
"skill_exp": 1,
"skill_certified": "yes",
"skill_rating": 3
}
],
"framework_competency": "Nasscom",
"type": "employee-docs"
}
Design Document:
{
"_id": "_design/sample",
"_rev": "86-1250f792e6e84f6f33447a00cf64d61d",
"views": {},
"language": "javascript",
"indexes": {
"search": {
"index": "function(doc){\n index(\"default\", doc._id);if(doc.type=='employee-docs'){\nif (doc.proj_role){index(\"project_role\", doc.proj_role);}if(doc.work_total_experience){\nindex(\"work_experience\", doc.work_total_experience);}\nif(doc.personal.languages_known){for(c in doc.personal.languages_known){ \n index(\"languages_known\",doc.personal.languages_known[c]);}} if(doc.skills){for (var i=0;i<doc.skills.length;i++){\nindex('skill_set',doc.skills[i].skill_set);}}}}"
}
}
}
Run using the URL below: https://ideyeah4.cloudant.com/opteamize_new/_design/sample/_search/search?q=project_role:TeamMember%20AND%20work_experience:%223%22%20AND%20languages_known:Telugu%20AND%20skill_set:Microsoft&include_docs=true

A simple way to debug this is to query the top 100 results in your index:
https://ideyeah4.cloudant.com/opteamize_new/_design/sample/_search/search?q=*:*&limit=100
This will at least tell you whether there are any documents in your index at all.
Your current query (without URL encoding) looks like:
project_role:TeamMember AND work_experience:"3" AND languages_known:Telugu AND skill_set:Microsoft
I'd suggest that some of these search values require quotes, which is always the case when you are searching string values. Next, you could try:
project_role:"TeamMember"
and see if you get any results, then refine from there.
Debugging this might also be easier if you store the values as well as index them (so you can see exactly what is indexed). To do this, pass the options object { "store": true } to each index call. For example,
index("languages_known", doc.personal.languages_known[c], { "store": true });
Now, when you query the index it will return a list of fields which were stored with each match.