Solr Field Alias in Solr Search is not working - solr

The following links say that fl accepts field aliases such as id,price:crazy_price_field. I am trying to use this, but it is not working.
http://wiki.apache.org/solr/CommonQueryParameters#Field_alias
https://issues.apache.org/jira/browse/SOLR-1205
My Query:
http://localhost:8080/solr/ee_core/select?indent=on&version=2.2&q=\*%3A\*&fq=%2BinstanceId_index_store%3A217&start=0&rows=10&fl=description_index_store%2Cscore&qt=&wt=json
With fl=description_index_store,score I get the correct result, with the fields description_index_store and score:
{
"responseHeader": {
"status": 0,"QTime": 1,
"params": {
"explainOther": "","fl": "description_index_store,score",
"indent": "on","start": "0","q": "*:*","hl.fl": "","qt": "",
"wt": "json","fq": "+instanceId_index_store:217","rows": "3",
"version": "2.2"
}
},
"response": {
"numFound": 128,"start": 0,"maxScore": 1,
"docs": [
{
"description_index_store": "Apple MacBook - Intel Core 2 Duo",
"score": 1
},
{
"description_index_store": "Apple MacBook - Intel Core 2 Duo",
"score": 1
},
{
"description_index_store": "HP Envy - 17.3\" - Intel Core i7",
"score": 1
}
]
}
}
But when I use an alias, fl=description:description_index_store,score, in the same query, the field is not returned:
{
"responseHeader": {
"status": 0,"QTime": 0,
"params": {
"explainOther": "","fl": "description:description_index_store,score",
"indent": "on","start": "0","q": "*:*","hl.fl": "","qt": "",
"wt": "json","fq": "+instanceId_index_store:217","rows": "3",
"version": "2.2"
}
},
"response": {
"numFound": 128,"start": 0,"maxScore": 1,
"docs": [
{
"score": 1
},
{
"score": 1
},
{
"score": 1
}
]
}
}

You're referring to a feature that was added in Solr 4.0, which has not been released yet. In fact, in the fl section of that wiki page there is an exclamation mark indicating that the content that follows (still within the fl section) is available only from Solr 4.0.
The SOLR-1205 issue was addressed, together with other improvements, in SOLR-2444 (Update fl syntax to support: pseudo fields, AS, transformers, and wildcards), which will be released with Solr 4.0. You might want to look at the Solr 4.0 roadmap to find out when it is due.
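Until a release with alias support is available, one possible workaround is to request the real field name and rename it on the client. The following is only a minimal Python sketch: apply_aliases and its arguments are hypothetical names, not part of Solr, and the sample docs are taken from the response in the question.

```python
def apply_aliases(docs, aliases):
    """Return copies of docs with source field names replaced by their aliases.

    aliases maps alias -> source field, mirroring the alias:field order of fl.
    Fields without an alias (such as score) are passed through unchanged.
    """
    source_to_alias = {source: alias for alias, source in aliases.items()}
    return [
        {source_to_alias.get(name, name): value for name, value in doc.items()}
        for doc in docs
    ]


# Docs as returned with fl=description_index_store,score
docs = [
    {"description_index_store": "Apple MacBook - Intel Core 2 Duo", "score": 1},
    {"description_index_store": 'HP Envy - 17.3" - Intel Core i7', "score": 1},
]
renamed = apply_aliases(docs, {"description": "description_index_store"})
```

Each entry in renamed then carries description instead of description_index_store, which is what the alias syntax would have produced on the server side.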

Related

I have a datastudio error - basic json config

I've returned to try to build a custom JavaScript visualization for Data Studio.
I started off with a template-style config and some basic JS. The manifest is listed correctly, and Data Studio sees the custom item.
It took a long time for it to be authorised.
However, on adding the custom JS, the console reports a load of errors:
first: data.0.type is not a valid config
second: data.0.elements.data.0.type is not a valid config.
Json:
{
"data": [
{
"id": "idtestviz",
"label": "Dimension Element Heading",
"type":"DIMENSION"
}
]
,
"style": [
{
"id": "idtestvizstyles",
"label": "Test Styles",
"elements":[
{
"id":"idtestvizfontcolor",
"label":"Font Colour",
"defaultValue":"#FFFF00"
}
]
}
]
}
It did have options in it before, with the same error.
It appears to be the same as the example at https://developers.google.com/datastudio/visualization/define-config
It is also erroring on 'is already used in the config'
and on data.0.elements.style.0.elements.0.type being a required field that cannot be found.
It seems there are more checks that need to be done.
Is there a validator for the JSON that I can run beforehand, or has something changed on Google's side that their documentation doesn't reflect yet?
Or, more likely, I'm missing something critical...
Regards
Vince
I rechecked my JSON config against a previous one that works and noticed some errors in the objects. After correcting those, the JSON errors in the console have gone away.
The JS errors remain and I'm working on those, so I'm closing this question. The corrected config:
{
"data": [
{
"id":"test_viz_data",
"label":"Test Viz Data",
"elements":[
{
"id": "text_viz_dimensions",
"label": "Dimension Element Heading",
"type": "DIMENSION",
"options": {
"min": 1,
"max": 1
}
}
,
{
"id": "test_metrics",
"label": "Metric fields",
"type": "METRIC",
"options": {
"min": 1,
"max": 1
}
}
]
}
]
,
"style": [
{
"id": "idstyles",
"label": "Test Styles",
"elements":[
{
"id":"idfontcolor",
"label":"Font Colour",
"type":"FONT_COLOR",
"defaultValue":"#FFFF00"
}
]
}
]
,
"interactions": [
]
}
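On the question of a validator: I don't know of an official one, but a rough pre-flight check can be sketched in Python. The rules below are inferred only from the console errors quoted above and from the difference between the broken and the corrected config; they are an assumption, not Google's actual validation, and check_config is a hypothetical helper.

```python
import json


def check_config(config):
    """Return a list of problems found in a config dict.

    Assumed rules: type belongs on elements rather than on sections,
    every section has an elements list, and every id is unique.
    """
    problems = []
    seen_ids = set()

    def check_id(item, path):
        item_id = item.get("id")
        if item_id is None:
            problems.append(f"{path}.id is missing")
        elif item_id in seen_ids:
            problems.append(f"{path}.id '{item_id}' is already used in the config")
        else:
            seen_ids.add(item_id)

    for section_name in ("data", "style"):
        for i, section in enumerate(config.get(section_name, [])):
            path = f"{section_name}.{i}"
            check_id(section, path)
            if "type" in section:
                problems.append(f"{path}.type is not valid here; move it onto an element")
            if "elements" not in section:
                problems.append(f"{path}.elements is missing")
            for j, element in enumerate(section.get("elements", [])):
                element_path = f"{path}.elements.{j}"
                check_id(element, element_path)
                if "type" not in element:
                    problems.append(f"{element_path}.type is required")
    return problems


# The config from the question, before it was corrected.
broken = json.loads("""
{
  "data": [{"id": "idtestviz", "label": "Dimension Element Heading", "type": "DIMENSION"}],
  "style": [{"id": "idtestvizstyles", "label": "Test Styles",
             "elements": [{"id": "idtestvizfontcolor", "label": "Font Colour",
                           "defaultValue": "#FFFF00"}]}]
}
""")
problems = check_config(broken)
for problem in problems:
    print(problem)
```

Run against the corrected config, the same function should report nothing, since every element there has its own id, label and type.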

Showing facet result even though not a single result found for solrQuery

I'm working with Solr 8.0.0 and I'm facing a problem when using excludeTags.
My Solr query looks like this:
http://localhost:8984/solr/HappyDemo202/select?q=*:*&
rows=6&
start=0&
wt=json&
fq={!tag=CATFACET}cat:((desktops))&
fq={!tag=TAGFACET}tag:((cool))&
fq={!tag=Price}Price:[1200 TO 1245]&
json.facet={CatFacet:{type:terms,field:cat,domain:{excludeTags:CATFACET},limit:-1,sort:{count:desc}},TagsFacet:{ type:terms,field:tag,domain:{excludeTags:TAGFACET},limit:-1,sort:{count:desc}}}
The output of the query looks like this:
{ "responseHeader": {
"status": 0,
"QTime": 0,
"params": {
"q": "*:*",
"json.facet": "{CatFacet:{type:terms,field:cat,domain:{excludeTags:CATFACET},limit:-1,sort:{count:desc}},TagsFacet:{ type:terms,field:tag,domain:{excludeTags:TAGFACET},limit:-1,sort:{count:desc}}}",
"start": "0",
"fq": [
"{!tag=CATFACET}cat:((desktops))",
"{!tag=TAGFACET}tag:((cool))",
"{!tag=Price}Price:[1200 TO 1245]"
],
"rows": "6",
"wt": "json"
}
}, "response": {
"numFound": 0,
"start": 0,
"docs": [] },
"facets": {
"count": 0,
"CatFacet": {
"buckets": []
},
"TagsFacet": {
"buckets": [
{
"val": "new",
"count": 1
},
{
"val": "new1",
"count": 1
}
]
} } }
In the output, CatFacet shows no buckets because numFound is 0, but TagsFacet shows two buckets, new and new1. I don't know what is going wrong; TagsFacet should not show any buckets if numFound is 0.
Can you please suggest what is going wrong? Any help would be appreciated.
You're explicitly asking for the fq on tag to be excluded for the tag facet ({excludeTags:TAGFACET}). That facet is therefore counted over the documents that match the remaining filters (cat and Price) without the tag filter, and such documents exist even though the full result set is empty.
If you want a facet to count only the documents in the result set, drop excludeTags from that facet.
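As an illustration of that change, here is a small Python sketch that builds the same request parameters with excludeTags kept on CatFacet and dropped from TagsFacet. The terms_facet helper is a hypothetical name; the field names, tags and filters are the ones from the question.

```python
import json


def terms_facet(field, exclude_tag=None):
    """Build a JSON Facet API terms facet, optionally excluding a tagged fq."""
    facet = {"type": "terms", "field": field, "limit": -1, "sort": {"count": "desc"}}
    if exclude_tag is not None:
        # Only facets that should ignore a filter get a domain change.
        facet["domain"] = {"excludeTags": exclude_tag}
    return facet


params = {
    "q": "*:*",
    "rows": 6,
    "start": 0,
    "wt": "json",
    "fq": [
        "{!tag=CATFACET}cat:((desktops))",
        "{!tag=TAGFACET}tag:((cool))",
        "{!tag=Price}Price:[1200 TO 1245]",
    ],
    "json.facet": json.dumps({
        # Still counted without the cat filter (multi-select style).
        "CatFacet": terms_facet("cat", exclude_tag="CATFACET"),
        # Counted only over documents in the result set.
        "TagsFacet": terms_facet("tag"),
    }),
}
```

These params can be sent to the select handler with any HTTP client. With no excludeTags on TagsFacet, its buckets should be empty whenever numFound is 0.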

Group array doc by sequence: MongoDB groupby or mapreduce?

In MongoDB, I have a collection of documents, each with an array of records that I want to group by tag while preserving the natural order:
{
"day": "2019-01-07",
"records": [
{
"tag": "ch",
"unixTime": ISODate("2019-01-07T09:06:56Z"),
"score": 1
},
{
"tag": "u",
"unixTime": ISODate("2019-01-07T09:07:06Z"),
"score": 0
},
{
"tag": "ou",
"unixTime": ISODate("2019-01-07T09:07:06Z"),
"score": 0
},
{
"tag": "u",
"unixTime": ISODate("2019-01-07T09:07:20Z"),
"score": 0
},
{
"tag": "u",
"unixTime": ISODate("2019-01-07T09:07:37Z"),
"score": 1
}
]
I want to group (and aggregate) the records by consecutive runs of the same tag, NOT simply by unique tag.
Desired output:
{
"day": "2019-01-07",
"records": [
{
"tag": "ch",
"unixTime": [ISODate("2019-01-07T09:06:56Z")],
"score": 1,
"nbRecords": 1
},
{
"tag": "u",
"unixTime": [ISODate("2019-01-07T09:07:06Z")],
"score": 0,
"nbRecords":1
},
{
"tag": "ou",
"unixTime": [ISODate("2019-01-07T09:07:06Z")],
"score": 0
},
{
"tag": "u",
"unixTime": [ISODate("2019-01-07T09:07:20Z"),ISODate("2019-01-07T09:07:37Z")],
"score": 1,
"nbRecords":2
}
]
Groupby
It seems that the '$group' aggregation stage in MongoDB groups by the unique value of the field, regardless of the order of the records in the array:
db.coll.aggregate(
[
{"$unwind":"$records"},
{"$group":
{
"_id":{
"tag":"$records.tag",
"day":"$day"
},
...
}
}
]
)
Returns
{
"day": "2019-01-07",
"records": [
{
"tag": "ch",
"unixTime": [ISODate("2019-01-07T09:06:56Z")],
"score": 1,
"nbRecords": 1
},
{
"tag": "u",
"unixTime": [ISODate("2019-01-07T09:07:06Z"),ISODate("2019-01-07T09:07:20Z"),ISODate("2019-01-07T09:07:37Z")],
"score": 2,
"nbRecords":3
},
{
"tag": "ou",
"unixTime": [ISODate("2019-01-07T09:07:06Z")],
"score": 0
},
]
Map/reduce
As I'm currently using the pymongo driver, I implemented the solution in Python using itertools.groupby, which, being a generator, performs the grouping while respecting the natural order. However, I'm running into a server timeout problem (cursor.NotFound Error) because the processing takes an insane amount of time.
Any idea how to use MongoDB's mapReduce function directly to perform the equivalent of itertools.groupby() in Python?
Help would be much appreciated. I'm using pymongo driver 3.8 and MongoDB 4.0.
Ni! Run through the array of records, adding a new integer index that increments whenever the groupby target changes, then use the mongo group operation on that index.
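The run-index idea can be sketched in plain Python as follows. This is only an illustration under assumptions: add_run_index and the run field name are hypothetical, and the sample records are a trimmed version of the ones in the question.

```python
def add_run_index(records, key="tag"):
    """Return copies of records with a 'run' field that increments whenever key changes."""
    indexed = []
    run = -1
    previous = object()  # sentinel that never equals a real tag
    for record in records:
        if record[key] != previous:
            run += 1
            previous = record[key]
        indexed.append({**record, "run": run})
    return indexed


records = [
    {"tag": "ch", "score": 1},
    {"tag": "u", "score": 0},
    {"tag": "ou", "score": 0},
    {"tag": "u", "score": 0},
    {"tag": "u", "score": 1},
]
runs = [r["run"] for r in add_run_index(records)]
```

Here the two trailing "u" records share one run value while the earlier "u" gets its own, so grouping on the day plus the run value, rather than on the tag alone, would keep the two "u" sequences apart.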
Following the recommendation of @Ale, and without any tips on how to do that in MongoDB, I switched back to a Python implementation and solved the cursor.NotFound problem.
I imagine that it could be done inside MongoDB, but this works:
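import itertools

# db is assumed to be an existing pymongo Database handle.
for r in db.coll.find():
    session = []
    # groupby only merges consecutive records, so the natural order is kept.
    for tag, time_score in itertools.groupby(r["records"], key=lambda x: x["tag"]):
        time_score = list(time_score)
        session.append({
            "tag": tag,
            "start": time_score[0]["unixTime"],
            "end": time_score[-1]["unixTime"],
            "ca": sum(n["score"] for n in time_score),
            "nb_records": len(time_score),
        })
    # Replace the raw records with the grouped sessions
    # (update_one replaces the deprecated update; the original wrote db.col here).
    db.coll.update_one(
        {"_id": r["_id"]},
        {
            "$unset": {"records": ""},
            "$set": {"sessions": session},
        },
    )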

How do I get French text FEMMES.COM to index as language variants of FEMMES

I need FEMMES.COM to get tokenized as singular + plural forms of the base word FEMME.
Custom Analyzer Config
"analyzers": [ { "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer", "name": "text_language_search_custom_analyzer", "tokenizer": "text_language_search_custom_analyzer_ms_tokenizer", "tokenFilters": [ "lowercase", "asciifolding" ], "charFilters": [ "html_strip" ] } ], "tokenizers": [ { "@odata.type": "#Microsoft.Azure.Search.MicrosoftLanguageStemmingTokenizer", "name": "text_language_search_custom_analyzer_ms_tokenizer", "maxTokenLength": 300, "isSearchTokenizer": false, "language": "english" } ], "tokenFilters": [], "charFilters": []}
Analyze API call for FEMMES
{ "analyzer": "text_language_search_custom_analyzer", "text": "FEMMES" }
Analyze API response for FEMMES
{ "@odata.context": "https://one-adscope-search-eu-stage.search.windows.net/$metadata#Microsoft.Azure.Search.V2016_09_01.AnalyzeResult", "tokens": [ { "token": "femme", "startOffset": 0, "endOffset": 6, "position": 0 }, { "token": "femmes", "startOffset": 0, "endOffset": 6, "position": 0 } ] }
Analyze API response for FEMMES.COM
{ "@odata.context": "https://one-adscope-search-eu-stage.search.windows.net/$metadata#Microsoft.Azure.Search.V2016_09_01.AnalyzeResult", "tokens": [ { "token": "femmes", "startOffset": 0, "endOffset": 6, "position": 0 }, { "token": "com", "startOffset": 7, "endOffset": 10, "position": 1 } ] }
Analyze API response for FEMMES COM
{ "@odata.context": "https://one-adscope-search-eu-stage.search.windows.net/$metadata#Microsoft.Azure.Search.V2016_09_01.AnalyzeResult", "tokens": [ { "token": "femme", "startOffset": 0, "endOffset": 6, "position": 0 }, { "token": "femmes", "startOffset": 0, "endOffset": 6, "position": 0 }, { "token": "com", "startOffset": 7, "endOffset": 10, "position": 1 } ]}
I think I figured this one out myself after some experimentation. I found the MappingCharFilter could be used to replace . with , before the indexer did the tokenization. This allowed the lemmatization/stemming to work as expected on the terms in question. I need to do more thorough integration tests with our other use cases, but I think this would solve the problem for anybody facing the same type of issue.
My previous answer was not correct. The Azure Search implementation actually applies the language tokenizer BEFORE token filters, which made the WordDelimiter token filter useless in my use case.
What I ended up doing was pre-processing the data BEFORE uploading it to Azure for indexing. In my C# code, I added some regex logic that breaks apart text like FEMMES2017 into FEMMES 2017 before it is sent to Azure. This way, when the text gets to Azure, the indexer sees FEMMES by itself and properly tokenizes it as FEMME and FEMMES using the language tokenizer.
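The pre-processing step might look roughly like the following. This is a Python sketch of the idea rather than the C# code mentioned above, which is not shown; the function name and both regular expressions are assumptions. It also splits on a dot between two letters so that the FEMMES.COM case from the question is covered.

```python
import re

# Zero-width boundary between a letter and a digit, in either order.
# [^\W\d_] matches any Unicode letter.
_LETTER_DIGIT_BOUNDARY = re.compile(r"(?<=[^\W\d_])(?=\d)|(?<=\d)(?=[^\W\d_])")
# A dot with a letter on both sides, as in FEMMES.COM (decimals such as 3.14 are left alone).
_DOT_BETWEEN_LETTERS = re.compile(r"(?<=[^\W\d_])\.(?=[^\W\d_])")


def split_for_indexing(text):
    """Insert spaces so the language tokenizer sees each word on its own."""
    text = _DOT_BETWEEN_LETTERS.sub(" ", text)
    return _LETTER_DIGIT_BOUNDARY.sub(" ", text)


print(split_for_indexing("FEMMES2017"))  # FEMMES 2017
print(split_for_indexing("FEMMES.COM"))  # FEMMES COM
```

Whether splitting on dots is desirable depends on the data: it would also break apart domain names or abbreviations that should stay searchable as a single term.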

Solr removing the 'e' from ace001 search term

Solr is removing the letter 'e' from search queries.
I'm pretty new to Solr, so I don't really know where to start looking, but whenever I send a search query Solr strips out the 'e' character.
As you can see here when I search for the term ace001:
{
"responseHeader": {
"status": 0,
"QTime": 1,
"params": {
"q": "_text:ace001",
"indent": "true",
"wt": "json",
"debugQuery": "true",
"_": "1478467316690"
}
},
"response": {
"numFound": 0,
"start": 0,
"docs": []
},
"debug": {
"rawquerystring": "_text:ace001",
"querystring": "_text:ace001",
"parsedquery": "PhraseQuery(_text:\"ac 001 ac 001\")",
"parsedquery_toString": "_text:\"ac 001 ac 001\"",
"explain": {},
"QParser": "LuceneQParser",
"timing": {
"time": 1,
"prepare": {
"time": 1,
"query": {
"time": 1
},
"facet": {
"time": 0
},
"mlt": {
"time": 0
},
"highlight": {
"time": 0
},
"stats": {
"time": 0
},
"spellcheck": {
"time": 0
},
"debug": {
"time": 0
}
},
"process": {
"time": 0,
"query": {
"time": 0
},
"facet": {
"time": 0
},
"mlt": {
"time": 0
},
"highlight": {
"time": 0
},
"stats": {
"time": 0
},
"spellcheck": {
"time": 0
},
"debug": {
"time": 0
}
}
}
}
}
Searching a different term such as 'acb001' doesn't strip the 'b' but I noticed it does separate the numbers from the letters. I'd want Solr to match the term 'acb001' in the text field...
extract:
"rawquerystring": "_text:acb001",
"querystring": "_text:acb001",
"parsedquery": "PhraseQuery(_text:\"acb 001 acb 001\")",
"parsedquery_toString": "_text:\"acb 001 acb 001\"",
"explain": {},
"QParser": "LuceneQParser",
I would really appreciate some direction on how I can either debug this further or, ideally, fix it so that ace001 returns all the occurrences of just that term.
Edit:
Schema is standard/default http://pastebin.com/59LbmJUp
This is happening because of solr.PorterStemFilterFactory. Your default search field is htmltext, which has
<filter class="solr.PorterStemFilterFactory"/>
in its query analysis chain.
The Porter stemmer stems the word "ace" to "ac".
You can check this at https://tartarus.org/martin/PorterStemmer/voc.txt
by searching for the word "ace".
Then look at the same position in https://tartarus.org/martin/PorterStemmer/output.txt, which lists the corresponding output after stemming; there it is "ac".
To solve this, remove the filter from both the query and the index analyzers in schema.xml (analysis chains are defined in the schema, not in solrconfig.xml), then re-index.
You are also using WordDelimiterFilterFactory, which splits words on letter/digit boundaries. That is why you see "ac" and "001". If you do not want that, remove that filter from schema.xml too.
You are using the default schema.xml, which has a lot of filters you might not need. I would suggest stripping it down to a few filters and then adding filters as you need them, rather than the other way around.
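To debug further, the analysis of a term can also be inspected filter by filter, either on the Analysis screen of the admin UI or through the field analysis request handler. The Python sketch below only builds such a URL. It assumes the core exposes the handler at /analysis/field; the base URL, the core name mycore and the field type name text_general are placeholders, not values from the question.

```python
from urllib.parse import urlencode


def field_analysis_url(base_url, core, field_type, value):
    """Build a URL asking Solr how a field type analyses a value."""
    params = {
        "analysis.fieldtype": field_type,
        "analysis.fieldvalue": value,  # run through the index-time chain
        "analysis.query": value,       # run through the query-time chain
        "wt": "json",
    }
    return f"{base_url}/{core}/analysis/field?{urlencode(params)}"


# Placeholder base URL, core and field type; substitute your own.
url = field_analysis_url("http://localhost:8983/solr", "mycore", "text_general", "ace001")
print(url)
```

The response lists the tokens after each tokenizer and filter, which should show the step at which "ace001" becomes "ac" and "001".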
