Apache Solr - number of results found for the field

Any idea how to count phrase frequency per field, similar to the termfreq function, with ascending or descending sorting, and how to get the number of matches found for each field?
Data:
ID / name / surname
/ Test / Test Info
/ Test Info / test
Query:
http://localhost/solr/select?q=name:test info OR surname:test info
I would have expected an answer:
{
"responseHeader": {
"zkConnected": true,
"status": 0,
"QTime": 0,
"params": {
"q": "*.*",
"fl": "id"
}
},
"response": {
"numFound": 2,
"start": 0,
"docs": [
{
"id": "1",
"counterInName": 1,
"counterInSubname": 0
},
{
"id": "2",
"counterInName": 0,
"counterInSubname": 1
}
]
}
}

Related

No results displayed in Solr Search if the search term is longer than 15 characters

While working with Solr Search, I encountered an issue.
When the search query is a single word longer than 15 characters, we get no results.
But if the same query is shortened to 15 characters or fewer, we get results.
How can we increase the character limit so that results are returned in both cases?
Case 1: Greater than 15 Characters
curl -XGET 'http://localhost:8983/solr/techproducts/query?debug=query&q=katanasmartwatch'
Result:
{
"responseHeader": {
"status": 0,
"QTime": 0,
"params": {
"q": "katanasmartwatch",
"debug": "query"
}
},
"response": {
"numFound": 0,
"start": 0,
"docs": []
}
}
Case 2: 15 Characters or Fewer
curl -XGET 'http://localhost:8983/solr/techproducts/query?debug=query&q=katanasmartwatc'
Result:
{
"responseHeader": {
"status": 0,
"QTime": 0,
"params": {
"q": "katanasmartwatc",
"debug": "query"
}
},
"response": {
"numFound": 1,
"start": 0,
"docs": [{...}]
}
}

How to update array dict elements in MongoDB based on another field

How can I update a value in a document based on applying functions to another field (which is in a different embedded document)?
With the sample data below, I want to
get the col field for the farm having id 12
multiply that by 0.025
add the current value of the statistic.crypt field
ensure the value is a double by converting it with $toDouble
store the result back into statistic.crypt
data:
{
"_id": {
"$oid": "6128c238c144326c57444227"
},
"statistic": {
"balance": 112570,
"diamond": 14,
"exp": 862.5,
"lvl": 76,
"mn_exp": 2.5,
"lvl_mn_exp": 15,
"coll_ms": 8047,
"all_exp": 67057.8,
"rating": 0,
"crypt": 0
},
"inventory": {
"farm": [{
"id": 12,
"col": 100,
"currency": "diamond",
"cost": 2,
"date": "2021-09-02 18:58:39"
}, {
"id": 14,
"col": 1,
"currency": "diamond",
"cost": 2,
"date": "2021-09-02 16:57:08"
}],
"items": []
},
...
}
My initial attempt is:
self.collection
.update_many({"inventory.farm.id": 12}, [{
"$set": {
"test": {
'$toDouble': {
"$sum": [
{'$multiply':["$inventory.farm.$[].col", 0.025]},
'$test'
]
}
} }
},])
This does not work as it applies to test rather than statistic.crypt, and I cannot figure out how to modify it to apply to statistic.crypt.
A field can be updated based on another in the following stages:
add a field containing the farm
set statistic.crypt to the result of the mathematical expression (applied to the newly embedded farm)
remove extra fields
In code:
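self.collection.update_many({"inventory.farm.id": 12}, [
    # Stage 1: copy the farm entries with id 12 into a temporary field "hh".
    {
        "$addFields": {
            "hh": {
                "$filter": {
                    "input": "$inventory.farm",
                    "as": "z",
                    "cond": {"$eq": ["$$z.id", 12]},
                },
            },
        },
    },
    # Stage 2: statistic.crypt = toDouble(col * 0.025 + statistic.crypt).
    {
        "$set": {
            "statistic.crypt": {
                "$toDouble": {
                    "$sum": [
                        {"$multiply": [{"$first": "$hh.col"}, 0.025]},
                        # The "$" prefix makes this a field reference, not a string literal.
                        "$statistic.crypt",
                    ],
                },
            },
        },
    },
    # Stage 3: keep only the original fields, which drops the temporary "hh".
    {
        "$project": {
            "id_pr": 1,
            "id_server": 1,
            "role": 1,
            "warns": 1,
            "id_clan": 1,
            "statistic": 1,
            "design": 1,
            "date": 1,
            "inventory": 1,
            "voice": 1,
        },
    },
])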
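To sanity-check the arithmetic the pipeline is meant to perform, here is a minimal pure-Python sketch of the same steps applied to a trimmed copy of the sample document. The function name updated_crypt and its parameters are hypothetical, and it does not talk to MongoDB at all; it only mirrors the filter, multiply, sum and to-double logic.

```python
def updated_crypt(doc, farm_id=12, rate=0.025):
    """Mirror the pipeline: pick the farm by id, then col * rate + crypt."""
    # Equivalent of the $filter stage: keep only farms with the wanted id.
    matching = [farm for farm in doc["inventory"]["farm"] if farm["id"] == farm_id]
    if not matching:
        # Assumption: with no matching farm, the value is left unchanged.
        return float(doc["statistic"]["crypt"])
    # Equivalent of $first, $multiply, $sum and $toDouble.
    return float(matching[0]["col"] * rate + doc["statistic"]["crypt"])


# Trimmed copy of the sample document from the question.
sample = {
    "statistic": {"crypt": 0},
    "inventory": {
        "farm": [
            {"id": 12, "col": 100},
            {"id": 14, "col": 1},
        ]
    },
}

print(updated_crypt(sample))  # 2.5 for the sample data
```

For the sample data, farm 12 has col 100 and crypt is 0, so the stored value should become 2.5.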

How do I get French text FEMMES.COM to index as language variants of FEMMES

I need FEMMES.COM to get tokenized as singular + plural forms of the base word FEMME.
Custom Analyzer Config
"analyzers": [
  {
    "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
    "name": "text_language_search_custom_analyzer",
    "tokenizer": "text_language_search_custom_analyzer_ms_tokenizer",
    "tokenFilters": [ "lowercase", "asciifolding" ],
    "charFilters": [ "html_strip" ]
  }
],
"tokenizers": [
  {
    "@odata.type": "#Microsoft.Azure.Search.MicrosoftLanguageStemmingTokenizer",
    "name": "text_language_search_custom_analyzer_ms_tokenizer",
    "maxTokenLength": 300,
    "isSearchTokenizer": false,
    "language": "english"
  }
],
"tokenFilters": [],
"charFilters": []}
Analyze API call for FEMMES
{ "analyzer": "text_language_search_custom_analyzer", "text": "FEMMES" }
Analyze API response for FEMMES
{
  "@odata.context": "https://one-adscope-search-eu-stage.search.windows.net/$metadata#Microsoft.Azure.Search.V2016_09_01.AnalyzeResult",
  "tokens": [
    { "token": "femme", "startOffset": 0, "endOffset": 6, "position": 0 },
    { "token": "femmes", "startOffset": 0, "endOffset": 6, "position": 0 }
  ]
}
Analyze API response for FEMMES.COM
{
  "@odata.context": "https://one-adscope-search-eu-stage.search.windows.net/$metadata#Microsoft.Azure.Search.V2016_09_01.AnalyzeResult",
  "tokens": [
    { "token": "femmes", "startOffset": 0, "endOffset": 6, "position": 0 },
    { "token": "com", "startOffset": 7, "endOffset": 10, "position": 1 }
  ]
}
Analyze API response for FEMMES COM
{
  "@odata.context": "https://one-adscope-search-eu-stage.search.windows.net/$metadata#Microsoft.Azure.Search.V2016_09_01.AnalyzeResult",
  "tokens": [
    { "token": "femme", "startOffset": 0, "endOffset": 6, "position": 0 },
    { "token": "femmes", "startOffset": 0, "endOffset": 6, "position": 0 },
    { "token": "com", "startOffset": 7, "endOffset": 10, "position": 1 }
  ]
}
I think I figured this one out myself after some experimentation. I found the MappingCharFilter could be used to replace . with , before the indexer did the tokenization. This allowed the lemmatization/stemming to work as expected on the terms in question. I need to do more thorough integration tests with our other use cases, but I think this would solve the problem for anybody facing the same type of issue.
My previous answer was not correct. The Azure Search implementation actually applies the language tokenizer BEFORE token filters. This essentially made the WordDelimiterToken filter useless in my use case.
What I ended up having to do was to pre-process the data BEFORE uploading it to Azure for indexing. In my C# code, I added some regex logic that breaks apart text like FEMMES2017 into FEMMES 2017 before it is sent to Azure. This way, when the text gets to Azure, the indexer sees FEMMES by itself and properly tokenizes it as FEMME and FEMMES using the language tokenizer.
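The answer's pre-processing was written in C# and is not shown, so the following is only a Python sketch of the same idea, not the original code. The function name split_for_indexing and both patterns are assumptions: one inserts a space at every letter/digit boundary (FEMMES2017), and the other replaces a dot between two letters with a space (FEMMES.COM), which is one possible reading of the character mapping mentioned in the first answer.

```python
import re

# A "letter" here is any Unicode word character that is not a digit or underscore.
_LETTER = r"[^\W\d_]"

# Zero-width match at each letter->digit or digit->letter boundary.
_LETTER_DIGIT_BOUNDARY = re.compile(rf"(?<={_LETTER})(?=\d)|(?<=\d)(?={_LETTER})")
# A dot that sits directly between two letters.
_DOT_BETWEEN_LETTERS = re.compile(rf"(?<={_LETTER})\.(?={_LETTER})")


def split_for_indexing(text: str) -> str:
    """Separate glued tokens so a language tokenizer sees each word on its own."""
    text = _DOT_BETWEEN_LETTERS.sub(" ", text)
    return _LETTER_DIGIT_BOUNDARY.sub(" ", text)


print(split_for_indexing("FEMMES2017"))  # FEMMES 2017
print(split_for_indexing("FEMMES.COM"))  # FEMMES COM
```

Restricting the dot rule to letters on both sides is a deliberate choice in this sketch so that decimal numbers such as 3.14 are left alone.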

Elasticsearch - value in array filter

I want to select all documents that contain a specific value in an array field, i.e. the value is an element of that array field.
To be specific, I want to select all documents whose names field contains test-name; see the example below.
So when I do an empty search with
curl -XGET localhost:9200/test-index/_search
the result is
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 50,
"max_score": 1,
"hits": [
{
"_index": "test-index",
"_type": "test",
"_id": "34873ae4-f394-42ec-b2fc-41736e053c69",
"_score": 1,
"_source": {
"names": [
"test-name"
],
"age": 100,
...
}
},
...
}
}
But in case of a more specific query
curl -XPOST localhost:9200/test-index/_search -d '{
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"term": {
"names": "test-name"
}
}
}
}
}'
I don't get any results
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
There are some questions similar to this one, but I cannot get any of the answers to work for me.
System specs: Elasticsearch 5.1.1, Ubuntu 16.04
EDIT
curl -XGET localhost:9200/test-index
...
"names": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
...
That's because the names field is analyzed, so test-name gets indexed as two tokens, test and name.
Searching for the exact term test-name will hence not yield anything. If you use match instead, you'll get the document.
If you want to check for the exact value test-name (i.e. the two tokens one after another), then you need to change your names field to a keyword type instead of text.
UPDATE
According to your mapping, the names field is analyzed, so you need to query the names.keyword sub-field instead, like this:
curl -XPOST localhost:9200/test-index/_search -d '{
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"term": {
"names.keyword": "test-name"
}
}
}
}
}'
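The difference between the analyzed names field and the names.keyword sub-field can be illustrated with a small simulation. This is only a rough sketch: analyze_text is a hypothetical stand-in that lowercases and splits on non-alphanumeric characters, which is far simpler than Elasticsearch's real analyzer, and term_matches merely imitates the exact comparison a term query performs.

```python
import re
from typing import List


def analyze_text(value: str) -> List[str]:
    """Rough stand-in for a text analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", value.lower())


def term_matches(indexed_terms: List[str], term: str) -> bool:
    """A term query compares the search term with the indexed terms as-is."""
    return term in indexed_terms


names = ["test-name"]

# "names" (type text): every value is analyzed into separate tokens.
text_terms = [token for value in names for token in analyze_text(value)]
# "names.keyword" (type keyword): every value is indexed unchanged.
keyword_terms = list(names)

print(text_terms)                                # ['test', 'name']
print(term_matches(text_terms, "test-name"))     # False
print(term_matches(keyword_terms, "test-name"))  # True
```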

solr query returning doclist of length 1, despite numfound being greater than 1

When querying Solr with a group-by field, I get a response with "numFound" greater than 1, yet the "docs" attribute only shows 1 record.
The query is something like:
http://.../solr/.../select?q=*%3A*&fq=...&wt=json&indent=true&group=true&group.field=GroupingField_s&group.ngroups=true
The results are something like:
"grouped": {
  "GroupingField_s": {
    "matches": 3130,
    "ngroups": 283,
    "groups": [
      {
        "groupValue": "1111",
        "doclist": {
          "numFound": 7,
          "start": 0,
          "docs": [ {/*only 1 record shown here*/} ]
        }
      },
      {
        "groupValue": "222",
        "doclist": {
          "numFound": 5,
          "start": 0,
          "docs": [ {/*only 1 record shown here*/} ]
        }
      }, ....
    ]
  }
}
You'll have to set the group.limit parameter. This defaults to 1.
group.limit (integer): Specifies the number of results to return for each group. The default value is 1.
See Result Grouping.
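As a minimal sketch of how the request could be built with group.limit set, the snippet below assembles the query string with the standard library. The helper name grouped_query_url, the host, the core name mycore and the limit of 10 are all placeholders, and the fq filter elided in the question is left out.

```python
from urllib.parse import urlencode


def grouped_query_url(base_url: str, group_field: str, group_limit: int) -> str:
    """Build a grouped Solr select URL that returns up to group_limit docs per group."""
    params = {
        "q": "*:*",
        "wt": "json",
        "indent": "true",
        "group": "true",
        "group.field": group_field,
        "group.ngroups": "true",
        "group.limit": group_limit,  # default is 1, hence the single doc per group
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical host and core name, for illustration only.
url = grouped_query_url("http://localhost:8983/solr/mycore/select", "GroupingField_s", 10)
print(url)
```

With a higher group.limit, each doclist should carry up to that many documents while numFound still reports the full size of the group.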
