Finding a document with a specific term in an array

I'm trying to find a document containing a specific term in an array of strings.
I have a schema like this:
{
"pages": {
"mappings":{
"site":{
"properties":{
"urls":{"type":"string"}
}
}
}
}
}
And the following data indexed in it:
% curl -XPOST 'http://local.dev:9200/pages/site/_search?pretty'
{
...
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "pages",
"_type" : "site",
"_id" : "ae634fea-878f-42ca-8239-c67cca007a38",
"_score" : 1.0,
"_source":{ "urls":["https://github.com/fulano","http://fulano.com"] }
} ]
}
}
I'm trying to search for sites whose urls array contains a specific URL, but I can't make it work. I tried a term filter, exactly as described here, but I never get any results:
% curl -XPOST 'http://local.dev:9200/pages/site/_search?pretty' -d '
{
"query": {
"filtered": {
"filter": {
"term": { "urls": "https://github.com/fulano" }
}
}
}
}'
{
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
Using a terms query (which Elasticsearch expands into a series of bool operations):
% curl -XPOST 'http://local.dev:9200/pages/site/_search?pretty' -d '
{
"query": {
"terms" : {
"urls" : ["https://github.com/fulano"]
}
}
}'
{
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
I'm guessing this is something really silly, but I can't spot the problem. :(

This is a problem with the analyzer you are using. With the default standard analyzer, "https://github.com/fulano" is split at index time into smaller tokens (such as https, github.com and fulano), so the exact term "https://github.com/fulano" never exists in the index and a term filter can never match it. You need to map the field as not_analyzed, or use the keyword tokenizer, as outlined here.
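A minimal sketch of such a mapping, using the same 1.x-era syntax as the question (the index has to be dropped, recreated, and the data reindexed for this to take effect):
% curl -XPUT 'http://local.dev:9200/pages' -d '
{
"mappings": {
"site": {
"properties": {
"urls": { "type": "string", "index": "not_analyzed" }
}
}
}
}'
Once the document is reindexed, each URL is stored as a single term, and the original term filter should return it.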

Related

Elasticsearch: how to use exact search and ignore special characters in the search keyword?

I have ID values (combinations of numbers and text) in my Elasticsearch index, and in my program users may type special characters into the search keyword.
I want to know whether there is any way to make Elasticsearch do an exact search while also ignoring some special characters in the search keyword.
I already use a custom analyzer to split the search keyword on those special characters, and I search with a match query, but I still get no results.
data
{
"_index": "testdata",
"_type": "_doc",
"_id": "11112222",
"_source": {
"testid": "1MK444750"
}
}
custom analyzer
"analysis" : {
"analyzer" : {
"testidanalyzer" : {
"pattern" : """([^\w\d]+|_)""",
"type" : "pattern"
}
}
}
mapping
{
"article" : {
"mappings" : {
"_doc" : {
"properties" : {
"testid" : {
"type" : "text",
"analyzer" : "testidanalyzer"
}
}
}
}
}
}
Here's my Elasticsearch query:
GET /testdata/_search
{
"query": {
"match": {
// "testid": "1MK_444-750" // no result
"testid": "1MK444750"
}
}
}
The analyzer successfully separated my keyword, but I still can't match anything:
POST /testdata/_analyze
{
"analyzer": "testidanalyzer",
"text": "1MK_444-750"
}
{
"tokens" : [
{
"token" : "1mk",
"start_offset" : 0,
"end_offset" : 3,
"type" : "word",
"position" : 0
},
{
"token" : "444",
"start_offset" : 4,
"end_offset" : 7,
"type" : "word",
"position" : 1
},
{
"token" : "750",
"start_offset" : 8,
"end_offset" : 11,
"type" : "word",
"position" : 2
}
]
}
Please help, thanks in advance!
First off, you should probably model the testid field as keyword rather than text; it's the more appropriate data type for exact matching.
You want to put in a feature whereby some characters (_, -) are effectively ignored at search time. You can achieve this by giving your field a normalizer, which tells Elasticsearch how to preprocess data for this field prior to indexing or searching. Specifically, you can declare a mapping char filter in your normalizer that replaces these characters with an empty string.
This is how all these changes would fit into your mapping:
PUT /testdata
{
"settings": {
"analysis": {
"char_filter": {
"mycharfilter": {
"type": "mapping",
"mappings": [
"_ => ",
"- => "
]
}
},
"normalizer": {
"mynormalizer": {
"type": "custom",
"char_filter": [
"mycharfilter"
]
}
}
}
},
"mappings": {
"_doc": {
"properties": {
"testid" : {
"type" : "keyword",
"normalizer" : "mynormalizer"
}
}
}
}
}
The following searches would then produce the same results:
GET /testdata/_search
{
"query": {
"match": {
"testid": "1MK444750"
}
}
}
GET /testdata/_search
{
"query": {
"match": {
"testid": "1MK_444-750"
}
}
}
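If you want to check what the normalizer produces, the _analyze API also accepts a normalizer parameter on versions that support normalizers:
POST /testdata/_analyze
{
"normalizer": "mynormalizer",
"text": "1MK_444-750"
}
Both 1MK_444-750 and 1MK444750 should come back as the single token 1MK444750, which is why the two searches above match the same document.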

Return documents with an array field that contains ALL elements from a user array in Elasticsearch 6.x

All my documents have a field, tags, of type array. I want to search and return all the documents whose tags contain every element of a user-input array. The number of elements is variable, not a fixed size.
Examples:
tags:["python", "flask", "gunicorn"]
input:["python"]
This would return true because all the elements in input are in tags.
tags:["nginx", "pm2"]
input:["nodejs", "nginx", "pm2", "microservice"]
This would return false because "nodejs" and "microservice" are not in tags.
I looked into terms query but I do not think it works for arrays.
I also found this, Elasticsearch array property must contain given array items, but the solution is for old versions of Elasticsearch and the syntax has changed.
I believe you're looking for a terms_set - reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-terms-set-query.html
PUT tags
POST tags/_doc
{
"tags": ["python", "flask", "gunicorn"]
}
POST tags/_doc
{
"tags": ["nginx", "pm2"]
}
GET tags/_search
{
"query": {
"terms_set": {
"tags": {
"terms": ["nginx", "pm2"],
"minimum_should_match_script": {
"source": "params.num_terms"
}
}
}
}
}
Setting minimum_should_match_script to params.num_terms requires a document to match every term supplied in the query, since num_terms is the number of supplied terms. Returned:
"hits" : {
"total" : 1,
"max_score" : 0.5753642,
"hits" : [
{
"_index" : "tags",
"_type" : "_doc",
"_id" : "XZqN_mkB94Kxh8PwtQs_",
"_score" : 0.5753642,
"_source" : {
"tags" : [
"nginx",
"pm2"
]
}
}
]
}
Querying the full list in your example:
GET tags/_search
{
"query": {
"terms_set": {
"tags": {
"terms": ["nodejs", "nginx", "pm2", "microservice"],
"minimum_should_match_script": {
"source": "params.num_terms"
}
}
}
}
}
Yields no results, as expected:
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
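For contrast, a plain terms query has any-match semantics, so on its own it cannot express the "contains all" requirement; the following should still return the nginx/pm2 document even though "nodejs" and "microservice" are missing from its tags:
GET tags/_search
{
"query": {
"terms": {
"tags": ["nodejs", "nginx", "pm2", "microservice"]
}
}
}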

Elasticsearch aggregation using top_hits field with script ordering

I have a set of documents with src, txt and flt fields. I want to query by txt field in the following way:
Group (bucketize) by src;
In each bucket calculate top 1 most relevant document;
Order each bucket by the _score * doc.flt value.
So far I have implemented 1 and 2, but not 3. Even if 3 may not be very efficient, I still want to have the option. My query looks like:
{
"query" : {
"match" : {
"text" : {
"query" : <some text>,
"fuzziness" : "AUTO",
"operator" : "and"
}
}
},
"aggs": {
"by_src": {
"terms": {
"field": "src",
"size" : 10,
"order" : {"top_score" : "desc"}
},
"aggs": {
"top_hits" : {
"top_hits" : {
"sort": { "_score": { "order": "desc" } },
"size" : 1
}
},
"top_score": {
"max" : {
"script" : "_score"
}
}
}
}
}
}
I believe it's failing because you don't need to reference the _source field to apply the sort within each bucket; just apply the sort by field name:
{
"query" : {
"match" : {
"text" : {
"query" : <some text>,
"fuzziness" : "AUTO",
"operator" : "and"
}
}
},
"aggs": {
"by_src": {
"terms": {
"field": "src",
"size" : 10,
"order" : {"top_score" : "desc"}
},
"aggs": {
"top_hits" : {
"top_hits" : {
"sort": [{
"flt": {"order": "desc"}
}],
"size" : 1
}
},
"top_score": {
"max" : {
"script" : "_score"
}
}
}
}
}
}
I am assuming your document has a field called flt that you want to sort on. Naturally you can also change the sort to asc if that's what you need.
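If you do want point 3, ordering buckets by _score * flt, one possible sketch is to compute that product in the max sub-aggregation's script, assuming flt is a numeric field with doc values (Painless syntax shown; older scripting modules take the script as a plain string):
"top_score": {
"max" : {
"script" : {
"source" : "_score * doc['flt'].value"
}
}
}
The terms aggregation's order on top_score then ranks buckets by the best combined value, while the top_hits inside each bucket keeps its own sort.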

How to use "wildcard" or "regexp" in terms query for nested objects/arrays

I am trying to search documents using a terms filter. I have an array of objects, each of which has a string field and an array field. For example:
{
"shop" : {
"name" : "bay avenue store",
"brands": [
{
"name" : "coca-cola",
"items" : ["diet coke", "fanta", "coke-zero"]
},
{
"name" : "pepsi",
"items" : ["extra zero", "mountain dew"]
}
]
}
}
How do I use a wildcard inside "items"?
I am trying something like:
{
"query": {
"nested" : {
"path" : "brands",
"query" : {
"match" : {
{"brands.items": ["*zero"]}
}
}
}
}
}
Is this possible?
Please suggest a solution.
Never mind, found the solution after a few hits and trials.
Here goes:
"query": {
"nested": {
"path":"brands ",
"query":{
"wildcard":{
"brands.items":{
"value":"*zero*"
}
}
}
}
}
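Note that a nested query only works if brands is mapped as a nested type. A minimal sketch of such a mapping, with a hypothetical index name and 7.x-style typeless syntax (adjust for older versions), keeping items as keyword so the wildcard is evaluated against whole values like coke-zero:
PUT /shops
{
"mappings": {
"properties": {
"name": { "type": "text" },
"brands": {
"type": "nested",
"properties": {
"name": { "type": "keyword" },
"items": { "type": "keyword" }
}
}
}
}
}
Also be aware that patterns with a leading *, such as *zero*, are expensive, because they cannot make efficient use of the term dictionary.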

Join elasticsearch indices while matching fields in nested/inner objects

I am trying to join 2 Elasticsearch indices by using a terms filter lookup. I referred to http://www.elasticsearch.org/blog/terms-filter-lookup/ and http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-terms-filter.html. These examples look up an array of values, like "followers" : ["1", "3"], and the join works fine for data shaped that way.
My requirement is to join with a field inside an array of objects. When I extend the above example to include an array of objects, my query fails.
Following is the sample data:
PUT /users/user/2
{
"followers" : [
{
"userId":"1",
"username":"abc",
"location":"xyz"
},
{
"userId":"3",
"username":"def",
"location":"xyz"
}
]
}
PUT /tweets/tweet/1
{
"user" : "2"
}
PUT /tweets/tweet/2
{
"user" : "1"
}
I am now trying to find tweets that are created by followers of user 2:
POST /tweets/_search
{
"query" : {
"filtered" : {
"filter" : {
"terms" : {
"user" : {
"index" : "users",
"type" : "user",
"id" : "2",
"path" : "followers.userId"
},
"_cache_key" : "user_2_friends"
}
}
}
}
}
My search returns 0 results for the above query. I tried two other approaches as well: 1) declaring the followers object as a nested object in the mapping and using a nested query, and 2) adding a match query for followers.userId after giving the path as "followers". Neither yielded results.
Does the terms filter lookup support arrays of objects? Any pointers to solving my problem would be a great help.
What you're trying to do worked for me, unless I'm missing something. What version of Elasticsearch are you using? I'm using 1.3.4.
So I created both indices and added the docs you have listed:
curl -XPUT "http://localhost:9200/users"
curl -XPUT "http://localhost:9200/users/user/2 " -d '
{
"followers" : [
{
"userId":"1",
"username":"abc",
"location":"xyz"
},
{
"userId":"3",
"username":"def",
"location":"xyz"
}
]
}'
curl -XPUT "http://localhost:9200/tweets"
curl -XPUT "http://localhost:9200/tweets/tweet/1 " -d'
{
"user" : "2"
}'
curl -XPUT "http://localhost:9200/tweets/tweet/2 " -d'
{
"user" : "1"
}'
then ran your search query:
curl -XPOST "http://localhost:9200/tweets/_search " -d'
{
"query": {
"filtered": {
"filter": {
"terms": {
"user": {
"index": "users",
"type": "user",
"id": "2",
"path": "followers.userId"
},
"_cache_key": "user_2_friends"
}
}
}
}
}'
and got back this result:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "tweets",
"_type": "tweet",
"_id": "2",
"_score": 1,
"_source": {
"user": "1"
}
}
]
}
}
Here is the code I used:
http://sense.qbox.io/gist/4a2a2d77d0b6f4502ff6c5022b268acfa65ee6d2
Clear the indices if you have any
curl -XDELETE "http://example.com:9200/currencylookup/"
curl -XDELETE "http://example.com:9200/currency/"
Create the lookup table
curl -XPUT http://example.com:9200/currencylookup/type/2 -d '
{ "conv" : [
{ "currency":"usd","username":"abc", "location":"USA" },
{ "currency":"inr", "username":"def", "location":"India" },
{ "currency":"IDR", "username":"def", "location":"Indonesia" }]
}'
Let's put in some dummy docs
curl -XPUT "http://example.com:9200/currency/type/USA" -d '{ "amount":"100", "currency":"usd", "location":"USA" }'
curl -XPUT "http://example.com:9200/currency/type/JPY" -d '{ "amount":"50", "currency":"JPY", "location":"JAPAN" }'
curl -XPUT "http://example.com:9200/currency/type/INR" -d '{ "amount":"50", "currency":"inr", "location":"INDIA" }'
curl -XPUT "http://example.com:9200/currency/type/IDR" -d '{ "amount":"30", "currency" : "IDR", "location": "Indonesia" }'
Time to check the output
curl http://example.com:9200/currency/_search?pretty -d '{
"query" : {
"filtered" : {
"filter" : {
"terms" : {
"currency" : {
"index" : "currencylookup",
"type" : "type",
"id" : "2",
"path" : "conv.currency"
},
"_cache_key" : "currencyexchange"
}
}
}
}
}'
Results
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 1.0,
"hits" : [ {
"_index" : "currency",
"_type" : "type",
"_id" : "INR",
"_score" : 1.0,
"_source":{ "amount":"50", "currency":"inr", "location":"INDIA" }
}, {
"_index" : "currency",
"_type" : "type",
"_id" : "USA",
"_score" : 1.0,
"_source":{ "amount":"100", "currency":"usd", "location":"USA" }
} ]
}
}
Conclusion
Capital letters are the culprit here.
You can see that 'IDR' is uppercase, so it fails to match; 'JPY' is not in the lookup at all, and even if it were, it would not have matched because it is uppercase. The terms lookup takes the values from the lookup document verbatim, while the currency field is analyzed and therefore indexed in lowercase.
So the values being cross-matched must be lowercase letters or digits, e.g.:
abc
1abc
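One way around this, sketched in the same 1.x-era syntax used above, is to map the currency field as not_analyzed (before indexing the documents), so values are indexed verbatim and compared with the lookup values without lowercasing:
curl -XPUT "http://example.com:9200/currency" -d '
{
"mappings": {
"type": {
"properties": {
"currency": { "type": "string", "index": "not_analyzed" }
}
}
}
}'
With that mapping, "IDR" in the lookup document would match an indexed "IDR" value. Alternatively, simply keep the values lowercase on both sides.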
