Elasticsearch - value in array filter - arrays

I want to select all documents that contain a specific value in an array field, i.e. the value is an element of that array field.
To be specific, I want to select all documents whose names field contains test-name; see the example below.
So when I do an empty search with
curl -XGET localhost:9200/test-index/_search
the result is
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 50,
"max_score": 1,
"hits": [
{
"_index": "test-index",
"_type": "test",
"_id": "34873ae4-f394-42ec-b2fc-41736e053c69",
"_score": 1,
"_source": {
"names": [
"test-name"
],
"age": 100,
...
}
},
...
}
}
But in case of a more specific query
curl -XPOST localhost:9200/test-index/_search -d '{
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"term": {
"names": "test-name"
}
}
}
}
}'
I don't get any results
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
There are some questions similar to this one, but I cannot get any of their answers to work for me.
System specs: Elasticsearch 5.1.1, Ubuntu 16.04
EDIT
curl -XGET localhost:9200/test-index
...
"names": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
...

That's because the names field is analyzed, so test-name gets indexed as two tokens, test and name.
A term query for test-name will hence not yield anything, because the term itself is not analyzed and no such token exists in the index. If you use match instead, you'll get the document.
If you want to check for the exact value test-name (i.e. the whole string, unanalyzed), then you need to change your names field to the keyword type instead of text.
UPDATE
According to your mapping, the names field is analyzed (type text), but it has a names.keyword sub-field that is not. Use the names.keyword field in the term filter instead and it will work, like this:
curl -XPOST localhost:9200/test-index/_search -d '{
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"term": {
"names.keyword": "test-name"
}
}
}
}
}'
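To see why the term filter on names finds nothing while names.keyword works, here is a minimal Python sketch of the two kinds of index. The analyze function is only a rough stand-in for the standard analyzer (lowercase, split on non-alphanumeric characters), and the helper names and the two sample documents are made up for illustration; none of this is Elasticsearch code.

```python
import re


def analyze(text):
    """Rough stand-in for the standard analyzer (an assumption, not the real one)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_indexes(docs, field):
    """Build a token index (like a text field) and an exact-value index (like keyword)."""
    text_index, keyword_index = {}, {}
    for doc_id, doc in docs.items():
        for value in doc[field]:  # each array element is indexed separately
            keyword_index.setdefault(value, set()).add(doc_id)
            for token in analyze(value):
                text_index.setdefault(token, set()).add(doc_id)
    return text_index, keyword_index


def term_query(index, value):
    """Exact lookup: the search value is NOT analyzed."""
    return index.get(value, set())


def match_query(text_index, value):
    """The search value is analyzed first; any matching token counts as a hit."""
    hits = set()
    for token in analyze(value):
        hits |= text_index.get(token, set())
    return hits


# Hypothetical sample documents
docs = {"doc1": {"names": ["test-name"]}, "doc2": {"names": ["other"]}}
text_index, keyword_index = build_indexes(docs, "names")

on_text = term_query(text_index, "test-name")        # like term on names
on_keyword = term_query(keyword_index, "test-name")  # like term on names.keyword
on_match = match_query(text_index, "test-name")      # like match on names
```

In this sketch the lookup against text_index comes back empty because only the tokens test and name exist there, while the lookup against keyword_index and the analyzed match_query both find doc1.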

Related

How to Update Array dict Elements in mongodb based on another field

How can I update a value in a document based on applying functions to another field (which is in a different embedded document)?
With the sample data below, I want to
get the col field for the farm having id 12
multiply that by 0.025
add the current value of the statistic.crypt field
ensure the value is a double by converting it with $toDouble
store the result back into statistic.crypt
data:
{
"_id": {
"$oid": "6128c238c144326c57444227"
},
"statistic": {
"balance": 112570,
"diamond": 14,
"exp": 862.5,
"lvl": 76,
"mn_exp": 2.5,
"lvl_mn_exp": 15,
"coll_ms": 8047,
"all_exp": 67057.8,
"rating": 0,
"crypt": 0
},
"inventory": {
"farm": [{
"id": 12,
"col": 100,
"currency": "diamond",
"cost": 2,
"date": "2021-09-02 18:58:39"
}, {
"id": 14,
"col": 1,
"currency": "diamond",
"cost": 2,
"date": "2021-09-02 16:57:08"
}],
"items": []
},
...
}
My initial attempt is:
self.collection
.update_many({"inventory.farm.id": 12}, [{
"$set": {
"test": {
'$toDouble': {
"$sum": [
{'$multiply':["$inventory.farm.$[].col", 0.025]},
'$test'
]
}
} }
},])
This does not work as it applies to test rather than statistic.crypt, and I cannot figure out how to modify it to apply to statistic.crypt.
A field can be updated based on another in the following stages:
add a field containing the farm
set statistic.crypt to the result of the mathematical expression (applied to the newly embedded farm)
remove extra fields
In code:
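self.collection.update_many({"inventory.farm.id": 12}, [
{
# Stage 1: copy the farm entries with id 12 into a temporary field hh
"$addFields": {
"hh": {
"$filter": {
"input": "$inventory.farm",
"as": "z",
"cond": {"$eq": ["$$z.id", 12]},
},
},
},
},
{
# Stage 2: statistic.crypt = col * 0.025 + current statistic.crypt, as a double
"$set": {
"statistic.crypt": {
"$toDouble": {
"$sum": [
{
"$multiply": [{"$first": "$hh.col"}, 0.025],
},
# the "$" prefix makes this a field reference rather than a string literal
"$statistic.crypt",
],
},
},
},
},
{
# Stage 3: keep only the original fields, which drops the temporary hh
"$project": {
"id_pr": 1,
"id_server": 1,
"role": 1,
"warns": 1,
"id_clan": 1,
"statistic": 1,
"design": 1,
"date": 1,
"inventory": 1,
"voice": 1,
},
},
])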
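As a sanity check of what the pipeline is meant to compute, here is a small plain-Python sketch of the same arithmetic on an ordinary dict, using the values from the sample document. The function name updated_crypt is hypothetical, and returning the unchanged value when no farm matches is my own assumption; this is not pymongo code.

```python
def updated_crypt(doc, farm_id, factor=0.025):
    """Return col * factor + statistic.crypt for the first farm entry with farm_id."""
    # Same role as the $filter stage: keep only the farm entries with the wanted id.
    matching = [farm for farm in doc["inventory"]["farm"] if farm["id"] == farm_id]
    current = doc["statistic"]["crypt"]
    if not matching:
        # Assumption for this sketch: leave the value unchanged if nothing matches.
        return float(current)
    col = matching[0]["col"]  # same role as $first on the filtered array
    return float(col * factor + current)  # same role as $toDouble


# Trimmed copy of the sample document from the question
doc = {
    "statistic": {"crypt": 0},
    "inventory": {"farm": [{"id": 12, "col": 100}, {"id": 14, "col": 1}]},
}
new_crypt = updated_crypt(doc, 12)
```

For the sample document this works out to 100 * 0.025 + 0, so statistic.crypt should end up at roughly 2.5 after the update.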

Array within Element within Array in Variant

How can I get the data out of this array stored in a variant column in Snowflake? I don't care whether the result is a new table, a view or a query. There is a second column of type varchar(256) that contains a unique ID.
If you can just help me read the "confirmed" data and the "editorIds" data, I can probably take it from there. Many thanks!
Output example would be
UniqueID ConfirmationID EditorID
u3kd9 xxxx-436a-a2d7 nupd
u3kd9 xxxx-436a-a2d7 9l34c
R3nDo xxxx-436a-a3e4 5rnj
yP48a xxxx-436a-a477 jTpz8
yP48a xxxx-436a-a477 nupd
[
{
"confirmed": {
"Confirmation": "Entry ID=xxxx-436a-a2d7-3525158332f0: Confirmed order submitted.",
"ConfirmationID": "xxxx-436a-a2d7-3525158332f0",
"ConfirmedOrders": 1,
"Received": "8/29/2019 4:31:11 PM Central Time"
},
"editorIds": [
"xxsJYgWDENLoX",
"JR9bWcGwbaymm3a8v",
"JxncJrdpeFJeWsTbT"
] ,
"id": "xxxxx5AvGgeSHy8Ms6Ytyc-1",
"messages": [],
"orderJson": {
"EntryID": "xxxxx5AvGgeSHy8Ms6Ytyc-1",
"Orders": [
{
"DropShipFlag": 1,
"FromAddressValue": 1,
"OrderAttributes": [
{
"AttributeUID": 548
},
{
"AttributeUID": 553
},
{
"AttributeUID": 2418
}
],
"OrderItems": [
{
"EditorId": "aC3f5HsJYgWDENLoX",
"ItemAssets": [
{
"AssetPath": "https://xxxx573043eac521.png",
"DP2NodeID": "10000",
"ImageHash": "000000000000000FFFFFFFFFFFFFFFFF",
"ImageRotation": 0,
"OffsetX": 50,
"OffsetY": 50,
"PrintedFileName": "aC3f5HsJYgWDENLoX-10000",
"X": 50,
"Y": 52.03909266409266,
"ZoomX": 100,
"ZoomY": 93.75
}
],
"ItemAttributes": [
{
"AttributeUID": 2105
},
{
"AttributeUID": 125
}
],
"ItemBookAttribute": null,
"ProductUID": 52,
"Quantity": 1
}
],
"SendNotificationEmailToAccount": true,
"SequenceNumber": 1,
"ShipToAddress": {
"Addr1": "Addr1",
"Addr2": "0",
"City": "City",
"Country": "US",
"Name": "Name",
"State": "ST",
"Zip": "00000"
}
}
]
},
"orderNumber": null,
"status": "order_placed",
"submitted": {
"Account": "350000",
"ConfirmationID": "xxxxx-436a-a2d7-3525158332f0",
"EntryID": "xxxxx-5AvGgeSHy8Ms6Ytyc-1",
"Key": "D83590AFF0CC0000B54B",
"NumberOfOrders": 1,
"Orders": [
{
"LineItems": [],
"Note": "",
"Products": [
{
"Price": "00.30",
"ProductDescription": "xxxxxint 8x10",
"Quantity": 1
},
{
"Price": "00.40",
"ProductDescription": "xxxxxut Black 8x10",
"Quantity": 1
},
{
"Price": "00.50",
"ProductDescription": "xxxxx"
},
{
"Price": "00.50",
"ProductDescription": "xxxscount",
"Quantity": 1
}
],
"SequenceNumber": "1",
"SubTotal": "00.70",
"Tax": "1.01",
"Total": "00.71"
}
],
"Received": "8/29/2019 4:31:10 PM Central Time"
},
"tracking": null,
"updatedOn": 1.598736670503000e+12
}
]
So, this is how I'd query that exact JSON assuming the data is in column var in table x:
SELECT x.var[0]:confirmed:ConfirmationID::varchar as ConfirmationID,
f.value::varchar as EditorID
FROM x,
LATERAL FLATTEN(input => var[0]:editorIds) f
;
Since your sample output doesn't match the JSON that you provided, I will assume that this is what you need.
Also, as a note, your JSON includes outer [ ], which indicates that the entire JSON string is inside an array. This is the reason for var[0] in my query. If you have multiple records inside that array, then you should remove the [0] (and flatten the outer array as well). In general, you should exclude those outer brackets and instead load each record into the table separately. I wasn't sure whether you could make that change, so I just wanted to make a note of it.
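For readers less familiar with LATERAL FLATTEN, the following Python sketch mimics the row expansion on an in-memory copy of the data: one output row per element of editorIds. Unlike var[0] in the query, it loops over every record in the outer array. The function name flatten_editor_ids is hypothetical, and pairing the unique ID u3kd9 with this particular JSON is only an assumption for the example.

```python
def flatten_editor_ids(rows):
    """Return one (unique_id, confirmation_id, editor_id) tuple per editor id."""
    out = []
    for unique_id, var in rows:
        for record in var:  # every record in the outer [ ] array, not just var[0]
            confirmation_id = record["confirmed"]["ConfirmationID"]
            for editor_id in record["editorIds"]:  # the part FLATTEN expands
                out.append((unique_id, confirmation_id, editor_id))
    return out


# Each row is (unique ID column, parsed variant column), trimmed to the needed keys
rows = [
    ("u3kd9", [{
        "confirmed": {"ConfirmationID": "xxxx-436a-a2d7-3525158332f0"},
        "editorIds": ["xxsJYgWDENLoX", "JR9bWcGwbaymm3a8v", "JxncJrdpeFJeWsTbT"],
    }]),
]
result = flatten_editor_ids(rows)
```

With the sample record this yields three rows, all sharing the same unique ID and ConfirmationID and differing only in the editor ID.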

Get object hit when searching array in Elasticsearch

I am trying to get an object out of a JSON array that is stored in elasticsearch. The layout is like this:
[
object{}
object{}
object{}
]
What I need, when I do a search and it hits one of these objects, is to get the specific object that matched. Currently, using the Java API, I am searching with:
QueryBuilder qb = QueryBuilders.boolQuery()
.should(QueryBuilders.matchQuery("text", "pottery").boost(5)
.minimumShouldMatch("1"));
SearchResponse response = client.prepareSearch("stuff")
.setTypes("things")
.setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
.setQuery(qb)
.setPostFilter(filter)//.setHighlighterQuery(qb)
.addField("places.numbers")
.addField("name")
.addField("city")
.setFrom(0).setSize(60).setExplain(true)
.execute()
.actionGet();
But this just returns the whole document that was hit, or, when I tell it to return the field "places.numbers", only the first object in the "places" array, not the one that was matched by the query.
Thank you for any help!
There are a couple of ways to handle this. I would probably do it with a nested type and inner hits, given what you've shown in your question, but it could also probably be done with the parent/child relationship.
Here is an example with nested docs. I set up a simple index like this:
PUT /test_index
{
"mappings": {
"parent_doc": {
"properties": {
"parent_name": {
"type": "string"
},
"nested_docs": {
"type": "nested",
"properties": {
"nested_name": {
"type": "string"
}
}
}
}
}
}
}
Then added a couple of simple documents:
POST /test_index/parent_doc/_bulk
{"index":{"_id":1}}
{"parent_name":"p1","nested_docs":[{"nested_name":"n1"},{"nested_name":"n2"}]}
{"index":{"_id":2}}
{"parent_name":"p2","nested_docs":[{"nested_name":"n3"},{"nested_name":"n4"}]}
And now I can search like this, using "inner_hits":
POST /test_index/_search
{
"query": {
"nested": {
"path": "nested_docs",
"query": {
"match": {
"nested_docs.nested_name": "n3"
}
},
"inner_hits" : {}
}
}
}
which returns:
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 2.098612,
"hits": [
{
"_index": "test_index",
"_type": "parent_doc",
"_id": "2",
"_score": 2.098612,
"_source": {
"parent_name": "p2",
"nested_docs": [
{
"nested_name": "n3"
},
{
"nested_name": "n4"
}
]
},
"inner_hits": {
"nested_docs": {
"hits": {
"total": 1,
"max_score": 2.098612,
"hits": [
{
"_index": "test_index",
"_type": "parent_doc",
"_id": "2",
"_nested": {
"field": "nested_docs",
"offset": 0
},
"_score": 2.098612,
"_source": {
"nested_name": "n3"
}
}
]
}
}
}
}
]
}
}
Here's the code I used to test it:
http://sense.qbox.io/gist/ef7debf436fec2a10097ba2106d5ff30ff8d7c77
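Once the response has that shape, pulling out just the matched nested objects is a matter of walking the inner_hits section. Here is a minimal Python sketch that does so on a response already parsed into a dict; the function name matched_nested_docs is hypothetical and the response below is a trimmed copy of the one shown above.

```python
def matched_nested_docs(response, path):
    """Return (parent_id, offset, source) for every inner hit under the given path."""
    matches = []
    for hit in response["hits"]["hits"]:
        inner = hit.get("inner_hits", {}).get(path, {})
        for inner_hit in inner.get("hits", {}).get("hits", []):
            matches.append((
                hit["_id"],                     # the parent document
                inner_hit["_nested"]["offset"],  # position inside the nested array
                inner_hit["_source"],            # the nested object that matched
            ))
    return matches


# Trimmed copy of the search response shown above
response = {
    "hits": {"hits": [{
        "_id": "2",
        "_source": {
            "parent_name": "p2",
            "nested_docs": [{"nested_name": "n3"}, {"nested_name": "n4"}],
        },
        "inner_hits": {"nested_docs": {"hits": {"hits": [{
            "_nested": {"field": "nested_docs", "offset": 0},
            "_source": {"nested_name": "n3"},
        }]}}},
    }]},
}
matched = matched_nested_docs(response, "nested_docs")
```

The offset tells you which element of the nested array matched, which is the piece of information the plain field lookup in the question could not provide.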

Unique Filter to Elastic Search Column not working (duplicate items inserted)

I've modified my contactNumber field to use a unique filter by updating the index settings as follows:
curl -XPUT localhost:9200/test-index2/_settings -d '
{
"index":{
"analysis":{
"analyzer":{
"unique_keyword_analyzer":{
"only_on_same_position":"true",
"filter":"unique"
}
}
}
},
"mappings":{
"business":{
"properties":{
"contactNumber":{
"analyzer":"unique_keyword_analyzer",
"type":"string"
}
}
}
}
}'
A sample Item looks like this,
doc_type:"Business"
contactNumber:"(+12)415-3499"
name:"Sam's Pizza"
address:"Somewhere on earth"
The filter does not work, as duplicate items are still inserted. I'd like no two documents to have the same contactNumber.
In the above, I've also set only_on_same_position to true so that existing duplicate values would be truncated/deleted.
What am I doing wrong in the settings?
That's something Elasticsearch can't do for you out of the box; you need to enforce this uniqueness in your application. The only idea I can think of is to use the phone number as the _id of the document itself. Whenever you insert or update something, ES will use the contactNumber as the _id and will either overwrite the document that already exists with that _id or create a new one.
For example:
PUT /test-index2
{
"mappings": {
"business": {
"_id": {
"path": "contactNumber"
},
"properties": {
"contactNumber": {
"type": "string",
"analyzer": "keyword"
},
"address": {
"type": "string"
}
}
}
}
}
Then you index something:
POST /test-index2/business
{
"contactNumber": "(+12)415-3499",
"address": "whatever 123"
}
Getting it back:
GET /test-index2/business/_search
{
"query": {
"match_all": {}
}
}
It looks like this:
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "test-index2",
"_type": "business",
"_id": "(+12)415-3499",
"_score": 1,
"_source": {
"contactNumber": "(+12)415-3499",
"address": "whatever 123"
}
}
]
}
You see there that the _id of the document is the phone number itself. If you want to change or insert another document (the address is different, there is a new field - whatever_field - but the contactNumber is the same):
POST /test-index2/business
{
"contactNumber": "(+12)415-3499",
"address": "whatever 123 456",
"whatever_field": "whatever value"
}
Elasticsearch "updates" the existing document and responds with:
{
"_index": "test-index2",
"_type": "business",
"_id": "(+12)415-3499",
"_version": 2,
"created": false
}
created is false, which means the document has been updated, not created. _version is 2, which again says that the document has been updated. And the _id is the phone number itself, which indicates that this is the document that has been updated.
Looking again in the index, ES stores this:
"hits": [
{
"_index": "test-index2",
"_type": "business",
"_id": "(+12)415-3499",
"_score": 1,
"_source": {
"contactNumber": "(+12)415-3499",
"address": "whatever 123 456",
"whatever_field": "whatever value"
}
}
]
So, the new field is there, the address has changed, the contactNumber and _id are exactly the same.
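The behaviour walked through above (same ID means overwrite, new ID means create) can be sketched with a plain Python dict standing in for the index. This is only an in-memory illustration of the idea, not Elasticsearch client code; the function name index_document and the store layout are hypothetical.

```python
def index_document(store, doc, id_field="contactNumber"):
    """Insert or overwrite a document, keyed by the value of id_field."""
    doc_id = doc[id_field]
    created = doc_id not in store
    # A new document starts at version 1; an overwrite bumps the version.
    version = 1 if created else store[doc_id]["_version"] + 1
    store[doc_id] = {"_version": version, "_source": dict(doc)}
    return {"_id": doc_id, "_version": version, "created": created}


store = {}  # stand-in for the index
first = index_document(store, {
    "contactNumber": "(+12)415-3499",
    "address": "whatever 123",
})
second = index_document(store, {
    "contactNumber": "(+12)415-3499",
    "address": "whatever 123 456",
    "whatever_field": "whatever value",
})
```

After both calls the store still holds a single entry for that phone number, carrying the second document's fields. As far as I know, the mapping-level _id path used above was removed in Elasticsearch 2.0, so on newer versions the application would have to supply the ID in the index request itself.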

Elasticsearch arrays query/filter

I'm looking at Elasticsearch for the first time and have spent around a day on it. We already use Lucene extensively and want to start using ES instead. I'm looking at alternative data structures to what we currently have.
If I run a match_all query, this is what I get at the moment. I am happy with this structure.
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 22,
"max_score": 1,
"hits": [
{
"_index": "integration-test-static",
"_type": "sport",
"_id": "4d38e07b-f3d3-4af2-9221-60450b18264a",
"_score": 1,
"_source": {
"Descriptions": [
{
"FeedSource": "dde58b3b-145b-4864-9f7c-43c64c2fe815",
"Value": "Football"
},
{
"FeedSource": "e4b9ad44-00d7-4216-adf5-3a37eafc4c93",
"Value": "Football"
}
],
"Synonyms": [
"Football"
]
}
}
]
}
}
What I can't figure out is how a query is written to pull back this document by searching for the synonym "Football". Looks like it should be easy!
I got this approach after reading this: http://gibrown.wordpress.com/2013/01/24/elasticsearch-five-things-i-was-doing-wrong/
He mentions storing multiple fields in arrays. I realise my example does not have multiple fields, but we will certainly be looking for a solution which can cater for them.
I have tried various queries with filters, bool queries, term and terms, but none of them return the document.
What does your search and mappings look like?
If you let Elasticsearch generate the mapping, it'll use the standard analyzer which lowercases the text (and removes stopwords).
So Football will actually be indexed as football. The term-family of queries/filters do not do text analysis, so term:Football will be looking for Football, which is not indexed. The match-family of queries do.
This is a very common problem, and it is covered quite extensively in my article on Troubleshooting Elasticsearch searches, for Beginners, which can be worth skimming through. Text analysis is a very important part of working with search, so there are some more articles about it as well.
A simple match query would work in this scenario.
POST integration-test-static/_search
{
"query": {
"match": {
"Synonyms": "Football"
}
}
}
Which returns:
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.30685282,
"hits": [
{
"_index": "integration-test-static",
"_type": "sport",
"_id": "4d38e07b-f3d3-4af2-9221-60450b18264a",
"_score": 0.30685282,
"_source": {
"Descriptions": [
{
"FeedSource": "dde58b3b-145b-4864-9f7c-43c64c2fe815",
"Value": "Football"
},
{
"FeedSource": "e4b9ad44-00d7-4216-adf5-3a37eafc4c93",
"Value": "Football"
}
],
"Synonyms": [
"Football"
]
}
}
]
}
}
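The case mismatch described above can be reproduced in a few lines of Python. This is a toy sketch, assuming an analyzer that only lowercases and splits on whitespace (far simpler than the real standard analyzer); the shortened document ID 4d38e07b and the variable names are made up for the example.

```python
def analyze(text):
    """Toy analyzer: lowercase and split on whitespace (an assumption for this sketch)."""
    return text.lower().split()


# Index the Synonyms array of one document, as an analyzed field would be
index = {}
for doc_id, synonyms in {"4d38e07b": ["Football"]}.items():
    for value in synonyms:
        for token in analyze(value):
            index.setdefault(token, set()).add(doc_id)

# term-style lookup: the search value is used as-is
term_hits = index.get("Football", set())

# match-style lookup: the search value goes through the same analyzer first
match_hits = set().union(*(index.get(token, set()) for token in analyze("Football")))
```

Here term_hits is empty because only the lowercased token football was indexed, while match_hits contains the document because the query text is lowercased in the same way before the lookup.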

Resources