Have results that match a condition come first - database

Given a very simple document:
`concert_name` - String containing the name of the concert
`city` - City ID of the concert
`band` - Band ID
`relevance` - An integer that indicates how important the concert is
I want all concerts in a specific city, but I want those for a specific band first (sorted by relevance), followed by all the others, also sorted by relevance.
So I can have a query like:
Give me all concerts in Milan, but return those for Pearl Jam first.
How can I do this in Elastica 1.X?

EDIT 1
I think this can be done by sorting on multiple fields and using a script. You would have to enable dynamic scripting. I am assigning a value of 10 to the band you would like to match, and the others get a value of 0. Try something like this:
{
  "query": {
    "match": {
      "city": "milan"
    }
  },
  "sort": [
    {
      "_script": {
        "script": "if(doc['band'].value == 'pearl') {10} else {0}",
        "type": "number",
        "order": "desc"
      }
    },
    {
      "relevance": {
        "order": "desc"
      }
    }
  ]
}
I am assuming a higher number means a more important concert. I have tested this on ES 1.7.
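If dynamic scripting is not already enabled on your cluster, the classic way to turn it on in ES 1.x is via elasticsearch.yml and a restart (this is the older setting; later 1.x releases also offer the finer-grained script.* settings):

script.disable_dynamic: false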
Does this help?

Related

Elasticsearch match query

I'm searching for some text in a field, but the problem is that whenever two documents contain all the search tokens, the document which has more of the search tokens gets more points, instead of the shorter document.
My Elasticsearch index contains some names of foods, and I want to search for a food in it.
The document structure is like this:
{"text": "NAME OF FOOD"}
Now I have two documents like
1: {"text": "Apple Syrup Apple Apple Syrup Apple Smoczyk's"}
2: {"text": "Apple Apple"}
If I search using this query
{
  "query": {
    "match": {
      "text": {
        "query": "Apple"
      }
    }
  }
}
The first document comes first because it contains more occurrences of Apple, which is not my expected result. It would be better if the second document got a higher score, because it contains Apple and its length is shorter than the first one.
Elasticsearch scoring gives weight to term frequency and field length. In general, shorter fields are scored higher, but term frequency can offset that.
You can use the unique token filter to generate unique tokens for the text. This way, multiple occurrences of the same token will not affect the scoring.
Mapping
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "unique",
            "lowercase"
          ]
        }
      }
    }
  }
}
Analyze
GET index29/_analyze
{
  "text": "Apple Apple",
  "analyzer": "my_analyzer"
}
Result
{
  "tokens" : [
    {
      "token" : "apple",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 0
    }
  ]
}
Only a single token is generated even though apple appears twice.

MongoDB Array Query - Single out an array element

I am having trouble with querying a MongoDB collection with an array inside.
Here is the structure of my collection that I am querying. This is one record:
{
  "_id": "abc123def4567890",
  "profile_id": "abc123def4567890",
  "image_count": 2,
  "images": [
    {
      "image_id": "ABC123456789",
      "image_url": "images/something.jpg",
      "geo_loc": "-0.1234,11.234567890",
      "title": "A Title",
      "shot_time": "01:23:33",
      "shot_date": "11/22/2222",
      "shot_type": "scenery",
      "conditions": "cloudy",
      "iso": 16,
      "f": 2.4,
      "ss": "1/545",
      "focal": 6.0,
      "equipment": "",
      "instructions": "",
      "upload_date": 1234567890,
      "update_date": 1234567890
    },
    {
      "image_id": "ABC123456789",
      "image_url": "images/something.jpg",
      "geo_loc": "-0.1234,11.234567890",
      "title": "A Title",
      "shot_time": "01:23:33",
      "shot_date": "11/22/2222",
      "shot_type": "portrait",
      "conditions": "cloudy",
      "iso": "16",
      "f": "2.4",
      "ss": "1/545",
      "focal": "6.0",
      "equipment": "",
      "instructions": "",
      "upload_date": 1234567890,
      "update_date": 1234567890
    }
  ]
}
Forgive the formatting, I didn't know how else to show this.
As you can see, it's a profile with a series of images within an array called 'images', and there are 2 images. Each of the 'images' array items contains an object of attributes for the image (url, title, type, etc).
All I want to do is to return the object element whose attributes match certain criteria:
Select object from images which has shot_type = "scenery"
I tried to make it as simple as possible, so I started with:
find( { "images.shot_type": "scenery" } )
This returns the entire record and both the images within. So I tried projection but I could not isolate the single object within the array (in this case object at position 0) and return it.
I think the answer lies with projection but I am unsure.
I have gone through the MongoDB documentation for hours now and can't find inspiration. I have read about $elemMatch, $, and the other array operators; nothing seems to allow you to single out an array item based on the data within. I have also been through this page: https://docs.mongodb.com/manual/tutorial/query-arrays/ but still can't work it out.
Can anyone provide help?
Have I made an error by using '$push' to populate my images field (making it an array) instead of using '$set' which would have made it into an embedded document? Would this have made a difference?
Using aggregation:
db.collection.aggregate({
  $project: {
    _id: 0,
    "result": {
      $filter: {
        input: "$images",
        as: "img",
        cond: {
          $eq: [
            "$$img.shot_type",
            "scenery"
          ]
        }
      }
    }
  }
})
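For the sample document above, the output of this pipeline should look roughly like this (only a couple of the image fields are shown here):

{
  "result": [
    {
      "image_id": "ABC123456789",
      "shot_type": "scenery",
      ...
    }
  ]
}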
You can use $elemMatch in this way (simplified query):
db.collection.find({
  "profile_id": "1"
},
{
  "images": {
    "$elemMatch": {
      "shot_type": 1
    }
  }
})
The find query takes two objects. The first filters the documents and only keeps those whose profile_id is 1; you can omit this stage and use only { } if you want to search the entire collection.
Then, the second object uses $elemMatch in the projection to return only the array element whose shot_type matches.
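As a side note, since the question mentions the positional $ operator: a projection with $ returns just the first array element that matched the query condition, which also covers this case (a sketch, assuming you only ever need the first match):

db.collection.find(
  { "images.shot_type": "scenery" },  // match documents with at least one scenery image
  { "images.$": 1 }                   // project only the first matching array element
)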

jq - find a value of a non-unique key when another value is known

I am struggling to construct a proper filter for jq to find the value of a specific key when the value of another key is known.
Here is the json file:
[
  {
    "Header": {
      "Tenant": "TenantX",
      "Rcode": 200
    },
    "Body": {
      "values": [
        {
          "id": "aaaa0001-0a0a-0b0b-0a95-6625bef115e5",
          "name": "Attribute1"
        },
        {
          "id": "aaaa0001-0a0a-0b0b-9926-f5dc47d312dd",
          "name": "Attribute2"
        },
        {
          "id": "aaaa0001-0a0a-0b0b-aea9-6b39641a0695",
          "name": "Attribute3"
        },
        {
          "id": "aaaa0001-0a0a-0b0b-a62b-5b26838eeca7",
          "name": "Attribute4"
        }
      ]
    }
  }
]
My goal: for any given value of the "name" key, find the value of the "id" key in the same {} block. Or, as an example in pseudocode:
When Header.Tenant ==="TenantX" and Body.values[].name =="Attribute1"
then
display the value of "id"
in the same block where "Attribute1" is
Values of "id" and "name" are unique, but the position of the name/id combination in the Body.values[] array can be anywhere. In other words, for one tenant Attribute1 can be in the first element of the array, and for another tenant in the 10th. Also, some tenants may have a certain name/id in the array while others do not.
I guess if I can find the unique position n for the given attribute within the array - Body.values[n].name, then Body.values[n].id should give me the answer, right?
Thanks
You can get all the objects in the values array with
.[].Body.values[]
and then select the id of the objects where name matches your string:
.[].Body.values[] | select(.name == "Attribute1").id
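If you also want to enforce the Header.Tenant condition from the pseudocode, add another select on the outer object first; a sketch, assuming the JSON is in a file called input.json:

jq '.[] | select(.Header.Tenant == "TenantX") | .Body.values[] | select(.name == "Attribute1") | .id' input.json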

Filter All using Elasticsearch

Let's say I have a User table with fields like name, address, age, etc. There are more than 1000 records in this table, so I used Elasticsearch to retrieve this data one page at a time, 20 records per page.
And let's say I just want to search for some text, "Alexia", so I want to know: is there any record containing Alexia? The special thing is that I want to search this text across all the fields in the table.
Does the search text match the name field, or age, or address, or any other field? If it does, the query should return those records. We are not going to pass any specific field to the Elasticsearch query. If more than 20 records match the text, pagination should work.
Any idea how to write such a query, or how to connect this to Elasticsearch?
Yes, you can do that with a query_string query:
{
  "size": 20,
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "query": "Alexia"
        }
      },
      "filter": {
        "range": {
          "dateField": {
            "gte": **currentTime** -------> this could be the current time, or age, or any property you want to do a range query on
          }
        }
      }
    }
  },
  "sort": [
    {
      "dateField": {
        "order": "desc"
      }
    }
  ]
}
To get only 20 records you can pass the size as 20, and for pagination you can use a range query to get the next set of results:
{
  "size": 20,
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "query": "Alexia"
        }
      },
      "filter": {
        "range": {
          "dateField": {
            "gt": 1589570610732 ------------> from the previous response
          }
        }
      }
    }
  },
  "sort": [
    {
      "dateField": {
        "order": "desc"
      }
    }
  ]
}
You can do the same using a match query as well. If you specify _all in the match query, it will search in all fields.
{
  "size": 20,
  "query": {
    "bool": {
      "must": {
        "match": {
          "_all": "Alexia"
        }
      },
      "filter": {
        "range": {
          "dateField": {
            "gte": **currentTime**
          }
        }
      }
    }
  },
  "sort": [
    {
      "dateField": {
        "order": "desc"
      }
    }
  ]
}
When you are using Elasticsearch to provide search functionality for search boxes, you should avoid using query_string because it throws an error in case of invalid syntax, whereas other queries return an empty result. You can read about this in the query_string documentation.
_all is deprecated from ES 6.0, so if you are using an ES version from 6.x onwards you can use copy_to to copy the values of several fields into a single field and then search on that single field. You can read more under copy_to.
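A minimal sketch of such a mapping, assuming a catch-all field named all_text (the field names here are just examples matching the question):

{
  "mappings": {
    "properties": {
      "name":     { "type": "text", "copy_to": "all_text" },
      "address":  { "type": "text", "copy_to": "all_text" },
      "all_text": { "type": "text" }
    }
  }
}

The match query can then be run against all_text instead of _all.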
For pagination you can make use of the from and size parameters. size tells how many documents you want to retrieve, and from tells from which hit you want to start.
Query :
{
  "from": <current-count>,
  "size": 20,
  "query": {
    "bool": {
      "must": {
        "match": {
          "_all": "Alexia"
        }
      },
      "filter": {
        "range": {
          "dateField": {
            "gte": **currentTime**
          }
        }
      }
    }
  },
  "sort": [
    {
      "dateField": {
        "order": "desc"
      }
    }
  ]
}
You can increase the from value on each iteration by the number of documents you have already retrieved. For example, in the first iteration you can set from to 0; for the next iteration you can set it to 20, since from is zero-based and the first iteration already returned the first 20 hits, so the second iteration should get the documents after those. You can refer to the from/size documentation for details.

Elasticsearch is Aggregating by "Partial Term" instead of "Entire Term"

I'm currently trying to do something fancy in elasticsearch...and it ALMOST works.
Use case: I have to limit the number of results per value of a certain field to (x) results.
Example: In a result set of restaurants I only want to return two locations per restaurant name. If I search Mexican Food, then I should get (x) Taco Bell hits, (x) Del Taco hits and (x) El Torito hits.
The problem: my aggregation is currently only bucketing on parts of the term.
For instance: if I aggregate on company_name, it will create one bucket for taco and another bucket for bell, so Taco Bell might show up in 2 buckets, resulting in (x) * 2 results for that company.
I find it hard to believe that this is the desired behavior. Is there a way to aggregate by the entire search term?
Here's my current aggregation JSON:
"aggs": {
"by_company": {
"terms": {
"field": "company_name"
},
"aggs": {
"first_hit": {
"top_hits": {"size":1, "from": 0}
}
}
}
}
Your help, as always, is greatly appreciated!
Yes. If your "company_name" is just a regular string with the standard analyzer, or whatever analyzer you are using for "company_name" is splitting the name, then this is the explanation: ES stores "terms", not words or the entire text, unless you tell it to.
Assuming your current analyzer for that field does just what I described above, you need another field - let's call it "raw" - that mirrors your company_name field but stores the company name as is.
This is what I mean:
{
  "mappings": {
    "test": {
      "properties": {
        ...,
        "company_name": {
          "type": "multi_field",
          "fields": {
            "company_name": {
              "type": "string"  # and whatever you currently have in your mapping for `company_name`
            },
            "raw": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      }
    }
  }
}
And in your query, you'll do it like this:
"aggs": {
"by_company": {
"terms": {
"field": "company_name.raw"
},
"aggs": {
"first_hit": {
"top_hits": {"size":1, "from": 0}
}
}
}
}
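As a side note, multi_field is the ES 1.x syntax; on newer versions (5.x and later) the same idea would be expressed with a keyword sub-field, roughly like this (a sketch, not tested against your exact mapping):

{
  "mappings": {
    "properties": {
      "company_name": {
        "type": "text",
        "fields": {
          "raw": { "type": "keyword" }
        }
      }
    }
  }
}

The terms aggregation on company_name.raw works the same way in both cases.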

Resources