How to do a cumulative array search in Elasticsearch?

I want to combine hashtag searching with the standard text search.
Here is the kind of query I want to be able to make:
"leather trousers #vintage #london"
In effect, I want to strip off the #hashtagged elements and search for them by name, cumulatively. Results should be prioritised as follows: first, exact matches on the search string; then near matches plus hashtags; and lastly, if nothing matches the search string, matches on the hashtags alone.
So items with both Vintage and London would rank higher than items with only Vintage or only London.
Here is my mapping:
{
"title" : {
"type" : "string",
"analyzer" : "standard"
},
"hashtags" : {
"properties" : {
"id" : { "type" : "integer" },
"name" : { "type" : "string" }
}
}
}
So the query I want to make is
"exact or near match string" + "optional cumulative array match (preferably with fuzziness)"
or, in terms of my mapping,
"near or exact match on 'title'" + "cumulative array match with fuzziness on hashtags.name"
I've tried a fuzzy match, but it returns too many results with too little relevance. I've tried a plain simple_query_string, but it returns odd results, and I've tried a bool match, but I get nothing back when I add the array.
Any help would be gratefully accepted. Let me know if you need more information. Many thanks in advance for taking the time to read this.

A "dis_max" query might work for you. It lets you run several different queries and combine the results, scoring each document by its best-matching clause. Here it first runs "hashtags = 'vintage london'", then "hashtags = 'vintage'", then "hashtags = 'london'". You can also add wildcards (*) to the search terms, like "hashtags = 'london*'".
{
"fields" : ["hashtags", "title"],
"query" : {
"dis_max" : {
"tie_breaker" : 0,
"queries" : [ {
"wildcard" : {
"hashtags" : "vintage london"
}
}, {
"wildcard" : {
"hashtags" : "vintage"
}
}, {
"wildcard" : {
"hashtags" : "london"
}
}
]
}
},
"sort" : {
"_score" : "desc"
} }

Related

How to set existing field and their value to array's objects

I'm new to MongoDB. I have the document below:
{
"_id" : "ABCDEFGH1234",
"level" : 0.6,
"pumps" : [
{
"pumpNo" : 1
},
{
"pumpNo" : 2
}
]
}
I just want to move the level field into each object of the pumps array, like this:
{
"_id" : "ABCDEFGH1234",
"pumps" : [
{
"pumpNo" : 1,
"level" : 0.6
},
{
"pumpNo" : 2,
"level" : 0.6
}
]
}
I've checked the Aggregation section of the MongoDB docs but didn't find anything. In SQL I could do this with a JOIN or a subquery, but this is NoSQL.
Could you please help me with this? Thank you.
Try this on for size:
db.foo.aggregate([
// Run the existing pumps array through $map and for each
// item (the "in" clause), create a doc with the existing
// pumpNo and bring in the level field from doc. All "peer"
// fields to 'pumps' are addressable as $field.
// By $projecting to a same-named field (pumps), we effectively
// overwrite the old pumps array with the new.
{$project: {pumps: {$map: {
input: "$pumps",
as: "z",
in: {pumpNo:"$$z.pumpNo", level:"$level"}
}}
}}
]);
Strongly recommend you explore the power of $map, $reduce, $concatArrays, $slice, and other array functions that make MongoDB query language different from the more scalar-based approach in SQL.
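To make the effect of the $map stage above easier to see, here is a plain-JavaScript analogue of what the pipeline computes for one document. It is only an illustration that runs outside MongoDB; the function name moveLevelIntoPumps is an assumption.

```javascript
// Plain-JavaScript analogue of the $project/$map stage (illustration only).
function moveLevelIntoPumps(doc) {
  return {
    _id: doc._id, // $project keeps _id by default
    // For each pump "z", keep its pumpNo and copy in the top-level level field.
    pumps: doc.pumps.map((z) => ({ pumpNo: z.pumpNo, level: doc.level })),
    // The top-level level field is not projected, so it is dropped.
  };
}

const pumpDoc = {
  _id: "ABCDEFGH1234",
  level: 0.6,
  pumps: [{ pumpNo: 1 }, { pumpNo: 2 }],
};

console.log(JSON.stringify(moveLevelIntoPumps(pumpDoc)));
```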

Elasticsearch: how to perform a search like MSSQL "LIKE 'word%'" on a tokenized string?

Currently we perform full-text search in MSSQL with the query:
select * from contract where number like 'word%'
The problem is that a contract number may look like
АА-1641471
TST-100069
П-5112-90-00230
001-1000017
1617/292/000001
and ES splits all of these into tokens.
How can I configure ES not to split these contract numbers into tokens, and perform the same search as the SQL query above?
The closest solution I've found is a query like this:
{
"size": 10,
"query": {
"regexp": {
"contractNumber": {
"value": ".*п-11.*"
}
}
}
}
This solution works the same as MSSQL LIKE 'word%' for values like 1111, 2568, etc., but fails with п-11.
One option is the wildcard query, which can express any wildcard combination, i.e. %val%, %val or val%:
{
"query": {
"wildcard" : { "contractNumber" : "*11" }
}
}
NOTE: Starting the search term with a wildcard is not recommended; it can be extremely slow.
To make this work with string values and prevent them from being tokenized, you need to update your index and tell the analyzer to stay away. One way of doing that is to define the property as type keyword instead of text:
PUT /_template/template_1
{
"index_patterns" : ["your_index*"],
"order" : 0,
"settings" : {
"number_of_shards" : 1
},
"mappings" : {
"your_document_type" : {
"properties" : {
"contractNumber" : {
"type" : "keyword"
}
}
}
}
}
NOTE: Replace your_index with your index name and your_document_type with the document type.
Once the template is added, delete the current index and recreate it; it will then pick up the properties from the template, and contractNumber will be indexed as a keyword.
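Once contractNumber is a keyword field, the 'word%' form of LIKE maps naturally onto a prefix query, which avoids the leading wildcard. The sketch below only builds the request body; the helper name likePrefixToQuery is an assumption, and it deliberately handles nothing but a single trailing %. Keyword fields are matched exactly, so the case of the prefix has to match the indexed value.

```javascript
// Hypothetical helper: translate a SQL "LIKE 'word%'" pattern into an
// Elasticsearch prefix query body. Only a single trailing % is supported;
// other LIKE features (leading %, _) are out of scope for this sketch.
function likePrefixToQuery(field, likePattern) {
  const prefix = likePattern.slice(0, -1);
  if (!likePattern.endsWith("%") || prefix.includes("%")) {
    throw new Error("only 'word%' patterns are handled in this sketch");
  }
  return { query: { prefix: { [field]: { value: prefix } } } };
}

console.log(JSON.stringify(likePrefixToQuery("contractNumber", "TST-1%")));
```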

Fetch specific array elements from a array element within another array field in mongodb

My document structure is as below.
{
"_id" : {
"timestamp" : ISODate("2016-08-27T06:00:00.000+05:30"),
"category" : "marketing"
},
"leveldata" : [
{
"level" : "manager",
"volume" : [
"45",
"145",
"2145"
]
},
{
"level" : "engineer",
"volume" : [
"2145"
]
}
]
}
The embedded array field "leveldata.volume" can have around 60 elements.
In this case, "leveldata" is an array of documents,
and "volume" is another array field inside each "leveldata" element.
We have a requirement to fetch specific elements from the "volume" array field,
for example the elements at positions 1-5 within "volume".
We have also used the positional operator to select the matching array element based on the "leveldata.level" field.
I tried using the $slice operator, but it seems to work only with top-level arrays, not with arrays inside array fields, which is the case in my scenario.
We could do it in the application layer, but that would mean loading the entire array from MongoDB into memory and then picking out the desired elements. We want to avoid that and fetch them directly from MongoDB.
Below is the query I used to try to fetch the required elements:
db.getCollection('mycollection').find(
{
"_id" : {
"timestamp" : ISODate("2016-08-26T18:00:00.000-06:30"),
"category" : "sales"
}
,
"leveldata.level":"manager"
},
{
"leveldata.$.volume": { $slice: [ 1, 5 ] }
}
)
Can you please let us know your suggestions on how to address this issue.
Thanks,
mongouser
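Yes, you can use $slice to get that data. Apply it to the nested path "leveldata.volume" without the positional operator; the array form is [skip, limit], so [3, 1] skips three elements and returns one: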
db.getCollection('mycollection').find({"leveldata.level":"manager"} , { "leveldata.volume" : { $slice : [3 , 1] } } )
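The [skip, limit] form of $slice can be pictured with a plain-JavaScript analogue that trims every "volume" array in a document. This is an illustration that runs outside MongoDB, assumes a non-negative skip, and uses a hypothetical function name, sliceVolumes.

```javascript
// Plain-JavaScript analogue of projecting
// { "leveldata.volume": { $slice: [skip, limit] } } (illustration only,
// non-negative skip assumed).
function sliceVolumes(doc, skip, limit) {
  return {
    ...doc,
    leveldata: doc.leveldata.map((entry) => ({
      ...entry,
      // Skip the first `skip` elements, then keep at most `limit` of them.
      volume: entry.volume.slice(skip, skip + limit),
    })),
  };
}

const levelDoc = {
  leveldata: [
    { level: "manager", volume: ["45", "145", "2145"] },
    { level: "engineer", volume: ["2145"] },
  ],
};

console.log(JSON.stringify(sliceVolumes(levelDoc, 1, 5)));
```

For the request in the question (positions 1-5), the equivalent projection value would be [1, 5], which skips one element and returns up to five.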

Selective field in output from Array of Documents in MongoDB

I am new to MongoDB and still learning the basics.
Here is the MongoDB document I am having trouble with:
{
"_id" : 19,
"name" : "Gisela Levin",
"scores" : [
{
"type" : "exam",
"score" : 44.51211101958831
},
{
"type" : "quiz",
"score" : 0.6578497966368002
},
{
"type" : "homework",
"score" : 93.36341655949683
},
{
"type" : "homework",
"score" : 49.43132782777443
}
]
}
Now, I am running a query like the one below to find a student's scores where the type is homework, in sorted order; only the homework scores should be returned.
db.students.find({"name":"Gisela Levin","scores.type":"homework"},{"_id":1,"scores.score":1});
What I am trying to achieve is output containing only the scores for homework, in sorted order, and nothing else (i.e. I don't want the exam and quiz scores in the output, only the homework scores in sorted order).
I am trying to use a projection, but I am stuck and can't see a way forward.
Please advise.
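The result being asked for can be stated precisely with a plain-JavaScript sketch that filters the scores array to homework entries and sorts them. It runs outside MongoDB and is not a query; the function name homeworkScores and the ascending sort order are assumptions, since the question does not say which order is wanted.

```javascript
// Plain-JavaScript statement of the desired output (illustration only):
// keep only the homework scores of one student, sorted ascending.
function homeworkScores(student) {
  return student.scores
    .filter((s) => s.type === "homework")
    .map((s) => s.score)
    .sort((a, b) => a - b); // ascending order is an assumption
}

const student = {
  _id: 19,
  name: "Gisela Levin",
  scores: [
    { type: "exam", score: 44.51211101958831 },
    { type: "quiz", score: 0.6578497966368002 },
    { type: "homework", score: 93.36341655949683 },
    { type: "homework", score: 49.43132782777443 },
  ],
};

console.log(homeworkScores(student));
```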

MongoDB embedded vs array sub document performance

Given the below competing schemas with up to 100,000 friends I’m interested in finding the most efficient for my needs.
Doc1 (Index on user_id)
{
"_id" : "…",
"user_id" : "1",
"friends" : {
"2" : {
"id" : "2",
"mutuals" : 3
},
"3" : {
"id" : "3",
"mutuals": "1"
},
"4" : {
"id" : "4",
"mutuals": "5"
}
}
}
Doc2 (Compound multi key index on user_id & friends.id)
{
"_id" : "…",
"user_id" : "1",
friends : [
{
"id" : "2",
"mutuals" : 3
},
{
"id" : "3",
"mutuals": "1"
},
{
"id" : "4",
"mutuals": "5"
}
]}
I can’t seem to find any information on the efficiency of sub-field retrieval. I know that MongoDB stores data internally as BSON, so I’m wondering whether that means a projection lookup is a binary search, i.e. O(log n)?
Specifically, given a user_id, how would the two queries below compare for checking whether a friend with friend_id exists in each schema? (Assuming the above indexes.) Note that it doesn’t really matter what’s returned, only that a non-null result is returned if the friend exists.
Doc1col.find({user_id : "…"}, {"friends.friend_id" : 1})
Doc2col.find({user_id : "…", "friends.id" : "friend_id"}, {"_id":1})
Also of interest is how the $set modifier works. For schema 1, given the query Doc1col.update({user_id : "…"}, {"$set" : {"friends.friend_id.mutuals" : 5}}), how does the lookup on friends.friend_id work? Is this an O(log n) operation (where n is the number of friends)?
For schema 2, how would the query Doc2col.update({user_id : "…", "friends.id" : "friend_id"}, {"$set": {"friends.$.mutuals" : 5}}) compare to the one above?
Doc1 is preferable if your primary requirement is to present data to the UI in a nice, manageable package. It's simple to filter only the desired data using a projection: {}, {"friends.2" : 1}
Doc2 is your strongest match, since your use case does not care about the result ("Note that it doesn’t really matter what’s returned") and indexing will speed up the fetch.
On top of that, Doc2 permits the much cleaner syntax
db.doc2.findOne({user_id: "1", "friends.id" : "2"})
versus
db.doc1.findOne({ $and : [{ user_id: "1" }, { "friends.2" : {$exists: true} }] })
On a final note, one could create a sparse index on Doc1 (and use $exists), but with a possible 100,000 friends, each needing its own sparse index, that is absurd. It would only make sense for a field with a reasonable number of distinct entries, say gender [male, female], age groups [0-10, 11-16, 25-30, ...] or more important things [gin, whisky, vodka, ...].
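The two existence checks compared above can be written out as a plain-JavaScript sketch over in-memory copies of the two schemas. This only illustrates the shape of each check; it says nothing about how MongoDB evaluates the queries or uses its indexes, and the function names hasFriendDoc1 and hasFriendDoc2 are assumptions.

```javascript
// Schema 1: friends is an object keyed by friend id.
function hasFriendDoc1(doc, friendId) {
  // Analogue of { "friends.<id>": { $exists: true } }.
  return Object.prototype.hasOwnProperty.call(doc.friends, friendId);
}

// Schema 2: friends is an array of sub-documents.
function hasFriendDoc2(doc, friendId) {
  // Analogue of { "friends.id": <id> }: any array element may match.
  return doc.friends.some((f) => f.id === friendId);
}

const doc1 = {
  user_id: "1",
  friends: { "2": { id: "2", mutuals: 3 }, "3": { id: "3", mutuals: "1" } },
};
const doc2 = {
  user_id: "1",
  friends: [{ id: "2", mutuals: 3 }, { id: "3", mutuals: "1" }],
};

console.log(hasFriendDoc1(doc1, "2"), hasFriendDoc2(doc2, "2"));
```

In schema 1 the friend id is part of the field name, so each friend is a different path; in schema 2 every friend shares the single path friends.id, which is why one multikey index can cover all of them.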
