MongoDB embedded vs array sub document performance - arrays

Given the two competing schemas below, with up to 100,000 friends per user, I’m interested in finding the most efficient for my needs.
Doc1 (Index on user_id)
{
    "_id" : "…",
    "user_id" : "1",
    "friends" : {
        "2" : { "id" : "2", "mutuals" : 3 },
        "3" : { "id" : "3", "mutuals" : 1 },
        "4" : { "id" : "4", "mutuals" : 5 }
    }
}
Doc2 (Compound multikey index on user_id & friends.id)
{
    "_id" : "…",
    "user_id" : "1",
    "friends" : [
        { "id" : "2", "mutuals" : 3 },
        { "id" : "3", "mutuals" : 1 },
        { "id" : "4", "mutuals" : 5 }
    ]
}
I can’t seem to find any information on the efficiency of sub-field retrieval. I know that Mongo stores data internally as BSON, so I’m wondering whether that means a projection lookup is a binary-search O(log n) operation.
Specifically, given a user_id, to find whether a friend with a given friend_id exists, how would the two queries below (one per schema) compare, assuming the above indexes? Note that it doesn’t really matter what’s returned, only that something non-null is returned if the friend exists.
Doc1col.find({user_id : "…"}, {"friends.friend_id" : 1})
Doc2col.find({user_id : "…", "friends.id" : "friend_id"}, {"_id" : 1})
Also of interest is how the $set modifier works. For schema 1, given the query Doc1col.update({user_id : "…"}, {"$set" : {"friends.friend_id.mutuals" : 5}}), how does the lookup on friends.friend_id work? Is it an O(log n) operation (where n is the number of friends)?
For schema 2, how would the query Doc2col.update({user_id : "…", "friends.id" : "friend_id"}, {"$set" : {"friends.$.mutuals" : 5}}) compare to the above?

Doc1 is preferable if your primary requirement is to present data to the UI in a nice, manageable package. It’s simple to filter down to only the desired data using a projection: {}, {"friends.2" : 1}.
Doc2 is your strongest match, since your use case does not care about the result ("it doesn’t really matter what’s returned") and indexing will speed up the fetch.
On top of that, Doc2 permits the much cleaner syntax
db.doc2.findOne({user_id : "1", "friends.id" : "2"})
versus
db.doc1.findOne({$and : [{user_id : "1"}, {"friends.2" : {$exists : true}}]})
On a final note, you could create a sparse index on Doc1 (and use $exists), but your possibility of 100,000 friends -- each friend would need its own sparse index -- makes that absurd. It only makes sense for a small set of known values, say gender [male, female], age groups [0-10, 11-16, 25-30, ...], or more important things [gin, whisky, vodka, ...].
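The difference between the two existence checks can be sketched in plain JavaScript (a local simulation of the document shapes, not mongo shell): Doc1's check is a key lookup on the embedded object, while Doc2's is a scan of the array -- which is exactly the per-document work the compound multikey index takes over server-side.

```javascript
// Simulated documents, one per schema (shapes from the question above).
const doc1 = {
  user_id: "1",
  friends: { "2": { id: "2", mutuals: 3 }, "3": { id: "3", mutuals: 1 } }
};
const doc2 = {
  user_id: "1",
  friends: [{ id: "2", mutuals: 3 }, { id: "3", mutuals: 1 }]
};

// Schema 1: existence is a key access on the embedded object.
const friendExistsDoc1 = (doc, friendId) => doc.friends[friendId] !== undefined;

// Schema 2: without an index this is a linear scan of the array;
// with the multikey index, the server answers it from the index instead.
const friendExistsDoc2 = (doc, friendId) => doc.friends.some(f => f.id === friendId);
```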

Related

How to set existing field and their value to array's objects

I'm new to MongoDB. I have the object below:
{
    "_id" : "ABCDEFGH1234",
    "level" : 0.6,
    "pumps" : [
        { "pumpNo" : 1 },
        { "pumpNo" : 2 }
    ]
}
And I just want to move the level field into the pumps array's objects, like this:
{
    "_id" : "ABCDEFGH1234",
    "pumps" : [
        { "pumpNo" : 1, "level" : 0.6 },
        { "pumpNo" : 2, "level" : 0.6 }
    ]
}
I've checked the Aggregation section of the MongoDB docs but didn't find anything. In SQL I could do this with a JOIN or a subquery, but here it's NoSQL.
Could you please help me with this? Thank you.
Try this on for size:
db.foo.aggregate([
    // Run the existing pumps array through $map and, for each
    // item (the "in" clause), create a doc with the existing
    // pumpNo plus the level field from the doc. All "peer"
    // fields of 'pumps' are addressable as $field.
    // By $projecting to a same-named field (pumps), we effectively
    // overwrite the old pumps array with the new one.
    {$project: {pumps: {$map: {
        input: "$pumps",
        as: "z",
        in: {pumpNo: "$$z.pumpNo", level: "$level"}
    }}}}
]);
Strongly recommend you explore the power of $map, $reduce, $concatArrays, $slice, and the other array operators that make the MongoDB query language different from the more scalar-based approach of SQL.
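The output shape is easy to sanity-check with a plain-JavaScript mirror of the $map stage (same field names as the question; this is a local simulation, not mongo shell):

```javascript
// Sample document from the question.
const doc = {
  _id: "ABCDEFGH1234",
  level: 0.6,
  pumps: [{ pumpNo: 1 }, { pumpNo: 2 }]
};

// Mirrors {$project: {pumps: {$map: {input: "$pumps", as: "z",
//   in: {pumpNo: "$$z.pumpNo", level: "$level"}}}}}:
// each array element gets the peer-level "level" value copied in,
// and the top-level "level" field is dropped by the projection.
const result = {
  _id: doc._id,
  pumps: doc.pumps.map(z => ({ pumpNo: z.pumpNo, level: doc.level }))
};
```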

Fetch specific array elements from a array element within another array field in mongodb

My document structure is as below.
{
    "_id" : {
        "timestamp" : ISODate("2016-08-27T06:00:00.000+05:30"),
        "category" : "marketing"
    },
    "leveldata" : [
        {
            "level" : "manager",
            "volume" : [ "45", "145", "2145" ]
        },
        {
            "level" : "engineer",
            "volume" : [ "2145" ]
        }
    ]
}
The "leveldata.volume" embedded array field can have around 60 elements in it.
In this case, "leveldata" is an array of documents, and "volume" is another array field inside "leveldata".
We have a requirement to fetch specific elements from the "volume" array field -- for example, elements at specific positions, say positions 1-5.
We have also used the positional operator to fetch a specific "leveldata" element based on the "leveldata.level" field.
I tried using the $slice operator, but it seems to work only with top-level arrays, not with an array inside an array field, as is the case in my scenario.
We could do it in the application layer, but that would mean loading the entire array element from MongoDB into memory and then picking the desired elements. We want to avoid that and fetch it directly from MongoDB.
The below query is what I had used to fetch the elements as required.
db.getCollection('mycollection').find(
    {
        "_id" : {
            "timestamp" : ISODate("2016-08-26T18:00:00.000-06:30"),
            "category" : "sales"
        },
        "leveldata.level" : "manager"
    },
    {
        "leveldata.$.volume" : { $slice : [ 1, 5 ] }
    }
)
Can you please let us know your suggestions on how to address this issue?
Thanks,
mongouser
Well, yes, you can use $slice to get that data, like:
db.getCollection('mycollection').find({"leveldata.level" : "manager"}, {"leveldata.volume" : {$slice : [3, 1]}})
Note that $slice : [skip, limit] returns limit elements starting at offset skip, so [3, 1] returns a single element at position 3; for positions 1-5 you would use $slice : [1, 5].
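For non-negative offsets, the projection form $slice : [skip, limit] behaves like JavaScript's Array.prototype.slice, which makes the semantics easy to check locally (a plain-JS simulation; the sample values here are made up):

```javascript
// Hypothetical volume array, longer than the question's sample
// so that both slices are non-empty.
const volume = ["45", "145", "2145", "60", "70", "80"];

// $slice: [skip, limit] with non-negative skip is arr.slice(skip, skip + limit).
const mongoSlice = (arr, skip, limit) => arr.slice(skip, skip + limit);
```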

Selective field in output from Array of Documents in MongoDB

I am new to MongoDB and still trying to learn the basics.
Here is the document in MongoDB with which I am facing an issue:
{
    "_id" : 19,
    "name" : "Gisela Levin",
    "scores" : [
        { "type" : "exam", "score" : 44.51211101958831 },
        { "type" : "quiz", "score" : 0.6578497966368002 },
        { "type" : "homework", "score" : 93.36341655949683 },
        { "type" : "homework", "score" : 49.43132782777443 }
    ]
}
Now I am firing a query like this, to find the scores of the student where type is "homework", in sorted order, with only the homework scores returned:
db.students.find({"name" : "Gisela Levin", "scores.type" : "homework"}, {"_id" : 1, "scores.score" : 1});
What I am trying to achieve is an output containing only the scores corresponding to homework, in sorted order, and not the others (i.e. I don't want the exam and quiz scores in the output).
I am trying to use a projection, but I am kind of stuck with no forward path.
Please guide.
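A plain find projection can't discard the non-matching array elements here; server-side this kind of filtering is usually done in an aggregation with $filter over "scores". As a sketch of the computation being asked for (plain JavaScript on the sample document, not mongo shell):

```javascript
// Sample student document from the question.
const student = {
  _id: 19,
  name: "Gisela Levin",
  scores: [
    { type: "exam", score: 44.51211101958831 },
    { type: "quiz", score: 0.6578497966368002 },
    { type: "homework", score: 93.36341655949683 },
    { type: "homework", score: 49.43132782777443 }
  ]
};

// Keep only the "homework" scores, sorted ascending -- the same subset
// an aggregation $filter on "scores" would compute server-side.
const homeworkScores = student.scores
  .filter(s => s.type === "homework")
  .map(s => s.score)
  .sort((a, b) => a - b);
```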

MongoDB - Index for object update in nested Array

Assume we have the following collection, which I have a question about:
{
    "_id" : 1,
    "user_id" : 12345,
    "items" : [
        { "item_id" : 1, "value" : 21, "status" : "active" },
        { "item_id" : 2, "value" : 22, "status" : "active" },
        { "item_id" : 3, "value" : 23, "status" : "active" },
        ...
        { "item_id" : 1000, "value" : 1001, "status" : "active" }
    ]
}
In the collection I have a lot of documents (one per user in the system, about 100K documents), and every document has around 1000 subdocuments inside the "items" array.
The list of operations that will be used:
Read whole document once user logins to the system (rare operation).
Update a single element of the nested "items" array, setting "value" and "status", on almost every user click (frequent operation):
db.items.update({_id : 1 , "items.item_id" : 1000} , {$set: {"items.$.value": 1000}})
Insert a new document to a collection with 1000 documents in nested array. This operation will be done on a new user registration (very rare operation)
The question is: do I need to create a compound index like
db.items.createIndex({ "_id" : 1, "items.item_id" : 1 })
to help MongoDB update a certain element inside the array, or does MongoDB scan the whole document regardless of the compound index? Or can someone propose a different schema for such a scenario?
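For what it's worth, my understanding (not an authoritative answer): an index helps locate documents, and the filter here already pinpoints the document via the default _id index; the matching array element is then found by scanning the array in memory, which a compound index cannot avoid. A plain-JavaScript sketch of what the positional update amounts to once the document is loaded:

```javascript
// Abbreviated document from the question.
const doc = {
  _id: 1,
  items: [
    { item_id: 1, value: 21, status: "active" },
    { item_id: 2, value: 22, status: "active" },
    { item_id: 1000, value: 1001, status: "active" }
  ]
};

// Equivalent of filtering on {"items.item_id": 1000} and applying
// {$set: {"items.$.value": 1000}}: a linear scan finds the first
// matching element, then its field is mutated in place.
const idx = doc.items.findIndex(it => it.item_id === 1000);
if (idx !== -1) doc.items[idx].value = 1000;
```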

how to do a cumulative array search in elasticsearch?

I want to employ hashtag searching in combination with the standard text search.
Here is the kind of query I wish to be able to make:
"leather trousers #vintage #london"
So, in effect, I want to strip off the #hashtagged elements and search for them by name, in a cumulative sense. First I want it to prioritise an exact match on the search string, then near matches plus hashtags, and lastly, if there is no match on the search string, matches via the hashtags alone.
So items with both Vintage and London would be placed higher than ones with either Vintage or London.
Here is my mapping
{
    "title" : {
        "type" : "string",
        "analyzer" : "standard"
    },
    "hashtags" : {
        "properties" : {
            "id" : { "type" : "integer" },
            "name" : { "type" : "string" }
        }
    }
}
So the query I want to make is
"exact or near match string" + "optional cumulative array match (preferably with fuzzyness)"
or in relation to my mapping
"near or exact match on 'title'" + "cumulative array match with fuzziness on hashtags.name"
I've tried a fuzzy match but get back too many results without enough clarity. I've tried a simple_query_string but it returns weird results, and I tried a bool match but get back nothing when I add the array.
Any help anyone can offer will be more than gratefully accepted. Let me know if you need more info. Many thanks in advance for taking the time to read this.
Maybe a "dis_max" query can work for you. It lets you make multiple different queries and combine the results. So here it makes a first query where "hashtags = 'vintage london'", then "hashtags = 'vintage'", then "hashtags = 'london'". You can also add wildcards (*) to the searched terms, like "hashtags = 'london*'".
{
    "fields" : ["hashtags", "title"],
    "query" : {
        "dis_max" : {
            "tie_breaker" : 0,
            "queries" : [
                { "wildcard" : { "hashtags" : "vintage london" } },
                { "wildcard" : { "hashtags" : "vintage" } },
                { "wildcard" : { "hashtags" : "london" } }
            ]
        }
    },
    "sort" : {
        "_score" : "desc"
    }
}
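Another direction worth trying (a sketch, not tested against your data): a bool query in which each matching should clause adds to the score, so a document tagged with both vintage and london outranks one carrying only one of them, while a fuzzy match on title covers the free-text part. Field names follow the mapping above.

```json
{
    "query" : {
        "bool" : {
            "should" : [
                { "match" : { "title" : { "query" : "leather trousers", "fuzziness" : "AUTO" } } },
                { "match" : { "hashtags.name" : "vintage" } },
                { "match" : { "hashtags.name" : "london" } }
            ]
        }
    }
}
```

Since should clauses are optional but score-additive, this gives the cumulative ranking behaviour described above without excluding documents that match only the title.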