Ensure an array of documents exists in a collection

I have a collection with approximately 40k documents that look like this:
{
"_id":{"$oid":"5e988b703117c034b0630f8"},
"name":"London Heathrow Airport",
"country":"United Kingdom",
"iata":"LHR",
...
}
I want to be able to work through an array (actually a JavaScript Map, which could have many hundreds of elements) like this:
["LHR", "LGW", "BFS", ...]
The items in this array relate to the iata property on the document.
I want to return an array of the elements that cannot be found in the collection; if all elements can be found, it should return either null or an empty array. For example, if "LHR" and "LGW" match the iata property of a document in the collection but "BFS" doesn't, it should return ["BFS"].
I could iterate through the array, making an individual query for each item, but if the input array has many hundreds of elements, this seems very inefficient. Is there a better way?
Thanks

You can query your collection for matching iata codes and do a diff between the result and your array of iata codes:
Given a collection airports with the following documents:
{
"_id": 1,
"name":"London Heathrow Airport",
"country":"United Kingdom",
"iata":"LHR"
},
{
"_id": 2,
"name":"Ljubljana Airport",
"country":"Slovenia",
"iata":"LJU"
}
Given the array of iata below:
var iatas = ["LHR", "LJU", "BFS"];
Find the matching documents and return the result as an array of iata values:
var result = db.airports.find(
  { "iata": { $in: iatas } },
  { "iata": 1, "_id": 0 } // only the iata field is needed
).toArray().map(function(u) {
  return u.iata;
});
The result will look like:
//result
[
"LHR",
"LJU"
]
Now you can do a diff between result and iatas:
var nonexistentIatas = iatas.filter(x => !result.includes(x));
The content of nonexistentIatas will be:
[
"BFS"
]
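With many hundreds of codes, calling includes() inside filter() scans the whole result array once per requested code. A minimal sketch, assuming the matched codes have already been fetched into an array like result above, builds a Set once so each membership check is constant-time on average. The helper name findMissingIatas is hypothetical.

```javascript
// Return the requested IATA codes that were not found in the collection.
// "found" is assumed to be the array of iata values returned by the $in query.
function findMissingIatas(requested, found) {
  const foundSet = new Set(found); // built once; O(1) average lookups
  return requested.filter((code) => !foundSet.has(code));
}

const iatas = ["LHR", "LJU", "BFS"];
const result = ["LHR", "LJU"]; // e.g. what the $in query returned
console.log(findMissingIatas(iatas, result)); // → ["BFS"]
```

If every requested code is found, the function returns an empty array, which is one of the two outcomes the question asks for.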

Related

Remove oldest N elements from document array

I have a document in my MongoDB database that contains a very large array (about 10k items). I'm trying to keep only the latest 1k items in the array (and so remove the first 9k elements). The document looks something like this:
{
"_id" : 'fakeid64',
"Dropper" : [
{
"md5" : "fakemd5-1"
},
{
"md5" : "fakemd5-2"
},
...,
{
"md5": "fakemd5-10000"
}
]
}
How do I accomplish that?
The correct operation here actually involves the $push operator with the $each and $slice modifiers. It may initially seem counter-intuitive to use $push to "remove" items from an array, but the reason becomes clear when you see the intended operation.
db.collection.updateOne(
{ "_id": "fakeid64" },
{ "$push": { "Dropper": { "$each": [], "$slice": -1000 } } }
)
You can in fact run it against your whole collection:
db.collection.updateMany(
{ },
{ "$push": { "Dropper": { "$each": [], "$slice": -1000 } } }
)
What happens here is that the modifier for $each takes an array of items to "add" in the $push operation, which in this case we leave empty since we do not actually want to add anything. The $slice modifier given a "negative" value is actually saying to keep the "last n" elements present in the array as the update is performed, which is exactly what you are asking.
The general "intended" case is to use $slice when adding new elements to "maintain" the array at a "maximum" given length, which in this case would be 1000. So you would generally use it in tandem with actually "adding" new items, like this:
db.collection.updateOne(
{ "_id": "fakeid64" },
{ "$push": { "Dropper": { "$each": [{ "md5": "fakemd5-newEntry" }], "$slice": -1000 } } }
)
This would append the new item(s) provided in $each whilst also removing any items from the "start" of the array where the total length given the addition was greater than 1000.
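To make the $each/$slice semantics concrete, here is a plain-JavaScript model of what the update does to the array. It is an illustration only, not MongoDB code; the helper name pushEachSlice and the generated test data are hypothetical.

```javascript
// Hypothetical helper mimicking { $push: { field: { $each: items, $slice: n } } }.
function pushEachSlice(array, items, slice) {
  const combined = array.concat(items); // $each appends the items in order
  // Negative n keeps the last |n| elements; positive n keeps the first n;
  // 0 leaves an empty array (slice(0, 0)).
  return slice < 0 ? combined.slice(slice) : combined.slice(0, slice);
}

// 10,000 entries shaped like the Dropper array in the question.
const dropper = Array.from({ length: 10000 }, (_, i) => ({ md5: "fakemd5-" + (i + 1) }));

// Empty $each with $slice: -1000 trims to the newest 1000 entries.
const trimmed = pushEachSlice(dropper, [], -1000);
console.log(trimmed.length, trimmed[0].md5); // → 1000 fakemd5-9001

// Adding one entry keeps the length capped by dropping the oldest.
const withNew = pushEachSlice(trimmed, [{ md5: "fakemd5-newEntry" }], -1000);
console.log(withNew.length, withNew[0].md5, withNew[999].md5); // → 1000 fakemd5-9002 fakemd5-newEntry
```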
It is stated incorrectly elsewhere that you would use $pullAll with a supplied list of the array content already existing in the document, but the operation is actually two requests to the database.
The misconception is that the request is sent as "one"; it actually is not, and is basically interpreted as the longer form (with correct usage of .slice() to select everything except the last 1000 elements):
var md5s = db.collection.findOne({ "_id": "fakeid64" }).Dropper.slice(0, -1000);
db.collection.updateOne(
{ "_id": "fakeid64" },
{ "$pullAll": { "Dropper": md5s } }
)
So you can see that this is not very efficient and is in fact quite dangerous when you consider that the state of the array within the document "could" possibly change in between the "read" of the array content and the actual "write" operation on update since they occur separately.
This is why MongoDB provides the atomic $push with $slice as demonstrated: it is not only more efficient, but also takes into account the actual "state" of the document at the moment the modification occurs.
You can use the $pullAll operator.
Suppose you use the Python (PyMongo) driver:
yourcollection.update_one(
{'_id': 'fakeid64'},
{'$pullAll': {'Dropper': yourcollection.find_one({'_id': 'fakeid64'})['Dropper'][:9000]}}
)
Or in the mongo shell:
db.yourcollection.update(
{ _id: 'fakeid64'},
{$pullAll: {'Dropper': db.yourcollection.findOne({'_id' : 'fakeid64'})['Dropper'].slice(0,9000)}}
)
(*) Having said that, it would be much better if you didn't allow your document(s) to grow this much in the first place.
This is just a sketch of the query. You can $unwind the array, $limit it to the first 9000 (oldest) elements, and then iterate the cursor with forEach to $pull each one, as below:
db.your_collection.aggregate([
{ $match : { _id : 'fakeid64' } },
{ $unwind : "$Dropper" },
{ $limit : 9000 } // the first 9000 elements are the oldest
]).forEach(function(doc){
// one update per element; $pull removes every array entry equal to doc.Dropper
db.your_collection.updateOne({ _id : doc._id }, { $pull : { Dropper : doc.Dropper } });
});
From the MongoDB documentation:
db.students.update(
{ _id: 1 },
{
$push: {
scores: {
$each: [ { attempt: 3, score: 7 }, { attempt: 4, score: 4 } ],
$sort: { score: 1 },
$slice: -3
}
}
}
)
The following update uses the $push operator with:
the $each modifier to append to the array 2 new elements,
the $sort modifier to order the elements by ascending (1) score, and
the $slice modifier to keep the last 3 elements of the ordered array.
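The order matters: the elements are appended first, then the whole array is sorted, then sliced. A plain-JavaScript model of that sequence (not MongoDB code) is sketched below; the helper name is hypothetical, the sort assumes a numeric field, and the starting scores array is assumed because the documentation excerpt above does not show the original document.

```javascript
// Hypothetical model of $push with $each, $sort and $slice, applied in that order.
function pushEachSortSlice(array, items, sortField, direction, slice) {
  const combined = array.concat(items); // 1. $each appends the new elements
  // 2. $sort orders the whole array (direction 1 = ascending, -1 = descending)
  combined.sort((a, b) => direction * (a[sortField] - b[sortField]));
  // 3. $slice: negative keeps the last |n|, positive keeps the first n
  return slice < 0 ? combined.slice(slice) : combined.slice(0, slice);
}

// Assumed starting contents of the scores array.
const scores = [{ attempt: 1, score: 10 }, { attempt: 2, score: 8 }];
const updated = pushEachSortSlice(
  scores,
  [{ attempt: 3, score: 7 }, { attempt: 4, score: 4 }],
  "score", 1, -3
);
console.log(updated.map((s) => s.attempt)); // → [3, 2, 1]
```

The lowest score (attempt 4) sorts to the front and is the one dropped by $slice: -3.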

Add data to array within array in MongoDB

So here's my MongoDB document:
{
"_id" : "",
"lists" : [
{
"name" : "list 1",
"items" : []
},
{
"name" : "list 2",
"items" : []
}
]
}
How would I go about adding an object inside "items"?
This is the code I have so far, but it doesn't work:
xxx.update(_id, {$push: { "lists.$.items": item}});
Note that I have access to the index (a variable called 'index'), so it's possible to insert an item at index 0, 1, 2, etc.
I tried this before, but it won't work:
xxx.update({_id, "lists": index}, {$push: { "lists.$.items": item}});
I also looked at other similar questions and couldn't find anything. Most of them have some sort of id field in their arrays, but I don't.
What about
xxx.update({_id}, {$push: { "lists.index.items": item}});
Of course, this literal form would fail; what I mean is to replace index with the real index value:
xxx.update({_id}, {$push: { "lists.2.items": item}});
You can build the update object from the index, for example with a computed property name:
var update = { $push: { ["lists." + index + ".items"]: item } };
xxx.update({ _id }, update);
Building a JSON string and calling JSON.parse() on it does not work here, because {$push: ...} is not valid JSON (the key is unquoted), and concatenating an object item into a string produces "[object Object]".

MongoDB find all not in this array

I'm trying to find all users except for a few, like this:
// get special user IDs
var special = db.special.find({}, { _id: 1 }).toArray();
// get all users except for the special ones
var users = db.users.find({_id: {$nin: special}});
This doesn't work because the array that I'm passing to $nin is not an array of ObjectIds but an array of { _id: ObjectId() } documents.
Variable special looks like this after the first query:
[ { _id: ObjectId(###) }, { _id: ObjectId(###) } ]
But $nin in the second query needs this:
[ ObjectId(###), ObjectId(###) ]
How can I get just the ObjectId() in an array from the first query so that I can use them in the second query?
Or, is there a better way of achieving what I'm trying to do?
Use map() on the results of find() to transform the list of { _id: ObjectId(###) } documents into an array of ObjectIds. Calling toArray() first makes this work in both the legacy mongo shell and mongosh, where cursor.map() returns a cursor rather than an array:
var special = db.special.find({}, { _id: 1 }).toArray().map(function(doc){
return doc._id;
});
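The reshaping step and the $nin filter can be modelled in plain JavaScript to check the logic. This is an illustration, not MongoDB code: the users and special data are made up, and strings stand in for ObjectId values because two separate ObjectId instances are not === equal in JavaScript.

```javascript
// Made-up data; strings stand in for ObjectIds.
const special = [{ _id: "id2" }, { _id: "id3" }];
const users = [
  { _id: "id1", name: "ann" },
  { _id: "id2", name: "bob" },
  { _id: "id3", name: "cy" },
];

// Step 1: flatten [{ _id: x }, ...] into [x, ...], the shape $nin expects.
const specialIds = special.map((doc) => doc._id);

// Step 2: client-side equivalent of { _id: { $nin: specialIds } }.
const excluded = new Set(specialIds);
const regularUsers = users.filter((u) => !excluded.has(u._id));
console.log(regularUsers.map((u) => u.name)); // → ["ann"]
```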
Another approach you can consider is using the $lookup operator in the aggregation framework to do a "left outer join" on the special collection and filtering the documents on the new "joined" array field. The filter should match on documents whose array field is empty.
The following example demonstrates this:
db.users.aggregate([
{
"$lookup": {
"from": "special",
"localField": "_id",
"foreignField": "_id",
"as": "specialUsers" // <-- this will produce an array of "joined" docs
}
},
{ "$match": { "specialUsers.0": { "$exists": false } } } // <-- match on empty array
])
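The pipeline above is an "anti-join": keep the users that have no partner in the other collection. A plain-JavaScript model of what the two stages do (not MongoDB code; the function name and data are hypothetical, with strings in place of ObjectIds) looks like this:

```javascript
// Model of $lookup + $match on an empty joined array.
function lookupAntiJoin(users, special) {
  return users
    // $lookup: attach an array of special docs whose _id equals the user's _id
    .map((u) => ({ ...u, specialUsers: special.filter((s) => s._id === u._id) }))
    // $match { "specialUsers.0": { $exists: false } }: the joined array is empty
    .filter((u) => u.specialUsers.length === 0);
}

const out = lookupAntiJoin([{ _id: "id1" }, { _id: "id2" }], [{ _id: "id2" }]);
console.log(JSON.stringify(out)); // → [{"_id":"id1","specialUsers":[]}]
```

As in the real pipeline, the output documents still carry the (empty) specialUsers field; a $project stage could remove it if needed.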

How to get keys from an Array Object in reactjs

I have the following Array Object:
[
{
"key1": "abc",
"key2": "xyz"
},
{
"key1": "abc",
"key2": "xyz"
}
]
What I want is to print "key1" and "key2". I know I can iterate through the values using map, but I also want to iterate through the keys.
Assuming ArrayObj is a single object holding the key:value pairs (for example one element of the array, such as arr[0]), we can do the following:
let keys = Object.keys(ArrayObj);
for (let index = 0; index < keys.length; index++) {
console.log(keys[index]);
}
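Calling Object.keys() on the array itself returns its indices ("0", "1", ...), not the property names. A minimal sketch that collects the distinct keys across every object in the array is below; the helper name collectKeys is hypothetical. In a React component the returned array can be rendered with .map() like any other array.

```javascript
// Collect the distinct property names across all objects in the array.
function collectKeys(arrayOfObjects) {
  const keys = new Set(); // a Set de-duplicates and keeps first-seen order
  for (const obj of arrayOfObjects) {
    for (const key of Object.keys(obj)) keys.add(key);
  }
  return Array.from(keys);
}

const data = [
  { key1: "abc", key2: "xyz" },
  { key1: "abc", key2: "xyz" },
];
console.log(collectKeys(data)); // → ["key1", "key2"]
console.log(Object.keys(data)); // → ["0", "1"] (indices, not property names)
```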

How to retrieve a specific field from a subdocument array with mongoose

I'm trying to get a specific field from a subdocument array
I'm not going to include any of the fields in the parent doc.
Here is the sample document
{
"_id" : ObjectId("5409dd36b71997726532012d"),
"hierarchies" : [
{
"rank" : 1,
"_id" : ObjectId("5409df85b719977265320137"),
"name" : "CTO",
"userId" : [
ObjectId("53a47a639c52c9d83a2d71db")
]
}
]
}
I would like to return the rank of the hierarchy if the userId is in the userId array.
Here's what I have so far in my query:
collectionName.find(
{ hierarchies: { $elemMatch: { userId: ObjectId("53a47a639c52c9d83a2d71db") } } },
"hierarchies.$.rank",
function(err, data) {}
);
So far it returns the entire object in the hierarchies array that I want, but I would like to limit it to just the rank property of that object.
The projection available to .find() queries in MongoDB generally cannot project individual fields from within a matched array element. All you can generally do is return the "matched" element of the array in its entirety.
For what you want, use the aggregation framework instead, which gives you more control over matching and projection:
Model.aggregate([
{ "$match": {
"hierarchies.userId": ObjectId("53a47a639c52c9d83a2d71db")
}},
{ "$unwind": "$hierarchies" },
{ "$match": {
"hierarchies.userId": ObjectId("53a47a639c52c9d83a2d71db")
}},
{ "$project": {
"rank": "$hierarchies.rank"
}}
],function(err,result) {
})
That matches the documents, unwinds the array so each element becomes its own document, filters those down to the matching element, and then projects only the required field.
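To see what each stage contributes, here is a plain-JavaScript walk-through of the same four stages on the sample document. It is a model only, not Mongoose or MongoDB code: the function name rankForUser is hypothetical, strings stand in for ObjectIds, and the $unwind step copies only the parent _id because the later $project discards everything else anyway.

```javascript
// Model of $match -> $unwind -> $match -> $project.
function rankForUser(docs, userId) {
  return docs
    // $match: parents with at least one hierarchy containing userId
    .filter((d) => d.hierarchies.some((h) => h.userId.includes(userId)))
    // $unwind: one document per array element (parent _id copied)
    .flatMap((d) => d.hierarchies.map((h) => ({ _id: d._id, hierarchies: h })))
    // second $match: keep only the unwound elements containing userId
    .filter((d) => d.hierarchies.userId.includes(userId))
    // $project: _id is included by default, plus rank lifted to the top level
    .map((d) => ({ _id: d._id, rank: d.hierarchies.rank }));
}

const docs = [{
  _id: "5409dd36b71997726532012d",
  hierarchies: [{
    rank: 1,
    _id: "5409df85b719977265320137",
    name: "CTO",
    userId: ["53a47a639c52c9d83a2d71db"],
  }],
}];
console.log(JSON.stringify(rankForUser(docs, "53a47a639c52c9d83a2d71db")));
// → [{"_id":"5409dd36b71997726532012d","rank":1}]
```

The first $match is not strictly needed for correctness, but in the real pipeline it lets MongoDB discard non-matching documents (and use an index) before the more expensive $unwind.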
