I found two different syntaxes for querying in mongo where an array is non-empty. I suspect the crux may be my data rather than the query, but in case mongo is doing some null-shenanigans I don't understand, I wanted to check here: what is the preferred way of selecting documents where the 'institution.tags' array is populated rather than []?
First option---check that the 0-th item of the array exists:
> db.coll.find({'institution.tags.0': {'$exists':true}}).count()
7330
Second option---check that this list field is not null:
> db.coll.find({'institution.tags': {"$ne":null}}).count()
28014
In theory, all fields named 'institution.tags' are of array type; I don't expect any to be dictionaries, strings, or numbers. But I see dramatically different counts, so I'm wondering what I should expect, and which query is better both semantically (is it doing what I think it is doing?) and for performance.
The following snippet from the Mongo shell should clarify your question:
> db.coll.insert({_id:1})
> db.coll.insert({_id:2, array: []})
> db.coll.insert({_id:3, array: ["entry"]})
> db.coll.insert({_id:4, array: "no_array"})
> db.coll.find().pretty()
{ "_id" : 1 }
{ "_id" : 2, "array" : [ ] }
{ "_id" : 3, "array" : [ "entry" ] }
{ "_id" : 4, "array" : "no_array" }
> db.coll.count({array: {$exists:true}})
3
> db.coll.count({array: {$ne:null}})
3
> db.coll.count({"array.0": {$exists:true}})
1
> db.coll.count({$and: [{$where: "Array.isArray(this.array)"}, {array: {$not: {$size: 0}}}]})
1
Your first option is probably the right way to go, but if you want to be as explicit as possible, have a look at the last query in my example, which uses $where to check that the field is an array and $size to rule out empty arrays ($size only accepts an exact number, so it is negated here rather than combined with $gt). Your second option only checks that the field exists and is not null, so it tells you neither whether the field is an array nor, if it is, whether it is empty.
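To make the difference between the two options concrete, here is a minimal sketch that models both predicates in plain JavaScript over the four sample documents above. It does not talk to MongoDB; `notNull`, `hasFirstElement`, and `countWhere` are hypothetical names chosen for illustration, and the predicates only approximate the query semantics for documents shaped like these samples.

```javascript
// Plain-JavaScript copies of the four sample documents (no MongoDB needed).
const sampleDocs = [
  { _id: 1 },
  { _id: 2, array: [] },
  { _id: 3, array: ["entry"] },
  { _id: 4, array: "no_array" },
];

// Models {array: {$ne: null}}: the field exists and is not null.
const notNull = (d) => d.array !== undefined && d.array !== null;

// Models {"array.0": {$exists: true}} for these samples:
// the field is an array that has an element at index 0.
const hasFirstElement = (d) => Array.isArray(d.array) && d.array.length > 0;

const countWhere = (pred) => sampleDocs.filter(pred).length;

console.log(countWhere(notNull));         // 3
console.log(countWhere(hasFirstElement)); // 1
```

The empty array and the string both pass the not-null check, which is the kind of gap that could explain the much larger count from the second query.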
Suppose I have a collection whose documents have this structure
{
...
"ts": [NumberLong("5"), NumberLong("7")]
...
}
where ts is an array of two Long elements, where the second is strictly bigger than the first one.
I want to retrieve all documents where all the elements of the array ts are within a range (bigger than a value but smaller than another).
Suppose the range is between 4 and 9; I am trying this query, but I find unexpected results:
db.segments.find({$nor: [{ "ts": {$gt:9, $lt:4}}]}).toArray()
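If the array has a fixed length, you can address its elements by index in the query. Because the second element is always greater than the first, checking the lower bound on ts.0 and the upper bound on ts.1 is enough: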
db.segments.find({
"ts.0": { $gt: 4 },
"ts.1": { $lt: 9 }
}).toArray()
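For intuition on why the $nor attempt gives unexpected results, here is a rough model in plain JavaScript. It assumes the usual array-matching behaviour, where each condition on an array field may be satisfied by a different element; `askedQuery`, `allWithin`, and `rangeSamples` are hypothetical names, and nothing here runs against MongoDB.

```javascript
// Models {$nor: [{ts: {$gt: 9, $lt: 4}}]}: a document is excluded only when
// some element is > 9 AND some (possibly different) element is < 4.
const askedQuery = (ts) => !(ts.some((t) => t > 9) && ts.some((t) => t < 4));

// The intended condition: every element lies strictly between lo and hi.
const allWithin = (ts, lo, hi) => ts.every((t) => t > lo && t < hi);

const rangeSamples = [[5, 7], [3, 7], [5, 10], [2, 12]];

console.log(rangeSamples.filter(askedQuery).length);                   // 3
console.log(rangeSamples.filter((ts) => allWithin(ts, 4, 9)).length);  // 1
```

Only [2, 12] is rejected by the modelled $nor query, while [3, 7] and [5, 10] slip through even though one of their elements is outside the range.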
I have documents with an array of events objects :
[
{
"events": [
{
"name": "A"
},
{
"name": "C"
},
{
"name": "D"
},
{
"name": "B"
},
{
"name": "E"
}
]
},
{
"events": [
{
"name": "A"
},
{
"name": "B"
},
{
"name": "S"
},
{
"name": "C"
}
]
}
]
In this array, I want to count how many events occur in a given order, allowing intervening events. For example, looking for the order [A,B,C], the array [A,x,x,B,x] should count 2 and [A,B,x,x,C] should count 3 (x is just a placeholder for anything else).
I want to summarize this across all my documents as an array holding the number of matches for each element. With the previous example that would give [2,2,1]: 2 matches for A, 2 matches for B, 1 match for C.
My current aggregation is generated in JavaScript and follows this pattern:
Match documents with the event array containing A
Slice the array from A to the end of the array
Count the number of documents
Append the count of matching document to the summarizing array
Match documents with the event array containing B
Slice the array from B to the end of the array
Count the number of documents
etc
However, when an event does not appear in any of the arrays, this falls short: as there are no documents left, I have no way to store the summarizing array. For example, with the events arrays [A,x,x,B,x] and [A,B,x,x,C], trying to match [A,B,D], I would expect [2,2,0], but I get [] because nothing comes up when matching D and the aggregation cannot continue.
Here is the aggregation I'm working with : https://mongoplayground.net/p/rEdQD4FbyC4
Change the matching letter on line 75 to something not in the array to reproduce the problematic behavior.
So is there a way to not lose my data when there is no match, such as bypassing aggregation stages? I could not find anything about bypassing stages in the MongoDB documentation.
Or are you aware of another way of doing this?
We ended up using a reduce to solve our problem.
The reduce runs over the events array: each event is compared with the sequence element at position "size of the accumulator"; if it matches, it is added to the accumulator, otherwise it is skipped.
Here is the mongo playground: https://mongoplayground.net/p/ge4nlFWuLsZ
The sequence we want to match is in the field "sequence"
The matched elements are in the "matching" field
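The reduce described above can be sketched in plain JavaScript. This is not the aggregation from the playground, only an illustration of the same idea under the assumption that events are compared by name; `matchSequence`, `summarize`, and `eventDocs` are hypothetical names.

```javascript
// Returns the prefix of `sequence` that `events` matches in order
// (other events may intervene). The accumulator's length tells us
// which sequence element we are still waiting for.
function matchSequence(events, sequence) {
  return events.reduce((matched, event) => {
    const expected = sequence[matched.length];
    return event === expected ? [...matched, event] : matched;
  }, []);
}

// Entry i counts the documents whose events matched sequence[i].
// Positions nobody reaches stay at 0 instead of disappearing.
function summarize(docs, sequence) {
  const counts = sequence.map(() => 0);
  for (const events of docs) {
    matchSequence(events, sequence).forEach((_, i) => { counts[i] += 1; });
  }
  return counts;
}

const eventDocs = [
  ["A", "x", "x", "B", "x"],
  ["A", "B", "x", "x", "C"],
];

console.log(summarize(eventDocs, ["A", "B", "C"])); // [ 2, 2, 1 ]
console.log(summarize(eventDocs, ["A", "B", "D"])); // [ 2, 2, 0 ]
```

Because every document is kept and only the accumulator changes, a sequence element that never occurs yields a zero rather than an empty result.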
Say I have a doc of the format:
{
arr: [{id: 0}, {id: 1}, ...., {id: m-1}, {id: m}],
n: number
}
So there is an array of objects and an n property. I want to get the nth element of the array (arr[n]).
Each object in the array also has an id property that corresponds to its index, so another option is to query the array for the element with id=n.
I did some research on how to get the Nth item of an array using $slice, as well as on $elemMatch.
I couldn't figure out how to write a query that returns the Nth element of the array when I don't know the value of N in advance and must read it from the doc itself during the same query.
I could get the entire array, but it can get very large (even 100K+ elements), so I'd much rather fetch only the element I need, either in the query or the projection part of the find.
Any ideas?
Thanks,
Sefi
Figured it out :)
Turns out the way to do it is by using aggregate and not find, and defining query vars using $let:
db.getCollection('<collection>').aggregate([
{$match: {key: '<key>'}},
{$project: {
obj: {
$let: {
vars: {
idx: '$n',
objects: '$arr'
},
in: {$arrayElemAt: ['$$objects', '$$idx']}
}
}
}}
])
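As far as I can tell the $let wrapper is optional, since $arrayElemAt accepts field paths directly. The sketch below writes that shorter pipeline as a plain object and adds a small plain-JavaScript model of what $arrayElemAt returns; `shortPipeline` and `arrayElemAt` are hypothetical names, '<key>' is the same placeholder as above, and nothing here is executed against a database.

```javascript
// The same projection without $let, kept as a plain object for illustration.
const shortPipeline = [
  { $match: { key: "<key>" } },
  { $project: { obj: { $arrayElemAt: ["$arr", "$n"] } } },
];

// Plain-JavaScript model of $arrayElemAt for one document:
// negative indexes count from the end, out-of-range yields no value.
function arrayElemAt(arr, idx) {
  const i = idx < 0 ? arr.length + idx : idx;
  return i >= 0 && i < arr.length ? arr[i] : undefined;
}

const doc = { arr: [{ id: 0 }, { id: 1 }, { id: 2 }], n: 1 };
console.log(arrayElemAt(doc.arr, doc.n)); // { id: 1 }
```

Usage would be along the lines of passing `shortPipeline` to `aggregate` on the collection in place of the longer pipeline.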
I have a keyword array field (say f) and I want to filter documents with an exact array (e.g. filter docs with f = [1, 3, 6] exactly, same order and number of terms).
What is the best way of doing this?
Regards
One way to achieve this is to add a script to the query that also checks the number of elements in the array.
The filters would look something like this:
"filters": [
{
"script": {
"script": "doc['f'].values.length == 3"
}
},
{
"terms": {
"f": [
1,
3,
6
],
"execution": "and"
}
}
]
Hope you get the idea.
I think an even better idea would be to store the array as a string (if there are not many changes to the structure of the graph) and match the string directly. This would be much faster too.
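A small plain-JavaScript sketch of the two ideas, for comparison. It only models the logic and does not use any Elasticsearch API; `arrayKey`, `containsAllAndLength`, and `stored` are hypothetical names, and using JSON as the string form is an assumption made here for illustration.

```javascript
// Serialize the array into a single string that could be indexed
// as an exact-match field next to f.
const arrayKey = (values) => JSON.stringify(values);

// Model of "all terms present" plus a length check: order is not considered.
const containsAllAndLength = (f, terms) =>
  f.length === terms.length && terms.every((t) => f.includes(t));

const stored = [[1, 3, 6], [6, 3, 1], [1, 3], [1, 3, 6, 9]];
const wanted = [1, 3, 6];

const byKey = stored.filter((f) => arrayKey(f) === arrayKey(wanted));
const byTerms = stored.filter((f) => containsAllAndLength(f, wanted));

console.log(byKey.length);   // 1
console.log(byTerms.length); // 2
```

In this model the string key is the only variant that distinguishes [1, 3, 6] from [6, 3, 1], which matters when "same order" is a requirement.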
I want to use an array as a counter for documents that are associated with this document. Each array element corresponds to a document, and the number in that array corresponds to some piece of data about that document. These numbers are not stored with their associated documents so queries can be performed on the fields of the 'primary' document as well as the data in the array for the 'associated' documents.
Due to the vagaries of $inc and array initialization, there appears to be no way to do this:
> use foo
switched to db foo
> db.foo.save({id: []})
> db.foo.update({}, {$inc: {'id.4': 1}})
> db.foo.find().pretty()
{
"_id" : ObjectId("5279b3339a056b26e64eef0d"),
"id" : [
null,
null,
null,
null,
1
]
}
> db.foo.update({}, {$inc: {'id.2': 1}})
Cannot apply $inc modifier to non-number
Is there some way to increment a null or initialize the array with zeros?
An alternative is storing the array values in a key/value format which would therefore be a sparse array, but there's no way to upsert into an array, which makes that alternative infeasible.
While I'm not in favor of arrays of potentially unbounded size, there is a trick you can use to make the right thing happen here:
db.foo.insert({id:[]})
Whenever you increment the counter at index N, also increment every index below N by 0, so that missing slots are created as 0 rather than null.
db.foo.update({}, {$inc: {'id.0':0, 'id.1':0, 'id.2':0, 'id.3':0, 'id.4':1 } })
db.foo.update({}, {$inc: {'id.0':0, 'id.1':0, 'id.2':1 } })
db.foo.find()
{ "_id" : ObjectId("5279c8de0e05b308d0cf21ca"), "id" : [ 0, 0, 1, 0, 1 ] }
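Writing the padded $inc document by hand gets tedious for larger indexes, so here is a minimal sketch of a helper that builds it. `buildPaddedInc` is a hypothetical name, not a MongoDB function; it only constructs the update document and leaves running the update to the caller.

```javascript
// Build an update that increments `field[index]` by `amount` and
// increments every lower index by 0, so no null gaps are created.
function buildPaddedInc(field, index, amount = 1) {
  const inc = {};
  for (let i = 0; i < index; i++) {
    inc[`${field}.${i}`] = 0; // a no-op on existing counters, initializes missing ones
  }
  inc[`${field}.${index}`] = amount;
  return { $inc: inc };
}

console.log(JSON.stringify(buildPaddedInc("id", 2)));
// {"$inc":{"id.0":0,"id.1":0,"id.2":1}}
```

In the shell this could be used as `db.foo.update({}, buildPaddedInc('id', 4))`, which produces the same update document as the first hand-written call above.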