MongoDB increment at arbitrary array position

I want to use an array as a counter for documents that are associated with this document. Each array element corresponds to a document, and the number at that position holds some piece of data about that document. These numbers are not stored with their associated documents, so queries can be performed on the fields of the 'primary' document as well as on the data in the array for the 'associated' documents.
Due to the vagaries of $inc and array initialization, there appears to be no way to do this:
> use foo
switched to db foo
> db.foo.save({id: []})
> db.foo.update({}, {$inc: {'id.4': 1}})
> db.foo.find().pretty()
{
"_id" : ObjectId("5279b3339a056b26e64eef0d"),
"id" : [
null,
null,
null,
null,
1
]
}
> db.foo.update({}, {$inc: {'id.2': 1}})
Cannot apply $inc modifier to non-number
Is there some way to increment a null or initialize the array with zeros?
An alternative is storing the array values in a key/value format which would therefore be a sparse array, but there's no way to upsert into an array, which makes that alternative infeasible.

While I'm not in favor of arrays that can grow without bound, there is a trick you can use to make the right thing happen here:
db.foo.insert({id:[]})
Whenever you increment a counter at index N, make sure you also increment every index below N by 0.
db.foo.update({}, {$inc: {'id.0':0, 'id.1':0, 'id.2':0, 'id.3':0, 'id.4':1 } })
db.foo.update({}, {$inc: {'id.0':0, 'id.1':0, 'id.2':1 } })
db.foo.find()
{ "_id" : ObjectId("5279c8de0e05b308d0cf21ca"), "id" : [ 0, 0, 1, 0, 1 ] }

Related

updating a document using the $max operator on arrays

I have the following document:
{
_id: 12,
item: 'envelope',
qty: ISODate("2021-12-05T00:00:00.000Z"),
arrayField: [ 128, 190, 1 ]
}
and I try to update it using this command:
products> db.products.update({_id:12},{$max : { arrayField : [1,190,1879]} })
the output is as follows:
{
acknowledged: true,
insertedId: null,
matchedCount: 1,
modifiedCount: 0,
upsertedCount: 0
}
I don't really understand how the comparison between the existing arrayField and the new one is being done. They are both Arrays, so there should be some kind of comparison on every element, but how exactly does it work?
From the documentation I read this:
With arrays, a less-than comparison or an ascending sort compares the smallest element of arrays, and a greater-than comparison or a descending sort compares the largest element of the arrays. As such, when comparing a field whose value is a single-element array (e.g. 1 ) with non-array fields (e.g. 2), the comparison is between 1 and 2. A comparison of an empty array (e.g. [ ]) treats the empty array as less than null or a missing field.
But I still don't understand exactly how this works... Could someone provide an example for my case?
Thanks in advance
For the query operators, MongoDB takes the min or the max element to represent the array in comparisons.
Numbers
{"ar" : [1,2,3]}
(<= ar 1) => (<= min(1,2,3) 1) => (<= 1 1) true
(>= ar 3) => (>= max(1,2,3) 3) => (>= 3 3) true
(= ar 2) => true, because the array contains the element 2
For empty arrays, both < and > are always false when compared to a number.
Arrays
(again, take the min for <, or the max for >)
{"ar" : [1,2,3]}
(<= ar [0 1 2 3]) false, because min(ar) = 1 is not <= min([0,1,2,3]) = 0
(= ar [1]) false, because equality requires all elements to match
For the update operator $max:
if both values are arrays, the elements are compared one by one.
max [1 2 3] [5] => [5]
max [1 2] [1 2 -100] => [1 2 -100]
The rules above apply only to the $gte, $gt, $lt, $lte and $eq query operators.
The aggregation operators with the same names are strict and don't work like this.
*This is not a complete description, because there are many BSON types, but I think the above is correct and the answer might help; it was too big to fit in a comment.
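To tie this back to the question: [128, 190, 1] already sorts above [1, 190, 1879] in BSON order, because arrays are compared element by element starting at index 0 and 128 > 1, so $max keeps the existing value and modifiedCount stays 0. A short mongosh sketch (collection name taken from the question, the qty field omitted):
db.products.insertOne({ _id: 12, item: 'envelope', arrayField: [ 128, 190, 1 ] })
// 128 > 1 at index 0, so the existing array is already the "max": no change.
db.products.updateOne({ _id: 12 }, { $max: { arrayField: [ 1, 190, 1879 ] } })
// 200 > 128 at index 0, so the whole field is replaced with the new array.
db.products.updateOne({ _id: 12 }, { $max: { arrayField: [ 200 ] } })
db.products.findOne({ _id: 12 })
// { _id: 12, item: 'envelope', arrayField: [ 200 ] }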

MongoDB query: find documents having all elements of an array in a given range

Suppose I have a collection whose documents have this structure
{
...
"ts": [NumberLong("5"), NumberLong("7")]
...
}
where ts is an array of two Long elements, the second strictly bigger than the first.
I want to retrieve all documents where all the elements of the array ts are within a range (bigger than a value but smaller than another).
Suppose the range is between 4 and 9; I am trying this query, but I find unexpected results:
db.segments.find({$nor: [{ "ts": {$gt:9, $lt:4}}]}).toArray()
If the array has a fixed length, you can address its elements by index in the query part:
db.segments.find({
"ts.0": { $gt: 4 },
"ts.1": { $lt: 9 }
}).toArray()
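If the array length is not fixed, "all elements in range" can be rewritten as "no element outside the range", which avoids addressing positions at all. A sketch under that reading of the question (same collection and bounds):
// Keep documents where no element of ts is <= 4 and none is >= 9,
// i.e. every element lies strictly between 4 and 9.
db.segments.find({
    $nor: [
        { ts: { $elemMatch: { $lte: 4 } } },
        { ts: { $elemMatch: { $gte: 9 } } }
    ]
}).toArray()
Note that this also matches documents where ts is missing or empty; add a condition such as "ts.0": { $exists: true } if those should be excluded.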

Field name generated from embedded document. Am I crazy?

What are the potential problems for the following design decision?
Suppose you have a MongoDB collection where, for each document, you want to store many documents in one of the embedded fields. Think about a kind of one-to-many relationship.
For various reasons, an array is to be avoided, meaning the documents in the collection won't be like this:
{
p: 1,
q: [
{ k1: 1, k2 : "p", x: "aaa" },
{ k1: 2, k2 : "b", x: "bbb" }
]
}
Instead, I choose to do the following
{
p: 1,
q: {
KEY1 : { k1: 1, k2 : "a", x: "aaa" },
KEY2 : { k1: 2, k2 : "b", x: "bbb" }
}
}
where KEY1 and KEY2 are unique strings representing the documents {k1: 1, k2 : "a"} and {k1: 2, k2 : "b"}, respectively.
Such a string could be calculated in many ways, as long as the representation is unique. For example, {k1: 1, k2 : "a"} and {k2 : "a" , k1: 1} should produce the same string, which should differ from the one for {k2 : "a" , k1: "1"}. It should also take into account that the value of some ki could itself be a document.
By the way, I cannot use a hash function for calculating KEY, as I need to store all the documents.
(If you are still here: the reason I didn't use an array is that I need atomicity when adding documents to the field q, and I need to modify the field x, although k1 and k2 will not be modified once added to q. This design was based on this question: MongoDB arrays - atomic update or push element. $addToSet only works for whole documents.)
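For illustration only, here is one way such a KEY could be derived and then used for an atomic per-entry update; the canonicalKey helper and the collection name are placeholders of mine, not a settled design:
// Produce a stable string for a (possibly nested) key document by sorting
// field names recursively, so {k1: 1, k2: "a"} and {k2: "a", k1: 1} agree.
// Caveat: the result must not contain '.' or '$'; escape them if values can.
function canonicalKey(doc) {
    if (doc === null || typeof doc !== 'object') return JSON.stringify(doc);
    var parts = Object.keys(doc).sort().map(function (k) {
        return k + ':' + canonicalKey(doc[k]);
    });
    return '(' + parts.join(',') + ')';
}
var key = canonicalKey({ k1: 1, k2: "a" });   // '(k1:1,k2:"a")'
// Atomically create or overwrite the entry for this key inside q.
var set = {};
set['q.' + key] = { k1: 1, k2: "a", x: "aaa" };
db.coll.updateOne({ p: 1 }, { $set: set })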
Two possible sources of problems:
The number of possible KEYs could grow fast (although in my case it should stay under a few thousand).
The keys themselves could be very long strings. Can that degrade performance?
Technical view
Regarding feasibility, the MongoDB documentation only says that field names cannot be _id and cannot include the characters $, . or null. The BSON spec only says a field name should be a Modified UTF-8 string, not including null.
I'd have done the following, but MongoDB complained the keys should be non mutable:
{
p: 1,
q: {
{ k1: 1, k2 : "a" } : { x: "aaa" },
{ k1: 2, k2 : "b" } : { x: "bbb" }
}
}
You can, however, use a similar notation with the $group operator in the aggregation framework. But it is only similar notation: you cannot save such things into a collection.
This whole idea, it seems, would not be necessary if the documents {k1: 1, k2 : "a" } were stored directly in the collection, that is, not embedded. In that case, I would just set a unique index on k1 and k2, and then use update/upsert to insert without repetition. All this overkill is because that cannot be done inside an array. Indeed, it seems an array is almost like a collection where the _id is the position in the array. If I'm not wrong in this paragraph, then whatever is representable in the top-level collection should be representable in an embedded document.
EDIT: What about using a collection instead of embedding?
(Edited after comment by #hd)
My end goal is to have a one-to-many relationship with atomicity, especially while updating the many-side.
Let's explore the idea of having separate documents to represent the one-to-many relationship. That means two collections:
Collection cOne
{
p: 1,
q: _id_in_Many
}
Collection cMany
{
id: ...,
p: 1,
q: { k1: 1, k2 : "p" },
x: "aaa",
}
In this case, I should use a unique index {p: 1, q: 1} in cMany plus updateOne/upsert to ensure uniqueness. But index entries have a limit of 1024 bytes, and {k1: ..., k2 : ...} could go beyond it, especially if the values contain UTF-8 strings.
If I instead use a KEY generated as explained above, like this
{
id: ...,
p: 1,
key: KEY1,
k1 : 1,
k2 : "p" ,
x: "aaa",
}
Then the possibility of hitting the 1024-byte limit persists for the index {p : 1, key: 1}. I have to say, I don't expect {k1: ..., k2 : ...} to go far beyond 1 KB. I'm aware of the 16 MB limit per document.
Maybe there is a principled way to keep a collection unique on a field whose values would push index entries over 1 KB, but I couldn't find one. The MongoDB documentation on upsert says "To avoid multiple upserts, ensure that the filter fields are uniquely indexed."
In contrast, it seems there is no official restriction on the length of field names, and field assignment should, like any other single-document update, be atomic.
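For reference, the separate-collection variant sketched above would look roughly like this, assuming the generated key stays well under the index entry limit ("KEY1" stands for the generated string):
// One index entry per (p, key) pair enforces uniqueness across cMany.
db.cMany.createIndex({ p: 1, key: 1 }, { unique: true })
// Insert the entry if it is new, otherwise only touch the mutable field x.
db.cMany.updateOne(
    { p: 1, key: "KEY1" },
    { $set: { x: "aaa" }, $setOnInsert: { k1: 1, k2: "p" } },
    { upsert: true }
)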
EDIT 2: Are arrays and documents more powerful than Collections?
(Edited after comment by #hd)
Since I haven't found a way to add an arbitrary document to a Collection while preserving uniqueness, we could argue that Documents and Arrays are more powerful, uniqueness-wise, than Collections. Document field names are unique, and Arrays at least support $addToSet, which would be enough if I only had the keys k1 and k2 and no mutable field x, as illustrated below.
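As a concrete illustration of that last point (the collection name is a placeholder, and q would have to be an array here), $addToSet adds a sub-document atomically and without duplicates, but gives no handle to later modify an x stored alongside it:
// Adds {k1: 1, k2: "a"} only if an identical element (same fields, same
// order) is not already present in the array q.
db.coll.updateOne({ p: 1 }, { $addToSet: { q: { k1: 1, k2: "a" } } })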

Mapping array values to hash with incrementing key name

I have an array of strings that are hex color codes like so:
["#121427", "#291833", "#4B2E4D", "#5D072F", "#BB2344", "#ED9F90"]
and I want to convert them into a hash with distinct key names, where each key is the name "color" followed by an integer value that increments as it traverses the array, like so:
{"color1" => "#121427", "color2" => "#291833", "color3" => "#4B2E4D", "color4" => "#5D072F", "color5" => "#BB2344", "color6" => "#ED9F90"}
The integer value can be 0 based or 1 based, it doesn't matter which ever is cleaner.
I've tried using the map method along with the to_h method, although I cannot figure out how to create an incremental key name as described.
It's not too hard to do this using the each_with_index method which is zero-indexed by default:
Hash[colors.each_with_index.map { |c, i| [ 'color%d' % i, c ] }]
You were close with map, you just needed to expand it into value/index pairs.
Another possible way:
colors.each_with_object({}).with_index(1){|(e, h), i| h["color#{i}"] = e}
It uses:
Enumerable#each_with_object to skip one temporary array and modify the resulting hash directly
Enumerator#with_index to also supply an auto-incrementing integer (it can even start from 1)
If arr is your array, you could do this:
idx = 1.step
# => #<Enumerator: 1:step>
arr.each_with_object({}) { |s,h| h["color#{idx.next}"] = s }
#=> {"color1"=>"#121427", "color2"=>"#291833", "color3"=>"#4B2E4D",
# "color4"=>"#5D072F", "color5"=>"#BB2344", "color6"=>"#ED9F90"}

MongoDB selecting all where an array is non-empty, 2 different results

I found two different syntaxes for querying in mongo for documents where an array is non-empty. I imagine the crux of this may actually be my data rather than the query, but in case mongo is doing some null-shenanigans I don't understand, I wanted to check here: what is the preferred way of selecting documents where the 'institution.tags' array is ["populated", "and"] not []?
First option: check that the 0-th item of the array exists:
> db.coll.find({'institution.tags.0': {'$exists':true}}).count()
7330
Second option: check that the list field is not null:
> db.coll.find({'institution.tags': {"$ne":null}}).count()
28014
In theory, all fields named 'institution.tags' are of array type; I don't expect any to be dictionaries, strings, or numbers. But I do see dramatically different counts, so I'm wondering what I should be expecting, and which query is better, both semantically (is it doing what I think it is doing?) and for performance.
The following snippet from the Mongo shell should clarify your question:
> db.coll.insert({_id:1})
> db.coll.insert({_id:2, array: []})
> db.coll.insert({_id:3, array: ["entry"]})
> db.coll.insert({_id:4, array: "no_array"})
> db.coll.find().pretty()
{ "_id" : 1 }
{ "_id" : 2, "array" : [ ] }
{ "_id" : 3, "array" : [ "entry" ] }
{ "_id" : 4, "array" : "no_array" }
> db.coll.count({array: {$exists:true}})
3
> db.coll.count({array: {$ne:null}})
3
> db.coll.count({"array.0": {$exists:true}})
1
> db.coll.count({$where: "Array.isArray(this.array) && this.array.length > 0"})
1
Your first option is probably the right way to go, but if you want to be as explicit as possible, have a look at the last query in my example, which uses $where to check both that the field is an array and that it is non-empty. Your second option only checks that the field exists and is not null; it tells you neither whether the field is an array nor, if it is, whether it is empty.
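For completeness: on reasonably recent servers the same explicit check can be written without $where, by requiring the field to be of array type and not equal to the empty array (shown against the demo collection above; the same shape works for 'institution.tags'):
> db.coll.count({array: {$type: "array", $ne: []}})
1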
