What are the potential problems with the following design decision?
Suppose you have a MongoDB collection where, for each document, you want to store many documents in one of the embedded fields. Think of it as a kind of one-to-many relationship.
For various reasons, an array is to be avoided, meaning the documents in the collection won't look like this:
{
  p: 1,
  q: [
    { k1: 1, k2: "p", x: "aaa" },
    { k1: 2, k2: "b", x: "bbb" }
  ]
}
Instead, I choose to do the following
{
  p: 1,
  q: {
    KEY1: { k1: 1, k2: "a", x: "aaa" },
    KEY2: { k1: 2, k2: "b", x: "bbb" }
  }
}
where KEY1 and KEY2 are unique strings representing the documents {k1: 1, k2 : "a"} and {k1: 2, k2 : "b"}, respectively.
Such a string could be calculated in many ways, as long as the representation is unique. For example, {k1: 1, k2 : "a"} and {k2 : "a", k1: 1} should map to the same string, and that string should be different from the one for {k2 : "a", k1: "1"}. It should also take into account that the value of some ki could itself be a document.
By the way, I cannot use a hash function for calculating KEY, as I need to store all the documents.
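To illustrate what I mean by a unique representation, something like the following sketch is what I have in mind (the name canonicalKey and the canonical-JSON approach are just an example, not a requirement):
// Sketch: build a canonical string for a (possibly nested) document by sorting
// keys recursively, so {k1: 1, k2: "a"} and {k2: "a", k1: 1} yield the same KEY,
// while {k1: "1"} yields a different one because JSON keeps the type.
function canonicalKey(value) {
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalKey).join(",") + "]";   // arrays keep their order
  }
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);                           // scalars as plain JSON
  }
  var parts = Object.keys(value).sort().map(function (k) {
    return JSON.stringify(k) + ":" + canonicalKey(value[k]);
  });
  return "{" + parts.join(",") + "}";
}
// Note: the result may still contain characters such as "." that MongoDB restricts
// in field names, so some escaping scheme would still be needed on top of this.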
(If you are still here: the reason I didn't use an array is that I need atomicity when adding documents to the field q, and I need to be able to modify the field x, although k1 and k2 will not be modified once added to q. This design was based on this question: MongoDB arrays - atomic update or push element. $addToSet only works for whole documents.)
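For concreteness, the operations I need would then look roughly like this (a sketch: the collection name coll and the literal path "q.KEY1" are only placeholders for the computed key):
// Add the sub-document only if its KEY is not there yet; this is a single-document
// update, so it is atomic, and an existing entry is never overwritten.
db.coll.updateOne(
  { p: 1, "q.KEY1": { $exists: false } },
  { $set: { "q.KEY1": { k1: 1, k2: "a", x: "aaa" } } }
)

// Later, modify only x for that entry; k1 and k2 stay untouched.
db.coll.updateOne(
  { p: 1 },
  { $set: { "q.KEY1.x": "zzz" } }
)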
Two possible sources of problems:
The number of possible KEYs could grow fast (although in my case it should stay under the thousands).
The keys themselves could be very long strings. Could that degrade performance?
Technical view
About feasibility, the MongoDB documentation only says that field names cannot be _id and cannot include the characters $, ., or null. The BSON spec only says a field name should be a Modified UTF-8 string, not including null.
I would have done the following, but MongoDB complained that keys should be immutable:
{
  p: 1,
  q: {
    { k1: 1, k2: "a" }: { x: "aaa" },
    { k1: 2, k2: "b" }: { x: "bbb" }
  }
}
You can, however, use similar notation with the $group operator in the aggregation framework. But it is only similar notation: you cannot save such a thing into a collection.
This whole idea, it seems, would not be necessary if the documents {k1: 1, k2 : "a" } were to be stored directly in the collection, that is, not embedded. In that case, I would just set a unique index on k1 and k2, and then use update/upsert to insert without repetition. All this overkill is because that cannot be done in an array. Indeed, it seems an array is almost like a collection, where the _id is the position in the array. If I'm not wrong in this paragraph, then whatever is representable in a top-level collection should be representable in an embedded document.
EDIT: What about using a collection instead of embedding?
(Edited after comment by #hd)
My end goal is to have a one-to-many relationship with atomicity, especially while updating the many-side.
Let's explore the idea of having separate documents to represent the one-to-many relationship. It means two collections:
Collection cOne
{
  p: 1,
  q: _id_in_Many
}
Collection cMany
{
  id: ...,
  p: 1,
  q: { k1: 1, k2: "p" },
  x: "aaa"
}
In this case, I should use a unique index on cMany, {p: 1, q: 1}, plus updateOne/upsert to ensure uniqueness. But indexes have a limit of 1024 bytes per entry, and {k1: ..., k2 : ...} could go beyond it, especially if the values contain UTF-8 strings.
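As a sketch of what I mean (just the index and the upsert, everything else as above):
// Unique compound index so the same (p, q) pair cannot be inserted twice.
db.cMany.createIndex({ p: 1, q: 1 }, { unique: true })

// Upsert: inserts the document if the (p, q) pair is new, otherwise only updates x.
// Note: matching q as a whole embedded document is sensitive to field order,
// which is part of why a canonical KEY string looked attractive in the first place.
db.cMany.updateOne(
  { p: 1, q: { k1: 1, k2: "p" } },
  { $set: { x: "aaa" } },
  { upsert: true }
)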
If I anyway use a KEY generated as explained, like this:
{
  id: ...,
  p: 1,
  key: KEY1,
  k1: 1,
  k2: "p",
  x: "aaa"
}
Then the possibility of hitting the 1024-byte limit persists for the index {p: 1, key: 1}. I have to say, I don't expect {k1: ..., k2 : ...} to go far beyond 1 KB. I'm aware of the 16 MB limit per document.
Maybe there is a principled way to keep a collection unique on a field whose values would push the index entries over 1 KB, but I couldn't find it. The MongoDB documentation on upsert says: "To avoid multiple upserts, ensure that the filter fields are uniquely indexed."
In contrast, it seems there is no official restriction on the length of field names, and assigning a field should, like any other document update, preserve uniqueness (a document cannot contain two fields with the same name).
EDIT 2: Are arrays and documents more powerful than Collections?
(Edited after comment by #hd)
Since I haven't found a way to add an arbitrary document to a collection while preserving uniqueness, we could argue that documents and arrays are more powerful, uniqueness-wise, than collections. Document field names are unique, and arrays at least support $addToSet, which would be enough if I only had the keys k1 and k2 but not the mutable x.
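For reference, the array-based operation I am comparing against looks like this (a sketch; it keeps whole elements unique, but cannot express "unique on k1, k2 with a mutable x"):
// $addToSet only avoids duplicates of the element as a whole, so changing x later
// would require matching the exact element or tolerating duplicates.
db.coll.updateOne(
  { p: 1 },
  { $addToSet: { q: { k1: 1, k2: "a", x: "aaa" } } }
)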
Related
I have documents with an array of event objects:
[
  { "events": [{ "name": "A" }, { "name": "C" }, { "name": "D" }, { "name": "B" }, { "name": "E" }] },
  { "events": [{ "name": "A" }, { "name": "B" }, { "name": "S" }, { "name": "C" }] }
]
In these arrays, I want to count how many events of a given sequence appear, in order, allowing intervening events. For example, looking for the sequence [A,B,C]: with the array [A,x,x,B,x] I should count 2, and with [A,B,x,x,C] I should count 3 (x is just a placeholder for anything else).
I want to summarize this information over all my documents in the shape of an array, with the number of matches for each element. With the previous example that would give me [2,2,1]: 2 matches for A, 2 matches for B, 1 match for C.
My current aggregation is generated in JavaScript and follows this pattern:
Match documents with the event array containing A
Slice the array from A to the end of the array
Count the number of documents
Append the count of matching documents to the summarizing array
Match documents with the event array containing B
Slice the array from B to the end of the array
Count the number of documents
etc
However, when an event does not appear in any of the arrays, this falls short: there are no documents left, so I have no way to store the summarizing array. For example, with the event arrays [A,x,x,B,x] and [A,B,x,x,C], trying to match [A,B,D], I would expect [2,2,0], but I get [] because nothing comes up when matching D and the aggregation cannot continue.
Here is the aggregation I'm working with : https://mongoplayground.net/p/rEdQD4FbyC4
Change the matching letter on line 75 to something not in the arrays to reproduce the problematic behavior.
So, is there a way not to lose my data when there is no match, for example by bypassing aggregation stages? I could not find anything about bypassing stages in the MongoDB documentation.
Or are you aware of another way of doing this?
We ended up using $reduce to solve our problem.
The $reduce runs over the events array: for every event we try to match it against the element of the sequence at position "size of the accumulator so far". If it matches, it is appended to the accumulator; otherwise it is not, and so on.
Here is the mongo playground: https://mongoplayground.net/p/ge4nlFWuLsZ
The sequence we want to match is in the field "sequence"
The matched elements are in the "matching" field
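Roughly, the shape of the $reduce is the following (a sketch: the sequence is hard-coded here as ["A", "B", "C"], while in the playground it is read from the "sequence" field):
db.collection.aggregate([
  {
    $project: {
      matching: {
        $reduce: {
          input: "$events",
          initialValue: [],
          in: {
            $cond: [
              // does the current event match the sequence element whose index
              // equals the number of elements matched so far?
              { $eq: [
                  "$$this.name",
                  { $arrayElemAt: [["A", "B", "C"], { $size: "$$value" }] }
              ] },
              { $concatArrays: ["$$value", ["$$this.name"]] },   // yes: append it
              "$$value"                                          // no: keep as is
            ]
          }
        }
      }
    }
  }
])
Summing across documents to get the [2,2,1] shape is then a separate step over the per-document "matching" arrays, and a sequence element with no match anywhere simply contributes 0 instead of making the pipeline come up empty.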
I'm trying to understand arrays in Firebase a bit more. Currently, I'm storing maps in arrays, where one of the fields inside each map is a position that I can use in my mobile app to sort the array on retrieval and show results in the order of position.
The docs on Firebase say:
Arrays are sorted by elements. If elements are equal, the arrays are sorted by length.
For example, [1, 2, 3] < [1, 2, 3, 1] < [2].
And then there's a section describing how maps are sorted as well:
Key ordering is always sorted. For example, if you write {c: "foo", a: "bar", b: "qux"} the map is sorted by key and saved as {a: "bar", b: "qux", c: "foo"}.
Map fields are sorted by key and compared by key-value pairs, first comparing the keys and then the values. If the first key-value pairs are equal, the next key-value pairs are compared, and so on. If two maps start with the same key-value pairs, then map length is considered. For example, the following maps are in ascending order:
{a: "aaa", b: "baz"}
{a: "foo", b: "bar"}
{a: "foo", b: "bar", c: "qux"}
{a: "foo", b: "baz"}
{b: "aaa", c: "baz"}
{c: "aaa"}
But then I tried this in Firestore: I jumbled up the order of the maps in the above example, and stored them in an array:
data= [{"c": "aaa"}, {"a": "aaa", "b": "baz"}, {"a": "foo", "b": "baz"}, {"b": "aaa", "c": "baz"}, {"a": "foo", "b": "bar", "c": "qux"}, {"a": "foo", "b": "bar"}]
And upon inserting into a Firestore document, the array did not get sorted! While the keys themselves do get sorted within a single Map, the elements in the array stay in the same order.
So does sorting in arrays even work when elements are Maps? Here's an example of what I'm storing in Firestore:
{
  "car_collection": {
    "models": {
      "data": [
        {
          "model": "Honda",
          "color": "black",
          "position": 0
        },
        {
          "model": "Hyundai",
          "color": "red",
          "position": 1
        },
        {
          "model": "Chevrolet",
          "color": "yellow",
          "position": 2
        }
      ]
    }
  }
}
I'm storing an additional field called "position", and the order of maps stays the same on every retrieval. I'm wondering whether I even need to store this field, or whether the data will be kept in the order I store it in.
I submitted a ticket to Google to improve the documentation for the Array type, and I think it is now helpful and accurate, as seen through some smoke testing.
https://firebase.google.com/docs/firestore/manage-data/data-types
Copy-pasting the current version here:
An array cannot contain another array value as one of its elements.
Within an array, elements maintain the position assigned to them. When sorting two or more arrays, arrays are ordered based on their element values.
When comparing two arrays, the first elements of each array are compared. If the first elements are equal, then the second elements are compared and so on until a difference is found. If an array runs out of elements to compare but is equal up to that point, then the shorter array is ordered before the longer array.
For example, [1, 2, 3] < [1, 2, 3, 1] < [2]. The array [2] has the greatest first element value. The array [1, 2, 3] has elements equal to the first three elements of [1, 2, 3, 1] but is shorter in length.
So it seems you can safely expect the order of elements to be maintained in Firestore, while understanding the effects of addition/removal as well.
You will have to sort your array before posting it to Firestore.
Arrays are not sorted in RTDB or Firestore; objects, however, are sorted by their keys.
Or sort the arrays on the client side.
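For example, a minimal client-side sort using the stored position field (plain JavaScript; models stands for the map read back from Firestore, as in the example above):
// Restore the intended order after retrieval; Firestore keeps the array's stored
// element order but does not sort it for you.
var sorted = models.data.slice().sort(function (a, b) {
  return a.position - b.position;
});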
Say I have a doc of the format:
{
  arr: [{id: 0}, {id: 1}, ..., {id: m-1}, {id: m}],
  n: number
}
so an array of objects and an n property. I want to get the nth element of the array (arr[n]).
Each object in the array also has an id property that correlates to its index, so another option is to query the array for the element with id = n.
I did some research on how to get the Nth item of an array using $slice, as well as on $elemMatch.
I couldn't figure out how to write a query that returns the Nth element of the array when I don't know the value of N and must get it from the doc itself during the same query.
I could get the entire array, but it can get very large (even 100K+ elements) and so I'd much rather get the one I need, either in the query or the projection part of the find.
Any ideas?
Thanks,
Sefi
Figured it out :)
Turns out the way to do it is by using aggregate and not find, and defining query vars using $let:
db.getCollection('<collection>').aggregate([
  { $match: { key: '<key>' } },
  { $project: {
      obj: {
        $let: {
          vars: {
            idx: '$n',
            objects: '$arr'
          },
          in: { $arrayElemAt: ['$$objects', '$$idx'] }
        }
      }
    }
  }
])
I have a keyword array field (say f) and I want to filter documents with an exact array (e.g. filter docs with f = [1, 3, 6] exactly, same order and number of terms).
What is the best way of doing this?
Regards
One way to achieve this is to add a script to the query which also checks the number of elements in the array.
The script filter would be something like:
"filters": [
{
"script": {
"script": "doc['f'].values.length == 3"
}
},
{
"terms": {
"f": [
1,
3,
6
],
"execution": "and"
}
}
]
Hope you get the idea.
I think an even better idea would be to store the array as a string (if there are not many changes to the structure of the graph) and match the string directly. That would be much faster too.
In my Solr index I have two different types of items, A and B, that have distinct fields foo and bar, respectively, but with similar values that I need to group together for faceting.
A:
foo: /* could be "abc", "def" or "ghi" */
B:
bar: /* could be "abc", "ghi", or "jkl" */
It's easy enough to get the facet information for each of these fields separately:
http://myServer:<port>/<SolrPath>?q=<query>&facet.field=foo&facet.field=bar
Which gives me:
"facet_count": {
"facet_fields": {
"foo": ["abc", 10, "def", 20 "ghi", 30],
"bar": ["abc", 3, "ghi", 8, "jkl", 1]
}
}
Is there a way in Solr to specify that I want the fields A.foo and B.bar to be "lumped together" into the same facet? In other words, I need the facet information in the response to look like this:
"facet_count": {
"facet_fields": {
"foo": ["abc", 13, "def", 20 "ghi", 38, "jkl", 1]
}
}
No, my advice would be to index the values into a single field. Using copyField directives, this would look like this (in schema.xml):
<copyField source="foo" dest="foobar" />
<copyField source="bar" dest="foobar" />
This would preserve the original foo and bar fields. To get your combined facets, simply facet on the new field:
?q=*:*
&facet=true
&facet.field=foobar
Edit: it might be possible with facet queries, but only if the list of unique values is small and limited, and you're willing to write a separate facet query for each value. Even then, the results will look different (a count per query instead of an array of field value/count pairs).
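A sketch of that alternative, assuming the shared values are abc, def, ghi and jkl:
?q=*:*
&facet=true
&facet.query=foo:abc OR bar:abc
&facet.query=foo:def OR bar:def
&facet.query=foo:ghi OR bar:ghi
&facet.query=foo:jkl OR bar:jkl
Each facet.query comes back as its own count in the response, which is the different shape mentioned above.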