I have collections named Book and Author. A book can be written by a few authors. So the collections look like this:
Book: {
  id: objectId,
  name: string,
  price: number,
  authors: [objectId]
}
Author: {
  id: objectId,
  name: string
}
The authors field of the Book collection is an array of author ObjectIds.
When I want to get some books and their authors, I first get each book's array of author ObjectIds. Then, for each ObjectId, I fetch the author.
Another way to do this is to use "populate". But I don't know: does it work like a join in a relational database? Does it perform better than the first way?
I assume you want to get the book data together with each book's authors. Instead of using populate (which is a Mongoose feature, not part of MongoDB itself), I use MongoDB's $lookup aggregation stage, which is much closer to a join in a relational database. Here is the documentation:
https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/#use-lookup-with-an-array
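A minimal sketch of that approach, assuming the collections are named `books` and `authors` (Mongoose's default plural names for Book and Author) and that each book's `authors` field holds author `_id`s. The pipeline is built as a plain array that you would pass to `db.books.aggregate(...)`; `lookupInMemory` is a hypothetical helper that only mirrors, on plain arrays with string ids standing in for ObjectIds, the result shape $lookup produces for an array `localField`. For comparison, Mongoose's populate does not join on the server: it sends a second query of the form `{ _id: { $in: [...] } }` to the referenced collection, which at least avoids one round trip per author.

```javascript
// Pipeline for db.books.aggregate(): attach author documents to each book.
// Assumed names: "authors" collection, "authors" array field, "authorDocs" output.
function booksWithAuthorsPipeline(filter = {}) {
  return [
    { $match: filter }, // restrict which books are returned (empty = all)
    {
      $lookup: {
        from: "authors",       // collection to join with
        localField: "authors", // array of author _ids stored on the book
        foreignField: "_id",   // each array element is matched against author _id
        as: "authorDocs",      // matching author documents end up here
      },
    },
  ];
}

// Hypothetical in-memory mirror of the $lookup above, for illustration only.
// String ids stand in for ObjectIds (real ObjectIds would need .equals()).
function lookupInMemory(books, authors) {
  return books.map((book) => ({
    ...book,
    authorDocs: authors.filter((a) => book.authors.includes(a._id)),
  }));
}

// Usage in mongosh (requires a server):
// db.books.aggregate(booksWithAuthorsPipeline({ price: { $lt: 20 } }))
```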
I am really confused about how a document from another collection is stored in a document.
Does a ref just keep the address of the document and populate it on demand in O(1), or is it stored like in a relational DB, where the whole table is searched to find the relevant document?
For example, we have two collections, User and Post.
User {
  _id: mongoId of User,
  name: String,
  post: reference of post id
}
Post {
  _id: mongoId of Post,
  title: String,
  body: String
}
Now, User stores the Post as a reference. When fetching the post of a particular user, will it go through all the documents in Post and pick the relevant one, or does it store a direct reference to that document and fetch it in O(1)?
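For what it's worth, a Mongoose ref stores only the referenced document's `_id` (an ObjectId), not a physical address. populate then runs a separate query on the Post collection by `_id`, and because regular collections automatically have a unique B-tree index on `_id`, that lookup costs O(log n): neither O(1) nor a full scan. The toy model below (not MongoDB's storage engine; numeric ids stand in for ObjectIds, and the helper names are hypothetical) contrasts a search through a sorted index with a full collection scan.

```javascript
// Toy stand-in for the _id index: binary search over posts sorted by _id.
// Returns the document and how many comparisons the search needed.
function findByIdIndexed(sortedPosts, id) {
  let lo = 0;
  let hi = sortedPosts.length - 1;
  let comparisons = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    comparisons++;
    const midId = sortedPosts[mid]._id;
    if (midId === id) return { doc: sortedPosts[mid], comparisons };
    if (midId < id) lo = mid + 1;
    else hi = mid - 1;
  }
  return { doc: null, comparisons };
}

// What resolving the reference would cost without any index: a full scan.
function findByScan(posts, id) {
  let comparisons = 0;
  for (const p of posts) {
    comparisons++;
    if (p._id === id) return { doc: p, comparisons };
  }
  return { doc: null, comparisons };
}

// The user document only holds the post's _id; resolving it is a search by _id.
const posts = Array.from({ length: 1024 }, (_, i) => ({ _id: i, title: "post " + i }));
const user = { _id: "u1", name: "Ann", post: 1000 };
console.log(findByIdIndexed(posts, user.post).comparisons); // at most 11 for 1024 posts
console.log(findByScan(posts, user.post).comparisons);      // 1001
```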
What I want to do is avoid data duplication (embedding), but also avoid relating documents through ID fields.
Example:
{ Books: [
{ id: 1, name: "Foo1", author: referenceToTheAuthorWithId1 },
{ id: 2, name: "Foo2", author: referenceToTheAuthorWithId1 }
]}
{ Authors: [
{id: 1, name: "Bar"}
]}
So each time I want to get a book, it will also retrieve its author, but that author object would be a reference to the Author collection.
You can manually set _id fields for all of the documents you insert, and you can have identical ids in documents in various collections that refer to some common data.
In your example, if you replaced Authors with Authorships, it would make some sense to have the authorship id be the same as the book id (at least this schema would support an author writing multiple books).
However, such a schema does not support multiple authors per book.
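To make the shared-`_id` idea concrete, here is a sketch with hypothetical `books`, `authorships` and `authors` arrays standing in for collections: each authorship reuses the `_id` of the book it describes, so no separate book id field is needed. In mongosh the equivalent would be a `findOne({ _id: bookId })` on each collection, both served by the `_id` index. As noted above, one authorship per book id still means one author per book.

```javascript
// Hypothetical data: each authorship reuses the _id of the book it describes.
const books = [
  { _id: 1, name: "Foo1" },
  { _id: 2, name: "Foo2" },
];
const authorships = [
  { _id: 1, authorId: 1 }, // same _id as book 1
  { _id: 2, authorId: 1 }, // the same author can write several books
];
const authors = [{ _id: 1, name: "Bar" }];

// Resolve a book and its author by looking up the same _id in two collections.
function bookWithAuthor(bookId) {
  const book = books.find((b) => b._id === bookId);
  if (!book) return null;
  const authorship = authorships.find((a) => a._id === bookId);
  const author = authorship ? authors.find((a) => a._id === authorship.authorId) : null;
  return { ...book, author: author || null };
}
```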
Need help,
Let's say I have a model of two tables with a many-to-many relationship: teachers and students.
I thought of saving one document per student, so that I can search by student name but also filter by teacher.
So I thought of creating the document like this:
{
  id: string,
  studentName: string,
  class: string,
  teachers: [teacherId1: string, teacherId2: string...],
  ...
}
But what should happen when I remove a teacher from some class? Then I need to update the documents of all the students in that class, and I have thousands of them. How much time will that take? (The documents themselves are not huge.)
Is there an easy way to do that, instead of updating the documents one by one?
(Let's say I have the full list of studentIds.)
Maybe my document model structure is not correct.
Is there any other good idea?
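One way to avoid updating students one by one is a single `updateMany` with `$pull`: the server applies it to every matching document in one command, with no client round trip per student. The sketch only builds the filter and update documents; the collection name `students`, the field names taken from the question, and both helper names are assumptions, and `pullTeacherInMemory` merely mirrors the effect on plain objects.

```javascript
// Builds the arguments for db.students.updateMany(filter, update).
// With studentIds, only those students are touched; otherwise the whole class.
function removeTeacherFromClass(teacherId, className, studentIds) {
  const filter = { teachers: teacherId }; // only documents that list this teacher
  if (studentIds) {
    filter._id = { $in: studentIds };
  } else {
    filter.class = className;
  }
  const update = { $pull: { teachers: teacherId } }; // removes every occurrence
  return { filter, update };
}

// Hypothetical in-memory mirror of the class-based update, for illustration only.
function pullTeacherInMemory(students, teacherId, className) {
  return students.map((s) =>
    s.class === className
      ? { ...s, teachers: s.teachers.filter((t) => t !== teacherId) }
      : s
  );
}

// Usage in mongosh (requires a server):
// const { filter, update } = removeTeacherFromClass("t1", "5A");
// db.students.updateMany(filter, update)
```

An index on `teachers` (a multikey index, since it is an array) keeps the filter from scanning every student.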
Thanks,
In terms of database design, I'm wondering which approach is best: storing a reference id, or embedding the document, even if that means the same document can appear more than once.
Let's say I have this kind of model for the moment:
Collection User:
{
  name: String,
  types: List<Type>,
  sharedTypes: List<Type>
}
If I use the embedded model and don't use another collection, it may result in duplicate Type objects. For example, user A creates Type aa and user B creates Type bb. When they share their types with each other, it will result in:
{
  name: UserA,
  types: [{name: aa}],
  sharedTypes: [{name: bb}]
},
{
  name: UserB,
  types: [{name: bb}],
  sharedTypes: [{name: aa}]
}
This results in duplication, so I guess it's a pretty bad design. Should I use another approach, like creating a Type collection and storing reference ids?
Collection Type:
{
  id: String,
  name: String
}
This will still result in duplication, but only of ids rather than whole documents, so I guess it's better:
{
  name: UserA,
  types: ["randomString1"],
  sharedTypes: ["randomString2"]
},
{
  name: UserB,
  types: ["randomString2"],
  sharedTypes: ["randomString1"]
}
The last approach, and maybe the best, is to store the relationships in the Type collection, like this:
Collection User:
{
  id: String,
  name: String
}
Collection Type:
{
  id: String,
  name: String,
  createdBy: String (id of user),
  sharedWith: List<String> (ids of users)
}
Which of these three approaches is best?
My query looks like this: I get a group of users, and for each user I want the types they created and the types other people shared with them.
Broadly, the decision to embed vs. use a reference ID comes down to this:
Do you need to easily preserve the referential integrity of the joined data at a point in time, meaning you want to ensure that the state of the joined data is "permanently associated" with the parent data? Then embedding is a good idea. This is also a good practice in the "insert only" design paradigm. Very often, other requirements like immutability, hashing/checksums, security, and archiving make the embedded approach easier to manage in the long run, because version / createDate management is vastly simplified.
Do you need the fastest, quick-hit scalability? Then embed and ensure indexes are appropriately constructed. An indexed lookup followed by the extraction of a rich shape with arbitrarily complex embedded data is a very high-performance operation.
(Opposite) Do you want to ensure that updates to joined data are quickly and immediately reflected in a join with parents? Then use a reference ID and the $lookup stage to bring the data together.
Does the joined data grow essentially without bound, like transactions against an account? This is likely better handled through a reference ID to a separate transaction collection and joined with $lookup.
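For the third model in the question (relationships stored on the Type documents), a sketch of the query: one `find` on the types collection fetches every type created by, or shared with, anyone in the group, and the per-user split happens in application code. The collection name `types`, the helper names and the string ids are assumptions. A condition such as `{ sharedWith: { $in: userIds } }` matches when any element of the `sharedWith` array is in `userIds`, which is what makes that list queryable.

```javascript
// Filter for db.types.find(): types created by, or shared with, any user in the group.
function typesForUsersFilter(userIds) {
  return {
    $or: [
      { createdBy: { $in: userIds } },
      { sharedWith: { $in: userIds } }, // matches if any array element is in userIds
    ],
  };
}

// Hypothetical helper: split the returned type documents per user.
function groupTypesByUser(userIds, types) {
  const result = {};
  for (const id of userIds) {
    result[id] = {
      created: types.filter((t) => t.createdBy === id).map((t) => t.name),
      sharedWithMe: types.filter((t) => t.sharedWith.includes(id)).map((t) => t.name),
    };
  }
  return result;
}

// Usage in mongosh (requires a server):
// const types = db.types.find(typesForUsersFilter(["userA", "userB"])).toArray();
// groupTypesByUser(["userA", "userB"], types)
```

Indexes on `createdBy` and `sharedWith` would let both branches of the `$or` use an index.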
I recently designed a database model (ERD) using Hackolade.
The problem I'm currently facing is that, based on my current design, I can't query the data the way I want. I learned ER modeling with MySQL and I do know that MongoDB doesn't work the same way.
The idea was simple: I want a recipe that has an array of ingredients, where the ingredients come from a separate collection.
The recipe also stores the measurement of each ingredient, e.g. 1 tbsp sugar.
I also want to query by a list of ingredients and find the recipes that contain those ingredients.
I want these collections to be in a many-to-many relationship, so that a recipe can use ingredients that are already in the database.
I just don't know how to query the data.
I have tried many approaches using $elemMatch and populate, and all I get is an empty array as a result.
I'm expecting two types of query: by ingredient name or by recipe.
My expected result would look like this:
[{
  id: ...,
  name: ....,
  description: ...,
  macros: [...],
  ingredients: [
    {
      id,
      amount: ....,
      unit: ....,
      ingredient: {
        id: ....,
        name: ....
      }
    }
  ]
}, { ... }]
But instead I get:
[]
Imho, your design is utterly wrong. You over-normalized your data. I would do something much simpler and use embedding. The reasoning behind that is that you define your use cases first, and then you model your data to answer the questions arising from your use cases in the most efficient way.
Assumed use cases
As a user, I want a list of all recipes.
As a user, I want a list of all recipes by ingredient.
As a designer, I want to be able to show a list of all ingredients.
As a user, I want to be able to link to recipes for compound ingredients, should they be present on the site.
Surely, this is just a small excerpt, but it is sufficient for this example.
How to answer the questions
Ok, the first one is extremely simple:
db.recipes.find() // optionally chain .skip(offset).limit(pageSize) for paging
Now, how could we find recipes by ingredient? Simple answer: create a text index on ingredient names (and probably some other fields, as you can only have one text index per collection). Then, the query is equally simple:
db.recipes.find({$text:{$search:"ingredient name"}})
"Hey, wait a moment! How do I get a list of all ingredients?" Let us assume we want a simple list of ingredients, with a number on how often they are actually used:
db.recipes.aggregate([
  // We want all ingredients as single values
  {$unwind:"$Ingredients"},
  // We want the response to be "Ingredient"
  {$project:{_id:0,"Ingredient":"$Ingredients.Name"}},
  // We count the occurrence of each ingredient
  // in the recipes
  {$group:{_id:"$Ingredient",count:{$sum:1}}}
])
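This would actually be sufficient, unless you have a database of gazillions of recipes. In that case, you might want to take a deep look at incremental map/reduce instead of an aggregation (note that map-reduce is deprecated as of MongoDB 5.0; an aggregation pipeline ending in a $merge stage is the recommended way to update results incrementally). Hint: you should add a timestamp to the recipes so that only new or changed recipes need to be processed incrementally.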
If you have a couple of hundred K to a couple of million recipes, you can also add an $out stage to preaggregate your data.
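A sketch of that pre-aggregated variant, assuming a hypothetical output collection named `ingredientCounts`. The pipeline counts ingredient usage the same way as the aggregation above (the intermediate `$project` is not needed for the count) and appends an `$out` stage, which replaces the output collection's contents on each run. `countIngredientsInMemory` is a hypothetical helper that mirrors what `$unwind` plus `$group` compute, for illustration only.

```javascript
// Ingredient usage counts, written to a separate collection by $out.
function ingredientCountPipeline(outCollection = "ingredientCounts") {
  return [
    { $unwind: "$Ingredients" }, // one document per (recipe, ingredient) pair
    { $group: { _id: "$Ingredients.Name", count: { $sum: 1 } } },
    { $out: outCollection }, // replaces the output collection's contents
  ];
}

// In-memory mirror of $unwind + $group, for illustration only.
// Recipes without an Ingredients array are skipped, as $unwind would drop them.
function countIngredientsInMemory(recipes) {
  const counts = new Map();
  for (const recipe of recipes) {
    for (const ingredient of recipe.Ingredients || []) {
      counts.set(ingredient.Name, (counts.get(ingredient.Name) || 0) + 1);
    }
  }
  return [...counts].map(([name, count]) => ({ _id: name, count }));
}

// Usage in mongosh (requires a server):
// db.recipes.aggregate(ingredientCountPipeline())
// db.ingredientCounts.find().sort({ count: -1 })
```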
On measurements
Imho, it makes no sense to have predefined measurements. There are teaspoons, tablespoons, metric and imperial measurements, groupings like "dozen" and specifications like "clove". You really do not want to convert these into each other, or even restrict them to a limited set of measurements. How many ounces is a clove of garlic? ;)
Bottom line: Make it a free text field, maybe with some autocomplete suggestions.
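If you want those autocomplete suggestions, one simple option (an assumption, not something the answer prescribes) is to suggest previously used values: in mongosh, `db.recipes.distinct("Ingredients.Measurement")` returns the distinct measurement strings, and the hypothetical helper below filters them by a case-insensitive prefix.

```javascript
// Suggest previously used measurement strings that start with what the user typed.
// knownMeasurements would come from db.recipes.distinct("Ingredients.Measurement").
function suggestMeasurements(knownMeasurements, typed, limit = 5) {
  const prefix = typed.trim().toLowerCase();
  return knownMeasurements
    .filter((m) => m.toLowerCase().startsWith(prefix))
    .sort((a, b) => a.length - b.length || a.localeCompare(b)) // shortest first, then alphabetical
    .slice(0, limit);
}
```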
Revised data model
Recipe
{
  _id: new ObjectId(),
  Name: "Surf & Turf Kebap",
  Ingredients: [
    {
      Name: "Flank Steak",
      Measurement: "200 g"
    },
    {
      Name: "Prawns",
      Measurement: "300g",
      Note: "Fresh ones!"
    },
    {
      Name: "Garlic Oil",
      Measurement: "1 Tablespoon",
      Link: "/recipes/5c2cc4acd98df737db7c5401"
    }
  ]
}
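With this embedded model, the asker's "find recipes containing these ingredients" query can also be written as an exact-match filter when every listed ingredient must be present; this is an alternative to text search, not part of the answer above. The helper names are hypothetical; matching is exact and case-sensitive, unlike `$text`.

```javascript
// Filter for db.recipes.find(): recipes whose embedded ingredients include ALL names.
function recipesWithAllIngredients(names) {
  return { "Ingredients.Name": { $all: names } };
}

// In-memory mirror of that filter, for illustration only.
function matchesAllIngredients(recipe, names) {
  if (names.length === 0) return false; // $all with an empty array matches no documents
  const present = new Set((recipe.Ingredients || []).map((i) => i.Name));
  return names.every((n) => present.has(n));
}

// Usage in mongosh (requires a server):
// db.recipes.find(recipesWithAllIngredients(["Prawns", "Garlic Oil"]))
```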
And the example of the text index:
db.recipes.createIndex({Name:"text","Ingredients.Name":"text"})
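With that index in place, a sketch of a relevance-sorted search: `{ $meta: "textScore" }` projects the text score and sorts by it. The helper names and search terms are illustrative. Note that `$text` treats space-separated words as alternatives (OR), so a multi-word ingredient should be wrapped in escaped double quotes to be searched as a phrase.

```javascript
// Arguments for db.recipes.find(filter, projection).sort(sort): best matches first.
function textSearchArgs(terms) {
  return {
    filter: { $text: { $search: terms } },
    projection: { Name: 1, score: { $meta: "textScore" } },
    sort: { score: { $meta: "textScore" } },
  };
}

// Wrap a multi-word ingredient in double quotes so $text treats it as a phrase.
function phrase(text) {
  return `"${text}"`;
}

// Usage in mongosh (requires the text index above):
// const { filter, projection, sort } = textSearchArgs(phrase("garlic oil"));
// db.recipes.find(filter, projection).sort(sort)
```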
The theory behind it
A recipe is your basic data structure, as your application is supposed to store and provide recipes, potentially based on certain criteria. Ingredients and measurements (to the extent where it makes sense) can easily be derived from the recipes. So why bother storing ingredients and measurements independently? It only makes your data model unnecessarily complicated, without providing any advantage.
hth