One-To-Many Relationship MongoDB (WITH MILLIONS OF COMMENTS EMBEDDED)


I am new to MongoDB, coming from a relational database background. I have designed a post structure with many comments, but I don't know how to load them. A record is given below from that collection:
{
_id: ObjectId("63173b1411db4b2f8e32f3cf"),
title: "How to load data in mongoDB",
comments: [
{
userId: ObjectId("63173b1411db4b2f8e32fcfb"),
comment: "Thanks",
},
{
userId: ObjectId("63173b1411db4b2f8e323fcb"),
comment: "Nice Post",
},
...
]
}
With hundreds of millions of comments, how should I load them? Loading them all at once takes a lot of time and space.
What can be the optimal solution for this?
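One commonly suggested direction, sketched here under assumptions rather than as a definitive answer, is to keep each comment as its own record that points back to its post and to load comments one page at a time. The in-memory list, the `post_id` field, the integer `_id` values, and the `load_page` helper below are all hypothetical stand-ins for a separate comments collection.

```python
from itertools import islice

POST_ID = "63173b1411db4b2f8e32f3cf"

# Hypothetical stand-in for a separate "comments" collection:
# one record per comment, each referencing its post through post_id.
comments = [
    {"_id": i, "post_id": POST_ID, "comment": f"comment {i}"}
    for i in range(1, 26)
]


def load_page(post_id, after_id=None, page_size=10):
    """Return up to page_size comments of a post whose _id is greater than after_id."""
    ordered = sorted(comments, key=lambda doc: doc["_id"])
    matching = (
        c for c in ordered
        if c["post_id"] == post_id and (after_id is None or c["_id"] > after_id)
    )
    return list(islice(matching, page_size))


# Each page starts after the last _id seen on the previous page.
first_page = load_page(POST_ID)
second_page = load_page(POST_ID, after_id=first_page[-1]["_id"])
third_page = load_page(POST_ID, after_id=second_page[-1]["_id"])
```

Paging by "last seen id" means the client only ever holds one page in memory, whatever the total number of comments is.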

Related

How to properly design a database, a single object vs multiple with relations

I am working on a project that will rely fairly heavily on a database. I have done several projects by now, but most of them barely needed a database, so none required me to put much thought into the design. That time has come. I don't know much about databases, and I am not looking for someone to do it for me; I am just looking for pointers on what to study or read in order to figure out how to handle the situation.
The app will keep track of the items in stock (in the inventory) in a company, related to the products that it manufactures. I will have to be able to pull various kinds of information, whether it's all invoices, all invoices related to a given product, all items related to a given product, all items for all products, and so on; you get the idea.
The basic structure that I have for now looks like this:
const inventory = {
products: [
{
title: "",
production: {
finalized: 0,
inProduction: 0,
},
items: [
{
title: "",
price: {
per: 0,
total: 0,
},
quantity: {
current: 0,
type: "",
},
operations: {
added: [
{
date: new Date(),
quantity: 0,
invoice: {
number: "",
supplier: "",
},
},
],
removed: [
{
date: new Date(),
quantity: 0,
expenseId: "",
},
],
},
},
],
},
],
};
It is not complex at all, an array of products, and information about these products. Quantities, when a new quantity was added, when a quantity was taken, and so on.
The question is whether I should build this with just one model for the whole inventory, which I am guessing will work but will probably not be optimal.
Another approach that comes to mind is to make separate models for the items, invoices, products and so on, and then have some logic that relates them within the inventory to get the structure that I have above. Performance-wise that will probably be better, because if I want to pull all invoices, I can query just the invoices model instead of loading the whole inventory and looping through it to find my invoices.
What other options do I have?
I forgot to add that I will be using Mongoose, Express, React and Node.
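The "separate models" idea from the question can be pictured roughly as follows. This is only a sketch in plain Python, not Mongoose code: the `Product`, `Item`, and `Invoice` classes, their `id` and `*_id` reference fields, and the sample data are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Product:
    id: str
    title: str


@dataclass
class Item:
    id: str
    product_id: str  # reference to Product.id
    title: str


@dataclass
class Invoice:
    number: str
    supplier: str
    item_id: str  # reference to Item.id
    quantity: int


# Hypothetical sample data, one list per model.
products = [Product("p1", "Chair"), Product("p2", "Table")]
items = [
    Item("i1", "p1", "Wood"),
    Item("i2", "p1", "Screws"),
    Item("i3", "p2", "Glass"),
]
invoices = [
    Invoice("INV-1", "Acme", "i1", 10),
    Invoice("INV-2", "Bolt Co", "i2", 500),
    Invoice("INV-3", "Acme", "i3", 4),
]


def invoices_for_product(product_id):
    """Follow the references product -> items -> invoices."""
    item_ids = {item.id for item in items if item.product_id == product_id}
    return [inv for inv in invoices if inv.item_id in item_ids]


all_invoices = invoices  # "all invoices" needs no traversal of the inventory
chair_invoices = invoices_for_product("p1")
```

With this layout, listing every invoice touches only the invoices, while product-specific questions follow the references.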

How to model this NoSQL data structure in Firestore (Review my first approach)

I am a fairly new web developer and would like your help with a project I am currently working on. I have previously worked on a very simple Realtime Database example and have little to no experience with Firestore or NoSQL in general.
I want to create a system which allows end-users to get an email once a week that contains a list of special offers from bars the end-user has subscribed to. The offers change each day of the week. Bar owners can fill out a form in a vue.js web application every week with their weekly special offers.
Every Monday morning a cron job has to look up which end user has subscribed to which bars and then aggregate the data and send it via email.
The question is how would you structure the data so that I can easily compose the email and send it via a cloud function?
My approach would be to have three main collections: RestaurantOwner, EndUser, SpecialOfferings
Please see the graphic for an example process:
BarOwner and EndUser are pretty straight forward. However, the difficult part is how to structure the SpecialOffers in order to be queried the right way.
My idea would be to structure it based on the calendar week and link it to the uid from the barOwner:
specialOffers: {
2019_CW27: {
barUID001: {
mon: {
title: 'Banana Daiquir',
price: 4.99,
},
tue: {
title: 'After Five',
price: 2.99,
},
wed: {
title: 'Cool Colada',
price: 6.99
},
thu: {
title: 'Crantini',
price: 5.99
},
fri: {
title: 'French Martini',
price: 4.99
}
},
barUID002: {
mon: {
title: 'Gin & Tonic',
price: 8.99,
},
tue: {
title: 'Cratini',
price: 4.99,
},
wed: {
title: 'French Martini',
price: 4.99
},
thu: {
title: 'After Five',
price: 3.99
},
fri: {
title: 'Cool Colada',
price: 6.99
}
}
},
2019_CW28: {
barUID01: {~~~},
barUID02: {~~~}
}
}
The disadvantage of this approach is that it creates a deeply nested object: imagine 52 calendar weeks and, for example, 100 signed-up bars with 5 special offers each per week. I am also not sure whether I can query it the way I need to.
Is this approach reasonable or what would you do differently?
Thank you so much for your help! I highly appreciate it.

I'm assuming the following scenarios:
1) The bar owners make modifications to their offers very often.
2) The bar owners should be the only ones allowed to modify each bar's offers.
If you have these two scenarios, I would recommend a sub-collections approach here.
When to use sub-collections:
1) When there are a lot of fields in a document. Cloud Firestore has a 20,000-field limit per document. (This applies if storing every bar in one document could push it past 20,000 fields.)
2) When updating the parent document is a common operation. Firestore only lets you update a single document at a rate of about 1 write/second. (This applies if the SpecialOffers information of each bar is modified very often. If two bar owners modify their offers at the same time, only one write succeeds immediately and the second waits until the first completes. This can delay offer updates, particularly at the end of a week when almost all the bars update their offers.)
3) When you want to limit access to particular fields of a document. (For example, if you want to restrict access to a bar's offers to its barOwner alone, you can restrict access to each document in the Bars sub-collection according to its owner using Firestore Security Rules.)
So I would recommend a sub-collection Bars under the main collection SpecialOffers. This way the design becomes scalable, and you can add restaurants and supermarkets as other similar sub-collections in the future without heavily altering your design.
Another advantage is that sub-collections are basically collections, and they have no limit on the number of documents they can hold. So even if the number of registered bars exceeds 20,000, the field limit for a Firestore document, your sub-collection won't have a problem, whereas a single document would run out of fields to store the offers for a new bar.
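The sub-collection layout recommended above can be sketched as one document per bar per week. The path pattern `specialOffers/{week}/bars/{barUid}`, the `subscriptions` mapping, and the `weekly_offers` helper are assumptions for illustration; the dictionary is an in-memory stand-in, not a Firestore client.

```python
# Hypothetical stand-in for Firestore: keys mimic document paths of the form
# specialOffers/{week}/bars/{barUid}, so each bar owns exactly one document per week.
offer_docs = {
    "specialOffers/2019_CW27/bars/barUID001": {
        "mon": {"title": "Banana Daiquir", "price": 4.99},
        "tue": {"title": "After Five", "price": 2.99},
    },
    "specialOffers/2019_CW27/bars/barUID002": {
        "mon": {"title": "Gin & Tonic", "price": 8.99},
        "tue": {"title": "Cratini", "price": 4.99},
    },
}

# Hypothetical subscription data: end user -> bars they follow.
subscriptions = {"endUser001": ["barUID001", "barUID002"]}


def weekly_offers(user_id, week):
    """Collect the offer documents of every bar a user subscribed to for one week."""
    result = {}
    for bar_uid in subscriptions.get(user_id, []):
        doc = offer_docs.get(f"specialOffers/{week}/bars/{bar_uid}")
        if doc is not None:
            result[bar_uid] = doc
    return result


email_data = weekly_offers("endUser001", "2019_CW27")
```

Because every bar writes only to its own document, one owner's update does not contend with another's, and the weekly job reads just the documents for the bars a user follows.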
Ultimately the choice depends on your use cases.
Hope this helps.

Where to store a reference to other data models (in mongoDB) for best performance

In my project I have users and circles. Circles can have multiple users and a user can be in multiple circles. Lastly there are events. Each event can have multiple users in one circle. Later, events will get a lot of content, so there will be a lot of stuff to load (images, comments, etc.).
I was thinking that these would be a good data models:
User = {
_id: "uuid",
name: "string",
password: "string",
circles: [Circle._id],
}
Event = {
_id: "uuid",
name: "string",
location: "string",
circles:Circle._id,
participants: [User._id],
}
Circle = {
_id: "uuid",
name: "string"
}
Once the user logs in, he/she selects one of his circles, users and events in that circle will be displayed.
An API with these data models would (I think) mean that to get the users and events of one circle, the database has to search through all users and events and check whether they are in that circle. With a lot of users and events, I think this might not be the most efficient way.
So I was thinking of putting the user and events into arrays of the circle like this:
User = {
_id: "uuid",
name: "string",
password: "string",
}
Event = {
_id: "uuid",
name: "string",
location: "string",
participants: [User._id],
}
Circle = {
_id: "uuid",
name: "string",
users:[User._id],
events:[Event._id]
}
Now, when the user selects the circle, the circle loads slower, because the users and events have to be loaded first. But I was thinking, that searching for users and events would now be faster. Is this the correct approach/thinking? Would it make sense to keep a reference to the specific circle ids in the User and Event data model?

If you want to use MongoDB to its full strength, I strongly recommend denormalizing your data.
If you normalize your data, you might have to use $lookup to join multiple collections. Even if you save on disk space, you will end up with relatively heavier computation.
Assuming that an application generally has 90% of hits as reads and 10% as writes, it makes sense to model your data in a read-friendly way. Hence, denormalize heavily until it is really necessary to create references to another collection. Optimizations can be achieved later by indexing and caching, but give the schema below a thought.
User = {
_id: "uuid",
name: "string",
password: "string",
circles: ["circle1","circle2"],
events : ["event1","event2"]
}
Event = {
_id: "uuid",
name: "string",
location: "string"
}
Circle = {
_id: "uuid",
name: "string"
}
Try to know your queries beforehand, keeping most of your data in the User collection. The circles and events fields in the User collection can also be arrays of objects [{},{}] if there are more properties to be stored.
I am certain that the more collections you join, the more complicated your queries will get, and the more computation they will need.
I wouldn't recommend storing user IDs in the circle or event collections, as users may grow over time and you don't want to end up with a document in which one field stores thousands of array elements. A user, by contrast, might be part of hundreds of circles and events, and if we store this data in the User collection, it becomes quite easy to query and manage.
Long story short: do not treat a NoSQL database as a relational database. It will never fit. Model your database with your future queries in mind. Denormalize heavily to make your reads simpler, i.e. avoid references.
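As a rough illustration of the read-friendly User documents above, both "who is in this circle" and "who takes part in this event" become a filter over a single collection. The sample users and the helper names `users_in_circle` and `participants_of` are hypothetical, and the list stands in for the User collection.

```python
# Hypothetical User documents following the denormalized shape above.
users = [
    {"_id": "u1", "name": "Ann", "circles": ["circle1", "circle2"], "events": ["event1"]},
    {"_id": "u2", "name": "Bob", "circles": ["circle2"], "events": ["event1", "event2"]},
    {"_id": "u3", "name": "Cy", "circles": ["circle1"], "events": []},
]


def users_in_circle(circle_id):
    """Names of all users whose circles array contains circle_id."""
    return [u["name"] for u in users if circle_id in u["circles"]]


def participants_of(event_id):
    """Names of all users whose events array contains event_id."""
    return [u["name"] for u in users if event_id in u["events"]]


circle2_members = users_in_circle("circle2")
event2_participants = participants_of("event2")
```

Neither lookup needs a second collection, which is the point of storing the memberships on the user.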

Efficient way to store frequently requested key-value data with relations?

Let's say I'm building Twitter.
One of the tasks is to track which tweets have been read by a particular user and store this data on the server. When a user requests somebody's feed, the server should return:
[
{
id: 1,
tweet: "Hey there!",
isRead: false
},
{
id: 2,
tweet: "Here's my cat, look",
isRead: true
},
{
id: 3,
tweet: "Blue or yellow? Thats the question",
isRead: true
},
...
]
Which is the most efficient way to store which tweets have been read by which user, and to retrieve this data when returning somebody's feed for a particular user?
Any ideas about data storage architecture are highly appreciated. My current stack is PostgreSQL for storing users and "tweets". Redis, MongoDB and neo4j are also used in the project, so they are available.
The first guess was to use Redis, like:
user_id: tweet_id
-----------------
user_id: tweet_id
-----------------
....
But I think there may be better options, more suitable for persistent data storage.
Thank you in advance.

Have a look at this Twitter clone that Redis' author, antirez (a.k.a. Salvatore Sanfilippo), made: http://redis.io/topics/twitter-clone
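The user-to-tweet mapping guessed at in the question can be pictured as one set of read tweet IDs per user, which is then merged into the feed. This is a minimal in-memory sketch, not Redis code and not taken from the linked tutorial; `read_by_user`, `mark_read`, and `feed_for` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical store: user_id -> set of tweet ids that user has read.
read_by_user = defaultdict(set)


def mark_read(user_id, tweet_id):
    """Record that user_id has read tweet_id (adding twice is harmless)."""
    read_by_user[user_id].add(tweet_id)


def feed_for(user_id, tweets):
    """Return copies of the tweets annotated with isRead for this user."""
    read = read_by_user.get(user_id, set())
    return [{**tweet, "isRead": tweet["id"] in read} for tweet in tweets]


tweets = [
    {"id": 1, "tweet": "Hey there!"},
    {"id": 2, "tweet": "Here's my cat, look"},
    {"id": 3, "tweet": "Blue or yellow? Thats the question"},
]

mark_read("alice", 2)
mark_read("alice", 3)
alice_feed = feed_for("alice", tweets)
bob_feed = feed_for("bob", tweets)
```

Membership tests on a set stay cheap as the number of read tweets grows, and the stored tweets themselves are never modified per reader.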

How to query in the nested array.(using pymongo)

I'm a newbie with MongoDB.
I made a nested array document like this:
data = {
"title": "mongo community",
"description": "I am a new bee",
"topics": [{
"title": "how to find object in array",
"comments": [{
"description": "desc1"
}]
},
{
"title": "the case to use ensureIndex",
"comments": [{
"description": "before query"
},
{
"description": "If you want"
}
]
}
]
}
After that, I put it in the "community" collection:
db.community.insert(data)
So, I would like to get only the "comments" of the topic whose title is "how to find object in array".
Then I tried:
data = db.community.find_one({"title":"mongo community","topics.title":"how to find object in array" } )
the result is
>>> print data
{
u'topics': [{
u'comments': [{
u'description': u'desc1'
}],
u'title': u'how to find object in array'
},
{
u'comments': [{
u'description': u'before query'
},
{
u'description': u'If you want'
}],
u'title': u'the case to use ensureIndex'
}],
u'_id': ObjectId('4e6ce188d4baa71250000002'),
u'description': u'I am a new bee',
u'title': u'mongo community'
}
I don't need the topic "the case to use ensureIndex".
Would you give me some advice?
Thanks.

It looks like you're embedding topics as an array all in a single document. You should try to avoid returning partial documents frequently from MongoDB. You can do it with the "fields" argument of the find method, but it isn't very easy to work with if you're doing it frequently.
So to solve this you could try to make each topic a separate document. I think that would be easier for you too. If you want to save information about the "community" for the forum, put it in a separate collection. For example, you could use the following in the MongoDB shell:
// add a forum:
var forum = {
title:"mongo community",
description:"I am a new bee"
};
db.forums.save(forum);
// add first topic:
var topic = {
title: "how to find object in array",
comments: [ {description:"desc1"} ],
forum:"mongo community"
};
db.topics.save(topic);
// add second topic:
var topic = {
title: "the case to use ensureIndex",
comments: [
{description:"before query"},
{description:"If you want"}
],
forum:"mongo community"
};
db.topics.save(topic);
print("All topics:");
printjson(db.topics.find().toArray());
print("just the 'how to find object in array' topic:")
printjson(db.topics.find({title:"how to find object in array"}).toArray());
Also, see the document Trees In MongoDB about schema design in MongoDB. It happens to be using a similar schema to what you are working with and expands on it for more advanced use cases.

MongoDB operates on documents, that is, the top level documents (the things you save, update, insert, find, and find_one on). Mongo's query language lets you search within embedded objects, but will always return, update, or manipulate one (or more) of these top-level documents.
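Since the query returns the whole top-level document, one simple option is to pick out the wanted topic on the client after fetching it. The sketch below assumes the document has already been retrieved as a plain dictionary with the shape from the question; `comments_for_topic` is a hypothetical helper, not a pymongo function.

```python
# The document as it would come back from find_one (shape taken from the question).
community = {
    "title": "mongo community",
    "description": "I am a new bee",
    "topics": [
        {
            "title": "how to find object in array",
            "comments": [{"description": "desc1"}],
        },
        {
            "title": "the case to use ensureIndex",
            "comments": [
                {"description": "before query"},
                {"description": "If you want"},
            ],
        },
    ],
}


def comments_for_topic(document, topic_title):
    """Return the comment descriptions of the first topic with the given title."""
    for topic in document.get("topics", []):
        if topic["title"] == topic_title:
            return [comment["description"] for comment in topic["comments"]]
    return []


wanted = comments_for_topic(community, "how to find object in array")
```

This keeps the embedded schema but still transfers every topic over the wire, which is why the schema itself may be worth revisiting.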
MongoDB is often called "schema-less," but something more like "(has) flexible schemas" or "(has) per-document schemas" would be a more accurate description. This is a case where your schema design -- having topics embedded directly within a community -- is not working for this particular query. However there are probably other queries that this schema supports more efficiently, like listing the topics within a community in a single query. You might want to consider the queries you want to make and re-design your schema accordingly.
A few notes on MongoDB limitations:
- top-level documents are always returned (optionally with only a subset of fields, as @scott noted -- see the mongodb docs on this topic)
- each document is limited to 16 megabytes of data (as of version 1.8+), so this schema will not work well if the communities have a long list of topics
For help with schema design, see the mongodb docs on schema design, Kyle Banker's video "Schema Design Basics", and Eliot Horowitz's video "Schema Design at Scale" for an introduction, tips, and considerations.
