Efficient way to store frequently requested key-value data with relations?

Let's say I'm building Twitter.
One of the tasks is to track which tweets have been read by a particular user and store this data on the server. When a user requests somebody's feed, the server should return:
[
  {
    id: 1,
    tweet: "Hey there!",
    isRead: false
  },
  {
    id: 2,
    tweet: "Here's my cat, look",
    isRead: true
  },
  {
    id: 3,
    tweet: "Blue or yellow? That's the question",
    isRead: true
  },
  ...
]
What is the most efficient way to store which tweets have been read by which user, and to retrieve this data when returning somebody's feed for a particular user?
Any ideas about the data-storage architecture are highly appreciated. My current stack uses PostgreSQL for storing users and "tweets". Redis, MongoDB, and Neo4j are also used in the project, so they are available.
The first guess was to use Redis, with something like one entry per read tweet per user:
user_id: tweet_id
-----------------
user_id: tweet_id
-----------------
....
But I think there may be better options, more suitable for persistent data storage.
Thank you in advance.

Have a look at this Twitter clone that Redis' author, antirez (a.k.a. Salvatore Sanfilippo), made: http://redis.io/topics/twitter-clone
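Building on the question's own idea, read-tracking maps naturally onto one Redis set per user. A minimal sketch in Node.js with the ioredis client; the key naming scheme and both helper functions are assumptions for illustration, not something from the question:

const Redis = require("ioredis");
const redis = new Redis();

// Mark a tweet as read by adding its id to the user's read-set.
async function markRead(userId, tweetId) {
  await redis.sadd(`user:${userId}:read`, tweetId);
}

// Given a feed loaded from PostgreSQL, attach isRead flags in one round trip.
async function withReadFlags(userId, tweets) {
  const pipeline = redis.pipeline();
  for (const t of tweets) {
    pipeline.sismember(`user:${userId}:read`, t.id);
  }
  const results = await pipeline.exec(); // array of [err, 0|1] pairs
  return tweets.map((t, i) => ({ ...t, isRead: results[i][1] === 1 }));
}

For durability, Redis AOF/RDB persistence may be enough; otherwise the same pairs can be mirrored into a plain (user_id, tweet_id) table in PostgreSQL with a composite primary key.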

Related

One-To-Many Relationship MongoDB (WITH MILLIONS OF COMMENTS EMBEDDED)

I am new to MongoDB, coming from a relational database background. I have designed a post structure with many comments, but I don't know how to load them. A sample record from that collection is given below:
{
  _id: ObjectId("63173b1411db4b2f8e32f3cf"),
  title: "How to load data in mongoDB",
  comments: [
    {
      userId: ObjectId("63173b1411db4b2f8e32fcfb"),
      comment: "Thanks",
    },
    {
      userId: ObjectId("63173b1411db4b2f8e323fcb"),
      comment: "Nice Post",
    },
    ...
  ]
}
Now, when there are hundreds of millions of comments, how should I load them? If I load them all at once, it takes a lot of time and space.
What would be the optimal solution for this?
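One common way to avoid loading the whole embedded array is MongoDB's $slice projection, which returns only a page of the comments. A sketch in mongosh; the posts collection name and the page size are assumptions:

db.posts.find(
  { _id: ObjectId("63173b1411db4b2f8e32f3cf") },
  { title: 1, comments: { $slice: [0, 20] } }  // [skip, limit]: first page of 20
)

At hundreds of millions of comments, though, embedding will eventually hit the 16 MB document limit, so the usual advice is to move comments into their own collection keyed by post id, or to bucket them (as in the next question).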

Is a bucket pattern in MongoDB the best way to handle large unbounded arrays?

I'm implementing social features to a MERN stack app (follow/unfollow users), and trying to come up with a good MongoDB solution for avoiding issues with potentially large unbounded arrays of followers. Specifically I'm hoping to avoid:
MongoDB having to move a large follower array on disk and rebuild indexes as it grows larger
hitting the 16 MB BSON limit if a user ever gets a very large number of followers (> 1 million)
slow performance when querying/returning followers to display via pagination, or when calculating/displaying follower count
From everything I've researched, it seems like using a bucket pattern approach is the best solution... two good articles I found on this:
https://www.mongodb.com/blog/post/paging-with-the-bucket-pattern--part-1
https://www.mongodb.com/blog/post/paging-with-the-bucket-pattern--part-2
I've started to approach it like this...
Follower model:
const mongoose = require('mongoose');
const Schema = mongoose.Schema;

const FollowerSchema = new Schema({
  user: {
    type: Schema.Types.ObjectId,
    ref: 'user',
  },
  // creating an array of followers
  followers: [
    {
      user: {
        type: Schema.Types.ObjectId,
        ref: 'user',
      },
      datefol: {
        type: Date,
        default: Date.now,
      },
    },
  ],
  count: {
    type: Number,
  },
  createdate: {
    type: Date,
    default: Date.now,
    required: true,
  },
});

module.exports = mongoose.model('follower', FollowerSchema);
Upsert in the Node.js API to add a follower to an array bucket (each bucket will contain 100 followers):
const follow = await Follower.updateOne(
  { user: req.params.id, count: { $lt: 100 } },
  {
    $push: {
      followers: {
        user: req.user.id,
        datefol: Date.now(),
      },
    },
    $inc: { count: 1 },
    // 'user' need not be repeated here: on insert, the filter's equality
    // match ({ user: req.params.id }) is copied into the new document.
    $setOnInsert: { createdate: Date.now() },
  },
  { upsert: true }
);
Basically, every time a follower is added, this will add them to the first bucket found that contains fewer than 100 followers (tracked by the count field).
Is this the best approach for handling potentially large arrays? My concerns are:
if someone unfollows a user and the app runs a $pull to remove the follower from the array in one of the buckets... multiple buckets could then contain fewer than 100 followers. New followers would no longer be added to the most recent bucket, so later, when querying and trying to return followers based on the most recent bucket createdate... some of the newest followers might sit in an older bucket and not be returned correctly. The articles above mention expressive update instructions introduced in MongoDB 4.2 that solve this problem, but it's not really clear to me how.
if I corrected for that by returning all follower buckets for a user and sorting by follow date... it seems like that could become very slow if someone had tons of followers
if I want to be able to paginate and return 100 followers per page, starting with the latest, how would that work with this approach? Should I add a pagenumber entry to the model and somehow have it incremented each time a bucket is created (the first bucket gets pagenumber 1, the next pagenumber 2, etc.), so that when a user jumps to follower page 500 on the front end, a query pulls bucket 500?
The bucket pattern is not a perfect match for the case you describe.
The pattern that best fits your needs is the outlier pattern: https://www.mongodb.com/blog/post/building-with-patterns-the-outlier-pattern
Your case is practically the same as the example in that article.
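As a rough illustration of the outlier pattern applied here (the collection names, the 1000-element threshold, and the has_extras flag are assumptions in the spirit of that article): keep followers embedded for the common case, and spill only the rare huge accounts into an overflow collection.

// Try to add a follower while the embedded array has room left
// (the "followers.999 does not exist" test means fewer than 1000 elements).
const res = db.users.updateOne(
  { _id: userId, "followers.999": { $exists: false } },
  { $push: { followers: newFollowerId } }
);

if (res.matchedCount === 0) {
  // Outlier: the embedded array is full. Flag the user and spill into
  // overflow documents (reusing the bucket idea for the overflow only).
  db.users.updateOne({ _id: userId }, { $set: { has_extras: true } });
  db.follower_overflow.updateOne(
    { user: userId, count: { $lt: 1000 } },
    { $push: { followers: newFollowerId }, $inc: { count: 1 } },
    { upsert: true }
  );
}

The point of the pattern is that reads stay a single document fetch for virtually all users, and only the few outliers pay for the extra overflow lookup.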

How to model this NoSQL data structure in Firestore (Review my first approach)

I am a fairly new web developer and need your help with a project I am currently working on. In the past I have worked on a very simple Realtime Database example, and I have little to no experience with Firestore or NoSQL in general.
I want to create a system which allows end users to get an email once a week containing a list of special offers from the bars they have subscribed to. The offers change each day of the week. Bar owners can fill out a form in a Vue.js web application every week with their weekly special offers.
Every Monday morning a cron job has to look up which end users have subscribed to which bars, then aggregate the data and send it via email.
The question is how would you structure the data so that I can easily compose the email and send it via a cloud function?
My approach would be to have three main collections: RestaurantOwner, EndUser, SpecialOfferings
Please see the graphic for an example process:
BarOwner and EndUser are pretty straightforward. However, the difficult part is how to structure the SpecialOffers so they can be queried the right way.
My idea would be to structure it based on the calendar week and link it to the uid from the barOwner:
specialOffers: {
  2019_CW27: {
    barUID001: {
      mon: { title: 'Banana Daiquiri', price: 4.99 },
      tue: { title: 'After Five', price: 2.99 },
      wed: { title: 'Cool Colada', price: 6.99 },
      thu: { title: 'Crantini', price: 5.99 },
      fri: { title: 'French Martini', price: 4.99 }
    },
    barUID002: {
      mon: { title: 'Gin & Tonic', price: 8.99 },
      tue: { title: 'Crantini', price: 4.99 },
      wed: { title: 'French Martini', price: 4.99 },
      thu: { title: 'After Five', price: 3.99 },
      fri: { title: 'Cool Colada', price: 6.99 }
    }
  },
  2019_CW28: {
    barUID001: {~~~},
    barUID002: {~~~}
  }
}
The disadvantage of this approach is that it creates a deeply nested object once you imagine 52 calendar weeks with, say, 100 signed-up bars at 5 special offers per week each, and I am not sure I would be able to query it the way I need to.
Is this approach reasonable, or what would you do differently?
Thank you so much for your help! I highly appreciate it.
I'm assuming the following scenarios:
1) The bar owners modify their offers very often.
2) Only a bar's owner should be allowed to modify that bar's offers.
If you have these two scenarios, I would recommend a sub-collections approach here.
When to use sub-collections:
1) When there are a lot of fields in a document. Cloud Firestore has a 20,000-field limit per document. (Relevant if the number of bars can exceed 20,000 fields.)
2) When updating the parent collection is a common operation. Firestore only lets you update a single document at a rate of 1 write/second. (Relevant if each bar's SpecialOffers are modified very often: if two bar owners modify their offers at the same time, only one write succeeds immediately while the second waits for the first to complete. This can delay offer updates, particularly at the end of a week when almost all the bars update their offers.)
3) When you want to limit access to particular fields of a document. (Relevant if you want to restrict access to a bar's offers to its owner alone; you can restrict access to each document in the Bars sub-collection according to its owner using Firestore Security Rules.)
So I would recommend a sub-collection Bars under the main collection SpecialOffers. This way the design stays scalable, and you can add restaurants and supermarkets as similar sub-collections in the future without heavily altering your design.
Another advantage is that sub-collections are just collections, so they have no limit on the number of documents they can hold. Even if more than 20,000 bars register (the field limit for a single Firestore document), the sub-collection won't have a problem, whereas a single document would run out of fields to save the offers of a new bar.
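A minimal sketch of that layout with the firebase-admin SDK; the exact document ids and day fields are assumptions taken from the question's example:

const admin = require('firebase-admin');
admin.initializeApp();
const db = admin.firestore();

async function demo() {
  // One parent document per calendar week, one sub-collection doc per bar.
  const barRef = db
    .collection('specialOffers').doc('2019_CW27')
    .collection('bars').doc('barUID001');

  // A bar owner saves the week's offers.
  await barRef.set({
    mon: { title: 'Banana Daiquiri', price: 4.99 },
    tue: { title: 'After Five', price: 2.99 },
  });

  // The Monday cron job reads the offers of one subscribed bar.
  const snap = await barRef.get();
  console.log(snap.data());
}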
Ultimately the choice depends on your use cases.
Hope this helps.

Where to store a reference to other data models (in MongoDB) for best performance

In my project I have users and circles. Circles can have multiple users, and a user can be in multiple circles. Lastly, there are events. Each event can have multiple users and belongs to one circle. Later, events will get a lot of content, so there will be a lot of stuff to load (images, comments, etc.).
I was thinking that these would be good data models:
User = {
  _id: "uuid",
  name: "string",
  password: "string",
  circles: [Circle._id],
}
Event = {
  _id: "uuid",
  name: "string",
  location: "string",
  circles: Circle._id,
  participants: [User._id],
}
Circle = {
  _id: "uuid",
  name: "string"
}
Once the user logs in, he/she selects one of his circles, and the users and events in that circle are displayed.
With these data models, I think getting the users and events of one circle means the database has to search through all users and events and check whether they are in that circle. With a lot of users and events, this might not be the most efficient way?
So I was thinking of putting the users and events into arrays on the circle, like this:
User = {
  _id: "uuid",
  name: "string",
  password: "string",
}
Event = {
  _id: "uuid",
  name: "string",
  location: "string",
  participants: [User._id],
}
Circle = {
  _id: "uuid",
  name: "string",
  users: [User._id],
  events: [Event._id]
}
Now, when the user selects the circle, the circle loads more slowly, because the users and events have to be loaded first. But I was thinking that searching for users and events would now be faster. Is this the correct approach/thinking? Would it make sense to also keep a reference to the specific circle ids in the User and Event data models?
If you want to use MongoDB to its full strength, I strongly recommend denormalising your data.
If you normalise your data, you might have to use $lookup to join multiple collections. Even if you save some disk space, you will end up with relatively heavier computation.
Assuming that an application generally sees 90% reads and 10% writes, it makes sense to model your data in a read-friendly way. Hence, denormalise your data heavily until it's really necessary to create references to other collections. Optimisations can later be achieved by indexing and caching, but give the schema below a thought.
User = {
  _id: "uuid",
  name: "string",
  password: "string",
  circles: ["circle1", "circle2"],
  events: ["event1", "event2"]
}
Event = {
  _id: "uuid",
  name: "string",
  location: "string"
}
Circle = {
  _id: "uuid",
  name: "string"
}
Try to know your queries beforehand, keeping most of your data in the User collection. The circles and events fields in the User collection can also be arrays of objects [{}, {}] if there are more properties to store.
I am certain that the more collections you join, the more complicated your queries will get, and the more computation they will need.
I wouldn't recommend storing user ids in the circle or event collections, as users may grow over time and you don't want to end up with a collection whose documents each have one field storing thousands of array elements. On the contrary, a user can be part of hundreds of circles and events, and if we store this data in the User collection, it becomes quite easy to query and manage.
Long story short: do not treat a NoSQL DB as a relational DB. It will never fit. Model your database with your future queries in mind, and denormalise heavily to keep your reads simple, i.e. avoid references.
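To make the read path concrete, a short mongosh sketch against the schema above (the indexes and the "circle1" value are placeholders):

// Multikey indexes on the embedded arrays keep membership lookups fast.
db.users.createIndex({ circles: 1 });
db.users.createIndex({ events: 1 });

// All users of a circle: one indexed query on a single collection, no $lookup.
db.users.find({ circles: "circle1" });

// A user's events are already embedded, so one findOne returns them.
db.users.findOne({ _id: "uuid" }, { events: 1 });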

MongoDB: Query and retrieve objects inside embedded array?

Let's say I have the following document schema in a collection called 'users':
{
  name: 'John',
  items: [ {}, {}, {}, ... ]
}
The 'items' array contains objects in the following format:
{
  item_id: "1234",
  name: "some item"
}
Each user can have multiple items embedded in the 'items' array.
Now, I want to be able to fetch an item by an item_id for a given user.
For example, I want to get the item with id "1234" that belongs to the user named "John".
Can I do this with MongoDB? I'd like to utilize its powerful array indexing, but I'm not sure whether you can run queries on embedded arrays and return objects from the array instead of the document that contains them.
I know I can fetch users that have a certain item using { "items.item_id": "1234" }. But I want to fetch the actual item from the array, not the user.
Alternatively, is there maybe a better way to organize this data so that I can easily get what I want? I'm still fairly new to mongodb.
Thanks for any help or advice you can provide.
The question is old, but the answer has changed since then. With MongoDB >= 2.2, you can do:
db.users.find({ name: "John" }, { items: { $elemMatch: { item_id: "1234" } } })
You will get:
{
  name: "John",
  items: [
    {
      item_id: "1234",
      name: "some item"
    }
  ]
}
See the documentation for $elemMatch.
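Note that an $elemMatch projection returns only the first matching array element. If several elements can match and you want the items themselves, the aggregation pipeline can unwind the array; a sketch (a standard pipeline, not something from the answer above):

db.users.aggregate([
  { $match: { name: "John" } },
  { $unwind: "$items" },                    // one document per array element
  { $match: { "items.item_id": "1234" } },  // keep only matching elements
  { $replaceRoot: { newRoot: "$items" } }   // return the item itself
])
// => { item_id: "1234", name: "some item" }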
There are a couple of things to note about this:
1) I find that the hardest thing for folks learning MongoDB is UN-learning the relational thinking that they're used to. Your data model looks to be the right one.
2) Normally, what you do with MongoDB is return the entire document into the client program, and then search for the portion of the document that you want on the client side using your client programming language.
In your example, you'd fetch the entire 'user' document and then iterate through the 'items[]' array on the client side.
3) If you want to return just the 'items[]' array, you can do so by using the 'Field Selection' syntax. See http://www.mongodb.org/display/DOCS/Querying#Querying-FieldSelection for details. Unfortunately, it will return the entire 'items[]' array, and not just one element of the array.
4) There is an existing Jira ticket to add this functionality: SERVER-828 (https://jira.mongodb.org/browse/SERVER-828). It looks like it's been added to the 2.1 (development) branch, which means it will be available for production use when release 2.2 ships.
If this is an embedded array, then you can't retrieve its elements directly. The retrieved document will have the form of a user (the root document), although not all fields may be filled in (depending on your query).
If you want to retrieve just that element, you have to store it as a separate document in a separate collection. It will have one additional field, user_id (which can be part of _id). Then it's trivial to do what you want.
A sample document might look like this:
{
  _id: { user_id: ObjectId, item_id: "1234" },
  name: "some item"
}
Note that this structure ensures uniqueness of item_id per user (I'm not sure whether you want that or not).
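With that layout, fetching the item becomes a single lookup on the compound _id; a short sketch (the items collection name and the userId variable are placeholders):

// One direct lookup per (user, item) pair; no array projection needed.
db.items.find({ "_id.user_id": userId, "_id.item_id": "1234" })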
