I am trying to build a schema for a chat application in MongoDB. I have two types of user models - Producer and Consumer. Producers and Consumers can have conversations with each other. My ultimate goal is to fetch all the conversations for any producer or consumer and show them in a list, just like all the messaging apps (e.g. Facebook) do.
Here is the schema I have come up with:
Producer: {
  _id: 123,
  name: "Sam"
}
Consumer: {
  _id: 456,
  name: "Mark"
}
Conversation: {
  _id: 321,
  producerId: 123,
  consumerId: 456,
  lastMessageId: 1111,
  lastMessageDate: "7/7/2018"
}
Message: {
  _id: 1111,
  conversationId: 321,
  body: "Hi"
}
Now I want to fetch, say, all the conversations of Sam. I want to show them in a list just like Facebook does, grouping them by Consumer and sorting according to time.
I think I need to do following queries for this:
1) Get all Conversations where producerId is 123 sorted by lastMessageDate.
I can then show the list of all Conversations.
2) If I want to know all the messages in a conversation, I query Message and get all messages where conversationId is 321.
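In mongo shell terms, those two queries might look roughly like this (just a sketch, assuming collections named conversations and messages):
db.conversations.find({ producerId: 123 }).sort({ lastMessageDate: -1 })
db.messages.find({ conversationId: 321 })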
Now, for each new message, I also need to update the conversation with the new messageId and date every time. Is this the right way to proceed, and is it optimal considering the number of queries involved? Is there a better way I can proceed with this? Any help would be highly appreciated.
Design:
I wouldn't say it's bad. Depending on the case you've described, it's actually pretty good. Such denormalization of last message date and ID is great, especially if you plan a view with a list of all conversations - you have the last message date in the same query. Maybe go even one step further and add last message text, if it's applicable in this view.
You can read more on the pros and cons of denormalization (and schema modeling in general) on the MongoDB blog (parts 1, 2 and 3). It's not that fresh, but it's not outdated either.
Also, if such multi-document updates scare you with possible inconsistencies, MongoDB v4 has you covered with transactions.
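For illustration only, here is a minimal sketch of such an atomic insert-and-update using the Node.js driver's transaction API; the database and collection names are assumptions based on the schema above:
const { MongoClient } = require("mongodb");

async function sendMessage(client, conversationId, body) {
  const session = client.startSession();
  try {
    await session.withTransaction(async () => {
      const db = client.db("chat"); // assumed database name
      // Insert the new message
      const { insertedId } = await db.collection("messages").insertOne(
        { conversationId, body, createdAt: new Date() },
        { session }
      );
      // Denormalize the last-message info onto the conversation
      await db.collection("conversations").updateOne(
        { _id: conversationId },
        { $set: { lastMessageId: insertedId, lastMessageDate: new Date() } },
        { session }
      );
    });
  } finally {
    await session.endSession();
  }
}
Note that multi-document transactions require MongoDB 4.0+ and a replica set deployment.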
Querying:
On one hand, you can involve multiple queries, and that's not bad at all (especially when some of them are easily cacheable, like the producer or consumer data). On the other hand, you can use aggregations to fetch all these things at once if needed.
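As a sketch of the aggregation route (collection names are assumptions), a producer's conversation list could be fetched together with the consumer data in a single pipeline:
db.conversations.aggregate([
  { $match: { producerId: 123 } },
  { $sort: { lastMessageDate: -1 } },
  // Join the matching consumer document for display in the list
  { $lookup: {
      from: "consumers",
      localField: "consumerId",
      foreignField: "_id",
      as: "consumer"
  } },
  { $unwind: "$consumer" }
])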
Related
I am aiming to build a scalable infrastructure (for fun) for a user posting system where timelines exist, such as a feed system. A feed would include the posts of the users someone follows.
Anyway, the current approach is this.
UserA follows UserB
Save the follows relationship in the DB
Create a Redis list <userBId-followers> -> [userAId, ...]
UserB creates a post
Save that post into the DB
Push the <postId> into their Redis list <userId-posts> -> [postId1, postId2, postId3, ...]
Now, since UserA has a feed Redis list <userAId-feed> -> [postId1, postId2, ...],
I essentially iterate over UserB's followers and push UserB's new post into all of their timelines (fan-out).
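A rough sketch of that fan-out step (assuming an ioredis-style client and the key names above; not a definitive implementation):
// Called after UserB's post has been saved to the DB
async function fanOutPost(redis, authorId, postId) {
  // Keep the author's own post list up to date
  await redis.lpush(`${authorId}-posts`, postId);
  // Everyone who follows the author
  const followerIds = await redis.lrange(`${authorId}-followers`, 0, -1);
  // Push the new post onto every follower's feed list
  for (const followerId of followerIds) {
    await redis.lpush(`${followerId}-feed`, postId);
  }
}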
Is this the right approach? Anything better I can do?
Another question of mine is: is it much faster to retrieve a Redis list of postIds and pass them into Prisma to get the actual posts, than doing the queries with just Prisma?
Example below.
// e.g. returns ['1', '2', ...]
const redisUserPosts = await redis.lrange(`${userId}-posts`, 0, -1);

const userPosts = await prisma.posts.findMany({
  where: {
    id: {
      in: redisUserPosts
    }
  }
});
The alternative would be to find by user, do potential other joins, etc., rather than knowing the exact postIds that belong to a user.
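For comparison, that pure-Prisma alternative might look something like this (the model and field names here are assumptions):
const userPosts = await prisma.posts.findMany({
  where: { authorId: userId },     // assumed relation field
  orderBy: { createdAt: "desc" },  // assumed timestamp field
});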
I am simply asking for advice on design, this is already implemented.
I have a data collection which contains records in the following format.
{
"_id": 22,
"title": "Hibernate in Action",
"isbn": "193239415X",
"pageCount": 400,
"publishedDate": ISODate("2004-08-01T07:00:00Z"),
"thumbnailUrl": "https://s3.amazonaws.com/AKIAJC5RLADLUMVRPFDQ.book-thumb-images/bauer.jpg",
"shortDescription": "\"2005 Best Java Book!\" -- Java Developer's Journal",
"longDescription": "Hibernate practically exploded on the Java scene. Why is this open-source tool so popular Because it automates a tedious task: persisting your Java objects to a relational database. The inevitable mismatch between your object-oriented code and the relational database requires you to write code that maps one to the other. This code is often complex, tedious and costly to develop. Hibernate does the mapping for you. Not only that, Hibernate makes it easy. Positioned as a layer between your application and your database, Hibernate takes care of loading and saving of objects. Hibernate applications are cheaper, more portable, and more resilient to change. And they perform better than anything you are likely to develop yourself. Hibernate in Action carefully explains the concepts you need, then gets you going. It builds on a single example to show you how to use Hibernate in practice, how to deal with concurrency and transactions, how to efficiently retrieve objects and use caching. The authors created Hibernate and they field questions from the Hibernate community every day - they know how to make Hibernate sing. Knowledge and insight seep out of every pore of this book.",
"status": "PUBLISH",
"authors": ["Christian Bauer", "Gavin King"],
"categories": ["Java"]
}
I want to print the title and the author count for records where the number of authors is greater than 4.
I used the following command to extract records which have more than 4 authors.
db.books.find({authors:{$exists:true},$where:'this.authors.length>4'},{_id:0,title:1});
But I was unable to print the number of authors along with the title. I tried the following command too, but it gave only the title list.
db.books.find({authors:{$exists:true},$where:'this.authors.length>4'},{_id:0,title:1,'this.authors.length':1});
Could you please help me to print the number of authors here along with the title?
You can use aggregation framework's $project with $size to reshape your data and then $match to apply filtering condition:
db.collection.aggregate([
{
$project: {
title: 1,
authorsCount: { $size: "$authors" }
}
},
{
$match: {
authorsCount: { $gt: 4 }
}
}
])
Recently I designed a database model, or ERD, using Hackalode.
The problem I'm currently facing is that, based on my current design, I can't query it the way I want. I studied ERDs with MySQL and know that Mongo doesn't work the same way.
The idea was simple: I want a recipe that has an array of ingredients, and the ingredients come from a separate collection.
The recipe also contains the measurement of each ingredient, e.g. (1 tbsp sugar).
I should also be able to query by a list of ingredients and find the recipes that contain them.
I wanted these collections to be in a many-to-many relationship, where a recipe can use ingredients that are already in the database.
I just don't know how to query the data.
I have tried a lot of ways using $elemMatch and populate, and all I get is an empty array as a result.
I'm expecting two types of query, where I can query by the name of an ingredient or by the recipe.
My expected result would be like this:
[{
  id: ...,
  name: ...,
  description: ...,
  macros: [...],
  ingredients: [
    {
      id: ...,
      amount: ...,
      unit: ...,
      ingredient: {
        id: ...,
        name: ...
      }
    }
  ]
}, { ... }]
But instead I'm getting
[]
Imho, your design is utterly wrong. You over-normalized your data. I would do something much simpler and use embedding. The reasoning behind that is that you define your use cases first and then model your data to answer the questions arising from those use cases in the most efficient way.
Assumed use cases
As a user, I want a list of all recipes.
As a user, I want a list of all recipes by ingredient.
As a designer, I want to be able to show a list of all ingredients.
As a user, I want to be able to link to recipes for compound ingredients, should it be present on the site.
Surely, this is just a small excerpt, but it is sufficient for this example.
How to answer the questions
Ok, the first one is extremely simple:
db.recipes.find()[.limit()[.skip()]]
Now, how could we find by ingredient? Simple answer: create a text index on the ingredient names (and probably some other fields, as you can only have one text index per collection). Then, the query is equally simple:
db.recipes.find({$text:{$search:"ingredient name"}})
"Hey, wait a moment! How do I get a list of all ingredients?" Let us assume we want a simple list of ingredients, with a number on how often they are actually used:
db.recipes.aggregate([
  // We want all ingredients as single values
  { $unwind: "$Ingredients" },
  // We want the response to be "Ingredient"
  { $project: { _id: 0, "Ingredient": "$Ingredients.Name" } },
  // We count the occurrence of each ingredient
  // in the recipes
  { $group: { _id: "$Ingredient", count: { $sum: 1 } } }
])
This would actually be sufficient, unless you have a database of gazillions of recipes. In that case, you might want to have a deep look into incremental map/reduce instead of an aggregation. Hint: You should add a timestamp to the recipes to be able to use incremental map/reduce.
If you have a couple of hundred K to a couple of million recipes, you can also add an $out stage to preaggregate your data.
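As a sketch of that pre-aggregation, writing the counts into an assumed ingredientCounts collection:
db.recipes.aggregate([
  { $unwind: "$Ingredients" },
  { $project: { _id: 0, "Ingredient": "$Ingredients.Name" } },
  { $group: { _id: "$Ingredient", count: { $sum: 1 } } },
  // Persist the pre-aggregated counts in their own collection
  { $out: "ingredientCounts" }
])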
On measurements
Imho, it makes no sense to have predefined measurements. There are teaspoons, tablespoons, metric and imperial measurements, groupings like "dozen", and specifications like "clove", which you really do not want to convert into each other or reduce to a limited set of measurements. How many ounces is a clove of garlic? ;)
Bottom line: Make it a free text field, maybe with some autocomplete suggestions.
Revised data model
Recipe
{
_id: new ObjectId(),
Name: "Surf & Turf Kebap",
Ingredients: [
{
Name: "Flunk Steak",
Measurement: "200 g"
},
{
Name: "Prawns",
Measurement: "300g",
Note: "Fresh ones!"
},
{
Name: "Garlic Oil",
Measurement: "1 Tablespoon",
Link: "/recipes/5c2cc4acd98df737db7c5401"
}
]
}
And the example of the text index:
db.recipes.createIndex({Name:"text","Ingredients.Name":"text"})
The theory behind it
A recipe is your basic data structure, as your application is supposed to store and provide recipes, potentially based on certain criteria. Ingredients and measurements (to the extent it makes sense) can easily be derived from the recipes. So why bother storing ingredients and measurements independently? It only makes your data model unnecessarily complicated, while not providing any advantage.
hth
I have the following structure in my firestore database:
messages:
  m1:
    title: "Message 1"
    ...
    archived: false
  m2:
    title: "Message 2"
    ...
    archived: true
Let's say I have 20k messages and I want to get the archived messages using a "where" clause. Will my query be slower than if I structured my database as follows?
nonArchivedMessages:
  m1:
    title: "Message 1"
    ...
archivedMessages:
  m2:
    title: "Message 2"
    ...
The second structure seems, to me, better suited for large datasets, but it causes issues in some cases, such as getting a message without knowing whether it is archived or not.
One of the guarantees for Cloud Firestore is that the time it takes to retrieve a certain number of documents is not dependent on the total number of documents in the collection.
That means that in your first data model, if you load 100 archived documents and (for example) it takes 1 second, you know that it'll always take about 1 second to load 100 archived documents, no matter how many documents there are in the collection.
With that knowledge the only difference between your two data models is that in the first model you need a query to capture the archived messages, while in the second model you don't need a query. Queries on Cloud Firestore run by accessing an index, so the difference is that there is one (more) index being read in the first data model. While this has a minimal impact on the execution time, it is going to be insignificant compared to the time it takes to actually read the documents and return them to the client.
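For reference, the query in the first data model might look like this with the Firebase v9 web SDK (collection and field names taken from the structure above):
import { getFirestore, collection, query, where, getDocs } from "firebase/firestore";

const db = getFirestore();
// Reads only the documents whose index entry matches archived == true
const snapshot = await getDocs(
  query(collection(db, "messages"), where("archived", "==", true))
);
snapshot.forEach((doc) => console.log(doc.id, doc.data()));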
So: there may be other reasons to prefer the second data model, but the performance to read the archived messages is going to be the same between them.
This has kept me awake until these wee hours.
I want a db to keep track of a soccer tournament.
Each match has two teams, home and away.
Each team can be the home or the away of many matches.
I've got one db, and two document types, "match" (contains: home: teamId and away: teamId) and team (contains: teamId, teamName, etc).
I've managed to write a working view, but it would imply adding to each team the id of every match it is involved in, which doesn't make much logical sense - it's such a hack.
Any idea how this view should be written? I am nearly tempted to just throw in the towel and use Postgres instead.
EDIT: what I want is to have the team info for both the home and away teams, given the id of a match. Pretty easy to do with two calls, but I don't want to make two calls.
Just emit two values in map for each match like this:
function (doc) {
  // Skip documents that aren't complete matches
  if (!doc.home || !doc.away) return;
  // One row per team; the _id in the value lets include_docs pull the team document
  emit([doc._id, "home"], { _id: doc.home });
  emit([doc._id, "away"], { _id: doc.away });
}
After querying the view for the match id MATCHID with:
curl 'http://localhost:5984/yourdb/_design/yourpp/_view/yourview?startkey=\["MATCHID"\]&endkey=\["MATCHID",\{\}\]&include_docs=true'
you should be able to get both teams' documents in the doc fields of the result rows, something like this:
{"total_rows":2,"offset":0,"rows":[
{"id":"MATCHID","key":["MATCHID","home"],"value":{"_id":"first_team_id"},"doc":{...full doc of the first team...}},
{"id":"MATCHID","key":["MATCHID","away"],"value":{"_id":"second_team_id"},"doc":{...full doc of the second team...}}
]}
Check the CouchDB Book: http://guide.couchdb.org/editions/1/en/why.html
It's free and includes answers for beginners :)
If you like it, consider buying it. Even though Chris and Jan were awesome enough to put their book out there for free, you should still support the great work they did.